Characterization of a method for profiling gene expression in ... - Nature

5 downloads 108 Views 564KB Size Report
Jun 20, 2006 - recovered from intact human prostate tissue using RNA linear amplification ... Keywords: gene array; gene expression; prostate cancer; laser capture microdissection; linear amplification .... final volume of 11 μl, incubating 5min at 651C, and ..... amplification of genes whose expression approaches zero.
Prostate Cancer and Prostatic Diseases (2006) 9, 379–391 & 2006 Nature Publishing Group All rights reserved 1365-7852/06 $30.00 www.nature.com/pcan

ORIGINAL ARTICLE

Characterization of a method for profiling gene expression in cells recovered from intact human prostate tissue using RNA linear amplification Y Ding1, L Xu1, S Chen1, BD Jovanovic2, IB Helenowski2, DL Kelly3, WJ Catalona4, XJ Yang5, M Pins5, V Ananthanarayanan2 and RC Bergan1 1

Department of Medicine, Division of Hematology/Oncology, Northwestern University Medical School and the Robert H Lurie Cancer Center of Northwestern University, Chicago, IL, USA; 2Department of Preventive Medicine, Northwestern University Medical School and the Robert H Lurie Cancer Center of Northwestern University, Chicago, IL, USA; 3The Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE, USA; 4Department of Urology, Northwestern University Medical School and the Robert H Lurie Cancer Center of Northwestern University, Chicago, IL, USA and 5Department of Pathology, Northwestern University Medical School and the Robert H Lurie Cancer Center of Northwestern University, Chicago, IL, USA

Coupling array technology to laser capture microdissection (LCM) has the potential to yield gene expression profiles of specific cell populations within tissue. However, remaining problems with linear amplification preclude accurate expression profiling when using the low nanogram amounts of RNA recovered after LCM of human tissue. We describe a novel robust method to reliably amplify RNA after LCM, allowing direct probing of 12K gene arrays. The fidelity of amplification was demonstrated by comparing the ability of amplified RNA (aRNA) versus that of native RNA to identify differentially expressed genes between two different cell lines, demonstrating a 99.3% concordance between observations. Array findings were validated by quantitative polymerase chain reaction analysis of a randomly selected subset of 32 genes. Using LCM to recover normal (N ¼ 5 subjects) or cancer (N ¼ 3) cell populations from intact human prostate tissue, three differentially expressed genes were identified. Independent investigators have previously identified differential expression of two of these three genes, hepsin and beta-microseminoprotein, in prostate cancer. Taken together, the current study demonstrates that accurate gene expression profiling can readily be performed on specific cell populations present within complex tissue. It also demonstrates that this approach efficiently identifies biologically relevant genes. Prostate Cancer and Prostatic Diseases (2006) 9, 379–391. doi:10.1038/sj.pcan.4500888; published online 20 June 2006

Keywords: gene array; gene expression; prostate cancer; laser capture microdissection; linear amplification

Introduction The development of microarray technology has made it possible to carry out large-scale analyses of gene expression.1,2 Gene expression profiling has been used to identify differentially expressed, biologically relevant, genes in different types of cancer and in a variety of other diseases, and is particularly suited to evaluating the effects of therapeutic intervention upon target cells in vivo.3–6 Correspondence: Dr RC Bergan, Department of Medicine, Division of Hematology/Oncology, Northwestern University Medical School and the Robert H Lurie Cancer Center of Northwestern University, Olson 8321, 240 East Huron St., Chicago, IL, USA. E-mail: [email protected] Received 9 January 2006; revised 25 April 2006; accepted 25 April 2006; published online 20 June 2006

Standard protocols require B5 mg of total RNA per sample as starting material for high-density oligonucleotide microarrays; B20 mg is required for cDNA-based microarrays.7,8 Whereas microgram quantities of RNA are readily extracted from whole biopsy specimens, individual specimens consist of a heterogeneous mixture of cell types. This, in turn, imposes practical limitations upon interpretation of resultant array data, particularly when one seeks to identify changes in a single-cell type, as is typically the case. Extraction of single-cell populations from intact tissue can be achieved through the use of laser capture microdissection (LCM). However, LCM yields of RNA for single-cell populations are typically significantly below 100 ng, and thus cannot support microarray analysis. Small amounts of RNA can be amplified, thus providing a means whereby LCM can be coupled to microarray analysis. Two different methods can be used

Prostate RNA linear amplification Y Ding et al

380

to amplify RNA. In the first method, the polymerase chain reaction (PCR) is used to achieve either exponential9,10 or linear amplification.11 The second method utilizes in vitro transcription to achieve linear amplification.12,13 With amplification of a large population of genes, distortion of initial relationships becomes an important consideration. As all methods rely upon multiple enzymatic manipulations of RNA (which is chemically unstable) as well as of cDNA, the introduction of noise and systematic bias is essentially inevitable. The performance of linear amplification has been described by a number of groups, and appears to offer less distortion than exponential amplification.10,14 However, it is important to consider that if one wants to measure gene expression in vivo, in a specific population of cells, one first needs to consider performance, when beginning with low nanogram quantities of total RNA and to then demonstrate utility in a target cell population within a given tissue type. Thus, whereas individual components of this multi-step process have been described, there is a lack of reports demonstrating that low nanogram quantities of RNA can be linearly amplified in a manner which provides for the maintenance of gene representation. Reports are also lacking which describe utility when applied to analysis of individual cell populations extracted from complex tissue, in particular human prostate tissue. The current study describes a novel basic method, optimized to human prostate, for first- and secondstrand synthesis, a linked linear amplification method in which RNA is labeled in situ, although being amplified 1500-fold after only a single round, thus allowing direct and robust probing of microarrays. Faithful amplification is demonstrated with both microarray technology and real-time quantitative reverse tanscriptase/polymerase chain reaction. Using normal and cancer epithelial cell populations, recovered from intact human prostate tissue by LCM, we go on to show utility in a practical setting. In doing so, we also demonstrate the power and robust nature of this methodology. This was accomplished by demonstrating that with only eight different clinical samples, we could identify three different genes from over 12 000 analyzed whose expression was altered in cancer compared to that of normal prostate. Of those three genes, two have already been confirmed by independent investigators to be differentially expressed in human prostate cancer. Although prior confirmation of these genes required analysis of many specimens for each gene, both were simultaneously identified in the current study after analysis of only eight specimens total.

Materials and methods Cell culture The origin and culture conditions for PC3 and metastatic variant PC3-M cells has been described previously.15,16 All cells were maintained at 371C in a humidified atmosphere of 5% carbon dioxide with biweekly media changes. Cell growth was monitored daily to ensure continued exponential growth in a non-confluent state. Cells were drawn from stored stock cells, and replenished on a standardized periodic basis. All cells were routinely monitored for mycoplasma. Prostate Cancer and Prostatic Diseases

Isolation of RNA With cell lines, microgram quantities of total RNA were extracted as described previously,17 using Trizol reagent (Invitrogen, Carlsbad, CA, USA). RNA was treated with RNase-free DNase (Qiagen, Valencia, CA, USA), as per the manufacturer’s instructions. RNA quality and quantity was evaluated by measuring optical density (OD) 260/280, as well as by visual inspection of agarose gels, as described previously.17 With prostate epithelial cells microdissected from intact human prostate tissue by LCM, nanogram quantities of total RNA were extracted using the PicoPure RNA Extraction kit (Arcturus, Mountain View, CA, USA) according to the manufacturer’s instructions, with modifications. The first modification was that after RNA extraction buffer was added to microdissected tissue on the LCM cap, it was frozen on dry ice until a total of four LCM caps were completed. Four separate LCM caps were obtained for each cell type (i.e., cancer or normal) from a given tissue specimen. As it takes B45 min to microdissect one LCM cap, over 2 h would have elapsed from the time of completion of the first cap until completion of the fourth cap. Freezing on dry ice increased RNA stability during this time period. The second modification was that RNA extraction solutions from four separate LCM caps were combined and loaded onto the purification column. This resulted in combined and concentrated RNA, minimizing downstream manipulations. Next, RNA was treated with RNase-free DNase I (Qiagen) for 15 min at room temperature. Finally, after washing as per manufacturer’s instructions, RNA was eluted from the column by adding 11 ml RNase-free water. RNA quantity and quality was measured by loading 1 ml of column eluent (diluted 1:3 in RNase-free water) onto an RNA LabChip (Agilent Technologies, Palo Alto, CA, USA) and analyzing on an Agilent 2100 Bioanalyzer (Agilent Technologies). Associated software was used for area-under-the-curve analysis for the calculation of concentration, as well as for the construction of a virtual gel, thus allowing visualization under emulated agarose gel conditions. To compliment software-based analysis of RNA concentration, various known amounts of stock RNA were also included on each LabChip, thus allowing construction of a standard curve. RNA was considered to be of acceptable quality if the 28s and 18s ribosomal bands were the predominant ones, if both were sharp in appearance, if the ratio of the 28s to the 18s band was X1.8, and if excessive additional bands were not present. Samples considered non-acceptable were not processed further. Linear amplification of RNA For cell lines, 40 ng total RNA were used. For microdissected prostate epithelial cells, RNA extracted from four LCM caps was combined and used, yielding B40 ng. cDNA was synthesized by first adding 1.8 mm of oligo-dT-T7 primer (50 -GACGGCCAGTGAATTGTAA TACGACTCACTATAGGGAGATCTGTATGCTGGTTTT TTTTTTTTTTTT-30 ) and 200 ng poly-dI/dC to RNA in a final volume of 11 ml, incubating 5 min at 651C, and immediately placing on ice. Poly-dI/dC reduces loss of RNA by nonspecific binding to the wall of tubes, but does not compete with deoxynucleoside triphosphates (dNTPs) in reverse transcription. To this were added

Prostate RNA linear amplification Y Ding et al

200 U SuperScript II (Invitrogen), 40 U RNaseOUTt Recombinant Ribonuclease Inhibitor (Invitrogen), in first-strand buffer (0.5 mm dNTPs, 10 mm dithiothreitol (DTT)) in a volume of 20 ml and incubated 120 min at 421C, 10 min at 701C. Then, 2 U RNase H (Invitrogen) were added, incubated 20 min at 371C, followed by 5 min at 951C. After adding 6.7 mm random hexamer (Invitrogen), the reaction was incubated 2 min at 951C, and placed on ice. To this were added 50 U Escherichia coli DNA polymerase I (Invitrogen), 1  reaction buffer, 4 U T4 DNA ligase (Stratagene, La Jolla, CA, USA), 0.6 mm dNTPs, in a volume of 50 ml and incubated 5 min at 251C, 10 min at 371C and 5 min at 701C. Unincorporated reaction products were removed by passage through a QIAquick PCR purification column (Qiagen), according to the manufacturer’s instructions. In vitro transcription was performed using a T7 MEGAscript kit (Ambion, Austin, TX, USA), according to the manufacturer’s instructions, with modifications. Briefly, double-stranded DNA, 7.5 mM each of adenosine 50 triphosphate, guanine 50 triphosphate and cytidine 50 triphosphate, 4.5 mM aminoallyl-UTP (aaUTP; Ambion), 4.5 mM UTP, in 1  reaction buffer (containing T7 RNA polymerase and the RNase inhibitor, SUPERase), were brought up to 20 ml, incubated 14 h at 371C and then 2 U DNase I (Ambion) added, according to the manufacturer’s instructions. The resulting amplified RNA (aRNA) was isolated with a Qiagen RNeasy minikit, according to the manufacturer’s instructions, except that the final QIAquick wash buffer was replaced with RNase-free water, and the eluent was dried under vacuum.

Laser capture microdissection Under a protocol approved by the Institutional Review Board of Northwestern University, fresh–frozen prostate tissue was harvested from men undergoing radical prostatectomy for clinically organ confined prostate cancer. Prostate tissue was harvested according to a well-established protocol of the Northwestern Prostate SPORE Tissue Bank Core Facility, and was frozen within 20–30 min after surgical removal. Five serial 6 mm sections, from four different areas within the specimen, were cut on a cryostat. The first section in a series was stained with hematoxylin and eosin (H&E), and histology confirmed by a pathologist specializing in genitourinary cancer (XY, MP). Frozen sections were processed immediately before LCM as follows: incubated for 30 s each in 70, 95 and then 100% ethanol, for 2 min in xylene  2 washes and air-dried. After overlaying tissue with a CapMacro LCM Cap (Arcturus), B1000 normal epithelial or cancer cells were microdissected on a PixCell II LCM Workstation (Arcturus). For each LCM cap, images of pre-LCM whole tissue, post-LCM cap tissue (i.e., captured epithelial cells) and post-LCM residual tissue (i.e., cells still remaining on the slide) were archived, and independently confirmed. All aspects of tissue processing were performed under RNase-free conditions. Microarray-based measurement of gene expression Human oligonucleotide arrays were custom-printed by a dedicated gene array core facility at the Eppley Institute

for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE, USA (by DLK). Arrays were constructed from a set of 12 140 sense oligonucleotide (60-mers) probes designed for each human target gene by Compugen Inc. (Rockville, MD, USA), and manufactured by Sigma-Genosys Inc. (The Woodlands. TX, USA). Individual arrays contained 12 288 spot features, including 12 107 different genes, 28 replications of glyceraldehyde-3-phosphate dehydrogenase and negative controls. Each oligonucleotide was resuspended (30 mM) in 3  standard sodium citrate (SSC) and 1.5 pl spotted onto poly-L-lysine-coated glass slides using a MagnaSpotter robot (BioAutomation Corp., Dallas, TX, USA) with a 12-pin (Telechem, Sunnyvale, CA, USA) configuration in a humidified, HEPA (High Efficiency Particulate Air)-filtered hood. After spotting, the DNA was crosslinked to the slides by ultraviolet (UV) irradiation (350 mJ/cm2) with a Stratalinker UV Crosslinker (Stratagene Inc., La Jolla, CA, USA) and the printed arrays were stored in the dark in a dehumidified atmosphere at room temperature. Before use, arrays were blocked by succinic anhydride treatment (165 mM succinic anhydride in a 24:1 ratio of 1methyl-2-pyrrolidinone:1 M sodium borate, pH 8.0), and rinsed in ethanol. Array quality was evaluated both upon initial synthesis, as well as over time, by hybridizing with previously characterized stock RNA, and comparing current with prior gene expression profiles using array median centering and Corelation Coefficients Mapping programmed in MATLAB (The Mathworks Inc., Natick, MA, USA). Arrays were hybridized with either fluorescently labeled cDNA (derived from native RNA) or with fluorescently labeled aRNA. Fluorescently labeled single-stranded cDNA probe was generated by annealing 50 mg native RNA (i.e. total RNA extracted from cell lines) with anchored oligodeoxythymidylate primer (13 mM) (50 -TTTTTTTTTTTTTTTTTTTVN-30 ; Molecular Biology Shared Resource, University of Nebraska Medical Center), in a total volume of 30 ml high-pressure liquid chromatography-purified water (Sigma, St. Louis, MO, USA) for 10 min at 701C and then placing on ice. To each sample was added 1  first strand buffer, 0.6 mM DTT, 30 mM each of dATP, [32P]deoxycytidimine triphosphate and dGTP, 18 mM dTTP, 12 mM aminoallyl-dUTP, 20 U RNasin (Promega, Madison, WI, USA) and 150 U StrataScript reverse transcriptase (Stratagene) in a final volume of 60 ml. After incubation for 60 min at 481C, an additional 50 U of StrataScript reverse transcriptase was added, incubated 60 min at 481C, and the reaction stopped by the addition of 6.0 mM ethylenediaminetetraacetic acid (pH 8). After hydrolyzing RNA with 12 mM NaOH for 15 min at 651C, the reaction was cooled to room temperature and neutralized with 16.8 ml of 1 N HCl. First-strand cDNA was unincorporated purified from aminoallyl-dUTPs on QIAquick PCR purification columns (Qiagen) according to the manufacturer’s protocols, except that QIAquick wash buffer was replaced with 5 mM potassium phosphate buffer (pH 8.5) containing 80% ethanol, and cDNA was eluted with 4 mM potassium phosphate buffer (pH 8.5) and centrifuged at 14 000 g for 1 min, under vacuum. Resultant cDNA and 2.0 mg aRNA, with incorporated aminoallyl binding sites, were resuspended in 10 ml of 50 mM Na2CO3 buffer (pH 9), mixed with either Cy3 or

381

Prostate Cancer and Prostatic Diseases

Prostate RNA linear amplification Y Ding et al

382

Cy5 monofunctional NHS-ester (Amersham Pharmacia, Piscataway, NJ, USA) and incubated 90 min at room temperature in the dark. Cy3- and Cy5-conjugated cDNA targets were purified by passage through QIAquick PCR purification columns, vacuum-dried, resuspended in 55 ml hybridization solution (50% formamide, 4.1  Denhardts, 4.4  SSC, 15 mg human Cot1 DNA (Invitrogen), 12 mg poly-deoxyadenylate and 5 mg of yeast tRNA (Sigma)), clarified by centrifugation and hybridized to arrays overnight at 421C. After hybridization, array slides were washed by immersion in 2  SSC, 0.1% sodium dodecyl sulfate (SDS) for 5 min at room temperature, 1  SSC, 0.01% SDS for 5 min at room temperature and 0.2  SSC for 2 min at room temperature. Arrays were dried by centrifugation for 5 min at 1000 r.p.m., and immediately scanned on a ScanArray 4000 confocal laser system (Perkin-Elmer, Boston, MA, USA). Background fluorescence was subtracted, and normalization and filtering of the data were performed using the QuantArray software package (Perkin-Elmer), and expression ratios were calculated for each feature. All resultant data were included in the final analysis.

Quantitative polymerase chain reaction Primers and probes for the 32 selected genes were purchased from Applied Biosystems Inc. (Foster City, CA, USA); their sequences are provided in Supplementary Table 1 (http://www.basic.northwestern.edu/ Prostate_SPORE/RayBergan_NAR_Data.html). Reverse transcription was performed with 0.2 mg of either native total RNA or aRNA, using TaqMan reverse transcriptase (Applied Biosystems) with random hexamers, as described previously.18 For quantitative PCR (qPCR), cDNA, from the equivalent of 40 ng of either total RNA or aRNA were used for each 20 ml reaction, which was performed on an Applied Biosystems 7500 Real Time PCR System workstation, using a TaqMan Universal PCR kit (Applied Biosystems), according to the manufacturer’s instructions. Reaction conditions were as follows: 10 min at 501C  1, 10 min at 951C  1, followed by 40 cycles of 15 s at 951C and 1 min at 601C. For qPCR: (1) exon spanning primers were used; (2) products were separated on an agarose gel to confirm the presence of a single dominant band of the expected length, the band excised, subcloned into a TA cloning vector (Invitrogen) and sequenced to confirm identity; (3) reactions were run in replicates of two for each gene; (4) reactions were repeated at a separate time, also in replicates of two; (5) all 32 genes evaluated were run at the same time on a single plate for each sample evaluated (i.e., native PC3, native PC3-M, amplified PC3 and amplified PC3-M RNA); (6) all reactions had an internal fluorescent standard, thus allowing direct comparison between different reactions run at both the same time, as well as different times; and (7) with each run, identical stock RNA was included as a quality control. For each reaction, the threshold cycle was identified, using software associated with the Applied Biosystems 7500 Real-Time PCR System. As reactions were run in duplicate, the mean threshold cycle for each gene in a given run was calculated and used for further analysis. Finally, only primer/probe combinations that provided a high-quality qPCR reaction were used. A reaction was considered high quality (1) if it generated a sigmoidal Prostate Cancer and Prostatic Diseases

curve when reaction product was plotted against cycle number, and (2) if it was able to measure accurately changes in the amount of the targeted gene. This was determined by using different known amounts of input stock RNA, measuring the threshold cycle number, and calculating the fold difference in gene number according to equation (1). ð1Þ log2 ðA=BÞ ¼ cA  cB where, cA and cB are the threshold cycle numbers for genes A and B, respectively, and A/B is the fold difference in gene number between two different amounts of input stock RNA being evaluated.

Statistics In considering experiments using both the same batch of RNA and those comparing different batches of RNA, we selected genes with a grand mean of expression intensities (i.e., averaged over both duplicates, both cell lines and both methods of obtaining the mRNA, native versus amplified) greater than 1000, or greater than 500; we did this process separately for each experiment. We standardized the data via the quantile method,19 where we sorted each repeat, each of which contained 12 288 genes, ranked the sorted genes, divided the ranks by the total number of genes þ 1 (12 289) and used that number as the probability to obtain the standard normal inverse. Each repeat was then sorted by gene name and all repeats for the first experiment were sorted and merged by gene name. All repeats for the second experiment were also sorted and merged by gene name. Within each newly created data set, RnativeðiÞ ¼ eZnativePC3ðiÞ ZnativePC3MðiÞ and RamplifiedðiÞ ¼ eZamplifiedPC3ðiÞ ZamplifiedPC3MðiÞ were calculated, where ZnativePC3(i) and ZnativePC3–M(i) are the standard normal inverses for the native PC3 and native PC3-M cell line genes, respectively, and i indicates either the first or second repeat of the same day from the same batch of RNA experiment, or the different batch of RNA repeat from the second experiment. Likewise, ZamplifiedPC3(i) and ZamplifiedPC3-M(i) are the standard normal inverses for the amplified data, and i again indicates the repeat within each experiments. The ratios obtained were then averaged across repeats, separately for native and amplified data, giving Rnative and Ramplified, respectively, for each experiment. Standard deviations were also obtained separately for native and amplified data for each experiment.

Results RNA amplification The established prostate cancer cell lines, PC3 and PC3M cells, have been extensively characterized by us,15,20,21 and were used to develop and validate the linear amplification protocol. Typical yields of total RNA after LCM were B40 ng (see below). Thus, with cell line studies 40 ng total RNA were used to optimize reaction conditions. Once optimized, the complete reaction involved synthesis of cDNA, using an oligo-dT-T7 primer, second-strand synthesis, using random hexam-

Prostate RNA linear amplification Y Ding et al

ers, T4 DNA ligase and DNA polymerase I, and finally generation of amplified antisense RNA by an in vitro transcription reaction in which aaUTP was incorporated. Incorporation of aaUTP during synthesis of aRNA allowed direct conjugation to fluorescent dyes, allowed multiple flourochromes to be incorporated into each RNA strand (thus increasing the probe signal) and allowed fluorescently labeled RNA to then be used for direct hybridization to arrays. In order to measure the degree of amplification, evaluate day-to-day variability and to determine whether the reaction was robust, RNA amplification was performed each week for 12 weeks, for each cell line. This approach permitted evaluation of overall variability, of change over time and of change as a function of different batches of reaction components, some from different manufacturers (no differences were detected between manufacturers; data not shown). The mean (range) yield of aRNA for PC3 and PC3-M cells was 3.0 mg (1.8–4.0 mg) and 3.1 mg (1.9–4.8 mg), respectively, Table 1. To evaluate the effect of operator experience, aRNA yields from reactions performed early in the 3month period (reactions 1–6) were compared to those performed late (7–12), for both PC3 and PC3-M cells. For PC3 cells, mean yields for early and late reactions were 2.9 and 3.0, respectively, whereas for PC3-M cells, they were 3.1 and 3.1, respectively. Differences of yields between early and late reactions were not significant for either cell line (two-sided t-test P-value 40.05 in each case). Conservatively assuming that mRNA constitutes 5% of total RNA, PC3 and PC3-M mRNA was amplified on average at least 1493- and 1553-fold, respectively. Analysis of aRNA by capillary electrophoresis demonstrated that the peak length of transcripts ranged from 500 to 1000 bases (Figure 1).

aRNA gene representation by microarray analysis A common practical application is to apply gene array technology to clinical samples in an effort to identify differences between two cell populations, A versus B (e.g., normal versus cancer), where the ratio A/B is commonly evaluated. Because amounts of A and B are typically limiting, differences between amplified products, aA versus aB are measured, and extrapolated to A versus B. For an ideal amplification process, therefore, (aA/aB)/(A/B) should equal 1 for each gene measured. To assess the extent to which this was the case, we first used microgram quantities of total RNA (i.e., native RNA) isolated from two different cell types, PC3 and PC3-M, to probe gene arrays, and compared differences in gene expression (i.e., A versus B). In parallel, we used 40 ng of the same native RNA, amplified it and used the resultant aRNA to probe gene arrays, and compared differences between aPC3 and aPC3-M (i.e., aA versus aB). However, when aPC3/aPC3-M was plotted against PC3/PC3-M, data did not cluster around the line of unity, Figure 2a (for data see http://www.basic. northwestern.edu/Prostate_SPORE/RayBergan_NAR_ Data.html). Furthermore, 14.9% of all genes evaluated exhibited greater than a two-fold difference in the ratio (r) of aPC3/aPC3-M to PC3/PC3-M. That is, 14.9% of all genes were outside of 0.5oro2.0, which represents traditional boundaries on gene arrays for excessive difference. In a replicate experiment, 39.8% of all genes

1

4.0 kb

2.0 kb

2

3

383

4

28 S

18 S

1.0 kb

0.8 kb

0.2 kb Figure 1 Length of aRNA. After separation of RNA by capillary electrophoresis on an RNA LabChip, associated software was used to construct the image of a virtual agarose slab gel. Lane 1, RNA ladder; lane 2, total (native) RNA from PC3 cells; lane 3, aRNA from PC3 cells; and lane 4, negative control (sham RNA amplification). For sham RNA amplification, all reactions associated with cDNA synthesis and linear amplification were performed in parallel with lane 3 reactions, except no RNA was included in the initial cDNA synthesis reaction. The photograph is representative of findings seen with both PC3 and PC3-M cells.

were outside of 0.5oro2.0 (data not shown; available at http://www.basic.northwestern.edu/Prostate_SPORE/ RayBergan_NAR_Data.html). This finding did not provide convincing evidence that amplification maintains gene representation. However, data in Figure 2a was inclusive of all 12 000 genes on the array. Using data from PC3 native RNA, the intensity of gene expression was plotted against rank order of expression for each gene (Figure 2b). It can be seen that only a fraction of genes are expressed. Similar findings were observed when using native RNA from PC3-M cells, or when using aRNA from either PC3 or PC3-M cells (data not shown). Thus, most of the genes evaluated in Figure 2a were, in fact, not even expressed by the cell. It is not accurate to measure amplification of genes whose expression approaches zero (or background). This is particularly troublesome when dealing with ratios of gene expression. To overcome this problem, low-expressing genes were excluded from analysis. This was performed by first determining the average signal intensity of all genes within a given gene pool being considered (specifically, for PC3, PC3-M, aPC3 and aPC3-M), then normalizing each gene within that pool by the average and deleting those genes whose normalized expression was below 1.0. The 1.0 cutoff point corresponds to the inflection point of the gene expression curves depicted in Figure 2b. As such, it also represents genes that are expressed in a given cell type. For PC3, PC3-M, aPC3 and aPC3-M data Prostate Cancer and Prostatic Diseases

Prostate RNA linear amplification Y Ding et al

384

a

be low. However, under the current circumstance we are seeking to validate methodology. We have thus imposed the additional requirement of non-amplified RNA. Thus, we are evaluating ratios of ratios. It is therefore an absolute requirement that only non-trivial gene expression data be evaluated. The resultant graph of aPC3/ aPC3-M versus PC3/PC3-M, for these 538 genes, is depicted in Figure 3a. Findings in Figure 3a, where only non-trivial data was analyzed, differ significantly from those depicted in Figure 2a, where data from all 12 000 genes was evaluated. Specifically, in Figure 3a data points cluster around the line of unity and 99.3% of all genes evaluated had a ratio (r), of aPC3/aPC3-M to PC3/ PC3-M, where 0.5oro2.0.

14

aPC3 / aPC3M

12 10 8 6 4 2 0 0

2

4

6

8

10

12

14

native PC3 / native PC3-M

b 70000

Signal intensity

60000 50000 40000 30000 20000 10000 0 0

5000

10000

15000

gene Figure 2 Expression profiles of all genes. For native (nonamplified) RNA, 50 mg were isolated from either PC3 or PC3-M cells, labeled during cDNA synthesis and used to probe microarrays. For aRNA, 40 ng native RNA were subjected to linear amplification, labeled during in vitro transcription and 2.0 mg of resultant aRNA used to probe microarrays. In (a) differences in gene expression between native and aRNA were compared. On a single microarray, native PC3 and native PC3-M were compared, and the PC3/PC3-M gene expression ratio determined, for each of B12 000 genes. On a separate microarray, an identical approach was taken for amplified PC3 (aPC3) and aPC3-M, thus generating a aPC3/ aPC3-M gene expression ratio. Depicted is a plot of aPC3/aPC3-M versus PC3/PC3-M. In (b) intensity of gene expression is plotted against rank order of expression for all genes for PC3.

sets, this included 19, 18, 13 and 11% of all genes, respectively. Next, the normalized expression of a given gene was compared across all pools. Only those genes exhibiting expression of 1.0 or greater in each pool being considered were included in the final analysis. Finally, data from a replicate experiment was also included, and subjected to the same exclusion criteria. This left only 538 genes (or 4.4% of 12 289) for analysis. Although these genes only represent a small percentage of all genes on the array, they are all expressed at non-trivial levels in each of the four gene pools across two experiments. In practice, low-expressing genes would not be excluded. This is because you would only be comparing two data sets resulting from amplified RNA. You would then impose the requirement that one of the values would not Prostate Cancer and Prostatic Diseases

Validation of aRNA gene representation Array findings were validated by qPCR of a subset of 32 randomly selected genes, from the pool of 538 genes evaluated above. See Supplementary Table 1 (http:// www.basic.northwestern.edu/Prostate_SPORE/RayBergan_ NAR_Data.html) for a complete list of genes, and PCR primer and probe sequences. Genes selected for qPCR analysis had the following characteristics: first, a high-quality qPCR reaction had to be available, as described in Materials and methods. A second requirement was that genes had to have an intron within 500 bp of the poly-A tail. As qPCR reactions used exon spanning primers (in addition to DNase treating RNA), requiring an intron within 500 bp of the poly-A tail optimized sequence overlap between products of qPCR reactions and products of linear amplification (which had a peak length of 500–1000 bases), see Figure 1. A third requirement was that the group of 32 selected genes included representative members, which were expressed at low, intermediate and high levels, within the larger group of 538 genes. From qPCR-based measurements of gene expression levels, a plot of aPC3/aPC3-M versus PC3/PC3-M was constructed (Figure 3b). As can be seen, data clusters around the line of unity. In 31 of 32 genes (97%), the ratio (r), of aPC3/aPC3-M to PC3/PC3-M, is such that 0.5oro 2.0, and thus data in Figure 3b emulate that obtained with gene arrays, presented previously in Figure 3a. Evaluating other factors: variability in gene expression In practice, gene expression is compared between individual clinical samples, that is, not from the same batch of RNA. In studies using only non-amplified (or native) RNA, we have previously reported that there is increased variability in gene expression profiles when comparing RNA from different, as apposed to identical RNA batches.22 Data in Figure 3a were generated from the same batch of RNA, for each cell line tested. To further characterize sources of variability, RNA was isolated from PC3 and PC3-M cells at two separate times, aRNA generated, and aPC3/aPC3-M versus PC3/PC3-M determined, for non-trivial genes (Figure 4). A visual inspection reveals that clustering of genes around the line of unity is less pronounced, when compared to Figure 3a, where profiles were constructed from the same batch of RNA. To further analyze fidelity and reproducibility, additional statistical analysis was performed on microarray

Prostate RNA linear amplification Y Ding et al

8 aPC-3/aPC-3M

8 aPC3 / aPC3M

385

10

a 10

6 4

6 4 2

2 0 0 0

2 4 6 8 native PC3 / native PC3-M

10

2 4 6 8 native PC-3 / native PC-3M

10

Figure 4 Expression profiles of genes from different batches of RNA. Gene expression profiles, of non-trivial genes, were determined as described in Figure 3a, except that different batches of RNA for individual PC3 and PC3-M cell lines were used for the two separate replicate experiments performed. The resultant plot of aPC3/aPC3-M versus PC3/PC3-M is depicted. Values are the mean7s.d. of two separate experiments (N ¼ 2).

b 20 16 aPC3 / aPC3M

0

12 8 4 0 0

4 8 12 16 native PC3 / native PC3-M

20

Figure 3 Expression profiles of genes expressed at non-trivial levels. (a) Measurement of gene expression by microarray. Starting with gene expression data for all B12 000 genes (as in Figure 2), genes exhibiting a low level of expression were excluded, as described in the text. The resultant plot of aPC3/aPC3-M versus PC3/PC3-M is depicted. Values are the mean7s.d. of two separate experiments (N ¼ 2). (b) Measurement of gene expression by realtime qPCR. Thirty-two genes were randomly selected from those evaluated in (a), gene expression measured by qPCR and the resultant plot of aPC3/aPC3-M versus PC3/PC3-M depicted. Values are the mean7s.d. of two separate experiments (N ¼ 2). For each experiment, qPCR reactions were run in replicates of two.

Table 1 Yield of aRNA after linear amplification Cell line

Number of replicates

PC3 PC3-M

12 12

Mean yield of aRNA (mg) (range)

Mean fold of amplification (range)a

3.0 (1.8–4.0) 1493 (900–2600) 3.1 (1.9–4.8) 1553 (950–2400)

Average amplification folda 1493 1553

Abbreviation: aRNA, amplified RNA. a Fold of amplification is based upon the assumption that 5% of the total RNA is mRNA.

data. Initial analysis focused upon comparing data obtained from the same batch of RNA, as compared to that obtained from different batches. For these studies, all 12K genes were first rank ordered, based upon the level of expression, as described in Figure 2. Considering only genes with a mean level of expression of greater than 1000 (B50% of all genes), over 99% of genes had a

aPC3/aPC3-M to PC3/PC3-M ratio (r), where 0.5oro2.0, when RNA from the same batch was evaluated, and 98% when RNA from different batches was considered. However, if the cutoff for the level of gene expression was lowered to 500 (B98% of all genes), percentages decreased to 90 and 79% for the same and different batches of RNA, respectively. If one now considers coefficient of variation (CV), and include only genes having mean expression intensities greater than 1000, CV was o0.5 in both native and aRNA for a given gene, in 99 and 71% of all genes evaluated for same and different RNA batches, respectively. If the cutoff for the level of gene expression was decreased to 500, percentages for CVs o0.5 decreased to 72 and 27% for the same and different batches of RNA, respectively. Taken together, the above analysis indicates good precision between aRNA and native RNA, for purposes of identifying differences in gene expression between two different gene pools; in this case, PC3 cells versus PC3-M cells. However, precision is compromised for genes expressed at low levels. Historically, when dealing with linear amplification, investigators have provided comparisons between native and amplified RNA.23–25 To this end, we have performed such a calculation. For genes with expression intensity greater than 1000, the correlation coefficient, R, when native and aRNA was compared, was 0.074 and 0.082, for PC3 and PC3-M cells, respectively. However, if genes with expression intensities as low as 500 were also included, R values increased to 0.441 and 0.469, for PC3 and PC3-M cells, respectively. Thus, if one analyzes genes that are in essence not expressed in a given cell, then the amplified to native R values appear to increase. To investigate other possible factors, which may contribute to low correlation coefficients, the same aliquot of RNA was split in two, labeled in parallel with either Cy3 or Cy5 as described in Materials and methods and a self– self hybridization on a single array was performed. The resultant R value was 0.98 (http://www.basic.north western.edu/Prostate_SPORE/RayBergan_NAR_Data. html). Finally, to ensure that poor correlations were not Prostate Cancer and Prostatic Diseases

Prostate RNA linear amplification Y Ding et al

386

owing to factors related to microarray manufacturing and/or variations in hybridization and reading, the same aliquot of RNA was identically labeled and used to probe different array chips. The resultant R value was 0.997 (http://www.basic.northwestern.edu/Prostate_SPORE/ RayBergan_NAR_Data.html). Using data from qPCR experiments and comparing native to amplified RNA, R values of 0.729 and 0.817 for PC3 and PC3-M cells, respectively, were observed. Although this is an improvement over values observed with microarrays, they are still far below the R values of B0.9, for comparisons of native to amplified RNA, reported by other investigators.23,25 To determine if low R values, for comparisons between native and amplified RNA, were owing to errors associated with qPCR-based methods of measurement, further analysis was performed. From the same batch of native RNA, the expression of all 32 genes was measured by qPCR on two separate occasions. Comparison of these two measurements of native RNA gives an R value of 0.998 and 0.998, for PC3 and PC3-M cells, respectively. Performing the same comparison for two separate measurements of amplified RNA, which in turn was synthesized two separate times from the same batch of native RNA, R values were 0.993 and 0.995, for PC3 and PC3-M cells, respectively. These findings demonstrate that errors associated with qPCR-based methods of measurement were negligible. The above findings demonstrate that aRNA cannot be used to predict relationships of gene expression within a given cell. Importantly, although biases exist in the amplification of RNA, they are replicated accurately, and thus allow accurate comparison of gene expression between two different cell populations. Taken together, findings support the notion that it should be possible to accurately measure differences in gene expression between two cell populations present within human tissue. This was therefore investigated.

Gene profiling of prostate epithelial cells isolated from intact human prostate tissue by LCM With LCM, individual cells are identified by light microscopy, and their selective transfer onto an overlying plastic cap is mediated through the use of a directed laser pulse.26 Figure 5a depicts a typical outcome after LCMmediated selective removal of prostate epithelial cells (in this case, normal cells) from intact human prostate tissue. The quality and quantity of RNA recovered from microdissected normal or cancer epithelial cells was then evaluated by capillary electrophoresis in six subjects (Figure 5b and c). RNA was degraded in normal cells from subject 1, and in cancer cells from subject 2. There were insufficient cancer cells in subjects 5 and 6 to support LCM, and thus only normal cells were microdissected. Subjects 3 and 4 yielded non-degraded RNA from both normal and cancer epithelial cells. Thus, of the 10 RNA samples evaluated in Figure 5b and c, eight (80%) were not degraded. To date, 21 samples from 16 different subjects have been acquired by LCM in our laboratory. Of these 21 samples, 17 (81%) have yielded non-degraded RNA in amounts of 40 ng or more (data not shown). This demonstrates that high-quality RNA can readily be recovered from prostate epithelial cells, Prostate Cancer and Prostatic Diseases

a

b

subject number MW stds

1N

0.40 0.80 1.60 3.20 stock RNA, ng

c

1C

1.42

2N

2C

1.54

3N

3C

1.36 1.45

sample RNA, ng subject number

MW stds

4N

4C

5N

6N

0.40 0.80 1.60 3.20 1.27 1.48 1.58 1.39 stock RNA, ng

sample RNA, ng

Figure 5 Use of LCM to selectively recover RNA from a single-cell population. (a) Representative photomicrographs of human prostate tissue. Tissue architecture is first examined in H&E-stained tissue. Adjacent sections of whole tissue are then subjected to LCM, thereby yielding microdissected epithelial cells, and residual stromal tissue. (b, c) Analysis of RNA by capillary electrophoresis. RNA was extracted from microdissected normal epithelial (N) and cancer (C) cells, from six different subjects (1–6), and analyzed by capillary electrophoresis, as described in Materials and methods. MW standards: RNA molecular weight ladder; 0.4–3.2 ng stock RNA, as indicated; amount of calculated RNA from samples is indicated. Arrows denote RNA of poor quality, which was not processed further.

which were microdissected from whole human prostate specimens. Next, the ability to profile gene expression in microdissected specimens was demonstrated. These studies used non-degraded RNA from five normal and three

Prostate RNA linear amplification Y Ding et al

cancer samples, from the six subjects described in Figure 5b and c. There were no differences in numbers of captured cells or RNA yield between cancer and normal cells (Table 2). Ten percent fewer laser pulses were required on average for cancer than for normal cells (Po0.001), reflecting a greater cell yield for each pulse. RNA was linear amplified, conjugated to fluorescent dye and then used to probe gene arrays. RNA from four LCM caps was combined in order to minimize down stream manipulations, and reduce potential sources of variability. Resultant data was normalized and that from genes exhibiting expression in the lowest 20% was discarded. The focus of this experiment was to evaluate performance of array profiling methods when applied to human tissue. Analysis, therefore, focused upon normal prostate epithelial cells, which represent a relatively homogeneous population. In contrast to normal cells, cancer is heterogeneous. In the case of prostate cancer in particular, its molecular profile has been shown to vary significantly between subjects.27 The mean7s.e.m. of gene expression for normal epithelial cells was therefore plotted against mean values for cancer (Figure 6a). It can be seen that the ratios of normal to cancer gene expression cluster around the line of unity. Also of importance is that sample-to-sample variation is limited. Specifically, CV was less than 0.5 in 55% of genes examined, even though samples were from different individuals. There are five genes that appear to be outliers. However, taking into consideration variance in both normal epithelial cells as well as in cancer cells, only three, b-microseminoprotein, hepsin and TRPM4B, are statistically significant outliers (t-test P-value o0.05) (Figure 6b). Of these three genes, two have already been identified as exhibiting differential expression in human prostate cancer by independent investigators. One is bmicroseminoprotein, which is a major protein secreted by prostate cells into the seminal fluid. It suppresses the growth of human prostate cells,28 and is high in normal prostate epithelial cells, decreased in prostate cancer.29,30 The second is hepsin which is a cell surface serine protease. Hepsin expression promotes prostate cancer metastasis,31 and is high in prostate cancer as compared to normal prostate epithelial cells.32 Independent investigations found b-microseminoprotein to be up in

normal, and hepsin to be up in cancer, identical to findings in the current study. Importantly, confirmation of differential expression by independent investigators required analysis of large numbers of subject samples in each instance. In contrast, in the current study specimens from only eight subjects were analyzed and differential expression of both b-microseminoprotein and hepsin were detected. The third gene, TRPM4B, appears to code for a novel calcium channel protein, and its role in prostate cancer remains uncharacterized.

387

Discussion To be able to perform gene array analysis on a pure cell population isolated from intact tissue, a series of interdependent steps were required. In the current study, we have developed and characterized those components

Table 2 Characteristics of cancer and normal cells recovered from intact prostate tissue by laser capture microdissection (LCM) Subject/cell typea S#2-N S#3-N S#4-N S#5-N S#6-N S#1-C S#3-C S#4-C Normal, mean7s.d. Cancer, mean7s.d. T-test a

Laser pulsesb

Number of cellsb,c

RNA yield (ng)b

2518 2656 2638 2610 2475 2260 2340 2290 2579779 2297740 o0.001

4028 4251 4220 4166 3986 4080 4218 4020 41307117 41067102 0.77

47 51 45 48 42 49 52 46 46.673.4 49.073.0 0.35

C, cancer cells; N, normal prostate epithelial cells; S#, for subjects 1–6. Combined values for four LCM caps. Estimated: based upon the number of pulses, area of each pulse and average number of cells removed with each pulse.

b c

Figure 6 Gene expression profiles of prostate cells from intact human tissue. Normal prostate epithelial cells (from five subjects) and prostate cancer cells (three subjects) were selectively removed by LCM of intact prostate tissue. Resultant RNA was linear amplified, microarrays probed and data across subjects normalized, all as described in Materials and methods. (a) Depicted is the mean7s.e.m. (N ¼ 5) of normal cells plotted against the mean (N ¼ 3) of cancer cells. (b) This is the same as (a), except s.e.m. values are now included for both normal and cancer cells. Lines denote unity, and two-fold change. Genes that appear to be outliers are indicated. Genes whose expression differs significantly between normal and cancer are marked by * (two-sided t-test P-value p0.05). Prostate Cancer and Prostatic Diseases

Prostate RNA linear amplification Y Ding et al

388

which were heretofore lacking, have described the complete set of reactions and went on to show practical utility. The design and execution of our experiments meets all components of the MIAMI compliance checklist (available at web site: http://www.mged.org/Workgroups/Minimum Information About a Microarray (MIAME)/miame_checklist.html). In the process, we have identified genes that were differentially expressed in prostate cancer. This data have been deposited into Gene Expression Omnibus (GEO; http://www.ncbi.nlm. nih.gov/projects/geo/). The development and optimization of this complete set of linked reactions involved a process of multiple iterations, multiple comparisons of different methods and step-wise advancement. These developmental components were not described in the Materials and methods section. However, based upon our experience with the process, there are specific aspects that emerged as important. As knowledge of these will likely benefit future researchers who may wish to tailor the current protocol to suit their specific needs, they are discussed below. Established methods for initial early capture and freezing of tissue, as defined by a set of standard operating procedures (SOPs) and implemented accordingly, was an absolute necessity. Delays in processing and/or deviation from SOPs were associated with degradation of RNA. Likewise, steps associated with LCM were critical. These included performing two xylene washes to ensure complete dehydration of tissue, immediate placement of RNA stabilization media on dry ice (during ongoing acquisition of additional LCM caps) and combining extracts from four LCM caps for the RNA isolation step and eluting from the column with 11 ml water. High-quality RNA was recovered from cells removed from clinical specimens by LCM over 80% of the time in our hands. Evaluation of the quality of recovered RNA by capillary electrophoresis was an important quality control measure. It prevented the otherwise wasted analysis of 20% of samples, and more importantly excluded tainted gene expression data. The source of enzymes and other reagents for first- and second-strand synthesis reactions did not have a great impact upon outcome. However, only high-performance enzymes from established manufacturers were used. Commercial in vitro transcription kits are widely available, were used in the current study and differences in performance between different manufactures were not apparent. It was, however, critical to extend in vitro transcription reaction times to the maximum in order to sustain yield. The current protocol represents the first that is able to use RNA isolated from LCM-derived epithelial cells to probe gene arrays after only a single round of in vitro transcription. It was therefore important that we demonstrated the robust nature of the methodology. Specifically, we demonstrated the ability to amplify consistently total RNA to yield enough aRNA to probe arrays over a time span of 3 months, using reagents from different manufactures, and did so with two separate cell lines. The average yield of aRNA using the current protocol was 3 mg. Most other methods require two or more rounds of amplification and require starting amounts of total RNA of 30–100 ng.33–35 Increases in the number of rounds of amplification not only increases cost and processing time but also variability.33 Yields of up to

Prostate Cancer and Prostatic Diseases

2.0 mg aRNA have been reported by others after only a single round of amplification starting from 30 to 60 ng of total RNA.13 However, array signal drops off significantly below 2.0 mg of probe. Thus, one cannot rely upon a protocol whose maximum yield of probe is 2.0 mg. With the current protocol, peak transcript length after in vitro transcription ranged from 500 to 1000 bases, with prior reports describing peak lengths of around 500 after a single round of amplification.36,37 As it is important to optimize overlap between probe and array sequence, a longer transcript length is desirable. Other investigators have used LCM followed by gene profiling in a variety of cancer types, for example, gastric, prostate, pancreatic and oral cavity.38–41 However, amplification of RNA required either PCR-based methods or required multiple rounds of in vitro transcription. A variety of commercial kits are available, which utilize a single round of in vitro transcription to amplify RNA. However, the minimal recommended amount of input RNA is 100 ng. Further, apart from measuring yield of amplified product, the performance characteristics of many of these protocols have not been evaluated. The primary goal of gene expression profiling is to identify differences between two different cell populations. It was therefore important that initial investigations focused upon comparing differences in gene expression between two different cell lines. Although this approach has previously been described, much higher amounts of starting RNA were used.10,42,43 It was therefore important that we went on to validate and characterize the current methodology, demonstrating its ability to measure accurately differences in gene expression between different cell types starting with low nanogram amounts of RNA. As this is the amount of RNA typically recovered from a single-cell population extracted from intact tissue, it was also important that we then went on to demonstrate its practical applicability in this regard. Findings from the current study highlight the fact that any given cell only expresses a fraction of genes present within the whole genome. This is a necessary requirement for cellular specialization.44,45 The importance of this fact when taking into consideration output from gene arrays was also demonstrated. Specifically, we showed that analysis of data from genes not expressed, that is, trivial data, led to false high correlation values when native and aRNA were compared within a given cell. In order to compare ratios of gene expression for the two cell lines between native and amplified RNA, it was necessary to exclude genes, which were not expressed. As there is no clear cutoff point for a lower limit of gene expression in this regard, we selected an arbitrary point of 1.0 for normalized expression. This value closely approximated the inflection point on the plot of gene intensity versus rank order of gene expression (see Figure 2b). Genes above this value were clearly expressed by the cell types being studied. As we imposed this requirement across multiple data sets, it resulted in the exclusion of most genes. Although these were artificial circumstances, they were non-biased, and only designed to allow us to compare the performance of native and amplified RNA. Importantly, these studies allowed us to demonstrate that the fidelity of differences in gene expression across cell types is maintained when aRNA was compared to native total RNA. Array findings

Prostate RNA linear amplification Y Ding et al

were corroborated by qPCR studies, thus demonstrating that outcomes were not method specific. For arrays, gene expression data were normalized by taking into consideration gene expression across all genes. Thus, for qPCR we elected not to normalize to a single gene (as we and others commonly do46), but instead chose to use uniform amounts of RNA in each reaction. This was ensured by the use of stock RNA whose amount was determined by OD. Reports by other investigators appear to conflict in this area, with some describing a high correlation when native and aRNA are compared within a specific cell type, whereas others report a low correlation.23–25,42,47,48 As the efficiency of transcription is not equal for all genes, one would expect to find a low correlation. It is possible that apparent differences reported in the literature relate to some extent to including genes in the analysis, which are only expressed at trivial levels. Such genes are not expressed in a given cell, but give a low signal on microarrays. Taken as a whole, the above discussion highlights the point that amplification of RNA introduces gene-specific biases, thus precluding the use of aRNA to measure differences in gene expression within a given cell. However, it is very important to emphasize the point that these biases are accurately replicated for a given gene across all genes evaluated, and across cell types. This is evidenced by the fact that a high R value is observed when one compares gene expression profiles between different batches of aRNA for a given cell type. This means that aRNA provides an excellent measure for differences in gene expression between different cells. This is directly supported by a number of findings in the current study. First, when comparing differences in gene expression between PC3 and PC3-M cells, there is a high correlation between findings seen with native and amplified RNA. Second, expression profiling of clinical samples self-organized into a biologically relevant pattern. Specifically, data aligned along the line of unity when normal and cancer were compared. If data were not representative of the in vivo situation, then a different pattern, or no pattern, would emerge. Third, there was a remarkably low level of variance for individual genes, given that data were combined across different individuals. If data were not representative of the in vivo situation, then high levels of variance would be observed. Finally, of three genes identified, two have already been independently shown to be differentially expressed in prostate cancer. If data were not representative of the in vivo situation, then it would not be possible to go from over 12 000 genes to a situation where two of three identified genes have already been shown to be important and differentially expressed in prostate cancer. In addition to biases introduced by linear amplification, the current study identified other sources of variability, which are of importance when seeking to apply this technology to a practical situation. Tissue from patients is acquired over time and thus processed on different days. We used different preparations of RNA from the same cell line but isolated on different days to model this situation. In this manner, we demonstrated that there is intrinsic variation from preparation-topreparation in both native and amplified RNA. Preparation-to-preparation variation stems from multiple causes,

including variation related to day-to-day changes in gene expression in cultured cells, as well as variation related to technique and reagents. We have previously described similar findings before in studies involving only nonamplified (i.e., native) RNA.22 Most importantly, when LCM was coupled to linear amplification and array data pooled across different subjects, genes were found to be differentially expressed when normal epithelial cells were compared to cancer. There are many ways to analyze gene array data, and no consensus upon which is the optimal.49,50 We elected to pursue a conservative approach to identify outlier genes, thus yielding only three differentially expressed genes. A different, possibly less conservative approach, would have likely have yielded more genes. Importantly, using this prospectively chosen approach two of three identified genes, hepsin and b-microseminoprotein, have already been validated by other investigators. Notably, although studies by these independent investigators required analysis of many clinical samples each for hepsin and b-microseminoprotein, both were identified in the current study after analysis of only eight samples total. This finding highlights the power of the current methodology. We have described and characterized a method by which gene profiling can be performed on single-cell populations that have been selectively isolated from intact tissue. The practical utility and power of this methodology was demonstrated using human prostate tissue and comparing normal epithelium to cancer. However, it can be applied to essentially any tissue type, in different disease states, and in the context of therapeutic intervention. Thus, it is widely applicable and poised to provide important information related to in vivo gene regulation and biology.

389

Acknowledgements This work was supported by the National Cancer Institute (CA099263, CA37403, and Specialized Program of Research Excellence grant CA90386, to RCB), and a National Center for Research Resources Grant (M01 RR00048), both from the National Institutes of Health, Department of Health and Human Services

References 1 Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995; 270: 467–470. 2 Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW. Parallel human genome analysis: microarray-based expression monitoring of 1000 genes. Proc Natl Acad Sci USA 1996; 93: 10614–10619. 3 DeRisi J, Penland L, Brown PO, Bittner ML, Meltzer PS, Ray M et al. Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet 1996; 14: 457–460. 4 Heller RA, Schena M, Chai A, Shalon D, Bedilion T, Gilmore J et al. Discovery and analysis of inflammatory disease-related genes using cDNA microarrays. Proc Natl Acad Sci USA 1997; 94: 2150–2155. 5 Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA et al. Molecular portraits of human breast tumours. Nature 2000; 406: 747–752. Prostate Cancer and Prostatic Diseases

Prostate RNA linear amplification Y Ding et al

390

6 Welsh JB, Zarrinkar PP, Sapinoso LM, Kern SG, Behling CA, Monk BJ et al. Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer. Proc Natl Acad Sci USA 2001; 98: 1176–1181. 7 Dudley AM, Aach J, Steffen MA, Church GM. Measuring absolute expression with microarrays with a calibrated reference sample and an extended signal intensity range. Proc Natl Acad Sci USA 2002; 99: 7554–7559. 8 Yue H, Eastman PS, Wang BB, Minor J, Doctolero MH, Nuttall RL et al. An evaluation of the performance of cDNA microarrays for detecting changes in global mRNA expression. Nucleic Acids Res 2001; 29: E41–1. 9 Iscove NN, Barbara M, Gu M, Gibson M, Modi C, Winegarden N. Representation is faithfully preserved in global cDNA amplified exponentially from sub-picogram quantities of mRNA. Nat Biotechnol 2002; 20: 940–943. 10 Puskas LG, Zvara A, Hackler Jr L, Van Hummelen P. RNA amplification results in reproducible microarray data with slight ratio bias. Biotechniques 2002; 32: 1330–1334, 1336, 1338, 1340. 11 Smith L, Underhill P, Pritchard C, Tymowska-Lalanne Z, AbdulHussein S, Hilton H et al. Single primer amplification (SPA) of cDNA for microarray expression analysis. Nucleic Acids Res 2003; 31: e9. 12 Pabon C, Modrusan Z, Ruvolo MV, Coleman IM, Daniel S, Yue H et al. Optimized T7 amplification system for microarray analysis. Biotechniques 2001; 31: 874–879. 13 Wang E, Miller LD, Ohnmacht GA, Liu ET, Marincola FM. Highfidelity mRNA amplification for gene profiling. Nat Biotechnol 2000; 18: 457–459. 14 Wadenback J, Clapham DH, Craig D, Sederoff R, Peter GF, von Arnold S et al. Comparison of standard exponential and linear techniques to amplify small cDNA samples for microarrays. BMC Genom 2005; 6: 61. 15 Huang X, Chen S, Xu L, Liu Y, Deb DK, Platanias LC et al. Genistein inhibits p38 map kinase activation, matrix metalloproteinase type 2, and cell invasion in human prostate epithelial cells. Cancer Res 2005; 65: 3470–3478. 16 Hayes SA, Huang X, Kambhampati S, Platanias LC, Bergan RC. p38 MAP kinase modulates Smad-dependent changes in human prostate cell adhesion. Oncogene 2003; 22: 4841–4850. 17 Liu Y, Jovanovic B, Pins M, Lee C, Bergan RC. Over expression of endoglin in human prostate cancer suppresses cell detachment, migration and invasion. Oncogene 2002; 21: 8272–8281. 18 Schwartz GN, Liu YQ, Tisdale J, Walshe K, Fowler D, Gress R et al. Growth inhibition of chronic myelogenous leukemia cells by ODN-1, an aptameric inhibitor of p210bcr-abl tyrosine kinase activity. Antisense Nucleic Acid Drug Dev 1998; 8: 329–339. 19 Lin Y NS, Lan H, Attie AD, Yandell BS. A daptive gene picking with microarray data: detecting important low abundance signals. In: G Parmigiani EG, RA Irizarry, SL Zeger (eds). The Analysis of Gene Expression Data: Methods and Software. SpringerVerlag: New York, 2003. 20 Kyle E, Neckers L, Takimoto C, Curt G, Bergan R. Genisteininduced apoptosis of prostate cancer cells is preceded by a specific decrease in focal adhesion kinase activity. Mol Pharmacol 1997; 51: 193–200. 21 Rohlff C, Blagosklonny MV, Kyle E, Kesari A, Kim IY, Zelner DJ et al. Prostate cancer cell growth inhibition by tamoxifen is associated with inhibition of protein kinase C and induction of p21(waf1/cip1). Prostate 1998; 37: 51–59. 22 Jovanovic BD, Huang S, Liu Y, Naguib KN, Bergan RC. A simple analysis of gene expression and variability in gene arrays based on repeated observations. Am J Pharmacogenom 2001; 1: 145–152. 23 Baugh LR, Hill AA, Brown EL, Hunter CP. Quantitative analysis of mRNA amplification by in vitro transcription. Nucleic Acids Res 2001; 29: E29. 24 Goley EM, Anderson SJ, Menard C, Chuang E, Lu X, Tofilon PJ et al. Microarray analysis in clinical oncology: pre-clinical optimization using needle core biopsies from xenograft tumors. BMC Cancer 2004; 4: 20.

Prostate Cancer and Prostatic Diseases

25 Gomes LI, Silva RL, Stolf BS, Cristo EB, Hirata R, Soares FA et al. Comparative analysis of amplified and nonamplified RNA for hybridization in cDNA microarray. Anal Biochem 2003; 321: 244–251. 26 Emmert-Buck MR, Bonner RF, Smith PD, Chuaqui RF, Zhuang Z, Goldstein SR et al. Laser capture microdissection [see comments]. Science 1996; 274: 998–1001. 27 Shah RB, Mehra R, Chinnaiyan AM, Shen R, Ghosh D, Zhou M et al. Androgen-independent prostate cancer is a heterogeneous group of diseases: lessons from a rapid autopsy program. Cancer Res 2004; 64: 9209–9216. 28 Garde SV, Basrur VS, Li L, Finkelman MA, Krishan A, Wellham L et al. Prostate secretory protein (PSP94) suppresses the growth of androgen-independent prostate cancer cell line (PC3) and xenografts by inducing apoptosis. Prostate 1999; 38: 118–125. 29 Tsurusaki T, Koji T, Sakai H, Kanetake H, Nakane PK, Saito Y. Cellular expression of beta-microseminoprotein (beta-MSP) mRNA and its protein in untreated prostate cancer. Prostate 1998; 35: 109–116. 30 Chan PS, Chan LW, Xuan JW, Chin JL, Choi HL, Chan FL. In situ hybridization study of PSP94 (prostatic secretory protein of 94 amino acids) expression in human prostates. Prostate 1999; 41: 99–109. 31 Klezovitch O, Chevillet J, Mirosevich J, Roberts RL, Matusik RJ, Vasioukhin V. Hepsin promotes prostate cancer progression and metastasis. Cancer Cell 2004; 6: 185–195. 32 Magee JA, Araki T, Patil S, Ehrig T, True L, Humphrey PA et al. Expression profiling reveals hepsin overexpression in prostate cancer. Cancer Res 2001; 61: 5692–5696. 33 Kenzelmann M, Klaren R, Hergenhahn M, Bonrouhi M, Grone HJ, Schmid W et al. High-accuracy amplification of nanogram total RNA amounts for gene profiling. Genomics 2004; 83: 550–558. 34 Mutsuga N, Shahar T, Verbalis JG, Xiang CC, Brownstein MJ, Gainer H. Regulation of gene expression in magnocellular neurons in rat supraoptic nucleus during sustained hypoosmolality. Endocrinology 2005; 143: 1254–1267. 35 Alevizos I, Mahadevappa M, Zhang X, Ohyama H, Kohno Y, Posner M et al. Oral cancer in vivo gene expression profiling assisted by laser capture microdissection and microarray analysis. Oncogene 2001; 20: 6196–6204. 36 Jenson SD, Robetorye RS, Bohling SD, Schumacher JA, Morgan JW, Lim MS et al. Validation of cDNA microarray gene expression data obtained from linearly amplified RNA. Mol Pathol 2003; 56: 307–312. 37 Luzzi V, Mahadevappa M, Raja R, Warrington JA, Watson MA. Accurate and reproducible gene expression profiles from laser capture microdissection, transcript amplification, and high density oligonucleotide microarray analysis. J Mol Diagn 2003; 5: 9–14. 38 Wu MS, Lin YS, Chang YT, Shun CT, Lin MT, Lin JT. Gene expression profiling of gastric cancer by microarray combined with laser capture microdissection. World J Gastroenterol 2005; 11: 7405–7412. 39 Crnogorac-Jurcevic T, Efthimiou E, Nielsen T, Loader J, Terris B, Stamp G et al. Expression profiling of microdissected pancreatic adenocarcinomas. Oncogene 2002; 21: 4587–4594. 40 Leethanakul C, Patel V, Gillespie J, Shillitoe E, Kellman RM, Ensley JF et al. Gene expression profiles in squamous cell carcinomas of the oral cavity: use of laser capture microdissection for the construction and analysis of stage-specific cDNA libraries. Oral Oncol 2000; 36: 474–483. 41 Febbo PG, Thorner A, Rubin MA, Loda M, Kantoff PW, Oh WK et al. Application of oligonucleotide microarrays to assess the biological effects of neoadjuvant imatinib mesylate treatment for localized prostate cancer. Clin Cancer Res 2006; 12: 152–158. 42 Stoyanova R, Upson JJ, Patriotis C, Ross EA, Henske EP, Datta K et al. Use of RNA amplification in the optimal characterization of global gene expression using cDNA microarrays. J Cell Physiol 2004; 201: 359–365.

Prostate RNA linear amplification Y Ding et al 43 Mutsuga N, Shahar T, Verbalis JG, Xiang CC, Brownstein MJ, Gainer H. Regulation of gene expression in magnocellular neurons in rat supraoptic nucleus during sustained hypoosmolality. Endocrinology 2005; 146: 1254–1267. 44 Arbeitman MN, Furlong EE, Imam F, Johnson E, Null BH, Baker BS et al. Gene expression during the life cycle of Drosophila melanogaster. Science 2002; 297: 2270–2275. 45 Wray GA, Hahn MW, Abouheif E, Balhoff JP, Pizer M, Rockman MV et al. The evolution of transcriptional regulation in eukaryotes. Mol Biol Evol 2003; 20: 1377–1419. 46 Kelley MJ, Glaser EM, Herdon JE, Becker F, Santella RM, Hecht SS et al. Safety and efficacy of weekly oral oltipraz in chronic smokers. Cancer Epidemiol Biomarkers Prev 2005; 14: 892–899.

47 Zhao H, Hastie T, Whitfield ML, Borresen-Dale AL, Jeffrey SS. Optimization and evaluation of T7 based RNA linear amplification protocols for cDNA microarray analysis. BMC Genom 2002; 3: 31. 48 Schneider J, Buness A, Huber W, Volz J, Kioschis P, Hafner M et al. Systematic analysis of T7 RNA polymerase based in vitro linear RNA amplification for use in microarray experiments. BMC Genom 2004; 5: 29. 49 Jovanovic BD, Helenowski IB, Rademaker AW, Bedford DF, Bergan RB. Some statistical issues in study of variability of gene expression with applications in prostate cancer study. J Probab Statist Quant Manage 2004; 1: 51–60. 50 Jovanovic BD, Bergan RC, Kibbe WA. Some aspects of analysis of gene array data. Cancer Treat Res 2002; 113: 71–89.

391

Prostate Cancer and Prostatic Diseases