Chapter 15
Establishing Substantial Equivalence: Transcriptomics
María Marcela Baudo, Stephen J. Powers, Rowan A. C. Mitchell, and Peter R. Shewry

Abstract
Regulatory authorities in Western Europe require transgenic crops to be substantially equivalent to conventionally bred forms if they are to be approved for commercial production. One way to establish substantial equivalence is to compare the transcript profiles of developing grain and other tissues of transgenic and conventionally bred lines, in order to identify any unintended effects of the transformation process. We present detailed protocols for transcriptomic comparisons of developing wheat grain and leaf material, and illustrate their use by reference to our own studies of lines transformed to express additional gluten protein genes controlled by their own endosperm-specific promoters. The results show that the transgenes present in these lines (which included those encoding marker genes) did not have any significant unpredicted effects on the expression of endogenous genes and that the transgenic plants were therefore substantially equivalent to the corresponding parental lines.

Key words: Bread wheat, transgenics, transcriptomics, transgene expression, substantial equivalence, gluten proteins.
1. Introduction
Feeding an expanding world population is a major challenge for the twenty-first century (1), and there is no doubt that transgenesis could make a major contribution to improving crop yields and quality. However, the acceptability of transgenic crops in Western Europe is low, owing partly to public concerns about the safety of the technology. Regulatory authorities also require detailed studies of transgenic crops to be carried out before they are approved for commercial production, including the demonstration that they are "substantially equivalent" to crops produced by conventional breeding (2).

Huw D. Jones and Peter R. Shewry (eds.), Methods in Molecular Biology, Transgenic Wheat, Barley and Oats, vol. 478 © Humana Press, a part of Springer Science + Business Media, LLC 2009
DOI: 10.1007/978-1-59745-379-0_15
Baudo et al.
Although "substantial equivalence" is difficult to define precisely, it is usually taken to mean that the composition of the material should fall within the range found in conventionally bred lines grown under the same conditions. We have used a custom cDNA microarray (3–5) to assess the substantial equivalence of the transcriptomes of transgenic and conventionally bred lines of wheat expressing the same genes encoding high-molecular-weight (HMW) subunits of wheat glutenin, under the control of their own endosperm-specific promoters. The data were first analysed to identify genes that showed statistically significant (p < 0.05) differences in expression, and genes with fold changes greater than a critical value (1.5) were then selected from this set. The transgenic lines also expressed the bar and uidA marker genes and contained the ampR gene and plasmid backbone sequences (6–8). We therefore also compared transgenic lines transformed with whole plasmids or with excised DNA fragments containing only the HMW subunit gene and the bar selectable marker gene (5) (see Note 1). The results demonstrated that expression of the transgene had little statistically significant impact on global gene expression in the developing grain, particularly when compared with the greater differences (in numbers of significantly differentially expressed genes and in fold changes) observed between sibling lines produced by conventional breeding (5). The methods described here were developed for a cDNA microarray system, and such arrays are still widely used, particularly for small-scale analyses of individually selected genes (often called "boutique arrays"). However, most large-scale gene expression studies now use Affymetrix oligonucleotide-based arrays (e.g., the Wheat GeneChip® Probe Array), which provide much wider coverage and greater reproducibility and flexibility.
Many of the methods described here are equally applicable to this system (e.g., for RNA preparation), but specific aspects of GeneChip® expression analysis are also covered.
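The two-stage selection of genes described above (statistical significance at p < 0.05, then a fold-change cut-off of 1.5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the GenStat/GeneSpring workflow used in our study: the `GeneResult` record and `select_degs` function are hypothetical names, the p values are assumed to come from an upstream statistical model, the fold-change cut-off is applied symmetrically to up- and down-regulated genes on the log2 scale, and the replicate filter follows the "two or more replicates" rule given in Section 3.8.

```python
import math
from dataclasses import dataclass


@dataclass
class GeneResult:
    """Hypothetical per-gene summary produced by an upstream statistical model."""
    gene_id: str
    log2_ratio: float   # estimated differential expression, log2(transgenic / control)
    p_value: float      # p value for the test of log2_ratio == 0
    n_replicates: int   # number of replicate observations retained for the gene


def select_degs(results, alpha=0.05, min_fold=1.5, min_reps=2):
    """Return genes that are significant AND exceed the fold-change cut-off.

    A fold change greater than 1.5 corresponds to |log2 ratio| > log2(1.5),
    so one threshold catches both up- and down-regulated genes.
    """
    log2_cutoff = math.log2(min_fold)
    return [
        r for r in results
        if r.p_value < alpha
        and abs(r.log2_ratio) > log2_cutoff
        and r.n_replicates >= min_reps
    ]


if __name__ == "__main__":
    demo = [
        GeneResult("g1", 1.0, 0.01, 3),   # 2-fold up, significant: kept
        GeneResult("g2", 0.3, 0.001, 3),  # significant but only ~1.23-fold: dropped
        GeneResult("g3", -1.2, 0.04, 2),  # ~2.3-fold down, significant: kept
        GeneResult("g4", 2.0, 0.20, 3),   # large change, not significant: dropped
        GeneResult("g5", 1.5, 0.01, 1),   # too few replicates: dropped
    ]
    print([r.gene_id for r in select_degs(demo)])
```

Filtering on significance first and fold change second means that a large but poorly supported change (g4) is never reported, while a well-supported but small change (g2) is treated as biologically unimportant.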
2. Materials
2.1. Plant Material
1. Conventionally bred lines: L88-31 (9), L88-18 (9), and cv. Cadenza (B1084-0-1) (5).
2. Transgenic bread wheat line in the L88-31 background: homozygous line selected from B102-1-1 (6, 7).
3. Transgenic bread wheat lines in the cv. Cadenza background: "clean" fragment transgenic line B1355-4-2(18) and "whole plasmid" transgenic line B1118-8-4(6) (5).
Establishing Substantial Equivalence: Transcriptomics
2.2. Plant Growth Conditions
1. Containment glasshouse or controlled-environment chamber maintained at 18–20°C/10–14°C day/night, with a 16-h day/8-h night cycle, 50–70% humidity, and 750 µE/s/m² irradiance.
2. Automatically controlled watering.
2.3. SDS–Polyacrylamide Gel Electrophoresis (SDS–PAGE) (see (10))
1. Separating gel buffer: 1.25 M Tris-borate, 1% (w/v) SDS, pH 6.8 (no adjustment required).
2. Stacking buffer: 1.0 M Tris–HCl, pH 6.8, 10% (w/v) SDS.
3. Ammonium persulfate, 10% (w/v) (prepared fresh).
4. Sample buffer: 6.55 ml of stacking buffer, pH 6.8, 3.3% (w/v) SDS, 10% (v/v) glycerol, and 1.54% (w/v) DTT (100 mM final concentration). Make up to 100 ml with water.
5. Running buffer: 10× separating gel buffer.
6. Acrylamide solution: 40% (w/v) acrylamide and 2% (w/v) N,N′-methylenebisacrylamide.
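As a quick check on the sample buffer recipe, a percentage (w/v) concentration can be converted to molarity. The sketch below assumes a molar mass of 154.25 g/mol for DTT (dithiothreitol); `percent_wv_to_mM` is an illustrative helper, not part of any library.

```python
DTT_MOLAR_MASS = 154.25  # g/mol for dithiothreitol (C4H10O2S2); assumed value


def percent_wv_to_mM(percent_wv: float, molar_mass: float) -> float:
    """Convert a % (w/v) concentration (g per 100 ml) to millimolar."""
    grams_per_litre = percent_wv * 10.0  # 1% (w/v) = 1 g/100 ml = 10 g/l
    return grams_per_litre / molar_mass * 1000.0


if __name__ == "__main__":
    # 1.54% (w/v) DTT in the sample buffer
    print(round(percent_wv_to_mM(1.54, DTT_MOLAR_MASS), 1))  # 99.8, i.e. about 100 mM
```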
2.4. Total RNA Extraction (see Note 2)
● Extraction of total RNA from wheat endosperm (11)
1. Extraction buffer: 2% (w/v) CTAB (hexadecyltrimethylammonium bromide), 2% (w/v) PVP (polyvinylpyrrolidone K 30), 100 mM Tris–HCl, pH 8.0, 25 mM EDTA, 2.0 M NaCl, 0.5 g/l spermidine (see Note 3).
2. 2-Mercaptoethanol (see Note 4).
3. Chloroform:isoamyl alcohol (24:1) (see Note 4).
4. 10 M LiCl (lithium chloride).
5. SSTE buffer: 1.0 M NaCl, 0.5% SDS, 10 mM Tris–HCl, pH 8.0, 1 mM EDTA.
● Extraction of total RNA from 8-dpg seedlings (12)
1. Homogenization buffer (see Note 5): 1.4% (w/v) SDS, 0.1 M sodium acetate, pH 8.0, 0.5 M NaCl, 0.05 M EDTA, pH 8.0, 0.1% (w/v) 2-mercaptoethanol.
2. Phenol/chloroform (1:1), buffered with Tris–HCl, pH 8.0 (see Note 6).
2.5. cDNA Microarray Labelling
1. cDNA synthesis: oligo(dT)23-anchored primer (Sigma-Genosys, Haverhill, UK), 0.5 µg/µl (70 µM); SuperScript III reverse transcriptase (200 U/µl) and 5× First Strand Buffer (Invitrogen, Paisley, UK); 50× aa-dNTP mix (Sigma; 10 µl of 100 mM dATP, 10 µl of 100 mM dCTP, 10 µl of 100 mM dGTP, 5 µl of 100 mM dTTP, and 5 µl of 100 mM amino-allyl-dUTP).
2. Purification of aa-dUTP-cDNA: MinElute columns (Qiagen, Crawley, UK), 75% (v/v) ethanol.
3. Labelling of amino-allyl first-strand cDNA: ready-to-use succinimidyl esters of Alexa dyes (Alexa Fluor 555 and 647, Molecular Probes) (see Note 7), 1 M sodium bicarbonate labelling buffer, pH 9.0, MinElute columns (Qiagen, Crawley, UK).

2.6. Microarray Hybridization and Washing
1. 2× hybridization mix: 400 µl of 50% (w/v) formamide, 450 µl of 10× SSC, 16 µl of 0.2% (w/v) SDS, 2 mg/ml poly(dA).
2. Wash chamber (50-ml Falcon tube).
3. Wash solutions: solution A (2× SSC, 1% (w/v) SDS, made up to 50 ml with d-H2O); solution B (1× SSC, 0.2% (w/v) SDS, made up to 50 ml with d-H2O); solution C (0.1× SSC, 0.2% (w/v) SDS, made up to 50 ml with d-H2O).

2.7. Real Time RT-PCR
● cDNA synthesis
1. SuperScript™ III RT and RNaseOUT®.
2. 2× RT Reaction Mix (Invitrogen): 2.5 µM oligo(dT)20, 2.5 ng/µl random hexamers, 10 mM MgCl2, and dNTPs.
● Real time RT-PCR reaction components
1. SYBR Green I dye, Platinum Taq DNA polymerase (60 U/ml).
2. dNTPs (400 µM dGTP, 400 µM dATP, 400 µM dCTP, 400 µM dUTP).
3. 40 mM Tris–HCl, pH 8.4, 100 mM KCl, 6 mM MgCl2.
4. UDG (uracil-DNA glycosylase, 40 U/ml).
5. ROX Reference Dye (glycine conjugate of 5-carboxy-X-rhodamine, succinimidyl ester).
3. Methods
3.1. Background to cDNA Microarrays
For the detailed transcriptomic studies we used a wheat cDNA microarray of 19,846 spots containing 9,246 unigene sequences (http://www.cerealsdb.uk.net/index.htm) (3). Duplicate arrays of the unigene set were spotted onto CodeLink-activated slides (Amersham Biosciences Ltd, UK). Arrays were hybridized with aa-dUTP-cDNA samples labelled with Alexa Fluor dyes 555 and 647 in a reverse dye labelling (dye-swap) design. Hybridizations were performed as single-pair comparisons between each transgenic wheat line (B102-1-1, B1118-8-4, or B1355-4-2) and its non-transgenic background line (L88-31 or Cadenza) at two stages of endosperm development (14 and 28 days post-anthesis (dpa)), and with leaf tissue at 8 days post-germination (dpg) (5). Each comparison was made using three biological replicates for the selected line, tissue, and developmental stage, with two technical replicates per biological replicate provided by the dye swap. cDNA-based microarrays have the advantages of being economical to use and of giving the operator full control over content and design (customized arrays). In contrast, Affymetrix oligonucleotide-based arrays are more flexible, as they are a single-dye system that allows simpler experimental designs; they are also more specific (with improved discrimination between similar isoforms and members of multigene families) and provide more quantitative and readily comparable data. The GeneChip® Genome Array uses a set of 25-mer oligonucleotide "probes" designed to match each transcript sequence. In the case of the wheat chip, there are 11 probes in each set (probe set), most of which were designed to consensus sequences from the assembly of public domain expressed sequence tags (ESTs). Thus, each probe set represents many ESTs, rather than a single one as in cDNA arrays. The probes are chosen by automated procedures designed to discriminate between different transcripts and to satisfy other criteria, such as relatively consistent GC content. For each probe designed to a target transcript (perfect match, PM) there is a partner probe with a single base change in the middle of the sequence (mismatch, MM). The MM signal gives an estimate of non-specific hybridization, which can be corrected for. However, the real value of the MM signals is debated, and many widely used analysis approaches do not use them (see Section 3.10). Discrimination between transcripts with similar sequences is most effective with short (e.g., 25-mer) probes, since a single base mismatch is sufficient to destabilize hybridization, and the fixed probe length means that hybridization conditions can be standardized to suit all the probes.
Conversely, longer, variable-length probes such as those used in cDNA microarray platforms will inevitably hybridize with any transcript that shares sequence similarity along a portion of its length, so the measured signal integrates over several different transcript molecules.

3.2. Plant Material and Growth Conditions
Endosperms and leaves of three transgenic lines of hexaploid bread wheat (Triticum aestivum) were used for transcriptome comparisons. The transgenic lines B102-1-1 (in the L88-31 background) (6, 7) and B1118-8-4 (in the Cadenza background) (5) were produced by co-bombardment with the p1Ax1 plasmid (13), containing the HMW glutenin subunit 1Ax1 (Glu-1Ax) gene under the control of its own endosperm-specific promoter, and a plasmid carrying the selectable bar gene and the marker gene uidA under the control of the maize ubiquitin promoter (14). The transgenic line B1355-4-2 (5), also derived from Cadenza, was obtained by co-transformation with "clean" fragments corresponding to the HMW glutenin subunit 1Ax1 gene and the bar gene coding sequences. A conventionally bred sister line of L88-31, L88-18 (9), was also used for transcriptome comparison. For the two conventionally bred lines (L88-31, L88-18) and the transgenic line B102-1-1, the transcriptomes compared were B102-1-1 vs. L88-31, L88-18 vs. L88-31, and B102-1-1 vs. L88-18. For the comparison of transformation methods (i.e., clean fragments vs. whole plasmids), the comparisons were B1355-4-2 vs. Cadenza, B1118-8-4 vs. Cadenza, and B1355-4-2 vs. B1118-8-4. The relevant gene composition of the different bread wheat lines studied is shown in Table 1.
Table 1 Relevant gene composition of the different wheat lines under study. The table is based on data in Lawrence et al. (9), Barro et al. (6), and Rooke et al. (7). HMW high molecular weight.

Line | Characteristics | Endogenous HMW subunit genes | HMW subunit transgenes | Marker genes
L88-31 | Parental line. Sister line derived from same cross as L88-6 | 1A null, 1Bx17, 1By18, 1D null | None | None
L88-18 | Control line. Sister line derived from same cross as L88-6 and L88-31 | 1Ax1, 1Bx17, 1By18, 1D null | None | None
B102-1-1 | Transgenic. L88-31 transformed with 1Ax1 gene as whole plasmid | 1A null, 1Bx17, 1By18, 1D null | 1Ax1 | bar, uidA
Cadenza | Commercial cultivar | 1A null, 1Bx14, 1By15, 1Dx5, 1Dy10 | None | None
B1355-4-2 | Cadenza transformed with 1Ax1 gene as clean fragment | 1A null, 1Bx14, 1By15, 1Dx5, 1Dy10 | 1Ax1 | bar
B1118-8-4 | Cadenza transformed with 1Ax1 gene as whole plasmid | 1A null, 1Bx14, 1By15, 1Dx5, 1Dy10 | 1Ax1 | bar, uidA
1. Plants were grown in pots arranged in a balanced row-and-column design, providing three biological replicates per treatment (wheat line × developmental stage).
2. Two plants were grown per pot for the endosperms harvested at 14 and 28 dpa, and only two tillers were kept per plant. Selected pots (three biological replicates in the design) contained a third plant, which was sampled for leaf tissue at 8 dpg.
3. Spikes were checked daily and tagged when anthesis was observed in the central spikelets.
4. Endosperms were dissected manually from caryopses under aseptic conditions; samples from each pot comprised a minimum of 24 endosperms taken only from the central parts of two spikes.
5. Samples were taken at the same time of day to avoid effects of diurnal rhythms.

3.3. SDS–PAGE
The expression of the HMW subunit protein was determined for all wheat lines by SDS–PAGE of total grain protein (Fig. 1) using 10% (w/v) acrylamide gels and a Tris-borate buffer system (10).

3.4. RNA Extraction
● Extraction of total RNA from wheat endosperms
The method used was adapted from Chang et al. (11).
1. Grind 2–3 g of tissue to a fine powder in liquid nitrogen using a pre-cooled (−70°C) small mortar and pestle (see Note 8).
Fig. 1. SDS–PAGE of high-molecular-weight (HMW) glutenin subunits from the bread wheat lines used for transcriptomic analysis. Lane 1, non-transformed background L88-31 line; lane 2, transgenic B102-1-1 line; lane 3, conventionally bred L88-18 line (sibling line of L88-31); lane 4, non-transformed background Cadenza (B1084) line; lane 5, "clean fragment" transgenic (B1355) line; lane 6, "whole plasmid" transgenic (B1118) line. The HMW glutenin subunits are indicated by the bracket. The position of the HMW subunit 1Ax1 protein is indicated by asterisks in the different wheat lines (transgenic lines in lanes 2, 5, and 6 and conventionally bred line in lane 3). Data are from Baudo et al. (5).
2. Add the ground tissue quickly to 15 ml of extraction buffer at room temperature (with 300 µl of 2-mercaptoethanol added) and mix completely by inverting the tube (see Note 9).
3. Extract twice with an equal volume of chloroform:isoamyl alcohol (24:1) (15 ml final volume), separating the phases by centrifugation at 15,000 × g at room temperature (RT) for 10 min.
4. Add 0.25 volume of 10 M LiCl to the supernatant and mix. Precipitate the RNA at 4°C overnight and harvest it by centrifugation at 15,000 × g for 20 min at 4°C.
5. Resuspend the pellet in 500 µl of SSTE.
6. Extract the suspension once with an equal volume of chloroform:isoamyl alcohol.
7. Add two volumes of ethanol to the supernatant and precipitate at −70°C for at least 30 min, or for 2 h at −20°C.
8. Centrifuge for 20 min at 15,000 × g to pellet the RNA.
9. Wash with 75% (v/v) ethanol.
10. Dry the pellet and resuspend it in nuclease-free water.

● Extraction of total RNA from 8-dpg seedlings
The extraction method was adapted from Cheng et al. (12).
1. Grind a known amount of tissue (approximately 1 g of young leaves) to a fine powder in liquid nitrogen using a pre-cooled (−70°C) small mortar and pestle (see Note 9).
2. Transfer the frozen powder quickly into 5–10 ml of homogenization buffer in a second mortar (see Note 10), and continue grinding until the extract is homogeneous (see Note 11).
3. Transfer the homogenate (final volume 5–10 ml) into a capped 50-ml Oak Ridge centrifuge tube and incubate at 65°C for 10 min.
4. Cool the tube on ice and add 0.2 volume of 5 M potassium acetate, pH 5.5 (see Note 12). Mix gently but thoroughly and incubate on ice for 10–15 min.
5. Centrifuge (10,000 × g, 4°C, 15 min) and transfer the supernatant to a fresh centrifuge tube.
6. Add an equal volume of phenol:chloroform (1:1, v/v) to the supernatant, cap the tube tightly, and shake vigorously for 10 min. Centrifuge (10,000 × g, 21°C, 10 min) and transfer the upper, aqueous layer to a new tube (see Note 13).
7. Repeat the phenol:chloroform extraction and recover the aqueous layer as in the previous step.
8. Add 0.1 volume of 3 M sodium acetate (pH 5.3) and 2.5 volumes of ethanol to the retained aqueous phase. Mix well and incubate overnight at −20°C for more efficient precipitation of nucleic acids.
9. Centrifuge (10,000 × g, 4°C, 30 min) to pellet the nucleic acids and discard the supernatant. Leave the tube inverted for a few minutes.
10. Dissolve the pellet in a small volume (300–700 µl) of nuclease-free water. Transfer the solution to an Eppendorf tube and add 0.67 volume of 10 M LiCl to precipitate the RNA. Mix well and incubate on ice for 20–30 min (see Note 14).
11. Pellet the precipitated RNA by microcentrifugation (15,000 × g, room temperature, 20 min) and discard the supernatant.
12. Dissolve the pellet in the smallest possible volume (approximately 200–300 µl) of nuclease-free water, repeat the LiCl precipitation, and pellet the RNA by centrifugation as before.
13. Dissolve the pellet in 200 µl of nuclease-free water and add 15 µl of 5 M potassium acetate (pH 5.5) and 800 µl of ethanol. Mix well and pellet the RNA by microcentrifugation (15,000 × g, room temperature, 20 min) (see Note 15).
14. Remove the supernatant and wash the pellet by adding 1.0 ml of 80% (v/v) ethanol and centrifuging (15,000 × g, room temperature, 10 min).
15. Dry the pellet at room temperature by leaving the tubes open on the bench for no more than 10 min. Dissolve in 100–200 µl of nuclease-free water. Divide each sample into aliquots to avoid contamination or degradation during repeated freeze–thaw cycles.

● Removal of genomic DNA from total RNA samples
After extraction, the RNA fractions were treated with DNA-free (DNase Treatment & Removal Reagents kit, Ambion) following the manufacturer's instructions. The system is designed to remove contaminating DNA from RNA samples and to remove the DNase I enzyme after treatment without the need for heat inactivation or phenol extraction.
● RNA quantification and quality control
The concentration, integrity, and quality of the RNA are determined using the NanoDrop ND-1000 spectrophotometer (Labtech Int, UK) and the Agilent 2100 Bioanalyzer (RNA 6000 Nano Assay, Agilent Technologies, Palo Alto, CA, USA) (see Note 16).
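The spectrophotometric part of this quality control rests on simple arithmetic, sketched below. The sketch assumes the conventional conversion of 40 µg/ml of RNA per A260 unit (1-cm path) and the usual purity ratios A260/A280 and A260/A230; the `RnaReading` record and function names are illustrative, and the NanoDrop software performs these calculations itself. RNA integrity (assessed on the Bioanalyzer) is not covered here.

```python
from dataclasses import dataclass

RNA_UG_PER_ML_PER_A260 = 40.0  # conventional factor for RNA, 1-cm path length


@dataclass
class RnaReading:
    a230: float
    a260: float
    a280: float
    dilution_factor: float = 1.0


def rna_concentration_ng_per_ul(reading: RnaReading) -> float:
    """RNA concentration in ng/µl (numerically equal to µg/ml)."""
    return reading.a260 * RNA_UG_PER_ML_PER_A260 * reading.dilution_factor


def purity_ratios(reading: RnaReading) -> tuple[float, float]:
    """Return (A260/A280, A260/A230); values near 2.0 are commonly taken as clean RNA."""
    return reading.a260 / reading.a280, reading.a260 / reading.a230


if __name__ == "__main__":
    r = RnaReading(a230=0.24, a260=0.5, a280=0.25, dilution_factor=10)
    print(rna_concentration_ng_per_ul(r))  # 200.0
    print(purity_ratios(r))
```

A low A260/A230 ratio is often attributed to carry-over of salts or phenol, which is one reason the extraction protocols above include repeated precipitation and ethanol washes.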
● Sample storage
RNA samples are stored at −20°C for short periods (up to 3 months), or at −80°C for longer periods.
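The salt additions in the two RNA extraction protocols above are given as fractions of the sample volume. The hypothetical helper below shows the resulting final concentrations, assuming that the sample initially contains none of the added salt and that volumes are additive.

```python
def final_concentration(stock_molar: float, volume_fraction: float) -> float:
    """Final molarity after adding `volume_fraction` volumes of stock to 1 volume of sample.

    Assumes the sample initially contains none of the solute and that volumes add.
    """
    if volume_fraction < 0:
        raise ValueError("volume_fraction must be non-negative")
    return stock_molar * volume_fraction / (1.0 + volume_fraction)


if __name__ == "__main__":
    additions = {
        "endosperm step 4: 0.25 vol of 10 M LiCl": (10.0, 0.25),
        "seedling step 10: 0.67 vol of 10 M LiCl": (10.0, 0.67),
        "seedling step 4: 0.2 vol of 5 M potassium acetate": (5.0, 0.2),
    }
    for label, (stock, fraction) in additions.items():
        print(f"{label}: {final_concentration(stock, fraction):.2f} M")
```

On these assumptions, the endosperm protocol precipitates RNA at about 2 M LiCl, whereas the seedling protocol uses about 4 M.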
3.5. cDNA Synthesis
1. To 100 µg of DNase-treated total RNA (in up to 20 µl), add 8 µl of oligo(dT)23-anchored primer and nuclease-free water to a final volume of 28 µl.
2. Incubate the priming reaction at 70°C for 10 min and then place it on ice for 5 min.
3. Add the cDNA synthesis reaction components to the priming reaction: 10 µl of 5× first strand buffer, 10 µl of 0.1 M DTT, 5 µl of 50× aa-dNTP mix, 2 µl of SuperScript III reverse transcriptase (200 U/µl), and nuclease-free water to a final volume of 50 µl.
4. Incubate at 42°C for 2–3 h.
5. Purify the aa-dUTP-cDNA product using MinElute columns following the manufacturer's instructions (Qiagen).
6. Collect the final eluate as a 10-µl sample. This cDNA product is used to prepare the probe for microarray hybridization and for real time RT-PCR; for the latter, the reaction can be performed without the 50× aa-dNTP mix (see Note 17).
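The composition of the aa-dNTP mix listed in Section 2.5, and the nucleotide concentrations that 5 µl of it gives in the 50-µl reaction of step 3, can be worked out from the stated volumes. The sketch below simply applies C1V1 = C2V2 to the volumes given in the text; the `dilute` helper is an illustrative name.

```python
# Volumes (µl) of 100 mM stocks combined to make the aa-dNTP mix (Section 2.5)
STOCK_mM = 100.0
MIX_VOLUMES_UL = {"dATP": 10.0, "dCTP": 10.0, "dGTP": 10.0, "dTTP": 5.0, "aa-dUTP": 5.0}


def dilute(conc: float, volume_in: float, volume_total: float) -> float:
    """C2 = C1 * V1 / V2."""
    return conc * volume_in / volume_total


mix_total = sum(MIX_VOLUMES_UL.values())  # 40 µl
mix_mM = {name: dilute(STOCK_mM, v, mix_total) for name, v in MIX_VOLUMES_UL.items()}

# Step 3 of Section 3.5: 5 µl of the mix in a 50-µl reaction
reaction_mM = {name: dilute(c, 5.0, 50.0) for name, c in mix_mM.items()}

if __name__ == "__main__":
    for name in MIX_VOLUMES_UL:
        print(f"{name}: {mix_mM[name]:.1f} mM in mix, {reaction_mM[name]:.2f} mM in reaction")
    # The aa-dUTP:dTTP ratio sets the share of T positions that can carry an amino-allyl group
    print("aa-dUTP fraction:", mix_mM["aa-dUTP"] / (mix_mM["aa-dUTP"] + mix_mM["dTTP"]))
```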
3.6. cDNA Microarray Labelling
1. To couple the cDNA to a reactive fluorescent dye, divide the total cDNA volume (step 6, Section 3.5) into two 5-µl aliquots for reverse dye labelling, and set up the reactions for the two dyes in separate tubes.
2. Add to each tube: 5 µl of amino-allyl purified cDNA (from step 1), 3 µl of 1 M sodium bicarbonate labelling buffer (pH 9.0), 2 µl of Alexa Fluor dye 555 or 647, and 10 µl of nuclease-free water.
3. Mix well by pipetting and incubate in the dark for 1 h at room temperature.
4. Remove the uncoupled dye from the labelled aa-dUTP-cDNA using MinElute columns (Qiagen) (see Note 18).
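The reverse dye labelling above yields, for each biological replicate, a pair of hybridizations in which the two samples swap dyes (the two technical replicates per biological replicate described in Section 3.1). A minimal sketch of such a hybridization plan follows; the `Hybridization` record, the slide naming scheme, and the choice of Alexa 555 for the transgenic sample on the first slide of each pair are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hybridization:
    slide: str
    bio_rep: int
    alexa555: str  # sample labelled with Alexa Fluor 555
    alexa647: str  # sample labelled with Alexa Fluor 647


def dye_swap_plan(test: str, control: str, n_bio_reps: int = 3):
    """One dye-swap pair (two slides) per biological replicate."""
    plan = []
    for rep in range(1, n_bio_reps + 1):
        plan.append(Hybridization(f"{test}-vs-{control}-r{rep}a", rep, test, control))
        plan.append(Hybridization(f"{test}-vs-{control}-r{rep}b", rep, control, test))
    return plan


if __name__ == "__main__":
    for h in dye_swap_plan("B102-1-1", "L88-31"):
        print(h.slide, "555:", h.alexa555, "647:", h.alexa647)
```

Because each sample carries each dye equally often within every biological replicate, a gene-specific dye bias is balanced across the comparison rather than confounded with the difference between lines.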
3.7. cDNA Microarray Hybridization
1. Add 20 µl of the mixed labelled cDNAs (both Alexa dyes; step 4, Section 3.6) to 25 µl of 2× hybridization mix and 2 µl of poly(dA).
2. Denature the probe at 95°C for 3 min.
3. Apply the labelled probe to a coverslip and place the slide on it with the printed CodeLink side face down.
4. Place the slide in a hybridization chamber in an oven at 42°C and incubate overnight.
5. Place the slide in a Falcon (blue cap) tube containing wash solution A (step 3, Section 2.6) and invert for 15 min at room temperature (see Note 19).
6. Transfer the slide to a second Falcon tube, also containing wash solution A, and invert for 15 min at room temperature.
7. Transfer the slide to a third Falcon tube containing wash solution B (step 3, Section 2.6) and invert for 8 min at room temperature.
8. Transfer the slide to a fourth Falcon tube containing wash solution C (step 3, Section 2.6) and invert for 5 min at room temperature.
9. Place the slide in a dry Falcon tube and centrifuge immediately at 8,000 × g to dry.
10. Scan the hybridized slides using the Axon Instruments GenePix 4000B dual-laser scanner.

3.8. cDNA Microarray Data Analysis
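The LOWESS normalization used in this section (adjusted M = M − LOWESS-fitted M, where M is the log2 ratio and A the log2 product of the two dye intensities, as defined in step 1 below) can be sketched in plain Python. This is an illustrative single-pass local linear smoother with tricube weights, without the robustness iterations of full LOWESS, and it is not the GeneSpring implementation; the function names and the span `frac=0.3` are assumptions. Because the neighbourhoods are defined relative to the spread of A, defining A as 0.5·log2 of the product, as many tools do, gives the same fitted curve.

```python
import math


def lowess_fit(x, y, frac=0.3):
    """Single-pass LOWESS: tricube-weighted local linear fit evaluated at each x.

    No robustness iterations are performed (a simplification of full LOWESS).
    """
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # points in each local window
    fitted = []
    for xi in x:
        dist = [abs(xj - xi) for xj in x]
        h = sorted(dist)[k - 1] or 1e-12  # distance to the k-th nearest point
        w = [(1 - min(d / h, 1.0) ** 3) ** 3 for d in dist]  # tricube weights
        sw = sum(w)
        mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
        my = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
        sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (xi - mx))
    return fitted


def lowess_normalize(red, green, frac=0.3):
    """Return LOWESS-adjusted log2 ratios for paired spot intensities."""
    m = [math.log2(r / g) for r, g in zip(red, green)]  # log2 ratio (M)
    a = [math.log2(r * g) for r, g in zip(red, green)]  # log2 product (A), as in the text
    trend = lowess_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]


if __name__ == "__main__":
    # Simulated spots with an intensity-dependent dye bias: M drifts upwards with A
    green = [100.0 * 1.3 ** i for i in range(30)]
    red = [g * 2 ** (0.02 * math.log2(g * g)) for g in green]
    adjusted = lowess_normalize(red, green)
    print(max(abs(v) for v in adjusted))  # close to zero once the trend is removed
```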
In order to assess the differential expression of genes in the transcriptomes of any pair of wheat lines, the microarray slides are subjected to image analysis to determine the intensities of the two fluorescent dyes for each observation (spot). After data normalization, a statistical model that accounts for the experimental design is fitted, and the significance of differential expression is tested.

1. Image analysis and normalization
The spots on the scanned slides are visualized using the GenePix software (GenePix version 5, Axon Instruments, USA) and all are inspected manually to exclude those where hybridization is poor or where pixelation is not well defined (weak signal). The data from this image analysis, computed over all pixels in each spot, include the mean intensities of the two fluorescent dyes (647 and 555) and their log2 ratio; these values are used in the analysis of differential expression. The data are imported from GenePix into the GeneSpring package (GeneSpring 6.2, Silicon Genetics, USA) for normalization of the log2 intensity ratios. In our study, a plot of the log2 ratio (of differential expression) (M) vs. the log2 product (of intensities) (A) showed that locally weighted scatter plot smoothing (LOWESS) normalization could be applied to remove undesirable trends in the data. In other words, the log2 ratios should show a constant range across the levels of intensity (log product) for all spots, but the plot revealed some trend with increasing A. The purpose of normalization is therefore to remove systematic variation (e.g., due to experimental procedures) from the log2 ratio data. LOWESS normalization does this by determining the relationship between M and A on a point-by-point basis and accounting for that relationship to adjust the M values (adjusted M = M − LOWESS-fitted M). As there were two separate experiments, the data from the L88-31, L88-18, and B102-1-1 lines were treated separately from the data for the Cadenza lines.

2. Statistical analysis
With reference to methods discussed by Kerr (15), the normalized data can be analysed using the GenStat statistical system (GenStat 7th Edition, GenStat Procedure Library Release PL15, Lawes Agricultural Trust, Rothamsted Research, Harpenden, UK) (see (5) and the supplementary material supporting (5) for further details). A linear mixed model is fitted to the log2 ratios, with the experimental design (biological and technical replicates) included as random-effects terms, before the (fixed) effect of the genes is assessed. For this model term, each parameter (one per gene) is estimated with a standard error and assessed against the overall residual variability (noise) of the data after the model (signal) has been accounted for. The ratio of each parameter to its standard error gives a t-statistic on the residual degrees of freedom, which allows the statistical significance of the differential expression from 0 (on the log2 scale) to be assessed. In our study, genes with significant differential expression (p < 0.05) were filtered on expression and on the number of replicates present, so that only those genes with differential expression greater than 1.5-fold and with two or more replicates were retained for further analysis. Full details of the modelling procedure are included in the appendix at the end of this chapter.

3. Confirming differential expression using real time RT-qPCR
Real time RT-qPCR should be applied to selected transcripts to confirm expression data from the microarrays. See Section 3.13.

3.9. Microarray Data Presentation
For a simple substantial equivalence experiment, a scatter plot of gene expression in the control against expression in the transgenic line is appropriate. An example from (5) is shown in Fig. 2. The GeneSpring package is used to display the results, plotting the pairs of mean intensities for each gene in each comparison of wheat lines and highlighting the small number of genes of interest (those that are both statistically significant and differentially expressed). The results are also summarized numerically in Table 2. They suggested that transgenesis did not affect the expression of significant numbers of endogenous genes and that the transgenic plants were substantially equivalent to their corresponding non-transgenic control (or parental) lines (5). The results also confirmed that the method of transformation (i.e., with clean fragments or whole plasmids) had little impact on gene expression patterns. The test of substantial equivalence places emphasis on statistical rigour and, for cDNA arrays, takes account of issues such as dye bias and spatial variation across the chip (see Note 20 for an appraisal of current data analysis of cDNA array experiments).

Fig. 2. Scatter plot representation of the 14-day post-anthesis endosperm transcriptome for the following pairwise comparisons: transgenic B102-1-1 line vs. control L88-31 line (left), conventionally bred L88-18 vs. control L88-31 line (middle), and transgenic B102-1-1 line vs. conventionally bred L88-18 line (right). The scatter plot shows the data obtained after analysis of the 14-dpa endosperm transcriptome for the different pairwise comparisons between the L88-31, L88-18, and B102-1-1 bread wheat lines. Dots represent the normalized relative expression level of each arrayed gene for the transcriptome comparisons described. Dots highlighted in black represent statistically significant, differentially expressed genes (DEG) at an arbitrary cut-off of >1.5-fold. The inner line on each graph represents no change in expression. The two offset lines in each graph are set at a relative cut-off of two-fold. The white–black palette bar (right side of the figure) shows the degree of gene expression and the data trust scale. The vertical axis of the bar represents relative expression levels (expressed as fold change): white to light grey tones represent no significant change in expression, greys under-expression, and dark grey to black over-expression. The horizontal axis of the bar represents the degree to which the data can be trusted: dark or unsaturated tones represent low trust, and bright or saturated tones represent high trust. Data are from Baudo et al. (5).

Whereas a simple scatter plot of expression in the control vs. expression in the transgenic line is appropriate for simple substantial equivalence experiments, more complex designs may require other means of display. A powerful method of providing an overview of transcriptomic data, whether from cDNA, oligonucleotide array, or other platforms, is hierarchical clustering (16). This method groups samples and/or genes with correlated expression together in a tree structure to visualize different levels of relatedness. Often it is used to build a tree of genes in one dimension and a tree of samples in the other, with gene expression represented by colour to give a "heat map" display. If the treatment being tested has an effect on expression, differently treated samples will appear on different branches, with all replicates appearing as leaves within these; the genes that differ in expression will also be clustered together in the other dimension. Non-hierarchical methods for clustering co-expressed genes, such as k-means, Quality Threshold (QT), and self-organizing maps, are also frequently used. The average expression of such gene clusters can be an informative way of simplifying transcriptome data. Co-expression suggests shared transcriptional control and possible functional relationships. The genes found in such clusters can then be inspected further to determine whether any share known functions (such as protein storage, response to stress, or defence) or participate in common pathways. For the majority of genes in crops such as wheat, function can be inferred only from sequence similarity. Clustering, display, and annotation tools are available in open-source resources such as Bioconductor (http://www.bioconductor.org/) and in commercial packages such as GeneSpring (Agilent Technologies, Inc.).

Table 2 Numbers of statistically significant and differentially expressed genes in the pairwise comparisons of various transgenic and control (non-transformed) bread wheat lines under study. The table shows the total number and percentage of statistically significant and differentially expressed genes at an arbitrary 1.5-fold cut-off for the different transcriptome comparisons. Data are from Baudo et al. (5).

Lines used for comparison | 14 dpa endosperms, No. (%) | 28 dpa endosperms, No. (%) | 8 dpg leaves, No. (%)
Line with 1Ax1 transgene (B102-1-1) vs. control untransformed line without 1Ax1 gene (L88-31) | 5 (0.05) | 2 (0.02) | 6 (0.06)
Control untransformed line without 1Ax1 gene (L88-31) vs. related control line with endogenous 1Ax1 gene (L88-18) | 92 (0.99) | 527 (0.59) | 26 (0.27)
Line with 1Ax1 transgene (B102-1-1) vs. related control line with endogenous 1Ax1 gene (L88-18) | 154 (1.63) | 118 (1.25) | 4 (0.04)
Line transformed with 1Ax1 clean fragment (B1355) vs. control untransformed line (Cadenza) | 6 (0.06) | 9 (0.1) | 1 (0.01)
Line transformed with 1Ax1 gene as whole plasmid (B1118) vs. control untransformed line (Cadenza) | 97 (0.07) | 12 (0.13) | 2 (0.02)
Line transformed with 1Ax1 gene as whole plasmid (B1118) vs. line transformed with 1Ax1 gene as clean fragment (B1355) | 26 (0.28) | 4 (0.04) | 3 (0.03)

3.10. Background to Wheat GeneChip® Analysis
We present here an overview of the steps to be followed for Affymetrix GeneChip® expression analysis. Details of the standard protocols can be found in the "Affymetrix GeneChip® Expression Analysis Technical Manual" (see Note 22). GeneChip® probe arrays are manufactured by Affymetrix (see Note 23). Many universities and private companies are now fully equipped with the Affymetrix GeneChip® array platform and offer customers various levels of service for array processing (purchase of GeneChip® arrays, cDNA labelling, array hybridization, scanning, array analysis, etc.). The Affymetrix GeneChip® Wheat Genome Array, created within the Affymetrix GeneChip® Consortia Program, contains 61,127 probe sets representing 55,052 transcripts for all 42 chromosomes of the wheat genome. The design of the array was based on public domain data from GenBank® and dbEST (http://www.affymetrix.com/community/research/consortia.affx). The array can be used for gene expression studies in different wheat species: T. aestivum (UniGene Build ~38, April 24, 2004), T. tauschii, T. monococcum, T. turgidum, and T. turgidum ssp. durum. It includes probes designed to ESTs and full-length sequences from all these species available up to May 2004. GeneChip® probe arrays are manufactured using a process that combines photolithography and combinatorial chemistry. Each 1.7-cm² chip carries tens to hundreds of thousands of different oligonucleotide probes, each probe "spot" (probe cell) being 20 µm across. Each target transcript is measured by a probe set consisting of 11 perfect-match (PM) probes and 11 mismatch (MM) probes, all 25 bases long; each MM probe is identical to its PM partner except for a single mismatched base at the central (13th) position. Each PM probe and its MM partner (a probe pair) are placed next to each other on the array. The gene expression level can be calculated from the intensity differences between PM and MM probes across all the probe pairs of a probe set, using Affymetrix software (see Note 23), or from the intensities of the PM probes alone (RMA and gcRMA analysis).
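The two families of summary just mentioned can be contrasted on a single toy probe set. The sketch below is written in Python with NumPy (our choice; the protocol itself relies on Affymetrix and Bioconductor software) and computes a plain mean of the PM − MM differences and a mean of the log2 PM intensities. Both functions and the intensity values are hypothetical simplifications for illustration: MAS5, RMA, and gcRMA add background correction, normalization, and robust summarization that are not reproduced here.

```python
import numpy as np


def average_difference(pm, mm):
    """Mean PM - MM difference over the probe pairs of one probe set.

    Deliberately simplified PM/MM summary; it is NOT the MAS5 algorithm,
    which uses a robust log-scale estimate.
    """
    pm = np.asarray(pm, dtype=float)
    mm = np.asarray(mm, dtype=float)
    if pm.shape != mm.shape:
        raise ValueError("PM and MM intensities must come in pairs")
    return float(np.mean(pm - mm))


def pm_only_log2_mean(pm):
    """Mean log2 PM intensity of one probe set (crude PM-only summary)."""
    return float(np.mean(np.log2(np.asarray(pm, dtype=float))))


# Hypothetical probe set with 11 probe pairs, as on the wheat array.
pm = [820, 760, 910, 640, 700, 880, 790, 730, 850, 690, 770]
mm = [300, 280, 350, 260, 310, 320, 290, 270, 330, 250, 300]
print(average_difference(pm, mm))
print(pm_only_log2_mean(pm))
```

The PM-only summary ignores the MM probes entirely, which is the design choice that RMA and gcRMA make, for the reasons discussed in Section 3.12.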
3.11. Wheat GeneChip® Expression Analysis
1. RNA sample preparation. RNA (total RNA or purified poly(A) RNA) can be isolated and purified using established protocols for the specific tissue (protocols similar to those described in Section 3.4). Many commercial kits for RNA isolation are also available. For example, the TRIZOL-Reagent® protocol (Invitrogen, see Note 24) is recommended for total RNA extraction from wheat flag leaves. If the extract is highly contaminated with proteoglycans and polysaccharides, modifications of the standard protocol for homogenization (TRIZOL instructions for RNA isolation, step 1) and RNA precipitation (TRIZOL-Reagent® instructions for RNA isolation, step 3) should be included and combined. During the homogenization step, an additional centrifugation of the initial homogenate (see Note 25) should be performed. During the RNA precipitation step, total RNA recovered from the aqueous phase should be precipitated with 2-propanol and a high-salt precipitation solution (see Note 26). For high purity (particularly an A260/A230 ratio >1.8), we also recommend that the RNA is cleaned up on an RNeasy column (Qiagen, see Note 27). This clean-up should be carried out after genomic DNA contamination has been removed from the total RNA samples (see Section 3.4). Nucleic acid concentration and quality are checked using the Nanodrop ND 1000 spectrophotometer (Labtech Int, UK) and the Agilent 2100 Bioanalyzer (RNA 6000 Nano Assay, Agilent Technologies, Palo Alto, CA, USA), respectively (see Note 16).
2. cDNA synthesis and labelling (see Note 28). Double-stranded cDNA is synthesized from total RNA (or purified poly(A) RNA). A biotin-labelled cRNA is then produced from the cDNA by in vitro transcription (IVT). Fragmentation of the cRNA before hybridization onto the GeneChip® probe array has been shown to be critical for maximum sensitivity.
3. Target hybridization (see Note 29). A hybridization cocktail is prepared containing the fragmented cRNA (target) and probe array controls. Hybridization to the probe array takes place during a 16-h incubation.
4. Probe array washing and staining.
   1. Fluidics station set-up: The fluidics station is used to wash and stain the probe array. It is operated using the GeneChip® Operating Software (GCOS)/Microarray Suite on a PC-compatible workstation. This step includes setting up and priming the fluidics station (see Notes 30 and 31).
   2. Probe array washing and staining (see Note 32): After 16 h of hybridization, the hybridization cocktail is removed from the probe array (see Note 29) and the probe array is filled completely with the appropriate volume of the recommended wash buffer (see Note 32).
5. Probe array scan (see Note 32). Once scanned, each complete probe array image is stored in a data file identified by the experiment name and saved with a data image file (.dat) extension. GCOS captures and analyses the array image and experimental data: probe cells are defined and the intensity of each cell is computed (see Note 23). Owing to tighter quality control during manufacture, many of the image analysis issues that affect cDNA arrays (Section 3.8), e.g., spatial trends, are considered not to apply to Affymetrix chips. Output files containing a signal value for every PM and MM probe are generated (.cel files).

3.12. Wheat GeneChip® Data Analysis
As Affymetrix chips are so widely used across many organisms, considerable effort has been invested in developing and testing different methods of analysing the signals to give the best measure of gene expression. The method developed by Affymetrix (MAS5) uses the difference between the PM and MM probes, averaged across a probe set, to give a measure of expression. However, there is some doubt as to the usefulness of MM signals, and some alternative methods empirically seem to outperform MAS5. The Robust Multichip Average (RMA) algorithm uses the PM data from all the samples in an experiment (i.e., all the .cel files) and applies quantile normalization, which gives every chip the same intensity distribution, so that the chips share not only the same median but also the same spread of values (17). The gcRMA algorithm is a variant of RMA that uses the GC composition of each probe to adjust for non-specific background binding (18); it has been shown to perform well for Affymetrix array data compared with alternative methods when assessed by agreement with spike-in controls (19) and with real-time RT-PCR measurements (20). gcRMA can be applied using the open-source Bioconductor package (http://www.bioconductor.org/) or commercial software, e.g., GeneSpring® 7 (Agilent Technologies, Inc.). Once the method of computing expression values has been chosen (i.e., MAS5, RMA, gcRMA, or another), the subsequent analysis is the same, except that data that have not been normalized across chips, such as MAS5 output, must first be normalized (e.g., by dividing by the median expression value of each chip). It is recommended practice first to filter the probe sets, removing all except those that show absolute expression above a threshold in at least one sample (the threshold can be judged from the signal given by the non-wheat controls included on the chip).
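The PM-only route and the presence filter can be sketched as follows, assuming a probes × chips matrix of raw PM intensities with one probe-set label per row. The sketch implements two core RMA steps, quantile normalization across chips followed by Tukey median polish of log2 values within each probe set, but it omits RMA's background correction and gcRMA's GC-based adjustment, so it illustrates the idea rather than replacing the Bioconductor implementation. All function names and the example values are hypothetical, and the threshold passed to `filter_present` would in practice be judged from the non-wheat control probe sets.

```python
import numpy as np


def quantile_normalize(x):
    """Give every chip (column) the same intensity distribution.

    Each value is replaced by the mean, across chips, of the values that
    share its rank; ties are broken by position.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)
    mean_sorted = np.sort(x, axis=0).mean(axis=1)
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        out[order[:, j], j] = mean_sorted
    return out


def median_polish(y, max_iter=10, tol=1e-6):
    """Tukey median polish of a probes x chips matrix of log2 values.

    Returns one expression value per chip: overall effect + chip effect.
    """
    resid = np.array(y, dtype=float)
    overall = 0.0
    row_eff = np.zeros(resid.shape[0])
    col_eff = np.zeros(resid.shape[1])
    for _ in range(max_iter):
        r = np.median(resid, axis=1)      # sweep out probe (row) medians
        resid -= r[:, None]
        row_eff += r
        m = np.median(col_eff)
        col_eff -= m
        overall += m
        c = np.median(resid, axis=0)      # sweep out chip (column) medians
        resid -= c[None, :]
        col_eff += c
        m = np.median(row_eff)
        row_eff -= m
        overall += m
        if np.abs(r).max() < tol and np.abs(c).max() < tol:
            break
    return overall + col_eff


def rma_like(pm, probe_set_ids):
    """Quantile-normalize raw PM values, log2-transform, summarize per probe set."""
    log_pm = np.log2(quantile_normalize(pm))
    ids = np.asarray(probe_set_ids)
    return {ps: median_polish(log_pm[ids == ps]) for ps in dict.fromkeys(probe_set_ids)}


def filter_present(expr, threshold):
    """Keep probe sets whose expression exceeds threshold in at least one chip."""
    return {ps: v for ps, v in expr.items() if np.max(v) > threshold}


# Hypothetical data: 4 PM probes (2 probe sets) measured on 3 chips.
pm = np.array([[100.0, 200.0, 150.0],
               [120.0, 210.0, 160.0],
               [50.0, 40.0, 45.0],
               [60.0, 55.0, 52.0]])
expr = rma_like(pm, ["ps1", "ps1", "ps2", "ps2"])
print(filter_present(expr, threshold=6.5))
```

Summarizing with medians rather than means keeps a single aberrant probe from dominating the probe-set value, which is one reason RMA-type summaries are described as robust.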
The retained probe sets are further filtered to those that show differential expression above a threshold between at least one pair of samples; a fold change of 1.4 is usually considered the smallest that can be detected. For a substantial equivalence experiment, the design will normally consist of two or more genotypes with at least three biological replicates of each. To identify genes that are significantly differentially expressed between genotypes, analysis of variance (ANOVA) is applied to the log-transformed expression values of every probe set (expression values are usually approximately log-normally distributed). Given the very large number of probe sets typically tested at this stage, even after filtering, a multiple-testing correction can be applied; the Benjamini–Hochberg false discovery rate (FDR) correction (21) is an appropriate choice. When ANOVA is applied to many probe sets with a typical threshold of 0.05 and the Benjamini–Hochberg correction, about 5% of the genes that pass can be expected to be present by chance alone. However, if few or no genes pass this correction, it can be omitted and the predicted FDR taken into account in interpretation. For example, if 1,000 probe sets are tested at p < 0.05 without multiple-testing correction, about 50 (1,000 × 0.05) are expected to pass by chance alone, so a list of 50 significant genes is no more than would be expected if there were no real differences. The criteria for substantial equivalence in interpreting the lists of significant genes are then the same as those described for cDNA array experiments (5).

3.13. Validation of Transcriptomic Data by Real Time RT-PCR
Two general methods for the quantitative detection of amplicons have become established: gene-specific fluorescent probes (e.g., TaqMan chemistry) and double-stranded DNA-binding dyes (SYBR Green chemistry) (22). We used SYBR Green chemistry to validate, by real-time RT-PCR, the expression of selected differentially expressed genes (DEGs) identified in the cDNA microarray experiments. Specific primers were designed for the different genes (see Note 33).
●
Preparation of real-time RT-PCR reactions
1. PCRs are performed in optical 96-well plates in an ABI® PRISM 7500 Sequence Detection System (Applied Biosystems, Foster City, CA, USA).
2. Total RNA (2 µg of DNase-treated RNA, from Section 3.4) is reverse transcribed using SuperScript™ III reverse transcriptase (RT) and its buffer (Invitrogen), following the manufacturer's instructions.
3. PCR reactions are performed with an estimated 100 ng of cDNA, 12.5 µl of 2× Platinum® qPCR SuperMix-UDG with SYBR Green (Invitrogen), 0.5 µl of ROX Reference Dye (Invitrogen), and the specific primer pair (200 ng of each primer) in a final volume of 25 µl (see Note 34).
4. The following standard thermal profile is used for all PCR reactions: one cycle at 50°C for 2 min, one cycle at 95°C for 2 min, and 40 cycles of 95°C for 15 s and 60°C for 1 min (see Note 35).
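The per-reaction composition in step 3 is normally scaled up into a master mix so that every well receives the same amount of cDNA (Note 34 describes this practice). The helper below is a hypothetical sketch, not part of the published protocol: it computes component volumes for a given number of 25-µl reactions, and the primer and cDNA stock concentrations and the 10% pipetting overage are assumptions to be replaced with the values actually used.

```python
def master_mix(n_wells, primer_stock_ng_per_ul, cdna_ng_per_ul, overage=0.1):
    """Component volumes (in µl) for a master mix covering n_wells reactions.

    Per-well amounts follow the text: 12.5 µl 2x SuperMix-UDG with SYBR Green,
    0.5 µl ROX Reference Dye, 200 ng of each primer and ~100 ng cDNA, made up
    to 25 µl with water. Stock concentrations and overage are hypothetical.
    """
    per_well = {
        "2x SuperMix-UDG with SYBR Green": 12.5,
        "ROX Reference Dye": 0.5,
        "forward primer": 200 / primer_stock_ng_per_ul,
        "reverse primer": 200 / primer_stock_ng_per_ul,
        "cDNA": 100 / cdna_ng_per_ul,
    }
    water = 25.0 - sum(per_well.values())
    if water < 0:
        raise ValueError("stocks too dilute to fit in a 25-µl reaction")
    per_well["nuclease-free water"] = water
    factor = n_wells * (1 + overage)  # extra volume to cover pipetting losses
    return {name: round(vol * factor, 2) for name, vol in per_well.items()}


# Example: three technical replicates of one biological sample.
for component, volume in master_mix(3, primer_stock_ng_per_ul=100, cdna_ng_per_ul=50).items():
    print(f"{component}: {volume} µl")
```

Raising an error when the components exceed 25 µl flags stocks that are too dilute before any wells are set up.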
●
Real-time PCR data extraction and analysis
1. Raw data extraction. Threshold cycle (Ct) values were collected for each sample (23). To compare Ct values from different cDNA samples, the Ct values of all the validated genes were normalized to that of the housekeeping gene actin (see Note 36).
2. Data analysis. The relative expression of the selected genes was calculated using the equations proposed by Pfaffl (24), which include a correction for the amplification efficiency of each gene. PCR amplification efficiencies for the different target genes and the reference gene were estimated using the equations proposed by Ramakers et al. (25).

3.14. Submission of Microarray Dataset to a Public Database: ArrayExpress
Microarray data should be deposited in a public repository. ArrayExpress (26) is a public database at the European Bioinformatics Institute (EBI) for high-throughput functional genomics data. It consists of two parts: the ArrayExpress Repository (the MIAME-compliant primary archive) and the ArrayExpress Data Warehouse (a database of selected gene expression profiles from the Repository, which are consistently re-annotated). ArrayExpress is one of the three databases recommended by MGED (Microarray Gene Expression Data Society) for deposition of public microarray data. It stores data used in publications in confidential form while allowing access to authorized users such as journal editors and referees, with specified data being made publicly available upon publication of the paper to which they relate. Data submission generally involves four main steps: (1) creating a new submitter account (MX account), (2) protocol submission (array preparation, sample growth and extraction, sample labelling, hybridization, scanning, and analysis protocols, etc.), (3) array design submission (name, design, technology, etc.), and (4) experiment submission (experiment design, publications, samples, extracts, etc.). For more information, see http://www.ebi.ac.uk/miamexpress/Help.
3.15. Statistical Modelling
The method of Residual Maximum Likelihood (REML) (27), implemented in the GenStat Seventh Edition (2003) statistical system, was used to fit a mixed model (consisting of random and fixed effects) to the data as a complete set for each particular comparison (for example, B102-1-1 vs. L88-31 at 14 dpa), with up to six observations per gene. The variance structure of the design, relating to biological and technical replicates, was assessed using the model deviance; when testing between models, changes in deviance are approximately distributed as χ² on the corresponding change in degrees of freedom. Fixed terms in the model were assessed using the Wald test (28), whose test statistic is also distributed as χ². The modelling procedure therefore takes account of variation due to the design terms (random effects) and fits a fixed term for the genes (9,246 genes). Having assessed the significance of the random and fixed terms, the model used for B102-1-1 vs. L88-31 at 14 dpa was

$$y_{ijk} = \mathrm{BioRep}_i + (\mathrm{BioRep} \times \mathrm{TechRep})_{ij} + \mathrm{Gene}_k + e_{ijk},$$

where $y_{ijk}$ is the log2 ratio of gene expression of B102-1-1 to L88-31 at 14 dpa for biological replicate (BioRep) $i$, $i = 1, \ldots, 3$; technical replicate (TechRep) $j$, $j = 1, 2$ (i.e., dye swap: $j = 1$ for the dye 555/dye 647 ratio, $j = 2$ for the dye 647/dye 555 ratio); and gene $k$, $k = 1, \ldots, 9{,}246$. The error term is $e_{ijk}$. To fit the model, all terms were set up as factors (indicator columns) in GenStat. The × between the BioRep and TechRep terms denotes an interaction. The main effect of biological replicate followed by this interaction means that the random part of the model is technical replicate (dye swap) nested within biological replicate, which matches the design used. The Gene term contains parameters for all 9,246 genes. On the log scale, the ratio of the parameter estimate for any one gene to its standard error gives a t-test for that gene on the residual degrees of freedom of the overall model. This allowed the statistical significance of differential expression to be assessed for each gene (e.g., when comparing B102-1-1 with L88-31 at 14 dpa using the above model). Similar models were used for the comparisons of L88-31 with L88-18 and of B102-1-1 with L88-18 at the three material/time points.

For the three Cadenza lines, the data for each of the three experimental material/time points (seed at 14 and 28 dpa; leaf at 8 dpg) were modelled by combining data from all three comparisons, using a design term to indicate the precise comparison (B1355-4-2(18) vs. control Cadenza, B1118-8-4(6) vs. control Cadenza, or B1118-8-4(6) vs. B1355-4-2(18)) and two indicator variables, Cad1 and Cad2, denoting the involvement of B1355-4-2(18) or B1118-8-4(6), respectively, in the fixed effect of each gene. Having assessed the significance of the design terms, the best model was the same for all three material/time points:

$$y_{ijkl} = \mathrm{Comparison}_i + (\mathrm{Comparison} \times \mathrm{BioRep})_{ij} + (\mathrm{Comparison} \times \mathrm{BioRep} \times \mathrm{TechRep})_{ijk} + (\mathrm{Gene}_l \times \mathrm{Cad1}) + (\mathrm{Gene}_l \times \mathrm{Cad2}) + e_{ijkl},$$

where $y_{ijkl}$ denotes the log2 ratio for comparison $i = 1, 2, 3$ (B1355-4-2(18) vs. control Cadenza, B1118-8-4(6) vs. control Cadenza, and B1118-8-4(6) vs. B1355-4-2(18)); biological replicate $j$, $j = 1, 2, 3$; technical replicate $k$, $k = 1, 2$ (dye swap: dye 555/dye 647, dye 647/dye 555); and gene $l$, $l = 1, \ldots, 9{,}246$. Cad1 = 1 when B1355-4-2(18) is the "treated line" (numerator of the log2 ratio), Cad1 = −1 when it is the "control line" (denominator of the log2 ratio), and Cad1 = 0 when it is not present; Cad2 = 1 when B1118-8-4(6) is the "treated line" and Cad2 = 0 when it is not present. The error in fitting $y_{ijkl}$ is $e_{ijkl}$. As before, × denotes an interaction, so that, for the random terms, biological replicates are nested within the comparisons being made, while technical replicates (dye swaps) are nested within biological replicates. Hence, here again, the form of the random part of the model found to be best corresponds intuitively to the design used for the Cadenza study. In addition, because of the form of the fixed part of the model (i.e., with the indicator variables), the test of any gene for a particular comparison benefits from the extra information obtained for that gene across all comparisons, all three tests for each gene being made using the same overall residual variation.
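To make the indicator coding concrete, the sketch below builds the Cad1/Cad2 columns for the three Cadenza comparisons and estimates one gene's two line effects by ordinary least squares. It is a simplified illustration under stated assumptions, not the GenStat REML analysis: the random Comparison, BioRep, and TechRep terms are omitted, the log2 ratios are assumed to be already oriented as treated/control, and the dictionary keys and function name are hypothetical. Because B1118-8-4(6) vs. B1355-4-2(18) is coded as Cad1 = −1, Cad2 = 1, that comparison's expected ratio is the difference between the two line effects, which is how all three comparisons contribute to each gene's estimates.

```python
import numpy as np

# (Cad1, Cad2) coding for each comparison, read as numerator vs. denominator.
# Cad1: role of B1355-4-2(18): +1 numerator, -1 denominator, 0 absent.
# Cad2: role of B1118-8-4(6):  +1 numerator, 0 absent.
COMPARISONS = {
    "B1355 vs Cadenza": (1, 0),
    "B1118 vs Cadenza": (0, 1),
    "B1118 vs B1355": (-1, 1),
}


def fit_gene_effects(log2_ratios):
    """Least-squares estimates of one gene's two line effects.

    log2_ratios maps a comparison name to its replicate log2 ratios
    (treated/control). Returns [B1355 vs Cadenza, B1118 vs Cadenza].
    Random design terms are ignored, unlike the REML mixed model.
    """
    rows, y = [], []
    for name, values in log2_ratios.items():
        cad1, cad2 = COMPARISONS[name]
        for v in values:
            rows.append([cad1, cad2])
            y.append(v)
    X = np.array(rows, dtype=float)
    beta, *_ = np.linalg.lstsq(X, np.array(y, dtype=float), rcond=None)
    return beta


# Hypothetical replicate log2 ratios for a single gene.
est = fit_gene_effects({
    "B1355 vs Cadenza": [0.4, 0.6],
    "B1118 vs Cadenza": [0.9, 1.1],
    "B1118 vs B1355": [0.5, 0.5],
})
print(est, "implied B1118 vs B1355:", est[1] - est[0])
```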
4. Notes
1. Data deposition. The gene expression data presented here have been deposited in the ArrayExpress database (http://www.ebi.ac.uk/arrayexpress/Submissions/index.html) with the accession number A-MEXP-177.
2. Unless stated otherwise, all solutions should be nuclease-free.
3. Extraction buffer is best stored at room temperature. Spermidine (0.5 g/l) should be added to the extraction buffer after autoclaving. 2-Mercaptoethanol should be added to the aliquoted buffer immediately before use.
4. Reagents should be handled carefully and under a flow hood, as some are toxic, flammable, or irritant.
5. Fresh solutions should be prepared from concentrated stocks of the individual constituents just before use.
6. Tris–HCl (pH 8.0)-buffered phenol:chloroform (1:1) is prepared by carefully adding 800 ml of 10 mM Tris–HCl, pH 8.0, to 400 g of redistilled ultra-pure phenol crystals and 0.4 g of 8-hydroxyquinoline (antioxidant). The mixture is stirred for 1–2 h and then allowed to stand and separate. The upper aqueous buffer layer is discarded. The lower, yellow-coloured phenol:chloroform layer is retained for use and is stable at room temperature for 1–2 months if kept in the dark.
7. Dissolve one vial of the Alexa dye in 2 µl of DMSO (dimethyl sulfoxide).
8. Pre-cool the mortar, pestle, and spatulas in liquid nitrogen and keep them frozen until use.
9. Warm 15 ml of extraction buffer + 300 µl of 2-mercaptoethanol to 65°C in a water bath.
10. Keep a second mortar and pestle and buffer for homogenization at room temperature.
11. Add more buffer in small amounts (1–2 ml) if the homogenate is too viscous.
12. Add potassium acetate (pH 5.5) to precipitate K+–SDS/protein/genomic DNA/carbohydrate complexes.
13. Be careful not to contaminate the retained aqueous layer with material from the interfacial layer of precipitated, denatured protein.
14. The efficiency of RNA precipitation by LiCl depends on the nucleic acid concentration, and there is a sudden and marked reduction in precipitation efficiency when the RNA concentration falls below 100 µg/ml.
15. This step exchanges K+ for the residual Li+ ions bound to the purified RNA.
16. For well-purified RNA, the A260/A280 ratio should be approximately 2.0 and the A260/A230 ratio should be >1.8. Concentration and purity can be determined using the Nanodrop ND 1000 spectrophotometer and the Agilent 2100 Bioanalyzer (http://www.agilent.com).
17. Alternatively, cDNA for relative quantification by real-time RT-PCR can be produced with SuperScript™ III RT and RNaseOUT (Invitrogen), following the manufacturer's instructions.
18. To elute the aa-dUTP-labelled cDNA, add 10 µl of nuclease-free water to the centre of the MinElute column membrane, allow the column to stand for 5 min, and centrifuge for 5 min.
19. Place each slide in a 50-ml Falcon tube. Cover all tubes with aluminium foil.
20. In our study, the data were analysed using the GenStat system because GeneSpring was not convenient for the modelling required to test the significance of individual genes in the context of an overall model for all genes rather than on a gene-by-gene basis. In many studies, the normalization procedure is most conveniently done using the GeneSpring system. However, since our study was carried out, the GenStat Statistical System has been developed further (in the eighth and ninth editions), allowing a more sophisticated approach to be taken in modelling microarray data. The data obtained after image analysis can now be imported more simply into GenStat for normalization and full analysis. Furthermore, subsequent and ongoing research suggests that a different modelling technique could be applied that accounts specifically for the spatial variability of spots within blocks on a slide (29, 30). Alternatively, hierarchical mixture modelling (31, 32) of the normalized expression values could be used to deal with the complex variability of the data while still having the power to detect differential expression where there are few observations per gene. Finally, the analysis of multiple laser scans of microarray slides (33) followed by functional regression modelling may make it possible to overcome the effects of the scanner's intrinsic noise level on the censoring of highly expressed genes.
21. Where a large number of genes are tested using the modelling approach discussed here, it would be useful to investigate the false discovery rate of significant genes by fitting a mixture distribution to the p-values of the genes (34), an approach now incorporated as the FDRMIXTURE procedure in GenStat. Finally, the problem of having a small number of data points for each of a large number of genes can be alleviated by using a variance shrinkage method (35). This uses a test based on gene-specific variance estimates that nevertheless combine information across genes. It is more powerful than tests on individual genes but avoids the false discovery rate problem associated with assuming a common underlying variance for all genes (as in the modelling described here).
22. Protocols can be downloaded from http://www.affymetrix.com/support/technical/manual/netaffx_MAGE_ML_manual.affx.
23. See http://www.affymetrix.com for current technology references. Up to 1.3 million different oligonucleotide "probes" (where a probe is any standard 25-mer oligonucleotide synthesized on the array to detect a complementary target in solution) are synthesized on each array. Each probe is located in a specific area called a "probe cell". Each probe cell contains between hundreds of thousands and millions of copies of a given oligonucleotide. The "probe pair" is the fundamental detection unit of an Affymetrix probe set and consists of a perfect match (PM) oligonucleotide and the corresponding mismatch (MM) oligonucleotide. On the array, a set of probe pairs (the "probe set") represents each selected expressed sequence.
24. TRIZOL-Reagent, Invitrogen catalog number 15596-026.
25. Following homogenization (TRIZOL-Reagent® instructions for RNA isolation, step 1), remove insoluble material from the homogenate by centrifugation at 12,000 × g for 10 min.
26. Total RNA in the aqueous phase should be precipitated with 0.25 ml of 2-propanol (Sigma) followed by 0.25 ml of a high-salt precipitation solution (0.8 M sodium citrate and 1.2 M NaCl) per 1 ml of TRIZOL-Reagent®.
27. An RNeasy column (Qiagen) is used to clean up and concentrate the eluted total RNA, following the manufacturer's instructions.
28. Protocols described in the GeneChip® Expression Analysis Technical Manual, Section 2, Chapter 1.
29. Protocols described in the GeneChip® Expression Analysis Technical Manual, Section 2, Chapter 2.
30. Protocols described in the GeneChip® Expression Analysis Technical Manual, Section 2, Chapter 3, Part 7.
31. Refer to the GeneChip® Fluidics Station User's Guide for instructions.
32. Protocols described in the GeneChip® Expression Analysis Technical Manual, Section 2, Chapter 3.9 ("Probe Wash and Stain") and Chapter 3.15 ("Probe Array Scan"). Review the scanner user's manual for safety precautions and more information.
33. Specific primer pairs for SYBR Green detection and quantification of selected differentially expressed genes (DEGs) for microarray validation (EST clone sequences searched in http://www.cerealsdb.uk.net/index.htm) were designed using Primer Express® software (ABI™ PRISM), following the TaqMan® Probe and Primer Design guidelines.
34. A master mix containing sufficient cDNA and reaction components is prepared before dispensing into individual wells, to ensure that each reaction (i.e., three technical replicates for each of the three biological replicates per sample tested) contains an equal amount of cDNA and that pipetting and other errors are reduced.
35. A dissociation curve analysis is performed for each pair of specific primers to detect any non-specific amplification. To generate a baseline-subtracted plot of the logarithmic increase in fluorescence signal (ΔRn) vs. cycle number, baseline data were collected between cycles 3 and 15 for most of the amplifications.
36. The actin gene (clone ID: H01_p335_plate_6; http://www.cerealsdb.uk.net/index.htm) used as the internal control gene does not show differential expression between the lines, tissues, and developmental stages under study.
37. Availability: http://www.ebi.ac.uk/arrayexpress.
Acknowledgements
Rothamsted Research receives grant-aided support from the Biotechnology and Biological Sciences Research Council (BBSRC) of the UK. The transcriptomic studies were supported by a grant under the BBSRC Gene Flow Initiative (ref. GM 14152). The authors would like to thank Mr. Adrian Price at Rothamsted Research for discussions of methods for microarray data analysis. We also acknowledge our colleagues Prof. Michael Holdsworth (University of Nottingham), Prof. Keith Edwards (University of Bristol), Ms. Rebecca Lyons, and Dr. Gabriela M. Pastori (Rothamsted Research).
References
1. Evans, L. T. (1993) Crop Evolution, Adaptation and Yield. Cambridge University Press, Cambridge.
2. FAO/WHO (1996) Biotechnology and Food Safety. Report of a Joint FAO/WHO Consultation.
3. Wilson, I. D., Barker, G. L. A., Beswick, R. W., Shepherd, S. K., Lu, C., Coghill, J. A., Edwards, D., Owen, P., Lyons, R., Parker, J. S., Lenton, J. R., Holdsworth, M. J., Shewry, P. R. and Edwards, K. J. (2004) A transcriptomics resource for wheat functional genomics. Plant Biotechnol. J. 2, 495–506.
4. Wilson, I. D., Barker, G. L. A., Lu, C., Coghill, J. A., Beswick, R. W., Lenton, J. R. and Edwards, K. J. (2005) Alteration of the embryo transcriptome of hexaploid winter wheat (Triticum aestivum cv. Mercia) during maturation and germination. Funct. Integr. Genomics 5, 144–154.
5. Baudo, M. M., Lyons, R., Powers, S., Pastori, G. M., Edwards, K. J., Holdsworth, M. J. and Shewry, P. R. (2006) Transgenesis has less impact on the transcriptome of wheat grain than conventional breeding. Plant Biotechnol. J. 4, 369–380.
6. Barro, F., Rooke, L., Békés, F., Gras, P., Tatham, A. S., Fido, R. J., Lazzeri, P., Shewry, P. R. and Barcelo, P. (1997) Transformation of wheat with HMW subunit genes results in improved functional properties. Nat. Biotechnol. 15, 1295–1299.
7. Rooke, L., Steele, S. H., Barcelo, P., Shewry, P. R. and Lazzeri, P. (2003) Transgene inheritance, segregation and expression in bread wheat. Euphytica 129, 301–309.
8. Shewry, P. R., Halford, N. G., Tatham, A. S., Popineau, Y., Lafiandra, D. and Belton, P. S. (2003) The high molecular weight subunits of wheat glutenin and their role in determining wheat processing properties. Adv. Food Nutr. Res. 45, 221–302.
9. Lawrence, G. J., MacRitchie, F. and Wrigley, C. W. (1988) Dough and baking quality of wheat lines differing in glutenin subunits controlled by the Glu-A1, Glu-B1 and Glu-D1 loci. J. Cereal Sci. 7, 109–112.
10. Shewry, P. R., Tatham, A. S. and Fido, R. J. (1995) Plant Gene Transfer and Expression Protocols: Separation of Plant Proteins by Electrophoresis, Vol. 49. Humana, Totowa.
11. Chang, S., Puryear, J. and Cairney, J. (1993) A simple and efficient method for isolating RNA from pine trees. Plant Mol. Biol. Rep. 11, 113–116.
12. Cheng, G. P., Wilson, I. D., Kim, S. H. and Grierson, D. (2001) Inhibiting expression of a tomato ripening-associated membrane protein increases organic acids and reduces sugar levels of fruit. Planta 212, 799–807.
13. Halford, N. G., Field, J. M., Blair, H., Urwin, P., Moore, K., Robert, L., Thompson, R., Flavell, R. B., Tatham, A. S. and Shewry, P. R. (1992) Analysis of HMW glutenin subunits encoded by chromosome 1A of bread wheat (Triticum aestivum L.) indicates quantitative effects on grain quality. Theor. Appl. Genet. 83, 373–378.
14. Christensen, A. H. and Quail, P. H. (1996) Ubiquitin promoter-based vectors for high-level expression of selectable and/or screenable marker genes in monocotyledonous plants. Transgenic Res. 5, 213–218.
15. Kerr, M. K. (2003) Linear models for microarray data analysis: hidden similarities and differences. J. Comput. Biol. 10, 891–901.
16. Eisen, M. B., Spellman, P. T., Brown, P. O. and Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863–14868.
17. Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U. and Speed, T. P. (2003) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249–264.
18. Wu, Z. J., Irizarry, R. A., Gentleman, R., Martinez-Murillo, F. and Spencer, F. (2004) A model-based background adjustment for oligonucleotide expression arrays. J. Am. Stat. Assoc. 99, 909–917.
19. Choe, S. E., Boutros, M., Michelson, A. M., Church, G. M. and Halfon, M. S. (2005) Preferred analysis methods for Affymetrix GeneChips revealed by a wholly defined control dataset. Genome Biol. 6, R16.
20. Qin, L. X., Beyer, R. P., Hudson, F. N., Linford, N. J., Morris, D. E. and Kerr, K. F. (2006) Evaluation of methods for oligonucleotide array data via quantitative real-time PCR. BMC Bioinformatics 7, 23.
21. Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289–300.
22. Bustin, S. A. (2004) A–Z of Quantitative PCR. IUL Biotechnology Series, International University Line, La Jolla.
23. ABI PRISM (2001) 7700 Sequence Detection System: Relative Quantification, Vol. 2.
24. Pfaffl, M. W. (2001) A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res. 29, 2003–2007.
25. Ramakers, C., Ruijter, J. M., Lekanne Deprez, R. H. and Moorman, A. F. M. (2003) Assumption-free analysis of quantitative real-time polymerase chain reaction (PCR) data. Neurosci. Lett. 339, 62–69.
26. Parkinson, H., Kapushesky, M., Shojatalab, M., Abeygunawardena, R., Coulson, R., Farne, A., Holloway, E., Kolesnykov, N., Lilja, P., Lukk, M., Mani, R., Rayner, T., Sharma, A., William, E., Sarkans, U. and Brazma, A. (2006) ArrayExpress: a public database of microarray experiments and gene expression profiles. Nucleic Acids Res. doi:10.1093/nar/gkl995.
27. Patterson, H. D. and Thompson, R. (1971) Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545–554.
28. Welham, S. J. and Thompson, R. (1997) Likelihood ratio tests for fixed model terms using residual maximum likelihood. J. R. Stat. Soc. Ser. B 59, 701–714.
29. Burgueño, J., Crossa, J., Grimanelli, D., Leblanc, O. and Autran, D. (2005) Spatial analysis of cDNA microarray experiments. Crop Sci. 45, 748–757.
30. Baird, D., Johnston, P. and Wilson, T. (2004) Normalisation of microarray data using a spatial mixed model analysis which includes splines. Bioinformatics 20, 3196–3205.
31. Smyth, G. K. (2004) Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3.
32. Newton, M. A., Noueiry, A., Sarkar, D. and Ahlquist, P. (2004) Detecting differential gene expression with a semiparametric hierarchical mixture method. Biostatistics 5, 155–176.
33. Khondoker, M. R., Glasbey, C. A. and Worton, B. J. (2006) Statistical estimation of gene expression using multiple laser scans of microarrays. Bioinformatics 22, 215–219.
34. Allison, D. B., Gadbury, G. L., Heo, M., Fernández, J. R., Lee, C.-K., Prolla, T. A. and Weindruch, R. (2002) A mixture model approach for the analysis of microarray gene expression data. Comput. Stat. Data Anal. 39, 1–20.
35. Cui, X., Hwang, J. T. G., Qiu, J., Blades, N. J. and Churchill, G. A. (2005) Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics 6, 59–75.