© 2001 Nature Publishing Group http://biotech.nature.com
RESEARCH ARTICLE
© 2001 Nature Publishing Group http://biotech.nature.com
Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer Timothy R. Hughes1, Mao Mao1, Allan R. Jones1, Julja Burchard1, Matthew J. Marton1, Karen W. Shannon2, Steven M. Lefkowitz2, Michael Ziman1, Janell M. Schelter1, Michael R. Meyer1, Sumire Kobayashi1, Colleen Davis1, Hongyue Dai1, Yudong D. He1, Sergey B. Stephaniants1, Guy Cavet1, Wynn L. Walker1, Anne West1, Ernest Coffey1, Daniel D. Shoemaker1, Roland Stoughton1, Alan P. Blanchard1, Stephen H. Friend1, and Peter S. Linsley1*
We describe a flexible system for gene expression profiling using arrays of tens of thousands of oligonucleotides synthesized in situ by an ink-jet printing method employing standard phosphoramidite chemistry. We have characterized the dependence of hybridization specificity and sensitivity on parameters including oligonucleotide length, hybridization stringency, sequence identity, sample abundance, and sample preparation method. We find that 60-mer oligonucleotides reliably detect transcript ratios at one copy per cell in complex biological samples, and that ink-jet arrays are compatible with several different sample amplification and labeling techniques. Furthermore, results using only a single carefully selected oligonucleotide per gene correlate closely with those obtained using complementary DNA (cDNA) arrays. Most of the genes for which measurements differ are members of gene families that can only be distinguished by oligonucleotides. Because different oligonucleotide sequences can be specified for each array, we anticipate that ink-jet oligonucleotide array technology will be useful in a wide variety of DNA microarray applications.
Results and discussion
DNA microarrays provide a means to quantify tens of thousands of discrete sequences in a single assay. Among the most widespread uses of microarrays is expression profiling1,2, which has found many applications including discovery of gene functions3,4, drug evaluation4–6, pathway dissection7, and classification of clinical samples8–10. Two major platforms for high-density microarray manufacture are in common use. The first involves 25-mer oligonucleotides made by a photolithographic process similar to manufacture of computer chips11. The second utilizes robotic deposition or "spotting" of DNA molecules1. Spotted arrays are commonly referred to as "cDNA microarrays", although clones, PCR products, or oligonucleotides can all be spotted. Oligonucleotides offer greater specificity than cDNAs or PCR products, having the capacity to distinguish singlenucleotide polymorphisms12 and discern splice variants. In both systems, creation of a new array design is relatively inconvenient and/or expensive, requiring either a new set of masks (for photolithography) or new samples to deposit (for spotted cDNA arrays). Flexibility to create new arrays is becoming increasingly important as more genomes are sequenced and more applications for microarrays are described. One possible solution to this need is a recently described system for maskless light-directed oligonucleotide synthesis using a micromirror array13. Here, we describe a flexible platform for microarray expression profiling, centered around an in situ oligonucleotide synthesis method in which the ink-jet printing process is modified to accommodate delivery of phosphoramidites to directed locations on a glass surface14. Using the flexibility offered by this system, we have characterized the importance of various experimental parameters to hybridization specificity and sensitivity. We show that results obtained are in good overall agreement with spotted cDNA microarrays, that oligonucleotides have a superior ability to distinguish closely related sequences, and that a single oligonucleotide is suitable for detection of single-copy genes in human cells. 1Rosetta
342
A second-generation ink-jet oligonucleotide synthesizer. We have constructed a second-generation ink-jet oligonucleotide synthesizer that delivers a grid of~25,000 phosphoramidite-containing propylene carbonate microdroplets to a 25 × 75 mm glass support with a hydrophobic surface containing exposed hydroxyl groups. Coupling efficiency was estimated to be 94–98% by a highperformance liquid chromatographic (HPLC) analysis of 25-mers that were synthesized on an array surface utilizing a cleavable linkage to the surface (G. Haddad and M. Perbost, personal communication). Impact of oligonucleotide length and hybridization stringency on hybridization properties. Although short (20–25 base) oligonucleotides should theoretically provide the greatest discrimination between related sequences, observations by others have indicated that short surface-bound oligonucleotides often have poor hybridization properties2,12,15. Spacers that move oligonucleotides away from the surface and closer to the solution state have been used to enhance hybridization yields12,15. Guo et al.12 created spacers by adding a poly-dT to the sequences of interest. We reasoned that adding specific sequence, instead of poly-dT, might further augment sensitivity and possibly specificity. Therefore, we initially investigated the impact of modulating oligonucleotide length on hybridization properties. To separate the effects of length from those of synthesis efficiency, capping was omitted from synthesis cycles. In this way, coupling inefficiency results in a random distribution of single-base deletions among the millions of oligonucleotides in each spot, rather than a reduction of total oligonucleotide synthesized. The two major aspects of oligonucleotide performance are sensitivity (signal intensity) and specificity (ratio of specific to nonspecific hybridization). Figure 1A shows an experiment in which the dependence of sensitivity and specificity on oligonucleotide length is assessed. In vitro-transcribed cRNA from the cloned yeast gene GCN4 (a unique gene) was labeled with cyanin 5 (Cy5; red), and
Inpharmatics, Inc., 12040 115th Avenue NE, Kirkland, WA 98034. 2Agilent Technologies, Inc., 3500 Deer Creek Road, Palo Alto, CA 94304. *Corresponding author (
[email protected]). nature biotechnology
•
VOLUME 19
•
APRIL 2001
•
http://biotech.nature.com
© 2001 Nature Publishing Group http://biotech.nature.com
RESEARCH ARTICLE A
GCN4
Figure 1. Effects of oligonucleotide length and hybridization stringency on hybridization to tiled DNA oligonucleotide microarrays. (A) Pseudocolor image of a GCN4 tiled array hybridized in 32% formamide. Oligonucleotides hybridizing with Cy5-labeled GCN4 cRNA are colored red; those hybridizing to the Cy3-labeled gcn4∆ cRNAs are green; and those hybridizing to both samples equally are yellow. Regularly spaced yellow spots (indicated with arrows; also along left and right edges) hybridize to "gridline" control oligonucleotide. (B) Dependence of sensitivity and specificity on oligonucleotide length and hybridization stringency. (C) Prediction of nonspecific hybridization. Predicted crosshybridization (taken as proportional to exp(∆G - ∆G0), where ∆G (calculated using the nearest-neighbor model) is the binding energy to the perfect match of each 60-mer, and ∆G0 is the binding energy to the longest ungapped identical non-HXT3 sequence within the yeast genome) correlates with actual cross-hybridization (r = 0.53). The small yellow box indicates the best-choice 60-mer.
Length (nt)
Gridline
20 25 30 35 40 45 50 55 60
3'
GCN4 - Average sensitivity
B
GCN4 - Average specificity 60
60
50
250 45
200 40
150
35
30
100
25
50
9
55
8
50
7 45
6 40
5 35
4
30
3
25
2
0
16
32
48
0
16
Formamide (%)
32
48
Formamide (%)
1
HXT3 0.1
1 0.01
0.5
0.001
0
150
300
450
600
750
900
1050
1200
1350
1500
0
Cross-hyb strength (predicted)
C
Specificity of hybridization to 60-mers. To better understand the relationship between sequence identity and hybridization efficiency to surface-bound 60-mers, we examined hybridization of a synthetic transcript to an array of oligonucleotides carrying permutations in a single 60-base region of the transcript. We first examined the effects of different numbers of randomly placed single-base mismatches and deletions (ranging from 2 to 20 per oligonucleotide) on hybridization yield at three different sample concentrations (Fig. 2A, B). Eighteen or more randomly placed mismatches per 60-mer reduced hybridization to background levels, even with samples at 1,000 copies per cell equivalent in human cells. This suggests that, on average, maximal discrimination is achieved by choosing oligonucleotides that differ from all other sequences within the genome by 18 or more out of 60 bases. However, this general rule underestimates the specificity that can be achieved among closely related sequences, because the position of the mismatch relative to the surface can have a dramatic impact on the stability of the duplex (Fig. 2C). For example, on average five or more randomly placed mismatches are required to reduce signal to
300
Oligonucleotide length (nt)
55
< Intensity - background > (arbitrary units)
Oligonucleotide length (nt)
350
Non-specific, specific intensity
© 2001 Nature Publishing Group http://biotech.nature.com
5'
Tiling start position (nt)
in vitro-transcribed cRNA from a strain deleted for the same gene was labeled with Cy3 (green). The two samples were hybridized simultaneously to an ink-jet oligonucleotide array with a "tiling" design, in which an oligonucleotide was synthesized starting at every third base (i.e., bases 1, 4, 7, 11, etc. of the 843-nucleotide GCN4 open reading frame (ORF), as indicated below Fig. 1A). This pattern was repeated with oligonucleotide lengths ranging from 20 to 60 bases (as indicated on the right in Fig. 1A). The Cy5 signal at each spot gives a measure of sensitivity, whereas the Cy5:Cy3 (specific:nonspecific) ratio measures specificity. This experiment included arrays hybridized at four different hybridization stringencies (0%, 16%, 32%, and 48% formamide). The contour plots in Figure 1B show that average sensitivity is highest at 16% formamide and increases with oligonucleotide length. However, specificity appears to decrease slightly going from 55 to 60 bases, and is highest at 32% formamide. Similar results were obtained when this experiment was repeated with other yeast genes (YER019C, a unique gene, and HXT3, a member of a family of 17 homologs in the yeast genome; data not shown). On average, 60-base oligonucleotides hybridized at 30–32% formamide represented a practical compromise between maximal sensitivity and specificity (Fig.1B and data not shown). Figure 1C shows the specific (red) and nonspecific (green) intensity for 60-mers at each tiling position for the HXT3 experiment hybridized at 32% formamide. Oligonucleotides with good specificity (those with high red:green ratios in Fig. 1C) can be predicted because they have the least sequence similarity to other HXT family members (the blue trace in Fig. 1C plots a sequence-based theoretical estimate of cross-hybridization potential). Thus maximization of hybridization specificity requires selection of oligonucleotides with minimal sequence similarity elsewhere in the genome. http://biotech.nature.com
•
APRIL 2001
•
VOLUME 19
•
nature biotechnology
343
© 2001 Nature Publishing Group http://biotech.nature.com
Figure 2. Dependence of hybridization intensity on number and location of mismatches and deletions. (A) Hybridization intensities are plotted as a function of the number of randomly placed single-nucleotide mismatches in the 60-mer sequence. Intensities are normalized to sample abundance. (B) Mean hybridization intensities are plotted as a function of the number of randomly placed single-nucleotide deletions in the 60-mer sequence. Intensities are normalized to sample abundance. (C) Mean hybridization intensities at 30 copies/cell equivalent are plotted as a function of the starting position for the indicated number of contiguous nucleotide mismatches in the 60-mer sequence.
Intensity (normalized units)
RESEARCH ARTICLE A
•
VOLUME 19
30 1 copy/cell 30 copies/cell 1000 copies/cell
1.5 1 0.5 0
0
2
4
Intensity (normalized units)
Deletion (%)
6
8
10
10
12
14
16
20
18
20
30 1 copy/cell 30 copies/cell 1000 copies/cell
2 1.5 1 0.5 0
0
2
4
6
8
10
12
14
16
18
20
Number of deletions Intensity (normalized units)
© 2001 Nature Publishing Group http://biotech.nature.com
B
nature biotechnology
20
Number of mismatches
Reliability and sensitivity of detected transcript ratios. The results above indicate that cross-hybridization to specific transcripts can be minimized or avoided, but do not address the possibility that cumulative low-magnitude, nonspecific hybridization could obscure rare transcripts. To determine whether transcript ratios at various abundances could be detected within the milieu of a complex mixture of sequences, we conducted experiments in which 19 different synthetic transcripts with randomly generated 60-mer inserts (similar to the one used above) were spiked into total RNA from the human Jurkat cell line at different ratios between the two channels. These RNAs were amplified, labeled, and hybridized to an array with 23,965 oligonucleotides representing human genes as well as oligonucleotides complementary to the 60-mer inserts (Fig. 3A–C). Spike-in ratios were detected in all cases, including ratios of 1:3 centered at 1 copy/cell equivalent (i.e., 0.5 copies/cell to 1.5 copies/cell). Thus, crosshybridization does not obscure specific signal even at low transcript abundance. These data also provide a practical means to calibrate the system for transcript ratio measurements. These experiments also compared three different sample amplification and labeling techniques: reverse transcription (RT) (in which cDNA is hybridized to the array), in vitro transcription (IVT), and PCR-coupled in vitro transcription (PCR-IVT) (in which cRNA is hybridized to the array). The three methods gave nearly identical transcript ratios for the synthetic spike-ins, even at low transcript abundance (Fig. 3A–C). The three methods also gave low falsepositive rates among the 23,965 human transcripts (Fig. 3D–F). When two different cell lines (Jurkat and K-562) were compared in duplicate, the three methods yielded highly reproducible ratios (Fig. 3G–I). However, the three methods are not interchangeable (Fig. 3J–L). Whereas cRNA samples prepared by different methods gave very similar results (Fig. 3L), cDNA samples detected more apparent transcript ratios than samples prepared by either cRNA method (Fig. 3J, K). Most of the ratios detected by cDNA but not cRNA corresponded to oligonucleotides with high deoxycytidine (dC) content. These oligonucleotides hybridized rapidly to sequences present in all cRNA samples tested (data not shown), indicating that they cross-hybridized with common sequences. Cross-hybridization with oligonucleotides enriched with dC rather than dG is consistent with the reported higher stability of d(pyrimidine):r(purine) versus d(purine):r(pyrimidine) hybrids16. This suggests that oligonucleotides with high dC content should be avoided when cRNA methods are used. Similar transcript ratios obtained from single oligonucleotides, multiple oligonucleotides and from spotted cDNAs. To further verify measurement reliability, we asked whether results obtained using ink-jet arrays were consistent with results from spotted cDNA microarrays. We hybridized a pair of yeast samples, one grown in synthetic complete (SC) medium and one grown in sporulation (Spo) medium, to both a spotted cDNA microarray1,6 and an ink-jet array containing as many as eight different 60-mer oligonucleotides per gene. The transcript abundance ratios (Spo:SC) obtained from the cDNA microarray (an array of double-stranded PCR products of average length 671 base pairs) and the average of the multiple 60-mers correlate at r = 0.97 (Fig. 4A). Whereas the intensity of different oligonucleotides from the same gene varied by >30% on average, transcript abundance ratios were typ344
Mismatch (%) 10 2
C
2.5 2 1.5 perfect match 1 mismatch 2 mismatches 5 mismatches 10 mismatches 20 mismatches
1 0.5 0
0
10
20
30
40
50
60
Mismatch start position solution end
surface end
ically more consistent, averaging