JOURNAL OF CLINICAL MICROBIOLOGY, Apr. 2006, p. 1209–1218 0095-1137/06/$08.00⫹0 doi:10.1128/JCM.44.4.1209–1218.2006 Copyright © 2006, American Society for Microbiology. All Rights Reserved.
Vol. 44, No. 4
Use of Semiconductor-Based Oligonucleotide Microarrays for Influenza A Virus Subtype Identification and Sequencing Michael J. Lodes,1†* Dominic Suciu,1† Mark Elliott,1 Axel G. Stover,1 Marty Ross,1 Marcelo Caraballo,1 Kim Dix,1 James Crye,1 Richard J. Webby,2 Wanda J. Lyon,3 David L. Danley,1 and Andrew McShea1 CombiMatrix Corporation, Mukilteo, Washington 982751; St. Jude Children’s Research Hospital, Memphis, Tennessee 381052; and AFIOH/SDE, Brooks City AFB, San Antonio, Texas 782353 Received 12 October 2005/Returned for modification 24 November 2005/Accepted 14 January 2006
In the face of concerns over an influenza pandemic, identification of virulent influenza A virus isolates must be obtained quickly for effective responses. Rapid subtype identification, however, is difficult even in wellequipped virology laboratories or is unobtainable in the field under more austere conditions. Here we describe a genome assay and microarray design that can be used to rapidly identify influenza A virus hemagglutinin subtypes 1 through 15 and neuraminidase subtypes 1 through 9. Also described is an array-based enzymatic assay that can be used to sequence portions of both genes or any other sequence of interest. In many situations, identification of the circulating subtype is not sufficient and a specific gene sequence is required. For example, genotype Z, the dominant avian H5N1 influenza virus genotype currently circulating in Vietnam and Thailand, contains a mutation that is associated with resistance to amantadine and rimantadine (8, 36). Antiviral therapies generally should be given within 48 h of the onset of illness to be effective against human influenza (36). Thus, rapid and specific identification of this subtype and the availability of accurate sequence information are crucial for proper treatment. Identification of influenza virus subtypes is routinely accomplished with viral detection (cell culture) and serological techniques, such as complement fixation, hemagglutination, hemagglutination inhibition assays, and immunofluorescence methods (3, 5, 25, 31). Traditional methods are generally effective but involve labor-intensive and highly trained personnel. Because of their speed, specificity, and sensitivity, genomic assays are ideal complements to serological assays for the identification of the genotype of an unknown specimen, especially in cases where antigenic tests are not specific enough to differentiate closely related groups (10, 20, 27, 29, 33, 37). Reverse transcription-PCR (RT-PCR) is widely used for virus identification (1, 14, 30). However, a positive amplification can be verified only by subsequent assays to elaborate sequence information. By overcoming this limitation, microarrays and biosensors have become valuable tools for viral discovery, detection, and genotyping (5, 6, 10, 15, 19, 20, 21, 28, 33, 34). Microarrays that contain several thousand different DNA sequences (probes) can theoretically identify several thousand different organisms. However, by using standard thermal hybridization, microarrays have their own issues, such as suboptimal hybridization conditions for all probes, mismatches that are difficult to detect (G-G or G-T, for example), or mismatches that reside in a context (GC-rich) that makes detection problematic. Even well-designed probes can display differences in maximal hybridization capacity of 2 orders of magnitude under different hybridization conditions (7); and thus, it is difficult to find one set of conditions that is optimal for all probes on an array (16, 32). New technologies must be developed to overcome the difficulties associated with tradi-
Influenza A virus is a negative-strand RNA virus with a segmented genome that can infect a broad range of animals, including humans. Identification of a virus subtype is typically by serological or molecular identification of the subtype of the viral hemagglutinin (HA) and neuraminidase (NA) genes. Viruses with any combination of the 16 HA (11) and 9 NA subtypes can infect aquatic birds, while fewer subtypes have been found to infect humans. However, interspecies transmission can occur after recombination or mixing of subtypes in birds or pigs (14, 22, 26). In addition, new human strains of virus can arise by reassortment to accomplish antigenic shift when two or more subtypes infect the same host (24, 35). Maintenance of a subtype in the human population can also occur by antigenic drift (35), which occurs when genetic mutations of the HA and NA genes create virions that escape immune surveillance. Reference, clinical, and military laboratories must evaluate antigenic drift in the common influenza virus strains circulating each year so that these changes can be addressed in vaccine development. Also, avian influenza virus isolates must be monitored on a worldwide basis to detect virulent isolates that have the potential to infect humans and produce a future epidemic or pandemic. These needs have led to the creation of a global surveillance program to monitor outbreaks (13). The goal of surveillance is to gather information on the influenza virus subtypes that are circulating in human and animal populations so that recommendations can be made on the content of vaccines for the next season. This information is important, because genetic changes in certain influenza virus subtypes can occur rapidly. For example, variability of the H3N2 subtype has required 19 changes in the vaccine component over 29 years (from 1972) (13). New antigenic variants that require revisions in vaccine components can arise with a frequency of one every 1 to 2 years; and thus, diagnostic assays that are sensitive, specific, and accurate are required (13). * Corresponding author. Mailing address: CombiMatrix Corporation, 6500 Harbour Heights Parkway, Suite 301, Mukilteo, WA 98275. Phone: (425) 493-2272. Fax: (425) 493-2010. E-mail:
[email protected]. † M.J.L. and D.S. contributed equally to this study. 1209
1210
LODES ET AL.
J. CLIN. MICROBIOL.
tional microarray thermal techniques and also to develop rapid and less expensive target labeling systems and, eventually, replace expensive slide scanning devices (9). Here we describe an improved assay that combines the sensitivity and specificity of enzymatic reactions with the costeffective strategy of using a labeled common oligonucleotide primer that is extended to the site of the match or mismatch. This assay system can be used to identify influenza A virus subtypes and sequence the subtype of interest and requires only a 1.0- to 1.5-h hybridization and enzymatic extensionligation step. MATERIALS AND METHODS Influenza virus subtype reference samples and unknown samples. Influenza viruses were isolated by using Madin-Darby canine kidney (MDCK) cells supplemented with 1 g/ml L-(tosylamido-2-phenyl) ethyl chloromethyl ketonetreated trypsin. Briefly, the samples were added to monolayers of MDCK cells and incubated for 1 h at 37°C to allow virus adsorption to the cells. The inoculum was decanted, Eagle’s minimum essential medium supplemented with 0.2% bovine serum albumin was added, and the monolayers were incubated for 3 to 5 days at 37°C. After cytopathic effects appeared, the presence of influenza virus was confirmed by using hemagglutination of chicken erythrocytes and RT-PCR against the HA gene. A panel of reference influenza A virus RNA samples for 15 HA and 9 NA subtypes was developed by conducting hemagglutination inhibition assays with a panel of reference antisera against HA subtypes 1 through 15 and NA subtypes 1 through 9. Reference virus sequences were confirmed for each HA and NA subtype. Viral RNA was extracted from the supernatants of cultures of infected cells by using the RNeasy mini kit (QIAGEN, Chatsworth, CA), according to the manufacturer’s instructions. Reverse transcription and PCR amplification were carried out under standard conditions by using influenza virus-specific primers. The PCR products were purified with QIAquick PCR purification kits (QIAGEN), and sequencing reactions were performed at the Hartwell Center for Bioinformatics and Biotechnology at St. Jude Children’s Research Hospital (Memphis, TN). Broad scan array probe design. Viral sequence data were obtained from the GenBank database and from the Influenza Sequence Database (23). For HA serotypes, 1,614 animal isolates and 1,937 human isolates were selected; and for NA serotypes, 552 human isolates and 831 animal isolates were selected. Both data sets were treated in the same way by use of a modification of the method of Wang et al. (34), in which probe uniqueness was based on subtype differences. For each sequence, nonoverlapping appended primers were made, tiling the entire sequence. These oligonucleotides were designed to have similar annealing stabilities, as judged by a nearest-neighbor thermodynamic model (2), and were designed to have a melting temperature (Tm) of 50°C. Probes that had significant secondary structures (Tm ⬎ 40°C) were taken out of the set. Finally, probes from only the first 500 bp of sequence were used (bp 50 to 500). After tiling and culling, 23,568 HA probes and 15,191 NA probes were left. Each sequence and probe was grouped and labeled by its serotype, and databases were generated from the compiled HA and NA sequences of the isolates. Probes were selected to be exclusive to a given subtype, as judged by pairwise BLASTN search (4). Figure 1 shows the overall scope of the design. It shows the number of sequences in the initial database, the number of probes designed, and the final number of probes selected for each subtype. A poly(T)10 spacer was added to the 3⬘ ends of all probes to avoid surface inhibition. Probe design files for array synthesis were generated with Layout Designer (CombiMatrix Corp., Mukilteo, WA). Oligonucleotide microarrays were synthesized on semiconductor microchips containing over 12,000 independently addressable electrodes (CustomArray; CombiMatrix Corp.). Sequencing chip probe design. Sequencing chips were designed by using sequences that were representative of each subtype of interest. For this study, we chose sequences that represented subtype H9N2 (GenBank accession numbers AF156378 and AF222654). Probes were tiled by one nucleotide to cover the sequences of interest, and probes were designed to have a Tm of approximately 55°C. Four probes were designed for each nucleotide to be examined and were identical except for the 5⬘ nucleotides, which consisted of either an A, C, T, or G (Fig. 2). This chip design was hybridized with both HA and NA gene targets prepared from influenza A virus isolates A/Quail/Hong Kong/G1/97 (H9N2) and A/Chicken/Hong Kong/NT17/99 (H9N2). Target preparation. A 500- to 600-bp amplicon that contained the 5⬘ end of either the HA or NA gene was amplified from first-strand cDNA. First-strand
FIG. 1. Number of sequences in the initial database, number of probes designed, and final number of probes selected for each subtype. BinStr, subtype identity for each probe after BLASTN searches.
cDNA was produced from influenza virus RNA (Table 1) with SuperScript II reverse transcriptase (Invitrogen, Carlsbad, CA) and a tagged universal primer, TAATACGACTCACTATAGGAGCAAAAGCAGG (the tag sequence is underlined, and the universal influenza virus sequence is in boldface). Amplifications were accomplished with a universal forward tag primer GCATCCTAATACGA CTCACTATAGG and a specific reverse primer or, in the case of unknown samples, a pool of 100 M primers and the first-strand cDNA from reverse transcription. Specific reverse primers for reference isolate amplification were selected to produce an amplicon of approximately 500 to 600 bp. Primer sequences for HA (23 primers) and NA (15 primers) pools were designed by following the probe design procedure presented above. However, sequences in the 5⬘ 500- to 600-bp region of the HA and NA genes that were common to multiple subtypes (H1, H2, H3, H4, H5, H7, and H9 and N1, N2, N3, N7, and N8) were chosen. The reaction conditions consisted of a 5-min denaturation at 94°C, followed by 40 cycles of a 30-s 94°C denaturation step, a 30-s 55°C annealing step, and a 30-s 72°C extension; and finally, a 10-min extension at 72°C was performed. The resulting PCR product was cleaned with a QIAGEN QIAquick PCR purification kit, and a second, one-way amplification resulted in a singlestranded target. One-way amplifications were accomplished with the respective specific reverse primer or primer pool only and 2 to 5 l of cleaned amplification product from amplification 1. The reaction conditions were similar to those described above, with 50 cycles of amplification. The resulting tagged, singlestranded target was purified with a QIAGEN QIAquick PCR purification kit. For standard hybridizations, a biotinylated, single-stranded target was produced as described above for one-way PCR; however, biotin-14-dCTP (Invitrogen) was incorporated into the product during amplification. Hybridizations and enzymatic reactions. The oligonucleotide probes synthesized on the chip were 5⬘ phosphorylated with T4 polynucleotide kinase (New England Biolabs, Beverly, MA) for 30 min at 37°C. The single-stranded target was heated to 95°C for 10 min and placed on ice, and T4 ligase buffer was added to a 1⫻ concentration. A 5⬘-labeled T7 oligonucleotide (Cy3 or Cy5; Integrated DNA Technologies, Inc., Coralville, IA) was added to a concentration of 1 M to provide a signal for detection and a primer for extension. This solution was added to the chip array hybridization chamber, and the chip was incubated at 45°C for 1 h. After the array was washed with 2⫻ phosphate-buffered saline (PBS)–0.5% Tween 20 (PBST), 2⫻ PBS, and 1⫻ Escherichia coli ligase buffer, a mixture consisting of 1⫻ E. coli ligase buffer, 0.2 mM deoxynucleoside triphosphates, and 20 units each of AmpliTaq DNA polymerase, Stoffel fragment (Applied Biosystems, Foster City, CA), and E. coli ligase was added to the array
VOL. 44, 2006
INFLUENZA A VIRUS SUBTYPE IDENTIFICATION AND SEQUENCING
1211
TABLE 1. Reference antigens for HA and NA subtypes Serotype and subtype
HA H1 H2 H3 H4 H5 H6 H7 H8 H9 H9 H10 H11 H12 H13 H14 H15 NA N1 N2 N2 N2 N3 N4 N5 FIG. 2. Diagram of the strategy for microarray DNA sequencing. Four target-specific (antisense) probes, one set for each base of desired sequence, are identical except for the 5⬘-terminal residue (the probe terminates in bases A, C, G, and T). After hybridization of the probes, target DNA, and Cy3-labeled primer, a mixture of enzymes, buffer, and deoxynucleoside triphosphates is added to the array. The labeled primer is extended and ligated to the matching probe, and finally, the array is washed with 0.1 N NaOH and scanned for fluorescence. High fluorescence intensity indicates that the labeled primer is covalently bound to the probe that is a perfect match to the target. The sequence can be extracted by joining bases that are associated with an elevated signal.
and the array was incubated at 37°C for 30 min. The array was washed twice for 2 min each time with 0.1 N NaOH at room temperature and viewed with a slide reader (GenePix 4000B; Axon Instruments, Molecular Devices, Sunnyvale, CA). Standard hybridization conditions for the biotinylated target included preblocking for 15 min at 45°C with 6⫻ SSPE (1⫻ SSPE is 0.18 M NaCl, 10 mM NaH2PO4, and 1 mM EDTA [pH 7.7]) containing 0.05% Tween 20 (SSPET), 2.0 mM EDTA, 5⫻ Denhardt’s solution, and 0.05% sodium dodecyl sulfate, followed by hybridization of the biotinylated target in preblock solution for 1 h at 45°C. The arrays were then washed once for 5 min with 6⫻ SSPET at 45°C and then for 30 s each with 3⫻ SSPET, 0.5⫻ SSPET, 2⫻ PBST, and 2⫻ PBS at room temperature. The hybridized array was then blocked with 5⫻ casein-PBS buffer (BioFX Laboratories, Owings Mills, MD) for 15 min at room temperature and labeled for 30 min with Cy5-streptavidin (GE Healthcare, Amersham Biosciences, Piscataway, NJ) diluted 1:1,000 in 5⫻ casein-PBS buffer. The arrays were scanned after they were washed twice with 2⫻ PBST and twice with 2⫻ PBS. Data analysis. The microarrays were scanned for probe and target fluorescence intensity with a GenePix 4000B optical scanner (Molecular Devices). Image intensities were quantified with Microarray Imager (CombiMatrix Corp.) and graphed with Microsoft Excel software. Subtype identification was accomplished by averaging HA and NA subtype intensity values and then plotting the results in Excel software. HA and NA subtype DNA sequence information was generated from sequencing array intensity data with an Excel routine designed to interrogate units of 4 datum points and then associate the most intense signal with the nucleotide represented by that probe. Sequence strings were then used to search the GenBank nonredundant database with the BLASTN program.
N6 N7 N8 N9
Virus
Subtype
GenBank accession no.
A/PR/8/34 A/Mallard/NY/6750/78 A/Duck/Ukraine/1/63 Duck/Czech/56 A/Ruddy Turnstone/DE/244/91 A/Shearwater/Australia/1/72 A/Equine/Prague/1/56 A/Turkey/Ontario/6118/68 A/Chicken/HK/NT16/99 A/Quail/Hong Kong/G1/97 A/Chicken/Germany/N/49 A/Duck/England/56 A/Mallard/Alberta/60/76 A/Gull/MD/704/77 A/Mallard/Gurjev/263/82 A/Duck/Australia/341/83
H1N1 H2N2 H3N8 H4N6 H5N2 H6N5 H7N7 H8N4 H9N2 H9N2 H10N7 H11N6 H12N5 H13N6 H14N5 H15N8
ISDN13422 L11137 V01087 D90302 U05330 D90303 X62552 D90304 AF222608 AF156378 M21647 D90306 D90307 M26090 M35997 L43916
A/NWS-A/NJ/76 A/Chicken/HK/NT16/99 A/Quail/Hong Kong/G1/97 A/Duck/Hong Kong/Y280/97 A/NWS-A/Tern/South Africa A/NWS-A/Turkey/Ontario/ 6118/68 A/NWS-A/Shearwater/ Australia/72 A/NWS-A/Duck/England/56 A/Equine/Prague/1/56 A/Duck/Chabarovsk/1610/72 A/Duck/Memphis/564/74
H1N1 H9N2 H9N2 H9N2 H1N3 H1N4
M27970 AF222654 AF156396 AF156394 NA K01013
H1N5
M24740
H1N6 H7N7 H3N8 H11N9
NA U85989 L06573 #N/A
Statistical analysis of influenza virus subtype. The data used for analysis comprised measurements carried out with 15 HA subtypes represented by 7,354 probes and 9 NA subtypes represented by 4,646 probes. Data were collected in three steps: after hybridization, after an enzymatic method was used to covalently attach the perfectly matched sequence at each probe, and after a stringent wash procedure was performed to remove spurious signals. In total, there were 177,264 observations. Each observation comprised four categorical variables, hybridization label (hyblabel), hybridization type (wash), protein type (code prot), and predicted bound sequence (imlabel), and seven continuous variables, sum of intensity (sumhit), average intensity (avg), standard deviation (standev), predicted number of BLAST hits (prednumhit), probes number of probes not hit by BLAST (predno), energy of hybridization (dG), and correlation of dG with average intensity (correl). The data were normalized and reported as standard scores (Z values). A binary variable (call) was coded as a correct influenza virus association (a value of 1) or an incorrect influenza virus association (a value of 0).
RESULTS Broad-scan hybridization. The broad-scan microarray was designed to identify influenza A virus subtypes based on unique probe sets for HA subtypes 1 through 15 and NA subtypes 1 through 9. Figure 3A shows the pattern obtained from the hybridization of the H2 and N2 amplicons to the array. Figure 3B shows that much of the background hybridization of the target to the probes was eliminated by the enzymatic extension and ligation steps and stringent washing. The hybridization patterns correspond to the probe design on the array (Fig. 3C). When HA and NA probe intensity data were sorted and graphed to show the relationship between signal intensity and correct subtype hybridization, the majority
1212
LODES ET AL.
J. CLIN. MICROBIOL.
FIG. 3. Graphed probe intensity values for an array hybridized with an H2N2 target. When the data were extracted immediately after hybridization (A), abundant cross hybridization was visible. After enzymatic reactions and stringent washing, the background signal was drastically reduced to expose specific hybridizations (B). Regions of the array that were populated with subtype-specific probes are shown (C). Data were sorted by gene (HA or NA), and the greatest intensities were identified (D).
of the high-intensity signal could be attributed to binding to the correct probes (Fig. 3D). For example, data for H2N2 hybridization show that 43 of the first 50 H2 probes and 50 of the first 50 N2 probes hybridized to the correct target. The amplification of unknown human influenza A virus isolates was accomplished with a pool of reverse primers at a concentration of 100 M each. Two pools were used: one pool for the HA gene and one pool for the NA gene. All unknown targets were successfully amplified, and when they were hybridized to the arrays, they were identified as subtype H3N2 (Fig. 4). The hybridization patterns of the three unknown samples indicated that the HA and NA sequences from samples 2 and 3 were very similar. The H3 hybridization pattern and intensity differed in sample 1 and showed a potential sequence relationship to H1 (data not shown). Very distinct hybridization patterns were also seen for hemagglutinins H5 from H5N2 (GenBank accession number U05330) and H5N1 (current bird influenza strain) (data not shown), indicating that the arrays could potentially be used to identify subgroupings of influenza virus subtypes.
Analysis of influenza virus HA and NA subtypes. Visual HA and NA subtype determinations were made in two ways: (i) by marking the array images with the location of subtype-specific probes and (ii) by graphing of the subtype probe intensities. Correct subtype-specific calls could be made by aligning areas of intense signal with the respective subtype-specific probes (Fig. 5). In addition, bar graphs of mean subtype intensity values could be used to predict the correct HA and NA subtypes (Fig. 6 and 7). All 15 HA subtypes generated from reference isolates (Fig. 6) could be correctly identified with the array, as could all 9 NA subtypes (Fig. 7). We also showed in mixing experiments that two different subtypes, present at a 1 to 10 dilution, could be detected on the same array (array 1, 1% H8, 9% H12, 1% N9, and 9% N8; array 2, 9% H8, 1% H12, 9% N9, and 1% N8; data not shown). In several cases, the mean signals for specific subtypes showed inverse relationships with the number of HA- and NA-specific probes. Intense positive signals within subtype-specific groups of probes numbering approximately 1,000 or more were averaged due to the large number of lower signals (Fig. 8). This effect can be
VOL. 44, 2006
INFLUENZA A VIRUS SUBTYPE IDENTIFICATION AND SEQUENCING
FIG. 4. Identification of unknown human influenza A virus isolates. Three influenza A virus isolates of unknown subtype were amplified with a universal forward and pooled reverse primers, followed by one-way amplification with the same pooled reverse primers; and then the isolates were hybridized to the broad-scan influenza virus array. After the array was scanned, the extracted fluorescence intensity data for each subtype probe group were averaged and plotted. The correct subtypes were indicated by the highest average signal. The background signal (BG) was calculated from the average signal of the quality control probes.
reduced or eliminated by clustering similar sequences within a subtype and then taking the mean of each cluster or by reducing the number of probes for each subtype to between 50 and 150 of the most universal subtype sequences. Statistical analysis. Statistical analyses consisted of descriptive statistics and distribution analysis, univariate correlation, analysis of variance, logistic regression, and selection of the best marker set. The histograms of the average intensities of the probes (Z scores) showed wide separation between the background (negative) and signal measurements (positive) (Fig. 9). Inspection of the distributions showed normality in the independent variable correlation. Analysis of variance indicated that the statistically significant independent variables were average intensity, standard deviation, the interaction of dG and average intensity, protein type, and hybridization method. The average intensity was the single most significant independent variable for the identification of the viral subtype and thus was used for analysis. The correlation matrix showed that the most significant pairwise correlations were between the average intensity and the variables sumhits, standard deviation, call, and correl. Influenza subtype sequencing. Viral HA and NA subtype sequencing was accomplished with a simple and rapid enzymatic assay. A mixture including DNA polymerase and ligase
1213
was used to extend labeled common primers on singlestranded target DNA to the 5⬘ end of the hybridized HA and NA sequencing probes and then ligate the extended primer to probes that matched the target sequence. A 0.1 N NaOH wash removed any unligated signal; and after the array was scanned, intensity data were exported to an Excel worksheet. For example, a probe with a 5⬘-terminal match to the target DNA (e.g., G) was ligated to the extended, labeled primer, while probes with terminal mismatches (e.g., T, C, and A) were not ligated and the labeled target was removed upon washing. Sequence information was extracted with a routine designed to associate the correct base with the highest signal from sets of four probes that were tiled by 1 nucleotide to cover the sequence of interest (Fig. 2). Hybridization to the array probes was generally uniform, as probes with matches and mismatches are identical except for the 5⬘ nucleotides. An example of neuraminidase subtype sequencing on an array designed to interrogate 100 nucleotides of both the HA and NA genes is shown in Fig. 10. Here we have sequenced target DNA that was identical to the array probe sequences. Probe sequences for the HA gene were generated from the A/Chicken/Hong Kong/G1/97 (H9N2) HA gene sequence, and probes for the NA gene were generated from the A/Chicken/Hong Kong/NT16/99 (H9N2) NA gene sequence. Figure 10A illustrates the probe signal intensity data for the NA gene, while Fig. 10B shows the results of a search of the GenBank database with the sequence generated from this array. With a 12,000-point array, it is possible to sequence up to 3,000 bases. Here we sequenced approximately 500 to 600 nucleotides, depending upon the length of the target DNA, of both the HA and the NA genes with an accuracy of over 90%. Even when the target DNA was synthesized from isolates that differed in their HA and NA sequences (HA sequence from A/Chicken/Hong Kong/ NT16/99 [H9N2] and NA sequence from A/Chicken/Hong Kong/ G1/97 [H9N2]), the resulting sequence was still over 90% correct. The intensity of the hybridization signal for each probe was affected by the respective secondary structure of the probe, which resulted in a weaker hybridization and a weaker resulting signal. However, the accuracy of base calling was not significantly affected by the secondary structure of the probe. In the sequence from approximately 500 nucleotides of both the HA and the NA genes, approximately 4.3% of the base calls were inaccurate (45 of 1,043 nucleotides). Inaccurate base calls were predominately due to the ligated mismatches A-G (1.4%), A-A (1.0%), T-G (0.6%), G-G (0.3%), and T-T (0.2%). These mismatches are generally more difficult to detect because of their low delta G values. Inaccurate base calls that were due to mismatches with high delta G values were relatively less common, with A-C, C-C, and C-T mismatches accounting for 0.1, 0.1, and 0% of the missed calls, respectively. DISCUSSION While traditional assays for influenza virus detection and typing represent the “gold standard,” they alone cannot meet the future needs for rapid, sensitive, specific, and simple methods. For example, although immunological methods are excellent for determining subtypes, they do not give detailed genetic information or information when antigenic shifts occur. RTPCR techniques depend on specific primers, which may fail when corresponding viral sequences mutate. We are develop-
1214
LODES ET AL.
J. CLIN. MICROBIOL.
FIG. 5. Visual identification of HA and NA subtypes for H2N2 (top) and H15N9 (bottom). Arrays were hybridized with reference subtype target, labeled primer was extended and ligated, and the arrays were finally washed and scanned. The schematic below the arrays shows regions populated with subtype-specific probes.
ing a semiconductor-based oligonucleotide array technology that can be used with fluorescent labels and traditional optical scanning devices or used as a biodetector using electrochemical techniques for analysis (9). This platform is extremely flexible, which allows array designs to be rapidly and easily modified and synthesized, thus permitting oligonucleotides of interest to be tested empirically. In addition, the ability to use electrochemical detection with semiconductor microchips eliminates the need for expensive optical scanning equipment (9). Clearly, the cost per assay and the associated costs of the equipment needed to run and analyze the assay are critical for diagnostic applications such as these. The instrumentation used to detect electrochemical signals generated on semiconductor chips will be helpful for reducing these laboratory expenses. Also, when the target sample that will be hybridized in the array is prepared, an end-point measurement in the PCR can be used as a gateway to limit the choice of samples to be run (i.e., assays are run only if the PCR result is positive). For this study we have developed an influenza A virus array that contains specific probes for each of the 15 HA subtypes and 9 NA subtypes and then hybridized targets generated from all 15 HA and 9 NA subtypes by RT-PCR amplification of influenza A reference virus RNA. The array was developed with nonoverlapping probes with similar annealing stabilities that were generated from the influenza virus sequence database, which consists of over 3,000 HA sequences and over 1,000 NA sequences. Subtype-specific probes were then se-
lected from a pool of over 23,000 HA sequences and 15,000 NA sequences and then compared to the database to ensure that each probe was unique to the respective subtype and would hybridize to the maximum number of variant sequences. Our goal is to have maximum coverage of all known influenza virus strains and provide useful information even when a novel strain is encountered. By shifting the burden of bioinformatics to the beginning of the process rather than relying on sequencing and a subsequent BLAST search and bioinformatic analysis, this system can be used by less sophisticated users or users in the field, where access to complex analysis tools may be limited. All HA and NA subtypes were correctly identified with this assay platform. Weak average intensity profiles for some subtypes were due to dilution of a positive signal by subtype probes that were not hybridized or that weakly hybridized to the subtype target. In these cases the average signals will be reduced by dilution of the signal. By eliminating unnecessary or cross-reacting probes and limiting the probe number to approximately 100 to 200 of the most universal sequences for each subtype, we can correct this artifact of the array. This approach would also reduce the number of probes so that multiple assays can be run on an array that is divided into several sectors. This again could be used to provide cost savings per assay with a minimal loss in capability. A second approach is to subdivide the probes for each subtype into similar clusters and thus concentrate the positive probes, which would
VOL. 44, 2006
INFLUENZA A VIRUS SUBTYPE IDENTIFICATION AND SEQUENCING
1215
FIG. 6. Mean values of probe hybridization intensities for hemagglutinin reference subtypes 1 through 15. For each individual graph, probe signal intensity values are indicated on the left and specific subtype probes are indicated at the bottom. An asterisk indicates that standard hybridizations with biotinylated targets were used instead of the extension-ligation method and that labeling was with streptavidin-Cy5.
increase the average signal for a positive identification. These approaches would also produce positive probe sequences that are tiled across the viral sequence of interest and should result in an approximation or a best-fit sequence for the unknown subtype. In addition, with the subtype sequencing array and protocol presented here, we are able to sequence approximately 500 or more nucleotides of the HA and NA genes after the subtypes have been identified with our broad-scan subtyping array. Our strategy for sequencing probe selection is based upon several criteria, including Tm, length, and location in the HA and NA gene sequences. Published structural analysis and antigenic epitope mapping indicate that the HA receptor-binding structure and surface antigenic epitopes are predominantly located within the 5⬘ 720 bp (12, 17, 18).
With this sequencing assay, approximately 95% or more of the bases were accurately called (998 of 1,043 bases). Miscalls were predominately due to strong secondary structures, which can be predicted and avoided before the assay is carried out, and the ligated mismatches A-G, A-A, T-G, G-G, and T-T. These mismatches are generally more difficult to detect because of their low delta G values. For example, sequence errors resulting from A-G mismatches represented 1.4% of the total errors or 6% of the potential A-G mismatches (15 of 251). Strong secondary structures (hairpins and palindromes) interfere with probe-target hybridization and result in a reduced signal. These sequencing arrays will eventually contain either a consensus subtype sequence or a known subtype sequence that lacks a high degree of secondary structure. Replicate probes
1216
LODES ET AL.
J. CLIN. MICROBIOL.
FIG. 7. Mean values of probe hybridization intensities for neuraminidase reference subtypes 1 through 9. For each individual graph, probe signal intensity values are indicated on the left, and specific subtype probes are indicated at the bottom. An asterisk indicates that standard hybridizations were used and that labeling was with streptavidin-Cy5.
for each base of sequence should reduce artifacts due to difficult mismatches by averaging out the mismatch signal. Because microarray-based sequencing is based on probe-target hybridization, the target sequence cannot diverge significantly from the arrayed sequence. However, under nonstringent hybridization conditions, internal mismatches between the probe and the target sequences do not have as great an impact on hybridization and sequencing. This technique is best suited for the sequencing of similar viruses, such as seasonal quasispecies complexes, or for surveys for mutations in an isolate over time. We have shown that influenza A virus HA subtypes 1 through 15 and NA subtypes 1 through 9 can be rapidly and specifically identified and sequenced by using oligonucleotide microarrays by a protocol that requires less than 1 h for target hybridization. This assay precludes the need for traditional target labeling systems and integrates an enzyme-based procedure that overcomes many of the shortfalls of traditional thermal hybridizations, such as optimal hybridization conditions and difficult mismatch detection (28). However, the influenza virus subtyping array is also compatible with traditional labeling, hybridization, and washing protocols that can be completed within 1.0 to 1.5 h. These methods lack the enzymatic step and have slightly reduced single-base mismatch discrimination. However, use of this method allows the array to be “stripped” and reused multiple times since there is no covalent coupling of the label to the array. In addition, this array can also benefit from the sectoring approach mentioned above to further bring costs to a minimum. This platform is a viable alternative to RT-PCR because of the combination of assay speed; array sectoring, which would allow multiple assays on one array; the potential to strip and reuse the chip up to five
FIG. 8. Comparison of the number of probes per HA (A) and NA (B) subtype with the average signal intensity for each subtype. In many cases, the two values show an inverse relationship due to the averaging of large numbers of probe intensity values.
times (for conventional hybridizations); and the adaptability to inexpensive electrochemical scanning devices. The target sample preparation system used here is similar to that used for standard RT-PCR-based methods, except that it uses a very redundant consensus priming system that maximizes the chance that novel strains of influenza virus will be amplified and thus minimizes false-negative results. The chip contains multiple probes that correspond to key distinguishing elements of each HA or NA subtype. It is laid out in a visual pattern so that it can be read visually for quick identification as well as analyzed with more advanced algorithms. The system can also identify rare versus more commonly seen genetic variants based on the organization of subtype-specific probes (i.e., the probes are arranged in order from more universal to more specific for each subtype). Rapid identification of the HA and NA subtypes followed by sequencing from critical regions of the HA and NA genes, such as surface antigenic epitopes, will significantly decrease the time and cost for the identification of potential lethal virus strains. This study and studies in other laboratories are demonstrating that the detection, identification, and sequencing of viral genomes in samples by using the oligonucleotide microarray technology in combination with electrochemical detection is a viable rapid approach that can complement traditional methods.
FIG. 9. Combined histogram of normalized, average probe intensity values (Z scores). The correct subtype prediction is indicated (positive), and all negative values are bracketed (negative). Data statistics are a mean of ⫺0.01785, a standard deviation of 0.12136, and a sample number of 177,264.
FIG. 10. Example of sequencing of the neuraminidase gene for target DNA from H9N2 isolate HK/NT16/99 with probes designed from the same sequence (A/Chicken/Hong Kong/NT16/99 [H9N2]). Graphed probe signal intensities (A) and the results of a BLASTN search of the GenBank database with the resulting DNA sequence (B) are shown. The sequence was obtained with software designed to interrogate the signal intensities for each set of four probes and to associate a specific nucleotide with the highest signal and then string the sequence together in series. 1217
1218
LODES ET AL.
J. CLIN. MICROBIOL.
ACKNOWLEDGMENTS This work was supported by CombiMatrix Corporation, and the U.S. Air Force Institute for Operational Health. Work in the laboratory of R.J.W. is funded in part by grant AI95357 from the National Institute of Allergy and Infectious Disease and by the American Lebanese Syrian Associated Charities. We thank Philip Miller for help with statistical analysis of data and Luisa Dougan for microarray quality control tests. REFERENCES 1. Adeyefa, C. A., K. Quayle, and J. W. McCauley. 1994. A rapid method for the analysis of influenza virus genes: application to the reassortment of equine influenza virus genes. Virus Res. 32:391–399. 2. Allawi, H. T., and J. SantaLucia, Jr. 1999. Nearest-neighbor thermodynamics and NMR of DNA sequences with internal A.A, C.C, G.G, and T.T mismatches. Biochemistry 38:3468–3477. 3. Allwinn, R., W. Preiser, H. Rabenau, S. Buxboum, M. Sturmer, and H. W. Doerr. 2002. Laboratory diagnosis of influenza—virology or serology? Med. Microbiol. Immunol. (Berlin) 191:157–160. 4. Altschul, S. F., T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402. 5. Amano, Y., and Q. Cheng. 2005. Detection of influenza virus: traditional approaches and development of biosensors. Anal. Bioanal. Chem. 381:156– 184. 6. Baxi, M. K., S. Baxi, A. Clavijo, K. M. Burton, and D. Deregt. Microarraybased detection and typing of foot-and-mouth disease virus. Vet. J., in press. 7. Bodrossy, L., and A. Sessitsch. 2004. Oligonucleotide microarrays in microbial diagnosis. Curr. Opin. Microbiol. 7:245–254. 8. Chen, H., G. Deng, Z. Li, G. Tian, Y. Li, P. Jiao, L. Zhang, Z. Liu, R. G. Webster, and K. Yu. 2004. The evolution of H5N1 influenza viruses in ducks in southern China. Proc. Natl. Acad. Sci. USA 101:10452–10457. 9. Dill, K., D. D. Montgomery, A. L. Ghindilis, K. R. Schwarzkopf, S. R. Ragsdase, and A. V. Oleinikov. 2004. Immunoassays based on electrochemical detection using microelectrode arrays. Biosensors Bioelectronics 20:736– 742. 10. Ellis, J. S., and M. C. Zambon. 2002. Molecular diagnosis of influenza. Rev. Med. Virol. 12:375–389. 11. Fouchier, R. A., V. Munster, A. Wallensten, T. M. Bestebroer, S. Herfst, D. Smith, G. F. Rimmelzwaan, B. Olsen, and A. D. Osterhaus. 2005. Characterization of a novel influenza A virus hemagglutinin subtype (H16) obtained from black-headed gulls. J. Virol. 79:2814–2822. 12. Gulati, U., C.-C. Hwang, L. Venkatramani, S. Gulati, S. J. Stray, J. T. Lee, W. G. Laver, A. Bochkarev, A. Zlotnick, and G. M. Air. 2002. Antibody epitopes on the neuraminidase of a recent H3N2 influenza virus (A/Memphis/31/98). J. Virol. 76:12274–12280. 13. Hay, A., V. Gregory, A. R. Douglas, and Y. P. Lin. 2001. The evolution of human influenza viruses. Phil. Trans. R. Soc. Lond. Ser. B 356:1861–1870. 14. Hoffmann, E., J. Stech, Y. Guan, R. G. Webster, and D. R. Perez. 2001. Universal primer set for the full-length amplification of all influenza A viruses. Arch. Virol. 146:1–15. 15. Ivshina, A. V., G. M. Vodeiko, V. A. Kuznetsov, D. Volokhov, R. Taffs, V. I. Chizhikov, R. A. Levandowski, and K. M. Chumakov. 2004. Mapping of genomic segments of influenza B virus strains by an oligonucleotide microarray method. J. Clin. Microbiol. 42:5793–5801. 16. Kajiyama, T., Y. Miyahara, L. J. Kricka, P. Wilding, D. J. Graves, S. Surrey, and P. Fortina. 2003. Genotyping on a thermal gradient DNA chip. Genome Res. 13:467–475. 17. Kaverin, N. V., I. A. Rudneva, N. A. Ilyushina, N. L. Varich, A. S. Lipatov, Y. A. Smirnov, E. A. Govorkova, A. K. Gitelman, D. K. Lvov, and R. G. Webster. 2002. Structure of antigenic sites on the haemagglutinin molecule of H5 avisn influenza virus and phenotypic variation of escape mutants. J. Gen. Virol. 83:2497–2505.
18. Kaverin, N. V., I. A. Rudneva, N. A. Ilyushina, A. S. Lipatov, S. Krauss, and R. G. Webster. 2004. Structural differences among hemagglutinins of influenza A virus subtypes are reflected in their antigenic architecture: analysis of H9 escape mutants. J. Virol. 78:240–249. 19. Kessler, N., O. Ferraris, K. Palmer, W. Marsh, and A. Steel. 2004. Use of the DNA Flow-Thru Chip, a three-dimensional biochip, for typing and subtyping of influenza viruses. J. Clin. Microbiol. 42:2173–2185. 20. Korimbocus, J., N. Scaramozzino, B. Lacroix, J. M. Crance, D. Garin, and G. Vernet. 2005. DNA probe array for the simultaneous identification of herpesviruses, enteroviruses, and flaviviruses. J. Clin. Microbiol. 43:3779– 3787. 21. Li, J., S. Chen, and D. H. Evans. 2001. Typing and subtyping influenza virus using DNA microarrays and multiplex reverse transcriptase PCR. J. Clin. Microbiol. 39:696–704. 22. Lipatov, A. S., E. A. Govorkova, R. J. Webby, H. Ozaki, M. Peiris, Y. Guan, L. Poon, and R. G. Webster. 2004. Influenza: emergence and control. J. Virol. 78:8951–8959. 23. Macken, C., H. Lu, J. Goodman, and L. Boykin. 2001. The value of a database in surveillance and vaccine selection, p. 103–106. In. A. D. M. E. Osterhaus, N. Cox, and A. W. Hampson (ed.), Options for the control of influenza IV. Elsevier Science, Amsterdam, The Netherlands. 24. Mizuta, K., N. Katsushima, S. Ito, K. Sanjoh, T. Murata, C. Abiko, and S. Murayama. 2003. A rare appearance of influenza A(H1N2) as a reassortant in a community such as Yamagata where A(H1N1) and A(H3N2) co-circulate. Microbiol. Immunol. 47:359–361. 25. Palmer, D. F., M. T. Coleman, W. R. Dowdle, and G. C. Schild. 1975. Advanced laboratory techniques for influenza diagnosis, p. 51–52. Immunology series no. 6. U.S. Department of Health, Education, and Welfare, Washington, D.C. 26. Scholtissek, C., H. Burger, O. Kistner, and K. F. Shortridge. 1985. The nucleoprotein as a possible major factor in determining host specificity of influenza H3N2 viruses. Virology 147:287–294. 27. Schweiger, B., I. Zadow, and R. Heckler. 2002. Antigenic drift and variability of influenza viruses. Med. Microbiol. Immunol. (Berlin) 191:133–138. 28. Sengupta, S., K. Onodera, A. Lai, and U. Melcher. 2003. Molecular detection and identification of influenza viruses by oligonucleotide microarray hybridization. J. Clin. Microbiol. 41:4542–4550. 29. Taubenberger, J. K., and S. P. Layne. 2001. Diagnosis of influenza virus: coming to grips with the molecular era. Mol. Diagn. 6:291–305. 30. Templeton, K. E., S. A. Scheltinga, M. F. C. Beersma, A. C. M. Kroes, and E. C. J. Claas. 2004. Rapid and sensitive method using multiplex real-time PCR for diagnosis of infections by influenza A and influenza B viruses, respiratory syncytial virus, and parainfluenza viruses 1, 2, 3, and 4. J. Clin. Microbiol. 42:1564–1569. 31. Ueda, M., A. Maeda, N. Nakagawa, T. Kase, R. Kubota, H. Takakura, A. Ohshima, and Y. Okuno. 1998. Application of subtype-specific monoclonal antibodies for rapid detection and identification of influenza A and B viruses. J. Clin. Microbiol. 36:340–344. 32. Urakawa, H., S. El Fantroussi, H. Smidt, J. C. Smoot, E. H. Tribou, J. J. Kelly, P. A. Noble, and D. A. Stahl. 2003. Optimization of single-base-pair mismatch discrimination in oligonucleotide microarrays. Appl. Environ. Microbiol. 69:2848–2856. 33. Wang, D., L. Coscoy, M. Zylberberg, P. C. Avila, H. A. Boushey, D. Ganem, and J. L. DeRisi. 2002. Microarray-based detection and genotyping of viral pathogens. Proc. Natl. Acad. Sci. USA 99:15687–15692. 34. Wang, D., A. Urisman, Y.-T. Liu, M. Springer, T. G. Ksiazek, D. D. Erdman, E. R. Mardis, M. Hickenbotham, V. Magrini, J. Eldred, J. P. Latreille, R. K. Wilson, D. Ganem, and J. L. DeRisi. 2003. Viral discovery and sequence recovery using DNA microarrays. PLoS Biol. 1:257–260. 35. Webby, R. J., and R. G. Webster. 2001. Emergence of influenza A viruses. Phil. Trans. R. Soc. Lond. Ser. B 356:1815–1826. 36. Yuen, K. Y., and S. S. Y. Wong. 2005. Human infection by avian influenza A H5N1. Hong Kong Med. J. 11:189–199. 37. Zou, S. 1997. A practical approach to genetic screening for influenza virus variants. J. Clin. Microbiol. 35:2623–2627.