Nucleic Acids Research, 1996, Vol. 24, No. 3, 540–541
© 1996 Oxford University Press
Correction of some genotyping errors in automated fluorescent microsatellite analysis by enzymatic removal of one base overhangs

Frédéric Ginot*, Isabelle Bordelais, Simon Nguyen and Gabor Gyapay

Généthon, 1 Rue de l’Internationale, BP 60, 91002 Evry Cedex, France

Received September 19, 1995; Revised and Accepted December 7, 1995
The conjunction of high resolution genetic maps based on (CA)n microsatellite markers (1) and fluorescent genotyping (2) has led to research programs which require the determination of hundreds of thousands of genotypes. However, the migration profile of a (CA)n microsatellite marker after PCR (polymerase chain reaction) is often complicated by slippage of the Taq polymerase during PCR, and by the ability of this polymerase to add an extra base at the end of the amplified fragments in a template-dependent manner (3). In the traces generated by a fluorescent sequencer, these phenomena generate spurious bands, as shown in Figure 1. These extra bands are usually referred to as ‘shadow bands’ (4). This complicated pattern is relatively easily recognized by an experienced eye, but may mislead the software which automatically determines the sample alleles (we use ABI 373A sequencers with 24 cm gels for the electrophoresis, and the ABI software Genescan and Genotyper for peak sizing and allele calling respectively). Because of this, the operator must visually check all the traces and the allele calls made by the software. This is time-consuming, and prone to human error after a few hundred checks.

We therefore attempted to eliminate most of the allele calling errors by removing the extra bases added by the Taq polymerase. We tried to do this by using Pfu polymerase either during or after PCR, or by using T4 DNA polymerase after PCR. Pfu polymerase is thermostable and possesses a proofreading activity (3). In some cases, however, the extra base added by the Taq polymerase was not completely removed by this enzyme (Fig. 2A and B). This is probably due to competition between the extendase activity of the Taq polymerase and the exonuclease activity of the Pfu polymerase, as both enzymes are active at 72°C.
In contrast, and as suggested by others (5), a T4 polymerase treatment, similar to that conventionally used to make blunt double-stranded linear DNA fragments, completely removed the bases added by the Taq polymerase (without exception so far; Fig. 2C). The simplification of the electrophoresis pattern is sometimes spectacular (Fig. 3). The use of Pfu polymerase alone during PCR avoids the problem of the extra base addition (3), and has been used successfully in fluorescent genotyping (6). However, PCR efficiency is lower than with Taq polymerase, making it necessary to precipitate the samples during multiplex analysis (data not shown).
* To whom correspondence should be addressed
Figure 1. Typical electrophoresis pattern of a (CA)n microsatellite marker with shadow bands. The two alleles of this heterozygous sample are eight bases apart, and are shown with their estimated size. The rightmost peak of each group corresponds to an extra base addition (peaks labelled +1). On the left, we see peaks corresponding to slippage of the Taq polymerase (labelled –2 and –4), and the ‘+1 peaks’ of these ‘slippage peaks’. For this marker, the normal peaks and the +1 peaks have a similar height. This means that, depending on the sample, the Genotyper software will choose one peak or the other for the allele, leading to errors in the genetic data.
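The effect described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes a deliberately naive caller that picks the tallest peak in an allele group; this is not the Genotyper algorithm, and the function name `call_tallest_peak`, the fragment sizes and the peak heights are all invented for illustration.

```python
# Hypothetical illustration only: a naive "tallest peak" allele caller.
# This is NOT the Genotyper algorithm; all sizes and heights are invented.

def call_tallest_peak(peaks):
    """Return the size (in bases) of the tallest peak in one allele group.

    `peaks` maps fragment size in bases -> peak height (fluorescence units).
    """
    return max(peaks, key=peaks.get)

# Two samples carrying the same true allele (234 bases, a made-up value).
# Each group contains slippage peaks (-4 at 230, -2 at 232), the +1 peak of
# the -2 slippage peak (233), the true peak (234) and its +1 peak (235).
# The true peak and its +1 peak have similar heights, so small
# sample-to-sample variation decides which one is tallest.
sample_a = {230: 150, 232: 400, 233: 380, 234: 900, 235: 870}
sample_b = {230: 140, 232: 390, 233: 410, 234: 860, 235: 910}

call_a = call_tallest_peak(sample_a)  # selects the true allele peak
call_b = call_tallest_peak(sample_b)  # selects the +1 peak instead

print(call_a, call_b)
```

In this toy case the two calls differ by one base although the underlying allele is the same, which is the kind of inconsistency that removing the extra base is intended to avoid.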
After having determined the optimal conditions for this treatment on a limited set of markers, we checked these conditions on a full gel comprising 12 different markers on 34 samples (one sample per lane). This represents 408 PCRs and 816 allele determinations. For this test, we compared three methods of genotype determination: automatic genotyping by the Genotyper software with samples not treated by the T4 polymerase; the same genotypes after the manual correction normally done (this was a blind test); and automatic genotyping by the Genotyper software with samples treated by the T4 polymerase after PCR.

The protocol was as follows: hot-start PCR was performed with 40 ng of human genomic DNA, 0.2 µM of each primer and 2 U Taq polymerase, in 50 µl standard Taq buffer. Amplification was carried out by denaturing for 10 min at 94°C, and cycling 31 times with 40 s at 94°C, 30 s at 55°C, and 15 s at 72°C. The 12 markers from a given DNA sample were pooled in such a way that the signal intensity of each marker was approximately the same. Then 0.4 U T4 polymerase was added to 10 µl of this pool, and the mixture was incubated at 37°C for 30 min. Two µl were then mixed with the loading buffer, denatured and loaded onto the sequencer.

Out of the 12 markers used, four always generated the same allele sizes, with or without T4 polymerase treatment. This means that, for such markers, the Taq polymerase does not add an extra base. One marker systematically showed allele sizes decreased by one base after T4 treatment, which means that, for this marker, the Taq polymerase always adds an extra base, generating a +1 base peak significantly greater than the peak corresponding to the normal allele size. In this case, the genetic data inferred from the allele calling are good since the allele size is uniformly overestimated. However, these results cannot be compared to others obtained using a different technique. For the seven other markers, either there was no change or there was a one base reduction in the allele sizes, depending on the DNA samples. Most often, for such a marker, the great majority of the DNA samples shared a common pattern: either no change or a one base reduction after T4 treatment, with DNA samples from only a few individuals (typically 1–4 out of 34) displaying a pattern different from that of this majority. These latter samples therefore represent errors in the genetic data generated by the Genotyper software, so we looked carefully at the traces to determine whether it was the samples which had been treated by the T4 polymerase or the untreated samples which had been incorrectly reported.

Out of the 408 samples analysed in this way, there were a total of 22 discrepancies between the genotypes attributed automatically by the Genotyper software on the T4 polymerase-treated and untreated samples. Among these 22 discrepancies, 21 were in favour of T4 treatment and one in favour of no T4 treatment. This exception corresponded to a very faint signal (50–60 fluorescence units on the sequencer), which was missed in the T4 polymerase-treated sample. By analysing the manual corrections made on the samples which were not treated by the T4 polymerase, we observed that 17 erroneous genotypes out of the 21 were corrected properly, one was simply removed, one was left unchanged, and the two others were erroneously corrected. In addition, the faint signal mentioned above (which was correctly assigned) was also erroneously corrected, as was another sample. In total, five errors remained after manual correction.

In conclusion, we believe that the error rate in allele calling by the Genotyper software is ∼5% with samples amplified by the Taq polymerase and loaded into the sequencer without further treatment, and is lowered to ∼1% after manual correction. T4 treatment reduced this error rate to only 0.25% in our laboratory. It should be noted that double- and single-stranded DNA are both substrates for the 3′→5′ exonuclease activity of the T4 DNA polymerase. This implies that it will probably be necessary to adjust the T4 polymerase concentration according to the primer concentration used.

Figure 2. Removal of the extra base by a Pfu or T4 polymerase treatment. (A) No treatment. (B) Pfu treatment (0.16 U in 10 µl of a pool of 12 markers; 72°C for 5 min). (C) T4 treatment (0.5 U in 10 µl of a pool of 12 markers; 37°C for 30 min). For this particular example, most of the amplified fragments have a +1 base addition, leading to a clear electrophoresis pattern (A). The Pfu polymerase was unable to remove this base completely, leading to a more complicated electrophoresis pattern (B). If more Pfu polymerase is added, the peak heights decrease rapidly without significant improvement in the pattern. In contrast, T4 polymerase completely eliminated the added bases, generating an electrophoresis pattern even clearer than that of the untreated samples (C).

Figure 3. Spectacular simplification of the electrophoresis pattern by T4 polymerase treatment. Without any treatment, the shadow bands generate a complicated electrophoresis pattern. This misled the Genotyper software, which found the sample homozygous for the 242 b allele (upper panel). After treatment, the pattern is much clearer, and the software assigned the two alleles correctly. Peaks on the left are slippage peaks spaced at two-base intervals. The software readily recognizes these peaks in such a situation.

REFERENCES
1 Reed, P.W., Davies, J.L., Copeman, J.B., Bennet, S.T., Palmer, S.M., Pritchard, L.E., Gough, S.C.L., Kawaguchi, Y., Cordell, H.J., Balfour, K.M., Jenkins, S.C., Powell, E.E., Vignal, A. and Todd, J.A. (1994) Nature Genet., 7, 390–395.
2 Gyapay, G., Morissette, J., Vignal, A., Dib, C., Fizames, C., Millasseau, P., Marc, S., Bernardi, G., Lathrop, M. and Weissenbach, J. (1994) Nature Genet., 7, 246–248.
3 Hu, G. (1993) DNA Cell Biol., 12, 763–770.
4 Litt, M., Hauge, X. and Sharma, V. (1993) BioTechniques, 15, 280–284.
5 Kimpton, C.P., Gill, P., Walton, A., Urquhart, A., Millican, E.S. and Adams, M. (1993) PCR Methods Applic., 3, 13–22.
6 Pritchard, L.E., Kawaguchi, Y., Reed, P.W., Copeman, J.B., Davies, J.L., Barnett, A.H., Bain, S.C. and Todd, J.A. (1995) Hum. Mol. Genet., 4, 197–202.
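Note on the error rates: the figures quoted in the conclusion can be reproduced from the counts reported in the text. The sketch below assumes that each rate is expressed per genotype (408 marker/sample combinations) rather than per allele determination (816), since that denominator matches the quoted values; the variable and function names are illustrative.

```python
# Counts are taken from the text. The per-genotype denominator is an
# assumption, chosen because it reproduces the quoted error rates.
markers, samples = 12, 34
genotypes = markers * samples   # 408 PCRs, one genotype each
alleles = 2 * genotypes         # 816 allele determinations

errors_untreated = 21  # discrepancies resolved in favour of T4 treatment
errors_t4 = 1          # faint signal missed in the T4-treated sample

# Manual correction of the 21 untreated errors: 17 corrected properly,
# 1 removed, 1 left unchanged, 2 erroneously corrected. In addition,
# 2 correct calls were erroneously "corrected" (the faint-signal sample
# and one other sample).
errors_manual = 1 + 2 + 2

def rate_percent(errors, total=genotypes):
    """Error rate as a percentage of `total` genotypes."""
    return 100.0 * errors / total

print(f"untreated, automatic: {rate_percent(errors_untreated):.2f}%")
print(f"untreated, manual:    {rate_percent(errors_manual):.2f}%")
print(f"T4-treated, automatic: {rate_percent(errors_t4):.2f}%")
```

Under this assumption the three rates come out at about 5.1%, 1.2% and 0.25%, in agreement with the ∼5%, ∼1% and 0.25% stated above.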