
© 2005 The American Genetic Association

Journal of Heredity 2005:96(1):80–82 doi:10.1093/jhered/esi015 Advance Access publication December 14, 2004

Computer Notes

Expeditor: A Pipeline for Designing Primers Using Human Gene Structure and Livestock Animal EST Information

Z.-L. HU, K. GLENN, A. M. RAMOS, C. J. OTIENO, J. M. REECY, AND M. F. ROTHSCHILD

From the Department of Animal Science, 2255 Kildee Hall, Center for Integrated Animal Genomics, Iowa State University, Ames, IA 50011.

Address correspondence to Zhi-Liang Hu and Max F. Rothschild at the address above, or e-mail: [email protected] or [email protected], respectively.

Downloaded from http://jhered.oxfordjournals.org/ by guest on June 5, 2013

We have developed software, called Expeditor, that combines known gene structure information from human with coding sequence information from farm animal species to streamline primer design in the target farm animal species. The software has several uses, including PCR-based SNP discovery for identification of genes/markers associated with economically important traits in farm animals, comparative mapping analysis, and evolution studies. Using it minimizes tedious manual operations and reduces the chance of errors associated with more conventional approaches.

The near completion of the human genome sequencing project has provided a useful resource for animal genome research due to considerable conservation of genomic organization, including the intron/exon structures, in vertebrates (O’Brien et al. 1997; Zhu et al. 2003). The recent generation of large numbers of expressed sequence tags (ESTs) in livestock species and the construction of gene indices and UniGene for cattle, chickens, pigs, and sheep, along with other species (Quackenbush et al. 2000; NCBI dbEST 2003; NCBI UniGene 2003), have also made comparative genome and candidate gene analysis in animals easier. To facilitate a polymerase chain reaction (PCR)-based approach to amplify genomic segments of genes in animals using human gene structure information, we have developed software, Expeditor, which combines human gene structure information and animal coding sequence information for primer design. This tool is useful for identifying mRNA splice sites in the TIGR Indices or UniGene sequences so that PCR primers can be designed correctly. Compared to molecular biology methods (Iwahana et al. 1994; Siebert et al. 1995; Zhang et al. 2000) and computational approaches (Pertea et al. 2001; Reese et al. 1997; Thanaraj et al. 2000) for determining intron/exon boundaries, our approach has proven to be more efficient and practical to use. This tool enables a rapid PCR-based approach to study functional or positional candidate genes of interest in farm animals, among other utilities such as comparative mapping.

Program Description

This program uses human gene exon sequences with intron/exon structural information for a given gene to form a “genomic sequence” template in which exon sequences may be replaced with available animal equivalents by a satisfactory blast match. When there is no good animal equivalent sequence available for a given exon, users have the option of substituting the exon sequence with a consensus sequence generated from other (multiple) animal species. A consensus tool within Expeditor is provided for this purpose. The existing “Primer3” program is bundled for streamlined primer design on the template sequence with user-defined stringency parameters.

Platform and Prerequisites

The Expeditor tool was developed as a CGI program with Perl 5.8 in a Unix environment. The software calls three external programs: NCBI Blast (version 2.0.14; Altschul et al. 1997), CAP3 (Huang and Madan 1999), and Primer3 (version 0.6; Rozen and Skaletsky 1996, 1997). We built multiple coding sequence databases for the blast search using TIGR Indices and NCBI UniGene for cattle, chickens, pigs, and sheep. All software and sequence databases are installed on the same server.

Input

Expeditor is designed to take the Ensembl (Clamp et al. 2003) ExonView sequences as its standard input. We designed a Web form for input of raw data as well as user-defined parameter values (see our Web site, http://www.genome.iastate.edu/~hu/expeditor). With this input form, users can set each of the eight blast stringency thresholds for exon replacement before each run. In addition to the six standard blast parameters, “minimum matched length” and acceptable “score” values were added to filter for acceptable blast results. Ten major parameters for primer design are also passed on to the Primer3 program. The input for the consensus tool can be multiple similar sequences in fasta format, or multiple blast alignments directly from the NCBI blast output (see examples on our Web site).

Output

On each successful run, users can retrieve results via Web links. These include: (1) Blast results for each exon. The ratio between the length of the matched animal sequence and the length of the human exon is calculated for each exon evaluated. This is useful when users wish to determine how good the blast outputs are for an exon replacement. (2) A virtual genomic sequence template formed by Expeditor, with or without the exon sequences being replaced depending on the availability of similar animal sequences, the blast thresholds, and possible consensus sequences from other species. (3) PCR primers designed off the virtual template sequence. Expeditor also has options to show primer locations in color.

An example of a virtual genomic sequence template for “3-hydroxy-3-methylglutaryl-coenzyme A reductase” (HMG-CoA reductase) produced by Expeditor, with exons 7–9 of the gene from the Ensembl ExonView as the input sequence and the TIGR Pig Indices as the blast database, is shown in Figure 1. The format of the virtual sequence template is designed such that exons with pig sequences are printed in lowercase letters (replacement was successful for the first two exons), and exons with human sequences (replacement was unsuccessful for the last exon) are printed in capital letters. This template sequence, together with each blast result, may help users judge how good a template is. Stretches of “Ns” in the template represent the sizes and locations of intron sequences. For the example shown in Figure 1, the program took only 5 s on a Tru64 computer with 512 MB RAM and a 500 MHz CPU. On average, each run like this one may save about 2–3 h of a researcher’s time.
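The template layout described above (lowercase for exons replaced by animal sequence, capitals for exons left as human sequence, runs of N standing in for introns) can be illustrated with a minimal Python sketch. This is not the Expeditor code, which is a Perl CGI program; the `Exon` class, the `build_template` function, and the toy sequences are hypothetical names and data introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Exon:
    # Hypothetical container; field names are assumptions for this sketch.
    human_seq: str                     # human exon sequence (ExonView-style input)
    animal_seq: Optional[str] = None   # animal replacement, if a blast match was accepted


def build_template(exons: List[Exon], intron_sizes: List[int]) -> str:
    """Join exons with runs of N that stand in for introns."""
    if len(intron_sizes) != len(exons) - 1:
        raise ValueError("need exactly one intron size between each pair of exons")
    parts = []
    for i, exon in enumerate(exons):
        if exon.animal_seq is not None:
            parts.append(exon.animal_seq.lower())   # replaced exon: lowercase
        else:
            parts.append(exon.human_seq.upper())    # human exon kept: capitals
        if i < len(intron_sizes):
            parts.append("N" * intron_sizes[i])     # intron placeholder
    return "".join(parts)


# Toy sequences, not real HMG-CoA reductase data.
exons = [
    Exon("ATGGCT", animal_seq="ATGGCC"),
    Exon("GGATCC", animal_seq="GGATCT"),
    Exon("TTGACA"),  # no acceptable animal match, so the human exon is kept
]
template = build_template(exons, intron_sizes=[5, 3])
print(template)
```

In this sketch the case of each exon records where its sequence came from, so a reader of the template can see at a glance which primer sites rest on animal sequence and which rest only on human sequence.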

Availability and Usage

Expeditor is available online for free use by not-for-profit or academic users (http://www.genome.iastate.edu/~hu/expeditor). Usage instructions are provided on the Web. The code is also available on request.

Discussion

One challenge in efficiently searching for useful functional or positional candidate genes of interest in farm animals is to correctly identify mRNA splice sites in the coding sequences of the target species, so that PCR primers for amplifying genomic segments of interest can be designed from EST or mRNA information alone. Our approach to this problem is more efficient than laboratory or computational approaches. Without such a tool, when the structure of the homologous human gene is known, one normally has to manually align the animal and human sequences to find where in the animal coding sequence an intron may be inserted and how big this insertion may be. The same manual process has to be repeated for every possible intron insertion to create a useful template sequence for primer design. To design primers on a sequence without gene structural information using graphical primer design software like Oligo 6, one needs to determine where a primer may be located and whether it is possible for an amplicon to span an intron. Often this effort involves repeated test/fail/test cycles before a cross-intron fragment can be successfully amplified, and only when its size is within the range of PCR capability. The major advantages of Expeditor are that it relieves researchers of laborious trial and error and reduces errors that might be introduced by hand.


Figure 1. An Expeditor output showing a virtual genomic sequence template. The first two exons, printed in lowercase letters, were replaced with pig coding sequences upon satisfactory blast matches (covering 100% and 86% of the human exon lengths, respectively). The last exon is printed as the original human sequence in capital letters because of an unsatisfactory pig sequence match (10% coverage only). The stretches of “Ns” represent the size and location of intron sequences.
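The coverage figures in the caption are ratios of matched animal sequence length to human exon length. The sketch below shows one way such a ratio could be combined with the “minimum matched length” and “score” filters. The text does not state how Expeditor combines these values, so the function names, the coverage cutoff, and every threshold here are hypothetical. The exon lengths and scores are also made up, chosen only so that the coverages come out at 100%, 86%, and 10%.

```python
def exon_coverage(matched_length: int, exon_length: int) -> float:
    """Fraction of the human exon covered by the matched animal sequence."""
    if exon_length <= 0:
        raise ValueError("exon_length must be positive")
    return matched_length / exon_length


def accept_replacement(matched_length: int, exon_length: int, score: float,
                       min_coverage: float = 0.8,       # hypothetical cutoff
                       min_matched_length: int = 50,    # hypothetical cutoff
                       min_score: float = 100.0) -> bool:  # hypothetical cutoff
    """Accept an exon replacement only if all three filters pass."""
    return (exon_coverage(matched_length, exon_length) >= min_coverage
            and matched_length >= min_matched_length
            and score >= min_score)


# (matched length, human exon length, blast score): illustrative values only.
hits = [(100, 100, 180.0), (86, 100, 150.0), (10, 100, 20.0)]
coverages = [round(exon_coverage(m, e), 2) for m, e, _ in hits]
decisions = [accept_replacement(m, e, s) for m, e, s in hits]
print(coverages)
print(decisions)
```

With these assumed cutoffs the first two exons would be replaced and the third would keep its human sequence, which matches the pattern shown in Figure 1.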


Acknowledgments

This work was supported in part by funding from the USDA-CSREES Pig Genome and Database Coordination programs, the Iowa Agriculture and Home Economics Experiment Station, and by Hatch Act and State of Iowa funds. The authors thank Dr. Kwan-Suk Kim, Benny Mote, Renata Hernandes, James Koltes, and Dr. Laura Grapes for serving as the beta testers of the software and for many useful discussions through the course of this work.

References

Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, and Lipman DJ, 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402.

Clamp M, Andrews D, Barker D, Bevan P, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, and others, 2003. Ensembl 2002: accommodating comparative genomics. Nucleic Acids Res 31:38–42.

Fedorov A, Merican AF, and Gilbert W, 2002. Large-scale comparison of intron positions among animal, plant, and fungal genes. Proc Natl Acad Sci USA 99(25):16128–16133.

Huang X and Madan A, 1999. CAP3: a DNA sequence assembly program. Genome Res 9:868–877.

Iwahana H, Tsujisawa T, Katashima R, Yoshimoto K, and Itakura M, 1994. PCR with end trimming and cassette ligation: a rapid method to clone exon-intron boundaries and a 5′-upstream sequence of genomic DNA based on a cDNA sequence. PCR Methods Appl 4(1):19–25.

Muller C, Denis M, Gentzbittel L, and Faraut T, 2004. The Iccare web server: an attempt to merge sequence and mapping information for plant and animal species. Nucleic Acids Res 32:W429–W434.

NCBI dbEST: database of expressed sequence tags. Last modified: July 28, 2003. http://www.ncbi.nlm.nih.gov/dbEST/dbEST_summary.html.

NCBI UniGene: pigs. Last modified: August 3, 2003. http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?ORG=Ssc.

O’Brien SJ, Wienberg J, and Lyons LA, 1997. Comparative genomics: lessons from cats. Trends Genet 13(10):393–399.

Pertea M, Lin X, and Salzberg SL, 2001. GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Res 29(5):1185–1190.

Quackenbush J, Liang F, Holt I, Pertea G, and Upton J, 2000. The TIGR gene indices: reconstruction and representation of expressed gene sequences. Nucleic Acids Res 28(1):141–145.

Reese MG, Eeckman FH, Kulp D, and Haussler D, 1997. Improved splice site detection in Genie. J Comput Biol 4(3):311–323.

Rozen S and Skaletsky HJ, 1996, 1997. Primer3. Code available at http://www-genome.wi.mit.edu/genome_software/other/primer3.html.

Siebert PD, Chenchik A, Kellogg DE, Lukyanov KA, and Lukyanov SA, 1995. An improved PCR method for walking in uncloned genomic DNA. Nucleic Acids Res 23(6):1087–1088.

Thanaraj TA and Robinson AJ, 2000. Prediction of exact boundaries of exons. Brief Bioinform 1(4):343–356.

Zhang Z-G and Gurr SJ, 2000. Walking into the unknown: a “step down” PCR-based technique leading to the direct sequence analysis of flanking genomic DNA. Gene 253(2):145–150.

Zhu L, Swergold GD, and Seldin MF, 2003. Examination of sequence homology between human chromosome 20 and the mouse genome: intense conservation of many genomic elements. Hum Genet 113(1):60–70.

Received November 25, 2003 Accepted July 23, 2004 Corresponding Editor: Leif Andersson


Expeditor is designed for PCR amplification of genomic sequences with primer sites anchored in exons. Primer3 is used to optimize the locations of the primers with the user-set primer design parameters. Representing introns with stretches of Ns effectively prevents Primer3 from placing any primer in an intron, which is appropriate because only the coding sequences share good homology between species. The amplicon may or may not span introns, depending on the intron/exon sizes, but will not span the 5′ or 3′ untranslated regions.

Several factors affect the success rate of the output from Expeditor: (1) availability of the complete coding sequence of the gene from the target species; (2) blast stringency; (3) presence of alternative splicing site(s); and (4) the actual degree of intron/exon structural homology for a given gene between human and animal species. The design of Expeditor was based on the assumption that humans and livestock animals (e.g., pig) share a high degree of homology in both coding sequence and genomic organization surrounding a gene. However, because alternative splicing exists, not all intron locations for a gene may be correctly predicted. In addition, because of the high frequency of tandem repeats within introns, the size of an intron may not be precisely determined. Consequently, the actual amplified DNA fragments may or may not be exactly what was anticipated from the Expeditor prediction. In other words, the template sequence made by Expeditor is only an approximation. Users are therefore strongly advised to carefully evaluate the template produced before carrying out actual PCR experiments. The default blast and primer design parameters were set based on our empirical data. Users are encouraged to adjust them and find appropriate combinations for a given situation.

Compared with other primer design tools using a similar approach (for example, ICCARE; Muller et al. 2004), Expeditor is unique in that it takes inputs directly from known human genes/sequences, takes into account the intron sizes in estimating amplicon sizes, and allows consensus sequences from multiple animal species to be used as part of the template. In our laboratory the use of this tool has been quite successful in searching for SNPs in pigs. Besides its utility in individual gene searches, Expeditor can also be used in comparative mapping and evolution analysis of genes/genomes between species. As evidence for conservation of genomic organization and intron/exon structure continues to accumulate among animals and even plants (Fedorov et al. 2002), this tool may have wider utility in situations where information on gene structure is available in one species and coding sequences are available in another species.
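To illustrate what taking intron sizes into account means for amplicon size, the sketch below estimates a genomic amplicon size from primer positions given in coding-sequence coordinates. It is a hedged illustration, not Expeditor's implementation. The function name, the 0-based half-open coordinate convention, and the example exon and intron lengths are all assumptions. Intron sizes are assumed to come from the human gene structure, so, as cautioned above, the result is only an approximation.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List


def genomic_amplicon_size(exon_lengths: List[int], intron_sizes: List[int],
                          fwd_start: int, rev_end: int) -> int:
    """Estimate amplicon size on genomic DNA.

    fwd_start and rev_end are 0-based coding-sequence coordinates of the
    amplicon (rev_end exclusive). Every intron lying between the exon that
    holds the first base and the exon that holds the last base is added.
    """
    if not exon_lengths or len(intron_sizes) != len(exon_lengths) - 1:
        raise ValueError("need one intron size between each pair of exons")
    exon_ends = list(accumulate(exon_lengths))   # cumulative exon ends in the cDNA
    if not (0 <= fwd_start < rev_end <= exon_ends[-1]):
        raise ValueError("amplicon coordinates fall outside the coding sequence")
    first_exon = bisect_right(exon_ends, fwd_start)     # exon holding the first base
    last_exon = bisect_right(exon_ends, rev_end - 1)    # exon holding the last base
    return (rev_end - fwd_start) + sum(intron_sizes[first_exon:last_exon])


# Illustrative gene structure: three exons, two introns (made-up sizes).
exon_lengths = [100, 80, 120]
intron_sizes = [500, 1200]

within_exon = genomic_amplicon_size(exon_lengths, intron_sizes, 10, 90)
one_intron = genomic_amplicon_size(exon_lengths, intron_sizes, 50, 150)
two_introns = genomic_amplicon_size(exon_lengths, intron_sizes, 50, 250)
print(within_exon, one_intron, two_introns)
```

A primer pair that looks 200 bases apart on the cDNA can thus correspond to a genomic product of nearly 2 kb, which is why a cross-intron amplicon has to be checked against the practical size range of PCR.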
