Springer 2005
Plant Molecular Biology (2005) 57:115–127 DOI 10.1007/s11103-004-6636-z
A novel class of Helitron-related transposable elements in maize contain portions of multiple pseudogenes Smriti Gupta1, Andrea Gallavotti2, Gabrielle A. Stryker1, Robert J. Schmidt2 and Shailesh K. Lal1,* 1
Department of Biological Sciences, Oakland University, Rochester, MI 48309-4401, USA; 2 Department of Biology, University of California, San Diego, La Jolla 92093-0116, USA (*author for correspondence: e-mail:
[email protected])
Received 2 July 2004; accepted in revised form 27 November 2004
Key words: genome evolution, Helitrons, transposable elements
Abstract We recently described a maize mutant caused by an insertion of a Helitron type transposable element (Lal, S.K., Giroux, M.J., Brendel, V., Vallejos, E. and Hannah, L.C., 2003, Plant Cell, 15: 381–391). Here we describe another Helitron insertion in the barren stalk1 gene of maize. The termini of a 6525 bp insertion in the proximal promoter region of the mutant reference allele of the maize barren stalk1 gene (ba1-ref) share striking similarity with those of the Helitron insertion we reported in the Shrunken-2 gene. This insertion carries pseudogenes that differ from the pseudogenes discovered in the mutant Shrunken-2 insertion. Using the common terminal ends of the mutant insertions as a query, we discovered other Helitron insertions in maize BAC clones. Based on the comparison of the insertion site and PCR amplified genomic sequences, these elements inserted between AT dinucleotides. These putative non-autonomous Helitron insertions completely lacked sequences similar to the RPA (replication protein A) and DNA helicases reported in other species. A blastn analysis indicated that both the 5′ and 3′ termini of Helitrons are repeated in the maize genome. These data provide strong evidence that Helitron type transposable elements are active and may have played an essential role in the evolution and expansion of the maize genome.
Introduction Eukaryotic genomes consist largely of repetitive sequences that are derivatives of transposable elements. In humans, these elements account for up to 40% of the total genome (Lander et al., 2002). Transposable elements frequently cause mutations when they insert in genes; these mutations are sometimes unstable and the excision of the element can restore gene function (Feschotte et al., 2002). Until recently all known eukaryotic transposable elements, despite their diversity, could be divided into two groups based on their mode of transposition. Class 1 elements transpose via RNA intermediates catalyzed by reverse transcriptase and other proteins encoded by the
element. Class 2 elements transpose via DNA intermediates catalyzed by element-encoded transposase (Engels, 1983; Fedoroff, 1989; Wessler, 1995). Furthermore, these transposable elements have characteristic hallmarks; their termini usually bear direct or inverted repeats and their insertion causes a characteristic and family specific 2–10 bp duplication of the target host site sequence (Doring and Starlinger, 1986; Nevers et al., 1986; Jin and Bennetzen, 1989; Singer et al., 1993; Kunze et al., 1997). There are numerous families of transposable elements that share sequence similarity and the length of their host site duplication. Within each family, transposable elements are further classified as autonomous and non-autonomous. Autonomous elements are active members that encode all
the proteins required for their transposition. Non-autonomous elements are the defective versions of the autonomous elements. They typically bear deletions, insertions or rearrangements of internal sequences and rely on the proteins encoded by the autonomous elements for transposition. Virtually all non-autonomous elements within a family share sequence similarity in both termini with their autonomous counterpart. This ensures their transposition through the interaction of the termini and the transposase encoded by the autonomous element (Fedoroff, 1989; Feschotte et al., 2002). Most transposable elements described in eukaryotes are non-autonomous (Bennetzen, 2000). Recently, a novel family of transposable elements was proposed that apparently transposes by replication and strand replacement. Called Helitrons, these elements were initially discovered by computer-based analysis of Arabidopsis, Caenorhabditis elegans and rice (commentary by Feschotte and Wessler, 2001; Kapitonov and Jurka, 2001). More recently, computer-based database searches discovered Helitron-like elements in the genomes of vertebrates such as fish (Danio rerio and Sphoeroides nephelus), as well as in Anopheles, Drosophila and the white rot fungus Phanerochaete chrysosporium (Kapitonov and Jurka, 2003; Poulter et al., 2003). An autonomous Helitron is proposed to encode proteins similar to DNA helicase and nuclease/ligase. Similar proteins are associated with the bacterial transposons IS91, IS801 and IS1294, which transpose via rolling circle replication (RCR) (Khan, 2000; Tavakoli et al., 2000). Intriguingly, the fish elements, termed Helentrons, are distinct from Helitrons from other species. Fish Helentrons also encode an apurinic/apyrimidinic (AP) endonuclease similar to those of non-LTR retroposons, suggesting that Helentrons may transpose via a mechanism distinct from that of Helitrons (Poulter et al., 2003).
Despite their abundance (~2% of the Arabidopsis and Caenorhabditis genomes), Helitrons remained virtually unknown because their structural features are not easily detected by standard computer-based DNA database searches. For example, unlike most Class 1 and 2 transposable elements, Helitrons lack direct or inverted repeats at their termini and do not duplicate host sequences upon insertion. They also bear a non-conserved palindromic sequence located 10–12 bp upstream of their 3′ terminus. The potential of this palindrome to form a ~20 bp hairpin structure has been postulated to serve as a termination signal for RCR. In addition, most
Helitrons in Arabidopsis have suffered significant rearrangements such as deletions and insertions and represent non-autonomous Helitrons. Despite the proposition that Helitrons are active in modern times, neither direct genetic evidence of their mobility nor an autonomous Helitron has yet been reported in any organism (Kapitonov and Jurka, 2003). We recently described a maize mutant that was caused by the insertion of a Helitron type transposable element (Eckardt, 2003; Lal et al., 2003). This mutant was isolated in the 1970s from a corn breeding program. The recent insertion of this non-autonomous sh2-7527 element strongly suggests that the present day maize genome contains an active Helitron element. In this report, we describe another Helitron-like transposable element in the proximal promoter region of the barren stalk1-reference mutant (ba1-ref), first identified by R. A. Emerson in 1928 (Hofmeyer, 1930). The terminal ends of this insertion bear strong similarity to those of the mutant sh2-7527 insertion. We also discovered other putative Helitrons that bear sequence similarity to the termini of the insertions in ba1-ref and sh2-7527. The maize Helitrons described in this report are laden with portions of truncated pseudogenes that no longer encode full-length protein. We provide evidence that the Helitrons we have identified are abundant and may have played a significant role in reshaping the present day maize genome.
Methods and material
Isolation of mutant ba1 insertion
The plants segregating the ba1-ref mutation and the isolation of the Helitron insertion in the barren stalk1 gene are described elsewhere (Ritter et al., 2002; Gallavotti et al., 2004).
Database searches
The 500 bp from both the 5′ and 3′ ends of the sh2-7527 and ba1 elements, along with the 150 bp flanking sequence of the Sh2 and ba1 genes, were separately subjected to BLAST searches against the GenBank nr and GSS databases (Altschul et al., 1990) (www.ncbi.nlm.nih.gov/blast). Helitron insertions in the BAC clones were manually determined using the terminal ends of the sh2-7527 and ba1 insertions.
The annotation of the maize Helitrons was performed by direct spliced alignment of the available ESTs, cDNA and proteins using the World Wide Web services of GeneSeqer (http://bioinformatics.iastate.edu/cgi-bin/gs.cgi) and SplicePredictor (http://bioinformatics.iastate.edu/cgi-bin/sp.cgi) (Usuka and Brendel, 2000; Usuka et al., 2000).
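The query construction described under "Database searches" can be sketched in a few lines of Python. This is only an illustration: the function name `terminal_queries` and its parameters are hypothetical, and joining each 500 bp terminus to its 150 bp host flank in a single query string is an assumption about how the searches were set up, not a description of the authors' procedure.

```python
def terminal_queries(flank5, insertion, flank3, term_len=500, flank_len=150):
    """Build one query per end of an insertion.

    flank5 / flank3 are the host sequences immediately upstream and
    downstream of the insertion. Each query joins the host flank to the
    adjacent terminus of the element (an assumed layout).
    """
    host5 = flank5[max(0, len(flank5) - flank_len):]        # last bases of the 5' flank
    host3 = flank3[:flank_len]                              # first bases of the 3' flank
    term5 = insertion[:term_len]                            # 5' terminus of the element
    term3 = insertion[max(0, len(insertion) - term_len):]   # 3' terminus of the element
    return host5 + term5, term3 + host3


def to_fasta(name, seq, width=60):
    """Format a sequence as a FASTA record for submission to a BLAST service."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"
```

Each returned query can be written out with `to_fasta` and searched on its own, which keeps hits to the 5′ and 3′ termini separate.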
Genomic PCR analysis
Zea mays B73 inbred germplasm was obtained from Dr Curt Hannah (University of Florida). Genomic DNA was isolated from maize B73 inbred leaves using Plant DNAzol reagent (Bethesda Research Laboratories) according to the manufacturer's instructions. Optimization of the PCR conditions in several cases was done using an optimization kit (Opti-Prime PCR, Stratagene, La Jolla, CA). To amplify the genomic sequences of the disease resistance Rp1 locus without the Helitron insertion, we designed primers 5Up2 (5′-GCATTCATTGCTTGCCTTTA-3′) and 3Lo1 (5′-TCGGCTGTGTTGGGATCTAT-3′). These primers are complementary to the BAC sequence flanking the putative insertion site of the Helitron and span positions 38,623–38,643 and 74,106–74,126 of the BAC clone (gi: 19908846), respectively. To detect the genomic sequences with the Helitron insertion, we performed two separate reactions to amplify the 5′ and 3′ ends of the Helitron and their flanking sequences. To amplify the 5′ end, we used primers 5Up1 (5′-TCAAATGGGTCGGTCATTTC-3′) and 5Lo1 (5′-CGCACAAATCGTAGGACACA-3′). These span positions 38,472–38,492 and 38,956–38,976 of BAC clone gi: 19908846 and are complementary to the 5′ flanking sequence (outside the Helitron) and the 5′ end sequence (inside the Helitron), respectively. Similarly, the 3′ end of the Helitron was amplified using primers 3Up1 (5′-TGGCTCGTGAATGTTGTCAT-3′) and 3Lo1 (5′-TCGGCTGTGTTGGGATCTAT-3′). These are complementary to the Helitron 3′ end sequence and the flanking BAC sequence and span positions 73,673–73,692 and 74,106–74,126, respectively (Figure 4). Similarly, amplification of the sequence of the maize locus containing the 19 kD zein family cluster without the Helitron insertion was achieved using primers gUp1 (5′-GGCTATCGAAGGCTTCAAGG-3′) and gLo1 (5′-GCCGATCCATCCATTGTTTA-3′). These primers flank the putative insertion site of the Helitron (Figure 5). The resultant products were resolved, excised and gel purified using the GENECLEAN II agarose gel purification kit (Bio101, Inc). The purified fragments were cloned using the TOPO TA cloning kit (Invitrogen) following the instructions provided by the manufacturer. DNA sequencing was done by the ABI Prism Dye Terminator sequencing protocol of Applied Biosystems, Foster City, CA.
Southern blot and RT-PCR analysis
Genomic DNA isolated from the maize inbred B73 line was resolved on 0.7% agarose gels after complete digestion with restriction endonucleases and subjected to Southern analysis using procedures previously described (Saghai-Maroof et al., 1984). The 5′ and 3′ terminal probes of mutant sh2-7527 were synthesized by PCR using primers Sh2E11F (5′-ACGGGCTATTGGGAGGATGT-3′) and 7527 5′R (5′-CATGCCTGCTACAGAGAAAG-3′), which are complementary to Sh2 exon 11 and the 5′ insertion sequence, and primers Sh2E12R (5′-GGGTGCAGTGAAGAAAGGTG-3′) and 7527 3′F (5′-CTGCAGTCACAGAAGGAAAC-3′), which are complementary to Sh2 exon 12 and the 3′ insertion sequence, respectively. Total RNA from maize kernels 20 to 22 days post pollination was extracted as described previously (McCarty, 1986; Giroux and Hannah, 1994). RNA from maize roots and shoots was extracted using TRIZOL reagent (Invitrogen) according to the protocols provided by the manufacturer. The RT-PCR analysis was performed using primers RTUp1 (5′-GTGTTCAACGCATGATCTCG-3′) and RTLo1 (5′-GATATCGTTTCGGACCGTTTG-3′). These are complementary to the BAC sequences flanking the 5′ and 3′ regions of the putative Helitron insertion site, respectively (Figure 5). The first strand synthesis was performed using the commercially available SuperScript RT-PCR kit (Invitrogen). The amplified products were resolved on a gel, cloned and sequenced.
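Primer sequences such as those listed in the Methods can be sanity-checked computationally before use. The following Python sketch is illustrative only and is not part of the published protocol; the helper names `reverse_complement`, `gc_fraction` and `amplicon_length` are hypothetical, and the in-silico amplicon search assumes exact primer matches on a template supplied by the user.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def amplicon_length(template, forward, reverse):
    """Length of the product expected from exact primer matches.

    The forward primer is searched on the given strand; the reverse
    primer is searched as its reverse complement downstream of it.
    Returns None when either primer site is not found.
    """
    start = template.find(forward)
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return end + len(site) - start


# Two primers quoted in the text (5Up2 and 3Lo1).
PRIMERS = {
    "5Up2": "GCATTCATTGCTTGCCTTTA",
    "3Lo1": "TCGGCTGTGTTGGGATCTAT",
}

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(seq), round(gc_fraction(seq), 2), reverse_complement(seq))
```

Running `amplicon_length` against a template with and without an insertion shows why a flanking primer pair yields a short product only from the empty site.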
Results
The maize barren stalk 1 mutation is caused by an insertion of a Helitron type transposable element
The maize barren stalk-1, a recessive mutation defective in the development of axillary meristem, was isolated in 1928 (Hofmeyer, 1930; Ritter et al., 2002). The molecular and genetic analysis of the ba-1 reference allele (ba1-ref) revealed the presence of a 6,525 bp insertion proximal to the promoter region of the ba-1 gene (Gallavotti et al., in preparation). A database search with the mutant sequence did not detect any sequence similarity to known transposable elements. However, manual inspection of the sequence revealed all the hallmarks of a recently described novel family of putative rolling circle transposable elements termed Helitrons (Kapitonov and Jurka, 2001). Like Helitrons, the insertion did not bear terminal repeats, was precisely inserted between the nucleotides 5′-A and T-3′, and did not cause duplication of the insertion site sequence. Furthermore, in agreement with the conserved terminal ends of Helitrons, the insertion starts with 5′-TC and ends with CTAG-3′. We conclude that the foreign insertion in ba1-ref is a Helitron type transposable element.
The insertion in ba1-ref contains pseudogenes and both terminal ends bear striking similarity to the Helitron termini in mutant sh2-7527
Intriguingly, both termini of the mutant ba-1 insertion share strong sequence similarity with the termini of the Helitron insertion recently reported in the maize mutant sh2-7527 (Lal et al., 2003). The pair-wise alignment of the terminal ends of both mutant insertions is displayed in Figure 1. As shown, 42 of the 46 nucleotides of the 5′ termini and 24 of the 29 nucleotides of the 3′ termini of both insertions are identical. However, similarity was limited to the termini and the remaining sequences were completely divergent between the two mutant insertions. We previously
reported the presence of portions of at least three genes presumably transduced by the mutant sh2 insertion (Lal et al., 2003). To investigate the presence of sequences that are expressed in the maize genome, we performed the spliced alignment of the entire mutant ba1-ref insertion against all 361,775 available maize ESTs at the PlantGDB using GeneSeqer (Usuka et al., 2000). GeneSeqer performs spliced alignment between the cDNAs and their cognate homologous or heterologous genomic sequences and predicts intron/exon junctions based on the strength of the splice site scores (Brendel and Zhu, 2002). The details of the alignments are available at http://www2.oakland.edu/biology/ba1.html and shown schematically in Figure 2. The alignment produced 62 matching ESTs and nine predicted gene structures (PGS) based on the alignment of the ESTs. After manual inspection of the EST alignments of these 9 PGS, only 4 non-overlapping PGS (1–4) were deemed reliable based on the high quality of the EST alignment in the predicted region; these are marked in Figure 2. The remaining 5 PGS were predicted by low quality EST alignments or by ESTs that potentially utilized alternative splice sites during pre-mRNA splicing. These results suggest the existence of portions of four different expressed genes in the mutant ba1 insertion. However, the lack of significant similarity of matching ESTs to known proteins made it difficult to provide more evidence in support of this prediction. A single gene may be assigned two or more PGSs during contig assembly if its representative ESTs do not bear significant overlaps. Of the 4 PGSs, small stretches of sequence of 3 PGSs show weak similarity to three different proteins. For example, position 1731–1972 (PGS1) of the insertion shows weak similarity to an Arabidopsis protein (gi: 30678845) annotated as MATH domain
Figure 1. Terminal ends of the Helitrons from mutants sh2-7527 and ba1-ref share strong sequence similarity. Pair-wise sequence alignment of the 5′ and 3′ ends of the Helitron insertion in maize mutants sh2-7527 (upper sequence) and ba1-ref (lower sequence). The sizes of the inserts are indicated.
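The identity counts reported for the terminal alignment (42 of 46 nucleotides at the 5′ end and 24 of 29 at the 3′ end) correspond to simple position-by-position comparisons. A minimal Python sketch of that calculation follows; the function names are hypothetical, and the code assumes two already-aligned strings of equal length in which gaps, if any, are written as "-".

```python
def count_identities(aligned_a, aligned_b):
    """Count identical, non-gap positions between two aligned sequences.

    Returns (matches, alignment_length). Both inputs must already be
    aligned and therefore have the same length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = zip(aligned_a.upper(), aligned_b.upper())
    matches = sum(1 for x, y in pairs if x == y and x != "-")
    return matches, len(aligned_a)


def percent_identity(matches, length):
    """Express an identity count as a percentage of the aligned length."""
    return 100.0 * matches / length


if __name__ == "__main__":
    # Counts quoted in the text for the 5' and 3' termini.
    print(round(percent_identity(42, 46), 1))
    print(round(percent_identity(24, 29), 1))
```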
Figure 2. Scalable view of the direct spliced alignment of the maize ESTs with the complete mutant ba1 Helitron insertion (GenBank Accession: AY645947). In the alignment, boxes depict the exons and lines connecting the boxes represent introns. Maize ESTs, predicted gene structures (PGS) and open reading frames of more than 64 amino acid residues are displayed by red, green and orange bars, respectively. The four non-overlapping PGS1-4 are indicated.
containing protein (data not presented). Similarly, regions from position 5007 to 5984 and position 1087 to 1347 of the insertion, spanning the PGS2 and PGS3 regions, showed weak similarity to an Arabidopsis expressed protein Atg26670 and to an unknown protein from rice (gi: 27573352). These observations provide further evidence that these PGSs may represent sequences of three different genes. All predicted PGSs of the mutant ba1 insertion lacked significant ORFs due to the presence of several in-frame stop codons. Several small ORFs of 60 or more amino acid residues (Figure 2) failed to produce any sequence similarity to known proteins. These observations suggest that these TUGs represent pseudogenes that no longer encode full-length protein. Furthermore, BLASTN analysis of the ba1 insertion against the 400,791 maize GSS (Genome Survey Sequence) contigs assembled at the Plant Genome Database (http://www.plantgdb.org/) produced 196 hits. Several of these GSS clones showed long stretches of strong similarity and may represent other members of the ba1 insertion in different regions of the genome. For example, ZmGSStuc04-27-04.23518.1
showed 97% similarity from position 3492 to 6007 of the ba1 insertion (data not presented). This region spans predicted gene structures PGS3 and PGS4 of the ba1 insertion. Similarly, ZmGSStuc04-27-04.30383.1 bears 96% sequence similarity from position 1294 to 5053, spanning predicted gene structures PGS1 and PGS2 of the mutant insertion.
The termini of the mutant sh2-7527 and ba1 insertions are repetitive sequences in the maize genome
The comprehensive blastn analysis of the 500 bp sequence from the 5′ and 3′ ends of the mutant sh2-7527 and ba1 insertions, along with their flanking 150 bp Sh2 and ba1 sequence, against the NCBI genome survey sequence (dbGSS) and non-redundant databases produced multiple (e-value = 10) hits from each end of the Helitron element. All identified homologous sequences were derived from maize DNA. None of the identified sequences exhibited homology to Sh2 or ba1. The length of the sequence similarity of these hits typically ranged from 35 to 50 bp from the
terminal ends, but in a few cases the region of similarity was significantly longer. For example, the 5′ and 3′ end similarity with the mutant sh2-7527 insertion spanned the entire sequence of GSS clones 22222406 and 19786376, respectively (data not presented). These GSS clones represent insertion sites of the engineered RescueMu transposon in the maize genome, available at the maize genome database (Lawrence et al., 2004). Multiple sequence alignments of representative non-redundant hits of the shorter sequence similarities obtained from blastn analysis of the 5′ and 3′ ends of the Helitron insertions are shown in Figure 3A and B. None of these sequences exhibit sequence similarity to Sh2 or ba1. Further, the nucleotide A immediately flanking the 5′ alignment and the T immediately flanking the 3′ alignment, present in all the hits, indicate that they represent the ends of Helitron-like transposable elements inserted in different regions of the maize genome. In order to further analyze the repetitive nature of the Helitron ends, we performed genomic Southern blot analysis of maize genomic DNA using the terminal ends of the mutant sh2-7527 insertion as probes (Figure 3C). The high-stringency hybridization pattern of multiple bands provides further evidence that the terminal ends of the maize Helitrons are repetitive in the maize genome. Apparently, these elements share sequence similarity to the mutant sh2-7527 insertion only at the termini.
Putative helicase bearing Helitrons may be abundant in the maize genome
Intriguingly, despite the total absence of sequence similar to DNA helicase and RPA like proteins
Figure 3. The 5′ and 3′ termini of Helitron are highly repetitive in the maize genome. (A) and (B) show the sequence alignments of representative non-redundant hits obtained from the blastn analysis using the 5′ and 3′ termini of the sh2-7527 Helitron as a query against the GSS database, respectively. The invariant nucleotides of the Helitron insertion are in yellow and the putative insertion site of the host dinucleotide, AT, is in red font. The GenBank accession number of the sequence source is shown on the left of the panels. (C) Southern blot analysis of 5′ and 3′ terminal ends of mutant sh2-7527 insertion in the maize genome. Genomic DNA isolated from maize inbred B73 was digested to completion with the restriction enzymes indicated above the lanes and analyzed as described in "Methods and material." Genomic hybridization patterns produced by the 5′ and 3′ terminal probes of the sh2-7527 insertion are displayed in the left and right panels, respectively.
within the ba1 and sh2-7527 insertions, tblastn analysis using rice Helitron helicase (HELITRON1_OS; OSHEL1p; Kapitonov and Jurka, 2001) against the available maize GSS contigs produced more than 500 hits. The high degree of sequence similarity that spanned the entire length of the majority of the GSS contigs suggests that these hits may represent orthologous helicase genes of Helitrons that are distributed throughout the maize genome (data not presented). To investigate this possibility, we searched for similar DNA helicase sequences in the maize BAC clones. We discovered two instances in which sequences very similar to rice Helitron replication protein A (HELITRON1_OS; OSRPA1Hp) were in close proximity to a putative DNA helicase, a characteristic of Helitrons described from other species (Kapitonov and Jurka, 2001; Poulter et al., 2003). For example, positions 160925–165104 and positions 159415–160670 of maize BAC clone (gi: 38153991) and positions 143833–148500 and positions 141509–141925 of BAC clone (gi: 42517210) bear significant sequence similarity to rice Helitron helicase and replicase proteins, respectively. However, the presence of several in-frame stop codons suggests that these two elements represent non-autonomous Helitrons. The absence of sequence similar to the terminal ends of the ba1-ref and sh2-7527 insertions flanking these helicases/replicases suggests that these putatively non-autonomous elements may not be related to the ba1 and sh2 insertions.
Two BAC clones derived from the disease resistance locus, Rp1, in maize contain Helitron insertion
The sequence analysis of the mutant ba1-ref and sh2-7527 Helitron insertions suggested that, although the internal sequences are totally divergent, the conserved terminal ends may play an important role in the transposition of these elements. Using this rationale, we searched for other Helitrons in the available maize BAC sequences with the ba1-ref and sh2-7527 mutant terminal end sequences as a query and discovered two additional Helitron insertions in the maize genome. These were found in BAC clones gi: 19908846 and gi: 19908841. Both showed virtually complete sequence identity to the 5′ and 3′ terminal sequences of the mutant sh2-7527 insertion. These two BAC clones were derived from the same locus in maize, the highly variable and meiotically unstable disease resistance locus, Rp1 (Ramakrishna et al., 2002). Sequences from position 38,715 to 38,733 of BAC clone gi: 19908846 and position 48,371 to 48,389 of BAC clone gi: 19908841 show 100% sequence similarity with the 18 bp 5′ terminus of the mutant sh2-7527 insertion. Similarly, a 28-nucleotide sequence spanning position 73,751–73,779 of gi: 19908846 and position 82,928 to 82,941 of gi: 19908841 shares 25 bp identity with the terminal 3′ sequence of the mutant sh2-7527 insertion. The alignment of the identical sequence found in the two Rp1 clones with the mutant sh2-7527 and ba1-ref termini is depicted in Figure 4. The sequence between the terminal ends of the putative Helitron, which spans position 38,715 to 73,779, is 35,065 bp in length in BAC clone gi: 19908846. It shares >99% sequence similarity with 34,570 bp in BAC clone gi: 19908841, from position 48,371 to 82,941. Except for sequence similarity at the termini, this element in the Rp1 locus exhibits no sequence similarity to the Helitron elements of sh2-7527 or ba1-ref. Like Helitrons, which always transpose between the target dinucleotide AT without duplicating or modifying the target site, the elements discovered in the BAC clones were flanked by A and T at their 5′ and 3′ termini, respectively. Also like Helitrons, the elements in the BAC clones contained two 8 bp palindromic sequences that potentially can form a hairpin approximately 8 bp upstream from the 3′ terminus (Figure 4A).
The putative Helitron insertions are contained within the 43 kb region duplicated in the BAC clones, gi: 19908846 and 19908841
The two BAC clones containing the Helitron insertion were initially selected by hybridization to the maize Rp1-D rust resistance gene (Collins et al., 1999; Ramakrishna et al., 2002). In addition to having two Rp1 homologous genes, these BAC clones contain copies of retroelements and truncated pseudogenes. These two BAC clones share a near-perfect duplication of a ~43 kb region containing an Rp1 gene, six truncated cellular pseudogenes, three retroelements of the Opie family, and a MITE (Ramakrishna et al., 2002). In BAC clone gi: 19908846, this duplication spans positions 37,224–80,460 and, in BAC clone gi: 19908841, positions 46,294–89,630.
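The coordinate arithmetic behind these statements is easy to get wrong by one base, so a small worked example may help. The Python sketch below assumes 1-based, inclusive coordinates (the usual GenBank convention, but an assumption here) and uses hypothetical helper names; the positions are those quoted in the text for BAC clone gi: 19908846.

```python
def span_length(start, end):
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def contains(outer, inner):
    """True when the inner (start, end) interval lies within the outer one."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# Coordinates quoted in the text for BAC clone gi: 19908846.
HELITRON = (38715, 73779)      # putative Helitron, 5' terminus to 3' terminus
DUPLICATION = (37224, 80460)   # ~43 kb duplicated region

if __name__ == "__main__":
    print(span_length(*HELITRON))           # length of the putative element
    print(span_length(*DUPLICATION))        # length of the duplicated region
    print(contains(DUPLICATION, HELITRON))  # is the element inside the duplication?
```

Under the inclusive convention the first interval evaluates to 35,065 bp, which agrees with the element length reported for this clone.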
Figure 4. Helitrons within the Rp1 locus of maize. (A) Helitron and the flanking Rp1 BAC sequences are displayed in red and black letters, respectively. Conserved terminal sequences of the maize Helitrons are underlined. The palindrome sequences that can potentially form a hairpin are depicted in blue letters. (B) The locations of two nearly identical Helitrons within the duplicated regions of the BAC clones, GIs 19908846 and 19908841, are shown by open triangles. The flanking dinucleotide AT of the putative insertion site of the Helitron is highlighted in red. In the pair-wise alignment, the upper, middle and lower sequences represent the termini of the Helitron insertions in mutants sh2-7527 and ba1-ref and of the Helitrons discovered at the Rp1 locus, respectively. Black boxes depict three Opie retroposons and a gray box displays a miniature inverted repeat element (MITE). Open boxes indicate pseudogenes contained between the two ends of a putative Helitron insertion.
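The structural hallmarks used throughout this report to recognize a Helitron (a 5′-TC start, a CTAG-3′ end, insertion between a host A and T, and a short palindrome near the 3′ terminus) can be expressed as a simple screen. The Python sketch below is a hypothetical illustration, not the procedure used by the authors, who inspected the sequences manually; the function names, the 8 bp arm length and the 60 bp search window are assumptions chosen for the example.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def helitron_hallmarks(flank5, element, flank3):
    """Check the terminal hallmarks of a candidate element.

    flank5 and flank3 are the host sequences immediately adjacent to the
    candidate element on its 5' and 3' sides.
    """
    element = element.upper()
    return {
        "starts_TC": element.startswith("TC"),
        "ends_CTAG": element.endswith("CTAG"),
        "flanked_by_A_and_T": flank5.upper().endswith("A")
        and flank3.upper().startswith("T"),
    }


def find_hairpin_arms(element, stem_len=8, window=60):
    """Find two arms near the 3' end that could pair as a hairpin stem.

    Returns the 0-based start positions (first_arm, second_arm) of the
    first arm found whose reverse complement occurs downstream of it
    within the last `window` bases, or None if there is no such pair.
    """
    element = element.upper()
    offset = max(0, len(element) - window)
    region = element[offset:]
    for i in range(len(region) - 2 * stem_len + 1):
        arm = region[i:i + stem_len]
        j = region.find(reverse_complement(arm), i + stem_len)
        if j != -1:
            return offset + i, offset + j
    return None
```

A candidate passing all three terminal checks and returning a hairpin position would still need confirmation, for example by comparison with an empty site.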
Intriguingly, the Helitrons discovered in the two BAC clones are contained within this duplicated region. In BAC clone gi: 19908841, the Helitron insertion starts 831 bp upstream of the Opie-B retroelement and ends 846 bp upstream of the rp1-2 gene. Similarly, the Helitron insertion in BAC clone gi: 19908846 starts 831 bp upstream of the Opie-B element and ends 853 bp upstream of the rp1-4 gene. Contained within the putative Helitron insertion are three copies of the Opie family of LTR retrotransposons, a MITE, and 6 truncated pseudogenes. The positions of these elements and the pseudogenes are shown in Figure 4B. We also performed PCR amplification of the genomic DNA isolated from leaves of maize inbred B73 to search for Rp1 genes lacking the Helitron element. PCR primers complementary to the BAC sequences flanking the 5′ and 3′ ends of the putative element were used. This resulted in a single product of 430 bp in length (Figure 5B). The sequence of the PCR product shared 95% similarity with the sequence flanking the termini of the insertion found in the BAC and precisely lacked the Helitron insertion between the dinucleotide 5′-AT-3′. This is marked by an arrow in Figure 5C along with a pair-wise alignment of the
sequence from the PCR product with the BAC sequence. We conclude that the inbred B73 contains Rp1 genes both lacking and harboring Helitron elements.
The 19-kD zein gene family cluster in cultivar B73 contains a Helitron insertion
In addition to the Helitron in Rp1, our database search identified a putative Helitron insertion in BAC clone gi: 13606087, representing the 19-kD zein family cluster of maize inbred B73. The sequence of the BAC clone from positions 4408 to 4426 and from positions 22,130 to 22,158 showed strong similarity to the 5′ and 3′ terminal ends of the Helitron insertions in ba1-ref and sh2-7527, respectively (Figure 6). Further database searches discovered three maize ESTs, gis: 4730266, 32860284 and 6501061, with 100% similarity to the sequences immediately flanking the putative Helitron termini of the BAC clone (gi: 13606087) but precisely lacking the intervening 17,749 bp region. These overlapping ESTs did not bear sequence similarity to known proteins in the public databases. As with Helitrons, the 5′ and 3′ termini of the putative insertion were flanked by nucleotides A and T, respectively. Furthermore,
Figure 5. PCR amplification of the putative insertion site of the Helitron insertion in the Rp1 paralog. (A) The terminal ends of the two Helitrons and their flanking BAC sequences are displayed in lower and upper case letters, respectively. The positions of the primers used during PCR amplification are marked by arrows. (B) PCR amplification of the gene with and without the Helitron insertion. Ethidium bromide stained gel of the PCR product resolved on 1% agarose gel resulting from subjecting maize inbred B73 genomic DNA to PCR using the primer pairs depicted in panel A. Molecular weight markers are given on the right. Primers 5Up1 and 5Lo1 detect the 5′ ends, and primers 3Up1 and 3Lo1 detect the 3′ ends of the Helitron insertion. The primers 5Up2 and 3Lo1 detect the wild type gene. (C) Pair-wise sequence alignment of the BAC sequences flanking the Helitron (upper sequence) with the PCR product derived from the amplification of the maize B73 inbred genomic DNA using primer pairs 5Up2 and 3Lo1 (lower sequence). An arrow marks the putative insertion site of the Helitron.
genomic PCR using primers flanking the putative insertion site of the Helitron amplified a single product that bore >95% sequence similarity to the flanking BAC sequences and lacked the putative Helitron insertion (Figure 6). Taken together, these observations suggest that a Helitron insertion exists in an expressed gene. We annotated the intervening 17,749 bp sequence between the putative ends of the Helitron by performing the direct spliced alignment of the available proteins and cDNA sequences using the computer
software GeneSeqer and SplicePredictor (Usuka et al., 2000; Usuka and Brendel, 2000). This method provides improved and more refined annotation than gene prediction solely based on EST evidence or statistical approaches (Zhu and Brendel, 2002). A manual analysis of the alignments discovered portions of three different intron-bearing genes with strong similarity to rice proteins annotated as unknown protein (gi: 24960746), transporter protein (gi: 34902182) and plastid division protein FtsZ (gi: 14495344). In addition, five aligned with maize ESTs
Figure 6. Genomic and RT-PCR analysis of a Helitron insertion site in a putative gene. (A) Helitron and flanking BAC sequences are displayed in lower and upper cases, respectively. The positions of the primers used during genomic and RT-PCR analysis are identified by arrows. (B) The left panel displays the PCR product amplified from maize BSSS53 inbred genomic DNA template using PCR primers gUp1 and gLo1 displayed in panel A. The right panel displays the RT-PCR product amplified from total RNA extracted from maize endosperm (22 days after pollination, DAP), shoot and root tissue using primer pairs RTUp1 and RTLo1 displayed in panel A. (C) Pair-wise alignment of the flanking BAC sequences without the Helitron insertion (upper sequence) and the sequence of the genomic PCR product displayed in panel B (lower sequence). The putative insertion site of the Helitron is marked by an arrow.
of unknown function. These observations indicate the presence of multiple pseudogenes within the intervening sequences. Detailed interactive graphic and text views of the alignments are available at http://www.zmdb.iastate.edu/~volker/HELITRON/gs_sorted-output-ht_top.html.
Discussion
Common terminal ends of ba1 and sh2-7527 insertions may play an important role in transposition
The complete lack of coding sequences similar to RPAs and DNA helicases in maize Helitrons
described in this report indicates they represent non-autonomous elements. This is also evident from the lack of significant ORFs within the insertions. The lack of a known genetic system for activation of this transposable element system has hampered our knowledge of the transposition and replication mechanisms of Helitrons. It is widely presumed that these elements transpose by replication and strand replacement similar to some bacterial rolling circle transposons like IS91 (Mendiola et al., 1994). However, no in vivo or in vitro evidence exists in support of this hypothesis. The sh2-7527 and ba1-ref mutations represent the only mutant phenotypes identified to date that are caused by an insertion of a Helitron-related sequence in maize. The strong
similarity of the terminal ends of these two independent Helitron insertions suggests that they may play important roles in transposition. Whether the transposition of these putative non-autonomous maize Helitrons is facilitated through the interaction of the terminal ends with the helicase/replicase encoded by their autonomous member needs further investigation.
Helitrons discovered in maize bear structural features distinct from Helitrons reported from other species
The structural features of maize non-autonomous Helitrons described in this report are distinct from the Helitrons thus far reported from other species. First, unlike maize Helitrons, the Helitrons characterized from other species bear sequence similar to an RCR-initiator-like protein (Kapitonov and Jurka, 2001; Poulter et al., 2003). The presence of genes related to the mechanism of their transposition, albeit destroyed by multiple mutations or deletions in the non-autonomous members, provides a basis for their classification as a distinct group. Second, compared to other species, the non-autonomous Helitrons in maize appear to have grown in size. The complete divergence of sequences between the termini of the ba1-ref and sh2-7527 insertions is mainly attributed to the presence of different pseudogenes. The difference in length between the two insertions suggests extreme size polymorphism among maize Helitrons. For example, compared to the ~6.5 kb insertion in the ba1-ref allele, the Helitron insertion in sh2-7527 is at least twice as large (Lal et al., 2003). Our search of the existing BAC sequences in the public databases for other members similar to the Helitron in ba1 failed to produce any positive result. However, several maize GSS contigs showed strong similarity that spanned more than one predicted gene structure (PGS) discovered within the termini of the ba1 insertion. These observations suggest that other members with similarity to the ba1 insertion may exist in the maize genome.
The total lack of similarity between the sequences within the conserved termini of the ba1 and sh2-7527 insertions provides no clues to the possible biological relevance of the pseudogenes. The simplest explanation is that these pseudogenes represent the portions of cellular genes captured by these elements during their journey across the
genome. Inefficient recognition of the termination signal for rolling circle replication, the hairpin loop in the conserved 3′ terminus of the Helitrons, has been proposed to cause the transduction of the 3′ flanking sequence during transposition (Feschotte and Wessler, 2001). However, no in vivo or in vitro evidence exists in support of this hypothesis. The random distribution of the captured genes throughout the element, rather than a concentration near the 3′ terminus, suggests that these maize Helitrons acquire genes via different mechanisms. Intriguingly, the basic structural features of maize Helitrons described in this report share similarity with the bacterial mobile genetic elements termed integrons. Bacterial integrons, like maize Helitrons, have conserved boundaries surrounding variable intervening regions composed of randomly arrayed promoter-less gene cassettes captured by these elements through site-specific recombination (Hall and Collis, 1995). The 5′ terminus of integrons consists of an integrase gene with a promoter located within the integrase gene and oriented towards the cassette (Levesque et al., 1994). The product of the integrase gene facilitates excision and insertion of gene cassettes in a single site-specific recombination event, and involves circular intermediates (Hall and Collis, 1995). The promoter of the integrase gene of the integrons transcribes the sequences of the captured genes. The ability of the integrons to capture and propagate genes by site-specific recombination has played a vital role in the evolution of bacterial genomes and has given a selective advantage against adverse physiological environments such as the presence of antibiotics (Rowe-Magnus and Mazel, 2001). The relatively recent insertion of at least two events described here indicates that, unlike integrons, maize elements are mobile. It is possible that, similar to integrons, capture of genes by maize Helitrons may involve recombination events.
The captured genes may be transcribed and hence may give rise to chimeric transcripts containing portions of different genes. These may evolve into novel genes under evolutionary selection. For example, the Helitron insertion in the Shrunken-2 gene resulted in mutant transcripts containing sequences of 12 foreign exons that were spliced to the Sh2 transcript from within the insertion (Lal et al., 2003). Intriguingly, the pseudogenes detected within the insertion bore different degrees of sequence
similarity to their putative wild type cellular genes. The simplest explanation is that the variation in similarity reflects the evolutionary time of capture. The genes that were captured earlier may be more divergent from their wild-type sequence than the ones captured later. It has also been proposed that the sequences of the captured genes that bestow a transposition advantage on the Helitrons are maintained and thus bear more similarity to their wild type sequences, whereas the others are subsequently destroyed by multiple mutations (Kapitonov and Jurka, 2001). The lack of sequence similarity, compounded by extreme size heterogeneity among these non-autonomous maize Helitrons, poses a challenge to discovering other members of the family in extant plant databases. In our efforts, we exploited the short conserved terminal ends of the maize ba1-ref and sh2-7527 Helitrons to discover other Helitrons in the long contiguous sequences of the BAC clones available in the public domain. The discovery of sequences with a high degree of similarity to both 5′ and 3′ ends located within the same BAC clone points to the presence of Helitron insertions. The precise absence of the putative Helitron insertion from the PCR-amplified genomic fragment of the same B73 inbred line from which the two Rp1 BACs were obtained indicates that a bona fide Helitron insertion occurred in an Rp1 paralogue. The 95% sequence similarity between the PCR product and the sequences flanking the BAC Helitron supports the suggestion that the amplified PCR product did not result from the excision of the insertion, but rather represents a paralogous locus. This conclusion is further supported by the PCR amplification of the genomic fragments containing both ends of the flanking sequence of the putative Helitron. Intriguingly, the Helitron insertions within the duplicated 43 kb region shared between the two BAC clones are physically separated by approximately 300 kb in the maize genome (Ramakrishna et al., 2002).
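The terminus-based search and the empty-site comparison described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study, which relied on blastn searches and pair-wise alignments. The two terminal motifs are short hypothetical placeholders rather than the real conserved maize Helitron ends, the function names and the 20 000 bp length cap are assumptions made for illustration, and exact string matching with an ungapped identity score stands in for a proper similarity search.

```python
import re
from typing import Iterator, NamedTuple

# HYPOTHETICAL placeholder termini, chosen only for illustration.
# They are NOT the conserved ends of the ba1-ref or sh2-7527 insertions.
FIVE_PRIME = "TCTACTA"
THREE_PRIME = "CTAGT"


class Candidate(NamedTuple):
    start: int  # index of the first base of the candidate element
    end: int    # index one past the last base of the candidate element


def find_candidates(seq: str, five: str = FIVE_PRIME, three: str = THREE_PRIME,
                    max_len: int = 20000) -> Iterator[Candidate]:
    """Yield spans that begin with `five`, end with `three`, and lie
    between a host A (immediately 5') and a host T (immediately 3')."""
    seq = seq.upper()
    # Lookahead patterns so that overlapping motif occurrences are all reported.
    starts = [m.start() for m in re.finditer(f"(?={re.escape(five)})", seq)]
    ends = [m.start() + len(three)
            for m in re.finditer(f"(?={re.escape(three)})", seq)]
    for s in starts:
        for e in ends:
            length = e - s
            if length < len(five) + len(three) or length > max_len:
                continue
            if s > 0 and e < len(seq) and seq[s - 1] == "A" and seq[e] == "T":
                yield Candidate(s, e)


def empty_site(seq: str, cand: Candidate) -> str:
    """Return the sequence with the candidate element removed, i.e. the
    predicted insertion-free site formed by joining the two flanks."""
    return seq[:cand.start] + seq[cand.end:]


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length (no gap handling)")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # Toy "BAC" sequence: host flank ...A | element | T... host flank
    bac = "GGCCA" + FIVE_PRIME + "GGGGCCCC" + THREE_PRIME + "TGGCC"
    pcr_product = "GGCCATGGCA"  # toy amplicon from a site lacking the element
    for cand in find_candidates(bac):
        site = empty_site(bac, cand)
        print(cand, site, percent_identity(site, pcr_product))
```

In practice the termini are similar rather than identical between elements, so a real search would tolerate mismatches and gaps; the sketch only shows the logic of requiring both ends in the same clone, checking the A and T host bases, and comparing the reconstructed empty site with an amplified fragment.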
Whether the Helitron played any role in this duplication event is not apparent. It is plausible that this insertion represents a relic of an ancient Helitron that already contained MITEs and retroelements and was subsequently duplicated. We note that the insertion of the retroelements Opie B, Opie C and Opie D in this region occurred perhaps 1.5 million years ago. This vastly preceded the duplication event, which happened in relatively recent
times, about 200 000 years ago (Ramakrishna et al., 2002). The intriguing possibility that maize Helitrons can frequently capture large portions of cellular genes and multiply them in different regions of the genome may have contributed to the recently reported lack of colinearity of genes among inbreds of maize (Fu and Dooner, 2002; commentary by Bennetzen and Ramakrishna, 2002; Song and Messing, 2003). The lack of gene colinearity between two inbreds of maize has recently been implicated as a molecular basis for heterosis or hybrid vigor in maize. The Helitron insertion discovered in the 19 kD gene cluster family locus in cultivar BSSS53 was recently reported missing from the orthologous region in B73 (Song and Messing, 2003).
Acknowledgements
We are grateful to Dr Curt Hannah for his help and suggestions throughout the course of this project and for critically reviewing the earlier version of this manuscript. The work was supported by Research Excellence Fund, Oakland University. A. Gallavotti and R. Schmidt acknowledge the support of the National Science Foundation.
References
Altschul, S.F., Gish, W., Miller, W., Myers, E.W. and Lipman, D.J. 1990. Basic local alignment search tool. J. Mol. Biol. 215: 403–410.
Bennetzen, J.L. 2000. Transposable element contributions to plant gene and genome evolution. Plant Mol. Biol. 42: 251–269.
Bennetzen, J.L. and Ramakrishna, W. 2002. Exceptional haplotype variation in maize. Proc. Natl. Acad. Sci. USA 99: 9093–9095.
Brendel, V. and Zhu, W. 2002. Computational modeling of gene structure in Arabidopsis thaliana. Plant Mol. Biol. 48: 49–58.
Collins, N., Drake, J., Ayliffe, M., Sun, Q., Ellis, J., Hulbert, S. and Pryor, T. 1999. Molecular characterization of the maize Rp1-D rust resistance haplotype and its mutants. Plant Cell 11: 1365–1376.
Doring, H.P. and Starlinger, P. 1986. Molecular genetics of transposable elements in plants. Annu. Rev. Genet. 20: 175–200.
Eckardt, N.A. 2003. A new twist on transposon: the maize genome harbors a Helitron insertion. Plant Cell 15: 293–295.
Engels, W.R. 1983. The P family of transposable elements in Drosophila. Annu. Rev. Genet. 17: 315–344.
Fedoroff, N.V. 1989. In: D.E. Berg and M.M. Howe (Eds.), Mobile DNA, American Society for Microbiology Press, Washington DC, pp. 375–411.
Fedoroff, N.V. 1989. About maize transposable elements and development. Cell 56: 181–191.
Feschotte, C., Jiang, N. and Wessler, S.R. 2002. Plant transposable elements: where genetics meets genomics. Nat. Rev. Genet. 3: 329–341.
Feschotte, C. and Wessler, S.R. 2001. Treasures in the attic: Rolling circle transposons discovered in eucaryotic genomes. Proc. Natl. Acad. Sci. USA 98: 8923–8924.
Fu, H. and Dooner, H.K. 2002. Intraspecific violation of genetic colinearity and its implications in maize. Proc. Natl. Acad. Sci. USA 99: 9573–9578.
Gallavotti, A., Zhao, Q., Kyozuka, J., Meeley, R., Ritter, M., Doebley, J., Enrico Pè, M. and Schmidt, R.J. 2004. The role of barren stalk1 in the architecture of maize. Nature 432: 630–635.
Giroux, M.J. and Hannah, L.C. 1994. ADP-glucose pyrophosphorylase in shrunken2 and brittle2 mutants of maize. Mol. Gen. Genet. 243: 400–408.
Gong, X., Kaushal, S., Ceccarelli, E., Bogdanova, N., Neville, C., Nguyen, T., Clark, H., Khatib, Z.A., Valentine, M., Look, A.T. and Rosenthal, N. 1997. Developmental regulation of Zbu1, a DNA-binding member of the SWI2/SNF2 family. Dev. Biol. 183: 166–182.
Hall, R.M. and Collis, C.M. 1995. Mobile gene cassettes and integrons: capture and spread of genes by site-specific recombination. Mol. Microbiol. 15: 593–600.
Hofmeyer, J.D.J. 1930. The inheritance and linkage relationships of barrenstalk1 and barrenstalk2, two mature plant characters of maize. Ph.D. Dissertation, Cornell University, Ithaca, New York, USA.
Jin, Y.K. and Bennetzen, J.L. 1989. Structure and coding properties of Bs1, a maize retrovirus-like transposable element. Proc. Natl. Acad. Sci. USA 86: 6235–6239.
Kapitonov, V.V. and Jurka, J. 2001. Rolling circle transposons in eukaryotes. Proc. Natl. Acad. Sci. USA 98: 8714–8719.
Kapitonov, V.V. and Jurka, J. 2003. Molecular paleontology of transposable elements in the Drosophila melanogaster genome. Proc. Natl. Acad. Sci. USA 100: 6569–6574.
Khan, S.A. 2000. Plasmid rolling circle replication: recent development. Mol. Microbiol. 37: 477–484.
Kunze, R., Saedler, H. and Lonnig, W.E. 1997. Plant transposable elements. Adv. Bot. Res. 27: 331–470.
Lal, S.K., Giroux, M.J., Brendel, V., Vallejos, E. and Hannah, L.C. 2003. The maize genome contains a Helitron insertion. Plant Cell 15: 381–391.
Lander, E.S., et al. 2001. Initial sequencing and analysis of the human genome. Nature 409: 860–921.
Lawrence, C.J., Dong, Q., Polacco, M.L., Seigfried, T.E. and Brendel, V. 2004. MaizeGDB, the community database for maize genetics and genomics. Nucleic Acids Res. 32: 393–397.
Levesque, C., Brassard, S., Lapointe, J. and Roy, P.H. 1994. Diversity and relative strength of tandem promoters for the antibiotic-resistance genes of several integrons. Gene 142: 49–54.
McCarty, D.R. 1986. A simple method for extraction of RNA from maize tissue. Maize Genet. Coop. Newslett. 60: 61.
Mendiola, M.V., Bernales, I. and de la Cruz, F. 1994. Differential roles of the transposon termini in IS91 transposition. Proc. Natl. Acad. Sci. USA 91: 1922–1926.
Nevers, P., Shepherd, N. and Saedler, H. 1986. Plant transposable elements. Adv. Bot. Res. 12: 102–203.
Poulter, R.T., Goodwin, T.J. and Butler, M.I. 2003. Vertebrate helentrons and other novel Helitrons. Gene 313: 201–212.
Ramakrishna, W., Emberton, J., Ogden, M., SanMiguel, P. and Bennetzen, J.L. 2002. Structural analysis of the maize Rp1 complex reveals numerous sites and unexpected mechanisms of local rearrangement. Plant Cell 14: 3213–3223.
Ritter, M.K., Padilla, C.M. and Schmidt, R.J. 2002. The maize mutant barren stalk1 is defective in axillary meristem development. Am. J. Bot. 89: 203–210.
Rowe-Magnus, D.A. and Mazel, D. 2001. Integrons: natural tools for bacterial genome evolution. Curr. Opin. Microbiol. 5: 565–569.
Saghai-Maroof, M.A., Soliman, K.M., Jorgensen, R.A. and Allard, R.W. 1984. Ribosomal DNA spacer-length polymorphisms in barley: mendelian inheritance, chromosomal location, and population dynamics. Proc. Natl. Acad. Sci. USA 81: 8014–8018.
Singer, M.F., Krek, V., McMillan, J.P., Swergold, G.D. and Thayer, R.E. 1993. LINE-1: a human transposable element. Gene 135: 183–188.
Song, R. and Messing, J. 2003. Gene expression of a gene family in maize based on noncollinear haplotypes. Proc. Natl. Acad. Sci. USA 100: 9055–9060.
Tavakoli, N., Comanducci, A., Dodd, H.M., Lett, M.C., Albiger, B. and Bennett, P. 2000. IS1294, a DNA element that transposes by RC transposition. Plasmid 44: 66–84.
Usuka, J. and Brendel, V. 2000. Gene structure prediction by spliced alignment of genomic DNA with protein sequences: increased accuracy by differential splice site scoring. J. Mol. Biol. 297: 1075–1085.
Usuka, J., Zhu, W. and Brendel, V. 2000. Optimal spliced alignment of homologous cDNA to a genomic DNA template. Bioinformatics 16: 203–211.
Wessler, S.R., Bureau, T.E. and White, S.E. 1995. LTR-retrotransposons and MITEs: Important players in the evolution of plant genomes. Curr. Opin. Genet. Dev. 5: 814–821.
Zhu, W. and Brendel, V. 2002. Gene structure identification with MyGV using cDNA evidence and protein homologs to improve ab initio predictions. Bioinformatics 18: 761–762.