J Mol Evol (1999) 48:187–196
© Springer-Verlag New York Inc. 1999
Dynamic Diversification from a Putative Common Ancestor of Scorpion Toxins Affecting Sodium, Potassium, and Chloride Channels
Oren Froy, Tal Sagiv, Michal Poreh, Daniel Urbach, Noam Zilberberg,* Michael Gurevitz
Department of Plant Sciences, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Ramat-Aviv 69978, Tel-Aviv, Israel
Received: 16 March 1998 / Accepted: 30 July 1998
Abstract. Scorpions have survived successfully over millions of years without detectable changes in their morphology. Instead, they have developed an efficient allomonal machinery and a stinging device supporting their needs for prey and defense. They produce a large variety of polypeptide toxins that bind to ion channels in excitable tissues and modulate their conductance. The binding site, mode of action, and chemical properties of many toxins have been studied extensively, but little is known about their genomic organization and diversity. Genes representing each of the major classes of Buthidae scorpion toxins, namely, “long” toxins affecting sodium channels (alpha, depressant, and excitatory) and “short” toxins affecting potassium and chloride channels, were isolated from a single scorpion segment and analyzed. Each toxin type was found to be encoded by a gene family. Regardless of toxin length, 3-D structure, and site of action, all genes contain A+T-rich introns that split, at a conserved location, an amino acid codon of the signal sequence. The introns vary in length and sequence but display identical boundaries, agree with the GT/AG splice junctions, and contain T-runs downstream of a putative branch point, 5′-TAAT-3′. Despite little sequence similarity among the toxin classes, the conserved gene organization, intron features, and common cysteine-stabilized α-helical (CSH) core connecting an α-helix to a three-stranded β-sheet suggest that they all evolved from a common ancestral progenitor. Furthermore, the vast diversity found among genomic copies, cDNAs, and their protein products for each toxin suggests an extensive evolutionary process of the scorpion “pharmaceutical factory,” whose success is most likely due to the inherent permissiveness of the toxin exterior to structural alterations.
Key words: Gene organization — Scorpion neurotoxins — Ion channels — Common progenitor
* Present address: Yale University School of Medicine, Boyer Center for Molecular Medicine, 295 Congress Avenue, New Haven, CT 06536-0812, USA
Correspondence to: M. Gurevitz
Introduction
Ion channel modifiers found in scorpion venoms are composed of two major polypeptide populations. The first includes “short” toxins (fewer than 40 amino acid residues) affecting potassium (Miller 1995) and chloride (DeBin et al. 1993) channels, and the second consists of several classes of “long” toxins (60–70 amino acid residues) affecting sodium channels (Zlotkin et al. 1978; Rochat et al. 1979). The toxins are classified according to their structure, mode of action, and binding site on different channels or channel subtypes (Martin-Eauclaire and Couraud 1995; Gordon 1996). Each class consists of several characterized representatives isolated from the venom of various scorpions. The toxins affecting potassium channels are constrained by three disulfide bridges and are divided into four subfamilies sharing very low sequence similarity (Miller 1995). One chloride channel toxin has been characterized (DeBin et al. 1993), and several variants have been identified (Lippens et al. 1995). Chlorotoxin (ClTx) shares only 30% similarity with the potassium channel toxin charybdotoxin (ChTx) and is constrained by four disulfide bridges. Still, both short toxins exhibit a similar overall three-dimensional structure (Lippens et al. 1995). The long toxins affecting sodium channels have been divided primarily into two major types, i.e., α and β (Zlotkin et al. 1978; Rochat et al. 1979). Although many representatives have been isolated, little is known about their toxic sites. The α toxins, found in Buthidae, bind to receptor site 3 on the vertebrate voltage-sensitive sodium channel (Catterall 1986), and the putative toxic surface of one representative has been reported recently (Tugarinov et al. 1997; Zilberberg et al. 1997). The β toxins, derived from American scorpions, bind to receptor site 4 on vertebrate sodium channels (Catterall 1986; Couraud and Jover 1984). Two other distinct classes of toxins showing specificity to arthropods, i.e., depressant and excitatory, have been described as well (Zlotkin 1987). The four classes of long toxins share only 20–40% sequence similarity (Dufton and Rochat 1984; Martin-Eauclaire and Couraud 1995). Although no sequence similarity exists between short and long toxins, they all share a similar scaffold composed of an α-helix and a three-stranded β-sheet (Fontecilla-Camps 1989; Miller 1995). This scaffold contains a cysteine-stabilized α-helical (CSH) motif, in which a Cys–X–X–X–Cys stretch of the α-helix is bonded through two disulfide bridges to a Cys–X–Cys triplet of a β-strand (Kobayashi et al. 1991). Whether this common structural motif evolved independently in the different toxins, through evolutionary parallelism, or was formed in an ancient progenitor and maintained thereafter is unknown. Since sequence similarity among the various scorpion toxin classes is rather slim, we concentrated on their genomic organization as a tool for comparative analysis.
The existence of one intron within the genes encoding scorpion neurotoxins has been reported for several β and α long toxins and, most recently, for a potassium channel short toxin (Becerril et al. 1993; Delabre et al. 1995; Becerril et al. 1996; Corona et al. 1996; Legros et al. 1997). These observations, together with our finding of introns sharing similar characteristics at a unique, conserved location in all the other toxin classes, suggest a genetic link among the diverse populations of scorpion neurotoxins and a common ancestor.

Materials and Methods
Scorpions, Cells, Plasmids, and Oligonucleotide Primers. Leiurus quinquestriatus hebraeus was collected in the Judaea mountains. Buthus occitanus mardochei var. Israelis was collected in the northern Negev area. Escherichia coli strain DH5α was used for plasmid construction and propagation. pBluescript (Stratagene, USA) was used for cloning cDNA or genomic sequences. The Sequenase kit Version 2.0 (United States Biochemicals, USA) was used for DNA sequencing as described (Sanger et al. 1977). The synthetic oligonucleotide primers used are listed in Table 1.

Table 1. Oligonucleotide primers used for the isolation of genomic clonesᵃ
[Primer sequences for LqhαIT, LqhIT1, LqhIT2, ClTx, and ChTx, in three columns: 5′ of leader sequence, 5′ of mature toxin gene, 3′ of toxin gene]
ᵃ Degenerate primers were used for LqhIT2 and ChTx. Since ClTx-cDNA was available, a primer for the 5′ of its mature sequence was unnecessary. All primers are written from 5′ (left) to 3′ (right).
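Degenerate primers such as those noted for LqhIT2 and ChTx in Table 1 are designed from known amino acid sequences. As an illustration only, the sketch below back-translates a peptide into a degenerate IUPAC primer with the standard genetic code, reports its degeneracy (the number of distinct oligonucleotides in the mixture), and forms the reverse complement needed for a 3′ (antisense) primer. The peptide, the function names, and the per-position union rule are assumptions; the actual primers and design rules used in this work are not reproduced here.

```python
# Hypothetical helper for designing a degenerate primer from a peptide.
# Uses the standard genetic code (bases ordered T, C, A, G); the peptide is invented.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# IUPAC nucleotide ambiguity symbols keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}
SYMBOL_SIZE = {symbol: len(bases) for bases, symbol in IUPAC.items()}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def degenerate_codon(aa):
    """Collapse all codons of one amino acid into a degenerate IUPAC triplet.

    Each position is the union of bases seen there, so six-codon amino acids
    (Leu, Ser, Arg) become over-degenerate and are often avoided in primers.
    """
    codons = [codon for codon, residue in CODON_TABLE.items() if residue == aa]
    if not codons:
        raise ValueError(f"unknown amino acid: {aa!r}")
    return "".join(IUPAC[frozenset(c[i] for c in codons)] for i in range(3))


def back_translate(peptide):
    return "".join(degenerate_codon(aa) for aa in peptide.upper())


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    n = 1
    for symbol in primer:
        n *= SYMBOL_SIZE[symbol]
    return n


def reverse_complement(primer):
    return primer.translate(COMPLEMENT)[::-1]


sense = back_translate("WMCDE")  # hypothetical low-degeneracy peptide
print(sense, degeneracy(sense), reverse_complement(sense))
# TGGATGTGYGAYGAR 8 YTCRTCRCACATCCA
```

Stretches rich in single-codon residues (Trp, Met) keep the mixture small, which is why such regions are usually preferred for degenerate primers.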
Fig. 1. Cloning procedure of the excitatory toxin LqhIT1. A Circularization of total cDNA, adding an EcoRI adaptor. B “Back-to-back” degenerate primers 1 and 2 reacted via PCR with the circularized DNA. Primer 1 is complementary to the sequence corresponding to the amino acid stretch 26–38 of AaIT and LqqIT1 (Bougis et al. 1989; Kopeyan et al. 1990). C The linear PCR product. D Circularization of the PCR product after “filling-in” and 5′-phosphorylation. E EcoRI digestion to yield the LqhIT1-cDNA.

Isolation of cDNA Clones. Preparation of a Leiurus quinquestriatus hebraeus cDNA library has been reported (Zilberberg et al. 1992). cDNA copies of LqhαIT (Gurevitz et al. 1991) and LqhIT2 (Zilberberg et al. 1991) were isolated from the library by colony hybridization (Grunstein and Hogness 1975). The cDNA for LqhIT1 was isolated by reverse genetics via PCR (Fig. 1), using two “back-to-back” (Zilberberg and Gurevitz 1993) degenerate oligonucleotide primers designed according to the amino acid sequences of AaIT (Bougis et al. 1989) and LqqIT1 (Kopeyan et al. 1990). The resulting product was used as a probe to pull out the full-length authentic cDNA. The cDNA clone for chlorotoxin was identified by random sequencing of short cDNA inserts of the library and comparison of the deduced amino acid sequences with the EMBL protein collection. The charybdotoxin cDNA was isolated from the library via PCR, using the KS primer and a degenerate oligonucleotide primer (Table 1) designed according to the known amino acid sequence of the C terminus (Gimenez-Gallego et al. 1988).
Isolation of Genomic DNA Sequences. Genomic DNA was extracted from a single abdominal scorpion segment by grinding under liquid nitrogen, 3 h of incubation with proteinase K (250 μg/ml), mixing with a solution containing 7.5 M guanidinium–HCl, 50 mM EDTA, 3 M potassium acetate, pH 5, and glass fibers, washing with 70% ethanol, and eluting with water. Template DNA was reacted in PCR with two groups of oligonucleotide primers: (1) primers designed according to the amino- and carboxy-terminal regions of the mature toxins (Table 1) and (2) primers designed according to the amino-terminal region of the leader peptides, together with the aforementioned primers for the C termini (Table 1). The polymerase chain reactions were performed in a thermocycler (MJ Research, Inc., USA) as follows: denaturation at 94°C for 45 s, annealing at 58°C for 45 s, and polymerization at 72°C for 90 s.

Fig. 2. Amplification of genomic DNA sequences. The PCR products obtained for the entire genes plus leader sequences (+) or for the mature toxin regions (−) are shown in a stained 2% agarose gel. “cDNA” denotes the ClTx product amplified from L. q. hebraeus cDNA template with the same primers as used for the entire gene. Relevant PCR products are indicated by arrows. Other bands appearing in both − and + lanes are nonspecific products.
Results
Isolation and Characterization of Five Categories of Toxin Genes
Genes encoding α, depressant, excitatory, charybdotoxin, and chlorotoxin toxins have been isolated from two different scorpions found in Israel. These scorpions do not contain β toxins.
α Class. Using the cDNA of LqhαIT (Gurevitz et al. 1991) as a probe, five cDNAs were identified (Zilberberg et al. 1996). To assess the genomic arrangement of genes representing the α class, oligonucleotide primers designed for both the 3′ and 5′ ends of LqhαIT were reacted with DNA from either Leiurus quinquestriatus hebraeus or Buthus occitanus mardochei. The primers for the 5′ region were designed either for the N-terminal leader peptide or for the N-terminal sequence of the mature protein (Table 1). The size difference between the two products obtained (Fig. 2) indicated the presence of an intron in the region preceding the sequence encoding the mature toxin. DNA sequence analyses of the entire L. q. hebraeus products revealed three nonidentical genes, Lqhα-6a, Lqhα-6b, and Lqhα-6c, whose deduced mature toxin sequences show 77.3, 75.7, and 74.2% similarity to LqhαIT, respectively (Fig. 3). A single intron, 396 nt long and spanning nucleotides 47–443 in Lqhα-6a and Lqhα-6c (Fig. 4), or 399 nt long and spanning nucleotides 47–446 in Lqhα-6b (not shown), was found in these genes. The three genomic gene variants and the five reported LqhαIT-cDNA clones (Zilberberg et al. 1996) imply that a multigene family encoding LqhαIT variants exists in the genome of L. q. hebraeus. The PCR products obtained when the template DNA was derived from a single segment of the scorpion Buthus occitanus mardochei (Bom) were similar in size to those obtained for the genomic LqhαIT-like clones (Fig. 2). A family of five Bom clones, Bomα-6a–e, was identified (Fig. 3), and each of its members contained an intron approximately 400 nt long within the leader sequence region, at a position similar to that found in Lqhα-6.
Depressant Class. Cloning of two LqhIT2 cDNA copies has been reported (Zilberberg et al. 1992). Genomic clones encoding LqhIT2 were isolated and analyzed as for the α toxins (see above). Two genes, LqhIT2-53 and LqhIT2-13, differing in their coding and intron regions, were identified. The introns in these genes are 307 and 309 nucleotides long, residing between nucleotides 47 and 353 in LqhIT2-53 (Fig. 4) and between nucleotides 47 and 355 in LqhIT2-13 (not shown).
Excitatory Class. Thus far, three excitatory toxins [AaIT from Androctonus australis Hector (Zlotkin et al. 1971; Darbon et al. 1982), AmIT from Androctonus mauretanicus (Zlotkin 1987), and LqqIT1 from Leiurus quinquestriatus quinquestriatus (Kopeyan et al. 1990)] have been characterized, and only AaIT has been cloned (Bougis et al. 1989). Although the symptoms of injected blowfly larvae and the electrophysiological effects produced by the crude venom suggest the presence of an excitatory toxin, fractionation of the venom of Leiurus quinquestriatus hebraeus by gel filtration, ion-exchange, or RP-HPLC chromatography has not yielded any such toxin (unpublished). Therefore, we employed “back-to-back” oligonucleotide primers, designed according to the conserved region of amino acids 26–38 of AaIT and LqqIT1 (Bougis et al. 1989; Kopeyan et al. 1990), for direct isolation of LqhIT1-cDNA via PCR (Zilberberg and Gurevitz 1993) (Fig. 1). A 350-bp product was obtained, verified by DNA sequencing, and used as a probe to pull out the authentic clone from the cDNA library. Three cDNA variants were identified (Fig. 3): two of them were 90% similar to each other, and the third showed 77–80% similarity to the first two (deduced amino acid sequence of the region encoding the mature toxin). The similarity of the three sequences ranged from 71.4 to 92.8% with the AaIT variants (Bougis et al. 1989) and from 71.4 to 85.7% with LqqIT1 (Kopeyan et al. 1990). Oligonucleotide primers designed according to the cDNA termini (Table 1) were employed to isolate the corresponding genomic clones. As shown in Figs. 2 and 4, these clones are 578 nucleotides longer than the reading frame of the LqhIT1-cDNAs. The intron in LqhIT1-a spans nucleotides 44–621 (Fig. 4).

Fig. 3. Gene families of scorpion neurotoxins. Several representatives of each of the α, depressant, and excitatory “long” toxins and several “short” toxins are presented. Leader peptides and putative pro-regions are underlined. Residues identical to those in the upper line are designated by dashes. Dots indicate gaps of missing residues. C-terminal residues removed posttranslationally appear in lowercase letters. Similarities were calculated for the mature toxin region, including the C-terminal residues removed posttranslationally. The sequences of LqhαITa, LqhIT2-44, LqhIT1-b-d, ClTx-a, ClTx-d, and ChTx-d were determined from cDNA clones; all others, from genomic clones. Clones lacking leader peptide sequences were isolated with primers constructed for the mature toxin regions. The termini of all genomic sequences isolated via PCR are determined by the oligonucleotide primers used.

Fig. 4. Genomic clones of scorpion neurotoxins. The nucleotide and deduced amino acid sequences are presented. The leader peptides and putative pro-regions are underlined. The introns appear in lowercase letters and the split codons are designated in boldface. Numbering of amino acids is according to Fig. 3.

Chlorotoxin and Charybdotoxin.
Two cDNA clones, ClTx-a and ClTx-b, resembling the reported chlorotoxin, ClTx (DeBin et al. 1993), in their deduced amino acid sequence, were isolated from the L. q. hebraeus cDNA library. The leader sequences of these toxins are composed of 24 amino acid residues and differ from those of the “long” toxins affecting sodium channels (Fig. 3). Chlorotoxin genomic clones were isolated by the same strategy described for the toxins affecting sodium channels (Fig. 2). Two genes, showing 71.4 and 75.6% similarity to the known chlorotoxin (DeBin et al. 1993), or 88.5 and 91.4% similarity to the other cDNA variant, were isolated (Fig. 3). A 95-nucleotide-long intron was found between nucleotide 56 and nucleotide 151 in both genomic clones (Fig. 4).
Two distinct charybdotoxin cDNA clones, ChTx-a and ChTx-d, were identified (Fig. 3). Primers designed according to the terminal sequences of the ChTx-a cDNA (Table 1) were used to isolate genomic clones. Of the four clones obtained, ChTx-a, ChTx-b, and ChTx-c are depicted in Fig. 3. A 123- to 125-nt-long intron was found between nucleotides 49 and 175–177 (Fig. 4, Table 2). ChTx-a (Fig. 4) and ChTx-e (Table 2) resembled one another in their coding region but varied in their intron sequences. The first deduced amino acid of the mature toxin region in all genomic clones is glutamine, which undergoes posttranslational modification to pyroglutamate, the first residue of the reported toxin (Gimenez-Gallego et al. 1988).

Analysis of Introns
The introns of the scorpion neurotoxin genes are characterized by a high A+T content (over 80%), as opposed to the coding region (Fig. 4, Table 2). All introns analyzed had a consensus GT/AG splice junction (Senapathy et al. 1990). The sequence of the 5′ splice donor in all the genes analyzed was 5′-G|GTAAG, with the exception of 5′-G|GTAAA found in LqhIT2-13 (Table 2). These sequences agree with the consensus in many species (Senapathy et al. 1990). All these introns split a codon toward the end of the leader sequence, so that the first nucleotide of the codon resides upstream of the intron, whereas the following dinucleotide flanks the 3′ intron boundary (Fig. 4). In the α class, the split codon is at position −4 (numbering according to Fig. 3), i.e., a Gly codon in the genes of L. q. hebraeus and a Val codon in the genes of B. o. mardochei (not shown). In the depressant class, the codon for Glu (−6) is split, and in the excitatory class a Gly codon (−4) is split. Two different codons were found to be split in the chlorotoxins: that for Val (−6) in ClTx-b and that for Ala (−6) in ClTx-c (Fig. 4). The introns in the charybdotoxin genes split a valine codon (position −6), immediately preceding the five codons upstream of the mature toxin sequence. All introns of the long and the short toxins contain a putative branch point, 5′-TAAT-3′ (Table 2; Fig. 5), located within the ideal distance (47–61 bp) upstream of the 3′ splice site (Senapathy et al. 1990) (Fig. 4, Table 2). A considerable number of poly(A) stretches appear upstream of the branch points, whereas T-runs predominate in the regions downstream.
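The codon-splitting arrangement described above, with the first nucleotide of the split codon upstream of the intron and the following dinucleotide downstream (a phase-1 intron), can be made concrete with a toy example. This is a sketch only: the sequence is invented and merely mimics a G|GTAAG donor, a TAAT motif, a T-run, and a terminal AG; coordinates are assumed to be 0-based from the A of ATG, and the function names are hypothetical.

```python
# Toy illustration of a single GT...AG intron that splits a codon after its
# first nucleotide (a phase-1 intron). The sequence is invented; coordinates
# are 0-based, with the A of the ATG start codon at position 0.

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))


def translate(cds):
    """Translate complete codons of a coding sequence ('*' marks a stop)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def splice_single_intron(genomic, intron_start, intron_end):
    """Remove genomic[intron_start:intron_end] and report the intron phase.

    Phase = number of nucleotides of the interrupted codon left upstream of
    the intron (0 means the intron falls between codons, so none is split).
    """
    intron = genomic[intron_start:intron_end]
    if not (intron.startswith("GT") and intron.endswith("AG")):
        raise ValueError("intron boundaries do not follow the GT/AG rule")
    mrna = genomic[:intron_start] + genomic[intron_end:]
    phase = intron_start % 3
    codon_start = intron_start - phase
    split_codon = mrna[codon_start:codon_start + 3]
    return mrna, phase, split_codon


exon1 = "ATGAAGG"                # ATG AAG + first base of the split codon
intron = "GTAAGTATAATTTTTTTCAG"  # invented A+T-rich intron
exon2 = "GTGCTTAA"               # last two bases of the split codon, then GCT TAA
genomic = exon1 + intron + exon2

mrna, phase, codon = splice_single_intron(genomic, len(exon1), len(exon1) + len(intron))
print(phase, codon, translate(mrna))  # 1 GGT MKGA*
```

Here the Gly codon GGT is reassembled only after splicing, which is the situation the text describes for the toxin leader sequences.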
Table 2. Nucleotide sequence alignment of scorpion neurotoxin intronsᵃ
Toxin | 5′ exon | Intron | 3′ exon | Intron size (nt)
Lqhα-6a | GTGTGGAGAG | GTAAGATTTACATATTCTTA -303- GGAAATATTAATTTTTTGAT -33- GTAATTTTTTCTGACTACAG | CTCATGACAG | 396
Lqhα-6b | GTGTGGAGAG | GTAAGATTTACATTTTCTTA -306- AGAAATATTAATTTTTTGAT -33- GTAATTTTTTCCGACTACAG | CTCATGACAG | 399
AaHI′ | GTGTGGAGAG | GTAAGATTTATATACTCTTA -333- GGAAATATTAATTTTTTAAT -32- GTTATTTTTTCTGACTACAG | CTCATGATAG | 425
TsIV-5 | GCACGGAGGG | GTAAGATTTTCCTCCTTATT -266- AGAAACGCTAATTTGGATGC -21- TAACGACTGTTAAATTTTAG | TTGACCGCGG | 347
Ts-γ | TTGCTGATCG | GTAAGCTGAATTCAGTTTCT -418- CAAAATGCTAATGGACTTTT TTATTTTGTTAACATAG | GCATTGTCGT | 475
γ-st | TTGCTGATCG | GTAAGCTGAATTCAGTTTCT -418- CAAAATACTAATGGATTTTT TTTTTTATTAACATAG | ACATTGTCGT | 474
γ-b | TTGCTGATCG | GTAAGCTGAATTCAGTTTTT -408- CAAAATGCTAATGGACTTTT TTATTTTGTTAACATAG | GAATTGTCGT | 464
LqhIT2-53 | ATGCTGATTG | GTAAAACATATTTTATTTTC -218- TTAAATTGTAATGAAGAAAA -29- TTTATTTTGTTTTAATATAG | AAAGCTTAGT | 307
LqhIT2-13 | ATGCTGATTG | GTAAGCGTATTTTATTTTCT -219- TTGGATTGTAATGAAGAATA -30- TTATTTCCTTTTTAATATAG | AAGGCTTAGT | 309
LqhIT1-a | CCAATAATGG | GTAAGTATTTATTTTTATAT -499- TACATTCTTAATATAATATT -19- TCTCAAATTTTCTTACAAAG | GGGTGCTTGG | 578
ClTx-c | GTAATGATCG | GTAAGTGATTGCCAATATTT -14- AAAATAAATAATATGAAAGT -21- TCATCATTTTCTTTCTGTAG | CAACTCATAT | 95
ClTx-d | GTAATGATCG | GTAAGTGATTGCCAATATTT -14- AAAATAAATAATATGAAAGT -21- TCATCATTTTCTTTCTGTAG | TAACTGATAT | 95
ChTx-a | TGTTCAATAG | GTAAGTTTTCCTGTTCCATT -36- TTGTTATATAATGTATAGAA -29- ATATGTTAATAATATTTTAG | TAGGTTGGAG | 125
ChTx-b | TGTTCAATAG | GTAAGTTTTCCTGTTCCATT -36- TTGTTATATAATATATAGAA -27- ATATGTTAATAATATTTTAG | TAGGTTGGAG | 123
ChTx-e | TGTTCAATAG | GTAAGTTTTCCTGTTCCATT -36- TTGTTATATAATATATAGAA -27- ATATGTTAATAATATTTTAG | TAGGTTGGAG | 123
KTX1 | TGTTCAATGA | GTAAGTTGCATTTTTTATTA -16- GCAAAATTTAATGATTCATG -11- GGTTCTTTTTAACATTCTAG | TTATAGGCAT | 87
KTX2 | TGTTCAATGA | GTAATTACGAATTTTTATTA -18- TAAAAACTTAATAATTCATT -9- TATGTTGTTTAATATTTTAG | TTATTGGAAT | 87
ᵃ Intron junctions and the putative branch point (TAAT) are delineated; numbers between intron segments give the number of intervening nucleotides not shown. Sequences of Lqhα-6a, Lqhα-6b, LqhIT2-53, LqhIT2-13, LqhIT1-a, ClTx-c, ClTx-d, ChTx-a, ChTx-b, and ChTx-e were obtained in this work. AaHI′, Androctonus australis Hector toxin I′ (Delabre et al. 1995); Ts IV-5, Tityus serrulatus toxin IV-5 (Corona et al. 1996); Ts-γ, Tityus serrulatus toxin γ (Becerril et al. 1993); γ-st, toxin γ from Tityus stigmurus (Becerril et al. 1996); γ-b, toxin γ from Tityus bahiensis (Becerril et al. 1996); KTX1 and KTX2, kaliotoxins 1 and 2 from Androctonus australis (Legros et al. 1997).

Fig. 5. Schematic description of scorpion toxin gene organization. The splice sites and the putative branch point are indicated. The bases participating in the formation of the presumed lariat structure are designated by asterisks.

Discussion
Gene Families and Genomic Organization
Cloning and analysis of genes or cDNAs encoding any representative of the scorpion “short” or “long” neurotoxins reveal several members. Until now, polymorphism of scorpion neurotoxin genes has been shown only among cDNA clones (Bougis et al. 1989; Zilberberg et al. 1992; Zilberberg et al. 1996), which could be attributed to variation within scorpion populations. However, we present evidence for polymorphism at the level of individual scorpions, since all of the genomic copies were isolated from DNA extracted from a single abdominal segment. The genomic organization of all genes analyzed is similar (Fig. 5): they comprise two exons and one intron of variable length, located 43–55 nucleotides downstream of the ATG initiation codon of the leader sequence. Most likely no other introns appear in these genes, as inferred from the genomic sequences of the α toxin AaHI′ (Delabre et al. 1995) and the kaliotoxin KTX2 (Legros et al. 1997). Although the occurrence of an intron within the signal sequence is rare, it has been documented for various genes encoding barley and rice α-glucanases (Simmons et al. 1992; Malehorn et al. 1993), the VH gene segments in the channel catfish (Ventura-Holman et al. 1994), the human parvalbumin gene (Berchtold 1989), the human adrenodoxin reductase gene (Lin et al. 1990), the pheromone 4 gene of Euplotes octocarinatus (Meyer et al. 1992), and scorpion toxins (Becerril et al. 1993; Delabre et al. 1995; Becerril et al. 1996; Corona et al. 1996; Legros et al. 1997). Sequence analyses revealed little similarity among introns and coding regions of scorpion genes belonging to different pharmacological classes, and very high similarity within each gene family or among pharmacologically related toxins: for example, Lqhα-6a and its 96.9% and 89.4% variants, Lqhα-6b and Lqhα-6c, respectively; the chlorotoxin ClTx-b and its variant ClTx-c; the charybdotoxin ChTx-a and its 94.6 and 91.9% variants, ChTx-b and ChTx-c, respectively; and LqhIT2-53 and its 67.2% variant LqhIT2-13 (Fig. 3). Interestingly, a codon for the negatively charged Glu appears downstream of the intron site in the Lqhα-6, ClTx, and ChTx variants and is split by the intron in the LqhIT2 variants (Fig. 4).
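The percentages quoted above come from alignments like those in Fig. 3, where dashes mark residues identical to the upper line and dots mark gaps. The text does not state the exact formula (its “similarity” may also credit conservative substitutions), so the following is only a sketch of one common convention, percent identity over aligned columns, applied to rows written in the Fig. 3 notation. The function name and the example rows are hypothetical, and the sketch is not expected to reproduce the published values.

```python
# Hypothetical percent-identity calculation for rows written in the Fig. 3
# convention: a dash means "same residue as the reference (upper) line",
# a dot marks a gap, and any other letter is a substitution.


def percent_identity(reference, variant):
    """Identical columns divided by columns where either row has a residue."""
    if len(reference) != len(variant):
        raise ValueError("rows must be aligned to the same length")
    compared = identical = 0
    for ref_char, var_char in zip(reference, variant):
        if ref_char == "." and var_char == ".":
            continue  # column empty in both rows
        if ref_char == "." and var_char == "-":
            raise ValueError("a dash cannot sit under a reference gap")
        compared += 1
        if var_char == "-":
            identical += 1
    return 100.0 * identical / compared


# Invented rows, not sequences from Fig. 3.
reference = "KDGYIVDDKNCTYFC"
variant = "--------R------"
print(round(percent_identity(reference, variant), 1))  # 93.3
```

Counting gap columns in the denominator penalizes insertions and deletions; other conventions divide by the reference length only, which is one reason published figures can differ.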
Negatively charged amino acids do not generally occupy the last third of the leader sequence, which is usually polar (von Heijne 1990; Izard and Kendall 1994), and thus may be candidates for a cleavage site. This principle, together with the location of the introns in all genes analyzed, suggests that the codons downstream of the intron and, perhaps, the split codon do not belong to the leader sequence. As shown in Fig. 4, the length of these short amino acid stretches varies among the different toxin classes. Such stretches may form structural entities that can be cleaved off along the secretory pathway, as shown for secreted proteins whose maturation involves the formation of a pro-protein (Harris 1989). Pro-regions of secreted proteins have been studied thoroughly, and two mechanisms for their processing have been described. In the first, cleavage occurs after two basic amino acids (Harris 1989), whereas in the second, cleavage is performed by dipeptidyl aminopeptidase, which removes dipeptides ending in alanine or proline (Kreil 1990). Both mechanisms involve a consensus sequence recognized by the peptidases participating in the cleavage. Since the toxin sequences downstream of the intron sites do not share a common consensus sequence, nontraditional cleavage may be involved. Such cleavage of pro-regions was reported for apamin and MCD peptide derived from honeybee venom (Gmachl and Kreil 1995). Thus, it is possible that the leader sequence of scorpion toxins is encoded by the first exon, whereas the sequence downstream of the intron site is presumably a pro-region cleaved off during maturation (Fig. 5).
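The two classical processing routes mentioned above are simple enough to express as sequence rules. The following sketch, with invented function names and peptides, flags K/R dibasic pairs and mimics stepwise dipeptidyl aminopeptidase trimming of X–Ala/X–Pro dipeptides from the N terminus. It ignores real enzyme context (flanking residues, compartment, competing proteases) and only illustrates what a recognizable consensus looks like, in contrast to the putative scorpion pro-regions, which the text notes share none.

```python
# Hypothetical scans for the two classical pro-region processing routes:
# (1) cleavage after a pair of basic residues (Lys/Arg), and
# (2) stepwise removal of N-terminal X-Ala / X-Pro dipeptides by a
#     dipeptidyl aminopeptidase. Sequences are invented.

BASIC = {"K", "R"}


def dibasic_cleavage_sites(protein):
    """Return indices immediately after each K/R pair (candidate cut sites)."""
    return [
        i + 2
        for i in range(len(protein) - 1)
        if protein[i] in BASIC and protein[i + 1] in BASIC
    ]


def dipeptidyl_aminopeptidase_trim(protein):
    """Remove N-terminal dipeptides while the second residue is Ala or Pro."""
    while len(protein) >= 2 and protein[1] in ("A", "P"):
        protein = protein[2:]
    return protein


print(dibasic_cleavage_sites("MASKRDEKKL"))          # [5, 9]
print(dipeptidyl_aminopeptidase_trim("EAEPAPGIGK"))  # GIGK
```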
Comparative Analysis of Scorpion Toxin Introns
Matching the boundaries of the various introns with the consensus 5′-GT/AG-3′ rule necessitates the splitting of a codon. Such splitting has been shown for other genes, such as those of protein kinases (Hradetzky et al. 1992), the human parvalbumin gene (Berchtold 1989), the pheromone gene of Euplotes octocarinatus (Meyer et al. 1992), the VH gene segments in the channel catfish (Ventura-Holman et al. 1994), and the α-glucanase genes (Simmons et al. 1992; Malehorn et al. 1993). Three highly conserved regions can be found in all scorpion toxin gene introns, including those reported recently (Becerril et al. 1993; Delabre et al. 1995; Becerril et al. 1996; Corona et al. 1996; Legros et al. 1997): the 5′ splice donor, the 3′ splice acceptor, and a putative branch point (Fig. 5, Table 2). The 5′ splice donor is more conserved than the 3′ splice acceptor and is recognized by the U1 small nuclear ribonucleoprotein particle (U1 snRNP) through base-pairing interactions (Zhuang et al. 1986). The 5′ splice donor fits the consensus AG|GTRAGT (where | designates the splice site and R indicates A or G) (Senapathy et al. 1990), although in all instances we found A at position +3, not necessarily T at position +6, and only G from the upstream exon (Table 2). The consensus found in scorpions matches the consensus reported for invertebrates in general (Senapathy et al. 1990; Csank et al. 1990). The 3′ splice acceptor matches the consensus only at the last two nucleotides of the intron, AG. The putative branch point consensus for long and short toxins, 5′-TAAT-3′, differs from the mammalian signal primarily in the absence of G at the position preceding the branch point (the second A in the sequence) (Senapathy et al. 1990). In the only case in which this sequence was not found, Ts IV-5 from the scorpion Tityus serrulatus (Corona et al. 1996), its absence could be due to a PCR error. The consensus sequence of the branch point is highly similar to the consensus found in Drosophila, CTAAT (Mount et al. 1992), and in Tetrahymena thermophila, TTAAT (Csank et al. 1990). This sequence is located 25–60 bp from the end of the 3′ splice acceptor, the conventional distance found in introns (Senapathy et al. 1990). In addition, introns in many species, including plants, Dictyostelium discoideum, Caenorhabditis elegans, and Drosophila, are significantly more A+T rich than the flanking exons, which have a considerably higher G+C content (Weibaur et al. 1988; Csank et al. 1990; Mount et al. 1992). This feature was also found in the introns of scorpion toxin genes. Conversely, S. cerevisiae and human introns show little or no difference in pyrimidine, purine, A+T, or G+C content relative to their neighboring exon sequences (Csank et al. 1990). Csank et al. (1990) proposed that the high A+T content and frequent presence of A and T homopolymers could limit the formation of strong secondary structures or be recognized by specific factors important for splicing. Of particular interest is the pyrimidine-rich region contiguous to the 3′ splice acceptor. In scorpion toxin genes, this pyrimidine-rich region appears to be replaced by a long T-rich region, with no enrichment of C. The frequency of T clearly increases downstream of the putative branch point (Fig. 4, Table 2), as was also found in Drosophila (Mount et al. 1992).
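The intron features summarized above (GT/AG boundaries, a TAAT branch-point candidate 25–60 bp upstream of the 3′ end, high A+T content, and T enrichment downstream of the branch point) can be checked mechanically. The sketch below runs on an invented toy intron rather than on any scorpion sequence. The function names, the default window, and the convention of measuring distance from the first motif nucleotide to the last intron nucleotide are all assumptions, since the text does not define how the distance was counted.

```python
# Sketch of the intron features discussed above, run on an invented toy intron:
# GT/AG boundaries, a TAAT branch-point candidate within a distance window of
# the 3' end, A+T content, and T enrichment downstream of the branch point.


def base_fraction(seq, bases):
    """Fraction of positions in seq occupied by any of the given bases."""
    return sum(seq.count(b) for b in bases) / len(seq)


def branch_point_candidates(intron, motif="TAAT", min_dist=25, max_dist=60):
    """Return (start, distance) for each motif occurrence whose distance to the
    intron 3' end lies in [min_dist, max_dist]. Distance is counted from the
    first motif nucleotide to the last intron nucleotide (an assumption)."""
    hits = []
    start = intron.find(motif)
    while start != -1:
        distance = len(intron) - start
        if min_dist <= distance <= max_dist:
            hits.append((start, distance))
        start = intron.find(motif, start + 1)  # allow overlapping matches
    return hits


upstream = "GTAAGTATAAAATTATAAAAATATA"  # A-rich, begins with the GT donor
# T-rich stretch ending in the AG acceptor.
downstream = "T" * 6 + "C" + "T" * 7 + "A" + "T" * 8 + "C" + "T" * 3 + "CAG"
intron = upstream + "TAAT" + downstream

assert intron.startswith("GT") and intron.endswith("AG")
hits = branch_point_candidates(intron)
bp = hits[0][0]
print(hits)                                  # [(25, 34)]
print(round(base_fraction(intron, "AT"), 2))  # 0.9
print(base_fraction(intron[:bp], "T"), base_fraction(intron[bp + 4:], "T"))  # 0.32 0.8
```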
Conclusions
The results of the present study suggest that all scorpion neurotoxins constitute gene families, with more representatives yet to be isolated. Gene duplication, on the one hand, and the vast diversity generated by the structural permissiveness of scorpion toxins (Vita et al. 1995), on the other, may permit continuous natural mutagenesis and selection of novel neuropharmacological substances for scorpion defense and prey in the course of evolution. The existence of a unique carbon backbone containing a cysteine-stabilized α-helical (CSH) motif in all scorpion neurotoxins could be rationalized on the basis of similar requirements (e.g., solubilization in membranes, binding of identical cofactors, or activity at similar sites), resulting in a variety of polypeptides generated either by evolutionary parallelism or by divergence from a primordial gene. The possibility of common ancestry is supported by the similarity among these toxins at three levels of comparison. (1) Bioactivity: all gene products bind to channels in excitable tissues and modulate their ion conductance (Catterall 1986; DeBin et al. 1993; Miller 1995). (2) Structure: regardless of length and number of S–S bonds, all the toxins share a similar cysteine-stabilized α-helical (CSH) motif, in which a Cys–X–X–X–Cys stretch of the α-helix is bonded through two disulfide bridges to a Cys–X–Cys triplet in a β-strand belonging to an antiparallel β-sheet (Fontecilla-Camps 1989; Kobayashi et al. 1991). (3) Genomic organization: all toxin genes are arranged as two exons and a phase 1 intron splitting a codon at a similar location, i.e., toward the end of the signal sequence and before a putative pro-toxin region. The introns are similar in their consensus junctions, putative branch points, characteristic T-runs, and A+T abundance. Beyond the putative common ancestry, the prominent polymorphism found in each toxin class suggests an extensive process of diversification, which enlarges the arsenal of bioactive compounds that play a major role in the efficient survival of scorpions.
Acknowledgments. This research was supported by Grant IS-248694C from BARD, The United States–Israel Binational Agricultural Research & Development Fund; Grant 891-0112-95 from the Israeli Ministry of Agriculture; and Grant 466/97 from The Israel Academy of Sciences and Humanities.
References
Becerril B, Corona M, Mejia MC, Martin BM, Lucas S, Bolivar F, Possani LD (1993) The genomic region encoding toxin gamma from the scorpion Tityus serrulatus contains an intron. FEBS Lett 335:6–8
Becerril B, Corona M, Coronas FIV, Zamudio F, Calderon-Aranda ES, Fletcher PL, Martin BM, Possani LD (1996) Toxic peptides and genes encoding toxin γ of the Brazilian scorpions Tityus bahiensis and Tityus stigmurus. Biochem J 313:753–760
Berchtold MW (1989) Parvalbumin genes from human and rat are identical in intron/exon organization and contain highly homologous regulatory elements and coding sequences. J Mol Biol 210:417–428
Bougis PE, Rochat H, Smith LA (1989) Precursors of Androctonus australis scorpion neurotoxins: Structures of precursors, processing outcomes, and expression of functional recombinant toxin II. J Biol Chem 264:19259–19265
Catterall WA (1986) Molecular properties of voltage-sensitive sodium channels. Annu Rev Biochem 55:953–985
Corona M, Zurita M, Possani LD, Becerril B (1996) Cloning and characterization of the genomic region encoding toxin IV-5 from the scorpion Tityus serrulatus Lutz and Mello. Toxicon 34:251–256
Couraud F, Jover E (1984) Mechanism of action of scorpion toxins. In: Tu AT (ed) Handbook of natural toxins, Vol 2. Marcel Dekker, New York, pp 659–678
Csank C, Taylor FM, Martindale DW (1990) Nuclear pre-mRNA introns: Analysis and comparison of intron sequences from Tetrahymena thermophila and other eukaryotes. Nucleic Acids Res 18:5133–5141
Darbon H, Zlotkin E, Kopeyan C, Van-Rietschoten J, Rochat H (1982) Covalent structure of the insect toxin of the North African scorpion Androctonus australis Hector. Int J Peptide Prot Res 20:320–330
DeBin JA, Maggio JE, Strichartz GR (1993) Purification and characterization of chlorotoxin, a chloride channel ligand from the venom of the scorpion. Am J Physiol Soc 264:C361–C369
Delabre ML, Pasero P, Marilley M, Bougis PE (1995) Promoter structure and intron-exon organization of a scorpion α-toxin gene. Biochemistry 34:6729–6736
Dufton MJ, Rochat H (1984) Classification of scorpion toxins according to amino acid composition and sequence. J Mol Evol 20:120–127
Fontecilla-Camps JC (1989) Three dimensional model of the insect-directed scorpion toxin from Androctonus australis Hector and its implication for the evolution of scorpion toxins in general. J Mol Evol 29:63–67
Gimenez-Gallego G, Navia MA, Reuben JP, Katz GM, Kaczorowski GJ, Garcia ML (1988) Purification, sequence, and model structure of charybdotoxin, a potent selective inhibitor of calcium-activated potassium channels. Proc Natl Acad Sci USA 85:3329–3333
Gmachl M, Kreil G (1995) The precursors of the bee venom constituents apamin and MCD peptide are encoded by two genes in tandem which share the same 3′-exon. J Biol Chem 270:12704–12708
Gordon D (1996) Molecular biology of voltage-gated ion channels: Structure-function relationship. In: Bittar EE, Bittar N (eds) Principles of medical biology, Vol 7A. JAI Press, Greenwich, CT, pp 245–305
Grunstein M, Hogness D (1975) Colony hybridization: A method for the isolation of cloned DNAs that contain a specific gene. Proc Natl Acad Sci USA 72:3961–3965
Gurevitz M, Urbach D, Zlotkin E, Zilberberg N (1991) Nucleotide sequence and structure analysis of a cDNA encoding an alpha insect toxin from the scorpion Leiurus quinquestriatus hebraeus. Toxicon 29:1270–1272
Harris RB (1989) Processing of pro-hormone precursor proteins. Arch Biochem Biophys 275:315–333
Hradetzky D, Strebhardt K, Rubsamen-Waigmann H (1992) The genomic locus of the human hemopoietic-specific cell protein tyrosine kinase (PTK)-encoding gene (HCK) confirms conservation of exon-intron structure among human PTKs of the src family. Gene 113:275–280
Izard JW, Kendall DA (1994) Signal peptides: Exquisitely designed transport promoters. Mol Microbiol 13:765–773
Kobayashi Y, Takashima H, Tamaoki H, Kyogoku Y, Lambert P, Kuroda H, Chino N, Watanabe TX, Kimura T, Sakakibara S, Moroder L (1991) The cysteine-stabilized α-helix: A common structural motif of ion-channel blocking neurotoxic peptides. Biopolymers 31:1213–1220
Kopeyan C, Mansuelle P, Sampieri F, Brando T, Bahraoui EM, Rochat H, Granier C (1990) Primary structure of scorpion anti-insect toxins isolated from the venom of Leiurus quinquestriatus quinquestriatus. FEBS Lett 261:423–426
Kreil G (1990) Processing of precursors by dipeptidylaminopeptidases: A case of molecular ticketing. TIBS 15:23–26
Legros C, Bougis PE, Martin-Eauclaire MF (1997) Genomic organization of the KTX2 gene, encoding a "short" scorpion toxin active on K+ channels. FEBS Lett 402:45–49
Lin D, Shi Y, Miller WL (1990) Cloning and sequencing of the human adrenodoxin reductase gene. Proc Natl Acad Sci USA 87:8516–8520
Lippens G, Najib J, Wodak SJ, Tartar A (1995) NMR sequential assignments and solution structure of chlorotoxin, a small scorpion toxin that blocks chloride channels. Biochemistry 34:13–21
Malehorn DE, Scott K, Shah DM (1993) Structure and expression of a barley acidic α-glucanase gene. Plant Mol Biol 22:347–360
Martin-Eauclaire MF, Couraud F (1995) Scorpion neurotoxins: Effects and mechanisms. In: Chang LW, Dyer RS (eds) Handbook of neurotoxicology. Marcel Dekker, New York, pp 683–716
Meyer F, Schmidt HJ, Heckmann K (1992) Pheromone 4 gene of Euplotes octocarinatus. Dev Genet 13:16–25
Miller C (1995) The charybdotoxin family of K+ channel-blocking peptides. Neuron 15:5–10
Mount SM, Burks C, Hertz G, Stormo GD, White O, Fields C (1992) Splicing signals in Drosophila: Intron size, information content, and consensus sequences. Nucleic Acids Res 20:4255–4262
Rochat H, Bernard P, Couraud F (1979) Scorpion toxin: Chemistry and mode of action. In: Ceccarelli B, Clementi F (eds) Advances in cytopharmacology, Raven Press, New York, pp 325–334
Sanger F, Nicklen S, Coulson AR (1977) DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 74:5463–5467
Senapathy P, Shapiro MB, Harris NL (1990) Splice junctions, branch point sites, and exons: Sequence statistics, identification, and application to genome project. Methods Enzymol 183:252–278
Simmons CR, Litts JC, Huang N, Rodriguez RL (1992) Structure of a rice α-glucanase gene regulated by ethylene, cytokinin, wounding, salicylic acid and fungal elicitors. Plant Mol Biol 18:33–45
Tugarinov V, Kustanovitz I, Zilberberg N, Gurevitz M, Anglister Y (1997) Solution structures of a highly insecticidal recombinant scorpion α-toxin and a mutant with increased activity. Biochemistry 36:2414–2424
Ventura-Holman T, Jones JC, Ghaffari SH, Lobb CJ (1994) Structure and genomic organization of VH gene segments in the channel catfish: Members of different VH gene families are interspersed and closely linked. Mol Immunol 31:823–832
Vita C, Roumestand C, Toma F, Menez A (1995) Scorpion toxins as natural scaffolds for protein engineering. Proc Natl Acad Sci USA 92:6404–6408
Von-Heijne G (1990) The signal peptide. Membr Biol 115:195–201
Weibaur K, Herrero JJ, Filipowicz W (1988) Nuclear pre-mRNA processing in plants: Distinct modes of 3′-splice-site selection in plants and animals. Mol Cell Biol 8:2042–2051
Zhuang Y, Weiner AM (1986) A compensatory base change in U1 snRNA suppresses a 5′ splice site mutation. Cell 46:827–835
Zilberberg N, Gurevitz M (1993) Rapid isolation of full length cDNA clones by "inverse PCR": Purification of a scorpion cDNA family encoding α-neurotoxins. Anal Biochem 209:203–205
Zilberberg N, Zlotkin E, Gurevitz M (1991) The cDNA sequence of a depressant insect selective neurotoxin from the scorpion Buthotus judaicus. Toxicon 29:1155–1158
Zilberberg N, Zlotkin E, Gurevitz M (1992) Molecular analysis of cDNA and the transcript encoding the depressant insect selective neurotoxin of the scorpion Leiurus quinquestriatus hebraeus. Insect Biochem Mol Biol 22:199–203
Zilberberg N, Gordon D, Pelhate M, Adams ME, Norris T, Zlotkin E, Gurevitz M (1996) Functional expression and genetic alteration of an alpha scorpion neurotoxin. Biochemistry 35:10215–10222
Zilberberg N, Froy O, Cestele S, Loret E, Arad D, Gordon D, Gurevitz M (1997) Identification of structural elements of a scorpion alpha neurotoxin important for receptor-site recognition. J Biol Chem 272:14810–14816
Zlotkin E (1987) Pharmacology of survival: Insect selective neurotoxins from scorpion venom. Endeavour 11:168–174
Zlotkin E, Rochat H, Kopeyan C, Miranda F, Lissitzky S (1971) Purification and properties of the insect toxin from the venom of the scorpion Androctonus australis Hector. Biochimie 53:1073–1078
Zlotkin E, Miranda F, Rochat H (1978) Chemistry and Pharmacology of Buthinae scorpion venoms. In: Bettini S (ed) Arthropod venoms. Springer-Verlag, Berlin, Heidelberg, pp 317–369