The EMBO Journal vol.8 no.7 pp.2015-2021, 1989
Evidence that introns arose at proto-splice sites
N.J.Dibb and A.J.Newman
positions within the coding sequence of the tubulin genes. Individual genes encode 450 amino acids and have only a subset of introns, usually three. A gene from Aspergillus nidulans, with eight introns, has the most. There are no introns that are present in every gene. The positions of introns within the actin gene family are shown in Figure 2. Introns that occur outside the coding sequence are not shown. There are some tubulin and actin genes that have no introns (Bergsma et al., 1985; Kimmel et al., 1985; Barahona et al., 1988). These are not included in Figures 1 and 2 because their omission clarified but did not invalidate our analysis. Figure 3 illustrates that the distribution of intron 126-0 of the α-tubulins can be explained by either one intron gain (upper panel) or by many intron losses (lower panel). In Figures 1 and 2 we have indicated with the letter G the other tubulin and actin introns whose distribution can also be explained by a single intron gain by an ancestral gene. The distribution of introns at 54 out of 60 different tubulin and actin intron positions can be explained by this route. The distribution of six of the 60 introns cannot be explained by a single gain event. These are indicated by the letter A for anomalous in Figures 1 and 2. For example, the distribution of intron 152-1 of the actin gene family can only be explained by two separate intron gains at identical positions in an ancestor of the two plant species and an ancestor of the vertebrates (see Figure 2). Alternatively, the distribution of intron 152-1 could have resulted from multiple intron loss (Zakut et al., 1982). This possibility led to the proposal that the entire distribution of introns within gene families is attributable to intron loss (Gilbert et al., 1986).
Laboratory of Molecular Biology, Medical Research Council Centre, Hills Road, Cambridge CB2 2QH, UK
Communicated by M.Perutz
The unexpected discovery of introns raised many questions about gene evolution. We provide evidence that actin and tubulin introns were gained between the G and R of the conserved coding sequence C/AAGR that is known to flank introns in general and which we call a proto-splice site. We conclude that the tubulin and actin introns are less ancient than the coding sequence and so could not have been involved in the primary evolution of the tubulin and actin genes.
Key words: introns/proto-splice sites/tubulin/actin
Introduction
Over ten years ago it was discovered that the coding sequence of eukaryotic genes is often interrupted by non-coding segments (intervening sequences or introns). There are two theories about the origin of introns. One theory holds that introns are as ancient as genes. Therefore the prokaryotes, which generally lack introns, must have lost them (Darnell and Doolittle, 1986; Gilbert et al., 1986). The second theory holds that introns are of more recent origin and were inserted into eukaryotic genes after the divergence of the prokaryote and eukaryote lineages (Cavalier-Smith, 1985).
Introns are often found between protein structural domains. Some authors feel that such a distribution could not have resulted from random intron insertion. Gilbert (1978) proposed that this situation resulted from the assortment of exons by recombination between introns. This begs the question of where the introns came from. Darnell and Doolittle (1986) argue that introns were always there. However, there are many genes that show no obvious correlation between intron position and protein structure. It has been suggested that their introns were inserted during eukaryotic evolution (Cavalier-Smith, 1985). In support of this, there is evidence of intron gain during the evolution of the serine proteases (Patthy, 1985; Rogers, 1985). There is also evidence that the regular distribution of introns in some genes is a result of intron gain at preferred locations (Rogers, 1986; Wilson et al., 1988).
We think that the antiquity of introns can be revealed by looking at their distribution in highly conserved genes that are present in a wide range of eukaryotes.
Results and discussion
Figure 1 shows the intron positions of 28 tubulin genes from a range of eukaryotes. The intron positions are described by both codon and phase. There are introns at 35 different
©IRL Press
Sites of intron gain or intron loss
It is possible to deduce the probable coding sequence at which actin and tubulin introns were either gained or lost during evolution. For example, in Table Ia we have listed the sequences of the 15 α-tubulin genes of Figure 1 in the region of intron 211-1. Only sequence 14 actually contains an intron (see Figure 1). At the bottom of this table we have drawn a consensus sequence. This represents the sequence which we can say with the most certainty was the site for the gain or loss of intron 211-1. We have repeated this process for the other introns of the actin and tubulin gene families. These consensus sequences are listed in Table II. These show all that we can reasonably deduce about the sequences at which the introns of Figures 1 and 2 were gained or lost. We then calculated an overall consensus sequence from the 60 consensus sequences that are listed in Table II. Table Ib is a summation of these data. Table Ic expresses Table Ib as a percentage. A consensus sequence is written underneath. It can be thought of as the sequence at which actin and tubulin introns are most likely to have been gained or lost during evolution. We call this sequence a proto-splice site. The proto-splice site is virtually identical to the exon consensus sequence that is known to flank introns (compare
[Figure 1: matrix of intron presence (+) and absence (-), with intron positions as columns and the tubulin genes listed below as rows. A bottom row marks each of the 35 intron positions G (33 positions) or A (2 positions).]

α-Tubulins
Vertebrate: 1 Homo sapiens; 2 Macaca fascicularis; 3 Rattus norvegicus; 4-7 Gallus gallus
Non-vertebrate: 8-10 Drosophila melanogaster; 11 Physarum polycephalum; 12 Schizosaccharomyces pombe; 13 Chlamydomonas reinhardii; 14 Volvox carteri; 15 Toxoplasma gondii

β-Tubulins
Vertebrate: 1-3 Homo sapiens; 4, 5 Gallus gallus
Fungal: 6, 7 Aspergillus nidulans; 8 Neurospora crassa; 9 Schizosaccharomyces pombe; 10 Candida albicans
Plant: 11 Chlamydomonas reinhardii; 12 Volvox carteri
Protozoan: 13 Toxoplasma gondii

Fig. 1. The distribution of introns within the tubulin gene family. The presence of an intron is shown by +, its absence by -. NC, not complete. The intron positions are described by both phase and codon. Codons followed by -1 or -2 are split by an intron after the first and second base respectively. A codon followed by -0 has an intron between it and the previous codon. The codon positions of both α- and β-tubulins are with reference to the human α-tubulin gene. The distribution of introns at each intron position is described by G or A, which stand for single intron gain or anomalous intron distribution (see text). α-Tubulin references: 1, Hall and Cowan (1985); 2, Dobner et al. (1987); 3, Lemischka and Sharp (1982); 4-7, Pratt and Cleveland (1988); 8-10, Theurkauf et al. (1986); 11, Monteiro and Cox (1987); 12, Toda et al. (1984); 13, Silflow et al. (1985); 14, Mages et al. (1988); 15, Nagel and Boothroyd (1988). β-Tubulin references: 1, Lewis et al. (1985); 2, Lee et al. (1984); 3, Lee et al. (1983); 4, Sullivan and Cleveland (1984); 5, Sullivan et al. (1986); 6,7, May et al. (1987); 8, Orbach et al. (1986); 9, Hiraoka et al. (1984); 10, Smith et al. (1988); 11, Youngblom et al. (1984); 12, Harper and Mages (1988); 13, Nagel and Boothroyd (1988).
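The codon-and-phase labels described in the caption of Figure 1 can be turned into a base offset within the coding sequence. The following is a minimal sketch of that conversion; the function name `bases_before_intron` is hypothetical and is not part of the original analysis.

```python
def bases_before_intron(position: str) -> int:
    """Number of coding bases that precede an intron labelled 'codon-phase'.

    Phase 0: the intron lies between this codon and the previous one.
    Phase 1 or 2: the intron follows the first or second base of the codon.
    """
    codon_text, phase_text = position.split("-")
    codon, phase = int(codon_text), int(phase_text)
    if codon < 1 or phase not in (0, 1, 2):
        raise ValueError(f"not a codon-phase label: {position!r}")
    # Every complete codon before this one contributes three bases.
    return 3 * (codon - 1) + phase


for label in ("90-0", "90-1", "211-1"):
    print(label, bases_before_intron(label))
```

On this reading, labels such as 90-0 and 90-1 differ by a single base, which is the situation discussed in the text for closely spaced introns.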
Table Ic and d). It is improbable that such similarity occurred by chance. We stress that the elimination, from our analysis, of actin and tubulin sequences that actually do flank introns does not alter this result. This means that the conserved bases that flank tubulin and actin introns are also found at the equivalent position in members of these gene families that do not have an intron at this site. This result is not unexpected. Presumably the splice site sequences that flank all introns have both a coding and a splicing function. Consequently, these sequences can be maintained in the absence of an intron because of coding constraints. This analysis does not say whether introns have been lost or gained. Proto-splice sites could be the remnants of intron loss. Alternatively, introns could have been gained between the G and R of pre-existing proto-splice sites. However, this analysis does show that if introns were gained then they must have been gained, on average, at similar sequences. Consequently, it would not be surprising if introns were occasionally gained at identical positions by different members of a gene family.
Intron gain or intron loss?
The question of whether introns were gained or lost at proto-splice sites during evolution can be addressed by examining the distribution of introns within gene families (Rogers, 1985; Wilson et al., 1988). The α- and β-tubulins show ~40% conservation of identical amino acids. It is generally believed that they arose by the duplication of a common ancestral gene. There are introns at 17 different positions within the α-tubulin gene family and at 19 different positions within the β-tubulin gene family. There is only one intron position common to both α- and β-tubulin genes (see Figure 1). The simplest explanation is that the introns were not present in a common ancestral gene but were gained later, after the divergence of the α- and β-tubulin gene families. Seventeen introns were gained by the α-tubulin gene family, 19 by the β-tubulin gene family. Two introns were gained independently at position 21-2. Intron gain is also supported by the distribution of introns within each of the α- and β-tubulin gene families. Figure
Intron gain at proto-splice sites
[Figure 2: matrix of intron presence (+) and absence (-), with intron positions as columns and the actin genes listed below as rows. A bottom row marks each of the 25 intron positions G (21 positions) or A (4 positions).]

Vertebrate α-actins: 1 Homo sapiens; 2 Rattus norvegicus; 3 Mus musculus; 4, 5 Gallus gallus; 6 Xenopus laevis
Vertebrate β-actins: 7 Homo sapiens; 8 Rattus norvegicus; 9 Gallus gallus
Non-vertebrate actins: 10, 11 Strongylocentrotus purpuratus; 12, 13 Drosophila melanogaster; 14 Bombyx mori; 15 Caenorhabditis elegans; 16 Acanthamoeba castellanii; 17, 18 Physarum polycephalum; 19 Aspergillus nidulans; 20 Thermomyces lanuginosus; 21 Saccharomyces cerevisiae; 22 Glycine max; 23 Arabidopsis thaliana

Fig. 2. The distribution of introns within the actin gene family. See Figure 1 for details. The codon positions are with reference to actin gene 1 (human α-actin). Actin references: 1, Ueyama et al. (1984); 2, Zakut et al. (1982); 3, Hu et al. (1986); 4, Fornwald et al. (1982); 5, Chang et al. (1985); 6, Mohun et al. (1986); 7, Nakajima-Iijima et al. (1985); 8, Nudel et al. (1983); 9, Kost et al. (1983); 10, Crain et al. (1987); 11, Cooper and Crain (1982); 12,13, Fryberg et al. (1981); 14, Mounier and Prudhomme (1986); 15, Files et al. (1983); 16, Nellen and Gallwitz (1982); 17, Nader et al. (1986); 18, Gonzalez-y-Merchand and Cox (1988); 19, Fidel et al. (1988); 20, Wildeman (1988); 21, Gallwitz and Sures (1980); 22, Shah et al. (1982); 23, Nairn et al. (1988).
1 illustrates that the introns of vertebrate, fungal and plant β-tubulin genes are conserved within each group. Between groups all intron positions are different. Again this suggests that these introns were gained after the separation of the vertebrate, fungal and plant lineages of the β-tubulins. Similar conclusions can be made for the vertebrate and non-vertebrate α-tubulins which share only one intron. Intron gain is further supported by the distribution of individual tubulin introns: 33 out of 35 can be attributed to single intron gains. Similarly, 21 out of 25 actin introns can be clearly attributed to descent from single intron gains. There are only two tubulin introns whose distribution cannot be explained by a single intron gain alone. We suggest that tubulin introns 21-2 and 177-0 were gained on two separate occasions during evolution. It is also possible that some anomalous intron distributions were generated by intron loss or intron shifts following intron gain. An alternative explanation is that all of the tubulin and actin introns are very ancient (Darnell and Doolittle, 1986; Gilbert et al., 1986). According to this model the ancestral tubulin gene, for example, had at least 35 introns that were then differentially lost to produce the observed distribution of introns between and within the α- and β-tubulin gene families. It seems highly unlikely that random loss from only 35 introns could result in two sets of 17 α- and 19 β-tubulin introns with only one intron in common. Consequently, the ancestral gene would have needed to contain a large number of additional introns that were lost without trace. The same problem is encountered in a more severe form if an attempt is made to explain the distribution of individual
tubulin or actin introns by loss. It must be explained why nearly all of these introns look as if they arose from single intron gains (see Figure 3). Why should the loss of introns result in such a pattern? After all, intron loss could just as well generate a pattern of introns that cannot be explained by a single intron gain. It might be argued that intron loss during evolution was so vigorous that the small percentage of survivors happen to look as if they were the result of single intron gains. Again, this implies that the ancestral tubulin or actin gene had an unrealistically large number of introns, the majority of which were lost entirely during evolution. Another problem with an intron loss model is that all eukaryotes, not just those with small genomes, must have undergone massive loss of introns during evolution. In apparent contradiction to this, introns are extremely long lived, as shown by their conservation within (but not between) all of the evolutionary groups of Figures 1 and 2.
Intron shifts
There is an additional and independent reason for believing that many of the actin and tubulin introns were gained during evolution. Some of these introns are separated by only one or just a few base pairs. There are sound reasons for believing that differences in the position of actin and tubulin introns cannot have resulted from intron shifts (see below). Therefore it is either accepted that introns, such as 90-0 and 90-1 of tubulin, were gained independently or that they originally resided within an ancestral gene only 1 bp apart, which seems absurd. These arguments have been made before (Rogers, 1986; Patthy, 1987). They are based on the realization that an
Table I.

(a) Sequences 9 bp on either side of intron position 211-1 (marked ^)

No.        -9..-1        +1..+9
 1         CC ATC TAT G ^ AC ATC TGC A
 2         CT ATA TAT G ^ AC ATC TGT T
 3         CC ATC TAC G ^ AC ATC TGC C
 4         CC ATC TAT G ^ AC ATT TGC C
 5         CC ATC TAT G ^ AC ATC TGT C
 6         CA ATC TAT G ^ AC ATC TGC C
 7         CC ATC TAT G ^ AC ATC TGT C
 8         CT ATC TAC G ^ AC ATC TGC C
 9         CA ATC TAC G ^ AC ATC TGC C
10         CC ATC TAT G ^ AT ATA TGT A
11         CC ATT TAC G ^ AC CTC TGC A
12         CC ATC TAC G ^ AC ATT TGC C
13         CT TGT TAC G ^ AT ATC TGT C
14         CC ATC TAC G ^ AT ATC TGC C
15         CC ATC TAC G ^ AC ATC TGC C

Consensus  C N A T CT T A TC G ^ A CT A T TC T G CT CA

(b)

Position    T     C     G     A     N
  -9       15    15½   13½   10     6
  -8       12    10    15½   10½   12
  -7        8½   12    14½   12    13
  -6       12½   14½   14½   14½    4
  -5       14    14     4    18    10
  -4       10½   13    10    13½   13
  -3        7    18½    6    21½    7
  -2       15     8½    5½   22     9
  -1        3½    4    34     8½   10
  +1        3½    5    27½   16     8
  +2       16    15½   10     5½   13
  +3       10    11     9    13    17
  +4       15     8½   17½   14     5
  +5        8½   11    18½   14     8
  +6       10    11    15½    8½   15
  +7       11    12½   14    13½    9
  +8       13    10½   12½   12    12
  +9       10     9½   17½    7    16

(c)

Position    T     C     G     A    Consensus
  -9       28    29    25    18    N
  -8       25    21    32    22    N
  -7       18    25    31    25    N
  -6       22    26    26    26    N
  -5       28    28     8    36    G*
  -4       31    28    21    29    N
  -3       13    35    11    41    AC
  -2       29    17    11    43    A
  -1        7     8    68    17    G
  +1        7    10    53    31    R
  +2       34    33    21    12    N
  +3       23    26    21    30    N
  +4       27    15    32    25    N
  +5       16    21    36    27    N
  +6       22    24    34    19    N
  +7       22    24    27    26    N
  +8       27    22    26    25    N
  +9       23    22    40    16    G

(d)

Position    T     C     G     A    Consensus
  -9       11    29    23    37    N
  -8       24    26    24    26    N
  -7       21    23    28    28    N
  -6       20    29    25    26    N
  -5       25    32    17    27    N
  -4       17    24    26    33    N
  -3       12    41    15    33    CA
  -2       14    16    16    55    A
  -1        7     5    79     9    G
  +1       10    12    50    28    R
  +2       33    23    22    22    N
  +3       27    28    17    28    N
  +4       20    29    28    24    N
  +5       30    34    17    19    N
  +6       24    32    23    21    N
  +7       15    22    31    31    N
  +8       23    29    24    24    N
  +9       27    26    25    21    N
(a) A list of the sequences of the 15 α-tubulin genes of Figure 1, 9 bp on either side of intron position 211-1. Only sequence 14 has an intron that falls between the G and A of the codon that encodes aspartic acid. At the bottom of this table is a consensus sequence. This was made using the following rules. One, a single base is written if it occurs in at least 13/15 positions. Two, if two bases occur in at least 14/15 positions this is shown. If the distribution of two bases is 2:13, both are shown. All other distributions are shown by the letter N. A similar level of stringency was used in the construction of the β-tubulin and α-tubulin consensus sequences that are listed in Table II. (b) A summation of the data listed in Table II. It shows the total number of conserved and non-conserved bases 9 bp on either side of positions at which introns occur in the actin and tubulin gene families. Where there was a choice of two conserved bases each was given the value of 1/2. (c) The data in (b) expressed as a percentage. This was calculated from the formula 100B/(60 - N), where B is the number of each conserved base, as shown in Table Ib, and N is the number of non-conserved bases at the same position. (d) Data taken from Ohshima and Gotoh (1987) listing the percentage of bases that occur 9 bp on either side of 220 vertebrate introns. Consensus sequences are written at the bottom of Table Ic and d. An asterisk after a base indicates that it tends to be excluded.
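The consensus rules and the percentage formula in this legend can be written out as a short sketch. The function names below are hypothetical, the thresholds are hard-coded for 15 sequences as stated in the legend, and the order of the two letters in a two-base symbol is chosen here by frequency, which is an assumption (the legend does not specify an order).

```python
from collections import Counter


def column_consensus(column: str) -> str:
    """Consensus symbol for one alignment column of 15 bases (rules of Table Ia)."""
    if len(column) != 15:
        raise ValueError("the thresholds below are stated for 15 sequences")
    ranked = Counter(column).most_common()
    top_base, top = ranked[0]
    second_base, second = ranked[1] if len(ranked) > 1 else ("", 0)
    # Rule one: a single base in at least 13/15, unless the split is exactly 13:2.
    if top >= 13 and (top, second) != (13, 2):
        return top_base
    # Rule two (and the 13:2 case): two bases that together cover at least 14/15.
    if top + second >= 14:
        return top_base + second_base
    return "N"


def percentages(t, c, g, a, n, total=60):
    """Table Ic from one row of Table Ib: 100B/(total - N), rounded."""
    conserved = total - n
    return [round(100 * b / conserved) for b in (t, c, g, a)]


# Columns -9, -8 and +9 of Table Ia, read down sequences 1-15.
print(column_consensus("C" * 15))            # position -9
print(column_consensus("CTCCCACTACCCTCC"))   # position -8
print(column_consensus("ATCCCCCCCAACCCC"))   # position +9

# Row -1 of Table Ib (T, C, G, A, N).
print(percentages(3.5, 4, 34, 8.5, 10))
```

With these inputs the sketch gives C, N and CA for the three columns and 7, 8, 68, 17 for position -1, in agreement with the printed consensus and with Table Ic.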
intron shift by, for example, 1 bp requires the simultaneous and compensatory movement of both splice sites by the same amount. This is considered to be a highly improbable event (Rogers, 1986). In support of this, Patthy (1987) reports that intron shifts are restricted to those that move a single splice site by an amount that does not alter the reading frame. Naturally, these result in the addition or deletion of peptides. Because tubulin and actin genes are highly conserved it follows that intron shifts would almost certainly be deleterious.
The fact that most tubulin and actin intron distributions are not anomalous indicates that intron shifts are restricted. Intron shifts would be expected to increase the number of anomalous intron distributions. The actin and tubulin introns appear to have been gained during eukaryotic evolution, as judged by their distribution within gene families. Some introns, such as those of the triosephosphate isomerases (Gilbert et al., 1986) may be older than this because they are probably common to nearly all eukaryotic species. It is possible that highly conserved
Table II.
Intron positions of the 60 consensus sequences, numbered 1-60 in the order listed:
α-Tubulins (1-17): 2-0, 4-1, 16-0, 19-1, 33-0, 41-0, 62-0, 76-1, 90-0, 90-1, 126-0, 177-0, 211-1, 257-0, 327-0, 353-0, 412-1
β-Tubulins (18-35): 5-0, 9-0, 13-0, 17-0, 20-0, 21-2, 35-2, 56-0, 58-1, 59-1, 95-1, 134-1, 208-0, 319-2, 351-1, 407-1, 437-0, 448-2
Actins (36-60): 3-1, 4-1, 15-1, 15-2, 21-0, 27-1, 34-1, 44-0, 44-2, 65-1, 87-0, 95-1, 108-0, 118-2, 124-0, 152-1, 206-1, 216-0, 269-1, 301-0, 309-1, 322-0, 330-0, 356-0, 366-0
-9 N T G T N AT G
-8 N G G N T CG G N T CT G C N A T G N G A G A G AG N TA GA CT CT CT A GA CA TC A G N G CT G T
-7 -6 AC A CA G TC GA G G N G N G N A C G T T CT C T C A C CGA GT N C C A T G GAT G N C G N T T G G CA TG G C CT N G N C A C C N G G TC AC N CA C N T CT A A G C C N CG T GA T GC CT T G CT A N C CT C A A C T GC C GA T N A GA N A T T TG N TG G A CT A C AC G TG C N G G C CT G T CA G N C N C A CT GC C A A A T G G T C T A GA TC N C GA GT C CT A T G A G N A C CT N A C G A CT A N TG T C AG N N G T G T AT A C N G A A G A T T C C T A CT G
-5 N N T CT A A A N A CT A C CT T C T N A T A A C T G A CT N N N A N CT C A GT G N C A GT N CG A CA GA A CT T A C C C T CT T CT C T A A
-4 -3 -2 N A T G A GA CT C A A A TC CT N C N A A GA C A AG T N CT C C C C N G N T N C A T A TC C C A C A A C A A G A GA GA A T N C A GA T G GA AG T TC AG AC T CT T G N AG CT G A N CG N CT GC TC T TA CT C A GA CT G A T N T A A TC A CT T C C A N N TG TG N N G A N A A TC A TC G N A A G A CT T T C CT CA AC AC N G AC G AT GA A A A A TC N CT T A C CA N C A A C N N A C N A A N N C N CT GA
T CA C CT A C TC G
N T N T A A A
-1 G N GA G N G CT G N G GC GA G GA GA G G CT GA CT CT N G N GA CT N G G
GA T GA G GA A
G G G G GA G C N G G
GA G N G GA G G GA G N G N G GA GA
CA GT AG C N A G A G A G G A
+2 G N T C C C T T A G C T CT C
N G A A A N A G C G G T G G A A N
GAT N A T G N A T C C T T CT T G G C GA CTT N CAG C N G AG G G A GAA A G T CT AGT G T TC A T CT G T TCG A GA CAG A GAA A GAC N A T G G A C N G G N G N A T TC G C C N C C N N C T A T TC G T G T G G A CG TCG G N C
+3 +4 +5 +6 +7 +8 +9 N G A GA N GT N AG T CT T C N AG N G G CT A A TC T G CT G G G CTG G N AGN N C AGT N G G N GAC C N CAG N G A GA GAT N C G C A GA CAT CG C A GA CAT CGA N G A CT CAA N N T C N A C N A T CT T G CT CA A N A A CT CT T G G A TC G T N A A CT N TG N G G N A T CT G N A T G G A G G G T N C A CT N T N GA CG CTG G N C A GA G G CT A A C C A GA G G N G C TC AG CAN T T CTT G G GCA GA G CGA GA GA TCN ACT N T A CTN AT TCG G G C CTN CGN CT GC TC GC TCN GAN AC AGN T GAN AC AGN T A CTN G N C A GA TA CG CTG G N T T C C A GAN G C N CT T N T A CT C CAG N G G CT CA GA T N ACA GAN N N G G T A CT A C N G GC G A N G C CTN CGN N G N N G A GT G AG A N G A N GA ATN G A N GA ATN G N N GA G N T C N G G N AC N T C N G G N ACT G C N G G N T T CT A TC G C CT C C N CA C N T C N A T CT G G G N GAT N A T G +1
G N A T C A T N N
T A CT G N T TC N G T GC GA TG G TC TC G GC N C N T C C
N
Sixty consensus sequences that were made from the coding regions 9 bp on either side of the indicated intron positions. See Table I for details. Sequences 1-17 are from α-tubulins, 18-35 are from β-tubulins, 36-60 are from actins. Each consensus sequence is aligned about the position of an intron. Introns occur in only some of the sequences from which each consensus sequence was derived. The consensus sequence is unaltered by the exclusion of sequences that actually flank introns. The intron sequence, which is not shown, occurs between bases -1 and +1. The α- and β-tubulin gene consensus sequences were compiled from 15 and 13 sequences respectively. The consensus sequences for the actin regions were compiled from 18 actin gene sequences. Bases that are highly conserved across the entire spectrum of genes in each family are indicated by their single-letter code. Often two bases are highly conserved as shown. Where there is less conservation an N is written. This happens most often at the third base of codons.
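[Figure 3, upper and lower panels: trees descending from an ancestral tubulin gene to the β-tubulins and to the α-tubulins of man, monkey, rat, chicken, fruit-fly, Physarum, S.pombe and Chlamydomonas, with + and - marking the presence and absence of intron 126-0.]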
introns correspond to junctions between structural or functional domains. Our results raise the possibility that the conserved sequence that flanks introns in general (Mount, 1982) has been used as a site of intron gain during evolution. Cavalier-Smith's (1985) proposal for the origin of introns
is compatible with this idea. In this model, proto-splice sites might have acted as integration sites for self-splicing introns that were inserted during evolution. However, our results may also support other models for the origin of introns. Sharp's (1985) proposal that introns could be spread by the splicing machinery is also consistent with our results. It seems possible that a proto-splice site could be cleaved by the same mechanism responsible for 5' splice site cleavage (Aebi et al., 1987; Weber and Aebi, 1988). An intron from elsewhere could then be ligated at the site of cleavage and incorporated into the genome as previously described (Sharp, 1985).
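As an illustration of what a proto-splice site means in practice, the consensus C/AAGR can be searched for in a coding sequence, reporting the point between the G and the R where an intron would lie. This is only a sketch: the function name and the example sequence are invented for illustration, and no claim is made about how the original consensus analysis was carried out.

```python
import re

# C/AAGR written as a regular expression: C or A, then A, G, then a purine (G or A).
# The lookahead lets overlapping candidate sites be reported as well.
PROTO_SPLICE = re.compile(r"(?=[CA]AG[GA])")


def proto_splice_points(sequence: str) -> list[int]:
    """Offsets (bases preceding the site) that fall between the G and R of C/AAGR."""
    return [m.start() + 3 for m in PROTO_SPLICE.finditer(sequence.upper())]


# Invented 15-base example, not taken from any tubulin or actin gene.
example = "ATGCAGGTTAAGACC"
for point in proto_splice_points(example):
    print(point, example[:point] + "^" + example[point:])
```

Each reported offset counts the bases that precede the candidate insertion point, so it can be compared directly with positions expressed as codon and phase.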
Acknowledgements
We wish to thank C.Chothia, J.Fermi, J.Girdlestone, A.Lesk, M.Perutz, G.Pieczenik, J.Rogers, R.Staden, A.Travers, R.Treisman and D.Zimmern for their help. This work was supported by the Medical Research Council.
References
Fig. 3. Two different origins of the α-tubulin intron 126-0. The - signs indicate that this intron is absent from the β-tubulins and most of the α-tubulin genes except for those of the vertebrates which have a + sign. According to an intron gain model, shown in the upper panel, intron 126-0 (represented by a triangle) was gained by an ancestor of the vertebrate α-tubulin genes after the divergence of the vertebrate and fruit-fly lineages. According to an intron loss model, shown in the lower panel, intron 126-0 was originally present in a common ancestral gene of the α- and β-tubulins. It was then lost from all lineages except for the vertebrate α-tubulins. As illustrated, this requires intron loss from all lineages that do not have the intron.
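The comparison drawn in Figure 3 can be sketched as a count of events on a tree. The tree below is a hypothetical simplification (the β-tubulins are collapsed to a single leaf and the non-animal α-tubulins branch off independently); the topology and all names are assumptions made for illustration, not data from the figure.

```python
def leaf_names(node):
    """All leaf names under a node; a leaf is a string, a clade is a tuple."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaf_names(child)]


def single_gain_explains(node, carriers):
    """True if the intron-bearing genes are exactly the leaves of one clade."""
    if set(leaf_names(node)) == set(carriers):
        return True
    if isinstance(node, str):
        return False
    return any(single_gain_explains(child, carriers) for child in node)


def min_losses(node, carriers):
    """Fewest losses needed if the root had the intron and it is never regained."""
    if not any(name in carriers for name in leaf_names(node)):
        return 1  # one loss on the branch leading to this intron-free clade
    if isinstance(node, str):
        return 0
    return sum(min_losses(child, carriers) for child in node)


# Hypothetical topology loosely following the labels of Figure 3.
vertebrates = ("man", "monkey", "rat", "chicken")
animals = (vertebrates, "fruit-fly")
alpha = (animals, "Physarum", "S.pombe", "Chlamydomonas")
root = (alpha, "beta-tubulins")
carriers = {"man", "monkey", "rat", "chicken"}  # genes with intron 126-0

print(single_gain_explains(root, carriers))  # one gain suffices
print(min_losses(root, carriers))            # losses needed under a loss-only model
```

Under these assumptions a single gain on the vertebrate branch accounts for the pattern, whereas the loss-only model needs a separate loss on every intron-free lineage (five on this toy tree). A distribution labelled A in Figures 1 and 2 corresponds to `single_gain_explains` returning False.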
introns have always been an integral part of the gene structure (Darnell and Doolittle, 1986). Alternatively, these introns could have been gained prior to gene divergence.
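The earlier statement that random loss from only 35 ancestral tubulin introns is unlikely to leave 17 α- and 19 β-tubulin positions with a single one in common can be given a rough number. The sketch below assumes, purely for illustration, that each family retains a uniformly random subset of the 35 positions, independently of the other; this model is not stated in the text, and the function name is hypothetical.

```python
from math import comb


def prob_shared(total: int, kept_a: int, kept_b: int, shared: int) -> float:
    """Hypergeometric probability that two random subsets share exactly `shared` positions."""
    ways = comb(kept_a, shared) * comb(total - kept_a, kept_b - shared)
    return ways / comb(total, kept_b)


p_one = prob_shared(35, 17, 19, 1)
print(f"P(exactly one shared position) = {p_one:.2e}")

# Because 17 + 19 exceeds 35, at least one position must be shared,
# so a single shared position is the smallest overlap this model allows.
print(prob_shared(35, 17, 19, 0))
```

Under this simplified model the observed overlap has a probability of the order of a few in a billion, which is in line with the conclusion that an ancestral gene with only 35 introns is implausible.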
Conclusions
Our analysis of the tubulin and actin gene families has revealed two independent but related results. One, if tubulin and actin introns were gained during evolution they must have been gained, on average, between the G and R of the consensus sequence C/AAGR that flanks most introns. We call this consensus sequence a proto-splice site. Two, the distribution of introns within these gene families indicates that they were indeed gained during eukaryotic evolution. We therefore conclude that introns were gained at proto-splice sites. This means that actin and tubulin introns are not as ancient as the coding sequence that flanks them. Consequently, these introns could not have been involved in the primary evolution of the tubulin and actin genes. It will be intriguing to discover if any of the tubulin or actin
Aebi,M., Hornig,H. and Weismann,C. (1987) Cell, 50, 237-246.
Barahona,I., Soares,H., Cyrne,L., Penque,D., Denoulet,P. and Rodrigues-Pousada,C. (1988) J. Mol. Biol., 202, 365-382.
Bergsma,D.J., Chang,K.S. and Schwartz,R.J. (1985) Mol. Cell. Biol., 5, 1151-1162.
Cavalier-Smith,T. (1985) Nature, 315, 283-284.
Chang,K.S., Rothblum,K.N. and Schwartz,R.J. (1985) Nucleic Acids Res., 13, 1223-1237.
Cooper,A.D. and Crain,W.R. (1982) Nucleic Acids Res., 10, 4081-4092.
Crain,W.R., Boshar,M.F., Cooper,A.D., Durica,D.S., Nagy,A. and Steffen,D. (1987) J. Mol. Evol., 25, 37-45.
Darnell,J.E. and Doolittle,W.F. (1986) Proc. Natl. Acad. Sci. USA, 83, 1271-1275.
Dobner,P.R., Kislauskis,E., Wentworth,B.M. and Vila-Komaroff,L. (1987) Nucleic Acids Res., 15, 199-218.
Fidel,S., Doonan,J.H. and Morris,N.R. (1988) Gene, 70, 283-293.
Files,J.G., Carr,S. and Hirsch,D. (1983) J. Mol. Biol., 164, 355-375.
Fornwald,J.A., Kuncio,G., Peng,I. and Ordahl,C.P. (1982) Nucleic Acids Res., 10, 3861-3876.
Fryberg,E.A., Bond,B.J., Hershey,N.D., Mixter,K.S. and Davidson,N. (1981) Cell, 24, 107-116.
Gallwitz,D. and Sures,I. (1980) Proc. Natl. Acad. Sci. USA, 77, 2546-2550.
Gilbert,W. (1978) Nature, 271, 501.
Gilbert,W., Marchionni,M. and McKnight,G. (1986) Cell, 46, 151-154.
Gonzalez-y-Merchand,J.A. and Cox,R.A. (1988) J. Mol. Biol., 202, 161-168.
Hall,J.L. and Cowan,N.J. (1985) Nucleic Acids Res., 13, 207-223.
Harper,J.F. and Mages,W. (1988) Mol. Gen. Genet., 213, 315-324.
Hiraoka,Y., Toda,T. and Yanagida,M. (1984) Cell, 39, 349-358.
Hu,M.C.-T., Sharp,S.B. and Davidson,N. (1986) Mol. Cell. Biol., 6, 15-25.
Kimmel,B.E., Samson,S., Wu,J., Hirschberg,R. and Yarbrough,L.R. (1985) Gene, 35, 237-248.
Kost,T.A., Theodorakis,N. and Hughes,S.H. (1983) Nucleic Acids Res., 11, 8287-8301.
Lee,M.G.-S., Lewis,S.A., Wilde,C.D. and Cowan,N.J. (1983) Cell, 33, 477-487.
Lee,M.G.-S., Loomis,C. and Cowan,N.J. (1984) Nucleic Acids Res., 12, 5823-5836.
Lemischka,I. and Sharp,P.A. (1982) Nature, 300, 330-335.
Lewis,S.A., Gilmartin,M.E., Hall,J.L. and Cowan,N.J. (1985) J. Mol. Biol., 182, 11-20.
Mages,W., Salbaum,J.M., Harper,J.F. and Schmitt,R. (1988) Mol. Gen. Genet., 213, 449-458.
May,G.S., Tsang,M.L.-S., Smith,H., Fidel,S. and Morris,N.R. (1987) Gene, 55, 231-243.
Mohun,T.J., Garrett,N. and Gurdon,J.B. (1986) EMBO J., 5, 3185-3193.
Monteiro,M.J. and Cox,R.A. (1987) J. Mol. Biol., 193, 427-438.
Mounier,N. and Prudhomme,J.-C. (1986) Biochimie, 68, 1053-1061.
Mount,S.M. (1982) Nucleic Acids Res., 10, 459-472.
Nader,W.F., Isenberg,G. and Sauer,H.W. (1986) Gene, 48, 133-144.
Nagel,S.D. and Boothroyd,J.C. (1988) Mol. Biochem. Parasitol., 29, 261-273.
Nairn,C.J., Winesett,L. and Ferl,R.J. (1988) Gene, 65, 247-257.
Nakajima-Iijima,S., Hamada,H., Reddy,P. and Kakunaga,T. (1985) Proc. Natl. Acad. Sci. USA, 82, 6133-6137.
Nellen,W. and Gallwitz,D. (1982) J. Mol. Biol., 159, 11-18.
Nudel,U., Zakut,R., Shani,M., Neuman,S., Levy,Z. and Yaffe,D. (1983) Nucleic Acids Res., 11, 1759-1771.
Ohshima,Y. and Gotoh,Y. (1987) J. Mol. Biol., 195, 247-259.
Orbach,M.J., Porro,E.B. and Yanofsky,C. (1986) Mol. Cell. Biol., 6, 2452-2461.
Patthy,L. (1985) Cell, 41, 657-663.
Patthy,L. (1987) FEBS Lett., 214, 1-7.
Pratt,L.F. and Cleveland,D.W. (1988) EMBO J., 7, 931-940.
Rogers,J. (1985) Nature, 315, 458-459.
Rogers,J. (1986) Trends Genet., 2, 223.
Shah,D.M., Hightower,R.C. and Meagher,R.B. (1982) Proc. Natl. Acad. Sci. USA, 79, 1022-1026.
Sharp,P. (1985) Cell, 42, 397-400.
Silflow,C.D., Chisholm,R.L., Conner,T.W. and Ranum,L.P.W. (1985) Mol. Cell. Biol., 5, 2389-2398.
Smith,H.A., Allaudeen,H.S., Whitman,M.H., Koltin,Y. and Gorman,J.A. (1988) Gene, 63, 53-63.
Sullivan,K.F. and Cleveland,D.W. (1984) J. Cell. Biol., 99, 1754-1760.
Sullivan,K.J., Havercroft,J.C., Machlin,P.S. and Cleveland,D.W. (1986) Mol. Cell. Biol., 6, 4409-4418.
Theurkauf,W.E., Baum,H., Bo,J. and Wensink,P.C. (1986) Proc. Natl. Acad. Sci. USA, 83, 8477-8481.
Toda,T., Adachi,Y., Hiraoka,Y. and Yanagida,M. (1984) Cell, 37, 233-242.
Ueyama,H., Hamada,H., Battula,N. and Kakunaga,T. (1984) Mol. Cell. Biol., 4, 1073-1078.
Weber,S. and Aebi,M. (1988) Nucleic Acids Res., 16, 471-486.
Wildeman,A.G. (1988) Nucleic Acids Res., 16, 2553-2564.
Wilson,P.W., Rogers,J., Harding,M., Pohl,V., Pattyn,G. and Lawson,D.E.M. (1988) J. Mol. Biol., 200, 615-625.
Youngblom,J., Schloss,J.A. and Silflow,C.D. (1984) Mol. Cell. Biol., 4, 2686-2696.
Zakut,R., Shani,M., Givol,D., Neuman,S., Yaffe,D. and Nudel,U. (1982) Nature, 298, 857-859.
Received February 13, 1989; revised March 29, 1989