Alternatively and Constitutively Spliced Exons Are ...

3 downloads 0 Views 219KB Size Report
nal might reside in silent sites (Fairbrother et al. 2004a,. 2004b; Wang et al. .... Mol. Biol. Evol. 2:150–174. Linding, R., R. B. Russell, V. Neduva, and T. J. Gibson.
Alternatively and Constitutively Spliced Exons Are Subject to Different Evolutionary Forces Feng-Chi Chen,* Sheng-Shun Wang,  Chuang-Jong Chen,* Wen-Hsiung Li,*à and Trees-Juen Chuang* *Genomics Research Center and  Institute of Information Science, Academia Sinica, Taipei, Taiwan; and àDepartment of Ecology and Evolution, University of Chicago There has been a controversy on whether alternatively spliced exons (ASEs) evolve faster than constitutively spliced exons (CSEs). Although it has been noted that ASEs are subject to weaker selective constraints than CSEs, so they evolve faster, there have also been studies that indicated slower evolution in ASEs than in CSEs. In this study, we retrieve more than 5,000 human-mouse orthologous exons and calculate the synonymous (KS) and nonsynonymous (KA) substitution rates in these exons. Our results show that ASEs have higher KA values and higher KA/KS ratios than CSEs, indicating faster amino acid–level evolution in ASEs. The faster evolution may be in part due to weaker selective constraints. It is also possible that the faster rate is in part due to faster functional evolution in ASEs. On the other hand, the majority of ASEs have lower KS values than CSEs. With reference to the substitution rate in introns, we show that the KS values in ASEs are close to the neutral substitution rate, whereas the synonymous substitution rate in CSEs has likely been accelerated. The elevated synonymous rate in CSEs is not related to CpG dinucleotides or low-complexity regions of protein but may be weakly related to codon usage bias. The overall trends of higher KA and lower KS in ASEs than in CSEs are also observed in humanrat and mouse-rat comparisons. Therefore, our observations hold for mammals of different molecular clocks.

In complex organisms such as mammals, the frequency of alternative splicing (AS) is high. Genomic studies have suggested that as high as 40–60% of human genes undergo AS (Mironov, Fickett, and Gelfand 1999; Kan et al. 2001; Modrek et al. 2001; Kan, States, and Gish 2002). AS has been shown to be associated with nonsense-mediated decay (Wachtel et al. 2004; Stamm et al. 2005), programmed cell death (Wu, Tang, and Havlioglu 2003), and many other important biological processes. Some AS events were shown to be highly related to human diseases (Orban and Olah 2003; Sazani and Kole 2003; Garcia-Blanco, Baraniak, and Lasda 2004; Rossi 2004; Venables 2004). Therefore, AS has become an important topic in a variety of fields such as oncology, molecular medicine, and developmental biology. In evolution, AS is thought to be one of the major mechanisms of increasing transcriptome complexity (Hanke et al. 1999; Modrek and Lee 2002). It allows the generation of different transcript/protein isoforms from the same genes, thus increasing functional diversity of proteome without increasing the number of genes. Several studies have suggested that AS plays a major role in genome evolution because of relatively weaker negative selection pressure (Kan, States, and Gish 2002; Boue, Letunic, and Bork 2003; Modrek and Lee 2003; Xing and Lee 2004). Because new gene functions may arise from insertion or deletion of an exon, it was suggested that alternatively spliced exons (ASEs) can accelerate gene evolution. This hypothesis has been supported by recent studies. Examples include the following: (1) a significant proportion of ASEs is species specific and not conserved between human and rodents in contrast with high conserKey words: selective constraint, alternatively spliced exons, constitutively spliced exons, synonymous/nonsynonymous substitution, comparative genomics. E-mail: [email protected]. Mol. Biol. Evol. 23(3):675–682. 2006 doi:10.1093/molbev/msj081 Advance Access publication December 20, 2005 Ó The Author 2005. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected]

vation of orthologous gene structures between these species (Nurtdinov et al. 2003); (2) AS is associated with exon gain or loss events, implying faster evolution in ASEs than in constitutively spliced exons (CSEs) (Modrek and Lee 2003); (3) ASEs tend to insert or delete complete protein domains more frequently than expected by chance, leading to increase in functional diversity (Kriventseva et al. 2003); and (4) ASEs tend to have higher KA/KS (nonsynonymous to synonymous substitution rate) ratios than CSEs as revealed from comparison of orthologous exons between human and other species (Iida and Akashi 2000; Hurst and Pal 2001; Filip and Mundy 2004; Xing and Lee 2005a, 2005b). However, it has also been suggested that ASEs evolve at a slower pace than CSEs. Indeed, ASEs have been found to be better conserved than CSEs (Sorek and Ast 2003; Sorek et al. 2004; Sugnet et al. 2004), to be under stronger selection to conserve reading frame (Resch et al. 2004), and to have fewer single-nucleotide polymorphisms (Yeo et al. 2005). It was also observed that the introns flanking ASEs are better conserved than introns flanking CSEs (Sorek and Ast 2003; Philipps, Park, and Graveley 2004), implying that ASEs are under stronger selection pressure than CSEs. In addition, Cusack and Wolfe (2005) suggested that genes undergoing AS by exon skipping were more constrained than the genome average. In view of these conflicting conclusions, we conducted an extensive analysis to compare evolutionary rates in ASEs and CSEs and to explore possible selection forces that underlie ASE evolution.

Materials and Methods Retrieval of CSEs and ASEs Human CSEs and ASEs were available from the online database ASAP (the Alternative Splicing Annotation Project [Lee et al. 2003]) at http://www.bioinfo.mbi. ucla.edu/ASAP/. By mapping the ASAP-provided homolog table to the ASAP genomic data set or the corresponding mouse UniGene expressed sequence tag (EST) sequences (ftp://ftp.ncbi.nih.gov/repository/UniGene/, March 2005),

Downloaded from http://mbe.oxfordjournals.org/ by guest on June 1, 2013

Introduction

676 Chen et al.

ASAP database

Human CSEs

Human majorform ASEs

NCBI mouse Blastn Unigene database alignment

Human nonmajor-form ASEs Blastn alignment

Orthologous humanmouse exon pairs

Reading frame detection

NCBI RefSeq Blastx Protein database alignment

FIG. 1.—The procedure of extracting human exonic sequences and their mouse counterparts.

4,630 human CSEs and their orthologous mouse exonic sequences were extracted (Fig. 1). Because ASEs that have been conserved between human and mouse were not available from ASAP, we BlastN aligned the human ASE plus two flanking exons against the whole mouse UniGene EST database to identify the orthologous mouse counterparts of human ASEs (Fig. 1). Mouse exons that had a 70% sequence identity to the full lengths of human exon queries were extracted. A total of 788 human ASEs, including 512 major-form exons, 21 minorform exons, and 255 undetermined-form exons, were paired with their mouse orthologs. The classification of majorform (included in at least two-thirds of the EST counts), minor-form (skipped in at least two-thirds of the EST counts), and undetermined-form (in the intermediate case or 5 ESTs in total) exons was retrieved from ASAP. The detailed definitions of major-, minor-, and undeterminedform exons were described in Modrek and Lee (2003). Because only a small number of minor-form exons were available, we grouped minor-form exons with undeterminedform exons to form nonmajor-form (NM) exons. In addition, orthologous human-rat and mouse-rat exon pairs, including CSEs and ASEs, were identified based on BlastN alignments between human/mouse exons and the rat UniGene database. The sequences of exons analyzed in this study are available at http://www.sinica.edu.tw/ ;trees/ASE_CSE/ASE_CSE.htm. Retrieval of Pure (Constitutive) Introns Human-mouse orthologous introns that did not overlap with any known human transcripts (the ‘‘pure’’ or ‘‘constitutive’’ introns) were extracted from the University of California, Santa Cruz (UCSC) Genome Browser at http:// hgdownload.cse.ucsc.edu/goldenPath/hg17/multiz8way/. Sequences with lower than 70% identity were excluded. The

human intronic sequences in the remaining human-mouse sequence alignments were then matched to the Human Gene Index (HGI) Release 15 from the TIGR database (The Institute of Genome Research; http://www.tigr.org/tdb/ tgi) using the CRASA program (Chuang et al. 2003). Introns that were matched to HGI entries were excluded because they might, in fact, be ASEs. The flanking exons of these pure introns were further examined to reconfirm that they were conserved between human and mouse. By doing so, we could be confident that the introns retrieved were true orthologous introns between human and mouse. Computation of Divergence at Fourfold Degenerate Sites We extracted fourfold degenerate sites from human CSEs, major-form exons, NM exons, and their mouse counterparts. To exclude the effect of CpG dinucleotides, only sites that were neither preceded by a ‘‘C’’ nor followed by a ‘‘G’’ (‘‘non-CpG–prone sites’’) were considered. The extracted sites of the three exon types from human and mouse were submitted to the BASEML program of the PAML package (Yang 1997; Yang and Nielsen 2000) for calculation of genetic distance. Computation of KA, KS, KA/KS, Ke, and Ki Values For the KA/KS ratio analysis of orthologous exon pairs, two procedures were performed: (1) detecting reading frames of human protein-coding exons by BlastX aligning these exons against the corresponding RefSeq protein sequences (ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/protein/) and (2) calculating KA, KS, and KA/KS values using the YN00 program of the PAML package (also see Fig. 1). The substitution rates of human-mouse orthologous exons (Ke values of ASEs and CSEs) and pure introns (Ki value) were measured using the TN93 model implemented in the BASEML program of the PAML package.

Downloaded from http://mbe.oxfordjournals.org/ by guest on June 1, 2013

Calculation of evolutionary rates including Ke , KA, KS , and KA/KS values using the PAML package

Different Evolutionary Forces Between ASE and CSE 677

Table 1 Basic Properties and Evolutionary Features (Ke, KA, KS, and KA/KS values) of the Retrieved Human-Mouse Orthologous Exons ASEs CSEs Number of exons analyzed Median length (mean) (bp) Average sequence identity (%) GC content (%) Average CpG, TpG, and CpA dinucleotides content (%) GC content at fourfold degenerate sites (%) Substitution rate at non-CpG–prone fourfold degenerate sites Effective number of codons Percentage of exon length divisible by 3 (%) Average transversion/transition ratio Median Ke value (mean) Median KA value (mean) Median KS value (meana) Median KA/KS ratio (meana)

4,630 115 (124) 88.35 51.09

Major 512 111 (117) 89.07 51.09

Nonmajor 276 110 (119) 89.22 50.41

19.18 57.54

18.97 55.56

18.64 54.70

0.394 53.7 37.75 0.475 0.130 0.026 0.588 0.037

0.344 54.6 42.19 0.518 0.122 0.033 0.468 0.069

0.314 54.8 48.91 0.564 0.118 0.043 0.447 0.098

(0.132) (0.040) (0.873) (0.091)

(0.123) (0.046) (0.632) (0.144)

(0.120) (0.062) (0.605) (0.181)

Prediction of Protein Domains and Function We detected protein domains using the InterProScan package and the INTERPRO resource (Mulder et al. 2005; Quevillon et al. 2005) (downloaded from http://www.ebi. ac.uk/InterProScan/index.html and ftp://ftp.ebi.ac.uk/pub/ databases/interpro/iprscan/, respectively). The globular domains were annotated by the GlobPlot package (Linding et al. 2003) downloaded from http://globplot.embl.de/. Low-complexity protein domains were identified using the BlastP program downloaded from ftp://ftp.ncbi.nlm.nih. gov/blast/executables/release/2.2.13/. Exonic sequences were BlastN aligned against the RefSeq database (http:// www.ncbi.nlm.nih.gov/RefSeq/), and the matched RefSeq entries, together with the included gene ontology annotations, were retrieved. Results Basic Features of Human-Mouse Orthologous Exons Table 1 shows the basic features of the exons retrieved in this study. It is apparent that lengths, average sequence identities, and GC contents differ only slightly from each other among the three exon types. The same comment applies to the percentages of CpG dinucleotides and their derivatives (TpG and CpA). However, for GC contents at fourfold degenerate sites and substitution rates at nonCpG–prone fourfold degenerate sites, CSEs have the highest values, while NM exons have the lowest, but the reverse trend is observed for effective number of codons. Meanwhile, the percentage of exons with length divisible by three is the highest in NM exons and the lowest in CSEs. Similarly, NM exons have the highest average transversion to transition ratio, followed by major-form exons and then by CSEs. Because transversions are mostly nonsynonymous changes (while transitions tend to be synonymous changes), different transversion/transition ratios in ASEs and CSEs

may be associated with different KA and KS values in these exons. Indeed, our results show that the largest average and median KA values occur in NM exons, followed by majorform exons, and then by CSEs, whereas the reverse is true for the KS values (Fig. 2A and B and Table 1). Interestingly, despite the fact that the median (average) KS value is lower in ASEs than in CSEs, the median (average) KA is higher in ASEs. Note that the differences of Ke, KS, and KA/KS between CSEs and ASEs are all significant, while the differences between major-form and NM exons are not (Table 2). It is possible that selection of data sets may affect our results. For example, if the ASEs retrieved happen to be biased toward fast-evolving exons, it is then not surprising to observe higher KA in ASEs than in CSEs. To address this possibility, we retrieved human-rat orthologous ASEs and CSEs for the same evolutionary analysis. As shown in Table 3, substitution rates derived from human-rat orthologous exons have similar trends as observed in the human-mouse comparison. That is, ASEs have higher median KA values and higher KA/KS ratios but lower median KS values than CSEs. The probability of observing the same trends from two biased data sets appears to be small. Therefore, our results are very likely unbiased. Furthermore, because rodents have a faster molecular clock than primates (Li 1997; Nekrutenko, Chung, and Li 2003), comparison of human-mouse (or human-rat) orthologous exons might yield tendencies that would not hold in comparisons of species with similar molecular clocks. However, an analysis using orthologous exons from mouse and rat, which have similar molecular clocks, shows the same tendencies as above. Therefore, the trends of higher KA and lower KS values in ASEs than in CSEs are not affected by species selection or different molecular clocks. Our estimated median KA/KS ratio between mouse and rat is lower than that reported in the rat genome analysis (Gibbs et al. 2004). Note that mouse-rat homologous genes were used to derive KA/KS ratios in the rat genome analysis, whereas

Downloaded from http://mbe.oxfordjournals.org/ by guest on June 1, 2013

a Because large KS values and KA/KS ratios may be methodologically suspect, KS values .10 and KA/KS ratios .10 were not considered (Hillier et al. 2004) when we calculated their average values.

678 Chen et al.

Table 2 Significant Tests on Ke, KA, KS, and KA/KS Values of Human-Mouse Orthologous Exons by Using a Two-Tailed Wilcoxon Rank Sum Test CSEs versus Major-Form ASEs Ke KA KS KA/KS

P P P P

, , , ,

0.002 0.006 1 3 10ÿ5 1 3 10ÿ6

CSEs versus NM ASEs P P P P

, , , ,

Major-form Versus NM ASEs P P P P

0.002 0.002 1 3 10ÿ6 1 3 10ÿ6

. . . .

0.1 0.1 0.05 0.05

Substitutions at Synonymous Sites There are two possible explanations for a lower synonymous rate in ASEs than in CSEs: CSEs and ASEs have different mutation rates or they are under different selection pressures. The first explanation has the intriguing implication that the mutation rate varies among regions of a gene. However, this scenario requires a mechanism to distinguish CSEs from ASEs for different mutation rates to occur. Because the GC content, length, and Homo-Mus sequence identity of the two exon types are highly similar (Table 1) and because ASEs and CSEs are located in the same genomic region, it is likely that ASEs and CSEs mutate at similar rates. Therefore, we suggest that different selection pressures have operated on ASEs and CSEs, leading to different Ks values in the two exon types. Under this scenario, the synonymous rate is either reduced in ASEs or elevated in CSEs. To determine which of the two possibilities is more probable, we retrieved ;110,000 human-mouse orthologous introns that do not include any ASE or EST match (i.e., pure or constitutive introns) from the UCSC Genome Browser and calculated the substitution rates (see Materials FIG. 2.—The cumulative distributions of human-mouse orthologous CSEs, ASEs (major-form exons), and ASEs (NM exons) are plotted against (A) the KA values, (B) the KS values, and (C) the KA/KS ratios.

only well-annotated exons are used in this study. That is, only exons that are well defined to be alternatively or constitutively spliced are included in this study. Inclusion of less well-annotated exons, predicted genes, or fast-evolving genes may result in elevated estimates of KA/KS ratios. Notwithstanding the difference in KA/KS ratio estimates between different studies, the overall trend of higher KA/KS ratios in ASEs than in CSEs holds well in human-rodent and mouse-rat comparisons, in agreement with previous studies (Xing and Lee 2005a). Our estimates of KA values are smaller than those observed in previous studies (Waterston et al. 2002). The

Table 3 Evolutionary Features of Human-Rat and Mouse-Rat Orthologous Exons ASEs CSEs

Major

Nonmajor

Human-rat Number of exons analyzed Median KA value Median KS value Median KA/KS ratio

622 0.021 0.623 0.030

117 0.034 0.486 0.061

68 0.039 0.437 0.082

Mouse-rat Number of exons analyzed Median KA value Median KS value Median KA/KS ratio

3,774 0.007 0.192 0.016

382 0.014 0.181 0.070

176 0.019 0.179 0.092

Downloaded from http://mbe.oxfordjournals.org/ by guest on June 1, 2013

reason is that a large portion (.60%) of exons (including CSEs and ASEs) analyzed in this study encode for protein domains (data not shown). Regions with domains tend to have lower KA values than regions not containing domains (Waterston et al. 2002). It is emphasized that the exons analyzed in this study are well-annotated ASEs and CSEs. We recognize that a large number of exons are not included in this study because they are not well annotated and cannot be reliably classified into ASEs and CSEs.

Different Evolutionary Forces Between ASE and CSE 679

KA and KA/KS Values Are Negatively Related to Inclusion Level of Exons To probe possible selective forces on ASEs and CSEs, we analyzed the KA values and KA/KS ratios in these exons. Because ASEs have higher KA/KS ratios than CSEs (Fig. 2C and Table 1), they are either subject to positive selection more frequently than CSEs or more relaxed from negative selection. Because most ASEs were exons whose functions were still under development (Modrek and Lee 2003), ASEs may change fast at the amino acid level to gain new protein functions. Therefore, nonsynonymous changes may be selected for in ASEs, leading to higher KA values and KA/KS ratios. Such differences in nonsynonymous substitution rates between ASEs and CSEs tend to increase as the inclusion level of ASEs decreases. This observation implies that the inclusion level of ASEs is associated with the strength of selection pressure. Our results are consistent with those previously reported (Xing and Lee 2005a). Many ASEs Are Under Strong Selection Pressure Despite the tendency of fast evolution in ASEs, as high as ;30% of ASEs have a small KA (,0.02), and the distribution curves of the three exon types are barely distinguishable for the 30% of exons (Fig. 2A). A similar trend is also observed in the cumulative distributions of KA/KS ratios (Fig. 2C), with the three curves diverging from each other at ;35% cumulative proportion (or KA/KS 5 0.05). These observations indicate that a significant fraction of ASEs have very low rates of evolutionary changes and are under strong negative selection. It is likely that these ASEs may be alternative conserved exons (i.e., ASEs that are conserved between the compared species [Yeo et al. 2005]) and may have important biological functions so that nonsynonymous base substitutions are repressed in these exons. Note that the exons analyzed in this study are conserved between humans and mice. Therefore, these exons may be more conserved than newly gained (or species specific) ASEs (Cusack and Wolfe 2005). Newly added exons

may be under positive selection (or relaxed negative selection) and develop new functions. Overall, our results reveal that more than 60% of the three exon types have different KA values and KA/KS ratios (Fig. 2A and C), indicating that the majority of CSEs, major-form exons and NM exons are under different selection pressures, which may have been caused by changes of protein-level selection pressure, translational selections, and selections at silent sites (Iida and Akashi 2000; Hurst and Pal 2001). Discussion Our results indicate that ASEs have lower synonymous rates but higher nonsynonymous rates than CSEs. The elevated KA/KS ratio in ASEs implies that ASEs may have contributed more to protein diversity than CSEs during mammalian evolution. Our results also show that the synonymous substitution rates in ASEs are close to those of pure introns. We therefore suggest that the synonymous rates in ASEs are close to neutral rates. Although introns are known to contain sequences that are subject to functional constraint (Nobrega et al. 2003, 2004; Rastegar et al. 2004; Ovcharenko et al. 2005), it has been estimated that intronic sequences that are under selection pressure occupy only a small fraction of introns (Keightley and Gaffney 2003). Moreover, previous studies indicated that introns were, in general, subject to less selection pressure than exons (Li 1997). In other words, evolutionary rates of introns are closer to neutral than those of exons. Therefore, it should be reasonable to infer that synonymous rates in ASEs are close to neutral rates, while those in CSEs are accelerated. Furthermore, with a large number of pure introns (;110,000) analyzed, our estimates of substitution rates of human-mouse orthologous pure introns, which are similar to the results from previous studies (Castresana 2002), should be representative. Our conclusion is in agreement with the previous observation of a higher synonymous substitution rate in coding exons than the substitution rate in introns (Moriyama and Powell 1996; Cargill et al. 1999; Halushka et al. 1999; Zwick, Cutler, and Chakravarti 2000; Subramanian and Kumar 2003). Because ASEs have a dual role of exon and intron, it is expected that the synonymous substitution rate of ASEs falls in between the substitution rates of CSEs and introns. Moreover, it has been suggested that introns flanking ASEs are more conserved than those flanking CSEs (Sorek and Ast 2003; Sorek et al. 2004; Sugnet et al. 2004). Although we suggest that the synonymous substitution rate in CSEs is accelerated, there has not been direct evidence to relate the KS values in CSEs and the nucleotide substitution rates in introns flanking CSEs. Because exons and introns are under different selection pressures and have different mechanisms of evolution, it is likely that CSEs and their flanking introns evolve at different paces. On the other hand, previous studies indicated that synonymous changes were not neutral (Hurst and Pal 2001; Pagani, Raponi, and Baralle 2005) and that regulatory signal might reside in silent sites (Fairbrother et al. 2004a, 2004b; Wang et al. 2004). These observations may not hold for ASEs because it is possible that exonic splicing enhancers are more abundant in CSEs than in ASEs. Although

Downloaded from http://mbe.oxfordjournals.org/ by guest on June 1, 2013

and Methods). The median rate (0.441) of these pure introns is 33.6% lower than the median KS of CSEs (0.588) but very close to that of ASEs (0.468 for major-form and 0.447 for NM exons; Table 1). Thus, either both ASE synonymous sites and introns are under stronger negative selection than CSE synonymous sites or the latter evolve faster than the neutral rate. The former scenario is less probable because ASEs are less frequently translated and so are likely to be subject to weaker functional constraints than CSEs and because most parts of pure introns are likely subject to little functional constraints. Therefore, it seems that substitutions at ASE synonymous sites are approximately neutral, while CSE synonymous sites have an elevated substitution rate. This observation also implies that ASEs may have originated from introns, as suggested in previous studies (Gilbert 1978; Kondrashov and Koonin 2003). Because KA is lower in CSEs than in ASEs, the higher KS in CSEs cannot be due to its positive correlation with KA, an observation that has been made by previous authors (Graur 1985; Li, Wu, and Luo 1985).

680 Chen et al.

Acknowledgments We thank the reviewers for valuable suggestions. This work is supported by the Genomics Research Center, Academia Sinica, Taiwan; the National Health Research Institutes (NHRI), Taiwan, under contract NHRI-EX949408PC; and the National Science Council (NSC), Taiwan, under contract NSC 93-2213-E-001-023. Literature Cited Baek, D., and P. Green. 2005. Sequence conservation, relative isoform frequencies, and nonsense-mediated decay in evolutionarily conserved alternative splicing. Proc. Natl. Acad. Sci. USA 102:12813–12818. Boue, S., I. Letunic, and P. Bork. 2003. Alternative splicing and evolution. Bioessays 25:1031–1034. Cargill, M., D. Altshuler, J. Ireland et al. (18 co-authors). 1999. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nat. Genet. 22:231–238. Castresana, J. 2002. Estimation of genetic distances from human and mouse introns. Genome Biol. 3:RESEARCH0028. Chuang, T. J., W. C. Lin, H. C. Lee, C. W. Wang, K. L. Hsiao, Z. H. Wang, D. Shieh, S. C. Lin, and L. Y. Ch’ang. 2003. A complexity reduction algorithm for analysis and annotation of large genomic sequences. Genome Res. 13:313–322. Cline, M. S., R. Shigeta, R. L. Wheeler, M. A. Siani-Rose, D. Kulp, and A. E. Loraine. 2004. The effects of alternative splicing on transmembrane proteins in the mouse genome. Pp. 17–28 in Altman R.B., A.K. Dunker, L. Hunter, T.A. Jung, and T.E. Klein, eds. Biocomputing 2004. Proceedings of the Pacific Symposium; 2004 Jan 6–10; Big Island, Hawaii. World Scientific, Hackensack, N.J. Cusack, B. P., and K. H. Wolfe. 2005. Changes in alternative splicing of human and mouse genes are accompanied by faster evolution of constitutive exons. Mol. Biol. Evol. 22: 2198–2208. Fairbrother, W. G., D. Holste, C. B. Burge, and P. A. Sharp. 2004a. Single nucleotide polymorphism-based validation of exonic splicing enhancers. PLoS Biol. 2:E268. Fairbrother, W. G., G. W. Yeo, R. Yeh, P. Goldstein, M. Mawson, P. A. Sharp, and C. B. Burge. 2004b. RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons. Nucleic Acids Res. 32:W187–W190. Filip, L. C., and N. I. Mundy. 2004. Rapid evolution by positive Darwinian selection in the extracellular domain of the abundant lymphocyte protein CD45 in primates. Mol. Biol. Evol. 21:1504–1511. Garcia-Blanco, M. A., A. P. Baraniak, and E. L. Lasda. 2004. Alternative splicing in disease and therapy. Nat. Biotechnol. 22:535–546. Gibbs, R. A., G. M. Weinstock, M. L. Metzker et al. (229 coauthors). 2004. Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428: 493–521. Gilbert, W. 1978. Why genes in pieces? Nature 271:501. Graur, D. 1985. Amino acid composition and the evolutionary rates of protein-coding genes. J. Mol. Evol. 22:53–62. Halushka, M. K., J. B. Fan, K. Bentley, L. Hsie, N. Shen, A. Weder, R. Cooper, R. Lipshutz, and A. Chakravarti. 1999. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nat. Genet. 22:239–247. Hanke, J., D. Brett, I. Zastrow, A. Aydin, S. Delbruck, G. Lehmann, F. Luft, J. Reich, and P. Bork. 1999. Alternative splicing of human genes: more the rule than the exception? Trends Genet. 15:389–390.

Downloaded from http://mbe.oxfordjournals.org/ by guest on June 1, 2013

a previous study suggested that ASEs had more potential regulatory sequences than CSEs (Itoh, Washio, and Tomita 2004), these sequences have not been experimentally validated. In addition, the exact contents of regulatory sequences in exons remain unclear, making it difficult to infer the influences of these sequences on overall evolutionary rates. At any rate, given the approximate neutrality of substitutions in pure introns, we may tentatively conclude that the synonymous substitution rates in ASEs are close to neutral rates. It has been suggested that synonymous substitution rates in ASEs are lower than those in CSEs due to selection for conserved AS regulatory signals (Baek and Green 2005; Xing and Lee 2005a). However, the higher nonsynonymous substitution rates in ASEs appear to be inconsistent with the hypothesis of regulatory signal conservation in ASEs. Because the numbers of nonsynonymous sites are, in general, larger than those of synonymous sites, it seems more probable that regulatory signals fall in nonsynonymous sites than in synonymous sites. Therefore, we suggest that the lower KS values in ASEs than in CSEs result from accelerated synonymous substitution rates in CSEs. It is not clear why the synonymous rate is elevated in CSEs. A previous study indicated that GC-ending codons are more abundant in CSEs than in ASEs (Iida and Akashi 2000), implying different codon usage patterns between these two exon types. We used the DnaSP program (J. Rozas and R. Rozas 1999) to estimate codon usage bias in humanmouse orthologous exons. Our results showed that CSEs have a slightly smaller effective number of codons and higher GC contents at fourfold degenerate sites than ASEs. Because the differences are not very large, codon usage bias may account for just part of the difference in KS values between ASEs and CSEs. Another possible cause of the substitution rate difference is different contents of highly mutable CpG dinucleotides. However, our results show that the contents of CpG dinucleotides are similar in CSEs and ASEs (Table 1). Moreover, our analysis on non-CpG–prone fourfold degenerate sites indicates that CSEs indeed have a higher substitution rate at these sites than ASEs (Table 1), supporting the view that the elevation in the CSE synonymous rate is not related to CpG dinucleotides. Because the ASEs analyzed in this study are conserved in the genomes of humans and mice, it is very likely that they were derived from the common ancestor of humans and rodents. In other words, these ASEs either were already alternatively spliced in the common ancestor or they changed from CSEs to ASEs after the human-rodent divergence. It is likely that these ASEs are subject to weaker functional constraint than CSEs because they are usually not involved in protein domains, as revealed in previous studies (Kriventseva et al. 2003; Xing, Xu, and Lee 2003; Cline et al. 2004; Yeo et al. 2005). Therefore, ASEs may be allowed to evolve and develop new functions without disrupting the original protein structures. From this viewpoint, positive selection may have played a significant role in the evolution of these ASEs. In summary, our study suggests that ASEs and CSEs are subject to different evolutionary forces. The elevated nonsynonymous substitution rate in ASEs may have contributed to functional divergence in mammalian evolution.

Different Evolutionary Forces Between ASE and CSE 681

patterns in the human and mouse genomes. Hum. Mol. Genet. 12:1313–1320. Orban, T. I., and E. Olah. 2003. Emerging roles of BRCA1 alternative splicing. Mol. Pathol. 56:191–197. Ovcharenko, I., G. G. Loots, M. A. Nobrega, R. C. Hardison, W. Miller, and L. Stubbs. 2005. Evolution and functional classification of vertebrate gene deserts. Genome Res. 15:137–145. Pagani, F., M. Raponi, and F. E. Baralle. 2005. Synonymous mutations in CFTR exon 12 affect splicing and are not neutral in evolution. Proc. Natl. Acad. Sci. USA 102:6368–6372. Philipps, D. L., J. W. Park, and B. R. Graveley. 2004. A computational and experimental approach toward a priori identification of alternatively spliced exons. RNA 10:1838–1844. Quevillon, E., V. Silventoinen, S. Pillai, N. Harte, N. Mulder, R. Apweiler, and R. Lopez. 2005. InterProScan: protein domains identifier. Nucleic Acids Res. 33:W116–W120. Rastegar, M., L. Kobrossy, E. N. Kovacs, I. Rambaldi, and M. Featherstone. 2004. Sequential histone modifications at Hoxd4 regulatory regions distinguish anterior from posterior embryonic compartments. Mol. Cell Biol. 24:8090–8103. Resch, A., Y. Xing, A. Alekseyenko, B. Modrek, and C. Lee. 2004. Evidence for a subpopulation of conserved alternative splicing events under selection pressure for protein reading frame preservation. Nucleic Acids Res. 32:1261–1269. Rossi, E. L. 2004. Stress-induced alternative gene splicing in mind-body medicine. Adv. Mind Body Med. 20:12–19. Rozas, J., and R. Rozas. 1999. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics 15:174–175. Sazani, P., and R. Kole. 2003. Therapeutic potential of antisense oligonucleotides as modulators of alternative splicing. J. Clin. Investig. 112:481–486. Sorek, R., and G. Ast. 2003. Intronic sequences flanking alternatively spliced exons are conserved between human and mouse. Genome Res. 13:1631–1637. Sorek, R., R. Shemesh, Y. Cohen, O. Basechess, G. Ast, and R. Shamir. 2004. A non-EST-based method for exon-skipping prediction. Genome Res. 14:1617–1623. Stamm, S., S. Ben-Ari, I. Rafalska, Y. Tang, Z. Zhang, D. Toiber, T. A. Thanaraj, and H. Soreq. 2005. Function of alternative splicing. Gene 344:1–20. Subramanian, S., and S. Kumar. 2003. Neutral substitutions occur at a faster rate in exons than in noncoding DNA in primate genomes. Genome Res. 13:838–844. Sugnet, C. W., W. J. Kent, M. Ares Jr., and D. Haussler. 2004. Transcriptome and genome conservation of alternative splicing events in humans and mice. Pp. 66–77 in Altman R.B., A.K. Dunker, L. Hunter, T.A. Jung, and T.E. Klein, eds. Biocomputing 2004. Proceedings of the Pacific Symposium; 2004 Jan 6–10; Big Island, Hawaii. World Scientific, Hackensack, N.J. Venables, J. P. 2004. Aberrant and alternative splicing in cancer. Cancer Res. 64:7647–7654. Wachtel, C., B. Li, J. Sperling, and R. Sperling. 2004. Stop codonmediated suppression of splicing is a novel nuclear scanning mechanism not affected by elements of protein synthesis and NMD. RNA 10:1740–1750. Wang, Z., M. E. Rolish, G. Yeo, V. Tung, M. Mawson, and C. B. Burge. 2004. Systematic identification and analysis of exonic splicing silencers. Cell 119:831–845. Waterston, R. H., K. Lindblad-Toh, E. Birney et al. (222 coauthors). 2002. Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562. Wu, J. Y., H. Tang, and N. Havlioglu. 2003. Alternative premRNA splicing and regulation of programmed cell death. Prog. Mol. Subcell. Biol. 31:153–185. Xing, Y., and C. Lee. 2005a. Evidence of functional selection pressure for alternative splicing events that accelerate evolution

Downloaded from http://mbe.oxfordjournals.org/ by guest on June 1, 2013

Hillier, L. W., W. Miller, E. Birney et al (175 co-authors). 2004. Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432:695–716. Hurst, L. D., and C. Pal. 2001. Evidence for purifying selection acting on silent sites in BRCA1. Trends Genet. 17:62–65. Iida, K., and H. Akashi. 2000. A test of translational selection at ÔsilentÕ sites in the human genome: base composition comparisons in alternatively spliced genes. Gene 261:93–105. Itoh, H., T. Washio, and M. Tomita. 2004. Computational comparative analyses of alternative splicing regulation using full-length cDNA of various eukaryotes. RNA 10:1005–1018. Kan, Z., E. C. Rouchka, W. R. Gish, and D. J. States. 2001. Gene structure prediction and alternative splicing analysis using genomically aligned ESTs. Genome Res. 11:889–900. Kan, Z., D. States, and W. Gish. 2002. Selecting for functional alternative splices in ESTs. Genome Res. 12:1837–1845. Keightley, P. D., and D. J. Gaffney. 2003. Functional constraints and frequency of deleterious mutations in noncoding DNA of rodents. Proc. Natl. Acad. Sci. USA 100:13402–13406. Kondrashov, F. A., and E. V. Koonin. 2003. Evolution of alternative splicing: deletions, insertions and origin of functional parts of proteins from intron sequences. Trends Genet. 19:115–119. Kriventseva, E. V., I. Koch, R. Apweiler, M. Vingron, P. Bork, M. S. Gelfand, and S. Sunyaev. 2003. Increase of functional diversity by alternative splicing. Trends Genet. 19:124–128. Lee, C., L. Atanelov, B. Modrek, and Y. Xing. 2003. ASAP: the alternative splicing annotation project. Nucleic Acids Res. 31:101–105. Li, W.-H. 1997. Molecular evolution. Sinauer Associates, Sunderland, Mass. Li, W. H., C. I. Wu, and C. C. Luo. 1985. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2:150–174. Linding, R., R. B. Russell, V. Neduva, and T. J. Gibson. 2003. GlobPlot: exploring protein sequences for globularity and disorder. Nucleic Acids Res. 31:3701–3708. Mironov, A. A., J. W. Fickett, and M. S. Gelfand. 1999. Frequent alternative splicing of human genes. Genome Res. 9:1288– 1293. Modrek, B., and C. Lee. 2002. A genomic view of alternative splicing. Nat. Genet. 30:13–19. Modrek, B., and C. J. Lee. 2003. Alternative splicing in the human, mouse and rat genomes is associated with an increased frequency of exon creation and/or loss. Nat. Genet. 34: 177–180. Modrek, B., A. Resch, C. Grasso, and C. Lee. 2001. Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res. 29:2850–2859. Moriyama, E. N., and J. R. Powell. 1996. Intraspecific nuclear DNA variation in Drosophila. Mol. Biol. Evol. 13:261–277. Mulder, N. J., R. Apweiler, T. K. Attwood et al. (40 co-authors). 2005. InterPro, progress and status in 2005. Nucleic Acids Res. 33:D201–D205. Nekrutenko, A., W. Y. Chung, and W. H. Li. 2003. An evolutionary approach reveals a high protein-coding capacity of the human genome. Trends Genet. 19:306–310. Nobrega, M. A., I. Ovcharenko, V. Afzal, and E. M. Rubin. 2003. Scanning human gene deserts for long-range enhancers. Science 302:413. Nobrega, M. A., Y. Zhu, I. Plajzer-Frick, V. Afzal, and E. M. Rubin. 2004. Megabase deletions of gene deserts result in viable mice. Nature 431:988–993. Nurtdinov, R. N., I. I. Artamonova, A. A. Mironov, and M. S. Gelfand. 2003. Low conservation of alternative splicing

682 Chen et al.

of protein subsequences. Proc. Natl. Acad. Sci. USA 102:13526–13531. ———. 2005b. Assessing the application of Ka/Ks ratio test to alternatively spliced exons. Bioinformatics 21:3701–3703. Xing, Y., and C. J. Lee. 2004. Negative selection pressure against premature protein truncation is reduced by alternative splicing and diploidy. Trends Genet. 20:472–475. Xing, Y., Q. Xu, and C. Lee. 2003. Widespread production of novel soluble protein isoforms by alternative splicing removal of transmembrane anchoring domains. FEBS Lett. 555:572–578. Yang, Z. 1997. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput. Appl. Biosci. 13: 555–556.

Yang, Z., and R. Nielsen. 2000. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17:32–43. Yeo, G. W., E. Van Nostrand, D. Holste, T. Poggio, and C. B. Burge. 2005. Identification and analysis of alternative splicing events conserved in human and mouse. Proc. Natl. Acad. Sci. USA 102:2850–2855. Zwick, M. E., D. J. Cutler, and A. Chakravarti. 2000. Patterns of genetic variation in Mendelian and complex traits. Annu. Rev. Genomics Hum. Genet. 1:387–407.

Jennifer Wernegreen, Associate Editor Accepted December 12, 2005

Downloaded from http://mbe.oxfordjournals.org/ by guest on June 1, 2013