Alterations in rRNA-mRNA interaction during plastid

0 downloads 0 Views 2MB Size Report
Euglenophyta, and Chromerida/Apicomplexa lineages, presumably related to their .... Because several endosymbiotic bacteria and most mitochondria have lost their SD interactions ... Many of these studies then established that a given 5′ UTR sub-region .... Current Chromalveolata contains many subgroups such as ...
Author Manuscript Published Final Version: Molecular Biology and Evolution 2014 vol: 31 (7) pp: 1728-1740 DOI: 10.1093/molbev/msu120 http://mbe.oxfordjournals.org/content/31/7/1728

Alterations in rRNA-mRNA interaction during plastid evolution

Kyungtaek Lima,b, Ichizo Kobayashia,b, and Kenta Nakaia,b,1

aDepartment

of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of

Tokyo, Shirokanedai, Minato-ku, Tokyo 108-8639, Japan bThe

Institute of Medical Science, The University of Tokyo, Shirokanedai, Minato-ku, Tokyo 108-

8639, Japan

1 Corresponding

author:

Professor Kenta Nakai Tel: +81-3-5449-5131 Fax: +81-3-5449-5133 Email: [email protected]

Key words: Translation, Ribosome, Shine-Dalgarno sequence, Endosymbiosis, Chloroplast

1

Abstract Translation initiation depends on the recognition of mRNA by a ribosome. For this to occur, prokaryotes primarily use the Shine-Dalgarno (SD) interaction, where the 3′ tail of small subunit rRNA (core motif: 3′CCUCC) forms base pairs with a complementary signal sequence in the 5′ untransla ted region of mRNA. Here we examined what happened to SD interactions during the evolution of a cyanobacterial endosymbiont into modern plastids (including chloroplasts). Our analysis of availab le complete plastid genome sequences revealed that the majority of plastids retained SD interactions but with varying levels of usage. Parallel losses of SD interactions took place in plastids of Chlorophyta, Euglenophyta, and Chromerida/Apicomplexa lineages, presumably related to their extensive reductive evolution. Interestingly, we discovered that the classical SD interaction (3′CCUCC/5′GGAGG (rRNA/mRNA))

was

replaced

by

an

altered

SD

interaction

(3′CCCU/5′GGGA

or

3′CUUCC/5′GAAGG) through coordinated changes in the sequences of the core rRNA motif and its paired mRNA signal. These changes in plastids of Chlorophyta and Euglenophyta proceeded through intermediate stages that allowed both the classical and altered SD interactions. This coevolutio n between the rRNA motif and the mRNA signal demonstrates unexpected plasticity in the translatio n initiation machinery.

2

Introduction Basic translational functions such as peptide bond synthesis are highly conserved across all domains of life, but translation initiation systems differ considerably between prokaryotes and eukaryotes (Malys and McCarthy 2011). In prokaryotes, the site recognition that initiates translatio n is often mediated by rRNA-mRNA base pairing, known as the Shine-Dalgarno (SD) interaction (Shine and Dalgarno 1974). This mechanism is never observed in the nuclear genetic systems of eukaryotes (Malys and McCarthy 2011). SD interactions adhere to a distinct base pairing rule, such that a pyrimidine-rich, anti-SD sequence in the 3′ tail of a small subunit rRNA binds to a complementa r y, purine-rich, SD signal sequence (i.e., the ribosomal binding site) in the 5′ untranslated region (UTR) of an mRNA (fig. 1). A core motif, 3′CCUCC (referred to as classical anti-SD motif in this work), is conserved among the anti-SD sequences (Ma et al. 2002; Nakagawa et al. 2010; Lim et al. 2012), suggesting an extreme evolutionary constraint and a crucial role for SD interaction. The classical antiSD motif is able to form two modes of 4-base rRNA/mRNA pairing, 3′CUCC/5′GAGG and 3′CCUC/5′GGAG, each of which is considered sufficient to mediate SD interactions (Ma et al. 2002; Chang et al. 2006). Each of these two modes of rRNA/mRNA pairing is referred to as canonical SD interaction in this study. The functionality of an SD sequence depends on its high affinity for an antiSD sequence in a ribosome within the same cell, and on proper spacing, i.e., approximately 10 nt upstream from the start codon (Hirose and Sugiura 2004a; Chang et al. 2006). Although the SD interaction is observed in the overwhelming majority of prokaryotes, its usage varies considerably (Ma et al. 2002; Chang et al. 2006; Nakagawa et al. 2010; Lim et al. 2012). We previously reported the rare loss of the anti-SD motif and its complement SD sequence in several bacterial groups (Lim et al. 2012). Most of these bacteria are primary endosymbionts or hemotrophic mycoplasmas (Lim et al. 2012) that are under obligate association with their eukaryotic hosts and have undergone massive genome reduction (Toft and Andersson 2010; Guimaraes et al. 2011; McCutcheon and Moran 2012). These obligate host-associated bacteria share many biological and evolutiona r y features with eukaryotic organelles of prokaryotic origin, such as mitochondria and plastids (Toft and Andersson 2010; McCutcheon and Moran 2012). Mitochondria seem to have lost SD interactio n except for rarely observed ones with a large genome called bacteria-like mitochondria (Lang et al. 1997; Hazle and Bonen 2007; Burger et al. 2013). 3

Some evidence for SD interaction has been reported in several plastids (Bonham-Smith and Bourque 1989; Betts and Spremulli 1994; Hirose and Sugiura 2004a), supporting the endosymbiotic hypothesis that plastids (including chloroplasts) in photosynthetic eukaryotes evolved from a cyanobacterium by endosymbiosis (Sagan 1967; Gray and Doolittle 1982). Eukaryotes acquired plastids by several endosymbiotic processes (Reyes-Prieto et al. 2007; Keeling 2013) (fig. 2). First, direct endosymbiosis of a cyanobacterium with a eukaryote (primary endosymbiosis) resulted in plastids of the supergroup Archaeplastida (or Plantae) that includes Glaucophyta, Rhodophyta (red algae), Chlorophyta (a subgroup of green algae), and Streptophyta (all land plants and a subgroup of green algae) (Yoon et al. 2004; Rodríguez-Ezpeleta et al. 2005). Second, endosymbiosis of Archaeplastida members with other eukaryotes (secondary endosymbiosis) propagated the formers’ plastids: from Chlorophyta to Chlorarachniophyta and Euglenophyta; and from Rhodophyta to Chromalveolata, Haptophyta and Cryptophyta (Janouskovec et al. 2010). Finally, plastid evolutiona r y history became more complex by tertiary endosymbiosis and serial secondary endosymbiosis events involving Dinoflagellata as a host (Tengs et al. 2000; Keeling 2013) (one tertiary endosymbiosis event was depicted in fig. 2). During their long period of endosymbiotic evolution, plastid genomes underwent reductive evolution that eliminated most of their genes, some of which were transferred to the nuclear genome (Martin et al. 2002; Timmis et al. 2004). Transferred material included genes coding for proteins that performed essential plastid functions such as translation (Gantt et al. 1991; Millen et al. 2001; Ueda et al. 2007; Jansen et al. 2011), for which the gene products have to be transported back to the plastids (Agrawal and Striepen 2010). Unlike ribosomal protein genes, all rRNA genes for plastid ribosomes are present in plastid genomes. It is noteworthy that even genes for photosynthetic capability were lost in several plastid lineages. Such nonphotosynthetic plastids have been reported in Streptophyta (dePamphilis and Palmer 1990; Delannoy et al. 2011; Logacheva et al. 2011), Chlorophyta (Boucias et al. 2001), Euglenophyta (Siemeister and Hachtel 1989), and Apicomplexa (Wilson et al. 1996). In parallel with the gene loss, these plastids have the smallest genome size. Because several endosymbiotic bacteria and most mitochondria have lost their SD interactio ns (Lang et al. 1997; Hazle and Bonen 2007; Lim et al. 2012; Burger et al. 2013), we hypothesized that SD interactions in plastids with an ancient history of endosymbiosis and genome reduction may have 4

undergone drastic evolutionary changes. To gain insight into this process, we analyzed SD interactio ns in plastids, and discovered unusual plasticity in SD interactions in many plastid lineages.

Results Predicting SD Interactions We attempted to quantify the usage of SD interactions. The design of our strategy for determining the SD sequence was based on previous studies (Schurr et al. 1993; Ma et al. 2002; Starmer et al. 2006; Nakagawa et al. 2010; Lim et al. 2012) that predicted the minimum free energy (MFE) structure between an anti-SD sequence (on the 3′ tail of a small subunit rRNA) and a 5′ UTR sub-region of an mRNA. Many of these studies then established that a given 5′ UTR sub-region contained a potential SD sequence when the MFE was lower than a cutoff value, with a lower value denoting greater structural stability. To apply the MFE-based approach to plastid genomes (n = 428, see Materials and Methods for details on the preparation of plastid genome sequences), we first set sub-regions of 5′ UTR and the 3′ tail, between which the MFE structure was predicted (these sub-regions were referred to as SD and anti-SD regions, respectively). SD regions exist approximately at -10 nt from the start codon (Hirose and Sugiura 2004a; Chang et al. 2006). We set the SD region as -20 nt to -5 nt from the start codon based on an earlier study (Nakagawa et al. 2010). Because anti-SD regions were unclear for plastids, we deduced anti-SD regions according to the following steps that were designed based on our hypothesis that an anti-SD region was likely to form secondary structures with SD regions in its own genome: (i) Because the exact 3′ end (and, therefore, the exact tail length) remained unknown for most of the plastids, the +20 nt region from the 5′ start of the 3′ tail, which was larger than known 3′ tail lengths of plastids and prokaryotes, was defined as the tentative 3′ tail region. (ii) MFE structures between the tentative 3′ tail region and SD regions of the genes in the same genome were predicted and only those with an MFE ≤ −3.7 kcal/mol were retained for further analysis, since this value represented the free energy change of canonical SD interactions (see Introduction for details). (iii) We measured the site-specific base-pairing frequency (f anti-SD) in the tentative 3′ tail region for each genome. For example, an f anti-SD value of 0.5 indicated that the specific position in the tentative 3’ tail 5

region formed base pairings with the SD region on mRNA in half of the protein-coding genes (fig. 3). (iv) For the plastid genomes, the 5th nt to 13th nt (measuring in 5′-3′ direction) sub-region of the tentative 3′ tail region showed significantly higher mean f anti-SD values than the other positions (fig. 3). This sub-region included the classical anti-SD motif (3′CCUCC), if present (see below for plastids without the classical anti-SD motif). We considered this sub-region as the anti-SD region for all plastid genomes we assessed. Only exception was the plastid of E. gracilis, whose known 3′ tails (Steege et al. 1982) were too short to fully cover the anti-SD region. Only the covered regions were considered as the anti-SD regions for this plastid (supplementary fig. S1, Supplementary Material online). We predicted MFE structures between an anti-SD region and SD regions of all genes in the same genome for all plastid genomes. When there were multiple sequences in the anti-SD regions of different copies of small subunit rRNA genes within a genome, the type observed in the largest number of small subunit rRNA gene copies was used. A sequence within the SD region was designated as an SD sequence if the MFE value of the hybridization structure was ≤ −3.7 kcal/mol, which was the free energy change of canonical SD interactions as described above. All potential rRNA-mRNA interactions that met the MFE criterion were regarded as SD interactions regardless of the conservatio n of the classical anti-SD motif (3′ CCUCC). Finally, we calculated the fraction of SD sequencecontaining protein-coding genes for each genome (FSD) and used it as an index of SD interaction usage for a plastid genome.

Variations in the Classical Anti-SD Motif are Found in Plastids We searched anti-SD regions of all plastid genome sequences used in this study (n = 428) for variations in the sequence of the classical anti-SD motif (3′CCUCC). Almost all of the plastids (425 of 428) carried one or two copies of the small subunit rRNA genes with exceptions (3 of 428) that carried three or four copies (supplementary fig S2, Supplementary Material online). In genomes of 17 plastids, referred to as mutated anti-SD plastids (table 1), variations in the classical anti-SD motif were found in all copies of small subunit rRNA genes. The other 411 plastids carrying classical anti-SD motifs were referred to as classical anti-SD plastids. Intra-genomic copies of small subunit rRNA genes had identical anti-SD region sequences, with the exception of the Euglena gracilis plastid, which showed a single nucleotide (nt) difference between the major (three of the four copies) and minor (one of the four copies) genes (table 1). This 6

intra-genomic homogeneity of rRNA genes is explained by gene conversion between them (Palmer 1985; Hashimoto et al. 2003; Khakhlova and Bock 2006). We divided the mutated anti-SD plastids into two groups based on the level of variation in the classical anti-SD motif. The first group (n = 6) contained plastids with a slight deviation (≤ 2 nt changes of insertions, deletions, and substitutions) from the classical anti-SD motif, referred to as reduced antiSD plastids. The second group (n = 11) contained plastids with a more drastic deviation in the classica l anti-SD motif, referred to as lost anti-SD plastids.

Plastids with the Classical Anti-SD Motif (3′CCUCC) are Highly Diverse in SD Interaction Usage. Fig. 4A(i) shows FSD in plastids in different taxonomic groups of eukaryotic hosts (raw data are shown in supplement dataset S1, Supplementary Material online). The SD interaction usage varied among individual plastids. Such high diversity in its usage was reported for prokaryotes (Schurr et al. 1993; Chang et al. 2006; Nakagawa et al. 2010; Lim et al. 2012). Streptophyta plastids showed a high median FSD with little diversity (fig. 4A(i)). In contrast, Chlorophyta plastids showed a high diversity in the usage (fig. 4A(i)). The extreme taxonomic bias in plastid genome sampling hindered detection of more taxon-specific patterns of SD interaction usage.

Mutations in the Classical Anti-SD Motif Arose Independently in Multiple Plastid Lineages. We calculated FSD of the mutated anti-SD plastids (reduced and lost anti-SD plastids combined), which arose in four eukaryotic taxonomic groups, Chlorophyta, Euglenophyta, Apicomplexa, and Chromerida (fig. 4A). Reduced anti-SD plastids appeared to maintain SD interactions: they displayed moderate SD signals (FSD values > 0.09) (fig. 4A(ii)). Lost anti-SD plastids (table 1 and fig. 4A(iii) ) had FSD values close to zero (< 0.05). The mutated anti-SD plastids could also be grouped into two clusters based on their phyloge ny (fig. 4A): The first cluster included primary plastids in Chlorophyta (green algae) (n = 5), and secondary plastids in Euglenophyta (n = 4) that had originated in Chlorophyta (fig. 2). The second cluster included secondary plastids in Chromerida (n = 2) and Apicomplexa (n = 6), which origina ted in Rhodophyta (red algae) (fig. 2).

7

The Chlorophyta-Euglenophyta cluster contained mutated anti-SD plastids, which include both the reduced and lost anti-SD plastids. These mutated anti-SD plastids apparently emerged independently multiple times (fig. 4B). One of the mutation events in the classical anti-SD motif appears to have occurred in a Euglenida (Euglenophyta) plastid after secondary endosymbiosis of a Chlorophyta ancestor (fig. 4B(e)). Euglenida

contains three genera with a completely sequenced plastid: Eutreptiella,

Monomorphina, and Euglena. A classical anti-SD plastid was found in Eutreptiella, whereas reduced (Monomorphina aenigmatica, Euglena viridis, and Euglena gracilis) and lost (Euglena longa) antiSD plastids were found in the other genera. A plausible order of the plastid evolution is as follows: (i) Eutreptiella (classical anti-SD), (ii) M. aenigmatica and E. viridis (reduced anti-SD), (iii) E. gracilis (reduced anti-SD), and (iv) E. longa (lost anti-SD), as examined below in detail. Lost anti-SD plastids were found separately in Trebouxiophyceae (fig. 4B(c), Helicosporidium sp. ex Simulium jonesi) and Chlorophyceae (fig. 4B(c), Acutodesmus obliquus), while three reduced anti-SD plastids clustered in a clade within Chlorophyceae (fig. 4B(c), the last three species). Only the lost anti-SD plastids were observed in the Chromerida-Apicomplexa cluster (fig. 4A). Because these two groups are neighbors in the phylogeny, it may be parsimonious to hypothesize that there was only a single event of loss of SD interaction in their common ancestor.

Loss of SD Interaction Occurred in Parallel with Genome Reduction. Next, we sought explanations for the loss of SD interaction. First, we examined a possible association with the reductive evolution of plastid genomes to find that the lost anti-SD plastids tend to have smaller gene numbers than their close relatives (fig. 5). Comparisons between plastids in the red lineage showed clearly such a trend. A Rhodophyta (red algae) plastid transferred to an ancestor of Chromalveolata through secondary endosymbiosis (fig. 2). Current Chromalveolata contains many subgroups such as Heterokontophyta, Chromerida and Apichomplexa

(fig.

2). The gene number decreased in the following order: Rhodophyta,

Heterokontophyta, Chromerida, and Apicomplexa (fig. 5B). The loss of SD interaction took place in the last two groups (figs. 4A and 5B).

8

Among the Euglenophyta plastids generated from Chlorophyta by secondary endosymbios is (fig. 2), the only lost anti-SD plastid, the E. longa plastid, showed the most drastic genome reduction (protein-coding gene number < 40) (fig. 5A). Association between genome reduction and loss of SD interaction was less apparent in plastids of the green lineages (Chlorophyta and Streptophyta). In two of the lost anti-SD plastids in Chlorophyta, only the Helicosporidium sp. ex Simulium jonesi plastid showed extreme genome reduction (fig. 5A). Nonphotosynthetic plastids in parasitic plants (Streptophyta) such as Rhizanthella gardneri, Epifagus virginiana, and Neottia nidus-avis have tiny genome size (protein-coding gene number < 40) (fig. 5A) but show high SD interaction usage (FSD > 0.3) (supplementary dataset S1, Supplementary Material online). The loss of SD interaction is found together with drastic plastid genome reduction associated with loss of photosynthetic function in several lineages (fig. 5, green box).

Anti-SD Motifs Coevolved with Complementary SD Signals. Based on the above FSD calculation, we realized that reduced anti-SD plastids have maintained SD interactions through coordinated changes in the anti-SD motif (in rRNA) and the complementa r y SD signal (in mRNA). Other than canonical SD interactions (see Introduction for details), we detected two different types of rRNA/mRNA interactions, 3′CCCU/5′GGGA and 3′CUUCC/5′GAAGG in reduced anti-SD plastids, and named them altered SD interactions. The altered SD interactions evolved in a way that still allowed the rRNA-mRNA interactio n, likely via the following coordinated changes: (i) extension, where an altered SD interaction emerged and was mediated by a sequence that combined part of the classical anti-SD motif with its flank ing sequence; (ii) limitation, where one of the two canonical SD interactions was eliminated by a 1-nt mutation in the classical anti-SD motif; and (iii) elimination, where the other canonical SD interactio n was eliminated due to a more severe mutation in the classical anti-SD motif, leaving only the altered interaction. A plausible order of this evolution in the plastid lineage of Euglenophyta is limitatio n, extension, and elimination (fig. 6), whereas that in the plastid lineage of Chlorophyta is extensio n, limitation, and elimination (fig. 7), as detailed below.

(a) Euglenophyta 9

Secondary plastids belonging to Euglenophyta clearly showed the stepwise changes of limitation, extension, and elimination, as follows (fig. 6). Limitation: A substitution at the 5′ end of the classical anti-SD motif (3′CCUCC to 3′CCUCA) disallowed one of the two canonical SD interactions (stage [C1] in fig. 6A). This stage was observed in the M. aenigmatica and E. viridis plastids (fig 6B). Extension: The appearance of one C next to the 3′ end of the above anti-SD motif (3′CCUCA to 3′CCCUCA) and mRNA adaptation to this alteration enabled an altered SD interaction, referred to as a GGGA interaction (3′CCCU/5′GGGA) (stage [C1+A] in fig. 6A). Such rRNA change was observed in a major type rRNA (i.e., three out of the four small subunit rRNA genes) of the E. gracilis plastid. This allowed both a canonical SD interaction (3′CCUC/5′GGAG) and GGGA interaction (fig. 6B). For the mRNA signal, there was emergence of GGGA in four genes, three of which were for photosystems II (psbE, psbK, and psbJ) (fig. 6BC). An intermediate mRNA signal that covers both interactions (canonical+GGGA) was found in two genes, psbJ and pcog01 (fig. 6BC), suggesting that the transition from canonical SD to GGGA interaction may have occurred through an intermedia te stage. Figure 6C shows that the GGGA mRNA signals were present at similar locations relative to the start codon. Elimination: Another substitution in the above extended motif, 3′CCCUCA, generated 3′CCCUUA, which was found in a minor type rRNA (i.e., one of the four small subunit rRNA genes) of the E. gracilis plastid. This completely precluded the canonical SD interaction, allowing only the GGGA interaction (stage [A] in fig. 6A). Therefore, ribosomes carrying this rRNA would likely favor mRNAs with GGGA signals.

(b) Chlorophyceae Plastids belonging to Chlorophyceae (Chlorophyta) displayed the sequential changes of extension, limitation, and elimination (fig. 7). Here, the altered SD interaction that emerged was 3′CUUCC/5′GAAGG (rRNA/mRNA), referred to as a GAAGG interaction. Extension: The classical anti-SD motif (3′CCUCC) in rRNA is extended to the 3′ side by three nucleotides (3′CUUCCUCC). The extend motif can engage in the GAAGG interaction as well as canonical SD interactions (stage [C2+A] in fig. 7A). According to the phylogeny (fig. 7B), this may have occurred in a common ancestor of plastids belonging to Chlorophyceae (Chlorophyta). 10

Limitation: A canonical SD interaction was eliminated by deletion of one C at the 5′ end from the classical anti-SD motif, 3′CCUCC, within the above extended motif, 3′CUUCCUCC, to generate 3′CUUCCUC (stage [C1+A] in fig. 7A). According to the phylogeny, this took place in the common ancestor of Schizomeris leibleinii and Stigeoclonium helveticum plastids (stage [C1+A] in fig. 7B). Although the anti-SD motif variant allowed both types of rRNA/mRNA interactions, a GAAGG interaction was observed in more genes than the canonical SD interaction in the plastid of S. leibleinii (fig. 7B). Elimination: Finally, the evolution of the rRNA/mRNA interactions was followed by deletion of another C to generate 3′CUUCCU in the plastid of S. helveticum (stage [A] in fig. 7 A and B). This variant motif used only the altered (GAAGG) interaction because a severe decay in the classical antiSD motif prohibited the remaining canonical SD interaction (fig. 7B). In support of this route, 5′ UTRs in some protein-coding genes have evolved to be paired with the altered anti-SD motif and its flanking regions in the plastids of S. leibleinii and S. helveticum (fig. 7C, chlL and psbA). A similarly coordinated change was also observed between S. leibleinii and S. helveticum plastids. In the psbE gene, the mRNA signal change (5′GGAG to 5′GAAG) offset the alteration in the anti-SD motif (3′CUUCCUC to 3′CUUCCU) (fig. 7C). During this coevolution, there were systematic changes in the repertoire of SD signal-carrying genes. For example, some genes (rps19, psbB, and tufA) that had previously lacked an SD interaction signal at some point acquired an altered signal (GAAGG), to stages [C1+A] (of the S. leibleinii plastid) and [A] (of the S. helveticum plastid) (fig. 7B). In these plastids, some newly emerged genes (pcog02, pcog03, and pcog04), likely acquired via horizontal transfer, also gained a GAAGG interaction signal (fig. 7B). These genes appeared to be DNA endonucleases such as homing endonucleases (supplementary table S1, Supplementary Material online). During the same period, other genes (psaA, psbD, psbC, and atpH) that had depended on the classical anti-SD motif lost their capacity for SD interaction due to serial decay events involving the classical motif (fig. 7B). These phenomena suggest a strong association between rRNA-mRNA coevolution and gene translation patterns. In addition to the above-described route, evolution of the canonical and altered SD interactio ns also proceeded via two other paths. First, the [C2+A] stage may have independently reverted to the original [C2] stage in the plastids of Oedogonium cardiacum and Floydiella terrestris (fig. 7B). Second, 11

only limitation step by a substitution at the 5′ end of the classical anti-SD motif took place in the F. terrestris plastid (stage [C1] in fig. 7A (right)).

(c) Statistical significance We tested whether the observed occurrence of mRNA signals for altered SD interaction was statistically significant compared to the probability of observing the same consensus signal in random regions of the plastid genome. GGGA (in the E. gracilis plastid) and GAAGG (in S. leibleinii and S. helveticum plastids) interactions showed statistically significant abundance (p < 0.01), supporting mRNA adaptation to the altered SD interactions (table 2).

(d) Extensions without limitation or elimination. Extension (the emergence of altered SD interactions) did not necessarily entail decay of the classical anti-SD motif. GAAGG interaction showed statistically significant abundance (p < 0.01) in four classical anti-SD plastids belonging to Streptophyta, Chlorophyta, and Chlorarachniophyta (table 2). Among these, the B. natans (Chlorarachniophyta) plastid showed the highest abundance of GAAGG signal-carrying genes. An extension in the B. natans plastid yielded a GAAGG interactio n (3′CUUCC/5′GAAGG) as described for Chlorophyceae plastids (stage [C2+A] in fig. 8A). GAAGG interactions accounted for the majority of SD interaction signals (fig. 8B), and contributed to B. natans showing the highest dependency on SD interaction (FSD) of all plastids outside Streptophyta (fig. 4B).

12

Discussion Our large-scale analysis of SD interactions in plastids discovered the followings: (i) large diversity in usage of SD interaction; (ii) parallel loss of SD interaction during evolution; and (iii) alterations in SD interaction through rRNA-mRNA coevolution in multiple lineages. We discuss what factors and evolutionary processes likely underlie these phenomena.

Translation Initiation Independent of SD Interaction Our results indicate that a significant fraction of protein-coding genes on the plastid genome is translated in a manner independent of SD interaction. Comparison with similar phenomena observed in prokaryotes lead us to the following considerations. (i) SD interaction-independent genes may be translated by alternative initiation mechanis ms. A well-known alternative mechanism is direct initiation of leaderless (5′ UTR-lacking) mRNAs (Moll et al. 2004; Udagawa et al. 2004), which appears widespread among prokaryotes (Zheng et al. 2011). This mechanism, however, has never been examined in plastids to our best knowledge. (ii) There might be other, plastid-specific translation initiation mechanisms as previous ly suggested by the presence of many plastid-specific translation-related factors such as Nac2, RBP40, plastid-specific ribosomal proteins, and ribosomal proteins with plastid-specific domains (Yamaguchi et al. 2002; Hirose and Sugiura 2004b; Manuell et al. 2007; Schwarz et al. 2007). One or a combinatio n of these elements may have functionally substituted the SD interaction. Their intra-genomic usage and roles in translational regulation across plastid lineages need to be examined to gain insight into their influence, if any, on the evolution of SD interaction. (iii) SD interaction is presumably necessary only for mRNAs with a rigid structure around the start codon. It was reported not only in prokaryotes but also in plastids that mRNAs lacking a SD signal tended to be less structured there (Scharff et al. 2011). This leads to the hypothesis that a decrease in the structural stability around the start codon would result in loss of the SD signal during evolution, which may have been occurring in the plastid genomes.

Likely Factors That Affected Loss and Alterations of SD Interaction 13

An increase in SD interaction- independent genes by the above-mentioned processes may have led to the loss of SD interaction. This may have been facilitated by a substantial genome reduction, as some association between genome reduction and SD interaction loss is observed in plastids (fig. 5). Such association was also observed in mitochondria and bacteria. Evidence for SD-like interactio n was reported in bacteria-like mitochondria whose genomes are much larger than those of the other mitochondria (Lang et al. 1997; Burger et al. 2013), whereas SD interactions in other known mitochondria seem to have been eliminated (Hazle and Bonen 2007). In bacteria, loss of SD interactio n was mostly seen in those that have experienced drastic reductive evolution (Lim et al. 2012). For genomes containing a small number of genes, loss of just a few genes with an SD signal would eliminate SD interactions altogether, facilitating the loss of the anti-SD motif by the rRNA. An increase in SD interaction- independent genes together with reductive evolution reduces number of genes that depend on canonical SD interactions, which may underlie alterations in SD interaction by facilitating the coevolution of anti-SD motifs in rRNAs with SD signals in mRNAs. A reduced number of intra-genomic loci with the canonical SD signals likely increased the chances for a newly emerged SD interaction to dominate the genome, leading to a drastic relaxation of the constraints on the classical anti-SD motif. This may, in turn, have allowed the classical anti-SD motif to undergo a series of single nucleotide mutations over time, until the newly emerged interaction was fixed within the genome.

rRNA-mRNA Coevolution The ribosome has evolved through coevolution of rRNAs and ribosomal proteins (Harish and Caetano-Anollés 2012). It was reported that rRNA evolution likely drove the compensating changes in rRNA-interacting ribosomal proteins (Barreto and Burton 2013). Beyond this view that regarded rRNA evolution as a driving force of ribosomal protein evolution, our study suggests that rRNA evolution at the anti-SD motif site can drive mRNA evolution at SD signals by affecting the translational efficiency. Coevolutionary events of the anti-SD motif and the complementary SD signals seemed to have undergone the following steps: (i) extension: the extension of the classical anti-SD motif to generate an altered SD interaction; (ii) limitation: a mutational change in the classical anti-SD motif that limits the base pairing possibilities in the canonical SD interaction, but permits an altered SD interactio n; 14

and (iii) elimination: additional mutation(s) resulting in the complete elimination of the canonical SD interactions, leaving only the altered SD interaction. The limitation step can occur before the extensio n step. Abrupt replacement of the classical anti-SD motif would presumably be detrimental to the plastid and its host, as it would hinder expression of many genes. The proposed route with intermediate stages could have provided a safer route for the evolution of the prokaryotic translation initiation system. Similarly, significant changes in codon-anticodon interaction (e.g., arising from codon reassignme nt) have often been described using stepwise models that can tolerate potentially deleterious effects resulting from such changes (Osawa and Jukes 1989; Andersson and Kurland 1991; McCutcheon et al. 2009). Carrying multiple copies of small subunit rRNA genes possibly offers an additiona l intermediate stage of evolution toward the lost or altered SD interaction. One of the two rRNA variants in the E. gracilis plastid allowed both the canonical and altered SD interactions, while the other varia nt allowed only the altered SD interaction (table 1 and fig. 6). The latter rRNA variant might have arisen and been maintained more easily due to the presence of the former rRNA variant that guaranteed the canonical SD interaction. This reminds of a well-known evolutionary hypothesis that presence of multiple gene copies in a genome allows evolution of a new function (Ohno 1970; Näsvall et al. 2012). It is presumable that natural selection would favor genes with an SD signal that is able to interact with both the rRNA variants. If genes with a signal for the altered SD interaction predominate the genome, natural selection would be relaxed for the canonical SD interaction, permitting the former rRNA variant to evolve to disallow the canonical SD interaction.

Materials and Methods Genome Sequences and Protein-Coding Genes. The RefSeq collection (http://www.ncbi.nlm.nih.gov/refseq/) of complete plastid genome sequences (n = 430) was downloaded on November 20, 2013. Structural annotation information for the genomes was retrieved from the nucleotide database (http://www.ncbi.nlm.nih.gov/nuccore/) of the National Center for Biotechnology Information (NCBI). Registered coding sequence annotatio n information was obtained for 427 of the 430 sequences. We annotated the P. falciparum HB3 plastid 15

genome (NC_017928.1) for small subunit rRNA genes and protein-coding genes (supplementary table S2, Supplementary Material online) based on a previous annotation report for the same species (Wilson et al. 1996), and highly similar sequences between the two genomes that were obtained using BLASTn 2.2.27+ (Zhang et al. 2000). Including the P. falciparum HB3 plastid genome, the final plastid genome set had 428 genomes. The list of the plastid genome set was shown in supplementary dataset S1 (Supplementary Material online). To avoid pseudogenes and to standardize the content of protein-coding gene sets, we selected a given protein-coding gene for further analyses if at least one similar protein sequence was found in other genomes. A similar protein sequence was defined as a hit when the Expect value was less than 0.001 when searched using BLASTp 2.2.27+ (Altschul et al. 1997) against the nr BLAST database (ftp://ftp.ncbi.nih.gov/blast/db/).

Obtaining Full-length Small Subunit rRNA Gene Sequences For the final plastid genome set, annotation information for small subunit rRNA genes was obtained from the nucleotide database of NCBI (http://www.ncbi.nlm.nih.gov/nuccore/). Using the annotated regions as queries, we ran a similarity search (the Expect value < 0.001) against all the plastid genomes using BLASTn 2.2.27+ to search unregistered small subunit rRNA gene regions. Among the hit genes, only those larger than 1400 bp were retained to eliminate pseudogenes. Locations and sequences of the resulting small subunit rRNA genes used in this study were presented in supplementary sequence alignment S1 (Supplementary Material online).

Predicting the minimum free energy (MFE) structure between two RNA strands. We used RNAcofold in the Vienna RNA package 2.0.7 (Lorenz et al. 2011) with the “-d0” and “--noLP” options to find the best hybridization structure between two given RNA strands. The best hybridization structure can consist of both intra- and inter-strand structures. Because SD interactions are inter-strand interactions, we only considered the inter-strand structures to measure the MFE between the two RNA strands, which was calculated using RNAeval in the Vienna RNA package 2.0.7 with the “-d0” option.

Constructing Phylogenetic Trees. 16

Plastid phylogeny (figs. 2 and 4A) was inferred from previous literature (Keeling 2013). We constructed a multi-gene phylogenetic tree for the plastids of selected species of Chlorophyta, Euglenophyta, and Chlorarachniophyta according to the following steps. (i) We found common ribosomal protein genes among the selected plastids, which were rpl2, rpl4, rpl14, rpl16, rpl20, rpl36, rps3, rps4, rps7, rps8, rps11, rps12, rps14, and rps19. (ii) We aligned protein sequences of each orthologous gene set using MAFFT 6.927b (Katoh and Toh 2008) with the L-INS-I option. (iii) After removing poorly aligned regions using Gblocks 0.91b (Talavera and Castresana 2007), we concatenated the alignments for each species. (iv) Best-fit model selection (cpREV+I+G4) and maximum likelihood tree deduction (with 1000 bootstrap replications) for the concatenated alignme nt s were conducted using IQ-TREE 0.9.5 (Minh et al. 2013). The resulting tree (supplementary fig. S3, Supplementary Material online) was consistent with previously deduced phylogeny (Turmel, Gagnon, et al. 2009; Turmel, Otis, et al. 2009; Brouard et al. 2011; Hrdá et al. 2012).

Acknowledgements We thank Saaya Lim Tsutsué, Kuo-ching Liang, and Naoki Sato for discussion. Work by I.K. was supported by KAKENHIs. The supercomputing resource was provided by the Human Genome Center, the Institute of Medical Science, the University of Tokyo.

17

References Agrawal S, Striepen B. 2010. More membranes, more proteins: complex protein import mechanisms into secondary plastids. Protist 161:672–687.

Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSIBLAST: a new generation of protein database search programs. Nucleic Acids Res. 25:3389–3402.

Andersson GE, Kurland CG. 1991. An extreme codon preference strategy: codon reassignment. Mol. Biol. Evol. 8:530– 544.

Barreto FS, Burton RS. 2013. Evidence for compensatory evolution of ribosomal proteins in response to rapid divergence of mitochondrial rRNA. Mol. Biol. Evol. 30:310–314.

Betts L, Spremulli LL. 1994. Analysis of the role of the Shine-Dalgarno sequence and mRNA secondary structure on the efficiency of translational initiation in the Euglena gracilis chloroplast atpH mRNA. J. Biol. Chem. 269:26456– 26463.

Bonham-Smith P, Bourque D. 1989. Translation of chloroplast-encoded mRNA: potential initiation and termination signals. Nucleic Acids Res. 17:2057–2080.

Boucias DG, Becnel JJ, White SE, Bott M. 2001. In vivo and in vitro developmen t of the protist Helicosporidium sp. J. Eukaryot. Microbiol. 48:460–470.

Brouard J-S, Otis C, Lemieux C, Turmel M. 2011. The chloroplast genome of the green alga Schizomeris leibleinii (Chlorophyceae) provides evidence for bidirectional DNA replication fro m a single origin in the chaetophorales. Genome Biol. Evol. 3:505–515.

Burger G, Gray MW, Forget L, Lang BF. 2013. Strikingly bacteria-like and gene-rich mitochondrial genomes throughout jakobid protists. Genome Biol. Evol. 5:418–438.

Chang B, Halgamuge S, Tang S-L. 2006. Analysis of SD sequences in completed microbial genomes: non -SD-led genes are as common as SD-led genes. Gene 373:90–99.

Delannoy E, Fujii S, Colas des Francs -Small C, Brundrett M, Small I. 2011. Rampant gene loss in the underground orchid Rhizanthella gardneri highlights evolutionary constraints on plastid genomes. Mol. Biol. Evol. 28:2077– 2086.

18

dePamphilis CW, Palmer JD. 1990. Loss of photosynthetic and chlororespiratory genes from the plastid genome of a parasitic flowering plant. Nature 348:337–339.

Gantt JS, Baldauf SL, Calie PJ, Weeden NF, Palmer JD. 1991. Transfer of rpl22 to the nucleus greatly preceded its loss from the chloroplast and involved the gain of an intron. EMBO J. 10:3073–3078.

Gray MW, Doolittle WF. 1982. Has the endosymbiont hypothesis been proven? Microbiol. Rev. 46:1–42.

Guimaraes AMS, Santos AP, SanMiguel P, Walter T, Timenetsky J, Messick JB. 2011. Complete genome sequence of Mycoplasma suis and insights into its biology and adaption to an erythrocyte niche. PLoS One 6:e19574.

Harish A, Caetano-Anollés G. 2012. Ribosomal history reveals origins of modern protein synthesis. PLoS One 7.

Hashimoto JG, Stevenson BS, Schmidt TM. 2003. Rates and consequences of recombination between rRNA operons. J. Bacteriol. 185:966–972.

Hazle T, Bonen L. 2007. Comparative analysis of sequences preceding protein -coding mitochondrial genes in flowering plants. Mol. Biol. Evol. 24:1101–1112.

Hirose T, Sugiura M. 2004a. Functional Shine-Dalgarno-like sequences for translational initiation of chloroplast mRNAs. Plant Cell Physiol. 45:114–117.

Hirose T, Sugiura M. 2004b. Multiple elements required for translation of plastid atpB mRNA lacking the Shine Dalgarno sequence. Nucleic Acids Res. 32:3503– 3510. Hrdá Š, Fousek J, Szabová J, Hampl V, Vlček Č. 2012. The plastid genome of Eutreptiella provides a window into the process of secondary endosymbiosis of plastid in euglenids. PLoS One 7:e33746.

Janouskovec J, Horák A, Oborník M, Lukes J, Keeling PJ. 2010. A common red algal origin of the apicomplexan, dinoflagellate, and heterokont plastids. Proc. Natl. Acad. Sci. U. S. A. 107:10949– 10954.

Jansen RK, Saski C, Lee S-B, Hansen AK, Daniell H. 2011. Complete plastid genome sequences of three Rosids (Castanea, Prunus, Theobroma): evidence for at least two independent transfers of rpl22 to the nucleus. Mol. Biol. Evol. 28:835–847.

Katoh K, Toh H. 2008. Recent developments in the MAFFT multiple sequence alignment program. Brief. Bioinform. 9:286–298.

19

Keeling PJ. 2013. The number, speed, and impact of plastid endosymbioses in eukaryotic evolution. Annu. Rev. Plant Biol. 64:583–607.

Khakhlova O, Bock R. 2006. Elimination of deleterious mutations in plastid genomes by gene conversion. Plant J. 46:85–94. Lang B, Burger G, O’Kelly C. 1997. An ancestral mitochondrial DNA resembling a eubacterial genome in miniature. Nature.

Lim K, Furuta Y, Kobayashi I. 2012. Large variations in bacterial ribosomal RNA genes. Mol. Biol. Evol. 29:2937– 2948.

Logacheva MD, Schelkunov MI, Penin A a. 2011. Sequencing and analysis of plastid genome in mycoheterotrophic orchid Neottia nidus-avis. Genome Biol. Evol. 3:1296–1303.

Lorenz R, Bernhart SH, Höner Zu Siederdissen C, Tafer H, Flamm C, Stadler PF, Hofacker IL. 2011. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6:26.

Ma J, Campbell A, Karlin S. 2002. Correlations between Shine-Dalgarno sequences and gene features such as predicted expression levels and operon structures. J. Bacteriol. 184:5733– 5745.

Malys N, McCarthy JEG. 2011. Translation initiation: variations in the mechanism can be anticipated. Cell. Mol. Life Sci. 68:991–1003.

Manuell AL, Quispe J, Mayfield SP. 2007. Structure of the chloroplast ribosome: novel domains for translation regulation. PLoS Biol. 5:e209.

Martin W, Rujan T, Richly E, et al. 2002. Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc. Natl. Acad. Sci. U. S. A. 99:12246– 12251.

McCutcheon JP, McDonald BR, Moran N a. 2009. Origin of an alternativ e genetic code in the extremely small and GCrich genome of a bacterial symbiont. PLoS Genet. 5:e1000565.

McCutcheon JP, Moran NA. 2012. Extreme genome reduction in symbiotic bacteria. Nat. Rev. Microbiol. 10:13–26.

Millen RS, Olmstead RG, Adams KL, et al. 2001. Many parallel losses of infA from chloroplast DNA during angiosperm evolution with multiple independent transfers to the nucleus. Plant Cell 13:645–658.

20

Minh BQ, Nguyen MAT, von Haeseler A. 2013. Ultrafast approximation for phylogenetic bootstrap. M ol. Biol. Evol. 30:1188–1195.

Moll I, Hirokawa G, Kiel MC, Kaji A, Bläsi U. 2004. Translation initiation with 70S ribosomes: an alternative pathway for leaderless mRNAs. Nucleic Acids Res. 32:3354–3363.

Nakagawa S, Niimura Y, Miura K, Gojobori T. 2010. Dynamic evolution of translation initiation mechanisms in prokaryotes. Proc. Natl. Acad. Sci. U. S. A. 107:6382–6387.

Osawa S, Jukes TH. 1989. Codon reassignment (codon capture) in evolution. J. Mol. Evol. 28:271–278.

Palmer JD. 1985. Comparative organization of chloroplast genomes. Annu. Rev. Genet. 19:325–354.

Reyes-Prieto A, Weber APM, Bhattacharya D. 2007. The origin and establishment of the plastid in algae and plants. Annu. Rev. Genet. 41:147–168.

Rodríguez-Ezpeleta N, Brinkmann H, Burey SC, Roure B, Burger G, Löffelhardt W, Bohnert HJ, Philippe H, Lang BF. 2005. Monophyly of primary photosynthetic eukaryotes: green plants, red algae, and glaucophytes. Curr. Biol. 15:1325–1330.

Sagan L. 1967. On the origin of mitosing cells. J. Theor. Biol. 14:255–274.

Scharff LB, Childs L, Walther D, Bock R. 2011. Local absence of secondary structure permits translation of mRNAs that lack ribosome-binding sites. PLoS Genet. 7:e1002155.

Schurr T, Nadir E, Margalit H. 1993. Identification and characterization of E.coli ribo somal binding sites by free energy computation. Nucleic Acids Res. 21:4019–4023.

Schwarz C, Elles I, Kortmann J, Piotrowski M, Nickelsen J. 2007. Synthesis of the D2 protein of photosystem II in Chlamydomonas is controlled by a high molecular mass complex containing the RNA stabilization factor Nac2 and the translational activator RBP40. Plant Cell 19:3627– 3639. Shine J, Dalgarno L. 1974. The 3’-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. U. S. A. 71:1342–1346.

Siemeister G, Hachtel W. 1989. A circular 73 kb DNA from the colourless flagellate Astasia longa that resembles the chloroplast DNA of Euglena: restriction and gene map. Curr. Genet. 15:435–441.

21

Starmer J, Stomp a, Vouk M, Bitzer D. 2006. Predicting Shine-Dalgarno sequence locations exposes genome annotation errors. PLoS Comput. Biol. 2:e57.

Steege DA, Graves MC, Spremulli LL. 1982. Euglena gracilis chloroplast small subunit rRNA. Sequence and base pairing potential of the 3’ terminus, cleavage by colicin E3. J. Biol. Chem. 257:10430–10439.

Talavera G, Castresana J. 2007. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. 56:564–577.

Tengs T, Dahlberg OJ, Shalchian-Tabrizi K, Klaveness D, Rudi K, Delwiche CF, Jakobsen KS. 2000. Phylogenetic analyses indicate that the 19’Hexanoyloxy-fuco xanthin-containing dinoflagellates have tertiary plastids of haptophyte origin. Mol. Biol. Evol. 17:718–729.

Timmis JN, Ayliffe MA, Huang CY, Martin W. 2004. Endosymbiotic gene transfer: organelle genomes forge eukaryotic chromosomes. Nat. Rev. Genet. 5:123–135.

Toft C, Andersson SGE. 2010. Evolutionary microbial genomics: insights into bacterial host adaptation. Nat. Rev. Genet. 11:465–475. Turmel M, Gagnon M-C, O’Kelly CJ, Otis C, Lemieux C. 2009. The chloroplast genomes of the green algae Pyramimonas, Monomastix, and Pycnococcus shed new light on the evolutionary history of prasinophytes and the origin of the secondary chloroplasts of euglenids. Mol. Biol. Evol. 26:631–648.

Turmel M, Otis C, Lemieux C. 2009. The chloroplast genomes of the green algae Pedinomonas minor, Parachlorella kessleri, and Oocystis solitaria reveal a shared ancestry between the Pedinomonadales and Chlorellales. Mol. Biol. Evol. 26:2317–2331.

Udagawa T, Shimizu Y, Ueda T. 2004. Evidence for the translation initiation of leaderless mRNAs by the intact 70 S ribosome without its dissociation into subunits in eubacteria. J. Biol. Chem. 279:8539–8546.

Ueda M, Fujimoto M, Arimura S, Murata J, Tsutsumi N, Kadowaki K. 2007. Loss of the rpl32 gene from the chloroplast genome and subsequent acquisition of a preexisting transit peptide within the nuclear gene in Populus. Gene 402:51–56.

Wilson RJ, Denny PW, Preiser PR, et al. 1996. Complete gene map of the plastid -like DNA of the malaria parasite Plasmodium falciparum. J. Mol. Biol. 261:155–172.

22

Yamaguchi K, Prieto S, Beligni MV, Haynes PA, McDonald WH, Yates JR, Mayfield SP. 2002. Proteomic characterization of the small subunit of Chlamydomonas reinhardtii chloroplast ribosome: identification of a novel S1 domain-containing protein and unusually large orthologs of bacterial S2, S3, and S5. Plant Cell 14:2957–2974.

Yoon HS, Hackett JD, Ciniglia C, Pinto G, Bhattacharya D. 2004. A molecular timeline for the origin of photosynthetic eukaryotes. Mol. Biol. Evol. 21:809–818.

Zhang Z, Schwartz S, Wagner L, Miller W. 2000. A greedy algorithm for aligning DNA sequences. J. Comput. Biol. 7:203–214.

Zheng X, Hu G-Q, She Z-S, Zhu H. 2011. Leaderless genes in bacteria: clue to the evolution of translation initiation mechanisms in prokaryotes. BMC Genomics 12:361.

23

Table 1. Eukaryotes harboring a plastid with an rRNA lacking the classical anti-SD motif Taxonomic group (supergroup)

Small subunit rRNA (plastid) Classical anti-

Anti-SD regionb

Gene copy

Lost

3′UCAAGAAUA

1

Acutodesmus obliquus

Lost

3′AUUUUUCUA

2

Schizomeris leibleinii

Reduced

3′UUCUUCCUC

1

Stigeoclonium helveticum

Reduced

3′AUUCUUCCU

1

Floydiella terrestris

Reduced

3′UAUCCUCUU

1

Monomorphina aenigmatica

Reduced

3′AUUACCUCA

1

Euglena viridis

Reduced

3′GUAACCUCA

1

Euglena gracilis

Reduced

3′CCCUCA

3

Reduced

3′CCCUUA

1

Lost

3′AAUUUUGUA

4

Chromera velia

Lost

3′UUUUAUUUU

1

Chromerida sp. RM11

Lost

3′ACUAUGUAC

2

Plasmodium falciparum

Lost

3′AAAUAAAAU

1

Leucocytozoon caulleryi

Lost

3′AUAAAAUAU

2

Babesia bovis

Lost

3′UAUAACUAU

2

Theileria parva

Lost

3′AAUGUGUAU

1

Eimeria tenella

Lost

3′CUAUCAUAU

2

Toxoplasma gondii

Lost

3′AAAUCAUUU

2

Species

SD

motifa

Chlorophyta (Archaeplastida) Helicosporidium sp. ex Simulium jonesi

Euglenophyta (Excavata)

Euglena longa Chromerida (Chromalveolata)

Apicomplexa (Chromalveolata)

a

Reduced: changes ≤ 2 nt (insertion, deletion, or substitution) compared to the classical anti-SD motif

(3′CCUCC). Lost: changes > 2 nt. b

The remnants of the classical anti-SD motif are underlined.

See supplementary dataset S1 (Supplementary Material online) for more detailed information on the above plastids.

24

Table 2. Statistically significant (p < 0.01 ) altered SD interactions Classical anti-SD

Taxonomic group

mRNA signal consensusb

Species

motifa

Classical

Streptophyta

Zygnema

Gene # (%) with the

Expected % (p) d

consensusc

GAAGG

6 (6.5)

0.55 (1.35e-05)

GAAGG

8 (8.3)

0.57 (9.8e-08)

GAAGG

4 (4.9)

0.58 (1.40e-03)

GAAGG

17 (28)

0.52 (6.3e-25)

GGGA

4 (6.5)

0.90 (2.4e-03)

GAAGG

8 (10.7)

0.45 (2.2e-09)

GAAGG

7 (9.2)

0.44 (5.1e-08)

circumcarinatum

Classical

Chlorophyta

Pseudendoclonium akinetum

Classical

Chlorophyta

Dunaliella salina

Classical

Chlorarachniophyta

Bigelowiella natans

Reduced

Euglenophyta

Euglena gracilis

Reduced

Chlorophyta

Schizomeris leibleinii

Reduced

Chlorophyta

Stigeoclonium helveticum

a

Reduced: changes ≤ 2 nt (insertion, deletion, or substitution) compared to the classical anti-SD motif

(3′CCUCC). b c

mRNA signal consensus of an altered SD interaction. See figures 6, 7, and 8 for details.

Number and percentage of protein-coding genes with the consensus in their SD regions (−20 to −5 (16 nt) from

the start codon). d

Probability (%) of observing the consensus among artificially generated 16 nt sequences (n = 106 ) using the

nucleotide frequencies of the genome. p for a binomial test of the null hypothesis that the actual number of consensus observed is equal to the expected number.

25

Fig. 1. The Shine-Dalgarno (SD) interaction for prokaryotic translation initiation. The 3′ tail of rRNA in the small subunit of the ribosome, which includes the classical anti-SD motif (3′CCUCC), recognizes a complementary sequence in the 5′ UTR of mRNA (i.e., the SD sequence) by rRNAmRNA base paring, as shown in this example.

26

Fig. 2. A diagram of bacterial and eukaryotic tree of life depicting primary, secondary, and tertiary endosymbiosis events. A cyanobacterium has evolved to become plastids in the eukaryotic supergroup Archaeplastida through primary endosymbiosis (labeled as 1°). These plastids are classified into cyanelles (in Glaucophya), green-lineage plastids (in Chlorophyta and Streptophyta), and red-lineage plastids (in Rhodophyta). Green- and red-lineage plastids within this group (Archaeplastida) have been transferred to other eukaryotic supergroups through secondary endosymbiosis (labeled as 2°). A plastid in Cryptophyta was transferred to a subgroup of Dinoflagellata through tertiary endosymbiosis (labeled as 3°). Phylogenetic placement of Haptophyta and Cryptophyta as well as their plastids remain unclear (Keeling 2013), so they are represented by dotted branches or an arrow.

27

Fig. 3. Defining the anti-SD region. f anti-SD represents the site-specific base pairing frequency calculated for each genome. Means and standard deviations of f anti-SD values in the tentative 3′ tails (+20 nt from the 5′ start of the small subunit rRNA 3′ tail) for all plastid genomes assessed in this study. Inter-strand base pairings from the hybridization structures with MFE ≤ −3.7 kcal/mol were included. Mean f anti-SD values within the box are significantly higher than the other sites, and are thus defined as the anti-SD region for all plastid genomes in this study (with an exception of the E. gracilis plastid, see Materials and Methods for details). The position of the classical anti-SD motif (3′CCUCC) within the anti-SD region is shown.

28

Fig. 4. SD interaction usage in plastids. (A) Plastid phylogeny and FSD values (fraction of SD sequence-carrying genes in a genome) in eukaryotic groups. A vertical bar represents median FSD. A boxplot was drawn for a group with > 6 members. Origins of Haptophyta and Cryptophyta plastids remain unclear. (B) Plastid multi-gene phylogeny and plastid FSD values in selected species of Chlorophyta, Chlorarachniophyta, or Euglenophyta. (i) Plastids with classical anti-SD motifs (classica l anti-SD plastids); (ii) plastids with reduced anti-SD motifs (reduced anti-SD plastids); (iii) plastids lacking anti-SD motifs (lost anti-SD plastids). The species with (ii) and (iii) are indicated in bold font. 29

See supplementary figure S3 (Supplementary Material online) for another version of the tree in panel B, with scaled branch lengths and bootstrap values. See Materials and Methods for details on tree construction.

30

Fig. 5. Lost anti-SD plastids are likely to carry a small number of protein-coding genes. (A) Comparison among green-lineage plastids (Streptophyta, Chlorophyta, Chlorarachniophyta, and Euglenophyta plastids). (B) Comparison among red-lineage plastids (Rhodophyta, Heterokontophyta, Chromerida and Apicomplexa plastids). 1°: plastids from primary endosymbiosis. 2°: plastids from secondary endosymbiosis. Classical: classical anti-SD plastids. Reduced: reduced anti-SD plastids. Lost: lost anti-SD plastids. A vertical bar represents a median of the protein-coding gene numbers. A boxplot is drawn for a group with > 6 members. Strptophyta species with non-photosynthetic plastids are Rhizanthella gardneri, Epifagus virginiana, and Neottia nidus-avis.

31

Fig. 6. Coordinated evolutionary changes in anti-SD motif (in rRNA) and SD signal (in mRNA) for SD interactions in Euglenophyta plastids. (A) Inferred stages of the coevolution. Two canonical SD interactions were limited to one interaction, then extended for the emergence of an altered SD interaction (GGGA interaction), and then eliminated. (B) Anti-SD motifs (in the anti-SD region) and genes in plastids of the indicated species at various evolutionary stages. Genes (boxes) are colored according to their SD interactions. Three boxes are shown when three paralogous genes are present. Major: three of the four rRNA genes. Minor: one of the four rRNA genes. Miscellaneous: secondary structures that cannot be categorized as one of the interactions in panel A, mostly due to the bulge or internal loop structure, or non-Watson-Crick base paring. See supplementary table S1 (Supplementa r y Material online) for the orthologous group, pcog01 and pcog09. (C) Predicted SD interactions for several orthologous genes. 32

Fig. 7. Coordinated evolutionary changes in anti-SD motif (in rRNA) and SD signal (in mRNA) for SD interactions in Chlorophyceae plastids. (A) Inferred stages of the coevolution. Left: The emergence of an altered SD interaction (GAAGG interaction), through an extension of the classical anti-SD motif, was followed by stepwise loss of the canonical SD interactions. Right: One of the canonical SD interactions was disallowed. (B) Anti-SD motifs (in the anti-SD region) and genes in plastids of the indicated species at various stages. Genes (boxes) are colored according to their SD interactions. Two boxes are shown when two paralogous genes are present. Miscellaneous: secondary structures that cannot be categorized as one of the interactions in panel A, mostly due to the bulge or internal loop 33

structure, or non-Watson-Crick base paring. See supplementary table S1 (Supplementary Material online) for the orthologous groups, pcog02−pcog08. (C) Predicted SD interactions for several orthologous genes.

Fig. 8. Coordinated evolutionary changes in anti-SD motif (in rRNA) and SD signal (in mRNA) for SD interactions in the B. natans (Chlorarachniophyta) plastid. (A) Inferred stages of the coevolution, leading to the emergence of an altered SD interaction (GAAGG interaction). (B) Anti-SD motifs (in the anti-SD region) and genes. Genes (boxes) are colored according to their SD interactions. Two boxes are shown when two paralogous genes are present. Miscellaneous: secondary structures that cannot be categorized as one of the interactions in panel A, mostly due to the bulge or internal loop structure, or non-Watson-Crick base paring.

34