Current Genetics
Curr Genet (1991) 19:139-147
© Springer-Verlag 1991
Repetitive sequence-mediated rearrangements in Chlorella ellipsoidea chloroplast DNA: completion of nucleotide sequence of the large inverted repeat
Takashi Yamada*
Department of Molecular Biology, Mitsubishi Kasei Institute of Life Sciences, 11 Minamiooya, Machida-shi, Tokyo 194, Japan Received June 18/October 3, 1990
Summary. A 3 454 base pair (bp) sequence of the large inverted repeat (IR) of chloroplast DNA (cpDNA) from the unicellular green alga Chlorella ellipsoidea has been determined. The sequence includes: (1) the boundaries between the IR and the large single copy (LSC) and small single copy (SSC) regions, (2) the gene for psbA and (3) an approximately 1.0 kbp region between psbA and the rRNA genes that contains a variety of short dispersed repeats. The total size of the Chlorella IR was determined to be 15 243 bp. The junction between the IR and the SSC region is located close to the putative promoter of the rRNA operon (906 bp upstream of the -35 sequence on each IR). The junction between the IR and the LSC region is likewise just upstream of the putative psbA promoter, 218 bp upstream of the ATG initiation codon. A few sets of unique sequences were found repeated around both junctions. Some of the sequences flanking the IR-LSC junction suggest a unidirectional and serial expansion of the IR within the genome. The psbA gene is located close to the LSC-side junction and codes for a protein of 352 amino acid residues; the highly conserved C-terminal Gly is absent. Unlike the psbA genes of Chlamydomonas species, which contain 2-4 large introns, the Chlorella gene has no introns. The overall gene organization of the Chlorella IR is very different from that of higher plants, but a similar rrn-psbA gene cluster is also found in the IR of Chlamydomonas species and in a single copy region of some chlorophyll a/c-containing algae, indicating a common evolutionary lineage of these cpDNAs. The origin and evolution of the IR structure are discussed in the light of these observations.
Key words: cpDNA evolution - IR expansion - psbA - rRNA genes - tRNA genes
* Present address: Department of Fermentation Technology, Faculty of Engineering, Hiroshima University, Saijo, Higashihiroshima 724, Japan
Introduction
The cpDNAs of a wide range of higher plants share a common molecular organization: circular molecules with an average size of 130-150 kbp, composed of two large inverted repeat sequences (IRs) separated by single copy regions of different sizes (Palmer 1985). In angiosperms, typical IRs are of the order of 20 kbp, with single copy regions of 20 kbp and 80 kbp. Exceptions are geranium cpDNA, whose IR is 76 kbp (Palmer et al. 1987), and the cpDNAs of a number of legumes, which have lost one arm of the IR (Palmer 1985). In non-angiosperms, the IRs studied so far are much smaller, ranging from 9.4 to 17 kbp: 9.4 kbp for the moss Physcomitrella patens (Calie and Hughes 1987), 10 kbp for the fern Osmunda cinnamomea (Palmer and Stein 1986), 11 kbp for the liverwort Marchantia polymorpha (Ohyama et al. 1986) and 17 kbp for the gymnosperm Ginkgo biloba (Palmer and Stein 1986). In spite of this wide variation in IR size, the order of the genes encoded on the IR is highly conserved in higher plants. For example, the tobacco IR (approximately 25 kbp) contains the genes for four rRNAs, seven tRNAs, five proteins and four unknown open reading frames (Shinozaki et al. 1986), and its 10 kbp region on the SSC side, which contains the genes for trnV, rrn, trnR and trnN, corresponds exactly to the IR (10 kbp) of the liverwort Marchantia polymorpha (Ohyama et al. 1986). The gene order of the remaining, LSC-side part of the tobacco IR is the same as that of the LSC region immediately adjacent to the IR in the liverwort. In contrast, the most varied forms of cpDNA have so far been observed among algal species (Cattolico 1986). The largest cpDNA (400-600 kbp), that of Acetabularia (Padmanabhan and Green 1987), and the smallest (85 kbp), that of Codium fragile (Hedberg et al. 1981), are both found in green algae (Chlorophyta). The gene arrangement on the cpDNA of Chlorella (Yamada and Shimaji 1987b; Yoshinaga et al. 1988), Chlamydomonas (Harris et al. 1987; Turmel et al. 1987) and Codium fragile (Manhart et al. 1989) is very different from that of higher plants; this is especially true of the last, which lacks the IR structure. In Euglena gracilis (Euglenophyta), the rRNA operons of the cpDNA are arranged not as an inverted repeat but in a tandem array. The rRNA operon exists only once in red algae (Rhodophyta) such as Porphyra yezoensis and Griffithsia pacifica (Li and Cattolico 1987). In brown algae (Chromophyta), Dictyota dichotoma (Kuhsel and Kowallik 1987) and Pylaiella littoralis (Goër et al. 1988) both possess cpDNAs with very short IRs no larger than the rRNA operon. On the other hand, the cpDNAs of Ochromonas danica and Olisthodiscus luteus contain much larger IRs (Reith and Cattolico 1986). Recently, the cpDNA of the cryptomonad Cryptomonas Φ (Cryptophyta) was shown to contain an IR of rRNA operon size (Douglas 1988). Thus, the presence or absence of an IR, and its size when present, appear to be highly variable among algae. To understand the biological and evolutionary significance of the IR structure of cpDNA, which is highly conserved across a wide range of higher plants, it is important to study various forms of cpDNA. In the present study, the IR structure of the unicellular green alga C. ellipsoidea was examined in detail because of its peculiar organization (Yamada and Shimaji 1987b). The Chlorella IR was found to be 15 243 bp long and to contain the genes for three rRNAs, three tRNAs, psbA and four URFs, organized in a unique manner. The origin and evolution of the IR structure, which is conserved in higher plant and some algal cpDNAs, are discussed in the light of these observations.
Materials and methods
Algal and bacterial strains. C. ellipsoidea C-87 was obtained from the algal culture collection of the Institute of Applied Microbiology, University of Tokyo. E. coli HB101, JM101 and JM109 were used for bacterial transformation and propagation of plasmids.
DNA and RNA. Chloroplast DNA from C. ellipsoidea was prepared as described previously (Yamada 1982). Plasmid DNAs were prepared according to Maniatis et al. (1982). A gene library of C. ellipsoidea cpDNA was constructed by ligating SstI-digested cpDNA fragments into SstI-digested pUC13 (Yamada et al. 1986) and transforming E. coli HB101 and JM109. Subcloning was carried out using the same vector and hosts. Total chloroplast RNA was prepared from isolated chloroplasts (Yamada 1982) by phenol extraction and salt precipitation (Davis et al. 1986).
Nick translation and hybridization. Southern, Northern and colony hybridizations were carried out according to Davis et al. (1986). DNA probes were labeled by nick translation with a Takara nick translation kit (Takara Shuzo) and [α-32P]dCTP (110 TBq/mmol, Amersham, Buckinghamshire).
Sequencing of DNA fragments. Restriction fragments containing the IR region of the Chlorella cpDNA were cloned into M13mp18 and mp19 (Norrander et al. 1983) in both orientations. Single-stranded DNA was sequenced by the chain termination procedure (Sanger et al. 1977), using [α-35S]dCTP (30 TBq/mmol, New England Nuclear, Wilmington, DE) and Sequenase (Toyobo Biochemicals). Both DNA strands were sequenced at least twice, and overlaps were obtained at each restriction site. Sequences were compiled and analyzed using GENETYX software on a NEC PC-98RX computer.
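Compiling sequence reads depends on finding where one read's end overlaps the next read's start. The GENETYX procedure used here is not described, so the following is only a minimal sketch of exact suffix-prefix merging; `merge_reads` and its `min_overlap` threshold are hypothetical, and real assembly must also tolerate sequencing errors.

```python
from typing import Optional


def merge_reads(left: str, right: str, min_overlap: int = 20) -> Optional[str]:
    """Join two reads if a suffix of `left` exactly equals a prefix of `right`.

    The longest possible overlap is tried first. Returns None when no exact
    overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    lower = max(min_overlap, 1)  # an overlap of 0 would make left[-0:] the whole read
    for n in range(max_len, lower - 1, -1):
        if left[-n:] == right[:n]:
            return left + right[n:]
    return None


print(merge_reads("ACGTACGTTTGACC", "TTGACCGGAT", min_overlap=4))
# ACGTACGTTTGACCGGAT (6-base overlap "TTGACC")
```

Requiring a minimum overlap keeps short chance matches, common in AT-rich algal cpDNA, from joining unrelated reads.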
S1 mapping. S1 mapping to determine the 5' and 3' ends of the psbA mRNA was carried out with total cpRNA as described previously (Yamada and Shimaji 1987b).
Results
Location of psbA on the Chlorella cpDNA
When total C. ellipsoidea cpDNA, digested with five restriction enzymes (KpnI, PvuII, SacI, SphI and XbaI), was electrophoresed on an agarose gel, transferred to nitrocellulose and hybridized with a 32P-labeled 1.2 kbp SpeI fragment from pTB28, which contains the entire coding region of tobacco psbA (Sugita and Sugiura 1984), two hybridizing bands always appeared among the fragments produced by each restriction enzyme. The hybridizing fragments were 25 kbp and 20 kbp for KpnI, 12.0 kbp and 8.0 kbp for PvuII, 14.0 kbp and 8.0 kbp for SacI, 3.5 kbp and 2.3 kbp for SmaI, 25 kbp and 15 kbp for SphI, and 12.5 kbp and 2.8 kbp for XbaI (data not shown). This suggests that there are two copies of psbA in C. ellipsoidea cpDNA. To determine the precise location and molecular structure of psbA, clones containing this gene were selected from the gene library of Chlorella cpDNA (Yamada and Shimaji 1986a) by colony hybridization with the tobacco psbA probe. As expected, two different clones were obtained: one contained a 13.5 kbp SstI insert (pCCS14) and the other a 7.5 kbp SstI insert (pCCS65). Restriction maps of these clones are shown in Fig. 1a. A region of approximately 4.0 kbp at the 3' end of both inserts gave the same restriction map. In this region there is a 1.5 kbp HindIII fragment that hybridized to the psbA probe (data not shown). Previous mapping studies of the whole cpDNA by Southern hybridization of restriction fragments (Yamada et al. 1986) showed that, apart from the IR, the Chlorella cpDNA contains no extended repeat sequences. Thus, the shared 4.0 kbp sequence of pCCS14 and pCCS65 appears to be part of the IR.
This was confirmed by hybridization of the psbA probe to the 9.0 kbp and 8.5 kbp EcoRI fragments (data not shown), which were previously found to contain, in common, a part of the IR including the genes for 23S rRNA, tRNA-Ala (UGC) and the 3' half of tRNA-Ser (GCU), together with different sequences of the LSC adjacent to the IR (Yamada and Shimaji 1987b). Indeed, the nucleotide sequence determined around the SstI cloning site of both the pCCS14 and pCCS65 inserts included the 5' half of the coding region for the tRNA-Ser (GCU) gene (Yamada 1989). Southern hybridizations probed with a 300 bp PstI-SpeI fragment of tobacco psbA, containing the coding region for the first 86 amino acid residues of the protein, revealed that the 5' end of Chlorella psbA lies within a 600 bp SpeI-NheI fragment of pCCS14 and a 700 bp SpeI-NheI fragment of pCCS65 (data not shown). Thus, the psbA gene is located entirely within the IR of the C. ellipsoidea cpDNA.
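Restriction maps such as those in Fig. 1 are built from the positions of enzyme recognition sites. As a minimal sketch (not the mapping procedure used in this work, which relied on Southern hybridization), the hypothetical helpers below locate recognition sequences in a DNA string and convert cut positions into fragment sizes. The enzyme table lists standard recognition sequences for several enzymes named in this paper; treating each cut as falling at the start of its site is a simplifying assumption.

```python
# Standard recognition sequences (5'->3'); all are palindromic,
# so scanning one strand finds sites on both strands.
SITES = {
    "BglII": "AGATCT",
    "HindIII": "AAGCTT",
    "SacI/SstI": "GAGCTC",
    "SpeI": "ACTAGT",
    "XbaI": "TCTAGA",
    "XhoI": "CTCGAG",
}


def find_sites(seq: str, sites: dict[str, str] = SITES) -> dict[str, list[int]]:
    """Return the 0-based start positions of each recognition sequence in seq."""
    seq = seq.upper()
    hits: dict[str, list[int]] = {}
    for enzyme, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)  # step by 1 to allow overlaps
        hits[enzyme] = positions
    return hits


def fragment_sizes(length: int, cut_positions: list[int]) -> list[int]:
    """Sizes of the pieces of a linear molecule cut at the given positions.

    Cuts are taken at the site start, ignoring the enzyme-specific
    offset within the recognition sequence.
    """
    bounds = [0] + sorted(cut_positions) + [length]
    return [b - a for a, b in zip(bounds, bounds[1:])]


hits = find_sites("ACTAGTTTCTTTAAGCTTGG")
print(hits["SpeI"], hits["HindIII"])  # [0] [12]
print(fragment_sizes(100, [70, 30]))  # [30, 40, 30]
```

Two clones that share an IR segment would give identical site patterns over that segment, which is the comparison drawn between pCCS14 and pCCS65 above.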
Fig. 1a, b. Restriction maps of the pCCS14 and pCCS65 inserts (a) and the pCCS208 and pCCEX302 inserts (b). The coding regions for psbA and trnS are shown by boxes. The coding region for a 5' part of 16S rRNA is shown below the pCCEX302 map. Restriction sites for BglII (B), EcoRI (E), HindIII (H), HaeIII (Ha), SacI (S), SalI (Sl), SpeI (Sp), Sau3A (S3), XbaI (X) and XhoI (Xh) are as indicated. The size bar for each map represents 1 kb. Sequencing strategies are given under the maps.
Location of the junction between the IR and the LSC
Figure 1a shows that the same pattern of restriction sites in pCCS14 and pCCS65 continues through the psbA region and ends beyond this gene, indicating that the endpoint of the IR is very close to the 5' end of psbA. Therefore, the 1.55 kbp SpeI-XbaI fragment of pCCS14 and the 1.65 kbp SpeI-XbaI fragment of pCCS65, which contain the coding region of psbA and the endpoint of the IR, were subcloned to determine the precise location of the junction. Fine maps of these clones, shown in Fig. 1a, indicate that the junction lies within a 350 bp SpeI-HindIII fragment of pCCS14 and a 450 bp SpeI-HindIII fragment of pCCS65. The nucleotide sequences of these fragments were determined by the strategy indicated in Fig. 1a and are shown in Fig. 2a. The junction is not within any coding region, but is very close to the putative promoter sequence of psbA, 218 bp upstream of the ATG initiation codon. Although there is no obvious sequence homology beyond the junction, AT-clusters of 10-15 bp occur frequently in both flanking LSC regions.
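The junction is placed where sequence identity between the two clones ends (stars in Fig. 2a). A minimal sketch of that comparison follows, assuming a gapless alignment of equal length in which the shared IR segment occupies the 3' end, as in Fig. 2a and b; `find_junction` is a hypothetical helper, not the routine actually used.

```python
def find_junction(a: str, b: str) -> int:
    """Return the 1-based position after which two aligned sequences
    stay identical through to their 3' ends.

    Assumes a gapless, equal-length alignment with the shared (IR)
    segment at the 3' end. The junction lies between the returned
    position and the next one; 0 means the sequences are identical.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    i = len(a)
    # Walk 5'-ward from the 3' end while the two clones still agree.
    while i > 0 and a[i - 1].upper() == b[i - 1].upper():
        i -= 1
    return i


print(find_junction("AAAAACGTACGT", "TTTTTCGTACGT"))  # 5: junction between 5 and 6
```

A chance identity immediately 5' of the true junction would be counted as part of the IR by this rule, so a call like this still needs checking by eye, as with the AT-rich flanks noted above.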
Location of the junction between the IR and the SSC
Previous studies showed that the XhoI site located 5.8 kbp upstream of the 16S rRNA gene on one side of the SSC was absent from the other side (Yamada 1983); therefore, one endpoint of the IR must lie between the XhoI site and the 16S rRNA gene. A 2.5 kbp XhoI-EcoRI fragment containing this region was cloned (pCCEX302), and part of it was sequenced (Yamada and Shimaji 1987b). Using this clone as a probe, clones containing the other endpoint of the IR were screened from the gene library by colony hybridization (Yamada and Shimaji 1986a; Yamada et al. 1986). Two kinds of clones were obtained and designated pCCS114 and pCCS208. Clone pCCS114 contained a 6.0 kbp insert, which included the entire insert of pCCEX302 (2.5 kbp), while pCCS208 contained a 9.5 kbp insert. Figure 1b shows restriction maps of pCCS208 and pCCEX302. Since the two clones share the same pattern of restriction sites on the 3' side of the BglII site of pCCEX302, one junction must lie within the 850 bp region between XhoI and BglII on pCCEX302. The nucleotide sequence of this region was determined by the strategy outlined in Fig. 1b and is shown in Fig. 2b. Subcloning the corresponding fragment of pCCS208 was difficult; several attempts with different vectors, host strains and restriction enzymes all failed. Therefore, the nucleotide sequence of the other junction region was determined directly from pCCS208 using a synthetic oligonucleotide primer, 5'-AGATCTAAATTTTGTTCT-3', complementary to the sequence just upstream of the BglII site of pCCEX302. This sequence is compared with that of pCCEX302 in Fig. 2b. The endpoint of the IR was identified between positions 162 and 163, where the sequence homology ends. This endpoint is also not within any coding region and is close to the putative promoter region of the 16S rRNA gene (906 bp upstream of the -35 region). Determining both endpoints made it possible to estimate the size of the IR at approximately 15 kbp, which is less than the value of 22.5 kbp determined previously by electron microscopy (Yamada 1983).
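The primer can be checked against the pCCEX302 sequence by taking its reverse complement: the result should reproduce the top-strand sequence ending at the BglII site (AGATCT), which is the 3' end of the pCCEX302 sequence in Fig. 2b. Annealed there, the primer extends 5'-ward along the top strand, back toward the junction. The helper below is a generic sketch written for illustration.

```python
# Map each base to its Watson-Crick partner (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


primer = "AGATCTAAATTTTGTTCT"        # 5'->3', as given in the text
target = reverse_complement(primer)  # top-strand sequence the primer anneals to
print(target)                        # AGAACAAAATTTAGATCT
print(target.endswith("AGATCT"))     # True: the BglII site lies at its 3' end
```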
Fig. 2a-d. Nucleotide sequences of the junction between the IR and the LSC (a), the junction between the IR and the SSC (b), the psbA region (c) and the chimeric region downstream of psbA on the IR (d). a Sequences of the SpeI-HindIII fragments of pCCS14 (JLA) and pCCS65 (JLB) are compared. Stars indicate nucleotides identical between the two clones. The junction was found between positions 404 and 405. A sequence of about 90 bp of the IR adjacent to the HindIII site is also shown. Sequence elements repeated on the IR are boxed (B, E and F). b The sequences of pCCEX302 (JSA) and pCCS208 (JSB) are compared. Stars indicate identical nucleotides. The junction was found between positions 162 and 163. c The HindIII site (positions 1-6) corresponds to that at positions 407-412 in a. An ORF for psbA and the putative promoter sequences (-10 and -35) are boxed. The 5' and 3' ends of the psbA mRNA, as determined by S1 mapping, are shown by vertical arrowheads. Horizontal arrows indicate small inverted repeat sequences. The wavy line shows a Shine-Dalgarno (SD) sequence. d The HindIII site (positions 1-6) corresponds to that at positions 1 451-1 456 in c. Repeated sequences (B, C, D, E, F and G) and the gene for trnS (GCU) are boxed. Shadowed boxes indicate repeated sequence elements (β and α) associated with transposon-like structures (Yamada and Shimaji 1986a). Arrows show small inverted repeat sequences.
Fig. 3. Map of the inverted repeat (IR) region of C. ellipsoidea cpDNA. Coding regions are indicated by boxes. The orientation of transcription is shown by arrows. Triangles represent repetitive sequences (ε, β and α) found in this region. P1 and P2 are the putative promoters of the back-to-back rrn operons (Yamada and Shimaji 1987b). LSC, large single copy region; SSC, small single copy region; psbA, gene for the photosystem II thylakoid protein D1, or the 32 kDa QB-binding protein. The nucleotide sequences of regions A and G are shown in Fig. 2; those of B-F were reported previously (Yamada and Shimaji 1986a, 1987a, b; Yamada 1988). Sizes are shown in bp.
Determination of the entire nucleotide sequence of the IR
A map of the entire IR region of C. ellipsoidea cpDNA is shown in Fig. 3. The nucleotide sequences of some parts of this region have already been reported, namely 892 bp of the upstream regions of the 16S and 23S rRNA genes (Yamada and Shimaji 1987b), 1 532 bp of the coding region for 16S rRNA (Yamada 1988), 4 894 bp of the 16S-23S rRNA spacer region (Yamada and Shimaji 1986a), and 3 468 bp of the coding regions for 23S rRNA (Yamada and Shimaji 1987a), 5S rRNA (Yamada and Shimaji 1986b) and tRNA-Ser (GCU) (Yamada 1989). The remaining parts of the IR were sequenced by the strategy outlined in Fig. 1a, and the sequence (3 202 bp) is shown in Fig. 2c and d. Adding this sequence to the 252 bp SSC-side sequence made it possible to determine the total size of the Chlorella IR as 15 243 bp. Computer analysis found only two genes, psbA and trnS (GCU), in this region. Matrix analyses, however, revealed a surprisingly complicated structure: a region of about 1.0 kbp (Fig. 2d, positions 354-1 319) between psbA and trnS is interposed between a pair of inverted repeat sequences of 185 bp (β-elements). Within this region there is a pair of α-elements, which were previously found in the 16S-23S rRNA spacer region as terminal repeats of a transposon-like structure (Yamada and Shimaji 1986a). The sequence between the α-elements is a chimera composed of sequences from upstream of the IR-LSC junction (B, 125 bp), downstream of psbA (C, 49 bp) and downstream of trnS (D, 87 bp), as indicated in Fig. 2d. Most interestingly, the 125 bp sequence B is a direct repeat of the LSC-side sequence at the IR-LSC junction (Fig. 2a and d). Nearby, two further fragments of the LSC-junction sequence, E (33 bp) and F (38 bp), are also directly repeated. Such a mosaic structure of the 1.0 kbp region flanked by β-elements strongly suggests frequent recombinational events within it. It is interesting to note that an 11 bp sequence (CTCCAAAA/GTAA) is repeated tandemly next to the β-elements, a structure typical of transposon integration sites. Transposable element-like structures (both ε-linked and α-linked) were previously found in the 16S-23S rRNA spacer region of the C. ellipsoidea IR (Yamada and Shimaji 1986a). Thus, three kinds of transposable element-like structures (4.5 kbp in total) occupy about a third of the C. ellipsoidea IR (15 243 bp).
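Matrix (dot-plot) analysis compares a sequence with itself word by word: direct repeats such as B, E and F appear as runs of dots parallel to the main diagonal, a tandem duplication like the 11 bp repeat gives dots at a fixed offset of 11, and an inverted pair such as the β-elements appears along an anti-diagonal when each word is compared with the reverse complement of the others. The sketch below assumes exact matches of a hypothetical word size `k`; the scoring used by GENETYX is not described here and may differ.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def dot_plot(seq: str, k: int = 8, inverted: bool = False) -> list[tuple[int, int]]:
    """Return dot-plot points (i, j), 0-based, with i < j.

    Direct mode: seq[i:i+k] == seq[j:j+k] (direct or tandem repeats).
    Inverted mode: seq[i:i+k] is the reverse complement of seq[j:j+k].
    Only exact word matches are reported; mismatches and gaps are not tolerated.
    """
    seq = seq.upper()
    index = defaultdict(list)  # k-mer -> list of start positions
    for j in range(len(seq) - k + 1):
        index[seq[j:j + k]].append(j)
    dots = []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        query = reverse_complement(word) if inverted else word
        for j in index.get(query, []):
            if i < j:
                dots.append((i, j))
    return dots


arm = "GATTACAGG"
seq = arm + "T" * 10 + reverse_complement(arm)
print((0, 19) in dot_plot(seq, k=9, inverted=True))  # True: the arms pair up
```

Larger `k` suppresses chance matches in AT-rich sequence but misses diverged repeat copies; interactive tools usually trade this off with a window and a mismatch threshold instead.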
Nucleotide sequence and deduced protein sequence of psbA
The coding region of C. ellipsoidea psbA contains 1 056 bp, which corresponds to a protein of 352 amino acid residues. In contrast to the psbA genes of Chlamydomonas reinhardii (Erickson et al. 1984), C. smithii (Palmer et al. 1985) and C. moewusii (Turmel et al. 1988), which contain four, three and two large introns, respectively, the Chlorella psbA contains no intron. In Fig. 4, the predicted amino acid sequence of the Chlorella gene is compared with the corresponding proteins of the blue-green alga Anacystis nidulans (Golden et al. 1986), the prochlorophyte Prochlorothrix hollandica (Morden and Golden 1989), the unicellular green alga C. reinhardii (Erickson et al. 1984) and tobacco (Sugita and Sugiura 1984), with which it shows homologies of 89.5%, 88.1%, 93.2% and 93.8%, respectively. As in the higher plant proteins, there is a seven-amino-acid gap near the C-terminus relative to the blue-green algal proteins (Golden et al. 1986). In addition, the Chlorella protein lacks the C-terminal Gly residue, making it one amino acid shorter than the higher plant proteins. This Gly residue is also absent from the psbA protein of C. reinhardii (Erickson et al. 1984). Figure 4 shows that all the proposed functional residues making up the domains of this protein (Rochaix and Erickson 1988; Gingrich et al. 1988) are also conserved in Chlorella [e.g. the chlorophyll- and non-heme iron-binding residues His 198, 215 and 272, and the QB- and herbicide-binding domains from residues 219 to 275 (except for 233-238)]. Codon usage in the C. ellipsoidea psbA is strongly biased against G in the third position; codons ending in G account for only 8.2% of the total. Such a bias is also found for rbcL from another strain of C. ellipsoidea (Yoshinaga et al. 1988). The termination codon of psbA, TAA, is the same as in all psbA genes studied so far. The 5' and 3' ends of the psbA mRNA were determined by S1 mapping and are indicated on the DNA sequence shown in Fig. 2c. A putative transcription initiation site is found about 60 bp upstream of the first ATG of the reading frame (Fig. 2c), where the -35 and -10 sequences correspond to TTGTTC and TGTATT, respectively. A Shine-Dalgarno (SD) sequence, AAGG, lies several base pairs downstream of the transcription initiation site. S1 mapping of the 3' end of the message showed that the termination site is about 166 bp downstream of the stop codon. A stem-loop structure resembling a prokaryotic termination signal can be formed in the sequence around this site (Fig. 2c).
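The third-position bias quoted above is a simple count over the reading frame: 1 056 bp gives 1 056 / 3 = 352 codons, so 8.2% corresponds to roughly 29 G-ending codons. A minimal sketch, assuming an in-frame sequence starting at the ATG and excluding a trailing stop codon from the tally (the text does not say whether the 8.2% figure counts the stop codon); `codons` and `g3_fraction` are hypothetical helpers.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence into codons, dropping a trailing stop."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    triplets = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if triplets and triplets[-1] in STOP_CODONS:
        triplets.pop()
    return triplets


def g3_fraction(cds: str) -> float:
    """Fraction of sense codons whose third position is G."""
    sense = codons(cds)
    if not sense:
        return 0.0
    return sum(c[2] == "G" for c in sense) / len(sense)


print(g3_fraction("ATGACCGCAATTTAA"))  # 0.25: only ATG of four sense codons ends in G
```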
Fig. 4. Amino acid sequences of the photosystem II thylakoid protein D1, or the 32 kDa QB-binding protein (psbA). The putative transmembrane regions are boxed. The chlorophyll- and non-heme iron-binding His residues are underlined. Stars indicate residues conserved among all five proteins. An, Anacystis nidulans (Golden et al. 1986); Pr, Prochlorothrix hollandica (Morden and Golden 1989); Ce, Chlorella ellipsoidea; Cr, Chlamydomonas reinhardii (Erickson et al. 1984); Nt, Nicotiana tabacum (Sugita and Sugiura 1984).
Discussion
Organization of the IR of C. ellipsoidea cpDNA
Although the size of the IR varies from 76 kbp in geranium (Palmer et al. 1987) to 4.7 kbp in Dictyota dichotoma (Kuhsel and Kowallik 1987), all IRs known so far contain the rRNA operon; in other words, there is no evidence for an IR without rRNA genes. It is noteworthy that the algal IRs of Dictyota dichotoma (4.7 kbp, Kuhsel and Kowallik 1987), Pylaiella littoralis (6 kbp, Goër et al. 1988) and Cryptomonas (5.5-6 kbp, Douglas 1988) contain only enough room for the rRNA operon. As determined in the present work, the IR of C. ellipsoidea contains the gene for psbA in addition to rrn (Figs. 1a and 3). The rrn-psbA cluster has also been reported in the IRs of Chlamydomonas reinhardii (Harris et al. 1987), C. eugametos and C. moewusii (Turmel et al. 1987); the last two also contain rbcL next to the cluster (Fig. 5). A similar rrn-psbA linkage occurs in the cpDNAs of Pylaiella (Goër et al. 1988) and Cryptomonas (Douglas 1988), where psbA, however, lies in a single copy region immediately adjacent to an IR that consists solely of rrn (Fig. 5). These structures suggest a definite relationship among algal cpDNAs, at least among unicellular green algae and some chlorophyll a/c algae. Based on these observations, it is possible that the IR structure characteristic of cpDNAs originated from a duplication of the rRNA operon, with the duplicate arranged in inverted orientation on the cpDNA. According to this idea, the Chlorella-Chlamydomonas type of IR could have formed from the Pylaiella and Cryptomonas types by a mechanism of expansion/contraction like that suggested to have operated in geranium cpDNA (Palmer 1985; Palmer et al. 1987). The direction of expansion in this example is from rrn to psbA (Fig. 5) and, interestingly, as shown in Fig. 5, is the same as that of the replication of cpDNA from the origin located upstream of the
16S r R N A gene of Chlamydomonas. Althought the exact position of the replication origin is not known for the cpDNAs of Pylaiella, Cryptomonas and Chlorella, an ARS sequence on the Chlorella cpDNA has been mapped to a similar position in Chlamydomonas ori (Yamada et al. 1986). One mechanism that could account for the unidirectional expansion of the IR via replication and recombination is depicted in Fig. 6. Figure 6 A shows a replicating cpDNA molecule with a fork containing the IR and its flanking single copy regions. Double reciprocal recombination in this molecule, involving one point within the IR (Palmer 1983; Palmer et al. 1985) and the
146 ARS 1
--I
l ori I
ori I
sbA j
C,
2~s Psbi~ 'iq ~s 2as
FsbA
Ceu '~,H
Cm Ck
~stls
~,A
PI
Fig. 5. Comparison of the large inverted repeat sequences (IRs) among various kinds of cpDNAs. Abbreviation of genes: ARS, autonomously replicating sequence of yeast; ori, replication origin; 16S and 23S, 16S and 23S rRNAs; psbA, photosystem II thylakoid protein D1 or the 32 kDa Q~-binding protein; rbcL, large subunit of ribulose-1, 5-bisphosphate carboxylase. Abbreviation of organisms: Ce, Chlorella ellipsoidea; Cr, Chlamydomonas reinhardii (Erickson et al. 1984); Ceu, Chlamydomonas eugametos (Turmel et al. 5987); Cm, Chlamydomonas moewusii (Tnrmel et al. 5988); Cqb, Cryptomonas ~ (Douglas 1988); Pl, Pylaiella littoralis (Goer et al. 5988)
Fig. 6A-C. Postulated mechanism of expansion of the large inverted repeat sequence (IR) of cpDNA. Arrows along the circular cpDNA represent IRs. A a part of the cpDNA molecule is replicating from ori(*). Double reciprocal recombination at one point within the IR and with the other in a repetitive sequence (triangle) on the single copy region will give a parental type molecule (B) and one with an extended IR (C) after the completion of replication
other in a repetitive sequence on the single copy region, would, after the completion of replication, give rise to two different structural molecules; a parental type (B) and one with an extended IR (C). This model requires the presence of repetitive sequences within and around the IR to serve as recombinational hot spots. Indeed, as described below such sequences were found on the c p D N A of C. ellipsoidea. In order to extend the hypothesis of rrn as the origin of the IR to include higher plant cpDNAs, it would be interesting to look, among primitive algae and p r o c h l o r o -
phytes, for a postulated c p D N A with an IR consisting of only rrn and single copy regions, whose portions immediately adjacent to the IR retain the higher plant gene order of the IR. Such a structure might represent a different line of evolution from that of Chlorella and Chlamydomonas.
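The net effect of the expansion model in Fig. 6 can be illustrated with a toy calculation. The sketch below is a hypothetical illustration, not the author's method: it represents a circular cpDNA linearized as LSC-IRa-SSC-IRb, with genes as (name, strand) pairs, and models only the outcome of an expansion event, in which the single copy segment next to the IR junction joins both IR copies so that the genome gains one inverted copy of that segment. The recombination steps, the repeat elements and the real gene orientations are not modeled; the names `invert`, `genome` and `expand_ir` and the gene strands are assumptions chosen for illustration, with the gene order loosely following the C. ellipsoidea IR (psbA nearest the LSC junction, rrn toward the SSC side).

```python
from typing import List, Tuple

# Hypothetical representation: a segment is (name, strand), strand = +1 or -1.
Segment = Tuple[str, int]


def invert(segments: List[Segment]) -> List[Segment]:
    """Inverted copy of a run of segments: order reversed, strands flipped."""
    return [(name, -strand) for name, strand in reversed(segments)]


def genome(lsc: List[Segment], ira: List[Segment], ssc: List[Segment]) -> List[Segment]:
    """Circular genome linearized as LSC-IRa-SSC-IRb, where IRb is IRa inverted."""
    return lsc + ira + ssc + invert(ira)


def expand_ir(
    lsc: List[Segment], ira: List[Segment], ssc: List[Segment], n: int
) -> Tuple[List[Segment], List[Segment], List[Segment]]:
    """Toy IR expansion: the n LSC segments abutting IRa become part of the IR.

    Only the net result is modeled: the moved run now appears in both IR
    copies, so the rebuilt genome carries one extra (inverted) copy of it
    at the IRb/LSC junction. Recombination itself is not simulated.
    """
    if not 0 < n <= len(lsc):
        raise ValueError("n must be between 1 and len(lsc)")
    moved = lsc[-n:]
    return lsc[:-n], moved + ira, ssc


# Usage: two successive expansions in the same direction (rrn toward psbA).
lsc = [("lsc_gene", 1), ("psbA", 1), ("trnS", 1)]
ira = [("rrn", 1)]
ssc = [("ssc_gene", 1)]

step1 = expand_ir(lsc, ira, ssc, 1)  # IR grows to include trnS
step2 = expand_ir(*step1, 1)         # then psbA
print(step2[1])  # [('psbA', 1), ('trnS', 1), ('rrn', 1)]
print(genome(*step2))
```

Running the two steps in this order mirrors the serial, same-direction expansion proposed for the trnS and psbA regions; each step lengthens the linearized genome by the size of the moved segment.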
Mechanism of rearrangements of the C. ellipsoidea IR

The IR region of C. ellipsoidea cpDNA shows various kinds of rearrangement, including insertions/deletions of small repeated sequences, possible transpositions (insertions) of ORFs with terminal repeated sequences (Yamada and Shimaji 1986a), and an inversion of a 5.0 kbp rrn region (Yamada and Shimaji 1987b). All of these rearrangements seem to have occurred via small repetitive sequences in the genome, such as the α- and σ-elements (Yamada and Shimaji 1986a). Although α- and σ-elements were first found as the terminal repeated sequences of transposon-like structures, the complete nucleotide sequence of the Chlorella IR revealed ten copies of α and ten copies of σ within and around the IR. In addition, a pair of 185 bp inverted repeat sequences (β-elements) was found within the IR in this study. A region of about 1.0 kbp (Fig. 2d, positions 354-1319), between psbA and trnS, lies between the β-elements. According to the IR expansion/contraction model of Fig. 6, there should be repetitive sequences near the endpoints of the IR. The 125 bp B sequence on the IR, flanked by both α- and σ-elements (Figs. 2d and 3), is a good candidate, since B is also located adjacent to the endpoint of the IR (Fig. 2a). The 2273 bp run between the tandemly repeated B sequences contains the entire psbA gene (Fig. 3). This structure suggests that the IR may have expanded, via the B sequences, to include psbA. Similarly, the 597 bp sequence next to psbA (positions 1224-1821), which includes trnS (GCU), is sandwiched between tandemly repeated σ-linked sequences of 73 bp (G) as well as the tandem D sequences (85 bp) shown in Fig. 2d, again suggesting expansion of this region by the same mechanism.
If so, the event involving trnS (GCU) must be older than that involving psbA, and both may have occurred serially in the same direction, possibly in association with DNA replication (Fig. 6). Since multiple copies of α- and β-elements were detected by Southern hybridization in the single copy regions of C. ellipsoidea cpDNA (data not shown), a transpositional mechanism could account for their dispersed distribution. In contrast with higher plants, algae (especially unicellular ones such as Chlorella) seem to have much more freedom to rearrange their chloroplast genomes. The peculiar structure of the IR of C. ellipsoidea cpDNA provides an example of dynamic rearrangements mediated by multiple copies of these unique elements.
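The rearrangement scenario above rests on recognizing short direct repeats (the tandem B, D and G sequences) and inverted repeats (the 185 bp β-element pair) in the IR sequence. The paper does not describe how these were located, so the sketch below is only one simple, assumed approach: index every exact k-mer, then report pairs of identical k-mers (direct repeats) and pairs in which one k-mer is the reverse complement of the other (inverted repeats). The helper names `revcomp` and `find_repeats` and the toy sequences are hypothetical. Real elements can carry mismatches, and the overlapping k-mer hits produced by one long repeat would still need to be merged into a single element; neither is handled here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_repeats(seq: str, k: int) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Exact k-mer repeat scan (hypothetical helper).

    Returns (direct, inverted) lists of 0-based start pairs (i, j), i < j:
      direct:   seq[i:i+k] == seq[j:j+k]
      inverted: seq[j:j+k] == revcomp(seq[i:i+k])
    """
    seq = seq.upper()
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)  # start positions, in increasing order

    direct: List[Tuple[int, int]] = []
    inverted: List[Tuple[int, int]] = []
    for kmer, starts in index.items():
        # Every pair of occurrences of the same k-mer is a direct repeat.
        for a, i in enumerate(starts):
            for j in starts[a + 1:]:
                direct.append((i, j))
        # A downstream occurrence of the reverse complement is an inverted repeat.
        partners = index.get(revcomp(kmer), [])
        for i in starts:
            for j in partners:
                if j > i:
                    inverted.append((i, j))
    return sorted(direct), sorted(inverted)


# Toy usage: an inverted pair and a tandem (direct) pair of a 7-mer.
print(find_repeats("GATTACACCCCTGTAATC", 7))  # ([], [(0, 11)])
print(find_repeats("GATTACAGATTACA", 7))      # ([(0, 7)], [])
```

Exact k-mer indexing keeps the sketch short and linear in sequence length for building the index; a tolerant search (for example, seeded alignment) would be needed to recover diverged copies of an element.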
Acknowledgements. The author wishes to thank Dr. M. Sugiura for plasmid pTB28 containing the tobacco psbA gene, Dr. R. Crouch for critical reading of the manuscript, Miss E. Ozawa for synthetic oligonucleotides, and Miss R. Matsuura for typing the manuscript.
References
Calie PJ, Hughes KW (1987) Mol Gen Genet 208:335-341
Cattolico RA (1986) Trends Ecol Evol 1:64-67
Davis LG, Dibner MD, Battey JF (1986) Basic methods in molecular biology. Elsevier, New York, pp 80-229
Douglas SE (1988) Curr Genet 14:591-598
Erickson JM, Rahire M, Rochaix JD (1984) EMBO J 3:2753-2762
Gingrich JC, Buzby JS, Stirewalt VL, Bryant DA (1988) Photosyn Res 16:83-99
Goër SL, Markowicz Y, Dalmon J, Audren H (1988) Curr Genet 14:155-162
Golden SS, Brusslan J, Haselkorn R (1986) EMBO J 5:2789-2798
Harris EH, Boynton JE, Gillham NW (1987) In: O'Brien SJ (ed) Genetic maps. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, pp 257-277
Hedberg MF, Huang YS, Hommersand MH (1981) Science 213:445-447
Kuhsel M, Kowallik KV (1987) Mol Gen Genet 207:361-368
Li N, Cattolico RA (1987) Mol Gen Genet 209:343-351
Manhart JR, Kelly K, Dudock BS, Palmer JD (1989) Mol Gen Genet 216:417-421
Maniatis T, Fritsch EF, Sambrook J (1982) Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, pp 363-402
Morden CW, Golden SS (1989) Nature 337:382-385
Norrander J, Kempe T, Messing J (1983) Gene 26:101-106
Ohyama K, Fukuzawa H, Kohchi T, Shirai H, Sano T, Sano S, Umesono K, Shiki Y, Takeuchi M, Chang Z, Aota S, Inokuchi H, Ozeki H (1986) Nature: 572-574
Padmanabhan U, Green BR (1978) Biochim Biophys Acta 521:67-73
Palmer JD (1983) Nature 301:92-93
Palmer JD (1985) Annu Rev Genet 19:325-354
Palmer JD, Stein DB (1986) Curr Genet 10:823-833
Palmer JD, Boynton JE, Gillham NW, Harris EH (1985) In: Arntzen C, Bogorad L, Bonitz S, Steinback KS (eds) Molecular biology of the photosynthetic apparatus. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, pp 269-278
Palmer JD, Nugent JM, Herbon LA (1987) Proc Natl Acad Sci USA 84:769-773
Reith M, Cattolico RA (1986) Proc Natl Acad Sci USA 83:8599-8603
Rochaix JD, Erickson J (1988) Trends Biochem Sci 13:56-59
Sanger F, Nicklen S, Coulson AR (1977) Proc Natl Acad Sci USA 74:5463-5467
Shinozaki K, Ohme M, Tanaka M, Wakasugi T, Hayashida N, Matsubayashi T, Zaita N, Chunwongse J, Obokata J, Yamaguchi-Shinozaki K, Ohto C, Torazawa K, Meng BY, Sugita M, Deno H, Kamogashira T, Yamada K, Kusuda J, Takaiwa F, Kato A, Shimada H, Sugiura M (1986) EMBO J 5:2043-2049
Sugita M, Sugiura M (1984) Mol Gen Genet 195:308-313
Turmel M, Bellemare G, Lemieux C (1987) Curr Genet 11:543-552
Turmel M, Lemieux B, Lemieux C (1988) Mol Gen Genet 214:412-419
Yamada T (1982) Plant Physiol 70:92-96
Yamada T (1983) Curr Genet 7:481-487
Yamada T (1988) Nucleic Acids Res 16:9865
Yamada T (1989) Nucleic Acids Res 17:4372
Yamada T, Shimaji M (1986a) Nucleic Acids Res 14:3827-3839
Yamada T, Shimaji M (1986b) Nucleic Acids Res 14:9529
Yamada T, Shimaji M (1987a) Curr Genet 11:347-352
Yamada T, Shimaji M (1987b) Mol Gen Genet 208:377-383
Yamada T, Shimaji M, Fukuda Y (1986) Plant Mol Biol 6:245-252
Yoshinaga K, Ohta T, Suzuki Y, Sugiura M (1988) Plant Mol Biol 10:245-250

Communicated by C. S. Levings III