Nucleic Acids Research, 1993, Vol. 21, No. 7 1667
Nucleotide sequence and secondary structure of the chloroplast group I intron Cr.psbA-2: novel features of this self-splicing ribozyme

Yijia Bao and David L.Herrin*

Botany Department, University of Texas at Austin, Austin, TX 78713, USA

Received February 22, 1993; Accepted March 5, 1993

The chloroplast psbA gene in Chlamydomonas reinhardtii contains self-splicing introns (1). Intron-2 was particularly efficient at self-splicing (1). Thus it was of interest to determine its sequence. Cr.psbA-2 shows two unique features: (1) two free-standing open reading frames (ORFs), and (2) a stem-loop (and additional sequences) between helices P8 and P7.

The sequence of Cr.psbA-2 was determined from two independent clones, pGEMR14.2 (1), and P-66 (Chlamydomonas Genetics Center, Duke University). The intron is 1410 nt, and is A/T rich (~67%). The ORFs are located at residues 206 to 706, and 949 to 1098, respectively (Figure 1A). ORF-1 potentially encodes a protein of 167 amino acids (18.6 kDa). The P1/P2 peptides, which occur in some group I intron ORFs (2), are not present in ORF-1. However, there is a perfect Shine-Dalgarno sequence 7 nt 5' to the start codon (see Figure 1B). A search of databases revealed no sequence similarity with other group I ORFs, nor any other proteins. ORF-2 is 50 amino acids. However, chloroplast genes can be very small (~4 kDa) and some lack a Shine-Dalgarno sequence. There is little similarity of ORF-1 with other known proteins.

Figure 1B shows a proposed secondary structure. Cr.psbA-2 contains P1-P10 helices similar to other group I introns, and both ORFs are in loop 6, out of the core structure. Although Cr.psbA-2 contains large peripheral structures (e.g. P5a, b, c), the essential helices P3, P4, P5, P6, P7, P8, and P9 are small and give a compact structure to the ribozyme core. Figure 1B (inset) shows how the 5'- and 3'-splice sites can be aligned by the internal guide sequence (IGS) (2).

We have attempted to classify Cr.psbA-2 (3).
The presence of the P5 extension (P5a, P5b and P5c) is characteristic of group IC introns. However, Cr.psbA-2 also contains P7.1 (Figure 1B) which is typical among subgroup IA introns. Thus, Cr.psbA-2 appears to be an intermediate between the IC and IA subgroups. Alternatively, because of the additional sequences and stem-loop (P8.1) between P8 and P7, which have not been observed previously, Cr.psbA-2 may represent the first case of a new subclass of group I introns.
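The ORF coordinates quoted in the text (residues 206 to 706 and 949 to 1098) can be checked against the stated protein lengths with simple arithmetic. The sketch below is only an illustration, not the authors' method. It assumes the coordinates are 1-based and inclusive and that they cover the coding codons without the stop codon. The function names are hypothetical.

```python
# Illustrative check of ORF lengths from the coordinates given in the text.
# Assumptions (not stated in the source): coordinates are 1-based, inclusive,
# and exclude the stop codon. Function names are hypothetical.

ORFS = {
    "ORF-1": (206, 706),
    "ORF-2": (949, 1098),
}


def orf_span_nt(start: int, end: int) -> int:
    """Number of nucleotides in an inclusive, 1-based interval."""
    return end - start + 1


def codon_count(start: int, end: int) -> int:
    """Number of whole codons in the interval; rejects partial codons."""
    length = orf_span_nt(start, end)
    if length % 3 != 0:
        raise ValueError(f"span of {length} nt is not a multiple of 3")
    return length // 3


for name, (start, end) in ORFS.items():
    print(name, orf_span_nt(start, end), "nt,", codon_count(start, end), "codons")
```

Under these assumptions the spans are 501 nt and 150 nt, that is 167 and 50 codons, which agrees with the 167 and 50 amino acids reported for ORF-1 and ORF-2.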
ACKNOWLEDGEMENTS
This research was supported by grants from the NSF (DMB89-05303), USDA (92-37301-7682) and the Welch Foundation (F-1164) to D.L.H.

* To whom correspondence should be addressed
EMBL accession no. Z19597
REFERENCES
1. Herrin,D.L., Bao,Y., Thompson,A.J. and Chen,Y.-F. (1991) Plant Cell 3, 1095-1107.
2. Davies,R.W., Waring,R.B., Ray,J.A., Brown,T.A. and Scazzocchio,C. (1982) Nature 300, 719-724.
3. Michel,F. and Westhof,E. (1990) J. Mol. Biol. 216, 585-610.
[Figure 1. Panel A: map showing EXON-2, INTRON-2 (containing ORF-1 and ORF-2) and EXON-3; scale bar, 200 nt. Panel B: secondary-structure diagram with labeled helices (including P5, P5b, P5c, P7.1, P9 and P10) and the G-site.]
Figure 1. A. Map of the group I intron Cr.psbA-2. Exons, filled boxes; ORFs, open boxes; intron, solid lines. B. Proposed secondary structure of the Cr.psbA-2 ribozyme. Intron sequences are in upper case letters and exon sequences in lower case letters; the 5' and 3' splice-sites are indicated by arrows. P1 to P10 refer to helices numbered by convention. The base-pairs in boxes are proposed to have at least one base participating in a tertiary interaction (3). Circled residues are 100% (or nearly) conserved among group I introns. The ORFs are located in loop 6 and the guanosine-binding site (G-site) in P7 is indicated. S-D, Shine-Dalgarno sequence. Inset: Arrows indicate 5' and 3' splice-sites; the IGS is boxed.