Genomic clones encoding 11S globulins in oats (Avena sativa L.)
Genome Downloaded from www.nrcresearchpress.com by Guangzhou Jinan University on 06/05/13 For personal use only.
Michael A. Tanchak, Marc Giband, Bernard Potier, Johann P. Schernthaner, Stefan Dukiandjiev, and Illimar Altosaar

Abstract: We have isolated two complete genomic clones, Glav1 and Glav3, encoding 11S globulins (legumins) in oat. The structure of Glav1 deviates from that of the typical legumin gene. This clone possesses an extra intron and an extra exon that is composed entirely of repeats of sequences found elsewhere in the clone. If this exon is functional, the protein encoded by Glav1 will contain novel octapeptide and hendecapeptide repeats. The two Glav clones show stronger and more extensive homology with one another than with the two previously published genomic clones, OG1-E1 and ASglob5. This result suggests that the oat globulin gene family may be divided into distinct subfamilies or that there may be significant cultivar-specific differences among members of this gene family.

Key words: Avena sativa, gene structure, globulin, legumin.
Résumé : Les auteurs ont isolé deux clones génomiques complets, Glav1 et Glav3, qui codent pour les globulines 11S (légumines) chez l'avoine. La structure de Glav1 diffère de celle d'un gène de légumine typique. Ce clone possède un intron de plus ainsi qu'un exon supplémentaire composé de séquences répétées que l'on peut trouver ailleurs dans le clone. Si cet exon est fonctionnel, la protéine codée par Glav1 contiendra de nouvelles répétitions d'octapeptide et d'un décapeptide. Les deux clones Glav montrent une homologie plus forte et plus étendue entre eux qu'aux deux autres clones génomiques isolés antérieurement, OG1-E1 et ASglob5. Ce résultat suggère que la famille de gènes des globulines chez l'avoine pourrait être divisée en des sous-familles distinctes, ou encore qu'il y aurait des différences significatives entre cultivars au niveau de cette famille de gènes.

Mots clés : Avena sativa, structure des gènes, globuline, légumine. [Traduit par la Rédaction]
Introduction

The seed storage protein profiles of species from the different taxonomic branches of the plant kingdom display distinct tendencies. Dicot seeds, especially legumes, tend to be rich in globulins and (or) albumins, and do not possess any prolamins at all. In contrast, the seeds of monocot plants tend to be rich in prolamins and relatively poor in albumins and globulins. Significant exceptions are two monocots, oat and rice. In the cultivated oat, Avena sativa, globulins make up the bulk, up to 80%, of the seed storage proteins (Colyer and Luthe 1984; Robert et al. 1983). These globulins are found in the protein bodies of the endosperm tissue of the oat kernel (Adeli et al. 1984; Lending et al. 1989) and are predominantly of the legumin or 11-12S type (Burgess et al. 1983). Legumins are hexameric proteins. The monomers are approximately 60 kDa in relative molecular mass (Mr) and are held together through noncovalent interactions. Each monomer consists of an acidic polypeptide of approximately 40 kDa covalently bound by a highly conserved disulfide bond to a basic polypeptide having an approximate Mr of 20 kDa (Borroto and Dure 1987). The monomer is synthesized as a preproprotein (Brinegar and Peterson 1982; Matlashewski et al. 1982) containing a signal peptide that
Corresponding Editor: G. Fedak. Received May 24, 1994. Accepted March 6, 1995.
M.A. Tanchak,1 M. Giband,2 B. Potier,3 J.P. Schernthaner,4 S. Dukiandjiev,5 and I. Altosaar.6 Department of Biochemistry, University of Ottawa, 40 Marie Curie Private, Ottawa, ON K1N 9B4, Canada.
1 Present address: Department of Behavioural and Life Sciences, University College of Cape Breton, P.O. Box 5300, Sydney, NS B1P 6L2, Canada.
2 Present address: Laboratoire de Biologie Cellulaire, Institut National de la Recherche Agronomique, Centre de Versailles, Route de Saint Cyr, 78026 Versailles Cedex, France.
3 Present address: Department of Botany, University of Queensland, Brisbane, QLD 4072, Australia.
4 Present address: Department of Chemistry, University of Ottawa, 40 Marie Curie Private, Ottawa, ON K1N 9B4, Canada.
5 Present address: Biochemistry Department, Plovdiv University, Plovdiv, Bulgaria.
6 Author to whom all correspondence should be addressed.
Genome, 38: 627-634 (1995). Printed in Canada / Imprimé au Canada
Genome, Vol. 38, 1995

Fig. 1. Aligned sequences of oat globulin genomic clones Glav1 (EMBL accession No. X7470) and Glav3 (EMBL accession No. X74741). Only sequences common to the two clones are shown. Complete nucleotide and derived amino acid sequences for this region of Glav3 are shown. The sequence of Glav1 is shown only where it differs from Glav3. Dashes represent deletions in one sequence relative to the other. Numbers on the left indicate nucleotide position in Glav3 relative to the start of transcription. Numbers on the right indicate amino acid position in Glav3. Amino acids are indicated by their one-letter code. Coding sequences are shown in capital letters, while noncoding sequences are shown in lower case letters. Potential CAAT and TATA boxes and the nucleotide corresponding to the start of transcription, as determined by primer extension analysis (not shown), are underlined. The v-JUN/GCN4 motifs are double underlined. The A/T-rich sequences containing the modified core element of the "-300 box" are boxed. Potential polyadenylation signals are overlined. The polyadenylation site (nucleotide 2009) is identified by an arrowhead. The inserted sequence in intron 2 of Glav1 is located between nucleotides 754 and 755 of Glav3.
[Fig. 1 sequence alignment: Glav3 nucleotide sequence (numbered from -1688 to 2571) with the derived Glav3 and Glav1 amino acid sequences. The alignment is not legible in this copy and is not reproduced here.]
is removed by a signal peptidase in the endoplasmic reticulum. The formation of the disulfide bond between the acidic and basic polypeptides is also likely to occur in the endoplasmic reticulum (Chrispeels 1991). Cleavage of the monomer proprotein into acidic and basic polypeptides probably occurs in the protein body itself (Adeli et al. 1984). The dominance of legumin-type globulins in oat seeds is an interesting phenomenon. The relative abundances of prolamins and globulins are not reflected by messenger RNA levels (Boyer et al. 1992; Fabijanski et al. 1985). Prolamin messenger RNAs appear to be present in the developing oat seed in at least equimolar concentrations relative to the globulin mRNAs (Boyer et al. 1992; Chesnut et al. 1989). Therefore, it appears that some type of control mechanism, most likely at the translational level, results in the synthesis of disproportionately large amounts of oat globulins relative to oat prolamins (Boyer et al. 1992; Fabijanski and Altosaar 1985).
cDNA clones have been isolated for oat globulins (Shotwell et al. 1988; Walburg and Larkins 1986) and for oat prolamins (Chesnut et al. 1989). Sequences of a genomic clone containing four oat prolamin genes, AAV45 (Shotwell et al. 1990), and of two oat globulin genomic clones, OG1-E1 (Shotwell et al. 1990) and ASglob5 (Schubert et al. 1990), have also been recently published. The analysis of AAV45 and of OG1-E1 revealed that although oat globulin and oat prolamin (avenin) genes are coordinately expressed, their respective promoter regions do not appear to possess any conserved sequence elements that might account for their common temporal and spatial expression. The two oat globulin genomic clones, OG1-E1 and ASglob5, possess three introns. Their positions in the coding sequence are conserved compared with those of legumin genes from legumes and other dicots (Shotwell et al. 1990). OG1-E1 and ASglob5 have remarkably conserved sequences (>99% identity) and provide little information on oat globulin
gene diversity. Here, we report on two new genomic clones that we have isolated from a high-nitrogen variety of oat, A. sativa cv. Hinoat. One genomic clone displays the conserved structure found in OG1-E1, ASglob5, and dicot legumin clones, and the other represents a variation from this conserved structure.
Materials and methods

Construction and screening of an oat genomic library
Genomic DNA was purified from oat (A. sativa cv. Hinoat) leaves as previously described (Gupta et al. 1992). Size-fractionated DNA was cloned into λgt10 using standard protocols (Sambrook et al. 1989). Screening was carried out using a previously isolated incomplete globulin cDNA clone (unpublished data). Positive clones with inserts large enough to encompass a complete coding sequence and flanking sequences (4-5 kilobase pairs (kbp)) were subcloned into pGEM-4Z (Promega) and subjected to further analysis.

Sequencing of oat globulin genomic clones
Both strands of positive clones were sequenced using a combination of restriction fragment subclones and overlapping deletions (Sambrook et al. 1989). Sequencing reactions were performed using a T7 DNA polymerase sequencing kit (Pharmacia), according to the manufacturer's recommended procedure. When necessary, custom-made oligonucleotides were also used as primers.

Generation of full-length oat globulin cDNAs by reverse transcriptase polymerase chain reaction
First strand synthesis was performed using 1-2 µg of total cellular RNA and Superscript reverse transcriptase (BRL) in a 20 µL reaction volume under the conditions recommended by the manufacturer. One hundred nanograms of the primer, dT,,HindIII (5'-CTCAAGCTTTTTTTTTTTTTTTTTTTTTT-3'), was used to prime the synthesis. Following heat inactivation of the reverse transcriptase, the reaction was diluted to a final volume of 100 µL in 1× reaction buffer (Boehringer Mannheim) and immediately used in a polymerase chain reaction. An additional 100 ng of the dT,,HindIII primer and 200 ng of the 5' primer (5'-CTCGAATTCACCAATCCATCTTCTACAATCAC-3') were added. Two units of Taq DNA polymerase (Boehringer Mannheim) were used under the manufacturer's recommended reaction conditions. The following cycles were used: 1× (5 min at 93°C; 30 s at 55°C; 1.5 min at 72°C), 40× (30 s at 95°C; 30 s at 55°C; 1.5 min at 72°C), and 1× (5 min at 72°C). DNA fragments of the expected size (approximately 1700 bp) were cloned into pGEM-4Z.

Analysis of DNA sequences
DNA sequence analysis was performed using the LASERGENE software package (DNASTAR). After initial sequence alignment using the ALIGN program, further optimization of the alignment was made manually. Release 73 of the EMBL/GenBank database was used for sequence searches.
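As a quick check on the RT-PCR primer design described above, the following Python sketch searches both primers for restriction enzyme recognition sites. The primer sequences are copied from the text. The recognition sequences (HindIII, AAGCTT; EcoRI, GAATTC) are standard, but the source names only HindIII, so reading the GAATTC in the 5' primer as a deliberate EcoRI cloning site is an assumption. The function name and dictionary labels are illustrative only.

```python
# Sketch: locate restriction sites in the two RT-PCR primers.
# Primer sequences are copied from the Materials and methods text.
# Assumption: the GAATTC in the 5' primer is an intended EcoRI cloning
# site; the source names only HindIII explicitly.

SITES = {"HindIII": "AAGCTT", "EcoRI": "GAATTC"}

PRIMERS = {
    "dT-HindIII (3' primer)": "CTCAAGCTTTTTTTTTTTTTTTTTTTTTT",
    "5' primer": "CTCGAATTCACCAATCCATCTTCTACAATCAC",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: 0-based index} for each site present in seq."""
    hits = {}
    for enzyme, motif in sites.items():
        pos = seq.find(motif)
        if pos != -1:
            hits[enzyme] = pos
    return hits


for name, seq in PRIMERS.items():
    print(name, find_sites(seq))
```

In both primers the site begins after a three-nucleotide 5' extension (CTC), which would be consistent with directional cloning of the amplified cDNA, although the source does not state this.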
Results and discussion

Oat globulin genomic clones were isolated and two, Glav1 and Glav3, were found to contain a complete coding sequence and varying lengths of 5' and 3' flanking sequences. The sequences of Glav3 and Glav1 are shown in Fig. 1. In Fig. 1, the 5' flanking sequences of Glav1 and Glav3 are truncated at the point (nucleotide -1688) where they start to diverge extensively.
Genes or pseudogenes
Avena sativa is a hexaploid species (AACCDD genome) derived from the hybridization of wild progenitor species (Gupta et al. 1992; Fabijanski et al. 1990). One estimate of gene number suggests that there are as many as 50 oat globulin genes per haploid genome (Chesnut et al. 1989). Undoubtedly, some members of this gene family are pseudogenes. However, with respect to Glav1 and Glav3, there are no obvious sequence defects either in the flanking regions or in the coding regions that would suggest that either of these clones represents a pseudogene. In fact, based on the results of a reverse transcriptase polymerase chain reaction (RT-PCR), Glav3 is almost certainly a clone of a functional gene. RT-PCR has been used in this laboratory to generate "full-length" cDNA clones for oat globulins. The DNA sequence of one of these clones perfectly matched the mRNA sequence predicted to be encoded by Glav3.

5' Flanking region of the Glav clones
Glav1 and Glav3 possess examples of the common regulatory sequences typically found in the 5' flanking regions of eukaryotic genes. Specifically, putative CAAT and TATA boxes can be seen in the 5' flanking region (Fig. 1). In addition, a v-JUN/GCN4 sequence motif, TGAGTCA, is present at position -104 and variants of this sequence are found at positions -131 and -172 (Fig. 1). These motifs have been shown to be part of protein-binding regions in other cereal storage protein genes (e.g., Kim and Wu 1990). In Glav3, there is an 18-nucleotide A/T-rich sequence (TTTTTTTGTAGAAAGTAC) in the -250 region of the clone. This sequence has undergone a duplication in Glav1, with the direct repeats of this sequence being separated by three nucleotides. Within this A/T-rich sequence, the sequence TGTAGAAAG is found. This sequence is a good match with the core element of the "-300 box" (TGTAAAG) (Forde et al. 1985) but has a two purine base (AG) insertion. The "-300 box" is frequently found in the promoter regions of other cereal storage protein genes (Forde et al. 1985) and is thought to be involved in the regulation of tissue-specific expression of these genes (e.g., Colot et al. 1987). The "-300 box" core element of the A/T-rich sequence is found in a region displaying a significant level of dyad symmetry. Such symmetry is a common occurrence in transcriptional regulatory elements. In Glav1, the two core elements are separated by 21 nucleotides, or approximately two turns of the DNA helix. At position -275, the previously isolated oat globulin genomic clones, OG1-E1 and ASglob5, have a related variant "-300 box" core element sequence, TGTAGGAAG, with a similar two purine base insertion (GG). Since these variant core elements of the cereal "-300 box" are present in the four known oat globulin genomic sequences, they could represent a new form or component of such a box that is functional in oat. A region of 5' flanking sequence in Glav1 and Glav3, corresponding to base pairs -1068 to -882 in Glav3, appears
Fig. 2. Duplicated sequences in the -1068 to -882 region of Glav clones. Nucleotide sequences of the corresponding regions of Glav3 (3 in left margin of the figure) and Glav1 (1 in left margin of the figure). Different segments of the duplication are indicated by regions labelled 1, 2, 3, and 4. The dashes in the Glav1 sequence indicate deletions relative to Glav3. The underlined cga triplets indicate sequences missing in the first occurrence of region 2 in Glav3 but which are present in all other occurrences of the repeat.
to contain a duplicated sequence approximately 70 bp in length (Fig. 2). In both Glav3 and Glav1, the two copies of the sequence are not identical but appear to have diverged slightly. The significance of this duplication is not known, but this question should be amenable to study through deletion and (or) mutagenesis analysis. It is not possible to determine whether this type of duplication also occurs in OG1-E1 and ASglob5, because their published sequences extend only as far upstream as nucleotide -944 and nucleotide -349, respectively.
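The relationship between the canonical "-300 box" core element and the oat variants discussed in this section can be checked mechanically. The sketch below, written in Python purely for illustration, removes each pair of adjacent bases from a candidate motif and reports the purine pairs whose removal restores the core element TGTAAAG. The three sequences come from the text; the function name is hypothetical.

```python
# Sketch: test whether a candidate motif equals the "-300 box" core
# element (TGTAAAG) plus an insertion of two adjacent purines.
# Sequences are taken from the text; the function name is illustrative.

CORE = "TGTAAAG"
PURINES = set("AG")


def two_purine_insertions(candidate, core=CORE):
    """Return the distinct purine dinucleotides whose removal from
    `candidate` leaves exactly `core`."""
    found = set()
    for i in range(len(candidate) - 1):
        pair = candidate[i:i + 2]
        remainder = candidate[:i] + candidate[i + 2:]
        if remainder == core and set(pair) <= PURINES:
            found.add(pair)
    return sorted(found)


print(two_purine_insertions("TGTAGAAAG"))  # Glav1/Glav3 variant
print(two_purine_insertions("TGTAGGAAG"))  # OG1-E1/ASglob5 variant
```

For the Glav variant the check reports both AG and GA, because the insertion can be placed at either of two adjacent positions in the alignment; the AG reading is the one given in the text. For the OG1-E1/ASglob5 variant only GG is reported.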
The coding region of the Glav clones
Within the coding region of Glav3, there are three introns (117, 72, and 99 bp in length) located at the same relative positions as those in the other oat globulin genomic clones, ASglob5 and OG1-E1, and in other legumin clones (Shotwell et al. 1990). Throughout the coding region, except for the second intron, Glav1 is highly homologous to Glav3 (Fig. 1). In Glav1, the region corresponding to the second intron is considerably longer than in Glav3 (194 bp in Glav1 versus 72 bp in Glav3). Visual inspection of the relevant sequences reveals that additional potential 5' and 3' intron splice sites are present. These sites obey the GT, AG rule for intron splice sites (Mount 1982). Thus, this region of Glav1 appears to consist of two short introns, intron 2' (45 bp) and intron 2" (74 bp), separated by a short exon, exon 2' (75 bp) (Fig. 3A). Intron 2" is highly homologous to intron 2 of Glav3, possessing only a single nucleotide mismatch and an additional 2 bp insertion. Therefore, the extra sequence in Glav1 consists of intron 2' and exon 2' (Fig. 3B). Exon 2' consists entirely of repeats of sequence elements found elsewhere in the clone, at the 3' end of exon 2 and the 5' end of exon 3 (Fig. 3A). There are two 24 bp repeats, the second of which shares a 6 bp overlap with an additional distinct 33 bp repeat. If exon 2' is functional, these repeated elements encode previously unreported octapeptide (K-F-L-L-A-G-N-N) and hendecapeptide (N-N-A-N-Q-L-E-P-R-Q-K) repeats. These novel repeats share no homology with the imperfect oat glutamine-rich octapeptide repeats, Q-Y-Q-V/E-G-Q-S-T (consensus sequence from Shotwell et al. (1988)), or the wheat lysine-rich decapeptide repeats (Singh et al. 1993) found in the hypervariable C-terminal region of the legumin acidic polypeptide. Furthermore, these novel oat oligopeptides are located in a position that is N-terminal to this hypervariable region. The presence of highly repeated oligopeptide units is a common occurrence in cereal storage proteins of the prolamin class (Shewry and Tatham 1990), but again these oligopeptides do not resemble the novel repeats seen in Glav1. The predicted gene products of Glav1 and Glav3 are very similar in amino acid sequence. Ignoring the effect of exon 2', the Glav1 gene product differs from the protein encoded by Glav3 at only 16 amino acid residues and by a single one amino acid deletion. The Glav1 product would be either 551 or 526 amino acids in length, depending on whether or not exon 2' is used. The Glav3 product is predicted to be 527 amino acids in length. Each gene product is expected to have a 24 amino acid signal peptide and a basic polypeptide that is 202 amino acids in length. The acidic polypeptide of the Glav1 product should consist of 325 amino acids if exon 2' is used, or 300 amino acids if it is not. The acidic polypeptide of the Glav3 product is expected to be 301 amino acids in length. Results from this laboratory (Robert 1985) show that globulins with larger acidic subunits accumulate in the endosperm at the later stages of seed development in the cultivar A. sativa Hinoat. As deduced by SDS-PAGE (SDS polyacrylamide gel electrophoresis), this shift in apparent molecular mass is approximately 4 kDa. In Glav1, the expression of exon 2' would result in an acidic polypeptide that is 25 amino acids longer, or approximately 3.5 kDa larger in molecular mass. It is possible that the expression of Glav1 and related members of the gene family is responsible for this change in polypeptide composition.
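The lengths and repeat structure described in this section can be verified directly from the sequences printed in Fig. 3A. The following Python sketch is one way to do so: it checks the lengths of intron 2', exon 2', and intron 2", tests both introns against the GT, AG rule, and translates exon 2' with the standard genetic code. The sequences are transcribed from Fig. 3A; the variable and function names are illustrative, and translating exon 2' in frame from its first nucleotide is an assumption taken from the way the figure pairs codons with amino acids.

```python
# Sketch: check the proposed intron 2' / exon 2' / intron 2" structure
# of Glav1 using the sequences printed in Fig. 3A.
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}

INTRON_2P = "gtaattatataaattaatccacataacaaatatataaatgtttag"
EXON_2P = ("AAGTTCTTGTTGGCTGGTAACAACAAGTTCTTGTTGGCTGGT"
           "AACAACGCTAATCAGCTTGAACCTAGACAAAAG")
INTRON_2PP = ("gtaattatacaaaagtatatattattggggtattaatgtgaactttgttttactc"
              "tatccatataaattgtcag")


def translate(dna):
    """Translate from the first base (assumed reading frame)."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def obeys_gt_ag(intron):
    """True if the intron starts with GT and ends with AG."""
    s = intron.lower()
    return s.startswith("gt") and s.endswith("ag")


peptide = translate(EXON_2P)
lengths = (len(INTRON_2P), len(EXON_2P), len(INTRON_2PP))
print(lengths, sum(lengths))
print(obeys_gt_ag(INTRON_2P), obeys_gt_ag(INTRON_2PP))
print(peptide, len(peptide))

# Predicted Glav1 product lengths quoted in the text:
# signal peptide + acidic polypeptide + basic polypeptide.
SIGNAL, ACIDIC_WITHOUT_EXON, BASIC = 24, 300, 202
without_exon = SIGNAL + ACIDIC_WITHOUT_EXON + BASIC
print(without_exon, without_exon + len(peptide))
```

With the sequences as transcribed, the three segments are 45, 75, and 74 bp long (194 bp in total), both introns begin with GT and end with AG, and exon 2' translates to the 25-residue peptide KFLLAGNNKFLLAGNNANQLEPRQK, which contains two copies of the octapeptide followed by the hendecapeptide, the latter sharing its first two asparagines with the second octapeptide. The length totals of 526 and 551 amino acids match those given in the text.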
3' Flanking region of the Glav clones
Through the first 143 bp of the 3' flanking sequence, Glav3 and Glav1 are identical. This 143 bp segment includes two typical polyadenylation signals and the nucleotide corresponding to the polyadenylation site itself (Fig. 1). This latter site was identified through the comparison of the Glav3 clone with its corresponding cDNA. Through the
Fig. 3. Structure of the region spanning intron 2 of oat globulin genomic clones. (A) Nucleotide and derived amino acid sequences in the region of the second intron illustrating structural differences between Glav1 and Glav3. The one-letter code is used to represent the encoded amino acids. Hendecapeptide repeats and their corresponding nucleic acid sequences are shown in bold type. Octapeptide repeats and their corresponding nucleic acid sequences are displayed in italics. The amino acid changes resulting from the G to A transition in the repeats in region C are indicated by asterisks. These positions also correspond to the beginning of the two direct tandem repeats in region C (Fig. 3B). The first two asparagine amino acids / codons in region D represent the beginning of the repeat present in region D and the end of the second repeat started in region C. This overlap is depicted by a filled block in Fig. 3B. The amino acids shown in regions C and D apply to Glav1 only. The nucleotide sequence of Glav3 is shown only where it differs from Glav1. Dashes indicate deletions in Glav3 relative to Glav1. (B) Schematic representation of the region spanning intron 2. Different regions of interest are indicated by letters. A, 3' end of exon 2 in Glav1 and Glav3; B, intron 2' in Glav1; C, 5' end of exon 2' in Glav1; D, 3' end of exon 2' in Glav1; E, intron 2" in Glav1 and intron 2 in Glav3; F, 5' end of exon 3 in Glav1 and Glav3. Repeated exonic sequences are indicated by blocks, while introns are indicated by lines. Different repeats are indicated by different types of blocks (open versus vertical cross-hatched). The filled block indicates the region where the two types of repeats overlap.
(A) Glav1 sequences, by region (the positions of the few Glav3 differences in regions A and E are not legible in this copy; regions B, C, and D are absent from Glav3)
Region A: AACAACGCTAATCAGCTTGAACCTAGACAAAAG (N N A N Q L E P R Q K)
Region B: gtaattatataaattaatccacataacaaatatataaatgtttag
Region C: AAGTTCTTGTTGGCTGGTAACAACAAGTTCTTGTTGGCTGGT (K F L L A G N N K F L L A G)
Region D: AACAACGCTAATCAGCTTGAACCTAGACAAAAG (N N A N Q L E P R Q K)
Region E: gtaattatacaaaagtatatattattggggtattaatgtgaactttgttttactctatccatataaattgtcag
Region F: GAGTTCTTGTTGGCTGGTAACAAC (E F L L A G N N)

(B) Glav1: Exon 2 | Intron 2' | Exon 2' | Intron 2" | Exon 3
entire 3' flanking region (665 bp in Glav3, 666 bp in Glav1), the two clones diverge only slightly (97% identity).
Comparison of the Glav clones to two previously published oat globulin genomic clones
OG1-E1 and ASglob5 are the only other oat globulin genomic clones whose sequences have been published. These two clones show >99% sequence identity through the comparable 5' flanking, coding, and 3' flanking regions (not shown). (The published sequence of ASglob5 has a shorter 5' flanking region relative to OG1-E1.) Thus, the observations made below apply to both of these clones. The proposition that the Glav clones are members of the gene family that includes OG1-E1 and ASglob5 is supported by the similarity in the amino acid sequences of the corresponding gene products. The products of Glav1 and Glav3 are very similar to those of OG1-E1 and ASglob5. For example, upon maximal sequence alignment,
Fig. 4. Alignment of 5' flanking sequences of Glav3 and OG1-E1. The sequence of OG1-E1 is shown only where it differs from that of Glav3. Dashes represent deletions in one sequence relative to the other. The numbers on the right indicate the nucleotide position in the Glav3 sequence. The underlined A residue of the Glav3 sequence represents the major site for the start of transcription. The sequence of the modified core element of the "-300 box" is overlined and the variations of the v-JUN/GCN4 motif are boxed.
[Fig. 4 sequence alignment: 5' flanking sequence of Glav3 with the differing positions of OG1-E1. The alignment is not legible in this copy and is not reproduced here.]
455 of the 527 amino acids in the Glav3 gene product are matched in the 518 amino acid gene product of ASglob5 (not shown) and 454 are matched by the gene product of OG1-E1. (The products of ASglob5 and OG1-E1 differ at a single amino acid position, amino acid 60. An arginine in ASglob5 is replaced by glycine in OG1-E1.) These gene products differ in length because the Glav3 product possesses an extra octapeptide repeat in the hypervariable region near the C-terminus of the acidic polypeptide. The number of such octapeptide repeats is known to vary among different oat globulin cDNA clones (Shotwell et al. 1988). In addition, the Glav3 product has a single amino acid insertion near the C-terminus of the basic polypeptide. Despite the similarity in the amino acid sequences encoded by the various oat globulin genomic clones, there is a striking difference among the nucleotide sequences of the different clones. Although homology in the 5' flanking sequences of Glav1 and Glav3 extends upstream as far as nucleotide -1688 and downstream all the way through the available 3' flanking sequence, recognizable homology between Glav1/Glav3 and OG1-E1/ASglob5 only commences in a region approximately 480 bp 5' to the start of transcription (Fig. 4) and ends in a region approximately 150 bp 3' to the stop codon (not shown). This result suggests that the sequences common among clones that regulate transcription and translation of oat globulin genes are found in this region where homology can be detected. However, it is conceivable that additional sequences outside the region of detectable homology may play a role in regulation. For example, if the two Glav clones are differentially regulated relative to OG1-E1 and ASglob5, these additional
regulatory sequences may be involved in "gene-specific" regulation. An alternative explanation for the observed extended homology between the two Glav clones is the possibility that there are subfamilies within the oat globulin gene family. Given the complexity of this gene family, it would not be surprising if the members of the oat globulin gene family could be grouped into different subfamilies, perhaps corresponding to the different progenitor genomes. The divergent 5' and 3' flanking regions could then be an indication of "subfamily membership". Glav1 and Glav3 would represent different members of one subfamily, while OG1-E1 and ASglob5 would represent members of a second subfamily. In addition, the different subfamilies may or may not be differentially regulated. The existence of these subfamily groupings can only be verified by the isolation and characterization of additional members of this gene family and by gene and (or) chromosome mapping experiments. Another alternative explanation for the extended homology between the Glav clones relative to OG1-E1/ASglob5 is that this homology merely reflects the different origins of the clones. Glav1 and Glav3 are from cultivar Hinoat, while ASglob5 is from 'Solidor' (Schubert et al. 1990). The cultivar from which OG1-E1 (Shotwell et al. 1990) was isolated was not reported. For this hypothesis to be correct, OG1-E1 and ASglob5 would have had to have been isolated from the same cultivar or from very closely related cultivars. This hypothesis would assume that the coding sequences and important 5' and 3' flanking sequences would be conserved between clones owing to selection for the activity/functionality of these sequences.
Genome, Vol. 38, 1995
The more distal 5' and 3' flanking sequences are more likely to diverge as a result of the occurrence of random mutations, as such mutations are more likely to be neutral (i.e., having no effect on gene activity).
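The comparisons in this discussion rest on two simple quantities: the fraction of residues matched between two gene products, and the position at which recognizable homology between two flanking sequences begins. As a rough illustration only, the Python sketch below computes both. The function names, the 8-nucleotide window, the 90% threshold, and the toy sequences are assumptions introduced here for illustration; they are not the alignment procedure or the sequence data used in this study, and the sketch assumes the two sequences have already been aligned to equal length.

```python
from typing import List, Optional, Tuple


def percent_identity(matches: int, length: int) -> float:
    """Percentage of matched positions relative to a reference length."""
    if length <= 0:
        raise ValueError("length must be positive")
    return 100.0 * matches / length


def window_identity(a: str, b: str, window: int) -> List[Tuple[int, float]]:
    """Percent identity for every full window of two pre-aligned sequences.

    Both sequences must have the same length. A gap character ('-')
    is never counted as a match.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0:
        raise ValueError("window must be positive")
    scores = []
    for start in range(len(a) - window + 1):
        pairs = zip(a[start:start + window], b[start:start + window])
        same = sum(1 for x, y in pairs if x == y and x != "-")
        scores.append((start, 100.0 * same / window))
    return scores


def first_homologous_window(
    a: str, b: str, window: int, threshold: float
) -> Optional[int]:
    """Start index of the first window whose identity reaches the threshold."""
    for start, score in window_identity(a, b, window):
        if score >= threshold:
            return start
    return None


if __name__ == "__main__":
    # Matched-residue counts reported in the text (Glav3 product, 527 residues).
    print(f"Glav3 vs ASglob5: {percent_identity(455, 527):.1f}% matched")
    print(f"Glav3 vs OGI-El:  {percent_identity(454, 527):.1f}% matched")

    # Hypothetical toy sequences: a divergent distal segment followed by
    # an identical proximal segment. These are NOT real oat sequences.
    toy_a = "ACGTACGT" + "TTGACCAT"
    toy_b = "TGCATGCA" + "TTGACCAT"
    print(first_homologous_window(toy_a, toy_b, window=8, threshold=90.0))
```

With real data, the choice of window size and threshold would determine where homology is judged to commence, so boundaries such as "approximately 480 bp 5' to the start of transcription" are best read as approximate.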
Summation
The two previously isolated oat globulin genomic clones OGI-El and ASglob5 are >99% identical. The isolation of two additional genomic clones, Glavl and Glav3, has demonstrated more heterogeneity within this gene family. Glavl deviates from the structure typical for legumin genes. It appears to possess an extra intron and an extra exon. If this exon is functional, it would encode novel octapeptide and hendecapeptide repeats. Comparison of the various genomic clones has led to the identification of potential 5' regulatory sequences, including a variant core element of the "-300 box" and V-JUN/GCN4 motifs. The Glav clones also possess a substantial (approximately 70 bp) duplication in a distal segment of the 5' flanking region. The significance of the duplication is not known. Most interestingly, the detectable homology between the two Glav clones extends much farther in the 5' and 3' directions than the homology seen in sequence alignments between either of the Glav clones and OGI-El (or ASglob5). These differences may be explained in three ways. Members of the oat globulin gene family may have common proximal regulatory sequences with the possibility of different distal "gene-specific" regulatory elements. The oat globulin gene family may consist of two or more subfamilies. Oat globulin genes from different cultivars may have common conserved coding and proximal flanking sequences but highly divergent distal flanking sequences. Additional experimentation involving genomic cloning and promoter mutagenesis will help resolve some of the issues arising from the present results.
Acknowledgements
Operating (A6711) and Strategic (40659) grants to I.A. from the Natural Sciences and Engineering Research Council of Canada are gratefully acknowledged, as is its International Scientific Exchange Award to S.D.
References
Adeli, K., Allan-Wojtas, P., and Altosaar, I. 1984. Intracellular transport and posttranslational cleavage of oat globulin precursors. Plant Physiol. 76: 16-20.
Borroto, K., and Dure, L. 1987. The globulin seed storage proteins of flowering plants are derived from two ancestral genes. Plant Mol. Biol. 8: 113-131.
Boyer, S.K., Shotwell, M.A., and Larkins, B.A. 1992. Evidence for the translational control of storage protein gene expression in oat seeds. J. Biol. Chem. 267: 17449-17457.
Brinegar, A.C., and Peterson, D.M. 1982. Synthesis of oat globulin precursors. Analogy to legume 11S storage protein synthesis. Plant Physiol. 70: 1767-1769.
Burgess, S.R., Shewry, P.R., Matlashewski, G.J., Altosaar, I., and Miflin, B.J. 1983. Characteristics of oat (Avena sativa L.) seed globulins. J. Exp. Bot. 34: 1320-1332.
Chesnut, R.S., Shotwell, M.A., Boyer, S.K., and Larkins, B.A. 1989. Analysis of avenin proteins and the expression of their mRNAs in developing oat seeds. Plant Cell, 1: 913-924.
Chrispeels, M.J. 1991. Sorting of proteins in the secretory system. Annu. Rev. Plant Physiol. Plant Mol. Biol. 42: 21-53.
Colot, V., Robert, L.S., Kavanagh, T.A., Bevan, M.W., and Thompson, R.D. 1987. Localization of sequences in wheat endosperm protein genes which confer tissue-specific expression in tobacco. EMBO J. 6: 3559-3564.
Colyer, T.E., and Luthe, D.S. 1984. Quantitation of oat globulin by radio-immunoassay. Plant Physiol. 74: 455-456.
Fabijanski, S., and Altosaar, I. 1985. Evidence for translational control of storage protein biosynthesis during embryogenesis of Avena sativa L. (oat endosperm). Plant Mol. Biol. 4: 211-218.
Fabijanski, S., Matlashewski, G.J., and Altosaar, I. 1985. Characterization of developing oat seed mRNA: evidence for many globulin mRNAs. Plant Mol. Biol. 4: 205-210.
Fabijanski, S., Fedak, G., Armstrong, K., and Altosaar, I. 1990. A repeated sequence probe for the C genome in Avena (Oats). Theor. Appl. Genet. 79: 1-7.
Forde, B.G., Heyworth, A., Pywell, J., and Kreis, M. 1985. Nucleotide sequence of a B1 hordein and the identification of possible upstream regulatory elements in endosperm storage protein genes from barley, wheat and maize. Nucleic Acids Res. 13: 7327-7339.
Gupta, P.K., Giband, M., and Altosaar, I. 1992. Two molecular probes characterizing the A and C genomes in the genus Avena (oats). Genome, 35: 916-920.
Kim, S.Y., and Wu, R. 1990. Multiple protein factors bind to a rice glutelin promoter region. Nucleic Acids Res. 18: 6845-6852.
Lending, C.R., Chesnut, R.S., Shaw, K.L., and Larkins, B.A. 1989. Immunolocalization of avenin and globulin storage proteins in developing endosperm of Avena sativa L. Planta, 178: 315-324.
Matlashewski, G.J., Adeli, K., Altosaar, I., Shewry, P.R., and Miflin, B.J. 1982. In vitro synthesis of oat globulins. FEBS Lett. 145: 208-212.
Mount, S.M. 1982. A catalogue of splice junction sequences. Nucleic Acids Res. 10: 459-472.
Robert, L.S. 1985. The expression of seed storage proteins in oats, other cereals and in legumes. Ph.D. thesis, University of Ottawa, Ottawa, Ontario.
Robert, L.S., Cudjoe, A., Nozzolillo, C., and Altosaar, I. 1983. Total solubilization of groat proteins in high protein oat (Avena sativa L. cv Hinoat): evidence that glutelins are a minor component. Can. Inst. Food Sci. Technol. J. 16: 196-200.
Sambrook, J., Fritsch, E.F., and Maniatis, T. 1989. Molecular cloning, a laboratory manual. 2nd ed. Cold Spring Harbor Laboratory, Cold Spring Harbor, New York.
Schubert, R., Baumlein, H., Czihal, A., and Wobus, U. 1990. Genomic sequence of a 12S seed storage protein gene from oat (Avena sativa L. cv. "Solidor"). Nucleic Acids Res. 18: 377.
Shewry, P.R., and Tatham, A.S. 1990. The prolamin storage proteins of cereal seeds: structure and evolution. Biochem. J. 267: 1-12.
Shotwell, M.A., Afonso, C., Davies, E., Chesnut, R.S., and Larkins, B.A. 1988. Molecular characterization of oat seed globulins. Plant Physiol. 87: 698-704.
Shotwell, M.A., Boyer, S.K., Chesnut, R.S., and Larkins, B.A. 1990. Analysis of seed storage protein genes of oats. J. Biol. Chem. 265: 9652-9658.
Singh, N.K., Donovan, G.R., Carpenter, H.C., Skerritt, J.H., and Langridge, P. 1993. Isolation and characterization of wheat triticin cDNA revealing a unique lysine-rich repetitive domain. Plant Mol. Biol. 22: 227-237.
Walburg, G., and Larkins, B.A. 1986. Isolation and characterization of cDNAs encoding oat 12S globulin mRNAs. Plant Mol. Biol. 6: 161-169.