Plant Physiol. (1992) 99, 354-355
Received for publication December 17, 1991 Accepted January 23, 1992
0032-0889/92/99/0354/02/$01.00/0
Plant Gene Register
Nucleotide Sequences of a Soybean Complementary DNA Encoding a 50-Kilodalton Late Embryogenesis Abundant Protein¹

Yue-ie C. Hsing*, Zuei-ying Chen, and Teh-yuan Chow

Institute of Botany, Academia Sinica, Taipei, Taiwan, Republic of China

We report the sequence of a cDNA clone, pGmPM2, corresponding to a soybean mature seed-abundant mRNA (GmPM2). The characteristics of pGmPM2 cDNA are given in Table I. It was selected from a pod-dried cotyledon cDNA λZAPII library screened by differential hybridization with single-stranded cDNA probes prepared from immature seeds (35 DAF) and pod-dried seeds poly(A+) RNA (7).

Figure 1 shows the nt sequences of the 1741-base pair cDNA insert and the derived aa sequence. Only one long open reading frame was found, and the mol wt of the deduced protein was similar to that predicted by hybrid-select translation (7). The deduced protein is very hydrophilic, as are most other Lea proteins (3), and consists of 463 aa residues corresponding to a molecular mass of 50.6 kD. One potential polyadenylation signal, AATAAA, is found in the 3'-noncoding region, 102 nt upstream from the poly(A+) tail.

A search for polypeptide homologies in data banks revealed a strong local similarity with several Lea proteins, including embryonic protein DC8 in carrot (4), Lea proteins D7 and D29 in cotton (1), ABA-induced protein PHVA1 in barley (6), and Lea protein 76 in rape (5). These genes have been shown to be expressed during late seed development and to respond to ABA or water stress treatments. Another protein with strong local similarity is the Plasmodium falciparum S antigen present in the sera of some malaria-infected individuals (2). All of these proteins are soluble heat-stable proteins with a series of tandemly repeated aa domains, and most of these proteins have no Trp or Cys residues.

LITERATURE CITED

1. Baker J, Steele C, Dure L III (1988) Sequence and characterization of 6 Lea proteins and their genes from cotton. Plant Mol Biol 11: 277-291
2. Cowman AF, Saint RB, Coppel RL, Brown GV, Anders RF, Kemp DJ (1985) Conserved sequences flank variable tandem repeats in two S-antigen genes of Plasmodium falciparum. Cell 40: 775-783
3. Dure L III, Crouch M, Harada J, Ho T-hD, Mundy J, Quatrano R, Thomas T, Sung ZR (1989) Common amino acid sequence domains among the LEA proteins of higher plants. Plant Mol Biol 12: 475-486
4. Franz G, Hatzopoulos P, Jones T, Krauss M, Sung ZR (1989) Molecular and genetic analysis of an embryonic gene, DC8, from Daucus carota L. Mol Gen Genet 218: 143-151
5. Harada JJ, DeLisle AJ, Baden CS, Crouch ML (1989) Unusual sequence of an abscisic acid-inducible mRNA which accumulates late in Brassica napus seed development. Plant Mol Biol 12: 395-401
6. Hong B, Uknes SJ, Ho T-hD (1989) Cloning and characterization of a cDNA encoding a mRNA rapidly-induced by ABA in barley aleurone layers. Plant Mol Biol 11: 495-506
7. Hsing Y-iC, Wu S-j (1992) Cloning and characterization of cDNA clones encoding soybean seed maturation polypeptides. Bot Bull Acad Sin 33: 193-201
¹ This research was supported in part by a grant from the National Science Council to Y.C.H. and T.C.
² Abbreviations: GmPM, Glycine max physiological mature; Lea, late embryogenesis abundant; aa, amino acid(s); nt, nucleotide; poly(A+), polyadenylate.
Table I. Characteristics of pGmPM2 cDNA

Organism: Soybean (Glycine max L. Merrill), variety Williams'82.
Function: Encodes 50-kD soybean seed maturation polypeptides (7).
Clone Type; Designation: cDNA, full-length, pGmPM2.
Source: cDNA library in λZAPII vector constructed from the poly(A+) RNA of 4-d pod-dried 35 DAF soybean cotyledons.
Method of Identification: Differential hybridization with single-stranded cDNA probes prepared from fresh immature seeds (35 DAF) and 4-d pod-dried seed poly(A+) RNA. pGmPM2 showed a strong hybridization signal only with the homologous probe.
Sequence Strategy: Single-strand DNA template; unidirectional deletion subcloning and complete dideoxy sequencing of both strands.
Feature of mRNA Structure: Transcript of approximately 1800 nt as detected on northern blots; this clone of 1741 nt contains 57 nt 5' untranslated region, 1389 nt open reading frame, and 294 nt 3' untranslated region.
Codon Usage: Codons not present in the cDNA: TGG(W), TGT(C), TGC(C), TTA(L), CGG(R), CTA(L), CCG(P), and CCT(P).
(G + C) Content: 46.82% along entire length; 50.86% in protein-coding region.
Structural Features of Protein: 463 aa. No Trp or Cys residues are present in the protein. There are internal repeats from aa 129 to 260 and aa 304 to 398, with the consensus sequences of VNKMGEYKDYAAEKAKEGKDAT and TAET.EAAK.K, respectively.
Antibody: Not available.
GenBank Accession Number: M80664.
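The (G + C) content and codon-usage entries of Table I can be recomputed from any sequence string. The following is a minimal sketch, not the authors' procedure: the function names `gc_percent` and `absent_codons` are hypothetical, and `FRAGMENT` is only a short in-frame stretch copied from Figure 1 (nt 73 to 138, assuming the row label 71 marks the first base of that row), so its values will not reproduce the whole-clone figures in the table.

```python
from itertools import product


def gc_percent(seq):
    """Return the percentage of G and C bases in a nucleotide string."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def absent_codons(orf):
    """Return the sorted codons (out of 64) that never occur in-frame in `orf`."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    used = {orf[i:i + 3] for i in range(0, usable, 3)}
    every = {"".join(triplet) for triplet in product("ACGT", repeat=3)}
    return sorted(every - used)


# Illustrative in-frame fragment copied from Figure 1 (assumed nt 73 to 138).
FRAGMENT = "CAAGAGGAGCGAGCTGAGGCAGCTGCGAAAGTTGCTGCCAAAGAACTCGAACAAGTCAACAGAGAA"

# The open reading frame length in Table I matches the stated protein length.
assert 1389 == 3 * 463

if __name__ == "__main__":
    print(f"GC content of fragment: {gc_percent(FRAGMENT):.2f}%")
    print("TGG absent from fragment:", "TGG" in absent_codons(FRAGMENT))
```

Applied to the full 1389-nt open reading frame, `absent_codons` would be expected to return the eight codons listed under Codon Usage, provided the sequence is free of transcription errors.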
1 CACAAAAGTGTTCCACTTGAGTGAAAAGTAGTGTGTTAAGAACTAAACAATTTTTCAATGGCGTCCAAGA
M A S K K
71 AACAAGAGGAGCGAGCTGAGGCAGCTGCGAAAGTTGCTGCCAAAGAACTCGAACAAGTCAACAGAGAAAG
Q E E R A E A A A K V A A K E L E Q V N R E R
141 AAGAGACCGTGATTTCGGTGTTGTTGCTGAACAACAACAACAACATCATCAGGAAGATCAACAAAAACGT
R D R D F G V V A E Q Q Q Q H H Q E D Q Q K R
211 GGTGTAATCGGGTCCATGTTTAAGGCGGTGCAAGACACCTACGAGAACGCCAAGGAAGCTGTCGTTGGCA
G V I G S M F K A V Q D T Y E N A K E A V V G K
281 AGAAAGAAGCTACTAATAACGCGTACAGTAATACAGAGGTTATTCACGATGTTAACATTCAGCCCGATGA
K E A T N N A Y S N T E V I H D V N I Q P D D
351 CGTGTCGGCAACGGGGGAAGTAAGGGACATATCAGCCACAAAGACTCATGATATCTACGATTCTGCCACG
V S A T G E V R D I S A T K T H D I Y D S A T
421 GACAACAACAACAACAAAACCGGTTCCAAGGTCGGAGAGTACGCAGATTACGCTTCTCAGAAGGCCAAGG
D N N N N K T G S K V G E Y A D Y A S Q K A K E
491 AAACAAAAGATGCAACGATGGAAAAAGCTGGAGAGTACACAGATTATGCTTCGCAGAAAGCGAAGGAAGC
T K D A T M E K A G E Y T D Y A S Q K A K E A
561 GAAGAAGACGACCATGGAGAAGGGTGGAGAATACAAGGATTACTCTGCGGAGAAAGCTAAGGAGAGAAAA
K K T T M E K G G E Y K D Y S A E K A K E R K
631 GATGCTACTGTGAATAAGATGGGAGAGTATAAGGACTATGCTGCGGAGAAAGCCAAAGAGGGGAAAGATG
D A T V N K M G E Y K D Y A A E K A K E G K D A
701 CTACTGTGAATAAAATGGGAGAGTATAAGGACTATGCTGCGGAGAAAACGAAAGAGGGGAAAGATGCCAC
T V N K M G E Y K D Y A A E K T K E G K D A T
771 TGTGAATAAGATGGGAGAGTATAAGGATTACACTGCGGAGAAGGCGAAAGAGGGGAAAGATACGACGTTG
V N K M G E Y K D Y T A E K A K E G K D T T L
841 GGGAAGCTTGGGGAGCTGAAGGACACGGCTTCGGATGCGGCGAAGAGGGCCGTGGGTTACTTGAGCGGCA
G K L G E L K D T A S D A A K R A V G Y L S G K
911 AGAAAGAGGAAACTAAAGAGATGGCTTCGGAGACCGCCGAGGCGACGGCGAATAAGGCAGGGGAGATGAA
K E E T K E M A S E T A E A T A N K A G E M K
981 GGAGGCAACAAAGAAAAAGACGGCGGAGACCGCGGAGGCGGCGAAGAATAAGGCGGGGGAGATCAAGGAC
E A T K K K T A E T A E A A K N K A G E I K D
1051 AGAGCCGCGGAGACGGCGGAGGCCGCGAAAAACAAGACCGCGGAGACCGCGGAAGTGACGAAGAATAAGG
R A A E T A E A A K N K T A E T A E V T K N K A
1121 CTTTGGAGATGAAGGATGCAGCGAAGGACAGGACCGCTGAGACAACGGATGCGGCGAAGCAGAAAACTGC
L E M K D A A K D R T A E T T D A A K Q K T A
1191 ACAGGCAAAGGAGAACACCAAGGAAAATGTGAGTGGTGCAGGTGAAACTGCAAGGAGGAAAATGGAAGAG
Q A K E N T K E N V S G A G E T A R R K M E E
1261 CCAAAGCTTCAAGGTAAAGAAGGGTATGGGGGCCGTGGAGACAAGGTGGTGGTGAAAGTGGAAGAGAGTC
P K L Q G K E G Y G G R G D K V V V K V E E S R
1331 GACCAGGGGCAATTGCGGAAACGCTGAAAGCCGCCGACCAGATTGCGGGACAGACCTTCAACGATGTAGG
P G A I A E T L K A A D Q I A G Q T F N D V G
1401 ACGCTTCGATGAAGAGGGTGTCGTCAATGTGGAGCGCCGCAAGAAATATTAAAACGTGATCTATGATAC
R F D E E G V V N V E R R K K
1471 AACAATATTAGTATATATAGACGCATGCAGTTTATATAGTATATATTGTCATGTTGTATGTTTTTACATT
1541 TTGGTTTGCTTGTTTACATTCTCTTC...AAAAATGTGTAGTACGTGTAAGGTTTTGAAGATTG
1611 GTTCTAGGCTCCGTGGGAACCATTTCAACAATAAACATTTTGCGCGTTCTTGTACACGTAGTGATGAGAA
1681 GAGATGCCTTATGGGCAGTATCATCTAAAACTTATTTTCATCCATCATAGAATTTGGATCT
Figure 1. Nt sequence of the 1741-base pair cDNA insert and the derived aa sequence of pGmPM2 from soybean. Underlining, the start codon, stop codon, and potential polyadenylation sites; asterisks, the putative glycosylation sites.
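The derived aa sequence and the position of the polyadenylation signal can both be checked against the nt sequence. The sketch below is illustrative only and assumes the standard genetic code; `translate` and `find_motif` are hypothetical helper names, `FRAGMENT` is an in-frame stretch copied from Figure 1 (taken to be nt 73 to 138, with the frame inferred from the residues printed beside that row), and `ROW_1611` is a 70-nt row from the 3'-noncoding region that is assumed here to begin at nt 1611 on the basis of the figure's position labels.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(orf):
    """Translate an in-frame nucleotide string into one-letter aa codes."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, usable, 3))


def find_motif(seq, motif, first_position=1):
    """Return the 1-based positions of every occurrence of `motif` in `seq`.

    `first_position` is the sequence coordinate of seq[0].
    """
    seq, motif = seq.upper(), motif.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(first_position + start)
        start = seq.find(motif, start + 1)
    return hits


# Assumed nt 73 to 138 of the cDNA insert (in frame with the start codon).
FRAGMENT = "CAAGAGGAGCGAGCTGAGGCAGCTGCGAAAGTTGCTGCCAAAGAACTCGAACAAGTCAACAGAGAA"

# Row from the 3'-noncoding region, assumed to start at nt 1611.
ROW_1611 = "GTTCTAGGCTCCGTGGGAACCATTTCAACAATAAACATTTTGCGCGTTCTTGTACACGTAGTGATGAGAA"

if __name__ == "__main__":
    print(translate(FRAGMENT))                     # QEERAEAAAKVAAKELEQVNRE
    print(find_motif(ROW_1611, "AATAAA", 1611))    # [1640]
```

Under these assumptions the translated fragment agrees with the residues printed for that row, and the AATAAA signal starts at nt 1640. If the poly(A+) tail is taken to begin immediately after nt 1741, that places the signal 102 nt upstream of the tail, in line with the distance stated in the text.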