Proc. Natl. Acad. Sci. USA Vol. 75, No. 3, pp. 1136-1139, March 1978
Biochemistry
Common ancestor for concanavalin A and lentil lectin? (amino acid sequence/homology/proteolytic processing)
A. FORIERS, R. DE NEVE, L. KANAREK, AND A. D. STROSBERG Medische en Speciale Biochemie, Scheikunde der Proteinen, Pathologische Biochemie, Vrije Universiteit Brussel, Paardenstraat 65, B-1640 Sint-Genesius-Rode, Belgium
Communicated by Elvin A. Kabat, December 12, 1977
ABSTRACT The primary structure of the α subunit from Lens culinaris lectin was determined by analysis of tryptic peptides and was shown to consist of 52 amino acid residues. The molecular weight calculated on the basis of the sequence is 5928. The whole chain is homologous with the region between positions 72 and 121 from concanavalin A. The NH2-terminal sequence of the β chain, determined by automated Edman degradation, is homologous to another portion of the concanavalin A molecule, between positions 123 and 165. Comparison of the 94 residues from the lentil lectin α and β chains with concanavalin A reveals the existence of 43 identities. Thirty-four other homologies could have arisen, each by a single nucleotide substitution. This extensive homology suggests that the lentil lectin α and β chains may be proteolytic fragments from a single polypeptide chain of the same length as concanavalin A.

Plant and animal lectins possess specific receptor sites for carbohydrate (1, 2) and react with glycoproteins in solution or on cell membranes. In many instances they are mitogenic and cause blast transformation and movement of receptors in cell membranes, resulting in patching and capping. Although the specific activity for binding oligosaccharides is becoming better known for many lectins, few structural data are available for these proteins.

We have recently shown the existence of extensive homologies among the Leguminosae lectins in a study that included proteins from pea, lentil, soybean, peanut, and red kidney bean (3). To investigate such homologies further, we have studied the sequences of the lentil α and β chains and compared them to concanavalin A (Con A), the only lectin whose complete amino acid sequence is known (4). Here we report the primary structure of the lentil α chain and the NH2-terminal sequence of 42 residues of the β chain. A possible explanation is suggested for the extensive but apparently "shifted" homologies with concanavalin A.
MATERIALS AND METHODS
Lens culinaris lectin was isolated from a commercial sample of lentil seeds by the method of Hayman and Crumpton (5). Isoelectric focusing to separate the isolectins was performed with a Multiphor electrophoresis unit (LKB, Sweden) in granulated gel (6). Gel filtration, to separate the α and β subunits, was performed at room temperature on Sephadex G-75 in the presence of 6 M guanidine-HCl. Protein in the eluted fractions was estimated spectrophotometrically at 280 nm. The fractions containing protein were desalted by gel filtration on a Bio-Gel P-2 column.

Sequence analyses were performed by automated Edman degradations on 0.05-0.1 μmol of the purified subunits with a Beckman 890C sequencer using 0.1 M Quadrol and single-acid cleavage (7, 8). Phenylthiohydantoin amino acids were identified by gas/liquid chromatography (9), thin-layer chromatography (10), and amino acid analysis after back hydrolysis with 56% HI (11) on a Durrum 500 analyzer.
The α subunit (~5 μmol) was digested with trypsin (treated with L-1-tosylamido-2-phenylethyl chloromethyl ketone) in 1% NH4HCO3 for 3 hr at 37° with an enzyme-to-protein weight ratio of 1:50. Digestion was terminated by gel filtration of the reaction mixture on a Sephadex G-25 and a Bio-Gel P-10 column (1 × 200 cm) in 0.1 M acetic acid. The absorbance of the eluates at 230 or 280 nm was monitored spectrophotometrically. Fractions containing peptides were pooled and lyophilized. Peptides were also fractionated by high-voltage electrophoresis at pH 6.5 (12).

Digestion of peptides with carboxypeptidases A and B was performed in 1% NH4HCO3 with an enzyme-to-peptide weight ratio of 1:50. Digestion was terminated by addition of 10 M acetic acid. The amino acid sequences of peptides were determined qualitatively by Edman degradation and dansylation (13).

RESULTS
NH2-Terminal Sequence of the α and β Subunits. The NH2-terminal sequence of the α and β subunits was determined by automated Edman degradation in a protein sequencer as described (7, 8). The number of positions sequenced was 34 for the α chain and 42 for the β chain. Identification of the residues occupying these positions was subsequently confirmed by sequence determination of the tryptic peptides.

Amino Acid Composition. The amino acid composition of the α chain is presented in Table 1. The total number of residues based on a molecular weight of 6000 (14) was found to be 52, in agreement with the number of residues identified by the sequence studies. The molecular weight of the α subunit calculated on the basis of the sequence is 5928.

Analysis of Tryptic Peptides from the α Chain. When the α chain was subjected to trypsin digestion, six peptides were obtained instead of the expected four (considering the two lysine and one arginine residues). The partial separation of these peptides is presented in Fig. 1, their composition in Table 2, and their partial or complete sequence in Fig. 2.
Tryptic digests of the α subunit were initially fractionated on a Sephadex G-25 column in 0.1 M acetic acid. Gel filtration of fraction A on a Bio-Gel P-10 column in the same solvent gave peptide T2. Similar treatment of material in fraction B gave peptides T1 and T1a. Peptides T1b, T1c, and T1d were purified from material in fraction C by high-voltage electrophoresis (pH 6.5). The unexpected peptides most probably resulted, on the one hand, from a chymotryptic-like cleavage between a tyrosine and a threonine residue and, on the other hand, from the incomplete tryptic cleavage of a lysyl-aspartyl peptide bond. The proposed chymotryptic cleavage is unexpected in that modified trypsin was used. The incomplete cleavage of lysyl-aspartyl peptide bonds has been described and discussed (15).
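The expected number of tryptic peptides can be checked against the α chain sequence reported in Fig. 2. The following is a minimal sketch, assuming the simplest rule that trypsin cleaves on the COOH-terminal side of every lysine and arginine; the function name `tryptic_digest` and the one-letter encoding of the sequence are illustrative choices, not part of the original work.

```python
# Alpha chain of the lentil lectin (52 residues), transcribed from Fig. 2
# into one-letter code. The encoding is an illustrative assumption.
ALPHA = (
    "VTSYTLNEVV"  # residues 1-10
    "PLKDVVPEWV"  # residues 11-20
    "RIGFSATTGA"  # residues 21-30
    "EFAAQEVHSW"  # residues 31-40
    "SFNSQLGHTS"  # residues 41-50
    "KS"          # residues 51-52
)


def tryptic_digest(seq):
    """Split seq after every K or R.

    Returns a list of (start, end, peptide) tuples with 1-based,
    inclusive positions. Incomplete or non-tryptic cleavages are
    deliberately not modeled.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(seq):
        is_last = i == len(seq) - 1
        if residue in "KR" or is_last:
            peptides.append((start + 1, i + 1, seq[start:i + 1]))
            start = i + 1
    return peptides


complete = tryptic_digest(ALPHA)
for first, last, peptide in complete:
    print(f"{first:>2}-{last:<2} {peptide}")
```

Under this rule the sketch yields four fragments (1-13, 14-21, 22-51, and 52), the count given in the text as expected. The six peptides actually isolated reflect the additional tyrosine-threonine cleavage and the incomplete lysyl-aspartyl cleavage described above.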
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U. S. C. §1734 solely to indicate this fact.
Abbreviation: Con A, concanavalin A.
Table 1. Amino acid composition of the α subunit

                     Residues*
Amino acid           H        S
Aspartic acid        2.28     3
Threonine†           5.03     5
Serine†              7.14     7
Glutamic acid        5.50     6
Proline              1.93     2
Glycine              4.22     3
Alanine              4.20     4
Valine               6.67     7
Isoleucine           1.08     1
Leucine              3.22     3
Tyrosine             1.05     1
Phenylalanine        3.20     3
Histidine            3.01     2
Lysine               1.47     2
Arginine             1.00     1
Tryptophan‡          2.00     2
Total                52§      52

* The numbers represent the values obtained from 24-hr hydrolysates (H) and from the sequence (S).
† Values for threonine and serine are calculated after extrapolation to 0 time.
‡ Determined spectrophotometrically.
§ Based on a molecular weight of 6000 (14).

FIG. 1. Gel filtration of a tryptic digest of the α subunit on a column (1 × 20 cm) of Sephadex G-25 in 0.1 M acetic acid. Each tube contained 3.0 ml of effluent. The solid line denotes the absorbance of effluent fractions at 230 nm, plotted against effluent volume (ml). Letters A, B, and C designate the fractions pooled for further purification. Peptides obtained from each fraction are indicated.
COOH-Terminal Sequence. The COOH-terminal sequence was determined by the combined use of carboxypeptidase A and carboxypeptidase B. Four residues were placed, providing the COOH-terminal sequence -Thr-Ser-Lys-Ser-COOH. The determination of this sequence allowed the placing of all the tryptic peptides and provided the appropriate overlapping sequence.

Isolectins. Two isolectins, as mentioned by Howard et al. (16), were found by isoelectric focusing, with isoelectric points of 7.6 and 8, respectively. Although these isolectins were sequenced together, no differences in structure were found in the whole α subunit or in the NH2-terminal sequence of the β subunit.
DISCUSSION
The complete amino acid sequence of the lentil α chain has been obtained by the use of a sequenator and by purification and sequencing of tryptic peptides. All the appropriate overlapping sequences were obtained. An interesting feature of the NH2-terminal sequence is the quasi-repetition of residues from positions 6 to 11 and 12 to 17:

 6 -Leu-Asn-Glu-Val-Val-Pro- 11
12 -Leu-Lys-Asp-Val-Val-Pro- 17
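The extent of this quasi-repetition can be stated as a position-by-position count. The short sketch below is only an illustration of that count; the helper name `count_identities` is hypothetical.

```python
# The two hexapeptides of the alpha chain quoted in the text.
REPEAT_A = ["Leu", "Asn", "Glu", "Val", "Val", "Pro"]  # positions 6-11
REPEAT_B = ["Leu", "Lys", "Asp", "Val", "Val", "Pro"]  # positions 12-17


def count_identities(a, b):
    """Count positions at which two equal-length segments carry the same residue."""
    if len(a) != len(b):
        raise ValueError("segments must have the same length")
    return sum(x == y for x, y in zip(a, b))


identical = count_identities(REPEAT_A, REPEAT_B)
print(f"{identical} of {len(REPEAT_A)} positions identical")
```

Four of the six positions (Leu, Val, Val, Pro) are identical; the two differing positions pair Asn with Lys and Glu with Asp.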
Table 2. Amino acid composition of tryptic peptides from the α subunit Amino acid
T,
Aspartic acid Threoninet Serinet Glutamic acid Proline Glycine Alanine
1.6 1.8 1.4 2.2 1.7
Valinel
6.2 (6)
Isoleucine Leucine Tyrosine Phenylalanine Histidine Lysine Arginine
Tryptophan§ Total
(2) (2) (1) (2) (2)
Tia 1.9 (2) 1.4 (1)
Tryptic peptide residues* TIb Tic 1.2 (1) 1.4 (1)
1.8 (2) 1.7 (2)
TId
0.8 (1) 1.1 (1)
0.8 (1)
1.0 (1) 0.8 (1)
1.0 (1) 1.0 (1)
T2 0.7 2.8 5.4 3.5
(1) (3) (5) (4)
3.5 (4) 3.7 (4) 1.5 (2) 0.5 (1)
0.9 (1) 0.7 (1)
(1) 21
4.7 (5)
1.1 (1)
1.8 (2)
1.6 (2)
2.7 (3)
0.7 (1)
1.7 (2)
0.5 (1) 1.0 (1)
1.0 (1)
2.7 (3) 2.1 (2) 1.0 (1)
0.8 (1) 1.2 (1) 0.6 (1)
0.9 (1) (1)
(1) 17
4
9
8
(1) 31
* The numbers represent the values obtained from 24-hr hydrolysates (H) and their nearest integers (in parentheses). Amino acids present at a level less than 0.2 residue are omitted.
† Values for threonine and serine are calculated after extrapolation to 0 time.
‡ The value for valine is corrected for incomplete hydrolysis.
§ Presence of tryptophan was determined by staining with p-dimethylaminobenzaldehyde.
α subunit:
  1 Val-Thr-Ser-Tyr-Thr-Leu-Asn-Glu-Val-Val-Pro-Leu-Lys-Asp-Val-Val-Pro-Glu-Trp-Val-Arg- 21
 22 Ile-Gly-Phe-Ser-Ala-Thr-Thr-Gly-Ala-Glu-Phe-Ala-Ala-Gln-Glu-Val-His-Ser-Trp-Ser-Phe- 42
 43 Asn-Ser-Gln-Leu-Gly-His-Thr-Ser-Lys-Ser 52

β subunit:
  1 Thr-Glu-Thr-Thr-Ser-Phe-Ser-Ile-Thr-Lys-Phe-Ser-Pro-Asp-Gln-Gln-Asn-Leu-Ile-Phe-Gln- 21
 22 Gly-Asp-Gly-Tyr-Thr-Gly-Lys-Glu-Gly-Leu-Thr-Leu-Thr-Lys-Val-Ser-Lys-Glu-Gly-Thr-Gly- 42

FIG. 2. Amino acid sequence of the α and β lectin subunits from L. culinaris, showing residues identified by dansyl-Edman degradation, by carboxypeptidase digestion, and by automatic sequencer analysis.
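The agreement between the sequence and the composition data can be verified by counting residues directly. The sketch below tallies the α chain of Fig. 2 and compares it with the S column of Table 1; the one-letter encoding, the grouping of Asn with Asp and Gln with Glu (acid hydrolysis does not distinguish them), and the names `composition` and `TABLE_1_S` are assumptions made for this illustration.

```python
from collections import Counter

# Alpha chain from Fig. 2 in one-letter code (illustrative transcription).
ALPHA = "VTSYTLNEVVPLKDVVPEWVRIGFSATTGAEFAAQEVHSWSFNSQLGHTSKS"

# Acid hydrolysis converts Asn to Asp and Gln to Glu, so they are pooled
# here in the same way as in an amino acid analysis.
POOLED = {"N": "D", "Q": "E"}

# Sequence-derived (S) column of Table 1, keyed by one-letter code.
TABLE_1_S = {
    "D": 3, "T": 5, "S": 7, "E": 6, "P": 2, "G": 3, "A": 4, "V": 7,
    "I": 1, "L": 3, "Y": 1, "F": 3, "H": 2, "K": 2, "R": 1, "W": 2,
}


def composition(seq):
    """Return residue counts with Asn pooled into Asp and Gln into Glu."""
    return Counter(POOLED.get(residue, residue) for residue in seq)


comp = composition(ALPHA)
matches = dict(comp) == TABLE_1_S
print("total residues:", sum(comp.values()))
print("agrees with Table 1 (S):", matches)
```

A check of this kind only confirms internal consistency between the figure and the table; it says nothing about the hydrolysate (H) values, which are experimental.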
More striking are the homologies found between the whole α chain and the region between positions 72 and 121 of Con A. Because we had also found homologies between the NH2-terminal portion of the lentil β chain and yet another part of Con A, we have represented all the homologous portions of the lentil and jack bean lectins in Fig. 3. Alignment of the 52 α chain and 42 β chain residues with corresponding portions of the Con A molecule between positions 72 and 165 reveals the existence of 43 identical residues for both lectins. These identical residues appear to be concentrated in stretches of 10-30 residues. For instance, 14 of 20 amino acids between positions 79 and 98 (Con A numbering) are identical. Similarly, 18 identities are found between positions 133 and 163. In contrast, all nine residues between positions 99 and 107 are different. Only 19 residues of the 94 are neither identical nor due to a possible single nucleotide substitution.

More detailed analysis of the sequence homology between the α chain and its homologous portion in Con A shows a striking preservation of charged residues. Indeed, both contain the same five positive and five negative residues. The number of aromatic residues is almost the same. The α chain contains six aromatic residues, one more than the Con A portion.

Two deletions were introduced to allow maximal homology. A two-residue deletion was placed in Con A in a region between 74 and 78, already characterized by a stretch of four residues different in lentil lectin and Con A. The other, single-residue deletion was placed in the β subunit in a part of the chain very homologous between both molecules. Position 122 was left blank in lentil lectin: it marks the separation between α and β.

FIG. 3. Amino acid sequences of the lectins from Con A and L. culinaris. The sequences in the boxed positions are identical. Deletions between positions 74 and 75 in Con A and between 30 and 31 in the β chain of the L. culinaris lectin are introduced in order to maximize the homology.

The β chain with its 126 residues is slightly longer than the corresponding portion 123-237 of Con A. The same may be true for the other lectins, whose homologies we have discussed in a previous paper (3). The various lengths, based on the published molecular weights and their positioning according to their homology with Con A, are schematically depicted in Fig. 4. All the lectins in this figure are found in the seeds of the family Leguminosae.

FIG. 4. Various lengths of the homologous lectins and positioning of the subunits, showing the shifted homologies with Con A. The arrow indicates the position of the natural proteolytic cleavage occurring in Con A. Full lines indicate known residues. Lc, L. culinaris; Ps, Pisum sativum; SB, soybean; PN, peanut.

The very extensive homologies and the alignment of the α and β chains with portions of Con A situated between positions 70 and 160 suggest that the lentil and jack bean lectins have evolved from each other. It is possible that the lentil lectin is synthesized as a single polypeptide chain and only cleaved subsequently into two or possibly three fragments, two of which would be α and β and the third a fragment homologous to portion 1-70 from Con A. This fragment may be further degraded postsynthetically or lost during the preparation of the lectin.

The NH2-terminal sequences of soybean and peanut lectins correspond with the NH2-terminal sequences of the lentil and pea β subunits and thus with the sequence of Con A starting at position 123. The hypothesized postsynthetic degradation or preparative artifact may have caused the loss of a 122-residue-long NH2-terminal fragment from soybean and peanut lectins. Considering its length, however, it may also be argued that genetic rather than regulatory mechanisms are involved.
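The criterion of a "possible single nucleotide substitution" used in the comparison with Con A can be made concrete with the standard genetic code. The sketch below is one way to do this, not necessarily the procedure followed in this work: it reports the minimum number of base differences between any codon of one amino acid and any codon of another. The function name `min_base_changes` is hypothetical, and the example pairs are taken from the quasi-repeat of the α chain rather than from Fig. 3.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position
# (the conventional 64-letter layout; '*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def min_base_changes(aa1, aa2):
    """Minimum number of base differences between codons of aa1 and aa2.

    Amino acids are given in one-letter code. A result of 1 means the
    replacement could have arisen by a single nucleotide substitution.
    """
    codons1 = [c for c, aa in CODON_TABLE.items() if aa == aa1]
    codons2 = [c for c, aa in CODON_TABLE.items() if aa == aa2]
    if not codons1 or not codons2:
        raise ValueError("unknown amino acid code")
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons1
        for c2 in codons2
    )


# Differing positions of the alpha chain quasi-repeat (6-11 versus 12-17).
print("Asn/Lys:", min_base_changes("N", "K"))
print("Glu/Asp:", min_base_changes("E", "D"))
```

Both replacements in the quasi-repeat need only one base change, whereas a pair such as Trp/Asn needs three; the minimum-distance measure ignores which codons were actually used, which is why the text speaks of a possible substitution.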
Postsynthetic cleavages have been observed in Con A (17, 18) and other lectins (19) and appear to be quite common in plant seeds. These observations suggest the existence of regulatory mechanisms based on proteolytic processing (20). In Con A this proteolytic cleavage occurs at position 118, suggesting that this region is probably exposed to the solvent both in the jack bean lectin and in the hypothetical lentil precursor polypeptide chain. The cleaved jack bean lectin appears to have the same crystal structure and biological activity as the intact protein (18). The same is likely to be true for the processed lentil lectin in comparison to the uncleaved molecule.

Comparison of the lentil and jack bean lectin sequences allows one to predict some common features of the three-dimensional structures. Residues 103-130 from Con A constitute two antiparallel chains, and the corresponding residues in the lentil lectin α and β chains may do the same, as predicted by analysis according to Chou and Fasman (21). The hydrophobic cavity defined by the binding in Con A crystals of β-(o-iodophenyl)-D-glucopyranoside may in part be conserved because leucine-81, valine-89, phenylalanine-111, and serine-113 are common to both proteins. In contrast, two of the three residues involved in Con A dimer interaction, lysine-114 and lysine-116, are replaced in the lentil lectin, which probably explains why this protein is only found in the form of dimers (54,000 daltons) and not tetramers (108,000). No predictions are made about the residues involved in sugar or metal binding because these appear to be localized, in Con A, in regions not yet sequenced in the lentil lectin.

In conclusion, the comparison of the jack bean and lentil lectins strongly suggests a common evolutionary origin, evidenced by the conservation of major features of the proteins, such as primary and secondary structure, hydrophobic regions, and specificity for sugar binding.

We thank Dr. N. Sharon for helpful discussions. We thank Ms. Marleen Van Der Linden, Mr. Willy Verheulpen, Mr. Ignace Caplier, and Mr. Urbain Lion for their excellent assistance. We are grateful to Mr. D. J. Perry and Dr. M. N. Margolies (Boston) for helping in the determination of the COOH-terminal sequence and to R. Zeeuws for the isoelectric focusing. We thank Dr. L. E. Mole for helpful suggestions. This work was supported by grants from the Belgian Government, the "ASLK Cancer Fund," the "Fonds voor Kollektief Fundamenteel Onderzoek," and the "Fonds voor Onderling Overlegde Akties."
1. Sharon, N. & Lis, H. (1972) Science 177, 949-959.
2. Lis, H. & Sharon, N. (1977) in The Antigens, ed. Sela, M. (Academic Press, New York), Vol. 4, pp. 429-529.
3. Foriers, A., Wuilmart, C., Sharon, N. & Strosberg, A. D. (1977) Biochem. Biophys. Res. Commun. 75, 980-986.
4. Cunningham, B. A., Wang, J. L., Waxdal, M. J. & Edelman, G. M. (1975) J. Biol. Chem. 250, 1503-1512.
5. Hayman, M. J. & Crumpton, M. J. (1972) Biochem. Biophys. Res. Commun. 47, 923-930.
6. Radola, B. J. (1974) Biochim. Biophys. Acta 386, 181.
7. Edman, P. & Begg, G. (1967) Eur. J. Biochem. 1, 80-91.
8. Brauer, A. W., Margolies, M. N. & Haber, E. (1975) Biochemistry 14, 3029-3035.
9. Pisano, J. J. & Bronzert, T. J. (1969) J. Biol. Chem. 244, 5597-5607.
10. Summers, M. R., Smythers, G. W. & Oroszlan, S. (1973) Anal. Biochem. 53, 624-628.
11. Smithies, O., Gibson, D., Fanning, E. M., Goodfliesh, R. M., Gilman, J. G. & Ballantyne, D. L. (1971) Biochemistry 10, 4912-4921.
12. Offord, R. E. (1966) Nature 211, 591-593.
13. Gray, W. R. (1972) in Methods in Enzymology, eds. Hirs, C. H. W. & Timasheff, S. N. (Academic Press, New York), Vol. 25, pp. 333-344.
14. Foriers, A., Van Driessche, E., Wuilmart, C., Kanarek, L. & Strosberg, A. D. (1977) FEBS Lett. 75, 237-240.
15. Smith, D. G. (1967) in Methods in Enzymology, ed. Hirs, C. H. W. (Academic Press, New York), Vol. 11, pp. 214-231.
16. Howard, I. K., Sage, H. J., Stein, M. D., Young, N. M., Leon, M. A. & Dyckes, D. F. (1971) J. Biol. Chem. 246, 1590-1595.
17. Wang, J. L., Cunningham, B. A., Waxdal, M. J. & Edelman, G. M. (1975) J. Biol. Chem. 250, 1490-1502.
18. Wang, J. L., Cunningham, B. A. & Edelman, G. M. (1971) Proc. Natl. Acad. Sci. USA 234, 283-295.
19. Lotan, R., Lis, H. & Sharon, N. (1975) Biochem. Biophys. Res. Commun. 62, 144-150.
20. Neurath, H. & Walsh, K. A. (1976) Proc. Natl. Acad. Sci. USA 73, 3825-3832.
21. Chou, P. Y. & Fasman, G. D. (1974) Biochemistry 13, 222-245.