THE JOURNAL OF BIOLOGICAL CHEMISTRY Vol. 256, No. 14, Issue of July 25, pp. 7595-7602, 1981 Printed in U.S.A.
Messenger RNA Sequence and Primary Structure of Preproinsulin in a Primitive Vertebrate, the Atlantic Hagfish*
(Received for publication, March 12, 1981)
Shu Jin Chan‡§, Stefan O. Emdin¶, Simon C. M. Kwok§, Janet M. Kramer§, Sture Falkmer‖, and Donald F. Steiner§
From the §Department of Biochemistry, University of Chicago, Chicago, Illinois 60637, the ¶Department of Pathology, University of Umeå, S-90187 Umeå, Sweden, and the ‖Department of Pathology, University of Lund, Malmö General Hospital, S-21401 Malmö, Sweden
Messenger RNA encoding the preproinsulin of a cyclostome, the Atlantic hagfish (Myxine glutinosa), has been cloned in the plasmid pBR322. Insulin-related clones were identified using a cDNA probe generated from hagfish islet mRNA, and three of these were analyzed to establish the nucleotide sequence of the entire coding region and most of the 5' and 3' flanking regions. The mRNA of this primitive vertebrate species is over 900 nucleotides long due to the presence of an unusually large 3' nontranslated segment of over 500 nucleotides. The nucleotide sequence homologies within the various regions of the coding sequence, in comparison with those of the human, rat I and II, chicken, and anglerfish preproinsulin mRNAs, are, respectively: preregion, 25%; B-chain, 48%; C-peptide, 19%; and A-chain, 57%, thus underscoring the marked variations that can occur in the rates of evolutionary change in functionally different regions within a gene that specifies a single protein. Hagfish preproinsulin is slightly larger than its mammalian counterparts, being 115 residues long. The prepeptide region is partially homologous with the prepeptides of higher vertebrate preproinsulins but contains all those structural determinants previously identified as important for its presumed function and cleavage during secretory protein segregation (Steiner, D. F., Quinn, P. S., Chan, S. J., Marsh, J., and Tager, H. S. (1980) Ann. N. Y. Acad. Sci. 343, 1-16). The B- and A-chain regions predicted from the mRNA sequence agree with the known primary structure of hagfish insulin (Peterson, J. D., Steiner, D. F., Emdin, S. O., and Falkmer, S. (1975) J. Biol. Chem. 250, 5183-5191) except at 2 positions in the A-chain region (A15 and A17) where the free acids are encoded rather than their amidated forms. The C-peptide region, while similar in overall size and composition, exhibits no significant sequence homology with the C-peptides of higher vertebrates.
Of particular interest, however, is the conservation of paired basic residues at the cleavage sites within the proinsulin moiety, indicating that the converting enzyme system, consisting of trypsin-like and carboxypeptidase B-like enzymes, antedated the origin of the vertebrates. These results, along with other studies on the three-dimensional structure, receptor-binding activity, and biological activity of hagfish insulin, all support the conclusion that the insulin molecule and its precursor forms were structurally and biologically well defined more than 500 million years ago.
* This work was supported by Grants AM 13914 and 12X-718 from the National Institutes of Health and the Swedish Medical Research Council, respectively, and by grants-in-aid from the Kroc Foundation,
‡ Recipient of United States Public Health Service Postdoctoral Fellowship AM 06154.
Insulin is one of several important anabolic hormones whose integrated actions are essential for normal growth and metabolic regulation in higher organisms. Recent evidence indicates that insulin is not a unique molecule but is a member of a superfamily of structurally related peptide hormones and growth factors which includes relaxin (1), insulin-like growth factors I and II (2), and probably also nerve growth factor (3). Although the origins and interrelationships of these substances have not been clarified, it is considered likely that they have arisen during evolution from a common ancestral protein through gene duplication and diversification (4). The emergence of insulin must have occurred well over 500 million years ago, antedating the appearance of the vertebrates, since it occurs as a hormone and appears to function similarly in all vertebrate species (5). Moreover, evidence has been reported for the involvement of insulin-like substances in the regulation of carbohydrate metabolism in several invertebrate phyla, although the chemical identity of these substances has not yet been established (6, 7).
Recent advances in recombinant DNA technology have made it possible to isolate and sequence the mRNAs and chromosomal genes coding for unique protein products. Utilizing these techniques, cDNA copies of the mRNAs encoding human (8, 9), rat (10-12), and anglerfish (13) preproinsulins have been cloned and their structures have been determined. In addition, the chromosomal genes for rat (14, 15), human (16, 17), and chicken (18) insulins have been sequenced following their isolation from phage genomic libraries. However, relatively little information is available on the structures of the genes encoding insulin precursors in primitive vertebrates.
A suitable animal for such studies, the hagfish, is a member of the class Agnatha of the phylum Chordata, which diverged from the main line of vertebrate evolution approximately 500-600 million years ago and thus occupies a key branch point on the evolutionary tree, separating the vertebrates from the invertebrates (5). It is the most primitive vertebrate from which insulin has been isolated and well characterized. Previous studies from our laboratories have established the primary structure of hagfish insulin (19), and its three-dimensional structure in crystals has recently been determined to 2.3 Å resolution by Cutfield et al. (20). These studies show that, despite an overall difference of 38% in the amino acid sequences of porcine and hagfish insulins, the spatial orientations of the peptide backbones of both the A- and B-chains of the hagfish hormone are closely similar to those of porcine insulin. We report here the molecular cloning and nucleotide sequence determination of the mRNA coding for preproinsulin in the hagfish. We also present the complete amino acid sequence of hagfish preproinsulin derived from protein sequencing studies and analyses of the cloned cDNA nucleotide sequences obtained from three overlapping recombinant plasmids. These studies provide the first detailed information on long range evolutionary changes in preprohormone sequences and their cleavage sites. The results suggest that precursor processing mechanisms were already well established at an early stage in vertebrate evolution.
FIG. 1. Analysis of hagfish islet mRNA cell-free translation products on SDS-polyacrylamide slab gel. Hagfish islet poly(A) RNA (1 µg) was translated in a wheat germ cell-free reaction mixture containing [35S]methionine for 1 h; after precipitation with trichloroacetic acid, the products (+) were analyzed on a 16.5% polyacrylamide-SDS slab gel and autoradiographed as described (22). Control lane (-) shows results obtained without added mRNA. Protein size markers were bovine proinsulin (9,100), lysozyme (14,300), chymotrypsinogen (25,700), and bovine serum albumin (68,000).
FIG. 2. Partial amino acid sequence of hagfish preproinsulin. Hagfish preproinsulin, labeled with [3H]leucine, [3H]alanine, or [35S]methionine, was isolated from cell-free translation reactions by immunoprecipitation and subjected to automated sequential Edman degradation as described under "Materials and Methods." An aliquot of the products released after each cycle was counted for radioactivity in OCS scintillation fluid (Amersham). The alignment of leucine identified at cycles 31, 36, and 40 with the first three leucine residues of the hagfish B-chain (underlined) indicated the prepeptide segment was 25 residues long, excluding the initiator methionine. The release of [35S]methionine radioactivity in cycle 1 was considered to be an artifact since it accounted for only a small proportion of the total (120,000 cpm) added and a sharp [3H]alanine peak was also found in this cycle.
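The alignment argument in the Fig. 2 legend can be sketched as a small calculation. This is a minimal illustration, not the authors' procedure: the B-chain start `RTTGHLCGKDLVNALY` is an assumption read from the extracted Fig. 5 text, and the function names are hypothetical.

```python
# First residues of the hagfish B-chain in one-letter code.
# ASSUMPTION: read from the extracted Fig. 5 text, not independently verified.
B_CHAIN_START = "RTTGHLCGKDLVNALY"

# Sequencer cycles at which [3H]leucine was released (Fig. 2 legend).
LEU_CYCLES = [31, 36, 40]


def leucine_positions(seq):
    """Return 1-based positions of leucine (L) in a one-letter sequence."""
    return [i for i, aa in enumerate(seq, start=1) if aa == "L"]


def prepeptide_length(cycles, b_chain):
    """Infer the prepeptide length as the constant offset between the
    sequencer cycles and the B-chain leucine positions they align with."""
    positions = leucine_positions(b_chain)[: len(cycles)]
    offsets = {cycle - pos for cycle, pos in zip(cycles, positions)}
    if len(offsets) != 1:
        raise ValueError("cycles do not align with a single offset")
    return offsets.pop()


if __name__ == "__main__":
    print(leucine_positions(B_CHAIN_START))              # [6, 11, 15]
    print(prepeptide_length(LEU_CYCLES, B_CHAIN_START))  # 25
```

A single shared offset of 25 is what the legend reports as the prepeptide length excluding the initiator methionine; a mixed set of offsets would indicate that the leucine peaks do not belong to the B-chain.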
MATERIALS AND METHODS
Enzymes-Reverse transcriptase was provided by J. Beard (Life Sciences, Inc., St. Petersburg, FL). Escherichia coli DNA polymerase I was obtained from Boehringer Mannheim. Calf thymus terminal transferase was a gift of M. S. Coleman (University of Kentucky). S1 nuclease was obtained from Sigma. Restriction endonucleases were purchased from New England Bio-Labs or Bethesda Research Laboratories and used according to the manufacturers' recommendations.
RNA Isolation-Hagfish were captured at the Kristineberg marine biology station in Fiskebäckskil, Sweden, and the islet organs were isolated by dissection and immediately packed in dry ice. The frozen islets were transported to Chicago and pulverized in liquid N2 and total cellular RNA was isolated as described (12). After removal of residual DNA by centrifugation through 5.7 M CsCl, poly(A)-enriched RNA was isolated by passage through two cycles of oligo(dT)-cellulose affinity chromatography (21).
Cell-free Translation-Poly(A) RNA was translated in a cell-free protein-synthesizing system prepared from wheat germ (12). The radioactively labeled products were analyzed by electrophoresis on SDS¹-polyacrylamide slab gels as described previously (22). Specific immunoprecipitation was performed in the presence of 1% sodium deoxycholate and 1% Triton X-100, using guinea pig anti-hagfish insulin sera prepared in our laboratories (12, 23). For radioactive protein sequencing, larger scale cell-free translation reactions were performed and hagfish preproinsulin was isolated by immunoprecipitation. The immunoprecipitates were disrupted by boiling in 1% SDS, precipitated with trichloroacetic acid, washed with ethanol and ether, resuspended in 50% acetic acid, and subjected to automated Edman degradation in a Beckman model 890 C Sequencer (12).
Molecular Cloning-Double-stranded cDNA was synthesized from poly(A) RNA by the sequential actions of reverse transcriptase and E. coli DNA polymerase I and treated with S1 nuclease to generate blunt-ended molecules (10). The DNA was then fractionated on a 5% polyacrylamide gel and size fractions between 500-1200 base pairs were excised and electrophoretically eluted. Following extraction with phenol/chloroform/isoamyl alcohol (25:25:1) and ethanol precipitation, the double-stranded cDNA was tailed with approximately 15 residues of dCMP at the 3' ends using terminal transferase under the conditions described by Roychoudury et al. (24). Similarly, pBR322 plasmid DNA was linearized by digestion with Pst I and tailed with about 10 residues of dGMP. Equimolar amounts of tailed double-stranded cDNA and pBR322 tailed with dGMP were mixed in 150 µl of 10 mM Tris-Cl (pH 7.4), 140 mM NaCl, 0.1 mM EDTA and annealed by warming to 65 °C for 5 min, incubating at 42 °C for 2 h, and slow cooling to 4 °C. The mixture was added to 0.3 ml of CaCl2-treated E. coli JA221 (C600 trpAE5 leu6 recA r) and transformed as described by Lederberg and Cohen (25). The cells were spread on L-agar plates containing 12 µg/ml of tetracycline, and resistant colonies were obtained after overnight growth at 37 °C.
Colony hybridization was performed as described by Grunstein and Hogness (26) with modifications. Bacterial colonies were imprinted onto nitrocellulose filters (Millipore), denatured in situ with 0.5 N NaOH, neutralized for 5 min each in 1 M Tris-Cl, pH 7.4, and in 1.5 M NaCl, 0.5 M Tris-Cl, pH 7.4, and blotted dry. The filters were then baked in a vacuum oven at 70 °C for 2 h and prehybridized in 4 × SET (1 × SET = 0.15 M NaCl, 0.03 M Tris-Cl, pH 8, 0.2 mM EDTA), containing 5 × Denhardt's solution (27), 20 µg/ml of denatured calf thymus DNA, 0.1% SDS, 0.1% sodium pyrophosphate at 65 °C for 4 h. Hybridizations were performed in sealed plastic bags containing 32P-labeled probe (6 × 10" cpm/ml) in the above solution at 65 °C for 24 h. Afterward, the filters were washed twice in 2 × SET, 0.1% SDS, 0.1% sodium pyrophosphate at 65 °C, twice in 1 × SET at 65 °C, once in 1 × SET at room temperature and were blotted dry and autoradiographed using an intensifying screen (DuPont Lightning Plus).
Other Techniques-Plasmid DNA was isolated by the cleared lysate method (28) and further purified by gel filtration on Sepharose 4B and buoyant density centrifugation in CsCl-ethidium bromide (29). DNA sequence analysis was performed using the chemical modification method described by Maxam and Gilbert (30).
¹ The abbreviation used is: SDS, sodium dodecyl sulfate.
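The colony hybridization protocol above defines 1 × SET and then uses it at 4 ×, 2 ×, and 1 × strength. As a convenience, the working concentrations can be derived from that definition as sketched below; the dictionary keys and the helper name `scale_buffer` are hypothetical, and only the 1 × composition comes from the text.

```python
# 1 x SET as defined in the text: 0.15 M NaCl, 0.03 M Tris-Cl (pH 8), 0.2 mM EDTA.
SET_1X = {"NaCl_M": 0.15, "Tris_Cl_M": 0.03, "EDTA_mM": 0.2}


def scale_buffer(base, factor):
    """Return the component concentrations of `base` at `factor` x strength.

    Values are rounded to 6 decimals to suppress floating-point noise.
    """
    return {name: round(conc * factor, 6) for name, conc in base.items()}


if __name__ == "__main__":
    # Strengths used in the protocol: prehybridization/hybridization at 4 x,
    # washes at 2 x and 1 x.
    for factor in (4, 2, 1):
        print(f"{factor} x SET:", scale_buffer(SET_1X, factor))
```

Under this definition, 4 × SET corresponds to 0.6 M NaCl, 0.12 M Tris-Cl, and 0.8 mM EDTA.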
RESULTS
Identification and Partial Amino Acid Sequence of Hagfish Preproinsulin-In the hagfish, the insulin-producing β-cells are localized in a grossly visible "islet organ" located near the entry of the bile duct into the duodenum (5, 30). Islet organs from approximately 2000 hagfish were collected and poly(A) RNA was prepared as described under "Materials and Methods." Fig. 1 shows the cell-free products translated from hagfish islet poly(A) RNA resolved by SDS-polyacrylamide gel electrophoresis and autoradiography. There is a predominant band of Mr = 13,000, although on prolonged exposure of the film, additional proteins of both higher and lower molecular weights appeared. The Mr = 13,000 band is consistent with the expected size of hagfish preproinsulin; moreover, this material was specifically immunoprecipitated with antisera directed against hagfish insulin and was displaced by unlabeled hormone (data not shown). We then obtained a partial sequence of the NH2-terminal region of the Mr = 13,000 protein by means of radiosequencing. After labeling with [3H]leucine, [3H]alanine, or [3H]proline and [35S]methionine, the cell-free product was isolated by immunoprecipitation and subjected to automated sequential Edman degradation. The results shown in Fig. 2 suggest that hagfish preproinsulin contains a prepeptide segment of 25 amino acid residues in length with leucine residues at positions 2, 7, 12, 14, 15, and 16, alanine residues at positions 1, 7, 8, 19, and 23, and no methionine residues. Since the NH2-terminal residue was found to be alanine (see Fig. 2), we inferred that an initiator methionine residue was rapidly and efficiently removed by aminopeptidase(s) present in the wheat germ extract. In analogous studies on the cell-free synthesis of rat preproinsulin and pregrowth hormone, we have demonstrated that the NH2-terminal Met-Ala bond is highly susceptible to cleavage in this system (32).
FIG. 3. Colony hybridization of hagfish islet recombinant clones. Bacterial colonies were immobilized on nitrocellulose filters and hybridized with total 32P-labeled cDNA reverse-transcribed from hagfish islet poly(A) RNA as described under "Materials and Methods." In the representative autoradiograph shown, the x-ray film was developed after 6-h exposure at -70 °C. Arrows indicate three strongly reactive colonies, designated pH139, pH144, and pH188, which were analyzed further.
FIG. 4. Restriction map and sequencing strategy for hagfish preproinsulin cDNA-containing plasmids. The restriction map was constructed from the inserts contained in pH144, pH139, and pH324 which overlap as shown. Arrows indicate DNA fragments which were sequenced by the method of Maxam and Gilbert (30) after labeling the 5' end at the restriction site shown within parentheses.
mRNA Cloning-The remaining amino acid sequence of hagfish preproinsulin as well as the nucleotide sequence of the mRNA was obtained by constructing recombinant plasmids containing double-stranded cDNA copies of the mRNA and cloning these in E. coli. The dC/dG tailing technique (33) was used and the cDNA was inserted into the Pst I site of pBR322. The principal advantage of this procedure is that the Pst I site is reconstructed during cloning, which facilitates the subsequent excision of the inserted DNA. In practice, both flanking Pst I sites were regenerated in 80% of the hagfish cDNA clones that we examined. Starting from 90 ng of double-stranded cDNA, 345 transformed (tetracycline-resistant) colonies were obtained. To identify those clones containing hagfish preproinsulin sequences, we performed in situ hybridization on colonies immobilized on nitrocellulose filter paper (26). As the probe, we first used a 315-base pair Hha/Hinf fragment of cloned rat preproinsulin I cDNA labeled with [32P]dCMP by nick translation (34). This probe contains most of the coding sequence of rat proinsulin I. However, even though hybridization was performed under low stringency conditions (1 M NaCl, 50 °C), no colonies were observed which gave consistent signals above background. Alternatively, we hybridized with 32P-labeled total hagfish cDNA. On the basis of the cell-free translation results (Fig. 1), we inferred that hagfish preproinsulin cDNA is probably the most abundant unique sequence represented in the total cDNA mixture and that clones containing this sequence would react more strongly with the probe. Fig. 3 shows the autoradiogram from a representative filter hybridization using total hagfish islet cDNA as the probe. Three colonies on this filter reacted strongly with the cDNA probe and a larger number reacted with lower intensities. From all 345 transformants, 8 clones strongly reactive to total hagfish cDNA were obtained. Subsequently, one of these, pH144, was further studied since this clone also hybridized weakly to a cloned rat insulin A-chain DNA fragment.²
Nucleotide Sequence Analysis-DNA sequence analysis revealed that pH144 contained the complete structural sequence coding for hagfish preproinsulin. When all the transformants were rehybridized with the insert from pH144, 6 of the 8 previously selected clones again reacted strongly. The remaining 2 clones apparently contain cloned inserts of other abundant hagfish islet mRNAs but these have not yet been characterized in greater detail.
Fig. 4 shows the restriction map for the hagfish preproinsulin structural gene derived from analysis of 3 of these clones designated pH144, pH139, and pH324. The restriction map and DNA sequence analysis indicate that these 3 clones overlap and that the DNA sequences in the regions of overlap are in perfect agreement, with one exception (see below). The DNA sequence data obtained from the overlapping clones were compiled into a composite nucleotide sequence for hagfish preproinsulin mRNA which is given in Fig. 5. The structural codons indicate that hagfish preproinsulin contains 115 amino acid residues, 26 in the prepeptide segment, 31 in the B-chain, 37 in the C-peptide region, and 21 in the A-chain. The prepeptide segment is highly enriched in hydrophobic residues, consistent with its function in vectorial discharge and segregation (35) and in agreement with the partial sequence determined by radiosequencing (Fig. 2). The deduced primary sequence of hagfish insulin is in agreement with the protein sequence determined by Peterson et al. (19) except that the codons in the mRNA sequence for residues A15 and A17 specify aspartic acid and glutamic acid instead of the amide forms. We believe the nucleotide-derived sequence is correct since this sequence was found in two independent clones, pH144 and pH139, and in one clone (pH144), both DNA strands were sequenced through this region (Fig. 4). However, we did find a single nucleotide difference between pH144 and pH139 within the region of overlap which results in residue 55 (located in the C-peptide) being either Asp or Glu (Fig. 5). In addition to the coding segment, we also sequenced 49 nucleotides of the 5' untranslated region. The 3' untranslated segment is at least 519 nucleotides long and could be more extensive since we did not find the poly(A) tail characteristic of most eukaryotic mRNAs among the cloned plasmids (Fig. 5). However, the characteristic hexanucleotide sequence AAUAAA appears between nucleotides 852 and 856.
Proudfoot and Brownlee (36) have previously noted that this sequence appears to be a signal for termination of mRNA transcription in eukaryotes with poly(A) addition beginning about 20 nucleotides downstream. Moreover, in preliminary experiments we have electrophoresed hagfish islet poly(A) RNA on 1% agarose gels under denaturing conditions and transferred the RNA onto DBM-paper (37). Subsequent hybridization with pH144 revealed a broad band centering about 1050 nucleotides in length. Based on these data and assuming that the poly(A) tail is about 100 nucleotides long, we estimate that the sequence shown in Fig. 5 is only 30-40 nucleotides short of being the full length sequence of the hagfish preproinsulin mRNA.
² The rat insulin DNA fragment used was an 80-base pair Hae III fragment isolated from plasmid pRI-11 (12). This Hae III fragment in pRI-11 encodes amino acid residues 72 to 93 of rat preproinsulin II or the entire A-chain segment of rat insulin II. We used this fragment as a hybridization probe based on the finding that the nucleotide sequence for hagfish A-chain predicted from its published amino acid sequence (19) had a relatively high nucleotide sequence homology to the rat insulin II A-chain (60-80%). Subsequent DNA sequence analysis showed that the actual homology is 75% (see Fig. 6).
DISCUSSION
Our strategy for identifying clones of hagfish preproinsulin mRNA was based on the likelihood that this mRNA would be the most abundant species in hagfish islets, which are composed chiefly of insulin-producing β-cells (5, 31). We therefore hybridized our clone bank with total reverse-transcribed islet cDNA and selected for further analysis only those colonies which gave the strongest signals on short term exposure of the autoradiogram. This approach proved successful as 6 of the 8 clones initially chosen were shown to contain insulin cDNA sequences. The remaining 2 may contain sequences representing other abundant hagfish islet mRNAs, such as somatostatin (5, 31), but definitive proof must await DNA sequence analysis.
FIG. 5. Nucleotide sequence of hagfish preproinsulin mRNA. The composite sequence was determined from pH144, pH139, and pH324 using the sequencing strategy illustrated in Fig. 4. The sequences of the fragments which overlap were in agreement, with the exception that nucleotide 243 was identified as dT(U) in pH144 and dG in pH139. This resulted in an ambiguity in the codon for residue 55 of hagfish preproinsulin between Asp and Glu. The underlined sequence in the 5' untranslated region can form a potential secondary structure with 18 S rRNA (ΔG = 5.4 kcal) (53). Also underlined is the putative recognition sequence, AAUAAA, for transcription termination and/or polyadenylation located at the end of the 3' untranslated region.
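The ambiguity at nucleotide 243 described in the legend can be checked with a short sketch. It assumes that nucleotide numbering starts at the A of the initiator AUG and that residue numbering starts after the 26-residue prepeptide; both are assumptions inferred from the text, not stated conventions. The codon table holds only the four standard-code codons needed here, and the function names are hypothetical.

```python
PREPEPTIDE_CODONS = 26  # Met plus the 25-residue prepeptide (from the text)

# Subset of the standard genetic code covering the GA_ codon family.
GA_CODONS = {"GAU": "Asp", "GAC": "Asp", "GAA": "Glu", "GAG": "Glu"}


def codon_span(residue_number, prepeptide_codons=PREPEPTIDE_CODONS):
    """Return (first, last) nucleotide numbers of the codon for a residue,
    ASSUMING nucleotide 1 is the A of the initiator AUG."""
    last = 3 * (prepeptide_codons + residue_number)
    return (last - 2, last)


def residue_for_third_base(third_base):
    """Translate a GA_ codon given its third base (RNA alphabet)."""
    return GA_CODONS["GA" + third_base]


if __name__ == "__main__":
    print(codon_span(55))               # (241, 243)
    print(residue_for_third_base("U"))  # Asp, as read in pH144
    print(residue_for_third_base("G"))  # Glu, as read in pH139
```

Under these assumptions nucleotide 243 falls at the third position of the codon for residue 55, which is why a U/G difference there switches the residue between Asp and Glu without altering the reading frame.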
Three overlapping clones were analyzed to construct the nucleotide sequence of hagfish preproinsulin mRNA shown in Fig. 5. The restriction map derived from these three clones yielded an unambiguous structural arrangement, and in most regions both strands of the DNA were sequenced. The overall length of this mRNA is greater than 900 nucleotides, including a sequence of 49 residues upstream from the initiation codon. This region contains no other AUG codons. Although this segment probably does not include the 5' cap site, it does contain an oligonucleotide sequence, CUUGUCGCA, which may function as a recognition site for binding to the 18 S ribosomal RNA during translation (38). However, a stable stem structure (inverted repeat) bracketing this region, as reported for the 5' untranslated regions of rat, anglerfish, and human insulin mRNAs (13, 14), cannot be deduced from the available sequence data. An interesting feature of the hagfish preproinsulin mRNA sequence is the extensive length of the 3' untranslated region, which we estimate to be about 530 nucleotides long exclusive of the poly(A) tail. This region contains numerous stop codons in all three reading frames. In comparison, the 3' untranslated segments of the human, rat I and II, and chicken insulin mRNAs contain 74, 52, 53, and 80 nucleotides, respectively, while the anglerfish insulin mRNA contains 222 nucleotides (13-18). It is possible that the longer 3' untranslated regions found in both hagfish and anglerfish insulin mRNAs may represent an extant feature of the more primitive insulin mRNA or be an indication of the origin of preproinsulin from a larger ancestral protein, such as the serine proteases as suggested elsewhere (39, 40). However, the high degree of variability in this region makes it unlikely to be a reliable indicator of evolutionary relationships.
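The length estimates quoted here and under "Results" can be reconciled with a short calculation. This is a sketch under stated assumptions: the figures (49, 115 residues, 519, about 1050, about 100) are taken from the text, while counting the stop codon separately from the 519-nucleotide 3' segment is an assumption, and the function names are hypothetical.

```python
FIVE_PRIME_UTR = 49        # sequenced 5' untranslated nucleotides
CODING_RESIDUES = 115      # preproinsulin residues (26 + 31 + 37 + 21)
STOP_CODON = 3             # ASSUMPTION: not already counted in the 3' segment
THREE_PRIME_UTR_MIN = 519  # minimum sequenced 3' untranslated nucleotides

NORTHERN_BAND = 1050       # approximate band size on the RNA blot
POLY_A_TAIL = 100          # assumed poly(A) length, as in the text


def sequenced_length():
    """Length of the composite sequence implied by the figures in the text."""
    coding = 3 * CODING_RESIDUES + STOP_CODON
    return FIVE_PRIME_UTR + coding + THREE_PRIME_UTR_MIN


def missing_nucleotides():
    """Nucleotides not covered by the clones, excluding the poly(A) tail."""
    return (NORTHERN_BAND - POLY_A_TAIL) - sequenced_length()


if __name__ == "__main__":
    assert 26 + 31 + 37 + 21 == CODING_RESIDUES
    print(sequenced_length())     # 916
    print(missing_nucleotides())  # 34
```

The result of 34 falls within the stated 30-40 nucleotide shortfall; if the stop codon were instead counted within the 519 nucleotides, the shortfall would be 37, which is also within that range.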
In earlier biosynthetic experiments with intact hagfish islets, we identified a precursor form similar in size to mammalian proinsulins which, during chase incubation, was slowly processed into insulin (41-43). The existence of a preproinsulin was established by cell-free translation of hagfish islet mRNA (Ref. 44; Fig. 1). The primary structure of hagfish preproinsulin predicted from the mRNA sequence demonstrates that the overall length and organization of the polypeptide chain have been remarkably well conserved although, as expected, amino acid substitutions have occurred with much greater frequency in the precursor regions, especially in the C-peptide region. We have also isolated small amounts of hagfish proinsulin and C-peptides and carried out compositional analyses on these materials³ (43). However, a partial amino acid sequence determination of the first 16 residues of the putative C-peptide fraction isolated from hagfish islets revealed an NH2-terminal sequence⁴ that diverged considerably from the 10 previously known C-peptide structures (4). The present studies have confirmed its identity as the hagfish C-peptide and have corroborated the amino acid assignments in the first 14 positions of the connecting segment (peptide positions 34-47 of Fig. 5).
³ Unpublished data.
⁴ S. O. Emdin, unpublished data. Two peptides differing in electrophoretic mobility were isolated. These were similar to the mRNA-predicted sequence in overall composition but differed from each other in that one contained a single lysine residue, while the other contained none. These results indicate that the removal of COOH-terminal basic residues occurs during the conversion of hagfish proinsulin due to exopeptidases in the maturing secretion granules, as has been shown to be the case in rat islets (52).
The lack of conserved structural features in the C-peptide region is consistent with the proposed role of the C-peptide in converting the bimolecular reaction of chain combination for sulfhydryl oxidation to a more efficient and concentration-independent monomolecular reaction, without necessarily imposing any constraints on the folding process itself (45). Several studies have shown that the C-peptide's function in promoting sulfhydryl oxidation can be mimicked by simple nonpeptide cross-linkers of an appropriate size to span the relatively short distance between the ε-amino group of lysine B29 and the free α-amino group of glycine A1 (46). We might then inquire why the connecting segment of proinsulin has not become shorter than the 27 residues found in dog proinsulin (47). One possibility which we have proposed (35, 48) is that secretory protein precursors must maintain a minimum overall chain length in order to be efficiently segregated via vectorial discharge into the cisternae of the rough endoplasmic reticulum. In this hypothesis, the C-peptide serves essentially as a spacer region with little restraint on the amino acid sequence. In the case of insulin-like growth factors I and II, which are structural analogues of proinsulin (2), the analogous C-peptide region is only 11 residues long. However, in the insulin-like growth factors, the A-chain region is extended at its COOH terminus such that the overall length of the molecule is conserved. One structural feature which has been conserved in all proinsulins including hagfish is the pairing of basic amino acids at the cleavage sites of the C-peptide. This suggests that the modified tryptic-like conversion mechanism for proproteins, requiring paired basic residues, may be an ancient one which antedated the appearance of the vertebrates. Moreover, the complete conversion of hagfish proinsulin to native insulin and C-peptide requires the successive actions of both a trypsin-like endoprotease and a carboxypeptidase B-like exopeptidase. It has been suggested by Hobart et al. (13) that in anglerfish insulin the retention of a basic residue at the COOH terminus in the B-chain⁵ may indicate that the conversion of fish proinsulins does not require a carboxypeptidase B-like activity.
However, the retention of the lysine residue in the anglerfish B-chain could simply reflect its reduced susceptibility to carboxypeptidase B-like cleavage due to the presence of a proline residue immediately preceding it. Furthermore, it is not known whether the anglerfish C-peptide retains a COOH-terminal basic residue.

The prepeptide region of hagfish preproinsulin is two residues longer than those of the other sequenced preproinsulins and also differs considerably in overall sequence. However, it exhibits those topological features that we have identified as important for its function in facilitating vectorial discharge (35). Thus, it contains a strongly hydrophobic central region and a predicted (50) β-turn near the cleavage site (at positions -3 to -6). The small, neutral residue at which cleavage occurs is threonine rather than alanine in the hagfish prepeptide.

The predicted amino acid sequence for hagfish insulin derived from the cloned mRNA is identical with the sequence determined by Peterson et al. (19) except for two positions in the A-chain (A15 and A17) where the predicted residues are the free acids rather than their amidated forms. It seems unlikely that this discrepancy indicates the existence of allelic forms of hagfish insulin. We have found no significant charge heterogeneity after electrophoresis of hagfish insulin on poly-

⁵ The COOH-terminal lysine residue of the B-chain of anglerfish insulin is derived from the basic residue pair linking the B-chain to the C-peptide. This arrangement differs from that in most other vertebrate insulins, in which the corresponding single lysine residue (at B29) is followed by a neutral or acidic residue (at B30) and then by the pair of basic residues which link the B-chain to the C-peptide. The B29 lysine residue is not required for biological activity or for binding enhancement but may add stability to the molecule (49). Alternatively, its presence in mammalian insulin may enhance the cleavage rate at the nearby basic residue pair, but this also does not appear to be an essential role, as indicated by its replacement in rat insulin II by methionine.
FIG. 6. Comparison of the B- and A-chain coding sequences for human, rat I and II, chicken, anglerfish, and hagfish insulins. The invariant amino acid residues are identified above the nucleotide sequence. Two oligonucleotide segments which are conserved in sequence are boxed (see text for discussion).
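The comparison summarized in Fig. 6 rests on two simple counts over aligned coding sequences: the number of positions at which every sequence agrees, and the location of uninterrupted runs of such positions. The Python sketch below illustrates both counts under stated assumptions. The three 16-nucleotide sequences in `TOY_ALIGNMENT` are invented for illustration and are not taken from the figure, the input is assumed to be already aligned without gaps, and the function names are hypothetical. It is not a reconstruction of how the published percentages were obtained.

```python
# Toy, made-up sequences (NOT the sequences of Fig. 6); assumed pre-aligned, no gaps.
TOY_ALIGNMENT = [
    "ACGTTTCTTCTACGGA",
    "ACCTTTCTTCTACGCA",
    "ATGTTTCTTCTACGGT",
]


def identity_fraction(aligned):
    """Return (invariant_columns, alignment_length) for equal-length sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be pre-aligned to equal length")
    # A column is invariant when all sequences carry the same nucleotide there.
    invariant = sum(1 for column in zip(*aligned) if len(set(column)) == 1)
    return invariant, length


def conserved_runs(aligned, min_len=9):
    """Return half-open (start, end) index pairs of invariant runs >= min_len."""
    runs = []
    start = None
    for i, column in enumerate(zip(*aligned)):
        if len(set(column)) == 1:
            if start is None:
                start = i          # a new invariant run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None           # the run, if any, is broken by a mismatch
    end = len(aligned[0])
    if start is not None and end - start >= min_len:
        runs.append((start, end))  # run extends to the end of the alignment
    return runs


if __name__ == "__main__":
    matches, length = identity_fraction(TOY_ALIGNMENT)
    print(f"{matches} of {length} positions invariant "
          f"({100.0 * matches / length:.0f}%)")
    for start, end in conserved_runs(TOY_ALIGNMENT, min_len=9):
        print(f"conserved run {start}-{end}: {TOY_ALIGNMENT[0][start:end]}")
```

With the toy input, 12 of 16 columns are invariant and a single run of 11 nucleotides passes the 9-nucleotide threshold. The threshold of 9 is chosen here only because it matches the length of a nonanucleotide; any other cutoff could be passed as `min_len`.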
TABLE I
Distribution of nucleotides in the third position of preproinsulin mRNA codons

mRNA          G    C    A    U    G+C (%)
Human        45   43   14    8    80.0
Rat I        42   38   10   20    72.7
Rat II       38   45   12   15    75.5
Chicken      28   41   16   22    64.5
Anglerfish   38   47   14   17    73.3
Hagfish      27   41   27   20    59.0
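The G+C column of Table I can be recomputed from the four third-position counts in each row. The Python sketch below does so, assuming the counts are assigned to rows as listed in `COUNTS`; this is a reading of the table as extracted, not an independently verified transcription, and the names `COUNTS`, `gc_percent`, and `summarize` are illustrative. Under that reading, five rows reproduce the printed percentage, while the hagfish row gives 59.1 rather than the printed 59.0.

```python
# Third-position counts in the order (G, C, A, U).
# ASSUMPTION: row assignment is our reading of the extracted Table I.
COUNTS = {
    "Human":      (45, 43, 14, 8),
    "Rat I":      (42, 38, 10, 20),
    "Rat II":     (38, 45, 12, 15),
    "Chicken":    (28, 41, 16, 22),
    "Anglerfish": (38, 47, 14, 17),
    "Hagfish":    (27, 41, 27, 20),
}


def gc_percent(g, c, a, u):
    """Percentage of codons whose third position is G or C."""
    total = g + c + a + u
    return 100.0 * (g + c) / total


def summarize(counts):
    """Map each mRNA name to (number of codons, G+C percent rounded to 0.1)."""
    return {
        name: (sum(row), round(gc_percent(*row), 1))
        for name, row in counts.items()
    }


if __name__ == "__main__":
    for name, (codons, gc) in summarize(COUNTS).items():
        print(f"{name:<11}{codons:>4} codons   G+C {gc:.1f}%")
```

The codon totals that fall out of the same counts (110 for human and both rat sequences, 107 for chicken, 116 for anglerfish, 115 for hagfish) offer a second consistency check on the row assignment; the hagfish total agrees with the 115-residue length stated for hagfish preproinsulin.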
acrylamide gels, and the isoelectric pH of the structure predicted from the nucleotide sequence is more consistent with its observed electrophoretic mobility. However, we did find a single nucleotide difference at position 243 between plasmids pH144 and pH139 (Fig. 5). This difference results in the conservative substitution of glutamic acid for aspartic acid at residue 55 in the C-peptide and may represent a true allelic difference. Our compositional data on isolated hagfish proinsulin and C-peptides also gave higher glutamic acid values than those predicted from the sequence shown in Fig. 5³ (42).

In comparing the coding sequences of hagfish, human, rat, chicken, and anglerfish preproinsulins, the overall homology is low. This is not unexpected in view of the high percentage of changes which have occurred in the amino acid sequences and the rapid rate of fixation of silent mutations within the codons (14). Fig. 6 shows that the homologies in the A- and B-chain coding segments, which contain the greatest number of invariant amino acid residues, are 57% (36 of 63 nucleotides) and 48% (43 of 90 nucleotides), respectively. However, we noticed a highly conserved oligonucleotide sequence within each chain which may be of interest. In the B-chain, the nonanucleotide sequence encoding Phe-Phe-Tyr (B24-B26) is conserved in all six preproinsulins. Similarly, the dodecanucleotide sequence coding for Asn-Tyr-Cys-Asn (A18-A21) in the A-chain is conserved in five preproinsulins with a single nucleotide mismatch in the anglerfish. Although it is tempting to speculate that these sequences may have additional noncoding functions, such as regulation of transcription or formation of secondary structure in the mRNA, the data are
insufficient to rule out the possibility that the observed homology is due to chance alone. As illustrated in Table I, there is an apparent preference for codons containing G or C in the third position in the insulin genes, although this bias is less evident in the hagfish sequence.

The conserved sequences in the B- or A-chain region may prove useful for identifying genes coding for insulin or insulin-like peptides in more primitive organisms and invertebrates. Using the technique of primer extension (51), DNA oligonucleotide sequences complementary to the B24-26 or A18-21 mRNA coding sequence could be chemically synthesized and hybridized to total or partially purified mRNA isolated from organisms of interest. These primers could then be extended with reverse transcriptase and hybridized to cloned total cDNA to identify colonies containing insulin-like cDNA sequences.

In further studies on the evolution of the insulin genes, it will be of particular interest to determine the positions and number of intervening sequences (introns) in these genes and to identify and study conserved sequences flanking the coding regions which may be involved in the regulation of their expression. We have recently constructed a hagfish genomic library, and experiments to isolate the insulin chromosomal gene(s) from this species are now in progress.

Acknowledgment-We wish to thank Myrella Smith for her able assistance in preparing this manuscript.

REFERENCES
1. Isaacs, N., James, R., Niall, H., Bryant-Greenwood, G., Dodson, G., Evans, A., and North, A. C. T. (1978) Nature 271, 278-281
2. Rinderknecht, E., and Humbel, R. E. (1978) J. Biol. Chem. 253, 2769-2776
3. Frazier, W. A., Angeletti, R. H., and Bradshaw, R. A. (1972) Science 176, 482-488
4. Dayhoff, M. O. (1978) in Atlas of Protein Sequence and Structure, Vol. 5, Suppl. 3, pp. 145-151, Biomedical Research Foundation, Silver Spring, MD
5. Van Noorden, S., and Falkmer, S. (1980) Invest. Cell Pathol. 3, 21-35
6. Tager, H. S., Markese, J., Kramer, K. J., Speirs, R. D., and Childs, C. N. (1976) Biochem. J. 156, 515-520
7. Marques, M., and Falkmer, S. (1976) Gen. Comp. Endocrinol. 29, 522-530
8. Bell, G., Swain, W., Pictet, R., Cordell, B., Goodman, H., and Rutter, W. (1979) Nature 282, 525-527
9. Sures, I., Goeddel, D. V., Gray, A., and Ullrich, A. (1980) Science 208, 57-59
10. Ullrich, A., Shine, J., Chirgwin, J., Pictet, R., Tischer, E., Rutter, W. J., and Goodman, H. M. (1977) Science 196, 1313-1319
11. Villa-Komaroff, L., Efstratiadis, A., Broome, S., Lomedico, P., Tizard, R., Naber, S. P., Chick, W. L., and Gilbert, W. (1978) Proc. Natl. Acad. Sci. U. S. A. 75, 3727-3731
12. Chan, S. J., Noyes, B. E., Agarwal, K. L., and Steiner, D. F. (1979) Proc. Natl. Acad. Sci. U. S. A. 76, 5036-5040
13. Hobart, P. M., Shen, L., Crawford, R., Pictet, R. L., and Rutter, W. J. (1980) Science 210, 1360-1363
14. Lomedico, P., Rosenthal, N., Efstratiadis, A., Gilbert, W., Kolodner, R., and Tizard, R. (1979) Cell 18, 545-558
15. Cordell, B., Bell, G., Tischer, E., DeNoto, F. M., Ullrich, A., Pictet, R., Rutter, W. J., and Goodman, H. M. (1979) Cell 18, 533-543
16. Bell, G. I., Pictet, R. L., Rutter, W. J., Cordell, B., Tischer, E., and Goodman, H. M. (1980) Nature 284, 26-32
17. Ullrich, A., Dull, T. J., Gray, A., Brosius, J., and Sures, I. (1980) Science 209, 612-615
18. Perler, F., Efstratiadis, A., Lomedico, P., Gilbert, W., Kolodner, R., and Dodgson, J. (1980) Cell 20, 555-566
19. Peterson, J. D., Steiner, D. F., Emdin, S. O., and Falkmer, S. (1975) J. Biol. Chem. 250, 5183-5191
20. Cutfield, J. F., Cutfield, S. M., Dodson, E. J., Dodson, G. G., Emdin, S. O., and Reynolds, C. D. (1979) J. Mol. Biol. 132, 85-100
21. Duguid, J., Steiner, D. F., and Chick, W. L. (1976) Proc. Natl. Acad. Sci. U. S. A. 73, 3539-3543
22. Patzelt, C., Labrecque, A. D., Duguid, J. R., Carroll, R. J., Keim, P., Heinrikson, R. L., and Steiner, D. F. (1978) Proc. Natl. Acad. Sci. U. S. A. 75, 1260-1264
23. Emdin, S. O., and Steiner, D. F. (1980) Gen. Comp. Endocrinol. 42, 251-258
24. Roychoudury, R., Jay, E., and Wu, R. (1976) Nucleic Acids Res. 3, 101-116
25. Lederberg, E. M., and Cohen, S. N. (1974) J. Bacteriol. 126, 1072-1074
26. Grunstein, M., and Hogness, D. (1975) Proc. Natl. Acad. Sci. U. S. A. 72, 3961-3965
27. Denhardt, D. T. (1966) Biochem. Biophys. Res. Commun. 23, 641-652
28. Guerry, P., LeBlanc, D. J., and Falkow, S. (1976) J. Bacteriol. 116, 1064-1066
29. Radloff, R., Bauer, W., and Vinograd, J. (1968) Proc. Natl. Acad. Sci. U. S. A. 59, 838-845
30. Maxam, A. M., and Gilbert, W. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 560-564
31. Falkmer, S., Carraway, R. E., El-Salhy, M., Emdin, S. O., Grimelius, L., Rehfeld, J. F., Reinecke, M., and Schwartz, T. F. W. (1981) UCLA Forum Med. Sci. 23, 21-42
32. Chan, S. J., Ackerman, E. J., Quinn, P. S., Sigler, P. B., and Steiner, D. F. (1981) J. Biol. Chem. 256, 3271-3275
33. Bolivar, F., Rodriguez, R. L., Greene, P. J., Betlach, M. C., Heyneker, M. L., Boyer, H. W., Crosa, J. H., and Falkow, S. (1977) Gene 2, 95-113
34. Maniatis, T., Kee, S. G., Efstratiadis, A., and Kafatos, F. C. (1976) Cell 8, 163-182
35. Steiner, D. F., Quinn, P. S., Chan, S. J., Marsh, J., and Tager, H. S. (1980) Ann. N. Y. Acad. Sci. 343, 1-16
36. Proudfoot, N. J., and Brownlee, G. G. (1976) Nature 263, 211-214
37. Alwine, J. C., Kemp, P. J., and Stark, G. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5350-5354
38. Hagenbüchle, O., Santer, M., Steitz, J., and Mans, R. J. (1978) Cell 13, 551-563
39. De Haen, C., Swanson, E., and Teller, D. C. (1976) J. Mol. Biol. 106, 639-661
40. Chan, S. J., Kwok, S. C. M., and Steiner, D. F. (1981) Diabetes Care 4, 4-10
41. Steiner, D. F., Peterson, J. D., Tager, H., Emdin, S., Ostberg, Y., and Falkmer, S. (1973) Am. Zool. 13, 591-604
42. Emdin, S. O., and Falkmer, S. (1977) Acta Paediatr. Scand. Suppl. 270, 15-23
43. Steiner, D. F., Terris, S., Emdin, S. O., Peterson, J. D., and Falkmer, S. (1975) in Early Diabetes: A Symposium (Camerini-Davalos, R. A., and Cole, H. S., eds) pp. 41-48, Academic Press, New York
44. Chan, S. J., Patzelt, C., Duguid, J. R., Quinn, P., Labrecque, A., Noyes, B., Keim, P., Heinrikson, R. L., and Steiner, D. F. (1979) in From Gene to Protein: Information Transfer in Normal and Abnormal Cells (Russell, T. R., Brew, K., Faber, H., and Schultz, J., eds) Vol. 16, pp. 361-378, Academic Press, New York
45. Steiner, D. F. (1978) Diabetes 27, Suppl. 1, 145-148
46. Wollmer, A., Brandenburg, D., Vogt, H. P., and Schermutzki, W. (1974) Hoppe-Seyler's Z. Physiol. Chem. 355, 1471-1476
47. Steiner, D. F. (1976) in Handbook of Biochemistry and Molecular Biology (Fasman, G. D., ed) 3rd Ed, Vol. 3, pp. 378-381, CRC Press, Cleveland
48. Patzelt, C., Chan, S. J., Duguid, J., Hortin, G., Keim, P., Heinrikson, R. L., and Steiner, D. F. (1978) in Regulatory Proteolytic Enzymes and Their Inhibitors (Magnusson, S., et al., eds) Proceedings Vol. 47, Symposium A6, pp. 69-78, Pergamon Press, New York
49. Blundell, T., Dodson, G., Hodgkin, D., and Mercola, D. (1972) Adv. Protein Chem. 26, 279-402
50. Chou, P. Y., and Fasman, G. D. (1978) Annu. Rev. Biochem. 47, 251-276
51. Agarwal, K. L., Brunstedt, J., and Noyes, B. E. (1980) J. Biol. Chem. 256, 1023-1028
52. Kemmler, W., Steiner, D. F., and Borg, J. (1973) J. Biol. Chem. 248, 4544-4551
53. Tinoco, I., Jr., Borer, P. N., Dengler, B., Levine, M. D., Uhlenbeck, O. C., Crothers, P. M., and Gralla, J. (1973) Nat. New Biol. 246, 40-41