The EMBO Journal vol.3 no.4 pp.869-872, 1984
The complete sequence of the mRNA for the HLA-DR-associated invariant chain reveals a polypeptide with an unusual transmembrane polarity
Michel Strubin, Bernard Mach* and Eric O.Long†
Department of Microbiology, University of Geneva Medical School, 1205 Geneva, Switzerland
†Present address: Laboratory of Immunogenetics, National Institute of Allergy and Infectious Diseases, NIH, Bethesda, MD 20205, USA
*To whom reprint requests should be sent
Communicated by B.Mach
A non-polymorphic polypeptide is associated intracellularly with the α and β chains of murine Ia antigens and of human HLA-DR antigens. The exact role and the structure of this invariant chain have not been determined so far. A cDNA clone encoding the 33 000 dalton human invariant chain has been isolated. The nucleotide sequence of a near full-length cDNA clone, together with the sequence of the 5' portion of the mRNA determined by primer-extension, is reported here. The protein structure deduced from that sequence shows an unusual feature: the presence of a hydrophobic transmembrane region near the NH2 terminus, and of two glycosylation sites near the middle, indicates that the invariant chain has a polarity of membrane insertion which is inverted relative to histocompatibility antigens and most transmembrane proteins.
Key words: Ia antigen/major histocompatibility complex/invariant chain/transmembrane proteins
© IRL Press Limited, Oxford, England.
Introduction
T lymphocytes can only be stimulated by foreign antigens presented to them in association with 'self' antigens of the major histocompatibility complex (MHC) (Klein et al., 1981). MHC antigens are characterized by a high degree of allelic polymorphism. Class II antigens of the MHC restrict antigen recognition by helper T lymphocytes. Expressed primarily at the surface of antigen-presenting cells, B lymphocytes and activated T lymphocytes, class II antigens consist of two noncovalently associated subunits, the α and β chains (Shackelford et al., 1982). In addition, a third subunit is associated intracellularly with the α and β chains. This subunit is nonpolymorphic and has been called the invariant chain (Ii) (Jones et al., 1979; Charron and McDevitt, 1979), MI (Shackelford and Strominger, 1980), p31 (Owen et al., 1981), γ (Kvist et al., 1982) or p33 (Long et al., 1983b). The invariant chain associated with HLA-DR antigens, the prominent class II antigens in man, has an apparent mol. wt. of 33 000 (p33). cDNA cloning has been reported for all three components of HLA-DR antigens: the α chain (Lee et al., 1982; Wake et al., 1982), the polymorphic β chain (Long et al., 1982) and the p33 invariant chain (Long et al., 1983b). The p33 chain spans microsomal membranes and is associated with HLA-DR α and β chains during their transport to the cell surface (Owen et al., 1981; Kvist et al., 1982).
It has therefore been assumed that the invariant chain plays a role in the assembly and/or transport of HLA-DR antigens. However, in the absence of p33, assembly of α and β chains does take place in Xenopus oocytes injected with mRNA (Long et al., 1983b), and HLA-DR expression has been observed at the surface of mouse cells transfected with the genes for α and β chains, with and without the gene for the invariant chain (Rabourdin-Combe and Mach, 1983). No amino acid sequence has been determined for p33. It is known to be rich in methionine (McMillan et al., 1981) and to carry two N-linked oligosaccharides as well as O-linked oligosaccharides (Owen et al., 1981; Charron et al., 1983; Machamer and Cresswell, 1982; Claesson and Peterson, 1983). Expression of the invariant chain is constitutive in B lymphocytes, and a recent survey of different mouse cell types has shown its presence in every Ia-positive cell examined (Koch and Harris, 1984).
Results
We have isolated cDNA clones (Long et al., 1983b) and genomic DNA clones (Rabourdin-Combe and Mach, 1983) corresponding to the p33 chain gene. The p33 chain is encoded by a single gene which maps outside the MHC (Long et al., 1983b). The homologous gene for the Ii chain in mouse was also shown genetically to be unlinked to the H-2 complex (Day and Jones, 1983; Koch et al., 1982). The mRNA for p33 is 1400 nucleotides long and is present in B cells, but could not be detected in T cell lines (Long et al., 1983b). We screened a size-selected cDNA library derived from mRNA of a B cell line (Wake et al., 1982) with an insert from a p33 chain cDNA clone (33-10) and isolated several long cDNA clones. Two of them are shown in Figure 1 together with the original clone 33-10. The complete sequence of clone p33-1 and the complete coding sequences of clones p33-2 and 33-10 were determined. The nucleotide sequences of the three clones were identical. The first ATG in p33-1 initiates an open reading frame corresponding to 216 amino acids (second arrow in Figure 2).
In the absence of any structural information on the NH2-terminal portion of the p33 invariant chain, and because the cDNA we had analyzed might not contain the sequence corresponding to the 5' end of the p33 mRNA, it was essential to determine the structure of the mRNA itself. This was done by primer-extension, using as primer a 43-bp DNA fragment shown in Figure 1. After elongation of the 32P-labeled primer by reverse transcriptase, discrete products could be resolved by gel electrophoresis. The longest product appeared as a sharp band of 110 nucleotides. This fragment was eluted and sequenced by the chemical degradation method, and an unambiguous sequence was obtained. This experiment showed that the p33 mRNA extends 21 nucleotides upstream from the longest cDNA insert sequenced, and it allowed us to determine the nucleotide sequence of the 5' portion of p33 mRNA (Figure 2). The segment which overlaps with the cDNA insert showed, as expected, an identical sequence. Near the 5' end, at position 10 of the mRNA as defined by primer-extension, an additional AUG codon was identified. This first AUG codon (first arrow in Figure 2) is in phase with the AUG just described in the cDNA sequence, and it is followed by an open reading frame corresponding to 232 amino acids. The reason for considering the second AUG codon as the initiator, and for numbering the amino acids of p33 as given in Figure 2, is presented in the Discussion.
The amino acid sequence deduced in Figure 2 has the expected features of p33, with 14 methionines and with two N-linked glycosylation sites at positions 114 and 120. Some important unexpected features were also found. The only stretch of hydrophobic residues compatible with a transmembrane region is near the NH2 terminus, at positions 32 to 56 (see Figure 2). Because glycosylation is known to occur in the lumen (Bretscher, 1973), and both glycosylation sites lie on the COOH-terminal side of this single membrane-spanning segment, the NH2 terminus of p33 must be located on the cytoplasmic side of the membrane. Basic charged residues, which are characteristic of a cytoplasmic 'anchor' region (von Heijne, 1981), are present at positions 27 and 30. Furthermore, no transient (cleaved) signal sequence exists in p33.
[Figure 1 diagram: map of p33 mRNA (5', AUG, termination site, 3'UT, AAA 3') aligned with cDNA clones p33-1, p33-2 and 33-10 and their restriction sites; scale 0-1200 bp.]
Fig. 1. Representation of cDNA clones of p33 invariant chain and sequencing strategy. 33-10 was the original clone identified by positive hybrid selection (Long et al., 1983b). Clones p33-1 and p33-2 were isolated from a size-selected cDNA library by hybridization with 33-10. The restriction fragment derived from clone p33-1 and used for primer extension (PE) is represented by a box. Sites for the restriction endonucleases used for sequence analysis (RsaI, PstI, HindIII, BglI and Sau3A) are indicated by symbols. End-labeled fragments sequenced by the procedure of Maxam and Gilbert (1980) are indicated by an x. Sequences determined by the procedure of Sanger et al. (1980) are indicated by a dot. The structure of the mRNA is diagrammed on the upper line. term: termination site; 3' UT: 3'-untranslated region. The scale is in base pairs and starts with the first nucleotide of the mRNA.
[Figure 2 sequence panel: nucleotide sequence numbered from the mRNA 5' end, aligned with the deduced amino acid sequence in single-letter code.]
Fig. 2. Nucleotide sequence corresponding to the complete p33 mRNA. The sequence is shown in the 5' to 3' direction of the mRNA strand. The sequence deduced from the primer-extended fragment is shown below the sequence of the cDNA. The nucleotides corresponding to the restriction fragment used as primer are indicated by asterisks. Nucleotides are numbered starting with the first nucleotide of the mRNA as defined by primer extension. The deduced amino acid sequence is given in the single-letter code. The numbering of amino acids starts with the second ATG codon (see text). The amino acids from the first to the second ATG are represented by italic characters. The transmembrane region is boxed. The putative polyadenylation signal is underlined. The first two nucleotides (unidentified) are designated as NN.
[Figure 3 diagram: the invariant chain (Inv) and the HLA-DR α and β chains drawn across the membrane, with NH2 termini and the cytoplasmic side labeled.]
Fig. 3. Linear model of the p33 invariant chain showing the inverted polarity of membrane insertion. For comparison, the α and β chains of the HLA-DR antigen are represented in the same way. The cysteine residues are indicated by dots and the methionine residues by dashes. The N-linked oligosaccharides are represented by 'forks'.
Discussion
The finding of two AUG triplets in the same reading frame at the 5' end of the mRNA is of special interest. The general rule with eukaryotic mRNAs is that the functional initiation codon is the first AUG of the sequence (Kozak, 1983). However, specific exceptions to the first-AUG rule have been reported (Kozak, 1983). It is quite remarkable in this respect that the nucleotide sequence context around almost all functional initiator AUGs fits an 'initiator consensus sequence', ANNAUGA (Kozak, 1983). Such a consensus sequence is found around the second in-phase AUG of p33 mRNA and not around the first. On that basis, it is quite likely that the second AUG (second arrow in Figure 2) is the actual initiator codon, and we have therefore numbered the amino acids of p33 accordingly (Figure 2).
An interesting possibility is that both AUG codons could be used, generating two polypeptides differing only in a 16-amino-acid NH2-terminal segment. Specific initiation at each of the two AUG codons might be related to the two forms of the invariant chain observed in intact cells, p33 and p35 (Long et al., 1983b; Charron, 1983). In fact, when hybrid-selected p33 mRNA is translated in a cell-free system and in Xenopus oocytes, the product consists of two polypeptides differing in size by 2000 daltons (Long et al., 1983b). The possibility of alternative initiation is being explored further, but in such studies it is important to consider that the preferential use of one of two possible initiation codons under cell-free conditions does not necessarily reflect the in vivo situation (Giorgi et al., 1983).
The important structural features of the invariant chain presented in Figure 2 are a transmembrane region near the NH2 terminus and glycosylation sites closer to the COOH end of the chain. This allows us to conclude that the polarity of membrane insertion of the p33 chain is inverted relative to that of histocompatibility antigens (Figure 3). Presumably the hydrophobic transmembrane portion serves as a membrane attachment signal. The rest of the molecule can then be translocated as a loop across the membrane (Sabatini et al., 1982). Because there is no apparent hydrophobic 'stop-transfer' sequence (Sabatini et al., 1982; Blobel, 1978) in the rest of the molecule, the carboxy terminus probably reaches the lumen (Figure 3), where it undergoes glycosylation. It is therefore entirely possible that the glycosylated COOH portion of p33 ends up on the external side of the cellular membrane. Most cell surface proteins, such as membrane immunoglobulins, histocompatibility antigens and most viral antigens, have an NH2-terminal signal sequence and are translocated across microsomal membranes during translation. As a result, the cytoplasmic portion generally corresponds to the carboxy terminus of the chains.
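The topology argument rests on sequence features reported in the Results: 14 methionines, N-X-S/T glycosylation sequons at positions 114 and 120, basic residues at 27 and 30, and a single hydrophobic stretch at 32-56. A minimal Python sketch can recheck them against the deduced sequence. It is an illustration, not the authors' procedure, and it carries assumptions that are not in the original: the sequence string was transcribed by hand from the single-letter code of Figure 2; hydrophobicity is scored with the Kyte-Doolittle scale over a 19-residue window (the paper does not say how the transmembrane region was identified); and segment masses are sums of standard average residue masses, ignoring glycosylation.

```python
"""Sketch: recompute sequence features cited for the p33 invariant chain.

Assumptions (not from the paper): sequence transcribed by hand from Figure 2;
Kyte-Doolittle hydropathy with a 19-residue window; masses are sums of average
residue masses and ignore glycosylation and other modifications.
"""

# 16 residues encoded between the first and second in-phase AUG (italic in Fig. 2)
NH2_EXTENSION = "MHRRRSRSCREDQKPV"

# Residues 1-216, numbered from the second AUG as in Figure 2
P33 = (
    "MDDQRDLISNNEQLPMLGRRPGAPESKCSR"  # 1-30
    "GALYTGFSILVTLLLAGQATTAYFLYQQQG"  # 31-60
    "RLDKLTVTSQNLQLENLRMKLPKPPKPVSK"  # 61-90
    "MRMATPLLMQALPMGALPQGPMQNATKYGN"  # 91-120
    "MTEDHVMHLLQNADPLKVYPPLKGSFPENL"  # 121-150
    "RHLKNTMETIDWKVFESWMHHWLLFEMSRH"  # 151-180
    "SLEQKPTDAPPKESLELEDPSSGLGVTKQD"  # 181-210
    "LGPVPM"                          # 211-216
)

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Average residue masses (free amino acid minus water), in daltons
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}


def sequons(seq):
    """1-based positions of N-X-S/T motifs with X != P (N-glycosylation acceptors)."""
    return [i + 1 for i in range(len(seq) - 2)
            if seq[i] == "N" and seq[i + 1] != "P" and seq[i + 2] in "ST"]


def most_hydrophobic_window(seq, width=19):
    """(start, end, mean hydropathy) of the best-scoring window, 1-based inclusive."""
    scores = [sum(KYTE_DOOLITTLE[a] for a in seq[s:s + width])
              for s in range(len(seq) - width + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best + 1, best + width, scores[best] / width


def segment_mass(seq):
    """Approximate mass (Da) of a segment inside a chain: sum of residue masses."""
    return sum(AVG_RESIDUE_MASS[a] for a in seq)


if __name__ == "__main__":
    print("residues:", len(P33))
    print("methionines:", P33.count("M"))
    print("N-X-S/T sequons at:", sequons(P33))
    print("residues 27 and 30:", P33[26], P33[29])
    print("most hydrophobic 19-residue window:", most_hydrophobic_window(P33))
    print("residues 1-30: %.0f Da" % segment_mass(P33[:30]))
    print("16-residue extension: %.0f Da" % segment_mass(NH2_EXTENSION))
```

Run as a script, it should report 216 residues, 14 methionines and sequons at 114 and 120, with the best window falling inside residues 32-56. The summed masses (roughly 3.4 kDa for residues 1-30 and roughly 2.0 kDa for the 16-residue extension) are in the same range as the 3000 and 2000 dalton differences cited in the Discussion; apparent mobility on gels is only an approximate measure of mass.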
The p33 invariant chain therefore represents an exception, and joins the few transmembrane proteins which have a cytoplasmic NH2-terminal portion, such as the neuraminidase of influenza virus (Fields et al., 1981), a chicken hepatic lectin (Drickamer, 1981), intestinal isomaltase (Frank et al., 1978) and protein band 3 of the erythrocyte plasma membrane (Steck, 1978). With the second AUG as initiation codon, this inverted transmembrane polarity defines an NH2-terminal cytoplasmic tail of 30 amino acids, which is in excellent agreement with the loss of 3000 daltons from the invariant chain observed after protease digestion of microsomal vesicles (Kvist et al., 1982).
Two computer programs have been used to search the Dayhoff protein databank for either local or global homologies (Wilbur and Lipman, 1983) with the p33 invariant chain sequence. No significant homology was found. In particular, there is no homology with the chains of histocompatibility antigens.
The exact role of the p33 invariant chain is not clearly established. The physical association with HLA-DR chains in the intracellular compartment has suggested a function in the assembly and/or transport of class II antigens to the cell membrane (Owen et al., 1981; Kvist et al., 1982). We have recently observed examples in which induction of invariant chain gene expression by γ-interferon is coupled with induction of class II antigen gene expression (de Preval and Mach, in preparation). In two other cases, however, namely HLA-DR-negative lymphocytes of patients with congenital immunodeficiency (Bernstein et al., in preparation) and HLA-DR-negative B cell variants obtained by mutagenesis (Long et al., 1984), we have observed expression of the invariant chain gene in the absence of DR α and β chain gene expression. The availability of cloned genes for the invariant chain will likely facilitate studies on the exact role of this polypeptide.
Materials and methods
Analysis of cDNA clones
Screening of a cDNA library and preparation of plasmid DNA were as described elsewhere (Long et al., 1983a). DNA sequencing was performed by the procedure of Maxam and Gilbert (1980) and of Sanger et al.
(1980) after subcloning of restriction fragments in the mp8 and mp9 derivatives of phage M13.
Analysis of mRNA by primer-extension
The 43-bp primer was labeled with [γ-32P]ATP to a specific activity of 3 × 10⁵ c.p.m./pmol, and 140 × 10³ c.p.m. were hybridized to 12 µg of poly(A)+ RNA. Hybridization and extension with avian myeloblastosis virus reverse transcriptase were performed as described (Ghosh et al., 1980), except that hybridization was done in the absence of formamide. The 110-nucleotide band corresponding to the extended primer was eluted from an 8% acrylamide-8 M urea gel and submitted to DNA sequencing by the Maxam and Gilbert procedure.
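The primer-extension steps above define the 5' end of the mRNA and supply the sequence behind the Discussion's argument about two in-phase AUG codons. The Python sketch below is a minimal illustration, not the authors' procedure: it converts the stated counts to picomoles of primer and scans the 5' sequence for in-frame ATGs. Assumptions not in the original: the 120-nucleotide string was transcribed by hand from Figure 2 (positions 1-2 stay N, as printed), and the context test uses the commonly cited Kozak criteria of a purine at -3 and G at +4, which may not be exactly the consensus the authors applied.

```python
"""Sketch: locate in-phase AUG codons in the 5' region of p33 mRNA.

Assumptions (not from the paper): sequence transcribed by hand from Figure 2;
'good context' means purine at -3 and G at +4 (commonly cited Kozak criteria).
"""

SPECIFIC_ACTIVITY = 3e5   # c.p.m./pmol of the labeled 43-bp primer
INPUT_CPM = 140e3         # c.p.m. hybridized to 12 ug poly(A)+ RNA
PRODUCT_LENGTH = 110      # nt; covers mRNA positions 1-110 by definition of the numbering

LEADER = "NNTTCCCAG"  # positions 1-9 (NN = unidentified)
EXTENSION = "ATG CAC AGG AGG AGA AGC AGG AGC TGT CGG GAA GAT CAG AAG CCA GTC"  # 10-57
CODING = ("ATG GAT GAC CAG CGC GAC CTT ATC TCC AAC AAT GAG CAA CTG CCC ATG "
          "CTG GGC CGG CGC CCT")  # 58-120, residues 1-21 of p33
MRNA_5PRIME = LEADER + EXTENSION.replace(" ", "") + CODING.replace(" ", "")


def in_frame_atgs(seq, frame_start):
    """1-based positions of ATG codons in the frame that begins at frame_start."""
    return [p for p in range(frame_start, len(seq) - 1, 3)
            if seq[p - 1:p + 2] == "ATG"]


def kozak_context(seq, p):
    """Bases at -3 and +4 relative to the A of an ATG at 1-based position p."""
    if p < 4 or p + 3 > len(seq):
        raise ValueError("context extends beyond the sequence")
    return seq[p - 4], seq[p + 2]


def fits_assumed_kozak(seq, p):
    """True if the ATG at p has a purine at -3 and G at +4 (assumed criteria)."""
    minus3, plus4 = kozak_context(seq, p)
    return minus3 in "AG" and plus4 == "G"


if __name__ == "__main__":
    print("primer hybridized: %.2f pmol" % (INPUT_CPM / SPECIFIC_ACTIVITY))
    first = MRNA_5PRIME.find("ATG") + 1  # 5'-most ATG
    atgs = in_frame_atgs(MRNA_5PRIME, first)
    print("in-frame ATGs:", atgs)
    for p in atgs[:2]:
        print(p, kozak_context(MRNA_5PRIME, p), fits_assumed_kozak(MRNA_5PRIME, p))
    extra = (atgs[1] - first) // 3
    print("extra codons from the first ATG:", extra, "; ORF", 216 + extra, "vs 216")
    print("both ATGs within the 110-nt product:", atgs[1] + 2 <= PRODUCT_LENGTH)
```

As written it should find in-frame ATGs at 10, 58 and 103 (the last encodes Met16 of p33), a C/C context around the first AUG and a G/G context around the second, and 16 extra codons, which matches the 232- and 216-codon reading frames given in the text.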
Acknowledgements
This research was supported by the Swiss National Fund for Scientific Research.
References
Blobel,G. (1978) FEBS Proc. Meet., 43, 99-108.
Bretscher,M.S. (1973) Science (Wash.), 181, 622-629.
Charron,D.J. and McDevitt,H.O. (1979) Proc. Natl. Acad. Sci. USA, 76, 6567-6571.
Charron,D.J., Aellen-Schulz,M.F., Gene,J.S., Erlich,H.A. and McDevitt,H.O. (1983) Mol. Immunol., 20, 21-32.
Charron,D.J. (1983) in Pierce,C.W., Cullen,S.E., Kapp,J.A., Schwartz,B.D. and Shreffler,D.C. (eds.), Ir Genes, Past, Present and Future, Humana, Clifton, NJ, pp. 185-190.
Claesson,L. and Peterson,P.A. (1983) Biochemistry (Wash.), 22, 3206-3213.
Day,C.E. and Jones,P.P. (1983) Nature, 302, 157-159.
Drickamer,K. (1981) J. Biol. Chem., 256, 5827-5839.
Fields,S., Winter,G. and Brownlee,G.G. (1981) Nature, 290, 213-217.
Frank,G., Brunner,J., Hauser,H., Wacker,H., Semenza,G. and Zuber,H. (1978) FEBS Lett., 96, 183-188.
Ghosh,P.K., Reddy,V.B., Piatak,M., Lebowitz,P. and Weissman,S.M. (1980) Methods Enzymol., 65, 580-595.
Giorgi,C., Blumberg,B.M. and Kolakofsky,D. (1983) Cell, 35, 829-836.
Jones,P.P., Murphy,D.B., Hewgill,D. and McDevitt,H.O. (1979) Mol. Immunol., 16, 51-60.
Klein,J., Juretic,A., Baxevanis,C.N. and Nagy,Z.A. (1981) Nature, 291, 455-460.
Koch,N. and Harris,A.W. (1984) J. Immunol., 132, 12-15.
Koch,N., Hammerling,G.J., Szymura,J. and Wahl,M.R. (1982) Immunogenetics, 16, 603-606.
Kozak,M. (1983) Microbiol. Rev., 47, 1-45.
Kvist,S., Wiman,K., Claesson,L., Peterson,P.A. and Dobberstein,B. (1982) Cell, 29, 61-69.
Lee,J.S., Trowsdale,J. and Bodmer,W.F. (1982) Proc. Natl. Acad. Sci. USA, 79, 545-549.
Long,E.O., Wake,C.T., Strubin,M., Gross,N., Accolla,R.S., Carrel,S. and Mach,B. (1982) Proc. Natl. Acad. Sci. USA, 79, 7465-7469.
Long,E.O., Wake,C.T., Gorski,J. and Mach,B. (1983a) EMBO J., 2, 389-394.
Long,E.O., Strubin,M., Wake,C.T., Gross,N., Carrel,S., Goodfellow,P., Accolla,R.S. and Mach,B. (1983b) Proc. Natl. Acad. Sci. USA, 80, 5714-5718.
Long,E.O., Mach,B. and Accolla,R.S. (1984) Immunogenetics, in press.
Machamer,C.E. and Cresswell,P.J. (1982) J. Immunol., 129, 2564-2569.
Maxam,A. and Gilbert,W. (1980) Methods Enzymol., 65, 499-560.
McMillan,M., Frelinger,J.A., Jones,P.P., Murphy,D.B., McDevitt,H.O. and Hood,L.J. (1981) J. Exp. Med., 153, 936-950.
Owen,F.J., Kissonerghis,A.M., Lodish,H.F. and Crumpton,M.J. (1981) J. Biol. Chem., 256, 8987-8993.
Rabourdin-Combe,C. and Mach,B. (1983) Nature, 303, 670-674.
Sabatini,D.D., Kreibich,G., Morimoto,T. and Adesnik,M. (1982) J. Cell Biol., 92, 1-22.
Sanger,F., Coulson,A.R., Barrell,B.G., Smith,A.J.H. and Roe,B.A. (1980) J. Mol. Biol., 143, 161-178.
Shackelford,D.A. and Strominger,J.L. (1980) J. Exp. Med., 151, 144-165.
Shackelford,D.A., Kaufmann,J.F., Korman,A.J. and Strominger,J.L. (1982) Immunol. Rev., 66, 133-187.
Steck,T.L. (1978) J. Supramol. Struct., 8, 311-324.
von Heijne,G. (1981) Eur. J. Biochem., 120, 275-278.
Wake,C.T., Long,E.O., Strubin,M., Gross,N., Accolla,R.S., Carrel,S. and Mach,B. (1982) Proc. Natl. Acad. Sci. USA, 79, 6979-6983.
Wilbur,W.J. and Lipman,D.J. (1983) Proc. Natl. Acad. Sci. USA, 80, 726-730.
Received on 13 January 1984