Vol. 171, No. 12

JOURNAL OF BACTERIOLOGY, Dec. 1989, p. 6397-6408

0021-9193/89/126397-12$02.00/0 Copyright © 1989, American Society for Microbiology

Identification of a Divergent M Protein Gene and an M Protein-Related Gene Family in Streptococcus pyogenes Serotype 49

ELIZABETH J. HAANES† AND P. PATRICK CLEARY*

Department of Microbiology, University of Minnesota, Minneapolis, Minnesota 55455 Received 10 April 1989/Accepted 23 August 1989

The antigenically variant M protein of Streptococcus pyogenes enhances virulence by promoting resistance to phagocytosis. The serum opacity factor (OF), produced by a subset of M serotypes, is also antigenically variant, and its antigenic variability exactly parallels that of M protein. OF-positive and OF-negative streptococci are also phenotypically distinguishable by a number of other criteria. In order to study the differences between OF-positive and OF-negative streptococci, we cloned and sequenced the type 49 M protein gene (emm49), the first to be cloned from an OF-positive strain. This gene showed evolutionary divergence from the OF-negative M protein genes studied previously. Furthermore, emm49 was part of a gene family, in contrast to the single-copy nature of previously characterized M protein genes.

The M protein, a major virulence factor of Streptococcus pyogenes (group A streptococci), enhances the pathogenicity of the organism by blocking phagocytosis in the nonimmunized host (26, 27). Antibodies to M protein develop in infected individuals and confer immunity (34). Streptococcal M protein, however, is antigenically variant, and opsonic antibodies to M protein are largely type specific. Thus, with 70 or more M protein serotypes (35), group A streptococci can successfully avoid immune surveillance. Molecular biological techniques have increased our knowledge of the structure of M protein and its antigenic variability. Five M protein genes have been cloned, and all have been at least partially sequenced (16, 24, 40, 42, 46). The M proteins encoded by these sequenced genes have a common framework which includes a conserved signal peptide, a hypervariable amino terminus, and a highly conserved C-terminus. Each gene also appears to contain some internal sequence repetitions, although the extent and degree of similarity of these repeats vary among the genes. Furthermore, regions preceding and following these M protein genes are conserved (16, 42). This suggests that the genes are situated in a common expression site on the streptococcal chromosome. Our understanding of how the M protein hypervariable regions evolved is still incomplete. Duplication and deletion of intragenic repeats present in the variable regions of M molecules is a possible mechanism for the introduction of variant amino acids into the M protein sequences (25). Recent evidence shows that intragenic recombination can indeed introduce variant opsonic determinants into the repetitive regions of the M6 protein (28). Not all of the opsonic antigenic determinants of M proteins, however, are located in the repetitive regions of the molecules (25, 32). Thus, additional mechanisms must be involved in the generation of M protein antigenic diversity. 
A model explaining the generation of M protein antigenic diversity must account for the fact that other streptococcal proteins exhibit antigenic diversity in a nonrandom manner relative to the M type. Among these are the T antigen, a surface protein used in streptococcal serotyping schemes, and the serum opacity factor (OF). A given T antigen is normally associated with only a subset of M antigens (38). Approximately half of the known M serotypes express OF. This substance (probably a lipoproteinase) causes a measurable opacity in serum (22). OF is antigenically variant, and type-specific antiserum against OF will neutralize the opacity reaction (52). The antigenic variability of OF parallels that of M protein (56). For example, a strain with a type 49 M protein will always express a type 49 OF. The two antigens, however, are separable both by purification (18) and by genetic means (8). Clearly, the processes responsible for M protein antigenic diversity must in some way be coordinated with the mechanisms of diversity of these other variant surface antigens.

The parallel antigenicity of M protein and OF is obviously an important consideration in fully understanding antigenic diversity in group A streptococci. In addition, OF-positive streptococci are distinct from OF-negative streptococci in a number of properties, making their study even more important. For example, the OF-positive strains are predominantly isolated from skin infections, as opposed to throat infections (37). Furthermore, Bessen et al. recently demonstrated that antibodies directed against defined epitopes in the conserved domain of M6 bound to surface epitopes of OF-negative but not OF-positive strains (4). Detailed studies of M proteins from OF-positive strains have been hampered, however, by the fact that these M proteins tend to be less immunogenic than their OF-negative counterparts (53). Indeed, all the presently cloned M proteins are from OF-negative serotypes. Thus, for several reasons, we are interested in learning more about OF-positive M protein genes and their relationship to the OF-negative M protein genes studied previously.
In this study, we determined the nucleotide sequence of the OF-positive type 49 M protein gene (emm49). We also identified and sequenced a related gene situated directly downstream of emm49. Here we present evidence that these two genes represent a gene family that is not present in previously studied M serotypes. We also show that emm49 and surrounding chromosomal regions are distinct enough from previously sequenced M protein genes to represent a separate lineage.

* Corresponding author.
† Present address: Infectious Diseases Section, Department of Medicine, University of Minnesota, Minneapolis, MN 55455.


MATERIALS AND METHODS

Bacterial strains. S. pyogenes CS101 was the source of DNA used in cloning the M49 gene. Strain CS103 (27), an M-negative, OF-negative variant of CS101, was used as an absorbent and as a negative control in serological experiments. Growth medium and storage for streptococci were as described previously (16). Escherichia coli C600 and an hflA variant of this strain (kindly provided by Stewart Scherer) were used for propagation of bacteriophage lambda. Extracts for in vitro packaging of bacteriophage lambda were prepared from E. coli strains BHB2688 and BHB2690 by standard methods (23). E. coli DH5α (19) was used as a host for plasmid vectors, and strain JM109 (57) was used as a host for M13 recombinants.

DNA preparation. Streptococcal chromosomal DNA for cloning was prepared as previously described (20). Plasmid DNA was purified by the alkaline lysis method (36). Crude bacteriophage DNA for preliminary screening was isolated from plate stocks by a rapid method (6). Bacteriophage lambda was purified from plate stocks on cesium chloride block gradients, and DNA was extracted from the purified bacteriophage with formamide (51).

Molecular cloning. Chromosomal DNA from S. pyogenes CS101 (M49) was digested completely with EcoRI under conditions allowing star activity (14). These fragments were then inserted into the bacteriophage lambda vector NM1149 (43). Recombinants were selected on an hflA mutant of E. coli C600. To ensure that the majority of the recombinant plaques contained a single insert, the ligation ratios of vector to insert were adjusted so that less than 10% of the resulting plaques were recombinants. Plaques were screened by hybridization (3) with the nick-translated plasmid pPC134, which contains the 3' conserved portion of the M12 gene (46). Hybridizations were done in a heparin-based system (49) under conditions calculated to allow 30% base pair mismatch (39).
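The reasoning behind keeping recombinants below 10% of plaques can be illustrated with a small calculation. The sketch below assumes that the number of inserts per phage follows a Poisson distribution; that model, and the function name `multi_insert_fraction`, are assumptions made here for illustration and are not stated in the paper.

```python
import math


def multi_insert_fraction(recombinant_fraction: float) -> float:
    """Fraction of recombinant plaques expected to carry two or more inserts.

    Assumes inserts per phage are Poisson distributed (an assumption of this
    sketch, not a statement from the paper).
    """
    if not 0.0 < recombinant_fraction < 1.0:
        raise ValueError("recombinant_fraction must be strictly between 0 and 1")
    # P(0 inserts) = exp(-lam), so lam = -ln(1 - recombinant_fraction).
    lam = -math.log(1.0 - recombinant_fraction)
    p_exactly_one = lam * math.exp(-lam)
    p_two_or_more = recombinant_fraction - p_exactly_one
    return p_two_or_more / recombinant_fraction


if __name__ == "__main__":
    for fraction in (0.10, 0.30, 0.50):
        share = multi_insert_fraction(fraction)
        print(f"{fraction:.0%} recombinant plaques -> {share:.1%} with multiple inserts")
```

Under this model, a library in which 10% of plaques are recombinant would be expected to have about 5% of its recombinants carrying more than one insert, and that share rises quickly as the recombinant fraction increases.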
Recombinant phage DNAs that hybridized with the M12 probe were mapped with restriction enzymes purchased from Bethesda Research Laboratories (BRL; Gaithersburg, Md.) and International Biotechnologies, Inc. (IBI; New Haven, Conn.). Various DNA fragments were subcloned into plasmids pUC9 (55) and pUC18 (57) for further mapping and expression analyses, and into the M13 vectors mp18 and mp19 (57) for nucleotide sequencing.

Nucleotide sequencing. DNA was sequenced by the dideoxy chain termination method (47). Overlapping single-stranded templates for sequencing were derived by the T4 polymerase deletion method (9) with enzymes and reagents purchased from IBI. Sequencing reactions were done with both the Klenow fragment of DNA polymerase I (IBI) and the modified T7 polymerase Sequenase (U.S. Biochemical Corp., Cleveland, Ohio). Some internal regions of the templates were sequenced by using synthetic oligonucleotide primers based on known sequences. These primers were synthesized on a Biosearch 8600 DNA synthesizer and were purified by high-pressure liquid chromatography by the University of Minnesota Microchemical Facility.

Computer sequence analyses. The sequencing project was managed by using the GEL program provided in the IntelliGenetics software package. Similarity searches and alignments of nucleic acid and amino acid sequences were done by using the SS2 algorithm (1) contained in the Molecular Biology Information Resource software package. Protein secondary structure analyses were done with the algorithm of Garnier and Robson (15), also contained in the Molecular Biology Information Resource package.

Sequence data in computer-readable form. The nucleotide sequence data presented in this article can be obtained in a variety of computer-readable forms directly from the authors. An electronic mail query to [email protected]. umn.edu is most expeditious. The query can also be directed to [email protected], with the request written in a fashion similar to a reprint request. In both cases, the data will be returned via electronic mail to the requestor. Direct retrieval can be done via anonymous file transfer protocol (ftp) to microbe.med.umn.edu, internet address 128.101.81.5. The login for this option is "anonymous," and the password is the queryer's home machine user number. The sequence is in subdirectory /pub/sequences, filename haanes.seq. In this instance (ftp), electronic mail to [email protected] noting the retrieval and real-life name and affiliation of the retriever would be appreciated. Finally, an ASCII text file will be copied onto an IBM-compatible low-density disk on receipt of such a disk via surface mail to the authors. The information distributed via all these routes is identical and is a copy of the form and data submitted directly to GenBank (accession number M23689) via electronic mail by the authors.

Protein and serological techniques. Bacteriophage proteins were concentrated by lyophilization from phage supernatants as described elsewhere (45). Log-phase E. coli cells containing plasmids were concentrated 20-fold in ice-cold STE buffer (10 mM Tris hydrochloride [pH 8.0], 1 mM EDTA, 100 mM NaCl) supplemented with Triton X-100 (0.1%) and phenylmethylsulfonyl fluoride (0.01%). The cells were then subjected to five 15-s bursts of sonication to release intracellular and periplasmic proteins. Cell debris was then removed by centrifugation at 12,000 × g. Streptococcal surface proteins were extracted with mutanolysin (29).

Rabbit antiserum raised against whole streptococcal cells was the source of anti-M protein antibodies (41). M protein-specific serum was obtained by absorbing the type 49 rabbit antiserum exhaustively with CS103 cells. Human serum was from a healthy volunteer known to have neutralizing antibodies against the type 49 OF. The crude protein extracts were separated by 12% polyacrylamide gel electrophoresis with sodium dodecyl sulfate (SDS-PAGE) (33), electroblotted to nitrocellulose (54), and screened for reactivity with rabbit or human antibodies by standard methods (5).

RESULTS

Cloning of the type 49 M protein gene. Preliminary Southern hybridization studies showed that pPC134, an internal emm12 probe containing the distal two-thirds of the gene (46), hybridized to a single 6.4-kilobase (kb) EcoRI* restriction fragment from the M49 strain CS101 under conditions allowing 30% base pair mismatch (data not shown). From these results, we made a library of EcoRI* fragments in the bacteriophage lambda vector NM1149. We isolated 10 recombinant plaques that hybridized to the pPC134 probe. In addition, each of the clones that hybridized with pPC134 also hybridized to a probe derived from the cloned streptococcal C5a peptidase gene (scpA) (7; C. Chen and E. Haanes, unpublished data). We constructed restriction maps of four of the recombinant clones, and three of these inserts were identical. In addition, DNA hybridization analysis of CS101 chromosomal DNA restriction digests with one of the cloned inserts as a probe showed that these three clones contained the same-sized HaeIII, HindIII, PstI, and BglII restriction fragments as the chromosomal DNA (see sites shown in Fig. 1).

We tested protein extracts from three of the hybridizing clones for reactivity with absorbed M49 rabbit antiserum by double-diffusion analysis. These recombinants, 15A, 6A, and 11A, all gave a line of identity with an M49 streptococcal mutanolysin extract (data not shown). Clone 5A, a randomly selected nonhybridizing recombinant clone, was nonreactive, as was a mutanolysin extract from M- variant CS103. None of the clones reacted with either preimmune rabbit serum or M1 rabbit antiserum. We ligated the entire 6.4-kb insert of clone 15A into the EcoRI site of the plasmid vector pUC18, creating plasmid pIL2 (Fig. 1). Various regions of pIL2 were then subcloned or deleted, as shown in Fig. 1.

FIG. 1. Restriction map and deletion analysis of cloned DNA containing the M49 gene. Clone pIL2 is the entire insert of the bacteriophage lambda NM1149 recombinant 15A inserted into plasmid pUC18. The other pIL clones are subclones of pIL2 in various pUC plasmids as indicated. pLac refers to the lac promoter in the pUC plasmids and is shown for orientation purposes. The genes that are expressed in each of the various clones are shown on the right, and the schematic location of these genes is shown at the bottom. Expression from ennX is shown in parentheses, since this gene product has not yet been fully characterized. [Map not reproduced. Clones shown: pIL2 (pUC18), pIL9, pIL12, pIL11 (pUC9), and pIL10 (pUC9); genes shown: emm49, ennX, and scpA; scale bar, 1 kb.]

Expression products from pIL2. Polyclonal rabbit serum against whole type 49 streptococcal cells reacted in Western immunoblotting with at least two distinct gene products in a sonicated protein extract of cells carrying pIL2. These gene products were separated in various subclones, as diagrammed in Fig. 1. We tentatively identified the multiple bands migrating at approximately 38 to 40 kilodaltons (kDa) in the 15A, pIL2, pIL9, and pIL12 extracts as the M49 protein (Fig. 2, lanes 2 to 5). Each of these extracts gave a line of identity with an M49 streptococcal extract in double-diffusion assays (data not shown). Although the reason for the multiple banding pattern was not clear, the protein expressed from the cloned M6 gene also gave a multiple banding pattern in Western immunoblotting (12). The apparent size of the M49 protein in the CS101 streptococcal mutanolysin extract (lane 8) was larger (42 to 45 kDa) than in the E. coli extracts, but slower migration of M protein extracted from streptococci has been documented (13).

A large protein (approx. 116 kDa) in the pIL2 extract (lane 3) and also in pIL10 (not shown) was faintly reactive with the M49 rabbit antiserum. This protein reacted strongly with monospecific antiserum raised against purified SCP protein (C. Chen and E. Haanes, unpublished data), thus identifying it as the product of scpA. The approximate location of scpA was assigned in Fig. 1 based on the size of the protein and the localization within pIL10.

The third possible gene product was separated from the other gene products in the pIL11 extract. This extract did not react with the M49 rabbit antiserum, but reacted strongly with human serum known to neutralize the type 49 serum opacity reaction (data not shown). This putative gene product appeared as multiple protein bands of lower molecular weight than the M49 gene product. It was also present in the 15A, pIL2, and pIL9 extracts, but was not present in the pIL10, pIL12, or pUC18 extracts. We provisionally call the gene encoding this product ennX, owing to its proximity to the emm gene. We have not yet reliably identified this gene product in streptococci, however, because comparably sized protein bands were not detected in the streptococcal mutanolysin extracts.

FIG. 2. Immunoblot showing gene products from the various clones and subclones. Lanes 1 to 9 contain transferred protein extracts resolved on 12% SDS-PAGE from the following: lane 1, 5A (negative control) lambda lysate; lane 2, 15A lambda lysate; lanes 3 to 7, sonicated extracts of E. coli DH5α containing pIL2, pIL9, pIL12, pIL11, and pUC18, respectively; lanes 8 and 9, mutanolysin extracts of CS101 (M49+) and CS103 (M49-). The blot was reacted with polyclonal M49 rabbit antiserum. Kd, Kilodaltons. [Blot not reproduced. Size markers: 205, 116, 97.4, 66, 45, and 29 Kd; labeled bands: SCP and M49 (38 to 40 Kd).]

Nucleotide sequencing of the pIL9 insert. We determined the nucleotide sequence of the EcoRI*-HindIII fragment contained in pIL9 (Fig. 1). Using overlapping templates, we sequenced all portions of the insert at least four times with both the Klenow fragment of DNA polymerase I and Sequenase. Although not all regions were sequenced in both directions, no base was considered certain until it was read unambiguously in at least three different sequencing reactions.
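The acceptance rule described for the sequencing reads (a base counts as certain only after at least three unambiguous reads) can be sketched as a simple consensus call. The column-wise input format, the function name `call_consensus`, and the extra requirement that no conflicting unambiguous base be present are assumptions of this sketch, not details given in the paper.

```python
from collections import Counter

UNAMBIGUOUS = set("ACGT")


def call_consensus(reads_per_position, min_reads=3):
    """Call one base per position from the reads covering that position.

    A base is accepted only if at least `min_reads` unambiguous reads agree
    and no other unambiguous base was observed (the second condition is an
    assumption of this sketch). Unresolved positions are reported as 'N'.
    """
    consensus = []
    for calls in reads_per_position:
        # Ignore ambiguous calls such as 'N' when counting support.
        counts = Counter(c.upper() for c in calls if c.upper() in UNAMBIGUOUS)
        if len(counts) == 1:
            base, support = next(iter(counts.items()))
            consensus.append(base if support >= min_reads else "N")
        else:
            consensus.append("N")
    return "".join(consensus)


if __name__ == "__main__":
    # Hypothetical read columns: each string lists the calls at one position.
    columns = ["AAAA", "CCN", "GGGT", "TTT"]
    print(call_consensus(columns))
```

In this toy input, the second position has only two unambiguous reads and the third has a conflicting call, so both remain unresolved under the stated rule.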
We identified two complete open reading frames (ORFs) (Fig. 3; see Fig. 5). In addition, a partial ORF further downstream (unpublished but submitted to GenBank) was nearly identical to the scpA nucleotide sequence determined by Chen and Cleary (C. Chen and P. P. Cleary, submitted for publication).

Characterization of emm49. The deduced amino acid sequence of the first ORF verified that it was the M49 gene. The sequence began with an apparent signal peptide of 41 residues which bore considerable similarity to those of other sequenced M protein genes (16). Assuming cleavage after residue 41, the deduced mature M49 amino acid sequence predicted a protein of 39.6 kDa, consistent with the size observed on the Western blot (Fig. 2). The 147 N-terminal residues of the deduced mature protein were nearly identical to the amino acid sequence of the M49 protein fragment released from the streptococci by mild pepsin digestion (pepM49) (31). The first residue varied between the two sequences, and our sequence contained a four-residue insertion relative to pepM49 between residues 7 and 12 (Fig. 3). These differences presumably represent variation between the strains of type 49 streptococci used in the two studies. As in other M proteins, a Garnier and Robson analysis (15) predicted an α-helical secondary structure throughout the majority of the M49 protein. The deduced protein contained a proline- and glycine-rich cell wall-spanning region near the carboxyl end of the protein, a hydrophobic membrane anchor region, and a polar tail, as previously characterized for M proteins (24). Also in accordance with other M protein genes, emm49 contained distinct repetitive units, as illustrated in Fig. 3.
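Deducing an amino acid sequence from an ORF, as done above for emm49, amounts to reading the nucleotide sequence codon by codon with the standard genetic code. The following is a minimal sketch of that step; the function name `translate` is illustrative, and the test fragment is the first 19 codons of emm49 as printed in the paper's Fig. 3.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code laid out in TCAG order; '*' marks a stop codon.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence from its first base, stopping at a stop codon."""
    dna = "".join(dna.split()).upper()  # drop spaces and line breaks
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 19 codons of emm49, starting at the ATG, as printed in Fig. 3.
EMM49_START = (
    "ATG GCT AGA AAA GAT ACG AAT AAA CAG TAT TCG CTT AGA AAA TTA AAA ACA GGT ACA"
)

if __name__ == "__main__":
    print(translate(EMM49_START))
```

The one-letter output corresponds to the Met-Ala-Arg-Lys-Asp-Thr... residues shown under the nucleotide sequence at the start of the signal peptide.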


The 11-residue A repeat appeared twice in the gene's 5' half. The sequence similarity between these two A units may extend further in either direction in the protein sequence (31), but satisfactory alignments (greater than 70% similarity) of additional nucleotide sequence surrounding these A units required the introduction of several gaps. The B repeat region consisted of three 93-bp repeat units, separated by two 33-bp spacer regions. The two spacer regions were also identical to each other.

Comparison of emm49 with other sequenced M protein genes. Although emm49 had internal regions of similarity with emm5 (40), emm6 (24), emm12 (46), and emm24 (42), it also had distinct differences from these genes. Following the conserved leader sequence, the 5' end of emm49 differed from the other genes up to the B repeat region. The variable region of emm49 was considerably shorter than the variable regions of emm5, emm6, emm12, and emm24, consisting of only 351 (33%) of the 1,044 nucleotides encoding the mature M49 protein. By comparison, the corresponding variable regions of emm6 and emm24 (up to the C and B repeat regions, respectively) constituted approximately 52% of each gene. The B repeat region of emm49 was similar to the C repeat regions of emm5, emm6, and emm12 and to the B repeat region of emm24. The relationship between these various repeat regions is diagrammed in Fig. 4A. The B repeat region in emm49 contained three copies of the repeat unit and thus was larger than the C repeat regions in emm6 and emm12, which contained only two copies of the repeat unit. Because the two spacer regions in emm49 were similar, the extra repeat was probably the result of a recombination event between two original repeat units. This resulted in the observed triplication of the repeats and duplication of the spacer region.
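The layout of the emm49 B repeat region described above (three 93-bp units separated by two 33-bp spacers) can be laid out as coordinates to check the total length it implies. This is only a sketch: the segment labels and the function name `repeat_region_layout` are illustrative, and positions are counted from the first base of the repeat region, not in the numbering of Fig. 3.

```python
def repeat_region_layout(n_units=3, unit_bp=93, spacer_bp=33):
    """Return (label, start, end) tuples, 1-based and inclusive, for tandem
    repeat units separated by spacers. Labels are illustrative only."""
    segments = []
    position = 1
    for i in range(1, n_units + 1):
        segments.append((f"B.{i}", position, position + unit_bp - 1))
        position += unit_bp
        if i < n_units:  # spacers sit only between units
            segments.append((f"spacer {i}", position, position + spacer_bp - 1))
            position += spacer_bp
    return segments


if __name__ == "__main__":
    layout = repeat_region_layout()
    for label, start, end in layout:
        print(f"{label:9s} {start:4d}-{end:<4d} ({end - start + 1} bp)")
    print("total length:", layout[-1][2], "bp")
```

With the unit and spacer sizes given in the text, the region spans 3 × 93 + 2 × 33 = 345 bp.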
The second two B repeat units of emm49 showed 70 to 73% similarity with the two C repeat units of emm6 and emm12, but the spacer regions had only 57% similarity with the analogous regions in emm6 and emm12. The emm6 and emm12 C repeats, on the other hand, had greater than 90% similarity to each other. The framework of the emm24 B repeats and the emm5 C repeats was similar to that of the emm49 B repeats in that each contained three repeat units, but the regions spanning the end of the repeats and the beginning of the spacer regions in emm24 and emm5 contained gaps relative to emm49. The B repeats of emm24 and the C repeats of emm5 had only 72 to 78% similarity with the emm49 B repeats.

Following the repeat regions, emm49 was very similar to emm5, emm6, emm12, and emm24 up to the point encoding the previously defined proline-glycine-rich region (Fig. 4B). Here, emm49 was dissimilar to the other four genes, which were nearly identical. The proline-glycine-rich region of emm49 was considerably shorter, containing only 98 nucleotides (or 36 amino acid residues), while the analogous regions of emm5, emm6, emm12, and emm24 contained 162 nucleotides (or 54 amino acid residues). The shorter region in the deduced M49 amino acid sequence also contained proportionately fewer proline and glycine residues (Fig. 3), which were less evenly spaced than those characterized in the M6 protein (44). Towards the end of the proline-glycine region, emm49 again showed similarity to the other M protein genes, which continued into the portion encoding the membrane anchor region. A small block of dissimilarity occurred at the end of the membrane anchor region, but the last 18 nucleotides of emm49, encoding the six-residue polar tail of the M protein, were identical to those of emm6 and emm24.
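Percent similarity figures such as those quoted above come from comparing aligned sequences position by position. The sketch below computes a simple percent identity over ungapped columns of a pre-aligned pair. It is not the SS2 algorithm used in the paper, whose scoring and gap treatment may differ, and the example sequences are made up for illustration.

```python
def percent_identity(seq_a: str, seq_b: str, gap_chars: str = "-.") -> float:
    """Percent of aligned, ungapped columns in which the two sequences match.

    Both inputs must already be aligned to the same length. Columns where
    either sequence has a gap character are skipped (an assumption of this
    sketch; other tools may count gaps as mismatches).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments, not taken from the paper's alignments.
    print(percent_identity("GAAGCAAATA", "GAAGCTAATA"))
    print(percent_identity("ACGT-ACG", "ACGTTACC"))
```

Whether gaps are skipped or penalized changes the reported value, so figures computed this way are only comparable with each other, not with values produced by a different alignment program.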

FIG. 3. Nucleotide sequence of emm49, and the deduced amino acid sequence. The signal peptide and the assumed beginning of the mature protein are shown. Amino acid residues identical to those reported for pepM49 (31) are shown in boldface. The four-residue insertion relative to pepM49 is marked with asterisks. The various repeated regions described in the text are boxed in different borders to identify the different repeats. Structural features of the M49 protein are indicated. [Sequence panel not reproduced; it covers nucleotides 1 to 1398 with the deduced amino acid sequence beneath, and its labels include the signal peptide and repeats A1 and A2.]

FIG. 4. Comparison of the conserved regions of emm49, emm5, emm6, emm12, and emm24. The sequence numbers for emm5, emm12, and emm24 are from the respective publications, and the sequence numbers for emm6 are those obtained from GenBank. (A) Diagrammatic comparison of the constant repeat regions of the four M protein genes. The aligned regions are connected by dotted lines and represent the best alignments possible among the various repeats. The shaded regions represent gaps in emm5 and emm24 relative to emm49. The numbers inside the boxes representing emm5, emm6, emm12, and emm24 are the percent similarities with the aligned regions of emm49. The bracketed numbers are the percent similarity between the deduced amino acid sequences. For comparison, the percent similarities between the emm6 and emm12 repeats and the emm5 and emm24 repeats are shown between the boxes representing these sequences. The percent similarities between the three emm49 repeats and the two spacer regions are shown in the boxes representing this sequence. (B) Comparison of the 3' end of emm49 with those of emm5, emm6, emm12, and emm24. The aligned sequences shown begin immediately after the constant repeat regions diagrammed in Fig. 4A. The nucleotides encoding the proline-glycine-rich regions in the deduced protein sequences are boxed. Uppercase letters, aligned nonidentical bases; lowercase letters, unaligned bases; ----, aligned identical bases; ....., gap. [Panels not reproduced: (A) box diagram of the aligned repeat units with percent similarities; (B) nucleotide alignment of the 3' ends of the five genes.]

FIG. 5. Nucleotide sequence and deduced amino acid sequence of ennX. For orientation, the upstream sequence overlaps the sequence shown in Fig. 3. The sequence is numbered consecutively with this latter sequence. Repeated regions are boxed, and structural regions of the deduced protein are indicated. Labeled regions in the figure: signal peptide, mature protein, proline/glycine region, and membrane anchor region.


FIG. 6. Alignment of similar regions of emm49 and ennX. The dashes indicate aligned identical nucleotides. The sequences are arranged in codons, and codons that encode dissimilar amino acids are shown in boldface type. A, Leader sequences; B, similar regions within the B repeats of emm49; C and D, blocks of similarity in the 3' ends of the two genes.
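The codon-by-codon comparison described in the Fig. 6 legend can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure used to prepare the figure: the function name and the toy sequences are hypothetical, the standard genetic code is assumed, and any amino acid difference is treated as "dissimilar" because the legend does not define that term.

```python
from itertools import product

# Standard genetic code, built in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def compare_codons(seq_a: str, seq_b: str):
    """Classify each aligned codon pair as identical, synonymous, or nonsynonymous.

    Inputs must be in-frame, gap-free, and of equal length (a multiple of 3).
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    results = []
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            kind = "identical"
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            kind = "synonymous"       # different codon, same amino acid
        else:
            kind = "nonsynonymous"    # would be set in boldface under our assumption
        results.append((codon_a, codon_b, kind))
    return results


# Toy input for illustration only.
for row in compare_codons("ATGGCTAGAAAA", "ATGGCAAGACAA"):
    print(row)
```

A stricter definition of "dissimilar" (for example, one based on amino acid chemical classes) would only require changing the final branch.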

Partial characterization of the ORF downstream of emm49. The second ORF was situated 207 bp downstream of the M49 gene (Fig. 5). This ORF was entirely located within pIL11 (Fig. 1). Therefore, we concluded that this ORF was ennX and that it encoded the gene product expressed in this subclone. Although this protein was identified by reaction with human serum known to neutralize the type 49 serum opacity reaction, we were not able to demonstrate OF activity in the E. coli clones containing this ORF. Immunodot-blot experiments confirmed that the protein did not react with purified human myeloma proteins of the four immunoglobulin G (IgG) subtypes (data not shown), eliminating the possibility that this protein was an IgG Fc receptor similar to that described by Heath and Cleary (20). Experiments are in progress to identify this gene and to determine whether it is expressed in streptococci.

ennX was similar in both framework and sequence to M protein genes. The leader sequence of this gene was readily identified by its similarity with the emm49 leader (Fig. 6A). The protein also had a proline-glycine region, a hydrophobic membrane anchor region, and a polar tail (Fig. 5). Also, ennX contained internal repeated sequences, as shown in Fig. 5. The large A repeats contained regions similar to the emm49 B repeats (Fig. 6B). Repeats B and C (Fig. 5), which were subsets of the A repeat, were each repeated a third time in the ennX sequence, but not in the same order. Two other blocks of ennX sequence were similar to emm49. The first preceded the proline-glycine region (Fig. 6C), and the second contained the last 105 nucleotides of the structural gene, including the stop codon (Fig. 6D).

Chromosomal placement of emm49 relative to other M protein genes. We established in a previous study that emm1, emm6, emm12, and emm24 were all situated in a common expression site on the streptococcal chromosome based upon their upstream sequence similarities (16). Mouw et al. (42) verified this common expression site by finding identical nucleotide sequences downstream of emm24 and emm6. Although sequences upstream of emm49 were not available


FIG. 7. (A) Alignment of the nucleotides immediately downstream of emm49, emm6, and emm24, starting with the respective termination codons. (B) Alignment of the nucleotides immediately downstream of ennX and emm24, starting with the respective termination codons. See Fig. 4B legend for details.

for comparison, we found the sequence downstream of emm49 to be obviously different from those downstream of emm6 and emm24 (Fig. 7A). On the other hand, the sequence beginning 49 bp downstream of ennX was similar to the sequence beginning 82 bp downstream of emm24 (Fig. 7B). Therefore, the stretch of DNA containing emm49 and ennX is in the same location as emm6 and emm24, but emm49 and ennX represent a tandem duplication in the M49 strain relative to single-copy genes present in the M6 and M24 strains.

DISCUSSION

Evolutionary divergence of the type 49 M protein gene from other M protein genes. In this article, we report the nucleotide sequence of emm49 and a related gene situated directly downstream. The M49 structural gene and adjacent sequences differed considerably from those of previously reported M protein genes. The homologous gene found adjacent to emm49 was also a unique finding. The data suggest that type 49 streptococci represent a divergent group of M serotypes. The predicted molecular mass of the cloned M49 protein, only 39.6 kDa, was smaller than the smallest M protein size previously reported (13). In comparison to other M proteins for which the complete sequence is known, the smaller size of emm49 and its predicted protein were accounted for mostly in the region encoding the variable amino terminus. The predicted protein did not have the usual nonhelical region at the extreme amino terminus (31), nor did the variable region contain the extensive sequence repeats seen in emm5, emm6, emm12, and emm24. The A repeat in emm49 was only reiterated once, as opposed to the large number of A and B repeat units seen in emm6.
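A predicted molecular mass such as the 39.6 kDa quoted above is conventionally obtained by summing residue masses over the deduced amino acid sequence and adding one water for the free termini. The following Python sketch shows that arithmetic; the function name is hypothetical, the average residue masses are standard reference values rather than values taken from this article, and the article does not state which mass table or which form of the protein (with or without signal peptide) was used.

```python
# Average residue masses in daltons (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519,  "A": 71.0788,  "S": 87.0782,  "P": 97.1167,
    "V": 99.1326,  "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water added for the free amino and carboxyl termini


def predicted_mass_kda(protein: str) -> float:
    """Return the predicted average molecular mass of a protein in kilodaltons."""
    total = sum(AVERAGE_RESIDUE_MASS[residue] for residue in protein.upper())
    return (total + WATER) / 1000.0


# Toy peptide for illustration only (the first residues of a leader-like sequence).
print(round(predicted_mass_kda("MARQQTKK"), 4))
```

Applying the function to a full deduced sequence, minus the signal peptide if the mature protein is wanted, gives a figure comparable to published predictions.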


FIG. 8. Model diagram summarizing the differences between two proposed M-type lineages, A (emm5, emm6, emm12, and emm24) and B (emm49). Flanking regions in the two lineages that would be similar according to the model are shaded with the same pattern.
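The crossover step underlying the model in Fig. 8 (recombination between misaligned homologous copies of a duplicated region, as proposed in this Discussion) can be illustrated with a small Python sketch. The list representation, the function name, and the labels "copy1" and "copy2" are hypothetical; the sketch shows only how one reciprocal exchange yields a three-copy product and a one-copy product, not how the genes subsequently diverged.

```python
def unequal_crossover(chromatid_1, chromatid_2, i, j):
    """Recombine within element i of chromatid_1 paired with element j of chromatid_2.

    Each chromatid is a list of region labels. The paired elements are assumed
    to be homologous repeats; the crossover point falls inside them, so each
    product carries one hybrid repeat. Returns the two reciprocal products.
    """
    hybrid_1 = f"{chromatid_1[i]}/{chromatid_2[j]}"
    hybrid_2 = f"{chromatid_2[j]}/{chromatid_1[i]}"
    product_1 = chromatid_1[:i] + [hybrid_1] + chromatid_2[j + 1:]
    product_2 = chromatid_2[:j] + [hybrid_2] + chromatid_1[i + 1:]
    return product_1, product_2


def copy_number(chromatid):
    """Count repeat units (labels containing 'copy') on a chromatid."""
    return sum("copy" in label for label in chromatid)


# A chromosome that already carries a tandem duplication.
duplicated = ["upstream", "copy1", "copy2", "downstream"]

# Misaligned pairing: copy2 on one chromatid pairs with copy1 on the other.
triplicated, single = unequal_crossover(duplicated, duplicated, i=2, j=1)
print(triplicated, copy_number(triplicated))
print(single, copy_number(single))
```

In the terms of the model, the three-copy product corresponds to the lineage carrying several related genes and the one-copy product to the lineage with a single M protein gene.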

The B repeats of emm49 were similar to the B repeats of emm24 and the C repeats of emm5, emm6, and emm12, but the repeated region was longer in emm49 and was less similar to those of the other three genes than these genes were to each other. The first two-thirds of the proline-glycine region of the predicted M49 protein sequence was dissimilar to the corresponding identical regions of M5, M6, M12, and M24. Protease protection studies with the M6 molecule showed this region to be embedded in the cell wall (44). This position precludes selection for variant antigens by antibody surveillance. Furthermore, these authors (44) suggest that the structure of this region is important in stabilizing the M protein molecule in the cell wall. Thus, mutations in this region would probably be selected against. The multiple gaps and substitutions in the M49 gene in this region therefore suggest considerable evolutionary distance between this gene and the other M protein genes. In addition, the variant proline-glycine region in the M49 strain might represent selection based upon a variant cell wall structure in this strain.

In order to investigate further the evolutionary relationship of emm49 to emm6, emm12, and emm24, we used the progressive sequence alignment method of Feng and Doolittle (11) to construct phylogenetic trees of the conserved regions of these genes. For both conserved regions compared (Fig. 4A and B), emm49 was situated on a separate branch from the node of origin, distant from the other three genes, which were grouped together (data not shown). These results suggest that emm49 is divergent from the other genes and may represent a separate lineage of M protein genes. Preliminary evidence suggests that this putative lineage may represent the OF-positive serotypes. The conserved repeated regions of these two categories of M proteins are clearly different as determined by antibody-binding studies (4).
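The idea of placing one sequence on a separate branch from a group of closely related sequences can be illustrated with a simple distance-based clustering in Python. This sketch is not the Feng and Doolittle procedure used in this study: it substitutes the uncorrected proportion of differing sites (p-distance) and UPGMA clustering on sequences that are assumed to be already aligned. The function names, sequence names, and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of positions that differ between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(sequences: dict):
    """Cluster named, pre-aligned sequences by UPGMA; return a nested-tuple tree."""
    # Map each subtree (a name or a tuple of subtrees) to its member names.
    clusters = {name: [name] for name in sequences}

    def cluster_distance(members_1, members_2):
        # UPGMA distance: mean of all pairwise distances between members.
        total = sum(p_distance(sequences[x], sequences[y])
                    for x in members_1 for y in members_2)
        return total / (len(members_1) * len(members_2))

    while len(clusters) > 1:
        # Join the two closest clusters.
        t1, t2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(clusters[pair[0]],
                                                       clusters[pair[1]]))
        merged = clusters.pop(t1) + clusters.pop(t2)
        clusters[(t1, t2)] = merged
    return next(iter(clusters))


# Toy sequences for illustration only: three close relatives and one outlier.
toy = {
    "s1": "AAAAAAAAAA",
    "s2": "AAAAAAAAAT",
    "s3": "AAAAAAATTT",
    "outlier": "TTTTTAAAAA",
}
print(upgma(toy))
```

In the resulting tree the outlier joins last, which is the pattern reported in the text for emm49 relative to the other genes. A real analysis would need properly aligned sequences and a distance correction.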
In addition, DNA probes from various conserved regions within and surrounding OF-negative M protein genes bind much more strongly to chromosomal DNAs from other OF-negative serotypes than they do to DNAs from OF-positive serotypes (E. Haanes-Fritz, Ph.D. thesis, University of Minnesota, Minneapolis, 1989; E. Haanes, D. Heath, and P. P. Cleary, submitted for publication).

emm49 is a member of a gene family. In addition to the differences observed in emm49, the region downstream of the gene was also unique. The sequence downstream of emm49 differed from sequences downstream of emm6 and emm24. Furthermore, emm49 was followed closely by ennX, an ORF that was homologous to emm49 based upon significant sequence similarities in the leader sequences and the 3' conserved regions of the two genes. Clearly, ennX was not a cloning artifact, because Southern blotting ensured that the restriction fragments in the cloned DNA and the CS101 chromosomal DNA were identical, and the 5' region of the ennX gene bore no similarity to that of emm49.

Kehoe et al. (30) showed by Southern hybridization that multiple regions similar to the M5 gene exist in the type 5 streptococcal genome. These authors found, however, that the homologous regions were on large restriction fragments distinct from those carrying the M5 gene. Thus, our finding of two tandemly arrayed genes in the type 49 streptococcal chromosome is in contrast to these previous results. We found that a putative ennX gene product was expressed in E. coli, but we have not yet determined whether this gene product is expressed in streptococci. Nonetheless, the existence of ennX in the CS101 chromosome confirms the presence of a family of M protein-related genes (or pseudogenes) in type 49 streptococci.

Heath and Cleary (20) report that a type II IgG Fc receptor gene (fcrA) is present in a type 76 (OF-positive) streptococcal strain. These authors show that fcrA contains sequences similar to M protein genes and is located in the same chromosomal position as emm12, as judged by upstream sequence similarities (21). Heath also reports (Ph.D. thesis, University of Minnesota, Minneapolis, 1988) that the M76 gene is situated just downstream of fcrA. We are currently investigating whether a similar Fc receptor gene is located just upstream of emm49. Hybridization studies support this hypothesis. These showed that an oligonucleotide probe representing a unique region of fcrA and a nick-translated probe containing sequences within and upstream of fcrA hybridized to a 3-kb HaeIII chromosomal DNA fragment from the type 49 strain CS101, 667 bp of which was contained on the 5' side of pIL2 (Haanes et al., submitted).
Given the possibility that emm49 is flanked on either side by homologous genes, we suggest the following model to explain the evolutionary distance of emm49 from the previously characterized M protein genes (summarized in Fig. 8). A primordial gene containing the framework common to M protein genes underwent a duplication at some point in evolution. This involved a rare unequal crossover event between nonhomologous sequences (2, 50). Once the region of DNA was duplicated, a second crossover event occurred between homologous sequences, resulting in a triplication on one progeny chromosome and a deletion of one repeat on the other progeny chromosome. The triplication lineage then evolved by mutational events to have three structurally related genes, including an M protein gene. The other lineage, lacking the triplication, evolved in parallel to have the single M protein gene. This would explain the considerable divergence in the emm49 sequence relative to the highly conserved regions of emm6, emm12, and emm24 and would also explain the inability to find additional regions of similarity to emm6 in the type 6 streptococcal chromosome (48). In support of this model, the sequences upstream of fcrA are similar to those upstream of emm12 (21), and the sequences downstream of ennX are similar to the sequences downstream of emm24. Furthermore, scpA is linked to emm12 (7) and is also just downstream of ennX.

The three tandemly arrayed homologous genes would tend to be unstable due to recombination with each other (50). We assume, then, that the products of these genes in the type 49 lineage must in some way be beneficial to the organism, allowing positive selection of the stable array of genes. On the other hand, the three genes might undergo some beneficial recombinations, resulting in antigenically variant molecules. A variety of pathogens, including gonococci and trypanosomes, undergo antigenic variation by recombination with homologous sequences elsewhere on the chromosome (10, 17). This mechanism was previously dismissed in the case of streptococcal M protein because of the single-copy nature of emm6 (48). With the dearth of internal sequence repeats in the variable region of emm49 relative to the other sequenced M protein genes, this might be an important mechanism for the generation of variant M types in this lineage.

ACKNOWLEDGMENTS

We thank Stewart Scherer for supplying the bacteriophage lambda vector NM1149 and the host E. coli strains and Ernest Retzel for suggesting direct sequence retrieval by anonymous ftp. We are grateful to Stewart Scherer, David Heath, Cecil Chen, and Patrick Schlievert for helpful discussions. This work was supported by Public Health Service grant AI16722 from the National Institute of Allergy and Infectious Diseases to P.P.C. E.H.
was supported by a Doctoral Dissertation Fellowship from the University of Minnesota Graduate School.

LITERATURE CITED

1. Altschul, S. F., and B. W. Erickson. 1986. Optimal sequence alignment using affine gap costs. Bull. Math. Biol. 48:603-616.
2. Anderson, R. P., and J. R. Roth. 1977. Tandem genetic duplication in phage and bacteria. Annu. Rev. Microbiol. 31:473-505.
3. Benton, W. D., and R. W. Davis. 1977. Screening lambda gt recombinant clones by hybridization to single plaques in vitro. Science 196:180-182.
4. Bessen, D., K. F. Jones, and V. A. Fischetti. 1989. Evidence for two distinct classes of streptococcal M protein and their relationship to rheumatic fever. J. Exp. Med. 169:269-283.
5. Blake, M. S., K. H. Johnston, G. J. Russell-Jones, and E. C. Gotschlich. 1984. A rapid, sensitive method for detection of alkaline phosphatase-conjugated anti-antibody on Western blots. Anal. Biochem. 136:175-179.
6. Cameron, J. R., P. Philippsen, and R. W. Davis. 1977. Analysis of chromosomal integration and deletions of yeast plasmids. Nucleic Acids Res. 4:1429-1447.
7. Chen, C., and P. P. Cleary. 1989. Cloning and expression of the streptococcal C5a peptidase gene in Escherichia coli: linkage to the type 12 M protein gene. Infect. Immun. 57:1740-1745.
8. Cleary, P. P. 1978. Genetic separation of serum opacity factor from M protein of group A streptococci. Infect. Immun. 22:171-175.
9. Dale, R. M. K., B. A. McClure, and J. P. Houchins. 1985. A rapid single stranded cloning strategy for producing a sequential series of overlapping clones for use in DNA sequencing: application to sequencing the corn mitochondrial 18S rDNA. Plasmid 13:31-40.
10. Donelson, J. E., and A. C. Rice-Ficht. 1985. Molecular biology of trypanosome antigenic variation. Microbiol. Rev. 49:107-125.
11. Feng, D., and R. F. Doolittle. 1987. Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol. 25:351-360.
12. Fischetti, V. A., K. F. Jones, B. N. Manjula, and J. R. Scott. 1984. Streptococcal M6 protein expressed in Escherichia coli: localization, purification, and comparison with streptococcal-derived M protein. J. Exp. Med. 159:1083-1095.
13. Fischetti, V. A., K. F. Jones, and J. R. Scott. 1985. Size variation of the M protein of group A streptococci. J. Exp. Med. 161:1384-1401.
14. Gardner, R. C., A. J. Howarth, J. Messing, and R. J. Shepherd. 1982. Cloning and sequencing of restriction fragments generated by EcoRI*. DNA 1:109-115.
15. Garnier, J., D. J. Osguthorpe, and B. Robson. 1978. Analysis of the accuracy and implications of simple methods for predicting the secondary structure of globular proteins. J. Mol. Biol. 120:97-120.
16. Haanes-Fritz, E., W. Kraus, V. Burdett, J. B. Dale, E. H. Beachey, and P. P. Cleary. 1988. Comparison of the leader sequences of group A streptococcal M protein genes. Nucleic Acids Res. 16:4667-4677.
17. Hagblom, P., E. Segal, E. Billyard, and M. So. 1985. Intragenic recombination leads to pilus antigenic variation in Neisseria gonorrhoeae. Nature (London) 315:156-158.
18. Hallas, G., and J. P. Widdowson. 1983. The relationship between opacity factor and M protein in Streptococcus pyogenes. J. Med. Microbiol. 16:13-26.
19. Hanahan, D. 1983. Studies on transformation of E. coli with plasmids. J. Mol. Biol. 166:557-580.
20. Heath, D. G., and P. P. Cleary. 1987. Cloning and expression of the gene for an immunoglobulin G Fc receptor protein from a group A streptococci. Infect. Immun. 55:1233-1238.
21. Heath, D. G., and P. P. Cleary. 1989. Fc-receptor and M-protein genes of group A streptococci are products of gene duplication. Proc. Natl. Acad. Sci. USA 86:4741-4745.
22. Hill, M. J., and L. W. Wannamaker. 1968. The serum opacity reaction of Streptococcus pyogenes: general properties of the streptococcal factor and of the reaction in aged serum. J. Hyg. (Cambridge) 66:37-48.
23. Hohn, B. 1979. In vitro packaging of lambda and cosmid DNA. Methods Enzymol. 68:299-309.
24. Hollingshead, S. K., V. A. Fischetti, and J. R. Scott. 1986. Complete nucleotide sequence of type 6 M protein of the group A streptococcus: repetitive structure and membrane anchor. J. Biol. Chem. 261:1677-1686.
25. Hollingshead, S. K., V. A. Fischetti, and J. R. Scott. 1987. Size variation in group A streptococcal M protein is generated by homologous recombination between intragenic repeats. Mol. Gen. Genet. 207:196-203.
26. Horstmann, R. D., H. J. Sievertsen, J. Knobloch, and V. A. Fischetti. 1988. Antiphagocytic activity of streptococcal M protein: selective binding of complement control protein factor H. Proc. Natl. Acad. Sci. USA 85:1657-1661.
27. Jacks-Weis, J., Y. Kim, and P. P. Cleary. 1982. Restricted deposition of C3 on M+ group A streptococci: correlation with resistance to phagocytosis. J. Immunol. 128:1897-1902.
28. Jones, K. F., S. K. Hollingshead, J. R. Scott, and V. A. Fischetti. 1988. Spontaneous M6 protein size mutants of group A streptococci display variation in antigenic and opsonogenic epitopes. Proc. Natl. Acad. Sci. USA 85:8271-8275.
29. Jones, K. F., and V. A. Fischetti. 1987. Biological and immunochemical identity of M protein of group G streptococci with M protein on group A streptococci. Infect. Immun. 55:502-506.
30. Kehoe, M. A., T. P. Poirier, E. H. Beachey, and K. N. Timmis. 1985. Cloning and genetic analysis of serotype 5 M protein determinant of group A streptococci: evidence for multiple copies of the M5 determinant in the S. pyogenes genome. Infect. Immun. 48:190-197.
31. Khandke, K. M., T. Fairwell, A. S. Acharya, B. L. Trus, and B. N. Manjula. 1988. Complete amino acid sequence of streptococcal pepM49 protein, a nephritis-associated serotype: conserved conformational design among sequentially distinct M protein serotypes. J. Biol. Chem. 263:5075-5082.
32. Kraus, W., E. Haanes-Fritz, P. P. Cleary, J. M. Seyer, J. B. Dale, and E. H. Beachey. 1987. Sequence and type-specific immunogenicity of the amino-terminal region of type 1 streptococcal M protein. J. Immunol. 139:3084-3090.
33. Laemmli, U. K. 1970. Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature (London) 227:680-685.
34. Lancefield, R. C. 1959. Persistence of type-specific antibodies in man following infection with group A streptococci. J. Exp. Med. 110:271-292.
35. Lancefield, R. C. 1962. Current knowledge of type-specific M antigens of group A streptococci. J. Immunol. 89:301-313.
36. Maniatis, T., E. F. Fritsch, and J. Sambrook. 1982. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
37. Maxted, W. R. 1978. Group A streptococci: pathogenesis and immunity, p. 107-125. In F. A. Skinner and L. B. Quesnel (ed.), Streptococci. Academic Press, Inc., New York.
38. Maxted, W. R., and J. P. Widdowson. 1972. The protein antigens of group A streptococci, p. 251-266. In L. W. Wannamaker and J. M. Matsen (ed.), Streptococci and streptococcal diseases: recognition, understanding, and management. Academic Press, Inc., New York.
39. Meinkoth, J., and G. Wahl. 1984. Hybridization of nucleic acids immobilized on solid supports. Anal. Biochem. 138:267-284.
40. Miller, L., L. Gray, E. Beachey, and M. Kehoe. 1988. Antigenic variation among group A streptococcal M proteins: nucleotide sequence of the serotype 5 M protein gene and its relationship with genes encoding types 6 and 24 M proteins. J. Biol. Chem. 263:5668-5673.
41. Moody, M. D., J. Padula, D. Lizana, and C. T. Hall. 1965. Epidemiologic characterization of group A streptococci by T-agglutination and M-precipitation tests in the public health laboratory. Health Lab. Sci. 2:149-162.
42. Mouw, A., E. Beachey, and V. Burdett. 1988. Molecular evolution of streptococcal M protein: cloning and nucleotide sequence of the type 24 M protein gene and relation to other genes of Streptococcus pyogenes. J. Bacteriol. 170:676-684.
43. Murray, N. E. 1983. Phage lambda and molecular cloning, p. 395-431. In R. W. Hendrix, J. W. Roberts, F. W. Stahl, and R. A. Weisberg (ed.), Lambda II. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
44. Pancholi, V., and V. A. Fischetti. 1988. Isolation and characterization of the cell-associated region of group A streptococcal M protein. J. Bacteriol. 170:2618-2624.
45. Poirier, T. P., M. A. Kehoe, J. B. Dale, K. N. Timmis, and E. H. Beachey. 1985. Expression of protective and cardiac tissue-cross-reactive epitopes of type 5 streptococcal M protein in Escherichia coli. Infect. Immun. 48:198-203.
46. Robbins, J. C., J. G. Spanier, S. J. Jones, W. J. Simpson, and P. P. Cleary. 1987. Streptococcus pyogenes type 12 M protein gene regulation by upstream sequences. J. Bacteriol. 169:5633-5640.
47. Sanger, F., S. Nicklen, and A. R. Coulson. 1977. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74:5463-5467.
48. Scott, J. R., S. K. Hollingshead, and V. A. Fischetti. 1986. Homologous regions within M protein genes in group A streptococci of different serotypes. Infect. Immun. 52:609-612.
49. Singh, L., and K. W. Jones. 1984. The use of heparin as a simple, cost-effective means of controlling background in nucleic acid hybridization procedures. Nucleic Acids Res. 12:5627-5638.
50. Smith, G. 1976. Evolution of repeated DNA sequences by unequal crossover. Science 191:528-535.
51. Thomas, M., and R. W. Davis. 1974. Studies on the cleavage of bacteriophage lambda DNA with EcoRI restriction endonuclease. J. Mol. Biol. 91:315-328.
52. Top, F. H., and L. W. Wannamaker. 1968. The serum opacity reaction of Streptococcus pyogenes: demonstration of multiple, strain-specific lipoproteinase antigens. J. Exp. Med. 127:1013-1034.
53. Top, F. H., and L. W. Wannamaker. 1968. The serum opacity factor of Streptococcus pyogenes: frequency of production of streptococcal lipoproteinase by strains of different serological types and the relationship to M protein production. J. Hyg. (Cambridge) 66:49-58.
54. Towbin, H., T. Staehelin, and J. Gordon. 1979. Electrophoretic transfer of proteins from polyacrylamide gels to nitrocellulose filters: procedure and some applications. Proc. Natl. Acad. Sci. USA 76:4350-4354.
55. Vieira, J., and J. Messing. 1982. The pUC plasmids, an M13mp7-derived system for insertion mutagenesis and sequencing with synthetic universal primers. Gene 19:259-268.
56. Widdowson, J. P., W. R. Maxted, and D. L. Grant. 1970. The production of opacity in serum by group A streptococci and its relationship with the presence of M antigen. J. Gen. Microbiol. 61:343-353.
57. Yanisch-Perron, C., J. Vieira, and J. Messing. 1985. Improved M13 phage cloning vectors and host strains: nucleotide sequences of the M13mp7 and pUC19 vectors. Gene 33:103-119.