Vol. 171, No. 12
JOURNAL OF BACTERIOLOGY, Dec. 1989, p. 6414-6422
0021-9193/89/126414-09$02.00/0 Copyright © 1989, American Society for Microbiology
Proteus mirabilis Urease: Nucleotide Sequence Determination and Comparison with Jack Bean Urease

BRADLEY D. JONES AND HARRY L. T. MOBLEY*

Division of Infectious Diseases, Department of Medicine, University of Maryland School of Medicine, 10 South Pine Street, Baltimore, Maryland 21201

Received 3 July 1989/Accepted 5 September 1989

Proteus mirabilis, a common cause of urinary tract infection, produces a potent urease that hydrolyzes urea to NH3 and CO2, initiating kidney stone formation. Urease genes, which were localized to a 7.6-kilobase-pair region of DNA, were sequenced by the dideoxy method. Six open reading frames, found within a 4,952-base-pair region, were predicted to encode polypeptides of 31.0 (ureD), 11.0 (ureA), 12.2 (ureB), 61.0 (ureC), 17.9 (ureE), and 23.0 (ureF) kilodaltons (kDa). Each open reading frame except ureE was preceded by a ribosome-binding site. Putative promoterlike sequences were identified upstream of ureD, ureA, and ureF. Possible termination sites were found downstream of ureD, ureC, and ureF. The structural subunits of the enzyme were encoded by ureA, ureB, and ureC and were translated from a single transcript in the order 11.0, 12.2, and 61.0 kDa. When the deduced amino acid sequences of the P. mirabilis urease subunits were compared with the amino acid sequence of jack bean urease, significant similarity was observed (58% exact matches; 73% exact plus conservative replacements). The 11.0-kDa polypeptide aligned with the N-terminal residues of the plant enzyme, the 12.2-kDa polypeptide with internal residues, and the 61.0-kDa polypeptide with the C-terminal residues, suggesting an evolutionary relationship between the urease genes of jack bean and P. mirabilis.
Urinary tract infection with Proteus mirabilis can lead to serious complications, including cystitis, prostatitis, urolithiasis, pyelonephritis, bacteremia, and death (30, 38). The enzyme urease is recognized as an important virulence factor for this uropathogenic species and as the causative agent of infection-induced kidney and bladder stones, which are estimated to represent 20 to 40% of all urinary stones (14). Alkalinization of the urine by hydrolysis of urea to carbon dioxide and ammonia facilitates precipitation of struvite (MgNH4PO4·6H2O) and carbonate apatite [Ca10(PO4CO3OH)6(OH)2]. Furthermore, in catheterized patients, precipitation of urinary stones results in encrustation and blockage of indwelling urinary catheters. This complication has been correlated specifically with the presence of P. mirabilis and not with other ureolytic organisms (23). Further evidence suggests that the ammonia generated by ureolysis may itself be toxic to kidney epithelium (5). Recent work has begun to yield an understanding of the biochemistry and genetics of the ureases produced by members of the tribe Proteeae (21). Urease genes from Providencia stuartii (22), P. mirabilis (15, 37), and Morganella morganii (L. Hu, B. Jones, M. Fox, E. Nicholson, and H. Mobley, Abstr. Annu. Meet. Am. Soc. Microbiol. 1989, B64, p. 41) have been identified by cloning and expression in Escherichia coli. Genetic analyses of the cloned ureases of Providencia stuartii and P. mirabilis have identified the coding regions for the structural subunits of the enzyme as well as for the accessory polypeptides required for expression of enzyme activity in vivo (15, 25, 37). Mulrooney and co-workers (25) have purified the cloned urease of Providencia stuartii and determined its biochemical properties. In addition, they have demonstrated that the native enzyme possesses a heteromeric subunit structure of one large and two small polypeptides and contains four nickel ions per active enzyme molecule.

Ostensibly, bacterial and jack bean ureases appear distinct with respect to subunit structure, subunit stoichiometry, and native molecular weight. Several purified bacterial ureases have been shown to have similar heteromeric subunit structures (6, 25, 33, 34; Hu et al., Abstr. Annu. Meet. Am. Soc. Microbiol. 1989). In P. mirabilis and Providencia stuartii, the three subunit polypeptides are transcribed on a single mRNA molecule, from the smallest to the largest subunit (15, 25). In contrast to the heteromeric bacterial ureases, jack bean urease is composed of six identical subunits. Despite this difference, we present evidence of a striking similarity between the deduced amino acid sequences of the three subunits of the P. mirabilis urease and the known amino acid sequence of the jack bean urease subunit. In this report we present the complete nucleotide sequence of the region of the recombinant plasmid pMID1003 that encodes active urease. The operon contains open reading frames (ORFs) for six polypeptides with molecular sizes, in the order in which they appear in the operon, of 31.0, 11.0, 12.2, 61.0, 17.9, and 23.0 kilodaltons (kDa). The 11.0-, 12.2-, and 61.0-kDa polypeptides represent the subunits of the P. mirabilis urease and exhibit a high degree of homology with the jack bean urease subunit at the amino acid level. Evidence is presented that suggests that the three bacterial urease subunits merged to form the single plant urease subunit.
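The urine alkalinization described above can be summarized by the standard net reactions below. This is general textbook chemistry rather than a measurement from this study: urease yields two molecules of ammonia per molecule of urea, protonation of ammonia raises the pH, and struvite precipitates from magnesium, ammonium, and phosphate once the urine is alkaline.

```latex
\begin{align*}
\mathrm{CO(NH_2)_2} + \mathrm{H_2O} &\xrightarrow{\text{urease}} \mathrm{CO_2} + 2\,\mathrm{NH_3}\\
\mathrm{NH_3} + \mathrm{H_2O} &\rightleftharpoons \mathrm{NH_4^+} + \mathrm{OH^-}\\
\mathrm{Mg^{2+}} + \mathrm{NH_4^+} + \mathrm{PO_4^{3-}} + 6\,\mathrm{H_2O} &\longrightarrow \mathrm{MgNH_4PO_4\cdot 6H_2O}\,(\mathrm{s})
\end{align*}
```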
MATERIALS AND METHODS

Bacterial strains and plasmids. E. coli HB101 (F- hsdR hsdM proA2 leuB6 rpsL20 recA13) was used as the host for recombinant plasmids. E. coli DH5αF′ [F′ hsdR φ80dlacZΔM15 Δ(lacZYA-argF)U169 recA1] (Bethesda Research Laboratories, Inc., Gaithersburg, Md.) was used as the host for M13 derivatives. Plasmid pMID1003 encodes the urease of P. mirabilis H14320 and has been described previously (15).

* Corresponding author.
DNA isolation. Replicative forms of the M13 vectors mp18 and mp19 were obtained from J. B. Kaper (University of Maryland School of Medicine). M13 DNA was isolated from cultures of E. coli DH5αF′ grown for 6 h in 2× tryptone-yeast medium (3) by alkaline sodium dodecyl sulfate extraction (4) and purified by equilibrium centrifugation in cesium chloride-ethidium bromide density gradients (18).

Molecular cloning and production of sequential deletions. Specific DNA fragments (1.95-kilobase [kb] HindIII, 1.5-kb HindIII, 0.9-kb PstI-XhoI, and 2.5-kb HindIII-BamHI fragments) derived from pMID1003 were subcloned into either M13mp18 or M13mp19 so that both strands of the entire urease sequence were represented. Deletions were created by cutting the M13 derivatives with appropriate restriction enzymes (see Fig. 1), religating, and transforming the new derivatives into DH5αF′. This method allowed approximately 80% of the 4,952-base-pair (bp) region to be sequenced. In regions where no useful restriction enzyme sites existed to create deletions, oligonucleotides were synthesized for use as sequencing primers (see Fig. 1).

Labeling and electrophoresis of templates. Dideoxy sequencing was carried out with Sequenase as specified by the supplier of the kit (U.S. Biochemicals, Cleveland, Ohio). [α-35S]dATP (ca. 800 to 1,000 Ci/mmol) was purchased from Du Pont, NEN Research Products (Boston, Mass.). The labeled DNA reaction mixtures were separated by electrophoresis in one of two types of gels: (i) 50% urea-7.2% acrylamide gels (bisacrylamide-acrylamide, 1:20 [1]) poured with wedge spacers (0.4 to 1.2 mm thick) and electrophoresed for 2.5 h to resolve up to 250 bases from the primer; or (ii) 50% urea-6.0% acrylamide gels poured with straight spacers (0.4 mm thick) and electrophoresed for 5.5 h to resolve up to 400 bases from the primer.
The gel was dialyzed with 10% acetic acid-12% methanol for 1 h, dried under vacuum, and exposed to film (XAR-2; Eastman Kodak Co., Rochester, N.Y.) for 18 h; the sequence was then read directly from the autoradiograph.

DNA and amino acid sequence analysis. The DNA-protein sequence analysis software (version 2.02) of International Biotechnologies, Inc., and Pustell and Kafatos (28) was used to search the DNA sequence for restriction enzyme sites, ORFs, ribosome-binding sites, promoterlike sequences, catabolite repressor protein-binding sites, and nitrogen regulation sequences. The deduced amino acid sequences of the ORFs were analyzed for signal sequences, ATP-binding sites, divalent cation-binding sites, amino acid composition, isoelectric point, and hydropathy. The Genetics Computer Group sequence analysis software package, version 5 (University of Wisconsin, Madison, Wis.), was used to construct hydropathy plots and to screen the National Biomedical Research Foundation protein sequence bank for sequences similar to UreA, UreB, UreC, UreD, UreE, and UreF.

[Figure: map of pMID1003 (pBR322 vector) showing the sequenced urease region, the M13 deletion subclones, and the synthetic primers; deletion enzymes labeled BamHI, BclI, BstEII, EcoRV, HindIII, NruI, NsiI, and PstI. Figure not reproduced.]

FIG. 1. Sequencing scheme of the P. mirabilis urease operon. The urease gene boundaries were previously determined by Tn5 transposon insertions. The 5.0-kb region that was sequenced is expanded in the lower portion of the figure (numbered vertical lines represent 1 kb of DNA). Arrows show the direction of sequencing; the restriction endonucleases listed on the right are those used to create the deletion for sequencing each region. Restriction endonuclease abbreviations: H, HindIII; B, BamHI; EV, EcoRV.

RESULTS

DNA sequence of the P. mirabilis urease. Figure 1 shows the series of overlapping M13 subclones created by restriction enzyme deletions of the urease-encoding region of plasmid pMID1003. In addition, 10 oligonucleotide primers were synthesized to generate sequence where no suitable restriction enzyme sites existed. The nucleotide sequence determined from these subclones covered a 4,952-bp region (Fig. 2). Both strands of DNA were completely sequenced, except for the last 60 bp of the noncoding strand; in this area, the coding-strand sequence was confirmed with two overlapping subclones.

ORFs associated with multiple polypeptides in the urease operon. The DNA sequence (Fig. 2) encoded six ORFs of more than 95 codons, each beginning with an ATG start codon. No ORF of significant length (>50 codons) was found on the reverse complement of the sequence shown in Fig. 2. Sites similar to the E. coli consensus ribosome-binding sequence (Shine-Dalgarno sequence) (31) were present immediately upstream of the methionine start codon for five of the six ORFs (ureD, bp 431 to 436; ureA, bp 1277 to 1282; ureB, bp 1585 to 1590; ureC, bp 1908 to 1913; ureF, bp 4159 to 4164) (Fig. 2). The ORF (ureE) encoding the 17.9-kDa polypeptide lacked a detectable Shine-Dalgarno sequence. The predicted molecular masses of the polypeptides encoded by the six ORFs, in order (5' to 3') on the coding strand, were 31.0, 11.0, 12.2, 61.0, 17.9, and 23.0 kDa.

DNA sequence features. The urease operon was searched for putative transcription initiation sequences with the E. coli consensus promoter sequences (29), -35 (TT*G*ACA) and -10 (T*A*TAAT*), with a -35 to -10 spacing of 17 ± 2 bp. The search required matches at eight or more of the 12 consensus nucleotides, with exact matches at the nucleotides marked with asterisks. No putative promoter sequences were found until we reduced the required number of matches to seven, relaxed the requirement for exact matches at the asterisked nucleotides, and allowed the gap between the -35 and -10 sequences to extend to 23 bp. Under these modified conditions, we located five promoterlike sequences (Fig. 3).
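The two-box promoter search described above can be sketched as a scan over one strand that scores each -35/-10 pair by the number of consensus matches. This is a hypothetical re-implementation for illustration, not the International Biotechnologies/Pustell software used in the study; the function names, the 1-based output positions, and the reading of "spacing" as the number of bases between the end of the -35 hexamer and the start of the -10 hexamer are all assumptions.

```python
# E. coli sigma-70 consensus boxes as given in the text.
MINUS35 = "TTGACA"
MINUS10 = "TATAAT"
# 0-based positions of the asterisked bases (TT*G*ACA and T*A*TAAT*).
STRICT_35 = (1, 2)
STRICT_10 = (0, 1, 5)


def box_matches(window, consensus):
    """Count positions where the window agrees with the consensus."""
    return sum(a == b for a, b in zip(window, consensus))


def strict_ok(box35, box10):
    """True if every asterisked consensus base is matched exactly."""
    return (all(box35[k] == MINUS35[k] for k in STRICT_35)
            and all(box10[k] == MINUS10[k] for k in STRICT_10))


def find_promoters(seq, min_matches=8, spacer=(15, 19), require_strict=True):
    """Scan one strand for -35/-10 pairs.

    spacer is the inclusive range of bases between the end of the -35 hexamer
    and the start of the -10 hexamer (17 +/- 2 by default). Returns a list of
    (pos35, pos10, score) with 1-based positions of the first base of each box.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(MINUS35) + 1):
        box35 = seq[i:i + 6]
        for gap in range(spacer[0], spacer[1] + 1):
            j = i + 6 + gap
            if j + 6 > len(seq):
                break  # larger gaps would also run off the end
            box10 = seq[j:j + 6]
            score = box_matches(box35, MINUS35) + box_matches(box10, MINUS10)
            if score < min_matches:
                continue
            if require_strict and not strict_ok(box35, box10):
                continue
            hits.append((i + 1, j + 1, score))
    return hits
```

Under these assumptions, the relaxed conditions in the text correspond roughly to `find_promoters(seq, min_matches=7, spacer=(15, 23), require_strict=False)`. A complete search would also scan the reverse complement of the sequence.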
[Figure: nucleotide sequence of the 4,952-bp urease region with the deduced amino acid sequences of the six ORFs. Figure not reproduced.]

FIG. 2. Nucleotide sequence of the P. mirabilis urease genes. Numbers above the sequence indicate nucleotide positions. Predicted amino acid sequences, in sequential order, for UreD (bp 441 to 1262), UreA (bp 1287 to 1586), UreB (bp 1598 to 1924), UreC (bp 1924 to 3624), UreE (bp 3655 to 4137), and UreF (bp 4168 to 4782) are shown below the DNA sequence. Putative ribosome-binding sequences (Shine-Dalgarno [S.D.] sites) are shown above the DNA sequence preceding each gene.

Two sequences were found upstream of ureD (-35, bp 307, and -10, bp 335; -35, bp 354, and -10, bp 381), one upstream of ureA (-35, bp 1232; -10, bp 1261), and two upstream of ureF (-35, bp 4019, and -10, bp 4044; -35, bp 4107, and -10, bp 4136). No promoterlike sequences were found upstream of the ureE ORF under these conditions; we neither expected nor found promoterlike regions upstream of ureB and ureC, since these genes are transcribed with ureA on a single mRNA (15).

Urease is known to have a role in the nitrogen regulation pathway of some microorganisms, such as the bacterium Klebsiella aerogenes (13) and the fungus Aspergillus nidulans (19). The regions upstream of each ORF in the P. mirabilis gene complex were searched for nitrogen regulation sites with the sequence TGGYARN4TTGCA (2), where Y is a pyrimidine and R is a purine. A site upstream of the ureA locus at bp 1221 matched 13 of 16 bases in the sequence. Preliminary physiological experiments indicated, however, that the operon was not under nitrogen regulation control (E. Nicholson, G. Chippendale, and H. Mobley, Abstr. Annu. Meet. Am. Soc. Microbiol. 1989, H126, p. 190). Another possible mechanism for regulation of the operon was the CRP-cyclic AMP cascade. However, we found no sequences similar to the catabolite repressor protein-binding site (AANTGTGAN2TN4CA) (10) in the putative promoter regions of any of the cistrons.

Sequences downstream of each cistron were analyzed for transcription termination signals similar to those established for E. coli genes (29). Characteristically, these rho-independent signals form a stem-loop structure in the mRNA followed by a string of uridylates. Regions downstream of ureD (bp 1306 to 1328), ureC (bp 3791 to 3809), and ureF (bp 4804 to 4826) could form small stem-loop structures followed by 4 to 6 uridylates. No such sites were found for the ureA, ureB, or ureE ORFs.

Previous work had not delineated the ends of the urease gene complex. By deleting upstream sequences, we demonstrated that DNA 5' to the ureD ORF is unnecessary for an active urease: pMID1003 was partially digested with ClaI to form a linear plasmid, then digested with AccI and religated. The resulting plasmid produced active enzyme at the same level as the parent plasmid. In addition, we confirmed that DNA sequences downstream of the 23.0-kDa ORF were not required for urease activity.
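As a rough illustration of the rho-independent terminator criterion above (a stem-loop followed by a run of uridylates, which appear as T's on the DNA coding strand), the scan below flags perfect inverted repeats followed closely by at least four T's. It is a hypothetical sketch, not the authors' method: it accepts only Watson-Crick pairs (no G-U wobble, no mismatches), ignores stem free energy, and its stem, loop, and gap limits are arbitrary assumptions.

```python
# Watson-Crick partners only; G-U wobble pairs are deliberately ignored here.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def forms_stem(seq, start, stem, loop):
    """True if seq[start:start+stem] pairs with the arm that follows a loop of `loop` bases."""
    left = seq[start:start + stem]
    right = seq[start + stem + loop:start + 2 * stem + loop]
    if len(left) < stem or len(right) < stem:
        return False
    # The 3' arm, read backwards, must be the base-by-base complement of the 5' arm.
    return all(PAIR.get(a) == b for a, b in zip(left, reversed(right)))


def t_run_length(seq, pos):
    """Number of consecutive T's (U's in the transcript) starting at pos."""
    n = 0
    while pos + n < len(seq) and seq[pos + n] == "T":
        n += 1
    return n


def find_terminators(seq, min_stem=5, max_stem=10, loops=range(3, 9), min_t=4, max_gap=2):
    """Return (start, end, stem, loop, t_run) tuples with 1-based inclusive coordinates.

    start is the first base of the 5' stem arm; end is the last T of the T run.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq)):
        for stem in range(min_stem, max_stem + 1):
            for loop in loops:
                if not forms_stem(seq, start, stem, loop):
                    continue
                after = start + 2 * stem + loop  # 0-based index just past the 3' arm
                for gap in range(max_gap + 1):
                    run = t_run_length(seq, after + gap)
                    if run >= min_t:
                        hits.append((start + 1, after + gap + run, stem, loop, run))
                        break
    return hits
```

Overlapping and shifted stems around a single hairpin are all reported, so a practical screen would rank the candidates, for example by predicted folding energy.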
[Figure: physical map of the urease gene complex. Figure not reproduced.]

FIG. 3. Physical map of the urease gene complex. The rectangular boxes labeled D, A, B, C, E, and F indicate the physical positions in the operon of the ure ORFs; the number beneath each box is the predicted molecular size of the polypeptide (31.0, 11.0, 12.2, 61.0, 17.9, and 23.0 kDa, respectively). The lines with arrows beneath the map indicate the direction and predicted length of each transcript. Two putative promoter sites were found upstream of both ureD and ureF (see text for positions). Restriction endonuclease sites are indicated above the line. P, promoter; SD, Shine-Dalgarno site.
BalI deletion of pMID1003, which removed all sequences downstream of the urease operon, including a portion of the vector and the last 12 codons of the UreF protein, was constructed. This deletion plasmid also conferred an active urease phenotype on E. coli HB101, although enzyme activity was two- to threefold lower than that of the parent plasmid, presumably because of the truncated UreF protein (Fig. 3). The G+C content of the P. mirabilis urease gene sequences (43%) was not significantly different from the previously determined G+C content of the genomic DNA (39%) (12).

Predicted amino acid sequence features. Isoelectric points (pIs) were calculated for each polypeptide from its predicted amino acid composition. The ureA, ureC, ureD, ureE, and ureF ORFs encoded acidic proteins (pIs of 5.8, 5.4, 6.3, 6.0, and 4.9, respectively), whereas the ureB product was basic (pI 9.0). The UreB polypeptide contained no cysteine residues, and the UreE protein had a string of eight histidine residues at its COOH terminus. Other than these two features, nothing was remarkable about the amino acid compositions of the proteins, which are shown in Table 1. Figure 4 shows the hydropathy plot of each polypeptide, calculated by the method of Kyte and Doolittle (16). The plots for the three structural subunits were consistent with those of cytoplasmic polypeptides. The plots for UreD and UreE contained both hydrophilic and hydrophobic regions, while the plot for UreF contained two large hydrophobic regions, residues 1 to 25 and residues 168 to 190, which indicated possible membrane-spanning domains. Examination of the predicted N-terminal regions of the polypeptides revealed a possible signal sequence in the UreE protein (20, 27, 32, 35). This region had the general properties of a leader sequence: charged residues near the N terminus followed by eight nonpolar, hydrophobic residues. These were followed by a short-side-chain amino acid (alanine) five residues from a serine at residue 18, the putative cleavage site. We also searched the predicted amino acid sequences for metal-binding sites (C-X2-C-X3-F-X5-L-X2-H-X3-H) (11) and ATP-binding sites (GKGGVGKT) (36); no matches were found in any of the polypeptides.

Sequence homology. The NBRF-PIR protein data base was searched for similarities to the deduced amino acid sequence of each ORF. The deduced amino acid sequences of the ureA, ureB, and ureC ORFs were highly similar to the amino acid sequence of jack bean urease (Fig. 5). No striking similarity was found between UreD, UreE, or UreF and any protein sequence in the data base. Closer examination of the similarity between the jack bean urease subunit and the three subunits of the P. mirabilis urease revealed that the P. mirabilis subunits aligned with the jack bean subunit in a nonoverlapping fashion, in the order in which they are transcribed: UreA, UreB, and UreC.
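The Kyte-Doolittle hydropathy analysis described above is a sliding-window average of per-residue hydropathy values. The sketch below uses the published Kyte-Doolittle scale; the 19-residue window and the 1.6 cutoff are the values Kyte and Doolittle suggested for spotting membrane-spanning segments and are assumptions here, since the window used for Fig. 4 is not stated. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index, one-letter amino acid codes.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of each complete window.

    Element i of the result averages residues i+1..i+window (1-based).
    """
    values = [KD_SCALE[aa] for aa in protein.upper()]
    if window > len(values):
        return []
    total = sum(values[:window])
    profile = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one residue
        profile.append(total / window)
    return profile


def hydrophobic_segments(protein, window=19, cutoff=1.6):
    """Merge consecutive windows whose mean exceeds cutoff into 1-based residue ranges."""
    profile = hydropathy_profile(protein, window)
    segments, start = [], None
    for i, h in enumerate(profile):
        if h > cutoff and start is None:
            start = i
        elif h <= cutoff and start is not None:
            segments.append((start + 1, i - 1 + window))
            start = None
    if start is not None:
        segments.append((start + 1, len(profile) - 1 + window))
    return segments
```

Because each reported range is the union of all windows above the cutoff, it extends somewhat beyond the hydrophobic core itself. Applied to a deduced sequence taken from Fig. 2, the output could be compared with the hydrophobic regions reported for UreF (residues 1 to 25 and 168 to 190).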
UreA (100 amino acids) aligned with the first 100 amino acids of the jack bean subunit (840 amino acids) (17). Following a gap of 28 amino acids, UreB (109 amino acids) aligned with the next 109 amino acids, followed by a gap of 33 amino acid residues. Lastly, the UreC polypeptide (567 amino acids) matched with the last 567 amino acids of the jack bean urease subunit with no unmatched amino acids at
TABLE 1. Predicted amino acid compositions of UreA, UreB, UreC, UreD, UreE, and UreF

Mol% (no. of amino acid residues) of:
UreA (100)   UreB (109)   UreC (567)   UreD (274)   UreE (161)   UreF (205)
8.00 (8)     9.17 (10)    9.17 (52)    7.30 (20)    6.21 (10)    9.27 (19)
10.00 (10)   8.26 (9)     8.11 (46)    6.57 (18)    7.45 (12)    5.85 (12)
12.00 (12)   6.42 (7)     5.82 (33)    10.95 (30)   12.42 (20)   15.12 (31)
5.00 (5)     5.50 (6)     9.52 (54)    4.74 (13)    3.11 (5)     3.90 (8)
5.00 (5)     3.67 (4)     5.47 (31)    6.57 (18)    3.73 (6)     4.39 (9)
5.00 (5)     2.75 (3)     3.70 (21)    2.92 (8)     1.86 (3)     3.41 (7)
2.00 (2)     5.50 (6)     2.65 (15)    4.74 (13)    1.86 (3)     2.44 (5)
0.00 (0)     0.00 (0)     0.88 (5)     2.55 (7)     0.62 (1)     3.90 (8)
7.00 (7)     11.01 (12)   10.76 (61)   8.03 (22)    9.32 (15)    5.85 (12)
4.00 (4)     2.75 (3)     3.53 (20)    2.92 (8)     4.97 (8)     6.83 (14)
7.00 (7)     3.67 (4)     6.35 (36)    7.30 (20)    6.21 (10)    5.37 (11)
2.00 (2)     0.00 (0)     1.59 (9)     1.82 (5)     1.86 (3)     1.95 (4)
1.00 (1)     2.75 (3)     2.65 (15)    3.65 (10)    3.11 (5)     1.95 (4)
1.00 (1)     4.59 (5)     3.35 (19)    2.19 (6)     0.62 (1)     0.49 (1)
3.00 (3)     2.75 (3)     3.00 (17)    5.84 (16)    4.35 (7)     8.29 (17)
3.00 (3)     2.75 (3)     5.82 (33)    2.92 (8)     6.83 (11)    4.39 (9)
11.00 (11)   10.09 (11)   5.82 (33)    6.93 (19)    7.45 (12)    6.83 (14)
7.00 (7)     6.42 (7)     4.76 (27)    4.01 (11)    6.21 (10)    4.39 (9)
6.00 (6)     8.26 (9)     3.70 (21)    4.74 (13)    4.35 (7)     3.90 (8)
1.00 (1)     3.67 (4)     3.35 (19)    3.28 (9)     7.45 (12)    1.46 (3)
[Fig. 4 panels: UreD, UreA, UreB, UreC, UreE, and UreF; vertical axis, hydrophilicity (HW); horizontal axis, amino acid number]
FIG. 4. Predicted hydropathy profiles for each of the six urease polypeptides. The numbered horizontal axis under each panel represents the amino acid number. The left vertical axis indicates the relative hydrophilicity (positive ordinate) or hydrophobicity (negative ordinate). Plotted is the calculated hydropathy value for a window of nine amino acids as the frame moves consecutively one amino acid at a time toward the C terminus (16). HW, Hopp and Woods analysis.
the carboxy termini of either polypeptide. There were a total of 446 exact amino acid matches, giving 57.5% identity. When conservative replacements were also counted, 567 matches were found, giving 73.2% similarity. On the hypothesis that the three bacterial urease subunits evolved into the single jack bean urease subunit, we searched the regions between the structural subunit ORFs for sequences resembling the consensus splice sites required for intron splicing in eucaryotic genes. This search revealed a sequence between the ureA and ureB ORFs with a high percentage of matches to the splice-acceptor consensus sequence YYYYYYYYYYYNYAGG (24), where Y is a pyrimidine residue and N is any nucleotide. Inspection of the junction between the ureB and ureC ORFs revealed that the last codon of the ureB ORF and the start codon of the ureC ORF share a single nucleotide (Fig. 6).

DISCUSSION

We have presented the nucleotide sequence of the chromosomally encoded urease operon, which was cloned from P. mirabilis HI4320, an isolate cultured from the urine of a patient with bacteriuria. The urease operon encodes an inducible, high-molecular-weight Ni2+ metalloenzyme with a complex subunit structure. We sequenced a 4,952-bp region of DNA that was sufficient for expression of an active
[Fig. 5 graphic: P. mirabilis urease subunits A, B, and C aligned above the jack bean urease sequence]
FIG. 5. Amino acid sequence similarity between P. mirabilis urease subunits and jack bean urease subunit. The letters A, B, and C above the lines refer to the structural subunits of the P. mirabilis urease encoded by ureA, ureB, and ureC, respectively. Numbers above and below the horizontal lines represent amino acid positions. The sequences of the three P. mirabilis subunits were combined and numbered sequentially to facilitate analysis. Black vertical lines between the sequences represent an exact amino acid match or conservative replacement.
urease protein. The sequence revealed that the operon encodes six ORFs, which were named ureA, ureB, ureC, ureD, ureE, and ureF. The polypeptides UreA, UreB, UreC, UreD, and UreF were required for enzyme activity in E. coli HB101. The molecular sizes calculated from the deduced amino acid sequences were similar to those of polypeptides previously identified as products of the urease operon (15, 37). Each of these polypeptides had previously been mapped by Tn5 transposon insertions to a position within the operon that matched the position of the corresponding ORF revealed by sequencing (15, 37). The similarity between the P. mirabilis urease genes that we sequenced and those cloned by Walz et al. (37) was not surprising, since Southern hybridization has recently shown that the urease genes of nearly 100 different isolates of P. mirabilis are
[Fig. 6 graphic: map of ureD, ureA, ureB, ureC, ureE, and ureF with a 1-kb scale bar; (A) the ureA-ureB junction, with the splice-acceptor-like region labeled; (B) the ureB-ureC junction, showing the end of ureB overlapping the start codon of ureC]
FIG. 6. DNA sequence characteristics at the junctions of the structural genes. (A) Junction between ureA and ureB. A region of DNA similar to a eucaryotic mRNA splice-acceptor site (13 of 16 nucleotides) (24), near the end of the ureA ORF and extending into the untranslated nucleotides, is underlined. (B) Junction between ureB and ureC. The last nucleotide of the last codon for ureB is the first nucleotide of the start codon for ureC. Letters in the rectangles refer to the polypeptides. Letters above and below the DNA sequence are standard single-letter amino acid codes.
highly conserved with respect to specific urease gene restriction fragments recognized by DNA probes (H. L. T. Mobley and G. Chippendale, submitted for publication). We have demonstrated (15) that the gamma, beta, and alpha subunits (11.0, 12.2, and 61.0 kDa, respectively) are the three structural subunits of the urease enzyme, are transcribed on a single mRNA, and are translated in order from the smallest to the largest subunit. This explains why a single transposon insertion in ureA or ureB had a polar effect on translation of the downstream ORFs. It is unclear whether ureD is encoded on the same transcript as the structural subunits or is translated from a separate mRNA. We were unable to determine whether transposon insertions in ureD exert a downstream polar effect on translation of the structural subunits. However, the DNA sequence just downstream of the stop codon of the 31.0-kDa polypeptide (bp 1306 to 1328) resembles the rho-independent transcriptional terminators found in many E. coli genes. If ureD transcription terminated at this point, the enzyme subunits would necessarily be transcribed on a separate message. In contrast, the 23.0-kDa polypeptide was produced from its own promoter: insertion of a transposon in the ureE ORF (37), which would have had a downstream polar effect on transcription of ureF if ureE and ureF shared the same transcript, did not affect urease expression. Therefore, transcription of the ureF cistron, which is required for urease activity, begins downstream of the 17.9-kDa ORF. No promoter could be found for the ureE ORF, and the 17.9-kDa gene product is not required for urease activity in the recombinant host E. coli HB101, as shown by Walz et al. (37). However, involvement of this polypeptide in some aspect of ureolysis cannot be ruled out, because loss of this protein in the recombinant host may be complemented in trans by a homologous E. coli protein.
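Both junction features shown in Fig. 6 can be checked with a short script. The Python sketch below is illustrative only: the helper names are hypothetical, the codon table covers just the five codons involved, and JUNCTION is an assumed reconstruction of the Fig. 6B sequence from its codon labels (the end of ureB, GAG AAA AAA TGA, overlapping the ureC start ATG AAA ACT), not a sequence quoted in the text. consensus_matches scores a 16-nucleotide site against the splice-acceptor consensus YYYYYYYYYYYNYAGG (24), the kind of count behind the "13 of 16 nucleotides" in Fig. 6A; translate shows the single shared adenosine and how one extra A in the adenosine run (a stand-in for the insertion after bp 1924 proposed in the Discussion) fuses the two reading frames.

```python
# IUPAC symbols used in the splice-acceptor consensus: Y = C or T, N = any.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "N": "ACGT"}
SPLICE_ACCEPTOR = "YYYYYYYYYYYNYAGG"

# Only the codons needed for the junction below; not a full genetic code.
CODONS = {"GAG": "E", "AAA": "K", "TGA": "*", "ATG": "M", "ACT": "T"}

# Assumed reconstruction of the Fig. 6B junction: ureB ends GAG AAA AAA TGA,
# and the ureC start codon ATG begins at the last A of the final AAA codon.
JUNCTION = "GAGAAAAAATGAAAACT"
UREC_START = 8  # 0-based index of the shared adenosine


def consensus_matches(site: str, consensus: str = SPLICE_ACCEPTOR) -> int:
    """Count positions of site that satisfy the IUPAC consensus."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(base in IUPAC[code] for base, code in zip(site.upper(), consensus))


def translate(dna: str) -> str:
    """Translate frame 0 codon by codon, stopping after the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODONS[dna[i:i + 3]]
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


print(translate(JUNCTION))               # EKK*   (ureB frame ends at TGA)
print(translate(JUNCTION[UREC_START:]))  # MKT    (ureC frame from the shared A)
# One extra A in the adenosine run shifts the ureB frame into the ureC frame:
fused = JUNCTION[:UREC_START + 1] + "A" + JUNCTION[UREC_START + 1:]
print(translate(fused))                  # EKKMKT (read-through, no stop)
```

Because the extra adenosine lands in a run of six A residues, inserting it anywhere in that run gives the same fused sequence, so the sketch does not depend on the exact insertion coordinate.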
Current studies aim to characterize the expression and regulation of UreD, UreE, and UreF and to identify the functions they perform. Possible roles under investigation for these proteins include urea transport, nickel transport, nickel insertion, and enzyme assembly. The enzyme is composed of three different subunits, previously designated gamma, beta, and alpha, that are encoded by ureA, ureB, and ureC, respectively, and have predicted molecular sizes of 11.0, 12.2, and 61.0 kDa. We propose that the names of the polypeptide subunits be changed to be consistent with the genetic designations of the individual cistrons, so that future confusion will be
avoided when referring to the operon and its translation products. The gamma subunit will become UreA, the beta subunit UreB, and the alpha subunit UreC (Fig. 3). The general structure of one large subunit and two smaller subunits has been observed in other bacterial ureases, with few exceptions. The enzymes of Selenomonas ruminantium, Klebsiella aerogenes, Sporosarcina ureae, P. mirabilis (6, 34), Ureaplasma urealyticum (33), Providencia stuartii (25), and M. morganii (Hu et al., Abstr. Annu. Meet. Am. Soc. Microbiol. 1989) have all been shown to have this subunit structure. In contrast, ureases with only a single large subunit have been reported for Bacillus pasteurii (8), Brevibacterium ammoniagenes (26), and Spirulina maxima (7). One possible explanation for this difference is that the small subunits were overlooked on low-percentage polyacrylamide gels. A true exception to this subunit structure in bacteria appears to be the urease from Campylobacter pylori, which has been reported to have only two subunits, of 65 and 31 kDa (9; L. Hu and H. L. T. Mobley, unpublished data). Assuming that the P. mirabilis urease is similar to the ureases produced by K. aerogenes and Providencia stuartii, the probable stoichiometry of the native enzyme would be two large subunits and four of each of the smaller subunits, giving a native molecular mass of approximately 215 kDa. Perhaps the most surprising and interesting result was the high percentage of similarity between the three subunits of the P. mirabilis urease and the subunit of the jack bean urease. This similarity suggests an evolutionary relationship between the eucaryotic jack bean urease and the procaryotic P. mirabilis urease. Interestingly, sequences very similar to the intron splice-acceptor consensus sequence were found in the DNA between the ureA and ureB ORFs of P. mirabilis.
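The 215-kDa estimate follows from the predicted subunit sizes under the proposed stoichiometry of two large subunits (UreC) and four of each small subunit (UreA and UreB). The 57.5% identity reported in Results is likewise reproduced if, as assumed here, the 446 exact matches are divided by the combined length of the three P. mirabilis subunits (100 + 109 + 567 = 776 residues), the sequence numbered sequentially in Fig. 5:

$$
2(61.0) + 4(11.0) + 4(12.2) = 122.0 + 44.0 + 48.8 = 214.8 \approx 215\ \text{kDa}
$$

$$
\frac{446}{100 + 109 + 567} = \frac{446}{776} \approx 0.575 = 57.5\%
$$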
One could speculate that this region is a remnant of sequences that allowed ancestral genes of these two cistrons to be spliced, resulting in a fused UreA-UreB subunit. Examination of the junction between the ureB and ureC cistrons showed that the two ORFs share a single nucleotide: the third nucleotide (adenosine) of the codon for the last amino acid of UreB is also the first nucleotide of the UreC start codon. Further evolution from the hypothetical fused ureA-ureB ORF and the remaining ureC gene to a single urease ORF could occur most simply by insertion of an adenosine residue after bp 1924, producing a frameshift that would allow translation of a single large urease subunit of 83.5 kDa. Sequence analysis of the P. mirabilis urease gene complex provided information that will be valuable for future study of the operon. The coordinates of each ORF were determined, making it possible to isolate and study specific gene-polypeptide relationships and to identify the function of each gene product. The location and regulation of promoters can be investigated to determine the transcriptional organization of the operon and to provide insight into how the operon is controlled during pathogenesis.

ACKNOWLEDGMENTS
This work was supported in part by Public Health Service grants AI23328 and AG04393 from the National Institutes of Health. We thank Merrill Snyder and Robert Hausinger for editorial review and Jim Kaper for assistance with data analysis.

LITERATURE CITED

1. Ansorge, W., and S. Labeit. 1984. Field gradients improve resolution on DNA sequencing gels. J. Biochem. Biophys. Methods 10:237-243.
2. Ausubel, F. M. 1984. Regulation of nitrogen fixation genes. Cell 37:5-6.
3. Ausubel, F. M., R. Brent, R. E. Kingston, D. D. Moore, J. A. Smith, J. G. Seidman, and K. Struhl (ed.). 1987. Current protocols in molecular biology, p. 1.1.3. Greene Publishing Associates and John Wiley & Sons, Inc., New York.
4. Birnboim, H. C., and J. Doly. 1979. A rapid alkaline extraction procedure for screening recombinant plasmid DNA. Nucleic Acids Res. 7:1513-1523.
5. Braude, A. I., and J. Siemienski. 1960. Role of bacterial urease in experimental pyelonephritis. J. Bacteriol. 80:171-179.
6. Breitenbach, J. M., and R. P. Hausinger. 1988. Proteus mirabilis urease: partial purification and inhibition by boric and boronic acids. Biochem. J. 250:917-920.
7. Carvajal, N., M. Fernandez, J. P. Rodriguez, and M. Donoso. 1982. Urease of Spirulina maxima. Phytochemistry 21:2821-2823.
8. Christians, S., and H. Kaltwasser. 1986. Nickel-content of urease from Bacillus pasteurii. Arch. Microbiol. 145:51-55.
9. Clayton, C. L., B. W. Wren, P. Mullany, A. Topping, and S. Tabaqchali. 1989. Molecular cloning and expression of Campylobacter pylori species-specific antigens in Escherichia coli K-12. Infect. Immun. 57:623-629.
10. Ebright, R. H. 1982. Sequence homologies in the DNA of six sites known to bind to the catabolite activator protein of Escherichia coli, p. 91-99. In J. P. Griffin and W. L. Duax (ed.), Molecular structure and biological activity. Elsevier Science Publishing, Inc., New York.
11. Evans, R. M., and S. M. Hollenberg. 1988. Zinc fingers: gilt by association. Cell 52:1-3.
12. Fasman, G. (ed.). 1976. CRC handbook of biochemistry and molecular biology, nucleic acids, vol. 2, p. 104-114. CRC Press, Inc., Cleveland, Ohio.
13. Friedrich, B., and B. Magasanik. 1977. Urease of Klebsiella aerogenes: control of its synthesis by glutamine synthetase. J. Bacteriol. 131:446-452.
14. Griffith, D. P., D. M. Musher, and C. Itin. 1976. Urease. The primary cause of infection-induced urinary stones. Invest. Urol. 13:346-350.
15. Jones, B. D., and H. L. T. Mobley. 1988. Proteus mirabilis urease: genetic organization, regulation, and expression of structural genes. J. Bacteriol. 170:3342-3349.
16. Kyte, J., and R. F. Doolittle. 1982. A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157:105-132.
17. Mamiya, G., K. Takishima, M. Masakuni, T. Kayumi, K. Ogawa, and T. Sekita. 1985. Complete amino acid sequence of jack bean urease. Proc. Jpn. Acad. 61:395-398.
18. Maniatis, T., E. F. Fritsch, and J. Sambrook. 1982. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
19. Marzluff, G. A. 1981. Regulation of nitrogen metabolism and gene expression in fungi. Microbiol. Rev. 45:437-461.
20. Michaelis, S., and J. Beckwith. 1982. Mechanism of incorporation of cell envelope proteins in Escherichia coli. Annu. Rev. Microbiol. 36:435-465.
21. Mobley, H. L. T., and R. P. Hausinger. 1988. Microbial ureases: significance, regulation, and molecular characterization. Microbiol. Rev. 53:85-108.
22. Mobley, H. L. T., B. D. Jones, and A. E. Jerse. 1986. Cloning of urease gene sequences from Providencia stuartii. Infect. Immun. 54:161-169.
23. Mobley, H. L. T., and J. W. Warren. 1987. Urease-positive bacteriuria and obstruction of long-term urinary catheters. J. Clin. Microbiol. 25:2216-2217.
24. Mount, S. M. 1982. A catalogue of splice junction sequences. Nucleic Acids Res. 10:459-472.
25. Mulrooney, S. B., M. J. Lynch, H. L. T. Mobley, and R. P. Hausinger. 1988. Purification, characterization, and genetic organization of recombinant Providencia stuartii urease expressed by Escherichia coli. J. Bacteriol. 170:2202-2207.
26. Nakano, H., S. Takenishi, and Y. Watanabe. 1984. Purification and properties of urease from Brevibacterium ammoniagenes. Agric. Biol. Chem. 48:1495-1502.
27. Perlman, D., and H. O. Halvorson. 1983. A putative signal peptidase recognition site and sequence in eucaryotic and procaryotic signal peptides. J. Mol. Biol. 167:391-409.
28. Pustell, J., and F. C. Kafatos. 1984. A convenient and adaptable package of computer programs for DNA and protein sequence management, analysis, and homology determination. Nucleic Acids Res. 12:643-655.
29. Rosenberg, M., and D. Court. 1979. Regulatory sequences involved in the promotion and termination of RNA transcription. Annu. Rev. Genet. 13:319-353.
30. Rubin, R. H., N. E. Tolkoff-Rubin, and R. S. Cotran. 1986. Urinary tract infection, pyelonephritis, and reflux nephropathy, p. 1085-1141. In B. M. Brenner and F. C. Rector (ed.), The kidney. The W. B. Saunders Co., Philadelphia.
31. Shine, J., and L. Dalgarno. 1974. The 3'-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. USA 71:1342-1346.
32. Silhavy, T., S. Benson, and S. Emr. 1983. Mechanisms of protein localization. Microbiol. Rev. 47:313-344.
33. Thirkell, D., A. D. Myles, B. L. Precious, J. S. Frost, J. C. Woodall, M. G. Burdon, and W. C. Russell. 1989. The urease of Ureaplasma urealyticum. J. Gen. Microbiol. 135:315-323.
34. Todd, M. J., and R. P. Hausinger. 1987. Purification and characterization of the nickel-containing multicomponent urease from Klebsiella aerogenes. J. Biol. Chem. 262:5963-5967.
35. Von Heijne, G. 1983. Patterns of amino acids near signal-sequence cleavage sites. Eur. J. Biochem. 133:17-21.
36. Walker, J. E., M. Saraste, M. J. Runswick, and N. J. Gay. 1982. The ATP operon-nucleotide-sequence of the genes for the gamma-subunit, beta-subunit, epsilon-subunit of Escherichia coli ATP synthase. EMBO J. 1:945-951.
37. Walz, S. E., S. K. Wray, S. I. Hull, and R. A. Hull. 1988. Multiple proteins encoded within the urease gene complex of Proteus mirabilis. J. Bacteriol. 170:1027-1033.
38. Warren, J. W., D. Damron, J. H. Tenney, J. M. Hoopes, B. Deforge, and H. L. Muncie, Jr. 1987. Fever, bacteremia, and death as complications of bacteriuria in women with long-term urethral catheters. J. Infect. Dis. 155:1151-1158.