Nucleic Acids Research, Volume 13 Number 7 1985
Structure and function of the region of the replication origin of the Bacillus subtilis chromosome. III. Nucleotide sequence of some 10,000 base pairs in the origin region
Shigeki Moriya, Naotake Ogasawara and Hiroshi Yoshikawa*
Cancer Research Institute, Kanazawa University, 13-1, Takaramachi, Kanazawa 920, Japan
Received 12 January 1985; Revised and Accepted 15 March 1985
ABSTRACT
Approximately 10,000 nucleotides were sequenced in the oriC region of the Bacillus subtilis chromosome. The first replicating DNA strands hybridize with a SalI-EcoRI fragment (nucleotide #1206-2954) in one direction (left to right) and an EcoRI-PstI fragment (#2949-4233) in the other. Seven open reading frames (ORF) accompanied by Shine-Dalgarno (SD) sequences were identified. ORF638 and ORF821 were identified as the gyrB and gyrA genes, respectively, based on genetic evidence and amino acid sequence data. Comparison of amino acid sequences revealed that ORF44, ORF446, ORF378 and ORF323 are homologous with rpmH, dnaA, dnaN and recF of Escherichia coli, respectively. Thus, the organization of the ORFs from ORF44 to ORF638 resembles the organization of genes in the rpmH-gyrB region of the E. coli chromosome. Two non-coding regions characteristic for oriC signals were found near the site of initiation of the first replicating DNA. They are composed of repeating sequences whose consensus sequence, TTAT(C/A)CACA, is identical to that of 4 repeating sequences in the oriC of E. coli.

INTRODUCTION
Two definitions have been associated with the replication origin of bacterial chromosomes, oriC. The first is the site of initiation of DNA synthesis and the second is a minimal sequence essential for autonomous replication. In Escherichia coli a 245 base pair (bp) sequence was identified as the autonomously replicating sequence of the chromosome (1). However, the exact site of initiation of DNA replication is obscure both in vivo and in vitro. We have recently determined the site of initiation of DNA synthesis in vivo on each complementary strand of the Bacillus subtilis chromosome, using hybridization between the first replicating DNA and cloned single-stranded DNA from the oriC region of the chromosome (2). On the other hand, attempts to isolate an autonomously replicating sequence from the region have not been successful (3).
Sequences near the oriC may be inhibitory for replication of plasmid molecules when they are inserted together with the oriC sequence (4). Alternatively an intact organization of the genes and
© IRL Press Limited, Oxford, England.
[Figure 1 map: scale bar 5 kb; markers recF, gyrB, gyrA, spoOJ, guaA and rrnO (16S, 23S, 5S); EcoRI fragments including E4, E19, E20, E21 and E27; cloned fragments chBS03, chBS01, pSM2001, pSM2003, pNO1001 and pMS102'B7.]
Figure 1. Genetic and physical map of the oriC region. EcoRI fragments used in this study are shown. In addition, P: PstI, S: SalI and B: BamHI sites are shown. Location of spoOJ, guaA (unpublished result) and other genetic markers (7,8) was determined by transformation using each of the fragments indicated on the map. Location of a ribosomal RNA operon rrnO was determined by sequence determination (5). Position of ori is the site of synthesis of the first DNA strands determined by hybridization (2). Open bars under the map indicate fragments cloned in Charon phage (chBS) or in pBR plasmid (pSM, pMS and pNO) vectors.
inter-gene-sequences near the oriC may be essential for constructing a specific higher order structure of the chromosome to activate oriC. To examine these possibilities and to identify the structure and function of oriC, we considered it essential to determine the nucleotide sequence of the chromosomal segment containing the oriC region. We have now sequenced approximately 10,000 bp in the oriC region. There are 7 open reading frames (ORF), including ones for recF, gyrB and gyrA. Comparison of amino acid sequences of these ORFs with those of known genes in E. coli revealed that the organization of the ORFs from ORF44 to ORF638 resembles the organization of genes in the rpmH-gyrB region of the E. coli chromosome. In addition, two inter-gene-sequences are found very close to or at the oriC. They contain remarkable sequences which may function as recognition sequences for regulatory proteins.

MATERIALS AND METHODS
Bacterial strains and plasmids. E. coli C600 was used for cloning of B. subtilis chromosomal fragments. E. coli JM103 was used as a host for M13 phage vectors. Plasmids used in this study are shown in Fig. 1.
Chemicals. α-32P-dATP (PB10384, 800 Ci/mmole) and the M13 sequencing kit were purchased from Amersham International Ltd (Amersham, UK). T4 DNA ligase and restriction endonucleases were from Takara Shuzo Co. Ltd. (Kyoto, Japan) or
[Figure 2 map: scale 0 to 9 kb; enzymes labelled on the map include EcoRI, XhoI, PvuII, HindIII, PstI, AluI, Sau3A, TaqI and HinfI.]
Figure 2. Strategy for determination of nucleotide sequence. Restriction sites for various enzymes within E5, E20, E6 and E19 are shown. A XhoI site within E5 is taken as the start point of numbering nucleotides of the sequence. Length and direction of fragments sequenced in this study are shown by arrows.
from Nippon Gene Co. Ltd. (Toyama, Japan).
Determination of the nucleotide sequence. Cloning of fragments used for sequencing is described in the text. These fragments were cut out from the vector and purified by digesting with appropriate restriction enzymes and fractionating through a low-melting agarose gel. Each fragment extracted from the gel was recloned into M13 phages (mp8, mp9, mp10 or mp11) and the nucleotide sequence of the inserted fragment was determined by the dideoxy chain termination method according to the procedure specified by the supplier of the kit. Exceptionally, the fragments within E5 were cloned into M13 directly from Charon BS03 (Fig. 1) because cloning of E5 into pBR vectors caused structural changes in the fragment.

RESULTS
Strategy for sequencing the oriC region of over 10,000 base pairs. The region of the replication origin of the B. subtilis chromosome (oriC) was first cloned in bacteriophage λ Charon vectors to construct a map of restriction sites covering over 45 kilo base pairs (kbp) (5). Using these
[Figure 3 sequence listing: nucleotides from about -600 to about 10,000 of the oriC region with the deduced amino acid sequences of the ORFs; SD sequences, -35 and -10 promoter elements and the start of 16S RNA are marked.]
Figure 3. Nucleotide sequence of the oriC region. The nucleotide number 1 is the XhoI site in E5 fragment as in Fig. 2. Amino acid sequences are also shown for ORFs accompanied by SD sequences. Conserved amino acids in homologous proteins between B. subtilis and E. coli are underlined. Regions of initiation sites for transcription determined by the S1-mapping method (see the accompanying paper) are indicated by bent arrows. Possible promoter and SD sequences are boxed. Putative termination signals are indicated by open arrows. Thick black arrows indicate the characteristic repeats with TTAT(C/A)CACA as a consensus sequence.

cloned fragments as well as fragments sub-cloned into pBR and M13 vectors, the site of initiation of DNA replication was determined within a 3 kbp region (Fig. 1) (2). To determine the nucleotide sequence around the origin of replication, a more detailed map of restriction sites was constructed as
Figure 4. Open reading frames in the oriC region. All possible coding frames which code for more than 40 amino acids are shown on the EcoRI map of the oriC region. Above the map: frames with direction of transcription oriented from left to right. Below the map: frames oriented from right to left. The frames with typical strong SD sequences are indicated by black blocks. Number of amino acids is shown for each frame. Number of amino acids in the black blocks is shown in parentheses. Possible functional open reading frames (ORF) are named by the number of amino acids coded by the frames.

shown in Fig. 2. Using these restriction sites, fragments cloned in Charon or pBR vectors were sub-cloned into M13 phages for sequencing. Length and direction of the fragments sequenced by the dideoxy method are shown in Fig. 2. To ascertain the sequence data, every nucleotide was determined in both directions and more than twice using overlapping fragments. One exception is a 120 bp TaqI fragment which was cloned in M13 only in one direction. This region was sequenced three times using different fragments (Fig. 2).
Nucleotide sequence of the oriC region. Some 10 kbp were sequenced and numbered as in Fig. 3, starting from a XhoI site within an EcoRI fragment E5 and directing towards a ribosomal RNA operon rrnO (5). Possible open reading frames (ORF) for more than 40 amino acids are listed in Fig. 4. Among them only 9 frames are preceded by translational start signals (Shine-Dalgarno sequence, SD) (Table 1). Since two small ORFs, ORF43 and ORF52, are located within the larger ORFs, ORF821 and ORF638,
Table 1. SD sequences for ORF

3'-end of 16S RNA          UCUUUCCUCCACUA

ORF       Start    Sequence
ORF446     613     gaAAAGGAGGgacgtgccggaagATG
ORF378    2142     cGttAGGAGGataaaaATG
ORF 71    3409     ttgAAaGAGGTcgatataATG
ORF323    3781     ctcAtGGAGGcGATctatgtcttgtccATG
ORF638    5069     AaAgtGtAGGTGAatgacgtggctATG
ORF 52    5709     tctAAaGAGGTtgTccATG
ORF821    7196     tctAgGGAGGTttTttaATG
ORF 43    8979     ctgctGGAGGTagaaaaggGTG
ORF 44     -15     tcgAgGGAGGTGtcataaATG

Bases complementary to 16S RNA sequences are shown in capital letters. Numbers are the nucleotide number of the starting nucleotide of each ORF. Numbering is the same as in Fig. 3.
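The two screening criteria used around Table 1 (a frame coding for more than 40 amino acids, preceded by a stretch complementary to the 3'-end of 16S RNA) can be illustrated with a minimal Python sketch. It is not the procedure used in this study. The function names are hypothetical, the 16S RNA string is assumed to be written 3' to 5' as listed in Table 1, and ATG and GTG are assumed as start codons because these are the only ones appearing in the table. Only one strand is scanned; the other strand would be handled by passing its reverse complement.

```python
# 3'-end of 16S RNA as listed in Table 1 (assumed to be written 3'->5').
ANTI_SD_3_TO_5 = "UCUUUCCUCCACUA"
STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG"}  # assumption: the start codons seen in Table 1


def ideal_sd(anti_sd=ANTI_SD_3_TO_5):
    """DNA-sense sequence perfectly complementary to the 16S RNA 3'-end."""
    pair = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(pair[base] for base in anti_sd)


def longest_shared_run(a, b):
    """Longest common substring of a and b (simple dynamic programming)."""
    best = ""
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > len(best):
                    best = a[i - cur[j]:i]
        prev = cur
    return best


def sd_match(upstream):
    """Longest contiguous stretch of `upstream` complementary to 16S RNA."""
    return longest_shared_run(upstream.upper(), ideal_sd())


def orfs_on_strand(seq, min_codons=41):
    """Return (start, end) 0-based half-open spans of ORFs on one strand.

    An ORF runs from the first start codon after a stop to the next
    in-frame stop; min_codons=41 corresponds to "more than 40 amino acids".
    """
    seq = seq.upper()
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    found.append((start, i + 3))
                start = None
    return sorted(found)


if __name__ == "__main__":
    # Upstream region of ORF446 as given in Table 1 (start codon removed).
    print(sd_match("gaAAAGGAGGgacgtgccggaag"))
```

Applied to the upstream regions listed in Table 1, `sd_match` returns the contiguous complementary stretch only; isolated complementary bases that the table also capitalizes are not reported by this sketch.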
respectively, only the remaining 7 ORFs were considered for further analysis. It should be noted that 6 of these frames are coded in one of the two strands, the 5' to 3' strand (left to right of the map in Fig. 1), and are therefore oriented in the same direction. Only one small frame, ORF44, was found on the other strand, the 3' to 5' strand. Amino acid sequences were deduced from the nucleotide sequences for these 7 ORFs and are shown in Fig. 3 along with the nucleotide sequence.
Three genes, recF, gyrB and gyrA, have been mapped in the region we have sequenced (6). Recently, restriction enzyme fragments which transform these genetic markers were identified (Fig. 1) (7,8). These results show clearly that ORF638 and ORF821 correspond to gyrB and gyrA, respectively. The fragment which transforms recF includes both ORF71 and ORF323, indicating that one of the two corresponds to the recF gene. No genes are known corresponding to the location of ORF44, ORF446 and ORF378. We conclude that ORF638 is indeed the structural gene for gyrB, because 109 amino acids from the N-terminal portion of the protein coded by ORF638 show 68% homology with
those of DNA gyrase subunit B of E. coli (Fig. 3) (9). Comparison of the amino acid sequences of other ORFs with those of known genes in E. coli revealed that ORF44, ORF446, ORF378 and ORF323 are homologous with rpmH (10), dnaA (11), dnaN (12) and recF (13), respectively (see Fig. 3 for comparison). Since the relative chromosomal locations of the ORFs are identical to those of the homologous genes in the E. coli chromosome (11), the organization of the ORFs from ORF44 to ORF638 remarkably resembles the organization of genes in the rpmH-gyrB region of the E. coli chromosome. Extensive comparison of genes and their organization in the replication origin region of the chromosome of the two bacteria will be reported elsewhere. Although no information is available about the structure of DNA gyrase subunit A of the two bacteria, the molecular weight of the protein deduced from ORF821 is very close to the reported molecular weight for the partially purified enzyme (14). Properties of the proteins coded by these 7 frames are summarized in Table 2. Expression of these 7 ORFs, except for the two small frames ORF44 and ORF71, was demonstrated in E. coli using the Maxi-cell method as described in the accompanying paper.
On the 5' to 3' strand, only one sequence resembling a promoter for the major RNA polymerase (Eσ55) was found, near position 340, upstream of ORF446, while putative termination signals were found downstream of each of ORF378, ORF638 and ORF821 (see Fig. 3). ORF44 is accompanied by two promoters, one for Eσ55 and the other for Eσ29 (15), and a terminator of its own (Fig. 3). In the accompanying paper we demonstrate 6 major initiation sites for transcription in this region in vivo by the S1-mapping method.
A search was made for special features in the nucleotide sequence which may serve as a signal for initiation of DNA replication.
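A search of this kind for a degenerate 9-nucleotide motif, such as the TTAT(C/A)CACA consensus reported in the Abstract, could be sketched in Python as follows. This is an illustration under stated assumptions, not the method used in this study: the function names are hypothetical, both strands are scanned, and a site is accepted when it differs from the consensus at no more than `max_mismatch` positions.

```python
# Consensus from the text: TTAT(C/A)CACA.
# Each element lists the bases allowed at that position.
CONSENSUS = ["T", "T", "A", "T", "CA", "C", "A", "C", "A"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(site):
    """Number of positions at which `site` departs from the consensus."""
    return sum(base not in allowed for base, allowed in zip(site, CONSENSUS))


def find_repeats(seq, max_mismatch=1):
    """Return (position, strand, site, mismatches) for each accepted site.

    `position` is the 1-based coordinate of the leftmost nucleotide of the
    window on the given strand; `site` is written in consensus orientation.
    """
    seq = seq.upper()
    n = len(CONSENSUS)
    hits = []
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        for strand, site in (("+", window), ("-", revcomp(window))):
            m = mismatches(site)
            if m <= max_mismatch:
                hits.append((i + 1, strand, site, m))
    return hits


if __name__ == "__main__":
    # Synthetic example with one repeat on each strand.
    for hit in find_repeats("GGTTATCCACAGGTGTGGATAAGG", max_mismatch=0):
        print(hit)
```

Hits on opposite strands that lie close together are candidates for the paired, stem-forming repeats. Whether such a scan reproduces the repeat counts reported in the text depends on the region boundaries chosen, which the sketch does not assume.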
Two regions, one upstream of ORF446 (region 1) and the other between ORF446 and ORF378 (region 2), have sequences which might be recognized by regulatory proteins. They are characterized by repeats of 9 nucleotides whose consensus sequence, TTAT(C/A)CACA, is identical to that of the repeats found in the oriC sequence of the E. coli chromosome (16). If one allows single base changes from the consensus sequence, there are 9 repeats in region 1 and 4 in region 2. Pairs of repeats are oriented such that they can form stable stems for local stem and loop structures. In region 1, a typical promoter structure for ORF446 is located within the repeated sequences. The activity of the promoter was detected in vivo and shown to be dependent on a DNA replication gene (see accompanying paper). Region 2 contains a stretch of some 50 bp extremely rich in AT (89%), from which transcription occurs in vivo as shown in the accompanying paper. The entire region 2 can be folded into the stem and loop structure shown in Fig. 5.

No other obviously interesting sequences were found in any other part of the 10 kbp region, although there is a 300 bp spacer region between ORF323 and ORF638. It should be noted that the 9 nucleotide repeating sequence is found in only 5 other places in the entire 10,000 bp region (Fig. 3).

Figure 5. Secondary structure of the inter-gene-spacer sequence 2. The leader sequence of ORF378 can be folded into a stable secondary structure. Thick arrows indicate the repeating sequence with the consensus sequence TTAT(C/A)CACA. The arrow labelled TR4 indicates the start site of transcript 4 (see accompanying paper). [Stem and loop drawing not reproduced; labels in the figure: ORF446 ("dnaA"), AT cluster, TR4, SD, ORF378 ("dnaN").]

DISCUSSION
We have determined the sequence of some 10,000 nucleotides in the region of the replication origin of the B. subtilis chromosome. Since the sequence is immediately followed by the ribosomal RNA operon, rrnO, whose sequence has been determined (5), this is one of the longest stretches of the bacterial chromosome for which the nucleotide sequence has been completely determined.
The region comprises 7 ORFs, each accompanied by a typical translational signal. Except for ORF44, all frames and the following rrnO are in the same orientation, from left to right, away from the origin. The only ORF which is transcribed in the reverse direction is located on the other side of ORF446 relative to the origin. These results show that the direction of transcription in the oriC region is the same as that of chromosomal replication.

ORF638 and ORF821 were identified as the coding frames for gyrB and gyrA, respectively. ORF638 was identified from its homology with the amino acid sequence of the DNA gyrase B subunit of E. coli. Conservation of the amino acid sequence between the two proteins is remarkable and confirms the previous finding that the gyrB subunit of B. subtilis complements that of E. coli both in vivo and in vitro (8,14). Genetic evidence has shown that gyrA is linked to gyrB and located between gyrB and rrnO (7). Since ORF821 occupies the entire space between ORF638 and rrnO, and the molecular weight of the protein deduced from the sequence is very close to that reported for the partially purified enzyme from B. subtilis (14), we conclude that ORF821 is indeed the coding frame for gyrA. In addition, a fragment within E6 and contiguous to that carrying gyrB was found to transform recF (8), suggesting that either ORF71 or ORF323 is the structural gene for recF.

Comparison of the amino acid sequences of the unidentified ORFs with those of known proteins in E. coli revealed that ORF44, ORF446, ORF378 and ORF323 are homologous with rpmH, dnaA, dnaN and recF, respectively. Among them, the conservation of the amino acid sequences between ORF44 and rpmH, and between ORF446 and dnaA, is so remarkable as to leave no doubt that they share common origins. These results show that the organization of the ORFs from ORF44 to ORF638, ORF44-ORF446-ORF378-ORF71-ORF323-ORF638, is identical to the organization of genes in the rpmH-gyrB region of the E. coli chromosome, rpmH-dnaA-dnaN-recF-gyrB (11), except for the addition of one ORF, ORF71, in B. subtilis. The remarkable conservation of genes and their organization between the two bacteria suggests that this region is the replication origin region of the primordial replicon from which the chromosomes of the two bacteria have evolved. Transcription of the oriC region in vivo and expression of these ORFs in E. coli cells are reported in the accompanying paper.

An extensive search for special structural features which may provide signals for initiation of replication revealed two inter-gene-spacer sequences, one between ORF44 and ORF446 and the other between ORF446 and ORF378. These sequences are located close to the site of initiation of synthesis of the first DNA strand (2). They are characterized by the presence of repeating sequences which may readily form stem and loop structures. It is surprising that the repeating sequence is composed of 9 nucleotides whose consensus sequence is identical with that of the repeating sequences found in the oriC sequence of the E. coli chromosome (16). The 9 nucleotide sequence has been found to be a binding site for the dnaA gene product, a DNA initiation protein in E. coli (17). Although there is no other apparent similarity between the E. coli oriC sequence and the sequences in regions 1 and 2, the localized presence of multiple copies of this unique sequence suggests that the oriC function of the B. subtilis chromosome is located in these regions and is recognized by a protein encoded by ORF446, which is homologous with the dnaA protein of E. coli.
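The kind of search described here, for copies of the 9 nucleotide consensus TTAT(C/A)CACA on either strand with a single base change allowed, can be sketched in a few lines of Python. This is only an illustration and not the procedure used in this work: the function names, the IUPAC shorthand TTATMCACA for the consensus, and the short demonstration sequence are all assumptions introduced for the example, and the demonstration sequence is made up rather than taken from Fig. 3.

```python
# Illustrative sketch only; not the authors' procedure.
# The consensus TTAT(C/A)CACA is written with the IUPAC code M (= A or C).
CONSENSUS = "TTATMCACA"

# Bases accepted at a position for each pattern symbol used above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(window, pattern=CONSENSUS):
    """Count positions where `window` is not allowed by `pattern`."""
    return sum(base not in IUPAC[p] for base, p in zip(window, pattern))


def find_boxes(seq, max_mismatch=1):
    """Scan both strands for the consensus.

    Returns a list of (start, strand, matched_sequence, n_mismatches).
    `start` is the 0-based position of the window on the given (top)
    strand; for "-" hits the matched sequence is reported 5' to 3'
    on the complementary strand.
    """
    seq = seq.upper()
    k = len(CONSENSUS)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        candidates = (("+", window), ("-", reverse_complement(window)))
        for strand, candidate in candidates:
            n = mismatches(candidate)
            if n <= max_mismatch:
                hits.append((i, strand, candidate, n))
    return hits


if __name__ == "__main__":
    # Made-up test sequence (hypothetical, not from the paper).
    demo = "GGTTATCCACAGGGTGTGTATAACCTTATCCAGA"
    for hit in find_boxes(demo):
        print(hit)
```

Calling `find_boxes(sequence, max_mismatch=0)` restricts the scan to exact copies of the consensus. Hits on opposite strands that lie close together are the candidates for the stem-forming pairs of repeats mentioned above.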
ACKNOWLEDGMENTS
We are deeply indebted to Dr. K. von Meyenburg, who kindly pointed out to us the homology in amino acid sequences between the ORFs and genes in E. coli. We express our appreciation to Drs. K. Bott and M. Gellert for communicating unpublished data. We thank Dr. S. Murakami and other colleagues in our laboratory for discussion, and Mrs. K. Terada and Mrs. T. Nishikawa for help. This work was supported by a Grant-in-Aid for Special Project Research, for Cooperative Research, and for Scientific Research from the Ministry of Education, Science and Culture, Japan.

*To whom correspondence should be addressed
REFERENCES
1. Oka, A., Sugiura, K., Takanami, M. and Hirota, Y. (1980) Mol. Gen. Genet., 178, 9-20
2. Ogasawara, N., Mizumoto, S. and Yoshikawa, H. (1984) Gene, 30, 173-182
3. Ogasawara, N., Moriya, S., Mizumoto, S. and Yoshikawa, H. (1984) in Genetics and Biotechnology of Bacilli, Ganesan, A. T. and Hoch, J. A. eds., Academic Press, Inc., London, pp. 51-65
4. Seiki, M., Ogasawara, N. and Yoshikawa, H. (1982) Proc. Natl. Acad. Sci. USA, 79, 4285-4289
5. Ogasawara, N., Moriya, S. and Yoshikawa, H. (1983) Nucleic Acids Res., 11, 6301-6318
6. Henner, D. J. and Hoch, J. A. (1982) The genetic map of Bacillus subtilis, pp. 1-33. In D. Dubnau (ed.) The Molecular Biology of the Bacilli. Academic Press, New York, NY
7. Lampe, M. F. and Bott, K. F. (1984) Nucleic Acids Res., 12, 6307-6323
8. Lampe, M. F. and Bott, K. F., personal communication
9. Adachi, T., Mizuuchi, K., Manzel, R. and Gellert, M. (1984) Nucleic Acids Res., 12, 6389-6395
10. Hansen, F. G., Hansen, E. B. and Atlung, T. (1982) EMBO J., 1, 1043-1048
11. Hansen, E. B., Hansen, F. G. and von Meyenburg, K. (1982) Nucleic Acids Res., 10, 7373-7385
12. Ohmori, H., Kimura, M. and Sakakibara, Y. (1984) Gene, 28, 159-170
13. Blanar, M. H., Sandler, S. J., Armengod, M. E., Ream, L. W. and Clark, A. J. (1984) Proc. Natl. Acad. Sci. USA, 81, 4622-4646
14. Orr, E. and Staudenbauer, W. L. (1982) J. Bacteriol., 151, 524-527
15. Charles, W. J., Moran, C. P. Jr. and Losick, R. (1983) Nature, 302, 800-804
16. Zyskind, J. W., Cleary, J. M., Brusilow, W. S. A., Harding, N. E. and Smith, D. W. (1983) Proc. Natl. Acad. Sci. USA, 80, 1164-1168
17. Fuller, R. S., Funnell, B. E. and Kornberg, A. (1984) Cell, 38, 889-900