Nucleic Acids Research, Volume 13 Number 7 1985

Structure and function of the region of the replication origin of the Bacillus subtilis chromosome. III. Nucleotide sequence of some 10,000 base pairs in the origin region

Shigeki Moriya, Naotake Ogasawara and Hiroshi Yoshikawa*

Cancer Research Institute, Kanazawa University, 13-1, Takaramachi, Kanazawa 920, Japan

Received 12 January 1985; Revised and Accepted 15 March 1985

ABSTRACT
Approximately 10,000 nucleotides were sequenced in the oriC region of the Bacillus subtilis chromosome. The first replicating DNA strands hybridize with a SalI-EcoRI fragment (nucleotides #1206-2954) in one direction (left to right) and with an EcoRI-PstI fragment (#2949-4233) in the other. Seven open reading frames (ORFs) accompanied by Shine-Dalgarno (SD) sequences were identified. ORF638 and ORF821 were identified as the gyrB and gyrA genes, respectively, on the basis of genetic evidence and amino acid sequence data. Comparison of amino acid sequences revealed that ORF44, ORF446, ORF378 and ORF323 are homologous with rpmH, dnaA, dnaN and recF of Escherichia coli, respectively. Thus, the organization of the ORFs from ORF44 to ORF638 resembles the organization of genes in the rpmH-gyrB region of the E. coli chromosome. Two non-coding regions characteristic of oriC signals were found near the site of initiation of the first replicating DNA. They are composed of repeating sequences whose consensus sequence, TTAT(C/A)CACA, is identical to that of the 4 repeating sequences in the oriC of E. coli.

INTRODUCTION
Two definitions have been associated with the replication origin of bacterial chromosomes, oriC. The first is the site of initiation of DNA synthesis, and the second is the minimal sequence essential for autonomous replication. In Escherichia coli a 245 base pair (bp) sequence was identified as the autonomously replicating sequence of the chromosome (1). However, the exact site of initiation of DNA replication is obscure both in vivo and in vitro. We have recently determined the site of initiation of DNA synthesis in vivo on each complementary strand of the Bacillus subtilis chromosome, using hybridization between the first replicating DNA and cloned single-stranded DNA from the oriC region of the chromosome (2). On the other hand, attempts to isolate an autonomously replicating sequence from the region have not been successful (3).

Sequences near the oriC may be inhibitory for replication of plasmid molecules when they are inserted together with the oriC sequence (4). Alternatively an intact organization of the genes and

© IRL Press Limited, Oxford, England.

[Figure 1: map not reproduced in this text. Labels recoverable from the extracted figure include ori, recF, gyrB, gyrA, spoOJ, guaA and rrnO (16S, 23S, 5S), a 5 kb scale bar, and the clones chBS03, chBS01, pSM2001, pSM2003, pNO1001 and pMS102'B7.]

Figure 1. Genetic and physical map of the oriC region. EcoRI fragments used in this study are shown. In addition, PstI (P), SalI (S) and BamHI (B) sites are shown. The locations of spoOJ, guaA (unpublished result) and other genetic markers (7,8) were determined by transformation using each fragment indicated on the map. The location of a ribosomal RNA operon, rrnO, was determined by sequence determination (5). The position of ori is the site of synthesis of the first DNA strands determined by hybridization (2). Open bars under the map indicate fragments cloned in Charon phage (chBS) or in pBR plasmid (pSM, pMS and pNO) vectors.

intergenic sequences near the oriC may be essential for constructing a specific higher-order structure of the chromosome to activate oriC. To examine these possibilities and to identify the structure and function of oriC, we considered it essential to determine the nucleotide sequence of the chromosomal segment containing the oriC region.

We have now sequenced approximately 10,000 bp in the oriC region. There are 7 open reading frames (ORFs), including those for recF, gyrB and gyrA. Comparison of the amino acid sequences of these ORFs with those of known genes in E. coli revealed that the organization of the ORFs from ORF44 to ORF638 resembles the organization of genes in the rpmH-gyrB region of the E. coli chromosome. In addition, two intergenic sequences are found very close to or at the oriC. They contain remarkable sequences which may function as recognition sequences for regulatory proteins.

MATERIALS AND METHODS
Bacterial strains and plasmids. E. coli C600 was used for cloning of B. subtilis chromosomal fragments. E. coli JM103 was used as a host for M13 phage vectors. Plasmids used in this study are shown in Fig. 1.

Chemicals. α-32P-dATP (PB10384, 800 Ci/mmole) and the M13 sequencing kit were purchased from Amersham International Ltd (Amersham, UK). T4 DNA ligase and restriction endonucleases were from Takara Shuzo Co. Ltd. (Kyoto, Japan) or

[Figure 2: restriction map not reproduced in this text. The extracted labels show a scale of 0 to 9 kb and appear to include the enzymes EcoRI, XhoI, PvuII, HindIII, PstI, ClaI, AluI, Sau3A, TaqI, HinfI and RsaI.]

Figure 2. Strategy for determination of nucleotide sequence. Restriction sites for various enzymes within E5, E20, E6 and E19 are shown. A XhoI site within E5 is taken as the start point of numbering nucleotides of the sequence. Length and direction of fragments sequenced in this study are shown by arrows.

from Nippon Gene Co. Ltd. (Toyama, Japan).

Determination of the nucleotide sequence. Cloning of the fragments used for sequencing is described in the text. These fragments were cut out of the vector by digestion with appropriate restriction enzymes and purified by fractionation through a low-melting agarose gel. Each fragment extracted from the gel was recloned into M13 phages (mp8, mp9, mp10 or mp11), and the nucleotide sequence of the inserted fragment was determined by the dideoxy chain termination method according to the procedure specified by the supplier of the kit. As an exception, the fragments within E5 were cloned into M13 directly from Charon BS03 (Fig. 1), because cloning of E5 into pBR vectors caused structural changes in the fragment.

RESULTS

Strategy for sequencing the oriC region of over 10,000 base pairs. The region of the replication origin of the B. subtilis chromosome (oriC) was first cloned in bacteriophage λ Charon vectors to construct a map of restriction sites covering over 45 kilobase pairs (kbp) (5). Using these

CAGCTGGCTCGCCGGTTTTCTGGCAATAATGATGTAATCCTTTTCCTTCAGTCTCTCTTTCTCTTCAAGAAAG

-600 -

S00

-400 -

aCCTGCCGAATCAAACGTTTGATCCGATTTCGCATCACAGCATTGC CAATiTTTTTTGCTGACGGAAAGCC CGACACGCAGTTCATCATTTTCAGGCTGAT CAAGCGTATATAAGACAAAC TGGCGGTTTGCAACTGATGTCCCATGTTTAAACACTTTTTGAAAATCTTCATTTTTCTTTAAACGATTTCaCTTCTTCAA

300 ATGACTCACTCCGATACTGGCTGGCAGAAACTTTTAAAATTTATTATGAACATGAGATGCATCTGAAC CAGGAAGCATGC CCCGGTTCATTACGCATCAC

-200

ATCTATCTTTTCCCATAACGTGAAAAAU

A

ATTAT

C .i;AAGCTGATAATACTTTTCTGCCTTTGCGGCGACGGCGTGCTAAAACT

GATTCGACTATTATGAAAAGACGGAAACGCCGCTGCCGCACGATTTTGA

AtaSerLeuVaZLyeArqCGZJLwaArgAr?ArgArgAZaL.uVaZL

-1 00 AGACGAC CGTTTTTTGAACT CATAC GGCTTCTGAAGC CATGAACTTTGCTGC GTTTAC GGTTATT CGGTTGGAATGTTCTTTTCATTTATGACAC CTC CC TCTGCTGGCTTTTTTCTTGAGTATGCCGAAGACTTCGGTACTTGAAACGACGCAAATGCCAATAAGCCAACCTTACAAGAAAAGTA GTGGAGG SD euArvG1yAsnLYeSerSerMetAr-SerArgPheGZ4fBiaVatLysSerArgLyeArgAsnAenProGZnPheThrArgLysMet 1 1ITCGAGGAATAGCTGTTAAAG GTCTTACTTATTATATTTGCGTTACCTATTCATTGTCAACTTCACTAGTGCTTTTATTTCTTGCACCTAATAGGA z

101

TACCATACCiTTTTCAACTTTCGAAAC CTTATTTTTTAGATTCCTTAATTTTACGGAAAAAAGACAAATTCAAACAATTTGCCCCTAAAATCACGCAIGTa

201 GATATCTTT_TCGGCTTTT_TTAGTATCCA_GAGGTTA_CGAC_ACAT_TTCACATTA_CAACCCG C C GGACAAGGTTTTTTCAACAGGTTGTC CGCT

301 TkGTGGATAAGATTGTGACAACC

CTCTCGTTiATTTTGGTTTGTGTTTTAACTCTTGATTACTAATCCTACCTTTCCTCTTTATC

401 CACACAG CT;GGAAGT 401 _____ _ _GG A G A A G G T T GGT CAC&AAGI.GTGGATAAGTTGTGGATTGATTT

A

GTT GTGAAATTTGTCGAAAAGCTATTTATCTAC TATATT

501

ATATGTTTTCAACATTTAATGTGTACGAATGGTAAGCGCCATTTGCTCTTTTTTTGTGTTCTATAACAGAGAAAGACGCCATTTTCTAAGAAAGGAG

601

ACGTGCCGGAAGATGGAAAA'TATATTAGAC CTGTGGAACCAAGCCCTTGCTCAAATCGAAAAAAAGTTGAGCAAACCGAGTTTTGAGACTTGGATGAAGT

SD

MetGZuAenIZeLeuAepLeuTrpA9nGLnAZaLeuAZaGLnIleGZuLy8Ly6LeuSerLysProSerPheGtuThrTrMvetLy8S

70 1 CAACCAAAGCCCACTCACTGCAAGGCGATACATTAACAATCACGGCTCCCAATGAATTTGCCAGAGACTGGCTGGAGTCCAGATACTTGCATCTGATTGC

erThrLyeAZaHisSerLeuGtnGtyAapThrLeuThrIteThrAtaProAanGtuPheAlaArgABPTrPLeuGluSerArgTkrLeuHieLGuIteAt

801 AGATACTATATATGAATTAACCGGGGAAGAATTGAGCATTAAGTTTGTCATTCCTCAAAATCAAGATGTTGAGGACTTTATGCCGAAACCGCAAGTCAAA

aAapThrIleTyrZGuLeuThrGtyGZuGtuLeuSerIteLysPhe VaZIteProGlnAunGtnAepVatGtuAepPheMetProLyeProGLnVaZLye

901

AAAGCGGTCAAAGAAGATACATCTGATTTTCCT CAAAATATGCTCAATCCAAAATATACiTTTTGATACTTTTGTCATCGGATCTGGAAACCGATTTGCAC LysAtaVatLysGtuAspThrSerAspPheProGZnAsnMetLeuAJsnProLysLTyrThrPheAspThrPheValIZ§etySerGZyAsnArgPheAtaH

1 001 ATGCTGCTTCCCTCGCAGTAGCGGAAGCGC CCGCGAAAGCTTACAAC CCTTTATTTATCTATGGGGGCGTCGGCTTAGGGAAAACACACTTAATGCATGC i8A ZaA ZaSerLeuAta VatA laG zuA ZaProA ZaLy8A taTyrAsnProLeuPheI ZeT.urGZ.uG Z_UVa GZy !L*uG ?LyeThrHisLeuMe tEi#A t

1 101 GATCGGCCATTATGTAATAGAT CATAATC CT T CTGCCAAAGTGGTTTATC TGTCTTC TGAGAAATTTACAAAC GAATTCATCAAC TCtATCCGAGATAAT

aIteGt.uHieTyrVatIZeAspHiiAsnProSerAZaLysVatVatTyrLeuSerSerGZuLysPh.ThrAenGtuPheIteAenSerIZeArgAepAen

1 201 AAAGC CGTCGACTTCCGCAATCGCTATC GAAAT GTTGATGTGCTT TTGATAGATGATAT TCAATTTTTAGCGGGGAAAGAACAAACCCAaGAAGAATTTT

LyjeAZaVatA8pPheArgAsnArgTyrAraAsnValAapVatLeuLeuIteAspAspIteGZnPheLeuAtaGtyLkuGtuGtnThrGtnGluGtuPheP

1 301 TCCATACATTTAACACATTACAC GAAGAAAGCAAACAAAT CGTCATTTCAAGTGACCGGCCGC CAAAGGiAAATTCCGACACTTGAAGACAGATTGCGCTC heHieThrPheAen ThrLeuHieGZuGluSerLyBGtnIleVaZIZeSerSerA.pArgProProLy.G1uIlteProThrLeuGluA.pArVqL.uArgSe

1401 ACGTTTTGAATGGGGACTTATTACAGATATCACACCGC

C;GATCTAGAAACGAGAATTGiAATTTTAAGAAAAAAGGCCAAAGCAGAGGaCCTCGATATi

rAz'PheGtuTrpSGILe*uIreThrAspIteThrProProAspLeuGluThrArgIteAtaIIeLeuArgLy#LsAtaLyuA taGLuGlyLeuAspIre

1 S01 CCGAACGAGGTTATG CTTTACAT CGCGAAT CAAAT CGACAGCAATATT CGGGAACTC GAAGGAGC ATTAATCAGAGTTGTCGCTTATTCATCTTTAATTA

ProAenGZuVaZMetLeuTyrIteAtaAenGtnIleAepSerAsnIteArgGtuLeuGLuGtyAtaLeuIteArgVatVatAZaTypSerSerLeuIleA

1 601 ATAAAGATAT TAATG CTGAT C TGGCC G CTGAGGC G TTGAiAAGATAT TATT CC TT C CTC AAAAC CGAAAGT CATTAC GATAAAAGAAATTCAGAGGGTAGT snLyAsA pI ZeAsnA taAs pLeuA ZaA ZaG tuA ZaL*uLysAt/pIZeI ZeProSerSerLysProLys Va tI eThrI teLy JGZuI ZeGZnArgVaZ Va

1 701

AGGCCAGCAATTTAATATTAAACTCGAGGATTTCAAAGCAAAAAAACGGACAAAGTCAGTAGCTTTTC CGCGTCAAATCaCCATGTACTTATCAAGG6AA ZCGySGnGtnPheALnLLELjLsLeuGtuAs pPheLyeAlaLMaLys

jThrLysSerVa ZA taPheProArgGLnIteA taMetTyrL.uSerArgGZu

1 801 AT GAC TGATTC C TCT CTT C CTAAAA T CGGT GAA GAGTT TGGAGGACGTGATC ATA CGAC CGTT AT TCATGC GCATGAAAiAAATTT CAAAACTGCT6GCA6 MetThrAspSerSerLeuProLyIZetaGCGZuGtuPheGlZGtCArgAepHisThrThrVaZtIZHisA ZaHisGZuLIaIteSerLyeLeuLeuA taA 1901

2001

2101

ATGATGAACAGCTTCAGCAGCATGTAAAAGAAATTAAAGAACAGCTTAAATAGCAGGACCGGGGATCAATCGGGGAAAGIG,AATAA,ACTTTTCGGAAGT *pAepGZuGtnLeuGtnGtnHieVaZLyeGZuIZeLysGZuGtnLeuLys CATACACIGTCTGTCCACAIGTGGATAGGCTGTGTTTCCTGTCTTTTTCACAACTTATCCACJAATCCACAGGCCCTACTATTACTTCTACTATTTTTTA

TAAATATATATATTAATACATTATCCGTTsEEjaTAAAAATGAAATTCACGATTCAAAAAGATCGTCTTGTTGAAAGTGTCCAA6AT6TATTAAAA6C AT 11 uster SD


Ne tLsaPhe Thr I LZeGZnLyeAepArgLeuVaZGZuSerVaLZGnAepValLeurLyAZ

2201 AGTTTCATCCAGAAC CACGATTCCCATTCTGACTGGTATTAAAATTGTTGCATCAGATGATGGAGTATCCTTTACAGGGAGTGACTCAGATATTTCTATT

aVaZSerSerArjThrThrIleProIleL.uThrGIZaeLysIZeVaZAZaSerAepAspGCtValSerPheThrGtJSereerAspIZeSerIrz

2301 2401

GAATCCTTCATTCCAAAAGAAGAAGGAGATAAAGAAATCaTCACTATTGAACAGC CCGGAAGCATCGTTTTACAGGCTCaCTTTTTTAGTGAAATTGTAA AMAAATTGCC GAT GGCAACTGTAGAAATTGAAGTC CAAAATCAGTATTTGACGATTATC C6TTCTGGTAAAGC TGAATTTAATC TAAACaGACTGGATGC

GCuSerPheIteProLytGZuGZjuGyAepLyaGluIZeVatThrIZeGZzuGnProGZCSerIteVatLeuGZnAZaAraPh.PheSerGiuIrzVaZL

yeLyeL#uProMetAZaTlhrVatGtuIteGtuVatGtnAenGtnTyrLeuThrlZoIZeArgSerGtwLyeAZaGZuPheAenLeuA#nGtyLouAepAt

2501

TGATGAATA;CCGCACTTGCCGCAGATTGiAGAGCATCA;GCGATTCAGiTCCCAACTGiTTTGTTAAAiAATCTAATCAGACAAACAGTATTTGCAGTG

2 701

SerThrSerGZuThrIAProrIZeLuThrCZk VaZAenTrpLyeVaZGZuGCnSerGZuLeuLeuCyeThrAZaThrAepSernieArgvLeuAZaLsuA GAAAGGCGAiACTTGATAT; CCAGAAGACiGAT CTTATAiCGTCGTGAT;C CGGGAAAAAGTTTAAC TGiACT CAGCAAGATTTTAGATGACAACCAGGi rgLysAZtaLysLeuAspIZeProGtuAepArgSe*rTyrAenVa ZVaZIteProGtyLmeSerLeuThrGtuLauSerLysIZeL!aA*ZAspA*nGZnGI ACTTGTAGATATCGTCATCACAGAAACCCAAGTTCTGTTTAAAGCGAAAAACGTCTTGTTCTTCTCACGGCTTCTGGACGGGAATTATCCAGkCACAACC

aAepGtuTyrProRiaLeuProGtnIZeGluGCZHuRiaeieA taIteGCnIrteProThrA.pLeuLeuLeAenLeuIleArgGZnThrValPh.A ZaVat 2601 TCCAC CTCAaAAACACGCC CTATCTTGACAGGTGTAAACTGGAAAGTGGAGCAAAGTGAATTATTAT GCACTGCAAC GGATAGC CACCGT CTTGC ATTAA

2801

uLeuVatAapIZeVaZIZeThrGZuThrGZnVaZLeuPheLysA taLysAunVaZLeuPhePheSerArgLeuALPeujA.TPGI oApThrThr 2901 AGCCTGATTC CGCAAGACAGCAAAACAGAiAT CATTGTGAACACAAAAGAATTC CTTCAaGC CATTGATCGTGCAT CT C;TTTAGCTAGiGAGGGACGCi

SerLeuIteProGZnAspSerLysThrGZuIZeIleVaZAsnThrLyaGtuPheLeuGZnAZaIZeAspArpAZaSerLeuLeuAZaArgZuCGtyArgA

3001 ACAACGTTGTAAAAC TGTCCGCAAAACC GaCTGAATC CAT TGAAATTT CTTCCAATTCGCCAGAAATC GGTAAAGTTGTaGAAGCAATTGTTGC GGATCA

snAJnVaZVaZLysLeuS*rAZaLysProAZaGZuSerIZeGZuIZeSerSerAsnSerProGZuIZeGZyLysVaZVaZGZuAZaIZeVaZAZaAJpGI

3101 AATTGAAGGTGAGGAATTAAATATCTC TT;TTAGTC CAAAATATATGCTGGATGCAC TAAiGGTGC TTGAiGGAGCAGAAATAC GCGTAAGCTTTACAGGC

nhteGZuGZdGtuGluLetdAenIZeSerPheSe2P2oLyjTrMetLeuAs_AZaL.uLy8VaZLouGZuGZyAZaGZuIZeArgVaZSerPheThrCGy

3201

GCAATGAGACCTTTCTTAATTCGCACGCCGAATGATGAAACGATTGTACAGCTTATCCTTCCTGTCAGAACCTATTAATCCGATACACTiCTGCCGACC AZaMetArgProPheLeuIZeArgThrProAsnAspGtuThrIZaVaZGZnL.uIZeLeuProVaZAr,ThrTyr

3301 T 3401

TTTCTATTCGGTATCTGCTCCGACAAGTT;TCCCTTTCC

3701

GATATT;T

AGA

AGATATAATGGCAAATCCGATTTCAATTGATACAGAGATGATTACACTCGGACAATTCTTAAAATTAGCCGATGTGATTCAGTCTGGCGGTATGGCGAA MetAZaAanProIZeSerIleAapThrGZuMetIZeThrLeuGZyGZnPheLeuLyaLeuAZaAepValIZeGtnSerGZyGZyMetAZaLy

3 501 GTGGTTTTTAAGC GAGCATGAAGTGCT TGTGAACGATGAGC CGGACAACCGC

3601

CTAATTCGTT;TTTTTTAGTCAATTAG

CGGGGCAGAAAGCTGTATGTTGGAGATaTGGTAGAGATTGAAGGATTT

sTrpPheL*uSerGZuRisGtuVaZLeuVaZAsnAepGZuProAspAenArgArgGZyArgLysLeuTyrVaZGZyAspVaZVaZGZuIteGZuGZyPhe GGTTCATTTCAAGTCGTCAiTTAAAGC GGGTGACACTGA;TGTATATC CiGAACTTAGAiCTGAC ATC T;AC CGCAACTAC GAC CATG C;GAACTTCAA;

GZySerPheGZnVaZVaZAun

TTGAAAATAiAGTAAATGTGATCATCGGAGAAAACGCCCiGGGGAAGACAAACCT1aV 1CTATGTCTTGTCCATGGCGAAAjCGCACCGGAi MetA ZaLyeSerHisArpTh

3801 ATCAAATGACAAAGAAC TTATACGGTGGGACAAAGACTATGCTAAAATAGAGGGAAGAGTGATGAAGCAAAACGGGGC GATC CCGATGCAGCTCGTCATC 3901

rSerAsnAspLyJGCuLeuIteAraTrpAspLyJAapTyrAZaLysIteGturaZMtLyeGZnA nG yAZa.ZtProMetGZnLeuVaZIZe TCCAAAAAGaGTAAAAAGGGCAAGGTCAAT CATATTGAACAGCAAAAGCTCAGC CAGTATGTCGGGGC C CTC AACAC CATTATGT TC GCGCCGGAAGATT SerLyeLyagZkLyeLkeGtyLysVaZAtnHiIZeeZuGZnGtnLnMaLeuSerGZnTyrVaZGZyAtaLeuAanThrIZeMetPh.AZaProGZuAspL

4001

TAAATCTTGTAAAGGGAAGC'CCTCAAGTGAGAAGGCGGTTTCTTGACATGGAAATCGGACAGGTTTCTC CC GTCTACC TTCATGATC TTTCTCTTTACCA *uAanLeuVaZLyegZMSerProGZnVaZAV0ArVArgPh.LeuAspMetGZuIleGZyGZnValSerProVaZTyrLeuHieAapLeuSerL.uTyrGt

4101

GAAAATCCTTTCCCAGCGGAATCATTTTTTGAAACAGCTGCAAACAAGAAAACAAACTGACCGGACGATaC TCGATGTTCTGAC CGATCAGCTTGTAGAA nLyLZttLeuS7rSGtnAraAenHisPheLeuLyeGZnLeuGZnThrArgLyaGZnThrAapArgThrMetLeuAepVa ZLeuThrAepGZnL.uVa ZGZu

4201

GTTGCAGCAAAAGTCGTCGTAAAACGCCTGCAGTTTACAGCACAGCTCGAGAAATGGGCGCAGCCCATCCATGCAGGCATCTCAAGAGGGCTTGAAGAAC VatAtaA

ZaLy eVaZ VaZVaZLy,oArgLeuGtnPheThrAZtaGtnLeuGZuLysTrpA taGZnProI ZeHieAlZaGZyI ZeS*rArgGZyLeuGZuGtuL 4 301 TGACCCTGAAATACCATACAGCT CTTGATGTATCAGAT CCCCTAGATTTGTC GAAAATAGGAGATAGCTATCAAGAAGCGTTTTCTAAATTAAGAGAAAA

euThrLeuLysTyrHi.ThrAZaLeuAepValSerAapProLeuAspLeuSerLyeIZeGZyAspSerTBrGZnGtuAtaPheSerLysLeuArgGzuLy

44 01 AGAAAT TGAGC GTGG TGTGACGC TGT CAGaGC C TCATC GCGAT GAT GTTC TTTT CTA T GTGAAC GGAC GCGAT GT GCAGAC GT ATGGTT CTCAAGGAC Aa 4 501

eaGuIteGZuArgGZyValThrLeuSErVGZProHisArgAspAspVaZLeuPheTyrVazA8nMg1,ArgA8pVatGZnThrTyrGZySer.GZnGlGIn CAGCGAACGACGGCGTTGTCCCTTAAGCTaGCGGAGATT'GACCTGATCCATGAAGAAATCGGAGAATATC CCATTTTACTATTGGATGATGTACTGAGTa GZnArgThrThrAZaLeuSerLeuLysLeuAZaGZuIZ.AepLeuItHieiuGZuIGuZeGZyGZuTyrProIZeLLeuLeuLeuAepAspVaZL.uSerG

4601 AACTGGATGATTATCGC CAGTCACACTTGCTTC ATACGATCCAAGGCCGTGTACAAACGTTTGTCACAACGACAAGCGTTGATGGCATTGATCACGAAAC 4 701

ZuL*uAepA8PTlyrd=GZnSerHi6L*uLeuHi8ThrIZGZnGZyArgVaGZInThripheVaZThrThrThrSerVaZAepGZyIleA.pHieGluTh CTTACGTCAAGCAGGAATGTTCCGTGTGCAAAATGGTGCaTTAGTGAAGTGAAGAAATGAGGTGAGCAATTGTATATTCATTTAGGTGATGACTTTGTGa rLeuArgGZnAtaGlyMetPheArgValGZnAengl,AZaLeuVaZLyJ

4801 TTTCAACACGAGATATT GTC GGCATTT TT GAC TTTAAAGCCAACAT GTCaGC CTAT TGTTGAAGAATTT C TGAAAAAAC AaAAAC AC AAGaT GGTGC CTTC

4901 CGTAAACGCACGC CCAAATCTATCGTAGTCACGGTTCAGAATATATATTACTCTCCCTTA'TCTTC CAGCACATTAAAAAAACGTGCGCAATTTATGTTTG

5001

AAATAGATTCTTAGAAATTTTTTATCACGAATATATCGTTTAGAAAAGT&FIIUTGACGTGGCTATGGAACAGCAGCAAAACAGTTATGATGAAAA SDv t VetGZuGGnGZnGZnAenSerSM rAPGZuAs


~ ~ SD

51 01 TCAGATACAGGTAC tAGAAGGATTGGAAGCTGTTCG6TAAAAGAC CGGGGAT GTAT ATCG6TTCGACAAAC AGCAAA66CC TT CACCACTT66TATGGGAA

nGZnIZeGZnVaZL.uGLCZGuLeuGZuA taVatArgLm#ArgProGZMyetThrILeGLySerT&hrAsnSerLyeGImLeuMteNiuLeuVaZTrpGZu

5201

ATTGTCGACAATAGTATTGACGAAGCCCTiGCCGGTTATTGTACGGATATCAATATCCAATC

AGTACCGIT'CAcGTATGGCC CGGAGGiAAAATTTGACGGAAGCGGCTA

TGAAAAA6ACAAC

rz.eVaZAepAenSerIZeAsGpZuA ZaLEuAZaGtJyrvCysThrABpIZUAISnZIeGInIZGZuLyeApA9nS*rIZerThrVaZVlYtA SRA&GZIA 53 01 GCGGTATTCC AGT CGGTATT CAT GAAAAAATGGGC CGTCC TGCGGTAGAA6TCATTAT GACGGTGC TTCAT GC rgGCI IZaPro VaZ ZGI IeHiJG ZuLyJJM1etOtjA rgProA taVa ZG tuVa ZI Zz{e t2hr VaZtL uNiJAZtaC Zx axLmPh#AspGItS*rtGTyr

5401 TAAAGTA TCC GGAGGAT TAC AC GGTGTAGaTGCGTCGG TCGtAAACGCAC TATCAACAGAGC TtGATG TGACGG TTCACCG TGACGGTAA'AATTCACCGC

rLysVaZStrGtyGZyLeuHisGZyVatGtyAZaSerVaZVaZAsnAZaLeuSerT'hrCZ,yL*uAepVaZThrVatHisArgA*pGtyLsIZglefiJArg

5 501 CAAAC CTATAAAC GCGGAGiTtC CGGTTACAGACCTTGAAATCATTGGC GAAACGGATCATA CAGGAACGAC GACA CATTT TGTC CCGGAiCC 56 01

5 701

5801 590 1

600i

CTGAAATtTi

GZnThrTlyrLyJArgGtyVa ZProValThrA8pL.uGLuIteZGaCZyGtuThrAepiisrThrGtZyThrThrThrEi8PheVaZProA.pProGiIuteP TC TCAGAAACAAC CGAGTA TGATTACGATC TG CTTGC CAAC CGCG TGC GTGAAT TAGCCiTTTTTAACAAA6GGCG TAAACAT CAC GATTGAAGATAAAC G heS.rGZuThrThrGZuTyrAepTyMrAapL.uLeuA taAsnArg VaZArgG&LueuA taPheLeuThhrLysGiyVaZAsnIXZeThrIehGtuAepLyeAr TGAAGGACAAGAGCGCAAAAATGAATACCATTACGAAGGCGGAAtTAAAAG TTATG TAGAGTATTTAAACC6CTC TAAAGAGGTTGTC CATGAAGAGCC6 9GZuGCyIyCnGZuArgLysAsJnG uTy rEits2'yrG IuGZyly ZIZeLysSerfryrVa ZGC uTlrLouAsnArgSorLysGIu Va ZVa HtGs tuG luPro AT TTACAT T6AAGGCG6AAAiGGAC GGCATTACGGTTGAAGTGG CT TTGCAATACAA TGACAGCTACACAAGCAACATTTAC TCG TTTACAAACAACATTi IZ T,myrIZeG ZuGCZifGCZuLy8Ae8pGtys eThr VaICZOluVaZA taL*uGtnTyrA snAspSer2lyrrhrS*rlAsnriZeyrSerPheThrA#nAl nIZeA, ACACG TACGAAGGC GGTAC CCATGAAGCT GGCTTCAAAAC GGGCC TGACTCG TGT TATCAACGATTAC GCCAGAAAAAAAGGGCTTAT TAAAGAAAA TGA snThlThryrGuGtyGZyThrHisGZGuA laGZyPh.LyeThrGtpL.uThrArgVaIZeAAenAspTyrALaArgLyeLyaGLyLouIieLy.GiuAanAs TC CAAACC TAA6CGGAGATGACGTAAGGGAAGGGC TGACAGCGATtATTT CAATCAAACACC CtGATCC GCAGTTTGAGGGC CAAACAAiAAACAAAGCTG pProAenLnuSerGZyAapAepVaZArgGZuGZuLeurhrAlZatt.leSerI Lveise$iProAupProGtnPh.GZuGtvGtnThrLyvThrLyeLou

6101 GGCAACTCAGAAGCACGGACGATCACCGATACGTTATTTTCTACGGCGATGGAAACATTTATGCTGGAAAATCCAGATGCAGCCAAAAAAAtTGATA GZyA enS#rG luA taArgT'hrl t eZhrAsapfhrLz uPh*S*rThrA ZaMz tGZuThrPh*M*etLeuG ZuA nProA epAZtaAZaLysLy*I l;Val A p~L 6 201 AA6GTTTAAT6GCGGCAAGAG CAAGAATGaC TGCGAAAAAAGCGC GTGAAC TAACAC GCCGTAAGAGTGC TTTGGAAATTTCAAACCT6C CCGGTAAGTT yGICZLeuJetA taA taArgA ZaArgMe tAIaA ZaLys Ly JA aArgGZuLeurhrArgJArgLysSerA ZaLeuGZuI teSerAsnL*uProG18ytyuLe 6301 AGCGGAC TGC TCTTCAAAAaATC CGAGCATC TCCGAGT TATATAT CGTAGAGGGTGAC T CTGC CGGAGG'ATCTGC TAAACAAGGACGCGACAGACATTTC uA taAapCyeSerSerLyeAapProSerIteSerGZuLeuTyrIteVatGZuGtyAepSerA ZaGtyCIySerAZaLysGtnGCyArgAepArgHiePhe

64 01 CAAGC CATTiTt6CCGCT TAGAGGTAAAATC CTAAACGT TGAAAAGGCCAGAC TGGATAAAATCCtTTTC TAACAACGAAGTTC GCTCTATGATCACAGCGC

GZnAZaIIZeLuArgGtyLysIZeLyetteLeuAenVatGZuLyeAtaArgL.uA.pLyei.*L.uSerA#nA.nGtuVatArgS.rMetrtrhrAtaL

6 501 TCGGCACAGGTATCGGAGAAGAC TTCAAC CTTGAGAAAGCCCG TTAC CACAAAGTTGTCATTATGACAGATGC CGATGTTGACGGCGCGCACATCAGAAC euGtyThrGZyIZeG tyGLuAepPheAevnLeuGtuLysA taArg2lyrNiLyaVaZ VaZIZeN.tThrAapAZaA*pVaZAspGtyA Zaili.I.tArgTh 6 601 AC TGC TGTTAACGTTCTTTTACAGATA TAtGC GCCAAATTAT CGAAAATGBCTAC GTGTACATTGCGCAGC CGC CGC TC TACAAGGTTCAACAGGGGAAA rL#uLouLeuT7hrPhsePh*2'rArg.TrMe tArgG tnIZelIZoG uA nGty Tyr Va ITyrIltA aZa nProProLouryrLys Va taGnCZnC nG,yLys 6701 CGCGTTGAATATGCATACAATGACAAGGAGCTTGAAGAGCTGTTAAAAACTCTTCCTCAAACGCCTAAGCCTGGACTGCAGCGTTACAAAGGTCTTGGT6 Arg VaGluClyrA laTaPrAnASpLVsGluLSuGZuGZuL

.uLeuLyeThrLnuProGInrhrProLyeProGyvLeuGlnArgTyrLyG0tyLvuGZyG

6801 AAATGAATGC CACCCAGCTA TGGGAGACAAC CATGGATCC TAGC TC CAGAACAC TTC TTCAGG TAACTC TTGAAGATGCAATGGATGCG6ATGAGAC TTT

ZuMetAennaThGrGtnLeuTrpCtulThrThrMNetAepProS.rS.rArgThrLeuLeuGInVatThrLouGaZuAApA iaNetAepAtaAapGCuTlhrPh AAATCTTGACTCACATCAAAAGCTT s*GuNetLeuNetGZyAepLyaVatGtuProArgArgAenPhIZeGZuAZaAAnALaArgryrVaZLyeAenLnuAepILe

6901 TGAAATGCTTATGGGCGACAAGGTAGAACCGCGCCGAAACTTCATAGAAGCGAATGCGA6ATACGTTAA 700t IT

CAATAAiAAAT^AAGG TTIT;CLUACAAGA TCAGATGCAGCGATGC CTGCAATAC CTATATA TTCTAGCAA TT TAAiGTGTATAATCATAAGTT

7101 tATTGATATAATGGAGAATGGAATCGTATTGAAGGTCATAATGGA

CAATCTACTC

CCACATATTTCATGTGATACTTCGi--TTTTAATGAG HetSe

mat* .~~~~~~~~~~~~~~~~~~~~~~~s

7 201 TGAACAAAACACACCACAAG TTCGTGAAATAAATATC AGT CAGGAAATGCGTACGTC CTTCTTGGATTATGCAATGAGCGTTATC GTGTCCC6TGCTC TT rG tuG InAenThrPro GZnVa ZArgG tutZoAonrleserG tnGtFuM tArgTlhrSerPheLeuA sp.TrAt4Mg tS*rV4 tiIVa ZSerAlrgA laL*u 7 301 C CGGATGTTCGTGACGG TTTAAAACCGGTTCATAGACGGATTTTGTATGCAATGAATGATTTAGGCATGACAAGTGACAAGCC TTATAAiAAAATC CGCGC

ProAWpVatArgAGpZyL.uLYUProVaZhi.ArgArgIXZLeufrA ZaMetAenAepLeuGZiyMtThrS.rA.pLy.ProTyvrLgeLeSerAtaA

7401 GTATCGTTG6AGAAGT TATC GGGAAATAC CACCCGCACG6TGATTCAGC66TATATGAATCCAT6GGTCA6AATGGCTCA66ATTTCAAC TACCGTTATAi

7501 7601

reIZtValtGtyGuV4aZIGZaG,Lys-Trffi@ProHioGtyAopS*rAtaV4ZrTrGtuSerMvetValArgMetAlaGtnsApPheAJ"ryrArg2ryrM GCTCGTTGACGGTCACGGAiACTTCGGTTCTGTTGACGGiGACTCAGCGGCGGCCAT6C6TTATACAiGCACGAATGTCTAAAATCTCAATG6A6ATT tLuVGaZASpGZyfiC9GZyASnPhSGtySerVaZAepGZyA.p$.vAZIAlaAZa et;ArgTfyrrhrGtuA ZaArgMHtS.rLyetIeSeNetGtstZ.I

CTTCGCGACATCACAAAAGACACAATCGATTACCAGGAT'AACTATGACGGGTCAGAAAGAGAACCTGTCGTTATGCCTTCAAGGTTCCCGAAtCTGCTCG

k*uArg A#pIterhrLy#AsaprhrrleA epTyrG ZnAwpA Jn2lyrAspGZyfS*rG ZuArC ZuPro Va ZVatM*etProSerArftPhoProA#xLeuLz V

TGAACGGTGCTGCCGGCAT;6C6GTAGGTiTGGCAACAAiCATTC CTCC 6CACCAGC TG66AGAAATCA;TGACGGTGTiCTTGCT6TTiGTGAGAATCC aZASnGZyAL4aA4GZIreA iaVaZGZyMetA tahrAAnI.PvroProHisGLnL.uGtyGluIterleAepGtyv4ZL.xuAtaVaiSerGzGAaPr 7801 GGACATTACiATTCCAGAGCTTATGGAA; CAtTC CAGG6CCTGATTTCCC6AC CGC6GGTCAAATC TT6GGACGCAGC6GTATCCGGAiA6CATACGAi oA apI lrhrrZProGtuL*uMfjtGtu Va ZI ProGZlwProA @pPh*ProrhrA alyGZtnI ZeDeuGtyJArgSerCGIyrZzArfLyrA Za,IrGZu 7 701

7901 TCAGGCCG6AG6CTCTA TCAC6ATCCGGGCiAAAGCT6AGiTCGAACAAACAtCttC6GGG;AAAGAAAGAiTTATCG6TTACAGAGTTACC;TACCAAGTAi SerCGyArgGCyserI trhrr l ArgAZaLv^AtaGZuI l GZuGt nrhsrrSerGZIyLM* GZuArgX ZsXZeta trhrCZuL*uPro8rCZ"hn ZA

8001

ATAAGGCGAAATTAATTGA6GAAAATTGC TGATC TCGTAAGGGACAAAAAGATAGAGGGTATCACAGATCTGCGTGATGAGTCAGATCGTACAGGTATGAG uGZyIZeThrAspLAuArgAspGZuSerAspArgThrGtyMAtAr anLyeAtaLysLeuIleGGuLysIZeAZaAapLeuVaZArgAspLyaLysIZeG

810 1

AATTGTCATT6AAATCAGACGCGATGCCAATGCGAATGTTATCTTAAACAATCTGTACA'AACAAACTGCTCTACAAACATCTTTTGGCATCAACCTGCTT gIleVaZIZeGluIZeArgArgAspAZaAenAZaAsnVaZIZeLeuAsnAsnLeuTyrLyaGZnThrAZaLeuGZnThrSerPheGZylZeAenLeuL#u

8201 GCGCTTGTTGATGGC CAGC'CGAAAG TTTTAACTC TTAAGCAATGC CTGGAGCATTAC CTTGAC CATCAAAAAGTTGTCATTAGAC GC CGTACTGCTTATG

8301

AZaLeuVatAspGZy C.nProLysVaZLeuThrLsuLyeGZnCyaLeuGZuH1isTyrLeuAspHisGZnLyeVaZValIZ*ArgArgArgThrAZaTyrG AATTGCGTAAAGCAGAAGC'GAGAGC TCATATC TTGGAAGGATTGAGAGTTGCACTGGATCATCTC GATGCAGTTATC TCC CTTATC CGTAATTCTCAAAC

ZuLeuArgLyaAZaGZuAZaArgAZaHisIZeLeuGZuGZyLeuArgVaZAZaLeuAapHieLeuAspAZaVaZIZeSerLouIZeArgAsnSerGZnTh

8401 GGCTGAAATTGCGAGAACAGGT TTAAT TGAACAATTCTCAC TGACAGAGAAGCAAGCACAAGC GATC CTTGACATGAGGCTC CAGCGTTTAACGGGACTG

rAZaGZuIZeAZaArgThrGZyLeuIZeGZuGZnPheSerLeuThrGZuLy8GCnAlaGZnAZaIZeLeuAspMetArgLeuGtnArgLeuThrGZyLue

8501 GAACGTGAAAAGATCGAAGAAGAATAC CAGTCT CTTGTTiAAATTAATTGCAGAGC TAAAAGACATC TTGGCAAATGAATATAAAGTGC TTGAGATCATTC

GZuArgGZuLysIZeGZuGZuGZuTyrGZnSerLeuVaZLysLeuIZeAZaGZuLeuLysAapIZeLeuAlaAsnGZuTyrLysVaZL*uGZuIZeIZeA

8601 GTGAAGAACTCACGGAAATCAAAGAGCGTTTTAACGATGAAAGAC GTACTGAGAT CGTCACTTCTGGACT6GAGACAATT6AAGATGAAaATC TCATCGA

rgGZuGtuLeuThrGZuIZeLyCeGuArgPheAsnAapGtuArgArgThrGZuIZeValThrSerGZyLeuGtuThrIZeGZuAapGZuAspLeufleGI

8701 GAGAGAAAATATCGTAGT TACTCTGACGCACAACGGATACGTCAAAC GTCTTCC TGCATCAACTTACCGCAGTCAAAAAC66GGCGGAAAAGGTGTACAi

uArgGZuAenIZeVaZVaZThrLeuThrEisAsnGZyTyrVaZLysArgLeuProAZaS*rThrTyrArgSerGtnLysArgGZyGZyLy#GZyVatGtn

8801 GGTATGGGAACAAACGAAGATGATTTCGTTGAACATTTGATC TCTACGTCAAC TCATGACACGATTCTCTTC TTC TC GAACAAGGGGAAAGTGTATCGT6

GlyMletGlyThrAAnGZuAspAapPheVaZGZuRiaLeuIZeSerThrS.rThrHisAApThrIZeLeuPhePheSerABnLy8GZyLy,VaZTyrArgA

89 01 CAAAAGGGTATGAAAT CC C TGAATACGGCA6AACGG6CAAAAGGAAT CC CaAT TA TTAAC CTG CTGGAGGTA6GAAAA6GGGTGAGT GGA TCAACGC GA TTA T

ZaLyeGZyTyrGtuIZeProGZuTyrGZyArgThrAZaLysCZyIZeProI eIZeAanLeuLeuGZuVaZGZuLyeGZyGZuTrpIZeAsnAZaIZtII

9001 TCCAGTCACGGAATTCAATGCGGAGC TTTACC TCTTC TTCACTACAAAGCATGGGGTTTCAAAACGAACATC GCTATCTCAATTC GC TAATATCCG6CAAC

eProVaZThrGZuPheAsnAZaGZuLeuTyrLeuPhePheThrThrLyeiiaGZyVatSerLysArgThrSerL.uSerGZnPheAAaAenIZeArgAen

9101 AATGGTC TAATTGCTCTGA'GTCTTCGTGAAGATGATGAAC TGATGGGTGTACGTC TGACTGACGGCACAAAACAAATCATCATTGGAACGAAAAACGGTT

AsnGZyL*uIZeAZaL*uS*rLeuArgGZuAspAapGZuLeuM#etGtyVaZArgLeuThrAspGtyThrLyaGZnIZeIZeIZeGZyThrLysAsnGtyL

9201

TACTGATTC6TTTCCCTGAAACAGATGTCCGAGAGATGGaAAGAACTGC6GCGGGCGTAAAAGGCATCAiCCCTGACGGATGACGACGTTaTTGTCGGCAT

euLouIleArgPheProGZuThrAspVaZArgGZuM*tGZyArgThrAZaAZaGZyVaZLyeGZyIZeThrLeuThrAspAspAspVaZVaZVaZGZyM*

9301 GGAGATTTTAGAGGAAGAATCACACGTCCTTATCGTAACTGAAAAAGGGTACGGAAAACGAACTCCTGCTGAAGAGTACAGAACC CAAAaCCGGGGCGGA

tGCuIZLeuGtuGZuGZuSerHiaVaILeuIteVaZThrGluLyCaGZyTyrGZyLysArgThrProA ZaGCZuGuTyrArgThrGLnSerArgGZyGZy

9401 AAAGGAC TCAAAACAGC GAAAATCA CCG6AaAACAA CGGC CAAC TAG TAGC AG TGAAAGC TAC TAAAGGTGAAGAGGATC TAATGAT TA TTACAGC TAGCG LyeGtyL*uLyeThrA ZaLyaIZeT'hrGZuAsnAsnG ZyGCZnLeuVa ZAZa VaZLysA ZaThrLyesCZyCGZuGZuAaepLsuMe tI ZelteThrA taS*rG 9 501 GCGTACTCATCAGAATGGACAT CAATGATATC TCCATCACCGGACGTGTCACTCAAGGTaT6GCGTCTCATCAGAATGGCAGAAGAAGAGCATGTTGCTAC

tyVaZLeuIZeArgMetAspIZeAsnAspIteSerIVeThrGlyArgVatThrGtnGZyVaZArgLeuIZeArgMetAZaGZuGluGZuHieVaZAtaTh

9601 AGTAGCTTTAGTTGAGAAAAACGAAGAAGATGAGAATGAAGAAGAACAAGAAGAAGTGTGAAAAAAAGCGCAGC

GAAATAGCTGCGCTTT-TTGTGTCA

rVaZAZaLeuVatGluLyeAAnGZuGZuAspGZuAsnGZuGttGuGZtnGtuGZuVaZ TCTTTTTAAAGACACAAGCATGACCATTATGACTAGTAAAAACTTTTTCAAAAAA 9701 TAACCCETA GTCATAAiAATTATGG ritATTTCT -35

9801 GTA;

i

-35

-10

AGTTAACTAAAAATGTffitITAAGTA -10

TCGCTTTGAGAGAAGCACACAAGTTCTTTGAAAACTAAACAAGACAAAACGTACCTGTTAA

CTGCACGACGCAGGTCACACAGGTGTCGCCGCAGGATGCGGTGAACTTAi CCTGTGATCCATTTATCGGAGAGTTTGATCCTGGCTCAGGACGACGCTGaCGGCGTGCCTAATACATGCAAGTCGAGCGGACAGATGGGAG 6 1-S RNA

9901 TTCATTTTTATAAATCGCACAGCGATGT6C 6TAGTCAGTCAAACTAGGGC

10001

Figure 3. Nucleotide sequence of the oriC region. Nucleotide number 1 is the XhoI site in the E5 fragment, as in Fig. 2. Amino acid sequences are also shown for ORFs accompanied by SD sequences. Conserved amino acids in homologous proteins between B. subtilis and E. coli are underlined. Regions of initiation sites for transcription determined by the S1-mapping method (see the accompanying paper) are indicated by bent arrows. Possible promoter and SD sequences are boxed. Putative termination signals are indicated by open arrows. Thick black arrows indicate the characteristic repeats with TTAT(C/A)CACA as a consensus sequence.

cloned fragments as well as fragments sub-cloned into pBR and M13 vectors, the site of initiation of DNA replication was determined within a 3 kbp region (Fig. 1) (2). To determine the nucleotide sequence around the origin of replication, a more detailed map of restriction sites was constructed as


Figure 4. Open reading frames in the oriC region. All possible coding frames which code for more than 40 amino acids are shown on the EcoRI map of the oriC region. Above the map: frames transcribed from left to right. Below the map: frames transcribed from right to left. Frames with typical strong SD sequences are indicated by black blocks. The number of amino acids is shown for each frame; the number of amino acids in the black blocks is shown in parentheses. Possible functional open reading frames (ORFs) are named after the number of amino acids they encode.

shown in Fig. 2. Using these restriction sites, fragments cloned in Charon or pBR vectors were sub-cloned into M13 phages for sequencing. The lengths and directions of the stretches sequenced by the dideoxy method are shown in Fig. 2. To verify the sequence data, every nucleotide was determined in both directions and more than twice using overlapping fragments. The one exception is a 120 bp TaqI fragment which was cloned in M13 in only one direction. This region was sequenced three times using different fragments (Fig. 2).

Nucleotide sequence of the oriC region. Some 10 kbp were sequenced and numbered as in Fig. 3, starting from a XhoI site within the EcoRI fragment E5 and proceeding towards the ribosomal RNA operon rrnO (5). Possible open reading frames (ORFs) of more than 40 amino acids are listed in Fig. 4. Among them only 9 frames are preceded by translational start signals (Shine-Dalgarno sequence, SD) (Table 1). Since two small ORFs, ORF43 and ORF52, are located within the larger ORFs, ORF821 and ORF638,

Table 1. SD sequences for ORFs

3'-end of 16S RNA          UCUUUCCUCCACUA

ORF       Position   Sequence
ORF446     613       gaAAAGGAGGgacgtgccggaagATG
ORF378    2142       cGttAGGAGGataaaaATG
ORF 71    3409       ttgAAaGAGGTcgatataATG
ORF323    3781       ctcAtGGAGGcGATctatgtcttgtccATG
ORF638    5069       AaAgtGtAGGTGAatgacgtggctATG
ORF 52    5709       tctAAaGAGGTtgTccATG
ORF821    7196       tctAgGGAGGTttTttaATG
ORF 43    8979       ctgctGGAGGTagaaaaggGTG
ORF 44     -15       tcgAgGGAGGTGtcataaATG

Bases complementary to the 16S RNA sequence are shown in capital letters. Positions give the nucleotide number of the first nucleotide of each ORF. Numbering is the same as in Fig. 3.
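
The capitalization convention of Table 1 can be illustrated with a short Python sketch. It derives the ideal SD sequence as the base-by-base complement of the 3'-end of 16S RNA printed in the table, slides it along the region upstream of a start codon without gaps, and prints matching bases in capitals. The scoring rule (longest contiguous run of matches first, total matches second) and all function names are assumptions made for this illustration; the paper does not state how the alignments in Table 1 were chosen.

```python
from typing import NamedTuple

# 3'-end of 16S RNA as printed in Table 1, read 3'->5'.
ANTI_SD_3TO5 = "UCUUUCCUCCACUA"


class SDHit(NamedTuple):
    offset: int        # 0-based start of the aligned window in the upstream region
    longest_run: int   # longest stretch of consecutive complementary bases
    matches: int       # total number of complementary bases in the window
    marked: str        # window with complementary bases in capitals


def ideal_sd(anti_sd_3to5: str) -> str:
    """Complement the rRNA end base by base; because the input is read
    3'->5', the result reads 5'->3' and is written in DNA letters."""
    pair = {"U": "A", "C": "G", "A": "T", "G": "C"}
    return "".join(pair[base] for base in anti_sd_3to5)


def best_sd_alignment(upstream: str, ideal: str) -> SDHit:
    """Ungapped scan of the ideal SD along the upstream region.
    Assumed scoring: (longest run, total matches); the first best window wins."""
    upstream = upstream.upper()
    if len(upstream) < len(ideal):
        raise ValueError("upstream region is shorter than the ideal SD")
    best = None
    for offset in range(len(upstream) - len(ideal) + 1):
        window = upstream[offset:offset + len(ideal)]
        flags = [w == i for w, i in zip(window, ideal)]
        run = longest = 0
        for matched in flags:
            run = run + 1 if matched else 0
            longest = max(longest, run)
        marked = "".join(w if m else w.lower() for w, m in zip(window, flags))
        hit = SDHit(offset, longest, sum(flags), marked)
        if best is None or (hit.longest_run, hit.matches) > (best.longest_run, best.matches):
            best = hit
    return best


IDEAL_SD = ideal_sd(ANTI_SD_3TO5)
# Upstream regions of ORF638 and ORF446, taken from Table 1 without the start codon.
orf638 = best_sd_alignment("aaagtgtaggtgaatgacgtggct", IDEAL_SD)
orf446 = best_sd_alignment("gaaaaggagggacgtgccggaag", IDEAL_SD)
print(IDEAL_SD, orf638.marked, orf446.marked)
```

Under this assumed scoring the capitalized windows for ORF638 and ORF446 agree with the corresponding entries of Table 1; ranking windows by total matches alone would pick a different window for ORF446, which is why the contiguous run is ranked first here.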

respectively, only the remaining 7 ORFs were considered for further analysis. It should be noted that 6 of these frames are encoded on one of the two strands, the 5' to 3' strand (left to right on the map in Fig. 1), and are therefore oriented in the same direction. Only one small frame, ORF44, was found on the other strand, the 3' to 5' strand. Amino acid sequences were deduced from the nucleotide sequences for these 7 ORFs and are shown in Fig. 3 along with the nucleotide sequence.

Three genes, recF, gyrB and gyrA, have been mapped in the region we have sequenced (6). Recently, restriction enzyme fragments which transform these genetic markers were identified (Fig. 1) (7,8). These results show clearly that ORF638 and ORF821 correspond to gyrB and gyrA, respectively. The fragment which transforms recF includes both ORF71 and ORF323, indicating that one of the two corresponds to the recF gene. No genes are known that correspond to the locations of ORF44, ORF446 and ORF378. We conclude that ORF638 is indeed the structural gene for gyrB, because 109 amino acids from the N-terminal portion of the protein coded by ORF638 show 68% homology with those of DNA gyrase subunit B of E. coli (Fig. 3) (9). Comparison of the amino acid sequences of the other ORFs with those of known genes in E. coli revealed that ORF44, ORF446, ORF378 and ORF323 are homologous with rpmH (10), dnaA (11), dnaN (12) and recF (13), respectively (see Fig. 3 for comparison). Since the relative chromosomal locations of the ORFs are identical to those of the homologous genes in the E. coli chromosome (11), the organization of the ORFs from ORF44 to ORF638 remarkably resembles the organization of genes in the rpmH-gyrB region of the E. coli chromosome. An extensive comparison of genes and their organization in the replication origin region of the chromosomes of the two bacteria will be reported elsewhere. Although no information is available about the structure of DNA gyrase subunit A of the two bacteria, the molecular weight of the protein deduced from ORF821 is very close to the reported molecular weight of the partially purified enzyme (14). Properties of the proteins coded by these 7 frames are summarized in Table 2. Expression of these 7 ORFs, except for the two small frames ORF44 and ORF71, was demonstrated in E. coli using the Maxi-cell method, as described in the accompanying paper.

On the 5' to 3' strand, only one sequence resembling a promoter for the major RNA polymerase (Eσ55) was found, near position 340, upstream of ORF446, while putative termination signals were found downstream of each of ORF378, ORF638 and ORF821 (see Fig. 3). ORF44 is accompanied by two promoters, one for Eσ55 and the other for Eσ29 (15), and a terminator of its own (Fig. 3). In the accompanying paper we demonstrate 6 major initiation sites for transcription in this region in vivo by the S1-mapping method.

A search was made for special features in the nucleotide sequence which may serve as a signal for initiation of DNA replication.
Two regions, one upstream of ORF446 (region 1) and the other between ORF446 and ORF378 (region 2) have sequences which might be recognized by regulatory proteins. They are characterized by repeats of 9 nucleotides whose consensus sequence, TTAT(C/A)CACA, is identical to that of the repeats found in the oriC sequence of the E. coZi chromosome (16). If one allows single base changes from the consensus sequence, there are 9 repeats in the region 1 and 4 in the region 2. Pairs of repeats are so oriented that they can be used to form stable stems for local stem and loop structures. In the region 1, a typical promoter structure for ORF446 is located within the repeated sequences. The activity of the promoter was detected in vivo and shown to be dependent on a DNA replication gene (see accompanying paper). The region 2 contains a stretch of some 50 bp extremely rich in AT 2260


[Figure 5: drawing of a stem-and-loop structure; labels in the drawing: ORF446 (dnaA), AT cluster, TR4, SD, ORF378 (dnaN)]

Figure 5. Secondary structure of inter-gene spacer sequence 2. The leader sequence of ORF378 can fold into a stable secondary structure. Thick arrows indicate the repeating sequences with the consensus sequence TTAT(C/A)CACA. The arrow labelled TR4 indicates the start site of transcript 4 (see accompanying paper).

(89%), from which transcription occurs in vivo, as shown in the accompanying paper. The entire region 2 can be envisaged as the stem-and-loop structure shown in Fig. 5. No other obviously interesting sequences were found in any other part of the 10 kbp region, although there is a 300 bp spacer region between ORF323 and ORF638. It should be noted that the 9-nucleotide repeating sequence is found in only 5 other places in the entire 10,000 bp region (Fig. 3).

DISCUSSION We have determined the sequence of some 10,000 nucleotides in the region of the replication origin of the B. subtilis chromosome. Since the sequence is immediately followed by the ribosomal RNA operon rrnO, whose sequence has been determined (5), this is one of the longest stretches of a bacterial chromosome for which the nucleotide sequence has been completely determined.
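An AT-rich stretch such as the one reported for region 2 (some 50 bp, 89% AT) is the kind of feature that a sliding-window scan of base composition picks out. The Python sketch below is only an illustration of that idea, not the method used here; the function names and the default window width of 50 are assumptions, and the sequence in the usage lines is a toy string rather than data from Fig. 3.

```python
def at_fraction(seq):
    """Fraction of A and T residues in `seq` (case-insensitive)."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def richest_at_window(seq, width=50):
    """Return (start, fraction) of the first window of `width` bases
    with the highest A+T content, using a running count."""
    seq = seq.upper()
    if width <= 0 or width > len(seq):
        raise ValueError("width must be between 1 and len(seq)")
    count = sum(base in "AT" for base in seq[:width])
    best_count, best_start = count, 0
    for start in range(1, len(seq) - width + 1):
        # Slide by one base: drop the base leaving, add the base entering.
        count -= seq[start - 1] in "AT"
        count += seq[start + width - 1] in "AT"
        if count > best_count:
            best_count, best_start = count, start
    return best_start, best_count / width


if __name__ == "__main__":
    toy = "GCGC" + "AT" * 5 + "GCGC"  # toy sequence, not from the paper
    print(richest_at_window(toy, width=10))
```

Applied to a real sequence, the same scan would be run with a window close to the length of the stretch of interest, and the reported fraction compared with the overall A+T content of the region.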

The region comprises 7 ORFs, each accompanied by a typical translational signal. Except for ORF44, all frames and the following rrnO are in the same orientation, from left to right, away from the origin. The only ORF that is transcribed in the reverse direction is located on the other side of ORF446 relative to the origin. These results show that the direction of transcription in the oriC region is the same as that of chromosomal replication.

ORF638 and ORF821 were identified as the coding frames for gyrB and gyrA, respectively. ORF638 was identified from its homology with the amino acid sequence of the DNA gyrase B subunit of E. coli. Conservation of the amino acid sequence between the two proteins is remarkable and confirms the previous finding that the gyrB subunit of B. subtilis complements that of E. coli both in vivo and in vitro (8,14). Genetic evidence has shown that gyrA is linked to gyrB and located between gyrB and rrnO (7). Since ORF821 occupies the entire space between ORF638 and rrnO, and the molecular weight of the protein deduced from the sequence is very close to that reported for the partially purified enzyme from B. subtilis (14), we conclude that ORF821 is indeed the coding frame for gyrA. In addition, a fragment within E6 and contiguous to that carrying gyrB was found to transform recF (8), suggesting that either ORF71 or ORF323 is the structural gene for recF.

Comparison of the amino acid sequences of the unidentified ORFs with those of known proteins in E. coli revealed that ORF44, ORF446, ORF378 and ORF323 are homologous with rpmH, dnaA, dnaN and recF, respectively. Among them, the conservation of amino acid sequence between ORF44 and rpmH, and between ORF446 and dnaA, is strong enough to leave no doubt that they share common origins. These results show that the organization of ORFs from ORF44 to ORF638, ORF44-ORF446-ORF378-ORF71-ORF323-ORF638, is identical to the organization of genes in the rpmH-gyrB region of the E. coli chromosome, rpmH-dnaA-dnaN-recF-gyrB (11), except for the addition of one ORF, ORF71, in B. subtilis. The remarkable conservation of genes and their organization between the two bacteria suggests that this region is the replication origin region of the primordial replicon from which the chromosomes of the two bacteria have evolved. Transcription of the oriC region in vivo and expression of these ORFs in E. coli cells are reported in the accompanying paper.

An extensive search for special structural features that may provide signals for initiation of replication revealed two inter-gene spacer sequences, one between ORF44 and ORF446 and the other between ORF446 and ORF378. These sequences are located close to the site of initiation of synthesis of the first DNA strand (2). They are characterized by the presence of repeating sequences that may readily form stem-and-loop structures. It is surprising that the repeating sequence is composed of 9 nucleotides whose consensus sequence is identical to that of the repeating sequences found in the oriC sequence of the E. coli chromosome (16). The 9-nucleotide sequence has been found to be a binding site for the dnaA gene product, a DNA initiation protein in E. coli (17). Although there is no other apparent similarity between the E. coli oriC sequence and the sequences in regions 1 and 2, the localized presence of multiple copies of this unique sequence suggests that the oriC function of the B. subtilis chromosome is located in these regions and is recognized by a protein encoded by ORF446, which is homologous with the dnaA protein of E. coli.
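The search for copies of the 9-nucleotide repeat, allowing a single base change from the consensus TTAT(C/A)CACA, can be sketched in Python as follows. This is a hedged illustration rather than the procedure actually used: the consensus is encoded with the IUPAC code M (A or C), both strands are scanned so that oppositely oriented copies (the ones that could pair to form a stem) are reported, and all names (`CONSENSUS`, `find_repeats`, and so on) are hypothetical.

```python
# TTAT(C/A)CACA written with the IUPAC ambiguity code M = A or C.
CONSENSUS = "TTATMCACA"
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(window, pattern=CONSENSUS):
    """Number of positions where `window` departs from `pattern`."""
    return sum(base not in IUPAC[p] for base, p in zip(window, pattern))


def find_repeats(seq, max_mismatch=1):
    """Return (start, strand, window, n_mismatch) for every 9-mer that matches
    the consensus on the given strand ('+') or on the opposite strand ('-')."""
    seq = seq.upper()
    k = len(CONSENSUS)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        for strand, candidate in (("+", window), ("-", revcomp(window))):
            n = mismatches(candidate)
            if n <= max_mismatch:
                hits.append((start, strand, window, n))
    return hits


if __name__ == "__main__":
    # Toy sequence with one copy on each strand; not taken from Fig. 3.
    toy = "GGTTATCCACAGGTGTGGATAACC"
    for hit in find_repeats(toy, max_mismatch=0):
        print(hit)
```

Counting hits per region with `max_mismatch=1` corresponds to the criterion stated above (single base changes allowed); the strand label makes it easy to see which pairs of copies are inverted relative to each other.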

ACKNOWLEDGMENTS We are deeply indebted to Dr. K. von Meyenburg, who kindly pointed out to us the homology in amino acid sequences between the ORFs and genes in E. coli. We express our appreciation to Drs. K. Bott and M. Gellert for communicating unpublished data. We thank Dr. S. Murakami and other colleagues in our laboratory for discussion, and Mrs. K. Terada and Mrs. T. Nishikawa for help. This work was supported by a Grant-in-Aid for Special Project Research, for Cooperative Research, and for Scientific Research from the Ministry of Education, Science and Culture, Japan.

*To whom correspondence should be addressed

REFERENCES
1. Oka, A., Sugiura, K., Takanami, M. and Hirota, Y. (1980) Mol. Gen. Genet., 178, 9-20
2. Ogasawara, N., Mizumoto, S. and Yoshikawa, H. (1984) Gene, 30, 173-182
3. Ogasawara, N., Moriya, S., Mizumoto, S. and Yoshikawa, H. (1984) in Genetics and Biotechnology of Bacilli, Ganesan, A. T. and Hoch, J. A. ed., Academic Press, Inc., London, pp. 51-65
4. Seiki, M., Ogasawara, N. and Yoshikawa, H. (1982) Proc. Natl. Acad. Sci. USA, 79, 4285-4289
5. Ogasawara, N., Moriya, S. and Yoshikawa, H. (1983) Nucleic Acids Res., 11, 6301-6318
6. Henner, D. J. and Hoch, J. A. (1982) The genetic map of Bacillus subtilis, p. 1-33. In D. Dubnau (ed.) The molecular biology of the Bacilli. Academic Press, New York, NY
7. Lampe, M. F. and Bott, K. F. (1984) Nucleic Acids Res., 12, 6307-6323
8. Lampe, M. F. and Bott, K. F., personal communication
9. Adachi, T., Mizuuchi, K., Manzel, R. and Gellert, M. (1984) Nucleic Acids Res., 12, 6389-6395
10. Hansen, F. G., Hansen, E. B. and Atlung, T. (1982) EMBO J., 1, 1043-1048
11. Hansen, E. B., Hansen, F. G. and von Meyenburg, K. (1982) Nucleic Acids Res., 10, 7373-7385
12. Ohmori, H., Kimura, M. and Sakakibara, Y. (1984) Gene, 28, 159-170
13. Blanar, M. H., Sandler, S. J., Armengod, M. E., Ream, L. W. and Clark, A. J. (1984) Proc. Natl. Acad. Sci., 81, 4622-4646
14. Orr, E. and Staudenbauer, W. L. (1982) J. Bacteriol., 151, 524-527
15. Charles, W. J., Moran, C. P. Jr. and Losick, R. (1983) Nature, 302, 800-804
16. Zyskind, J. W., Cleary, J. M., Brusilow, W. S. A., Harding, N. E. and Smith, D. W. (1983) Proc. Natl. Acad. Sci. USA, 80, 1164-1168
17. Fuller, R. S., Funnell, B. E. and Kornberg, A. (1984) Cell, 38, 889-900
