Vol. 169, No. 1
JOURNAL OF BACTERIOLOGY, Jan. 1987, p. 367-370 0021-9193/87/010367-04$02.00/0 Copyright © 1987, American Society for Microbiology
Nucleotide Sequence of the Gene Encoding the Nitrogenase Iron Protein of Thiobacillus ferrooxidans
INGE-MARTINE PRETORIUS, DOUGLAS E. RAWLINGS, ERIC G. O'NEILL, WYN A. JONES, RALPH KIRBY, AND DAVID R. WOODS*
Department of Microbiology, University of Cape Town, Rondebosch 7700, South Africa
Received 11 August 1986/Accepted 13 October 1986
The DNA sequence was determined for the cloned Thiobacillus ferrooxidans nifH and part of the nifD genes. A putative T. ferrooxidans nifH promoter was identified whose sequences showed perfect consensus with those of the Klebsiella pneumoniae nif promoter. Two putative consensus upstream activator sequences were also identified. The amino acid sequence was deduced from the DNA sequence. In a comparison of nifH DNA sequences from T. ferrooxidans and eight other nitrogen-fixing microbes, a Rhizobium sp. isolated from Parasponia andersonii showed the greatest homology (74%) and Clostridium pasteurianum (nifH1) showed the least homology (54%). In a comparison of the amino acid sequences of the Fe proteins, the Rhizobium sp. and Rhizobium japonicum showed the greatest homology (both 86%) and C. pasteurianum (nifH1 gene product) demonstrated the least homology (56%) to the T. ferrooxidans Fe protein.
The acidophilic autotrophic bacterium Thiobacillus ferrooxidans is used industrially to leach metals from mineral ores. This organism obtains its carbon through the fixation of atmospheric carbon dioxide and derives its energy either by the oxidation of ferrous iron to ferric iron or by the oxidation of reduced sulfur compounds to sulfuric acid (8). The ability of T. ferrooxidans to fix atmospheric dinitrogen was first reported by Mackintosh (9). Recently, we demonstrated that five T. ferrooxidans strains contained DNA sequences which hybridized with the highly conserved nifHDK genes of Klebsiella pneumoniae (16). In a two-stage cloning procedure, the T. ferrooxidans ATCC 33020 genes corresponding to the K. pneumoniae nifHDK operon were cloned on a 6.7-kilobase-pair fragment in the Escherichia coli vector pEcoR251. The positions of the T. ferrooxidans nifH, nifD, and nifK genes on the cloned fragment were determined by hybridization with known fragments from the K. pneumoniae nifH, nifD, or nifK genes (16).

The nitrogenase proteins from different species have been shown to be closely related by the evolutionary conservation of both the DNA (19) and amino acid sequences (4, 25). Furthermore, the nitrogenase components from several different bacteria are able to complement each other to form enzymatically active hybrid complexes (5). Comparisons of Fe proteins have revealed highly conserved regions of amino acid sequences, which coincide with functionally significant domains (4, 25). Since acidophilic autotrophic T. ferrooxidans strains inhabit ecological niches which are very different from those of other diazotrophs, a comparison of nitrogenase proteins could reveal characteristics unique to T. ferrooxidans.

The genetics and regulation of nitrogen fixation have been most thoroughly studied with K. pneumoniae. The nif gene cluster consists of 17 contiguous genes arranged in seven or eight operons. The nitrogenase enzyme, responsible for N2 fixation, is encoded by the nifHDK genes. The nifHDK operon is transcribed from a single promoter located upstream of the N-terminal end of the nifH gene (10, 12, 18). The nif promoters have a characteristic 26-base-pair (bp) structure containing two regions of conserved sequence at -10 and -23 bp (1). In addition to this, nifA-activated promoters have upstream activator sequences (3).

Since no genes from T. ferrooxidans have been sequenced to date, there is no information regarding gene structure, codon usage, and regulatory sequences in this important bacterium. To increase our understanding of the molecular genetics of T. ferrooxidans, we sequenced the nifH and part of the nifD genes.
MATERIALS AND METHODS

Bacterial strains, vectors, and plasmids. Plasmid pIMP16, which contained the T. ferrooxidans nifHDK genes, was used as the primary source of DNA (16). Phage vectors M13mp18 and M13mp19 (13) and E. coli JM103 (14) were used. Restriction endonuclease enzymes were obtained commercially and used in accordance with the specifications of the manufacturers. Standard molecular genetics techniques were used (11). The double-stranded DNA replicative form of the phage vectors and plasmid pIMP16 DNA were prepared by CsCl density gradient centrifugation (7).

Sequencing reactions. All DNA sequences were determined by the dideoxy chain termination procedure (20). For the DNA sequencing reactions, the reagents and M13-specific sequencing primers were obtained from Bethesda Research Laboratories, Inc. In addition to these, custom-made 15-base primers were synthesized and used (gifts from D. Botes, Department of Biochemistry, University of Cape Town). The DNA chains were radiolabeled with [α-32P]dCTP (3,000 Ci/mmol) or [α-35S]dCTP (400 Ci/mmol) obtained from Amersham Corp. The sequence was completely determined from both strands.

Analysis of sequences. The DNA sequences were analyzed by the IBM XT computer DNA Tools program. The deduced amino acids were analyzed and compared by the IBM AT Microgenie Version 2 protein alignment subroutine.

RESULTS AND DISCUSSION

Nucleotide sequence of T. ferrooxidans nifH and part of nifD genes. The DNA sequence of the 1,876-bp DNA region of pIMP16 is shown in Fig. 1. From the DNA homology
* Corresponding author.
[FIG. 1 sequence panel: the 1,876-bp coding strand, numbered every 120 nucleotides (1, 121, 241, ..., 1801), with the deduced amino acid sequences printed beneath the coding regions. Restriction sites marked on the sequence: XhoI, BclI, NarI, BglII, SphI, ClaI, StuI, SacII, PvuI, EcoRV, KpnI, and EcoRI.]
FIG. 1. Complete nucleotide sequence of the T. ferrooxidans nifH and part of the nifD genes. Only the coding strand (5'-3') is shown, with the direction of transcription from left to right. Restriction endonucleases having 6-bp recognition sites are indicated. The deduced amino acid sequences for the Fe protein (first encoded polypeptide) and part of the Mo-Fe protein subunit (second encoded polypeptide) are shown below the coding sequences. The regions upstream from the Fe protein (encoded by nifH) which show perfect consensus with K. pneumoniae nif promoter regions are underlined. Lines above the nucleotides upstream from the promoter region indicate the two upstream activator sequences. The two potential RBSs are indicated by bold type. The start and end of the amino acid coding regions are indicated by arrows.
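The features annotated in Fig. 1 (6-bp restriction sites, the upstream activator consensus 5'-TGTN4TN5ACA, and the AGGAG ribosome-binding site) are all short sequence motifs, so they can in principle be located by a simple pattern scan of the coding strand. The following Python sketch illustrates one way to do this. It is not the DNA Tools procedure used in this study; the function names `find_motif` and `map_sites` are hypothetical, and the demonstration sequence `DEMO` is invented for illustration and is not the T. ferrooxidans sequence.

```python
import re

# Recognition sequences of the 6-bp cutters marked in Fig. 1.
# All are palindromic, so scanning the coding strand alone is sufficient.
RESTRICTION_SITES = {
    "XhoI": "CTCGAG", "BclI": "TGATCA", "NarI": "GGCGCC",
    "BglII": "AGATCT", "SphI": "GCATGC", "ClaI": "ATCGAT",
    "StuI": "AGGCCT", "SacII": "CCGCGG", "PvuI": "CGATCG",
    "EcoRV": "GATATC", "KpnI": "GGTACC", "EcoRI": "GAATTC",
}

# Upstream activator consensus 5'-TGTN4TN5ACA (N = any nucleotide).
UAS_PATTERN = "TGT[ACGT]{4}T[ACGT]{5}ACA"
# Shine-Dalgarno ribosome-binding site core.
RBS_PATTERN = "AGGAG"


def find_motif(seq, pattern):
    """Return 1-based start positions of all matches, overlaps included."""
    # A zero-width lookahead lets the scan report overlapping matches.
    lookahead = re.compile("(?=(" + pattern + "))")
    return [m.start() + 1 for m in lookahead.finditer(seq.upper())]


def map_sites(seq):
    """Return {enzyme: [positions]} for every enzyme with at least one site."""
    hits = {}
    for enzyme, site in RESTRICTION_SITES.items():
        positions = find_motif(seq, site)
        if positions:
            hits[enzyme] = positions
    return hits


# Invented demo sequence (assumption): a UAS-like motif, a BglII site,
# an AGGAGA stretch, two ATG codons separated by GCA, and an EcoRI site.
DEMO = "TGTCCGATAGCCAACAGCAGATCTCCAGGAGACATGGCAATGGAATTCTAA"

if __name__ == "__main__":
    print("UAS:", find_motif(DEMO, UAS_PATTERN))
    print("RBS:", find_motif(DEMO, RBS_PATTERN))
    print("ATG:", find_motif(DEMO, "ATG"))
    print("Sites:", map_sites(DEMO))
```

Positions are reported 1-based to match the numbering convention of Fig. 1. Exact matching is used throughout; a scan that tolerates mismatches would need a different approach.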
studies, the nifH and part of the nifD genes had been assigned to this area of T. ferrooxidans DNA (16). A comparison of the sequences of T. ferrooxidans DNA with the nifH and nifD sequences of K. pneumoniae (25) and Azotobacter vinelandii (2) revealed extensive regions of homology. These known DNA sequences assisted in allocating the different T. ferrooxidans sequences to regions of the nifH gene and in ensuring that there were no sequence deletions when sequencing away from a cloning site in both directions. The restriction endonuclease sites revealed by the DNA sequence (Fig. 1) were in agreement with the restriction endonuclease maps obtained previously (19).

The DNA region preceding the nifH gene contains sequences which show perfect consensus with the nifH promoter sequences of K. pneumoniae (1) (Fig. 1). No E. coli-like -10 and -35 consensus promoter sequences were detectable in the DNA preceding the T. ferrooxidans nifH gene. Upstream activator sequences have recently been identified for nifH genes from K. pneumoniae, Rhizobium strains, and Azotobacter strains (3). These sequences are characterized by the consensus sequence 5'-TGTN4TN5ACA, where N is any nucleotide. Two consensus upstream activator sequences are present in the T. ferrooxidans nucleotide sequence, situated in a region 119 to 73 nucleotides upstream from the nifH promoter (Fig. 1).

The sequence 5'-AGGAGA-3' is apparent just preceding the proposed nifH ATG start codon. This sequence shows perfect homology to the Shine-Dalgarno (24) ribosome-binding site (RBS) (AGGAG). Two ATG codons separated by a GCA codon appear directly after the RBS. The Fe protein encoded by nifH thus consists of 296 or 298 amino acids, depending on which of the two ATG codons initiates translation of the gene.

Part of the nucleotide sequence of the T. ferrooxidans nifD gene was obtained (positions 1209 through 1876 in Fig. 1). A potential RBS of sequence 5'-AGGAG-3' (positions 1316 to 1320) precedes the ATG translation initiation codon for the T. ferrooxidans nifD gene. The presence of RBSs preceding the T. ferrooxidans nifH and nifD genes is in agreement with
[FIG. 2 alignment panel: the ten Fe protein sequences aligned in blocks of 40 columns. The T. ferrooxidans (Tf) rows, with alignment gaps omitted, read:
  1 MAMSDKLRQIAFYGKGGIGKSTTSQKHLAALAEMGQKILI
 41 VGCDPKADSTRLILHSKAQDTVLSLAAEAGSVEDLELEDV
 81 MKVGYRDIRCVESGGPEPGVGCAGRGVITSINFLEENGAY
121 DGANYVSYDVLGDVVCGGFAMPIRKQAQEIYIVMSGEM
159 MAMYAANNISKGVLKYANSGGVRLGGLICNERQTDKELEL
199 AEALAGKLGTKLIHFVPRDFIVQHAELRRMTVLEYAPESK
239 QAQEYRTLAEKIHANAGNPAIPTPITMDELEDLLMDFGIM
279 QKEDTSIIGKTAAELAAAGM
The rows for the other nine organisms (Av, As, Cp, Kp, Rm, Pr, Rt, Rj, and Rp) are not reproduced.]
FIG. 2. Comparison of amino acid sequence alignment of nifH Fe proteins from T. ferrooxidans (Tf), A. vinelandii (Av [2]), Anabaena sp. (As [15]), C. pasteurianum (nifH1; Cp [26]), K. pneumoniae (Kp [21, 25]), R. meliloti (Rm [27]), Rhizobium sp. isolated from P. andersonii (Pr [22]), R. trifolii (Rt [23]), R. japonicum (Rj [6]), and Rhizobium phaseoli (Rp [17]). The amino acids are identified by the single-letter code, and the sequences are read from the amino to the carboxyl terminal. The residues indicated by bold type are identical to amino acids in the T. ferrooxidans Fe protein. The cysteine residues are underlined.
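The homology percentages quoted for Fig. 2 are identity scores computed over an alignment. The scoring rules of the Microgenie subroutine are not described here, so the following Python sketch shows only one common convention, stated as an assumption: columns in which either sequence has a gap are skipped, and identity is the fraction of the remaining columns that match. The function names `percent_identity` and `rank_by_identity` are hypothetical. The reference string is the first 10 residues of the Tf row beginning at position 81; the three "toy" rows are invented and do not correspond to any organism in the figure.

```python
def percent_identity(a, b, gap="-"):
    """Percent of comparable aligned columns with identical residues.

    Assumption: columns where either row holds a gap are excluded
    from both the numerator and the denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared


def rank_by_identity(reference, others):
    """Return (name, percent identity) pairs, most similar first."""
    scored = [(name, percent_identity(reference, seq))
              for name, seq in others.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# First 10 residues of the Tf row starting at position 81 in Fig. 2.
REF = "MKVGYRDIRC"

# Invented rows for illustration only (assumption, not real data).
TOY = {
    "toyA": "LKVGYRGIKC",  # 3 substitutions
    "toyB": "MKVGY-DIRC",  # 1 gap, otherwise identical
    "toyC": "LKEGYGGIRC",  # 4 substitutions
}

if __name__ == "__main__":
    for name, pct in rank_by_identity(REF, TOY):
        print(f"{name}: {pct:.1f}%")
```

Other conventions (for example, counting gapped columns as mismatches, or dividing by the length of the shorter sequence) give different percentages, so values from different programs are not directly comparable.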
Brigle et al. (2), who reported the presence of an RBS preceding each of the nifH, nifD, and nifK genes of A. vinelandii. Chen et al. (4) have identified putative RBSs preceding the nifH and the nifD genes of Clostridium pasteurianum. No nif promoters or E. coli -10 and -35 consensus promoter sequences were detectable in the DNA preceding the T. ferrooxidans nifD gene. A 68-bp intercistronic region occurs between the nifH and nifD genes (Fig. 1). This region corresponds to that between the A. vinelandii nifH and nifD genes (2) and confirms the contiguous nature of the T. ferrooxidans nifH and nifD genes in the operon. No inverted-repeat DNA sequences were detectable upstream of the nifD gene, suggesting that transcription of the nifH gene is not terminated between the nifH and nifD genes.

A comparison was made of the DNA sequences of the amino-acid-encoding region of the nifH genes of T. ferrooxidans, A. vinelandii (2), C. pasteurianum (the nifH1 structural gene) (4), K. pneumoniae (21, 25), Rhizobium meliloti (27), a Rhizobium sp. isolated from Parasponia andersonii (22), Rhizobium japonicum (6), Rhizobium trifolii (23), and an Anabaena sp. (15). The nifH DNA sequence of the Rhizobium sp. isolated from P. andersonii showed the greatest homology (74%) to the corresponding T. ferrooxidans sequence. The lowest DNA homology (54%) existed between the nifH genes of C. pasteurianum and T. ferrooxidans. With the exception of the C. pasteurianum (54%) and R. trifolii (64%) sequences, the nifH DNA sequences of the other organisms were closely related to those of T. ferrooxidans, ranging between 71 and 74% homology.

Comparison of the Fe proteins. A comparison was made between the amino acid sequence of the nifH Fe protein (Tf2) from T. ferrooxidans and those of nine other nitrogen-fixing microbes (Fig. 2). From the comparison of amino acid alignment and homology, Tf2 consists of either 296 or 298
amino acids, depending on which ATG codon initiates translation. This polypeptide length falls within the range of 273 (for C. pasteurianum protein Cp2) to 301 (for Anabaena sp. protein As2 and Rhizobium sp. protein Pr2) amino acids for the other organisms (Fig. 2). Tf2 has only five Cys residues located at positions 43, 90, 102, 136, and 187 (Fig. 2). All five residues were found at the same positions in all the other Fe proteins under comparison. Furthermore, these Cys residues are within highly conserved amino acid regions and are candidates for possible ligands of the [4Fe:4S] cluster. In addition to the highly conserved regions containing the five Cys residues, there were numerous other regions of extensive amino acid homology (indicated in bold face in Fig. 2). Apart from the first few amino acids, the amino termini of the Fe proteins showed highly conserved regions, whereas the carboxyl termini contain few conserved regions. Tf2 has two single-amino-acid deletions at residues 121 and 147. At position 147, all the other Fe proteins contain a Lys residue. In Fig. 2, the Fe protein sequences are aligned for maximum homology by the IBM AT Microgenie Version 2 protein alignment subroutine. The Fe proteins of the Rhizobium spp. as a group had the highest homology with Tf2. In particular, the Rhizobium sp. isolated from P. andersonii and R. japonicum showed 86% amino acid identity with Tf2 and were the most closely related to T. ferrooxidans. Tf2 has the lowest amino acid homology (56%) with Cp2. As expected from the DNA sequence comparisons, which had revealed extensive DNA homology among the various nitrogen-fixing microbes, the codon usage pattern in their nifH genes was very similar. For most amino acids, there was one codon which was preferentially used by the majority of the organisms. The codon usage in the nifH gene of T. ferrooxidans was very similar to that of the gram-negative bacteria but rather different from that of the gram-positive C. 
pasteurianum and the Anabaena sp. (data not shown). Since nif genes have been shown to be evolutionarily conserved in divergent procaryotic groups (19), the codon usage in the nif genes of an organism need not necessarily reflect the codon usage in the other genes of the organism.

Evolutionary considerations. In most cases the percent homology was lower at the DNA level than at the amino acid level for each organism compared with T. ferrooxidans. The percent identity at the DNA level ranged from 54 to 74%, whereas that at the amino acid level ranged from 56 to 86% for the various diazotrophs relative to T. ferrooxidans. The homology differences suggest that a stronger evolutionary selection exists on the amino acid sequence level than on the DNA sequence level of nif genes.

ACKNOWLEDGMENTS

This work was supported by grants from the South African Council for Scientific and Industrial Research and General Mining Union Corporation Limited, Johannesburg, South Africa.

LITERATURE CITED

1. Beynon, J., M. Cannon, V. Buchanan-Wollaston, and F. Cannon. 1983. The nif promoters of Klebsiella pneumoniae have characteristic primary structure. Cell 34:665-671.
2. Brigle, K. E., W. E. Newton, and D. R. Dean. 1985. Complete nucleotide sequence of the Azotobacter vinelandii nitrogenase structural gene cluster. Gene 37:37-44.
3. Buck, M., S. Miller, M. Drummond, and R. Dixon. 1986. Upstream activator sequences are present in the promoters of nitrogen fixation genes. Nature (London) 320:374-378.
4. Chen, K. C.-K., J.-S. Chen, and J. L. Johnson. 1986. Structural features of multiple nifH-like sequences and very biased codon usage in nitrogenase genes of Clostridium pasteurianum. J. Bacteriol. 166:162-172.
5. Emerich, D. W., and R. H. Burris. 1978. Complementary functioning of the component proteins of nitrogenase from several bacteria. J. Bacteriol. 134:936-943.
6. Fuhrmann, M., and H. Hennecke. 1984. Rhizobium japonicum nitrogenase Fe protein gene (nifH). J. Bacteriol. 158:1005-1011.
7. Ish-Horowicz, D., and J. F. Burke. 1981. Rapid and efficient cosmid cloning. Nucleic Acids Res. 9:2989-2998.
8. Kelly, D. P., P. R. Norris, and C. L. Brierley. 1979. Microbial technology: current state, future prospects, p. 263-308. In A. T. Bull, D. G. Ellwood, and C. Ratledge (ed.), Microbiological methods for the extraction and recovery of metals. Cambridge University Press, Cambridge.
9. Mackintosh, M. E. 1978. Nitrogen fixation by Thiobacillus ferrooxidans. J. Gen. Microbiol. 105:215-218.
10. MacNeil, T., D. MacNeil, G. P. Roberts, M. A. Supiano, and W. J. Brill. 1978. Fine-structure mapping and complementation analysis of nif (nitrogen fixation) genes in Klebsiella pneumoniae. J. Bacteriol. 136:253-266.
11. Maniatis, T., E. F. Fritsch, and J. Sambrook. 1982. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.
12. Merrick, M., M. Filser, R. Dixon, C. Elmerich, J. Sibold, and J. Houmard. 1980. Use of translocatable genetic elements to construct a fine-structure map of the Klebsiella pneumoniae nitrogen fixation (nif) gene cluster. J. Gen. Microbiol. 117:509-520.
13. Messing, J. 1983. New M13 vectors for cloning. Methods Enzymol. 101:20-78.
14. Messing, J., R. Crea, and P. H. Seeburg. 1981. A system for shotgun DNA cloning. Nucleic Acids Res. 9:309-321.
15. Mevarech, M., D. Rice, and R. Haselkorn. 1980. Nucleotide sequence of a cyanobacterial nifH gene coding for nitrogenase reductase. Proc. Natl. Acad. Sci. USA 77:6476-6480.
16. Pretorius, I. M., D. E. Rawlings, and D. R. Woods. 1986. Identification and cloning of Thiobacillus ferrooxidans structural nif-genes in Escherichia coli. Gene 45:59-65.
17. Quinto, C., H. de la Vega, M. Flores, J. Leemans, M. A. Cevallos, M. A. Pardo, R. Azpiroz, M. de L. Girard, E. Calva, and R. Palacios. 1985. Nitrogenase reductase: a functional multigene family in Rhizobium phaseoli. Proc. Natl. Acad. Sci. USA 82:1170-1174.
18. Riedel, G. E., F. M. Ausubel, and F. C. Cannon. 1979. Physical map of chromosomal nitrogen fixation (nif) genes of Klebsiella pneumoniae. Proc. Natl. Acad. Sci. USA 76:2866-2870.
19. Ruvkun, G. B., and F. M. Ausubel. 1980. Interspecies homology of nitrogenase genes. Proc. Natl. Acad. Sci. USA 77:191-195.
20. Sanger, F., S. Nicklen, and A. R. Coulson. 1977. DNA sequencing with chain terminating inhibitors. Proc. Natl. Acad. Sci. USA 74:5463-5467.
21. Scott, K. F., B. G. Rolfe, and J. Shine. 1981. Biological nitrogen fixation: primary structure of the Klebsiella pneumoniae nifH and nifD genes. J. Mol. Appl. Genet. 1:71-81.
22. Scott, K. F., B. G. Rolfe, and J. Shine. 1983. Nitrogenase structural genes are unlinked in the nonlegume symbiont Parasponia rhizobium. DNA 2:141-148.
23. Scott, K. F., B. G. Rolfe, and J. Shine. 1983. Biological nitrogen fixation: primary structure of the Rhizobium trifolii iron protein gene. DNA 2:149-155.
24. Shine, J., and L. Dalgarno. 1975. Determinant of cistron specificity in bacterial ribosomes. Nature (London) 254:34-38.
25. Sundaresan, V., and F. M. Ausubel. 1981. Nucleotide sequence of the gene coding for the nitrogenase iron protein from Klebsiella pneumoniae. J. Biol. Chem. 256:2808-2812.
26. Tanaka, M., M. Haniu, K. T. Yasunobu, and L. E. Mortenson. 1977. The amino acid sequence of Clostridium pasteurianum iron protein, a component of nitrogenase. J. Biol. Chem. 252:7093-7100.
27. Torok, I., and A. Kondorosi. 1981. Nucleotide sequence of the Rhizobium meliloti nitrogenase reductase (nifH) gene. Nucleic Acids Res. 9:5711-5723.