articles
The complete genome of the hyperthermophilic bacterium Aquifex aeolicus
Gerard Deckert*†, Patrick V. Warren*†, Terry Gaasterland‡, William G. Young*, Anna L. Lenox*, David E. Graham§, Ross Overbeek‡, Marjory A. Snead*, Martin Keller*, Monette Aujay*, Robert Huber‖, Robert A. Feldman*, Jay M. Short*, Gary J. Olsen§ & Ronald V. Swanson*
* Diversa Corporation, 10665 Sorrento Valley Road, San Diego, California 92121, USA
‡ Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois 60439, USA
§ Department of Microbiology, University of Illinois, Urbana, Illinois 61801, USA
‖ Lehrstuhl für Mikrobiologie, Universität Regensburg W-8400, Regensburg W-8400, Germany
Aquifex aeolicus was one of the earliest diverging, and is one of the most thermophilic, bacteria known. It can grow on hydrogen, oxygen, carbon dioxide, and mineral salts. The complex metabolic machinery needed for A. aeolicus to function as a chemolithoautotroph (an organism which uses an inorganic carbon source for biosynthesis and an inorganic chemical energy source) is encoded within a genome that is only one-third the size of the E. coli genome. Metabolic flexibility seems to be reduced as a result of the limited genome size. The use of oxygen (albeit at very low concentrations) as an electron acceptor is allowed by the presence of a complex respiratory apparatus. Although this organism grows at 95 °C, the extreme thermal limit of the Bacteria, only a few specific indications of thermophily are apparent from the genome. Here we describe the complete genome sequence of 1,551,335 base pairs of this evolutionarily and physiologically interesting organism.
Complete genome sequences have been determined for a number of organisms, including Archaea1, Bacteria2–7, and Eukarya8. Here we present and explore the genome sequence of Aquifex aeolicus. With growth-temperature maxima near 95 °C, Aquifex pyrophilus and A. aeolicus are the most thermophilic bacteria known. Although isolated and described only recently9, these species are related to filamentous bacteria first observed at the turn of the century, growing at 89 °C in the outflow of hot springs in Yellowstone National Park10,11. The observation of these macroscopic assemblages would later be instrumental in the drive to culture hyperthermophilic organisms12. The Aquificaceae represent the most deeply branching family within the bacterial domain on the basis of phylogenetic analysis of 16S ribosomal RNA sequences13,14, although analyses of individual protein sequences vary in their placement of Aquifex relative to other groups15–18. The genera in this group, Aquifex and Hydrogenobacter, are thermophilic, hydrogen-oxidizing, microaerophilic, obligate chemolithoautotrophs9,19–21. A. aeolicus (isolated by R.H. and K. O. Stetter) was cultured at 85 °C under an H2/CO2/O2 (79.5:19.5:1.0) atmosphere in a medium containing only inorganic components. A. aeolicus does not grow on a number of organic substrates, including sugars, amino acids, yeast extract or meat extract. Unlike its close relative A. pyrophilus, A. aeolicus has not been shown to grow anaerobically with nitrate as an electron acceptor in the laboratory. From study of the physiology of the organism, several predictions can be made. As an autotroph, A. aeolicus must have genes encoding proteins for one or more modes of carbon fixation and a complete set of biosynthetic genes. As autotrophy is a feature that is distributed throughout the Archaea and Bacteria, most of the associated genes are expected to be of ancient origin and clearly related to those characterized elsewhere.
The obligate autotrophy suggests a biosynthetic rather than a degradative character. Oxygen respiration implies the presence of corresponding utilization and tolerance genes. The early divergence of the Aquificaceae inferred from ribosomal RNA sequences leads to several questions. Are the machineries for oxygen usage and tolerance homologous to those found in mitochondria and well-studied organisms such as Escherichia coli, or were they invented separately? If there was far less oxygen when the lineage originated, is there evidence for use of alternative oxidants?
† Present addresses: Codex Bioinformatics Services, PO Box 90273, San Diego, California 92169, USA (G.D.); Department of Bioinformatics, SmithKline Beecham Pharmaceuticals, Collegeville, Philadelphia 19426, USA (P.V.W.)
NATURE | VOL 392 | 26 MARCH 1998
Genome
General features of the A. aeolicus genome are listed in Box 1. We classified 1,512 open-reading frames (ORFs) into one of three categories, namely, identified (Table 1), hypothetical, or unknown. Identified ORFs were further classified into one of 57 cellular role categories adapted from Riley22 (Table 1). The relatively high G + C content of the two 16S-23S-5S rRNA operons (65%) is characteristic of thermophilic bacterial rRNAs23. The genome is densely packed: most genes are apparently expressed in polycistronic operons and many convergently transcribed genes overlap slightly. Nonetheless, many genes that are functionally grouped within operons in other organisms, such as the tryptophan or histidine biosynthesis pathways, are found dispersed throughout the A. aeolicus genome or appear in novel operons. Even when they encode subunits of the same enzyme, the genes are often separated on the chromosome (for example, gltB and gltD, the genes encoding the large and small subunits of glutamate synthase). Operon organization of genes for the biosynthesis of amino acids is found in both Archaea and Bacteria but it is not universal in either group. A. aeolicus is extreme in that no two amino acid biosynthetic genes are found in the same operon. In contrast, genes required for electron transport, hydrogenase subunits, transport systems, ribosomal subunits, and flagella are often in functionally related operons in A. aeolicus (Fig. 1). No introns or inteins (protein splicing elements) were detected in the genome. A single extrachromosomal element (ECE) was identified during sequencing. Sequence redundancy for the total project was calculated to be 4.83. The ECE, however, is significantly over-represented relative to the chromosome; when calculated independently for the final assemblies, redundancies are 4.73 and 8.76 for the chromosome and for the ECE, respectively. The ECE therefore appears to be present at roughly twice the copy number of the chromosome (8.76/4.73 ≈ 1.9). Although no ORFs on the ECE can be assigned a function with confidence, except for a transposase, two of the predicted proteins show similarity to hypothetical proteins in the Methanococcus jannaschii genome1. One ORF on the ECE is also present in two identical copies on the A. aeolicus chromosome, providing evidence of genetic exchange between the chromosome and the ECE.
Nature © Macmillan Publishers Ltd 1998
Reductive tricarboxylic acid cycle
As an autotroph, A. aeolicus obtains all necessary carbon by fixing CO2 from the environment. An assay for activity of the reductive tricarboxylic acid (TCA) cycle in A. pyrophilus cell extracts showed in vitro activities for each proposed reaction24. The reductive (reverse) TCA cycle fixes two molecules of CO2 to form acetyl-coenzyme A (acetyl-CoA) and other biosynthetic intermediates25. The A. aeolicus genome contains genes encoding malate dehydrogenase, fumarate hydratase, fumarate reductase, succinate-CoA ligase, ferredoxin oxidoreductase, isocitrate dehydrogenase, aconitase and citrate synthase, which together could constitute the TCA pathway. There is no biochemical evidence for alternative carbon-fixation pathways in A. pyrophilus24,25 nor is there sequence evidence for such pathways in A. aeolicus. The TCA cycle is vital as it provides the substrates of many biosynthetic pathways. (It is beyond the scope of this report to detail these biosynthetic pathways, but they seem to be typically bacterial, and candidate genes for all or most of the enzymes have been identified in A. aeolicus.) The central role of the TCA cycle is emphasized by duplication of many of its constituent genes in A. aeolicus. Two genes encode proteins that are similar to malate dehydrogenase (in addition to a lactate dehydrogenase). The fumarate hydratase is split into amino- and carboxy-terminal subunits, as is the case in M. jannaschii1. Unlinked genes encoding two iron–sulphur proteins of fumarate reductase (alternatively succinate dehydrogenase) accompany a single flavoprotein subunit. Two sets of genes resembling succinate-CoA ligase (both the α- and β-subunits) are present. A. aeolicus has two putative operons encoding four-subunit (α, β, γ, δ) 2-acid ferredoxin oxidoreductases; members of this family catalyze reversible carboxylation/decarboxylation of pyruvate, 2-isoketovalerate, or 2-oxoglutarate with varying specificity26.
These duplicated genes may encode paralogous proteins with unique substrate specificity, as opposed to redundant functions. For example, a paralogue of succinate-CoA ligase may activate citrate with coenzyme A to form citryl-CoA, which citrate synthase can cleave to produce oxaloacetate and acetyl-CoA.
Gluconeogenesis through the Embden–Meyerhof–Parnas pathway
Growing autotrophically, A. aeolicus must synthesize pentose and hexose monosaccharides from products of the reductive TCA cycle. Pyruvate produced by pyruvate ferredoxin oxidoreductase or by pyruvate carboxylase (oxaloacetate decarboxylase)24 may enter the Embden–Meyerhof–Parnas pathway of glycolysis and gluconeogenesis. Genes encoding fructose-1,6-bisphosphatase, an essential gluconeogenic enzyme in E. coli, have not been identified in the genomes of the autotrophs A. aeolicus or M. jannaschii1, suggesting that an unidentified pathway may exist. The A. aeolicus genome also encodes enzymes of the pentose-phosphate pathway and enzymes for glycogen synthesis and catabolism. We found neither (phospho)gluconate dehydrase nor 2-keto-3-deoxy-(6-phospho)gluconate aldolase of the Entner–Doudoroff pathway.
Respiration
Aquifex species are able to grow by using oxygen concentrations as low as 7.5 p.p.m. (R.H. and K. O. Stetter, unpublished observations). The enzymes for oxygen respiration are similar to those of other bacteria: ubiquinol cytochrome c oxidoreductase (bc1 complex), cytochrome c (three different genes) and cytochrome c oxidase (with two different subunit I genes and two different subunit II genes). The alternative system, with cytochrome bd ubiquinol oxidase, is also present. Clearly, the Aquifex lineage did not independently invent oxygen respiration. This leaves at least three possibilities: consistent with the ability of Aquifex to use very low levels of oxygen, the oxygen-respiration system was highly developed when oxygen had only a small fraction of its present concentration before the advent of oxygenic photosynthesis; contrary to what is implied by the 16S phylogeny, the lineage including Aquifex originated after the rise in atmospheric oxygen; or oxygen respiration developed once, and was then laterally transferred among bacterial lineages and acquired by Aquifex. Many other oxidoreductases are present in addition to those obviously involved in oxygen respiration. The physiological role of most of these oxidoreductases is unknown or ambiguous, but two deserve comment. There is a putative nitrate reductase in the genome, although A. aeolicus has not been observed to perform NO3− respiration, unlike the closely related A. pyrophilus. The nitrate reductase gene is adjacent to a nitrate transporter, and may be involved in nitrogen assimilation rather than respiration. It is also possible that A. aeolicus has a latent ability to respire with nitrate but that the conditions required have not been found. Two gene sequences show strong similarities to Rieske proteins, even though the rest of the ubiquinol cytochrome c oxidoreductase subunits appear only once in the genome. One of these Rieske protein genes is adjacent to a sulphide dehydrogenase subunit, suggesting a role in sulphur respiration.
Oxidative stress
A. aeolicus grows optimally under microaerophilic conditions and consequently possesses various protective enzymes to counter reactive oxygen species, particularly superoxide and peroxide. The genome contains three genes encoding superoxide dismutases, two of the copper/zinc family and one of the iron/manganese family. The latter has also been noted in A. pyrophilus27. One of the copper/zinc superoxide dismutase genes is located in a large gene cluster encoding formate dehydrogenase. No catalase genes were identified. There are several genes in the genome that might encode proteins that catalyze the detoxification of H2O2, including cytochrome c peroxidase, thiol peroxidase, and two alkyl hydroperoxide reductase genes. All of these enzymes require an exogenous reductant and therefore do not evolve O2. However, treatment of A. pyrophilus9 or A. aeolicus biomass with H2O2 results in the rapid evolution of gas bubbles. This catalase activity may result from a novel enzyme that cannot yet be identified by sequence similarity.
Motility
Like A. pyrophilus9, A. aeolicus is motile and possesses monopolar polytrichous flagella. More than 25 genes encoding proteins involved in flagellar structure and biosynthesis have been identified in A. aeolicus (Box 1). However, no homologues of the bacterial chemotaxis system were identified. In enteric bacteria, membrane-bound receptors bind chemoattractants and repellents and modulate the activity of the histidine kinase CheA28. Phosphoryl groups from CheA are transferred to CheY, which then binds to the flagellar switch, altering the direction of flagellar rotation. Homologous chemotaxis systems are present in the archaea Halobacterium salinarum29 and Pyrococcus sp. OT3 (H. Sizuya, personal communication), although the bacterial and archaeal flagellar apparatuses are not homologous30. The M. jannaschii genome also lacks homologues of known genes required for chemotaxis. Thus, either motility in A. aeolicus and M. jannaschii is undirected or input for controlling taxis is mediated through another, unidentified system. The most studied chemotaxis systems respond to sugars and amino acids, although responses to other inputs (for example, metals, redox potential, and light) may also occur. In contrast to all the organisms known to possess the classical chemotactic signal-transduction pathways, both A. aeolicus and M. jannaschii are obligate chemoautotrophs. Chemoautotrophs may respond to a different set of factors, such as concentrations of dissolved gas (CO2, H2 or O2) or another critical parameter such as temperature. In E. coli, the flagellar switch is essential for flagellar structure and function and coupling of chemotaxis signals. But the A. aeolicus genome encodes homologues of only two of the three E. coli proteins that make up the switch, FliG and FliN. Biochemical31 and genetic32 studies implicate the missing FliM protein as the receptor for phosphorylated CheY, the switch signal. The absence of both FliM and CheY in A. aeolicus supports the identification of FliM as the receptor for phosphorylated CheY in E. coli. This result also argues against a direct role for FliM in torque generation.
Figure 1 Linear map of the A. aeolicus circular chromosome. Genes are shown as arrows which denote the direction of transcription and are coloured to denote functional categorization according to the key below the figure. The sequences of the two rRNA gene clusters are identical. Here, the first base of the coding sequence of fusA was arbitrarily assigned as base number 1 as no origin of replication has been identified. ORF numbers are discontinuous because some ORFs representing 100 amino acids or more are not predicted to be coding and are not shown.
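The coding-sequence figures reported in Box 1 (ORF counts and average lengths for the chromosome and the ECE) can be cross-checked against the stated totals and protein-coding percentages. The following is a minimal sketch, not part of the original analysis: the names `CHROMOSOME_ORFS`, `ECE_ORFS` and `summarize` are hypothetical, and the calculation assumes that ORFs do not overlap and that the rounded average lengths are close enough to reconstruct the coding totals.

```python
# Figures transcribed from Box 1: (number of ORFs, average length in bp).
CHROMOSOME_BP = 1_551_335
ECE_BP = 39_456

CHROMOSOME_ORFS = {
    "similar to protein of known function": (849, 1066),
    "similar to protein of unknown function": (256, 898),
    "unknown coding region": (407, 762),
}
ECE_ORFS = {
    "similar to protein of known function": (1, 948),
    "similar to protein of unknown function": (4, 667),
    "unknown coding region": (27, 648),
}


def summarize(orf_classes, replicon_bp):
    """Return (total ORFs, mean ORF length in bp, per cent of replicon coding)."""
    total_orfs = sum(count for count, _ in orf_classes.values())
    # Box 1 gives rounded averages, so this coding total is only approximate,
    # and slightly overlapping genes would be counted twice.
    coding_bp = sum(count * mean_bp for count, mean_bp in orf_classes.values())
    return total_orfs, coding_bp / total_orfs, 100 * coding_bp / replicon_bp


if __name__ == "__main__":
    for name, orfs, size in (
        ("chromosome", CHROMOSOME_ORFS, CHROMOSOME_BP),
        ("ECE", ECE_ORFS, ECE_BP),
    ):
        n, mean_bp, pct = summarize(orfs, size)
        print(f"{name}: {n} ORFs, mean {mean_bp:.0f} bp, {pct:.1f}% coding")
```

Under these assumptions the chromosomal classes sum to 1,512 ORFs with a mean length of about 956 bp and about 93% coding, and the ECE classes give about 53.5% coding, in line with the values listed in Box 1.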
Box 1 Aquifex aeolicus genome features
General
  Length: 1,551,335 bp
  G + C content: 43.4%
  Protein-coding regions: 93%
  Stable RNA: 0.8%
  Non-coding repeats: (none significant)
  Intergenic sequences: 6.2%
RNA
  Ribosomal RNA (chromosome coordinates)
    16S-23S-5S: 572785-567770
    16S-23S-5S: 1192069-1197084
  Transfer RNA: 44 species (7 clusters, 28 single genes)
  Other RNAs (chromosome coordinates)
    tmRNA: 1153844-1153498
Chromosomal coding sequences
  849 similar to protein of known function (average length 1,066 bp)
  256 similar to protein of unknown function (average length 898 bp)
  407 unknown coding regions (average length 762 bp)
  1,512 total (average length 956 bp)
Extrachromosomal element (ECE)
  Length: 39,456 bp
  G + C content: 36.4%
  Protein-coding regions: 53.5%
ECE-coding sequences
  1 similar to proteins of known function (length 948 bp)
  4 similar to proteins of unknown function (average length 667 bp)
  27 unknown coding regions (average length 648 bp)

Figure 2 Histogram representation of the similarity of selected classes of predicted proteins to predicted proteins from the E. coli (EC) and M. jannaschii (MJ) genomes. Predicted A. aeolicus proteins representing each category were independently compared to sets of all potential polypeptides (≥100 amino acids) from the two genomes using FASTA44. If the top scoring alignment covered ≥80% of the length of the A. aeolicus protein, the score was plotted. There were more positives found in the E. coli genome in nearly every category. Hypothetical proteins (those identified by database match but of unknown function) are very similarly represented by M. jannaschii and E. coli. There are a small number of very highly conserved hypotheticals that are shared between A. aeolicus and M. jannaschii. Generally, biosynthetic categories show less discrimination than information-processing categories, which are clearly more E. coli-like. The variation in the apparent rates of evolution in different categories suggests that different phylogenies may be inferred depending on the sequence analysed. Within each graph, correspondence to E. coli is shown in white and M. jannaschii is shown in black. Avg id, average identity; count, number of proteins analysed.
Axes: per cent identity (horizontal, 15 to 80); number of proteins displaying similarity (vertical).
Panels:
  Functionally identified (848 Aquifex ORFs): EC avg id 38%, count 656; MJ avg id 35%, count 379
  Amino acid biosynthesis (71 Aquifex ORFs): EC avg id 38%, count 66; MJ avg id 42%, count 58
  Translation (98 Aquifex ORFs): EC avg id 44%, count 80; MJ avg id 32%, count 46
  Transcription (31 Aquifex ORFs): EC avg id 39%, count 27; MJ avg id 32%, count 6
  Hypothetical (255 Aquifex ORFs): EC avg id 32%, count 121; MJ avg id 33%, count 115

DNA replication and repair
The A. aeolicus primary replicative DNA polymerase, corresponding to the DNA polymerase III holoenzyme in E. coli, probably consists of a core structure containing α- and ε-subunits, a γ-τ-subunit and an additional member of the γ-τ/δ′-family. A gene encoding a protein homologous to the β-sliding clamp was also found. This minimalistic complex lacks homologous θ-, δ-, χ- and ψ-subunits, as does the Mycoplasma genitalium holoenzyme3. Translation of the 54K (relative molecular mass) γ-τ-ATPase subunit may proceed without a programmed frameshift to produce a protein similar to the N-terminal region of the E. coli γ-subunit. DNA polymerase I is present as separate Klenow fragment and 5′ → 3′ exonuclease subunits, encoded by two non-adjacent ORFs. Although the repair polymerase, DNA polymerase II, has not been found in A. aeolicus, one ORF (Aq1422) encodes a protein similar to the eukaryotic DNA repair polymerase-β. A member of the same family has been identified in Thermus aquaticus33 and Bacillus subtilis.
Transcriptional and translational apparatuses
The transcriptional apparatus of A. aeolicus is similar to that of E. coli and lacks any components specific to the Eukarya or Archaea (Fig. 2). In addition to the core RNA polymerase α-, β-, and β′-subunits, four σ-factors which determine promoter specificity are present (Table 1). Several different families of bacterial transcriptional regulators were also identified, including two-component systems. All of the ribosomal proteins and elongation factors common to other bacteria are present, indicating that all bacteria-specific ribosomal proteins were present in the common ancestor of Aquifex and other bacteria. Also present are the four sel genes required for the cotranslational incorporation of selenocysteine. These latter genes are clustered in a 15-kilobase-pair segment that also encodes the biosynthetic and structural proteins for formate dehydrogenase, the only selenocysteine-containing protein identified. The gene that encodes selenocysteine transfer RNA, selC, is apparently cotranscribed with the genes encoding the formate dehydrogenase structural proteins. A. aeolicus lacks glutaminyl-tRNA and asparaginyl-tRNA synthetases. The genes required for transamidation of glutamyl-tRNA(Gln) are present34. Charging of asparaginyl-tRNA is likely to proceed through the analogous reaction, as shown in halobacteria35, although the gene(s) for that transamidase are unknown. The canonical methionyl- and leucyl-tRNA synthetases have only been seen previously as single polypeptide enzymes; however, in A. aeolicus the homologues appear fragmented into two subunits. In both cases, the genes that encode the N- and C-terminal portions are widely separated on the chromosome. No complete three-dimensional structural data are available for either methionyl- or leucyl-aminoacyl tRNA synthetases, but the subunit organization in the A. aeolicus aminoacyl-tRNA synthetases may reflect domain organization in the homologous proteins.
Thermophily
The A. aeolicus genome is the second completely sequenced genome of a hyperthermophile. By comparing the A. aeolicus and M. jannaschii genomes and contrasting them with the complete genomes of mesophiles, we can discover whether there are aspects of the genome or the encoded information that are diagnostic of hyperthermophiles. The G + C content of the stable RNAs is clearly indicative of the high growth temperature of the organism. This property can be used to identify stable RNAs against the relatively low G + C background of the A. aeolicus genome. The gene encoding tmRNA (or 10Sa RNA)36, an RNA involved in tagging polypeptides translated from incomplete messenger RNAs for degradation, was located in this way. Two genes for reverse gyrase are present in the genome. This is the only protein known to be present only in thermophiles. Other proteins, currently described as hypotheticals, may be diagnostic of hyperthermophiles but the data sets are not yet large enough to decide this with confidence. Although features of stabilization may not be apparent in any given protein37, a large enough data set may reveal general trends in amino-acid usage that are informative. Particularly important in this regard is inclusion of multiple genomes of hyperthermophiles so as not to allow the idiosyncrasies of a single organism to bias the conclusions. As shown in Table 2, comparison of the amino-acid composition encoded by six genomes shows that use of individual amino acids can vary significantly from genome to genome. The data suggest trends that may be correlated with the thermostability of the encoded proteins. One apparent trend is that the hyperthermophile genomes encode higher levels of charged amino acids on average than mesophile genomes38, primarily at the expense of uncharged polar residues. Glutamine in particular seems to be significantly discriminated against in the hyperthermophiles. Although this observation might be rationalized on the basis of
Table 2 Comparison of relative amino acid compositions (in percentages) of mesophiles and thermophiles

              Mesophiles                                           Thermophiles
Amino acid    H. influenzae   H. pylori   E. coli   Synechocystis   A. aeolicus   M. jannaschii
A             8.21            6.83        9.55      9.07            5.90          5.54
C             1.03            1.09        1.11      1.01            0.79          1.27
D             4.98            4.77        5.20      5.07            4.32          5.52
E             6.48            6.88        5.91      6.20            9.63          8.67
F             4.46            5.41        3.87      3.75            5.13          4.20
G             6.65            5.76        7.42      7.77            6.75          6.41
H             2.05            2.12        2.26      1.93            1.54          1.43
I             7.10            7.20        5.95      6.31            7.32          10.45
K             6.32            8.94        4.48      4.26            9.40          10.36
L             10.50           11.18       10.56     10.93           10.57         9.38
M             2.44            2.28        2.86      2.12            1.92          2.33
N             4.89            5.83        3.88      3.76            3.60          5.24
P             3.72            3.28        4.41      5.09            4.07          3.38
Q             4.64            3.70        4.42      5.26            2.04          1.44
R             4.47            3.46        5.58      5.18            4.91          3.85
S             5.84            6.81        5.67      5.46            4.79          4.46
T             5.20            4.37        5.35      5.53            4.21          4.06
V             6.68            5.59        7.11      7.10            7.93          6.85
W             1.12            0.70        1.48      1.30            0.93          0.71
Y             3.12            3.68        2.83      2.78            4.13          4.33

Residue class                        Mesophiles   Thermophiles
Charged residues (DEKRH)             24.11        29.84
Polar/uncharged residues (GSTNQYC)   31.15        26.79
Hydrophobic residues (LMIVWPAF)      44.74        43.36
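The residue-class summary of Table 2 can be approximately reproduced from the per-genome compositions. The sketch below is illustrative only: it assumes a simple unweighted mean over the genomes in each group, which may not be the exact averaging used for the published summary rows, and the names `COMPOSITION`, `GROUPS`, `group_percent` and `mean_group_percent` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Per cent composition per genome, in the order of AMINO_ACIDS (from Table 2).
COMPOSITION = {
    "H. influenzae": [8.21, 1.03, 4.98, 6.48, 4.46, 6.65, 2.05, 7.10, 6.32, 10.50,
                      2.44, 4.89, 3.72, 4.64, 4.47, 5.84, 5.20, 6.68, 1.12, 3.12],
    "H. pylori": [6.83, 1.09, 4.77, 6.88, 5.41, 5.76, 2.12, 7.20, 8.94, 11.18,
                  2.28, 5.83, 3.28, 3.70, 3.46, 6.81, 4.37, 5.59, 0.70, 3.68],
    "E. coli": [9.55, 1.11, 5.20, 5.91, 3.87, 7.42, 2.26, 5.95, 4.48, 10.56,
                2.86, 3.88, 4.41, 4.42, 5.58, 5.67, 5.35, 7.11, 1.48, 2.83],
    "Synechocystis": [9.07, 1.01, 5.07, 6.20, 3.75, 7.77, 1.93, 6.31, 4.26, 10.93,
                      2.12, 3.76, 5.09, 5.26, 5.18, 5.46, 5.53, 7.10, 1.30, 2.78],
    "A. aeolicus": [5.90, 0.79, 4.32, 9.63, 5.13, 6.75, 1.54, 7.32, 9.40, 10.57,
                    1.92, 3.60, 4.07, 2.04, 4.91, 4.79, 4.21, 7.93, 0.93, 4.13],
    "M. jannaschii": [5.54, 1.27, 5.52, 8.67, 4.20, 6.41, 1.43, 10.45, 10.36, 9.38,
                      2.33, 5.24, 3.38, 1.44, 3.85, 4.46, 4.06, 6.85, 0.71, 4.33],
}
MESOPHILES = ["H. influenzae", "H. pylori", "E. coli", "Synechocystis"]
THERMOPHILES = ["A. aeolicus", "M. jannaschii"]
GROUPS = {"charged": "DEKRH", "polar/uncharged": "GSTNQYC", "hydrophobic": "LMIVWPAF"}


def group_percent(genome, residues):
    """Sum the percentages of the given residues for one genome."""
    row = dict(zip(AMINO_ACIDS, COMPOSITION[genome]))
    return sum(row[aa] for aa in residues)


def mean_group_percent(genomes, residues):
    """Unweighted mean of a residue-class percentage over several genomes (assumption)."""
    return sum(group_percent(g, residues) for g in genomes) / len(genomes)


if __name__ == "__main__":
    for label, residues in GROUPS.items():
        meso = mean_group_percent(MESOPHILES, residues)
        thermo = mean_group_percent(THERMOPHILES, residues)
        print(f"{label:16s} mesophiles {meso:5.2f}  thermophiles {thermo:5.2f}")
```

With unweighted means the class totals agree with the published summary rows to within a few tenths of a percentage point, and the same helper shows the lower glutamine usage in the two hyperthermophiles when called with `"Q"`.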
an increased rate of deamidation of this residue at higher temperatures, asparagine does not appear subject to similar discrimination.
Phylogeny
The placement of the Aquifex lineage as one of the earliest divergences in the eubacterial tree13,14 is interesting because of the insights it could provide into the ancestral eubacterial phenotype, including the hypothesized thermophilic nature of the first bacteria. Protein-based phylogenies often do not support the original rRNA-based placement15,16,18. Thus, the availability of some 1,500 genes from an Aquifex species would seem to offer a definitive resolution of the phylogeny. However, our analyses of ribosomal proteins, aminoacyl-tRNA synthetases, and other proteins do not do so, showing no consistent picture of the organism’s phylogeny. We cannot make a more complete analysis and discussion here, but some observations can be made. These proteins do not yield a statistically significant placement of the Aquifex lineage or of other major eubacterial lineages. This situation partially reflects the inadequacy of some protein sequences as indicators of distant molecular genealogy because of their particular evolutionary dynamic, including the patterns and rates of amino-acid replacements. In some cases (such as the aminoacyl-tRNA synthetases for arginine, cysteine, histidine, proline and tyrosine), the analyses are further complicated by the presence of paralogous genes and/or apparent lateral gene transfers. It seems that a more extensive survey of genes and a better sampling of major eubacterial taxa will be required to confidently confirm or refute an early divergence of the Aquifex lineage.
gels; and second, dye-terminator (ABI Prism FS+) reactions using two pBluescript-specific primers. These reactions were analysed on 36-cm 5% Long-Ranger gels. The sequence fragments were assembled on an Apple Power Macintosh computer using Sequencher (Gene Codes, Ann Arbor, MI), an assembly and editing program. Assembly was typically performed in batches of roughly 200–400 sequences, and was followed by inspection and editing of the assemblies. All sequences in the set were compared with all others through this process. After assembly, the sequences comprised ~750 contigs at the end of the random phase. Sequences were obtained from both ends of ~200 randomly chosen clones from a fosmid library42,43. These sequences were then assembled with consensus sequences derived from the contigs of random-phase sequences using Sequencher. Gaps between contigs were closed by direct sequencing on fosmids not wholly contained within a contig. The fosmid library thus served a purpose analogous to that of the λ-scaffold in other projects1–4. The final eight gaps were closed by direct sequencing of polymerase chain reaction (PCR) products generated with the TaqPlus Long PCR System (Stratagene Cloning Systems, La Jolla, CA). Consequences of reducing the number of sequences in the random phase are the large number of gaps that remain to be closed in the directed phase, and the reduction in overall coverage. To ensure that reduced coverage did not compromise accuracy, ~200 oligonucleotide primers were synthesized to resequence regions of ambiguity identified by visual inspection of the entire assembly. 13,785 sequences, with an average edited read length of 557 base pairs, constitute the final assembly. On the basis of a relatively small number of errors identified during the annotation process, we estimate the error frequency to be ~0.01%, comparable to other published genomic sequence estimates.
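The read statistics above can be related to the project-wide redundancy of 4.83 quoted in the Genome section and to the roughly 97% coverage targeted in the random phase. The sketch below is a hedged illustration rather than the authors' procedure: the function names are hypothetical, the target is taken to be the chromosome plus one copy of the ECE, the random-phase reads are assumed to have the same 557 bp mean length as the final set, and the covered fraction uses the idealized Lander-Waterman approximation 1 − e^(−c), which is not mentioned in the text.

```python
import math

# Values taken from the text.
FINAL_READS = 13_785
RANDOM_PHASE_READS = 10_500
MEAN_READ_BP = 557          # average edited read length of the final assembly
CHROMOSOME_BP = 1_551_335
ECE_BP = 39_456


def redundancy(n_reads, mean_read_bp, target_bp):
    """Sequenced bases divided by target length (fold coverage)."""
    return n_reads * mean_read_bp / target_bp


def expected_fraction_covered(n_reads, mean_read_bp, target_bp):
    """Idealized covered fraction for uniformly random reads (assumed model)."""
    return 1.0 - math.exp(-redundancy(n_reads, mean_read_bp, target_bp))


if __name__ == "__main__":
    target = CHROMOSOME_BP + ECE_BP  # assumption: one copy of each replicon
    print(f"final redundancy: {redundancy(FINAL_READS, MEAN_READ_BP, target):.2f}")
    frac = expected_fraction_covered(RANDOM_PHASE_READS, MEAN_READ_BP, target)
    print(f"expected random-phase coverage: {100 * frac:.1f}%")
```

Under these assumptions the final read set corresponds to about 4.83-fold redundancy, and 10,500 random reads would be expected to cover roughly 97% of the target. Because the ECE is over-represented among the clones, the simple model is only a rough guide for each replicon taken separately.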
Gene (ORF + RNA) identification and functional assignment approaches.
Conclusions
Advances in sequencing techniques have allowed us to move beyond studies of single genes to studies of complete genomes only recently2. This rapid advance has created the opportunity to begin to characterize an organism with the full knowledge of the genome in hand. The complete genome summarized in this report represents our first view of A. aeolicus. The challenge now is to ask specific questions in ways which take advantage of the whole-genome data. Beyond studies of any single organism in isolation, complete genomes allow comprehensive comparisons between organisms. For instance, comparisons of the similarity of genes can be made that reveal that genes in different categories vary in their relative conservation (Fig. 2). In addition, genome-wide trends are apparent. For example, why is there not more of a tendency to group functionally related genes (for example, biosynthetic pathways) into operons in A. aeolicus? This was also seen in the genome sequence of the autotroph M. jannaschii1. Is this because the autotrophic lifestyle decreases the need for selective regulation? There also seem to be a few multifunctional, fused proteins in A. aeolicus and M. jannaschii. Although this seems unlikely to be related to autotrophy, it might be associated with extreme thermophily. The large number of diverse genome sequences that will become available in the coming years will allow more detailed correlation of global genomic properties with particular physiologies.
Methods
Sequencing strategy. The sequencing strategy used to assemble the complete genome was based on the whole genome random (or ‘shotgun’) approach, which has been successfully used for other genomes of similar size1–4. Shotgun sequencing projects are characterized by two phases: an initial completely random phase in which the bulk of the data is collected, followed by a closure phase where directed techniques are used to close gaps and complete the assembly. By pursuing a strategy where only 97% coverage was initially achieved, we were able to limit the number of sequences needed for the random phase to only 10,500 (ref. 39). Sequences were generated from a small insert library constructed in λ ZAP II vectors40,41 (average insert length 2.9 kilobase pairs). Two different methods were used for sequencing: first, dye-primer M13-21 and M13 reverse primer ABI Prism CS+ ready reaction kits, analysed on 48-cm 4% polyacrylamide
Coding regions of the A. aeolicus genome were analysed and assigned using primarily the programs BLASTP44 and FASTA45 to search against a nonredundant protein database. Many analyses were carried out within the context of MAGPIE46,47, an integrated computing environment for genome analysis. The results of these analyses are available for user interpretation, validation, and categorization. Additional ORFs were identified and start sites refined using the program CRITICA (J. H. Badger and G.J.O., unpublished program). Finally, all presumed ‘intergenic regions’ were examined with BLASTX for similarities to known protein sequences48. Transfer RNA genes were identified with the program tRNAscan-SE49.

Received 26 August 1997; accepted 3 February 1998.

1. Bult, C. et al. Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 273, 1058–1073 (1996).
2. Fleischmann, R. D. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496–511 (1995).
3. Fraser, C. M. et al. The minimal gene complement of Mycoplasma genitalium. Science 270, 397–403 (1995).
4. Tomb, J.-F. et al. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388, 539–547 (1997).
5. Himmelreich, R. et al. Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res. 24, 4420–4449 (1996).
6. Kaneko, T. et al. Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res. 3, 109–136 (1996).
7. Blattner, F. R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1462 (1997).
8. Goffeau, A. et al. Life with 6000 genes. Science 274, 546 (1996).
9. Huber, R. et al. Aquifex pyrophilus gen. nov. sp. nov. represents a novel group of marine hyperthermophilic hydrogen oxidizing bacteria. Arch. Microbiol. 15, 340–351 (1992).
10. Reysenbach, L., Wickham, G. S. & Pace, N. R. Phylogenetic analysis of the hyperthermophilic pink filament community in Octopus Spring, Yellowstone National Park. Appl. Environ. Microbiol. 60, 2113–2119 (1994).
11. Setchell, W. A. The upper temperature limits of life. Science 17, 934–937 (1903).
12. Brock, T. D. The road to Yellowstone—and beyond. Annu. Rev. Microbiol. 49, 1–28 (1995).
13. Burggraf, S., Olsen, G. J., Stetter, K. O. & Woese, C. R. A phylogenetic analysis of Aquifex pyrophilus. Syst. Appl. Microbiol. 15, 353–356 (1992).
14. Pitulle, C. et al. Phylogenetic position of the genus Hydrogenobacter. Int. J. Syst. Bacteriol. 44, 620–626 (1994).
15. Baldauf, S. L., Palmer, J. D. & Doolittle, W. F. The root of the universal tree and the origin of eukaryotes based on elongation factor phylogeny. Proc. Natl Acad. Sci. USA 93, 7749–7754 (1996).
16. Klenk, H.-P., Palm, P. & Zillig, W. in Molecular Biology of the Archaea (eds Pfeifer, F., Palm, P. & Schleifer, K. H.) 139–147 (VCH Pub, 1994).
17. Bocchetta, M. et al. Arrangement and nucleotide sequence of the gene (fus) encoding elongation factor G (EF-G) from the hyperthermophilic bacterium Aquifex pyrophilus: phylogenetic depth of hyperthermophilic bacteria inferred from analysis of the EF-G/fus sequences. J. Mol. Evol. 41, 803–812 (1995).
18. Wetmur, J. G. et al. Cloning, sequencing, and expression of RecA proteins from three distantly related thermophilic eubacteria. J. Biol. Chem. 269, 25928–25935 (1994).
19. Kawasumi, T., Igarashi, Y., Kodama, T. & Minoda, Y. Hydrogenobacter thermophilus gen. nov., sp. nov., an extremely thermophilic, aerobic, hydrogen-oxidizing bacterium. Int. J. Syst. Bacteriol. 34, 5–10 (1984).
20. Kristjannson, J., Ingason, A. & Alfredsson, G. A. Isolation of thermophilic obligately autotrophic hydrogen-oxidizing bacteria, similar to Hydrogenobacter thermophilus, from Icelandic hotsprings. Arch. Microbiol. 140, 321–325 (1985).
21. Kryukov, V. R., Savel’eva, N. D. & Pusheva, M. A. Calderobacterium hydrogenophilum gen. nov., sp. nov. an extreme thermophilic bacterium and its hydrogenase activity. Microbiology (Engl. Trans. Mikrobiologiya) 52, 611–618 (1983).
22. Riley, M. Functions of the gene products of Escherichia coli. Microbiol. Rev. 57, 862–952 (1993).
23. Weisburg, W. G., Giovannoni, S. J. & Woese, C. R. The Deinococcus-Thermus phylum and the effect of rRNA composition on phylogenetic tree construction. Syst. Appl. Microbiol. 11, 128–134 (1989).
24. Beh, M., Strauss, G., Huber, R., Stetter, K. O. & Fuchs, G. Enzymes of the reductive citric acid cycle in the autotrophic eubacterium Aquifex pyrophilus and in the archaebacterium Thermoproteus neutrophilus. Arch. Microbiol. 160, 306–311 (1993).
25. Fuchs, G. in Autotrophic Bacteria (eds Schlegel, H. G. & Bowien, B.) 365–382 (Springer, New York, 1987).
26. Mai, X. & Adams, M. W. Characterization of a fourth type of 2-keto acid-oxidizing enzyme from a hyperthermophilic archaeon: 2-ketoglutarate ferredoxin oxidoreductase from Thermococcus litoralis. J. Bacteriol. 178, 5890–5896 (1996).
27. Lim, J. H. et al. Cloning and expression of superoxide dismutase from Aquifex pyrophilus, a hyperthermophilic bacterium. FEBS Lett. 406, 142–146 (1997).
28. Bourret, R. B., Borkovich, K. A. & Simon, M. I. Signal transduction pathways involving protein phosphorylation in prokaryotes. Annu. Rev. Biochem. 60, 401–441 (1991).
29. Rudolph, J., Tolliday, N., Schmitt, C., Schuster, S. C. & Oesterhelt, D. Phosphorylation in halobacterial signal transduction. EMBO J. 14, 4249–4257 (1995).
30. Jarrell, K. F., Bayley, D. P. & Kostyukova, A. S. The archaeal flagellum: a unique motility structure. J. Bacteriol. 178, 5057–5064 (1996).
31. Welch, M., Oosawa, K., Aizawa, S. I. & Eisenbach, M. Effects of phosphorylation, Mg2+, and conformation of the chemotaxis protein CheY on its binding to the flagellar switch protein FliM. Biochemistry 33, 10470–10476 (1994).
32. Sockett, H., Yamaguchi, S., Kihara, M., Irikura, V. M. & Macnab, R. M. Molecular analysis of the flagellar switch protein FliM of Salmonella typhimurium. J. Bacteriol. 174, 793–806 (1992).
33. Motoshima, H. et al. Molecular cloning and nucleotide sequence of the aminopeptidase T gene of Thermus aquaticus YT-1 and its high-level expression in Escherichia coli. Agric. Biol. Chem. 54, 2385–2392 (1990).
34. Curnow, A. W. et al. Glu-tRNAGln amidotransferase: a novel heterotrimeric enzyme required for correct decoding of glutamine codons during translation. Proc. Natl Acad. Sci. USA 94, 11819–11826 (1997).
35. Curnow, A. W., Ibba, M. & Söll, D. tRNA-dependent asparagine formation. Nature 382, 589–590 (1996).
36. Tu, G. F., Reid, G. E., Zhang, J. G., Moritz, R. L. & Simpson, R. J. C-terminal extension of truncated proteins in Escherichia coli with a 10Sa decapeptide. J. Biol. Chem. 270, 9322–9326 (1995).
37. Böhm, G. & Jaenicke, R. Relevance of sequence statistics for the properties of extremophilic proteins. Int. J. Pept. Protein Res. 43, 97–106 (1994).
38. Choi, I.-G. et al. Random sequence analysis of genomic DNA of a hyperthermophile: Aquifex pyrophilus. Extremophiles 1, 125–134 (1997).
39. Lander, E. S. & Waterman, M. S. Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2, 231–239 (1988).
40. Short, J. M., Fernandez, J. M., Sorge, J. A. & Huse, W. D. Lambda ZAP: a bacteriophage lambda expression vector with in vivo excision properties. Nucleic Acids Res. 16, 7583–7600 (1988).
41. Alting-Mees, M. A. & Short, J. M. pBluescript II: gene mapping vectors. Nucleic Acids Res. 17, 9494 (1989).
42. Shizuya, H. et al. Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proc. Natl Acad. Sci. USA 89, 8794–8797 (1992).
43. Kim, U.-J., Shizuya, H., de Jong, P. J., Birren, B. & Simon, M. I. Stable propagation of cosmid sized human DNA inserts in an F factor based vector. Nucleic Acids Res. 20, 1083–1085 (1992).
44. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
45. Pearson, W. R. & Lipman, D. J. Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA 85, 2444–2448 (1988).
46. Gaasterland, T. & Sensen, C. W. MAGPIE: automated genome interpretation. Trends Genet. 12, 76–78 (1996).
47. Gaasterland, T. & Sensen, C. W. Fully automated genome analysis that reflects user needs and preferences. A detailed introduction to the MAGPIE system architecture. Biochimie 78, 302–310 (1996).
48. Gish, W. & States, D. J. Identification of protein coding regions by database similarity search. Nature Genet. 3, 266–272 (1993).
49. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).

Acknowledgements. This work was supported in part by Department of Energy Microbial Genome Program grants (to R.V.S., C. R. Woese and G.J.O.). We thank C. Woese for his cooperation in the analysis of the genome and interest in the project; K. Stetter for continuing interest; G. Frey, J. Holaska, S. Peralta, D. Hafenbrandl, S. Delk, T. Robinson, and J. Arnett for technical assistance; and D. Robertson, J. Stein, I. Sanyal, T. Richardson, G. Hauska, and K. Williams for discussions.

Correspondence should be addressed to R.V.S. (e-mail: [email protected]). Requests for Aquifex aeolicus should be addressed to R.H. (e-mail: [email protected]). The sequences have been deposited with GenBank and assigned accession numbers AE000657 (chromosome) and AE000667 (extrachromosomal element).
Nature © Macmillan Publishers Ltd 1998
NATURE | VOL 392 | 26 MARCH 1998
[Fig. 1: linear map of the A. aeolicus chromosome (to position 1,551,335) and of the extrachromosomal element, showing numbered ORFs with gene names and the positions of the rRNA and tRNA genes. Legend categories: Translation; Replication and Repair; Transcription; Unknown; Uncategorized; Cofactor Biosynthesis; Cell Envelope; Proteases; Lipid metabolism; Energy Metabolism; Regulation; Purines, Pyrimidines, Nucleotides and Nucleosides; Central Intermediary Metabolism; Amino Acid Biosynthesis; Cellular Processes; Transport; Hypothetical. Scale bar, 5 kb.]
Table 1 Aquifex aeolicus open reading frame identifications. Gene numbers (Aq) correspond to those in Fig. 1. Percentages refer to the identity found in the best FASTA alignment. The percentage of the sequence covered by the alignment is displayed with bullets as follows: 20–40% . , 40–60% . . , 60–80% . . . , 80–100% . . . .

Amino Acid Biosynthesis

Aromatic amino acids
Gene no.  Gene    Identity  Coverage  Product
Aq1536    aroA    43.0%     ....      5-enolpyruvylshikimate-3-phosphate synthetase
Aq081     aroC    55.2%     ....      chorismate synthase
Aq021     aroD    33.3%     ....      3-dehydroquinate dehydratase
Aq901     aroE    46.1%     ....      shikimate 5-dehydrogenase
Aq2177    aroK    36.5%     ....      shikimate kinase
Aq951     pheA    44.0%     ....      chorismate mutase/prephenate dehydratase
Aq1548    trpA    44.5%     ....      tryptophan synthase alpha subunit
Aq706     trpB1   68.0%     ....      tryptophan synthase beta subunit
Aq1410    trpB2   50.0%     ....      tryptophan synthase beta subunit
Aq1787    trpC    43.3%     ....      indole-3-glycerol phosphate synthase
Aq196     trpD1   45.1%     ....      phosphoribosylanthranilate transferase
Aq209     trpD2   24.9%     ....      phosphoribosylanthranilate transferase
Aq582     trpE    50.0%     ....      anthranilate synthase component I
Aq2076    trpF    45.6%     ....      phosphoribosyl anthranilate isomerase
Aq549     trpG    59.2%     ....      anthranilate synthase component II
Aq1755    tyrA    36.1%     ....      prephenate dehydrogenase
Aspartate family
Gene no.  Gene    Identity  Coverage  Product
Aq1866    asd     54.6%     ....      aspartate-semialdehyde dehydrogenase
Aq1969    aspC1   53.5%     ....      aspartate aminotransferase
Aq2094    aspC2   55.4%     ....      aminotransferase (AspC family)
Aq421     aspC3   43.3%     ....      aminotransferase (AspC family)
Aq273     aspC4   48.5%     ....      aminotransferase (AspC family)
Aq1143    dapA    53.1%     ....      dihydrodipicolinate synthase
Aq916     dapB    44.2%     ....      dihydrodipicolinate reductase
Aq547     dapE    25.8%     ....      succinyl-diaminopimelate desuccinylase
Aq1838    dapF    35.5%     ....      diaminopimelate epimerase
Aq1208    lysA    47.4%     ....      diaminopimelate decarboxylase
Aq1152    lysC    52.2%     ....      aspartokinase
Aq1710    metE    45.9%     ....      tetrahydropteroyltriglutamate methyltransferase
Aq1812    thrA    40.4%     ....      homoserine dehydrogenase
Aq1309    thrB    38.3%     ....      homoserine kinase
Aq608     thrC1   64.3%     ....      threonine synthase
Aq425     thrC2   61.9%     ....      threonine synthase
Branched-chain family
Gene no.  Gene    Identity  Coverage  Product
Aq451     ilvB    53.1%     ....      acetolactate synthase large subunit
Aq1245    ilvC    64.3%     ....      acetohydroxy acid isomeroreductase
Aq837     ilvD    58.0%     ....      dihydroxyacid dehydratase
Aq1893    ilvE    40.3%     ....      branched-chain amino acid aminotransferase
Aq1851    ilvH    53.2%     ....      acetolactate synthase
Aq356     leuA1   52.1%     ....      2-isopropylmalate synthase
Aq2090    leuA2   49.9%     ....      2-isopropylmalate synthase
Aq244     leuB    58.7%     ....      3-isopropylmalate dehydrogenase
Aq940     leuC    52.3%     ....      large subunit of isopropylmalate isomerase
Aq1398    leuD    56.6%     ...       3-isopropylmalate dehydratase
Glutamate family
Gene no.  Gene    Identity  Coverage  Product
Aq2068    argB    54.2%     ....      acetylglutamate kinase
Aq1879    argC    40.6%     ....      N-acetyl-gamma-glutamylphosphate reductase
Aq023     argD    49.5%     ....      N-acetylornithine aminotransferase
Aq1711    argF    46.2%     ....      ornithine carbamoyltransferase
Aq1140    argG    54.9%     ....      argininosuccinate synthase
Aq1372    argH    46.4%     ....      argininosuccinate lyase
Aq970     argJ    39.8%     ....      glutamate N-acetyltransferase
Aq111     glnA    57.6%     ....      glutamine synthetase
Aq109     glnB    73.2%     ....      nitrogen regulatory PII protein
Aq1774    glnE    28.4%     ....      glutamate ammonia ligase adenylyl-transferase
Aq1565    gltB    44.3%     ....      glutamate synthase large subunit
Aq2064    gltD    37.7%     ....      glutamate synthase small subunit gltD
Aq1071    proA    47.9%     ....      gamma-glutamyl phosphate reductase
Aq1134    proB    43.2%     ....      glutamate 5-kinase
Aq166     proC    35.1%     ....      pyrroline carboxylate reductase
Histidine
Gene no.  Gene    Identity  Coverage  Product
Aq1303    hisA    40.9%     ....      phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase
Aq039     hisB    46.4%     ....      imidazoleglycerolphosphate dehydratase
Aq2084    hisC    33.7%     ....      histidinol-phosphate aminotransferase
Aq782     hisD    49.9%     ....      histidinol dehydrogenase
Aq181     hisF    59.9%     ....      HisF (cyclase)
Aq1613    hisG    40.3%     ....      ATP phosphoribosyltransferase
Aq732     hisH    47.7%     ....      amidotransferase HisH
Aq1968    hisIE   43.8%     ....      phosphoribosyl-ATP pyrophosphohydrolase

Selenocysteine
Gene no.  Gene    Identity  Coverage  Product
Aq1031    selA    42.7%     ....      L-seryl-tRNA(ser) selenium transferase
Aq1030    selD    37.7%     ....      selenophosphate synthase

Serine family
Gene no.  Gene    Identity  Coverage  Product
Aq1556    cysM    45.8%     ....      cysteine synthase, O-acetylserine (thiol) lyase B
Aq479     glyA    62.7%     ....      serine hydroxymethyl transferase
Aq1905    serA    44.1%     ....      D-3-phosphoglycerate dehydrogenase

Cell Envelope

Pili and fimbriae
Gene no.  Gene    Identity  Coverage  Product
Aq1433    fimZ    34.9%     ..        minor pilin
Aq1432    ppdD1   40.6%     ....      pilin
Aq1434    ppdD2   26.4%     ....      pilin
Aq1435    ppdD3   28.2%     ...       pilin

Lipoproteins and porins
Gene no.  Gene    Identity  Coverage  Product
Aq270     lgt     30.1%     ....      prolipoprotein diacylglyceryl transferase
Aq819     lnt     25.5%     ....      apolipoprotein N-acyltransferase
Aq652     nlpD1   25.4%     ....      lipoprotein
Aq1753    nlpD2   43.2%     ....      lipoprotein NlpD fragment
Aq529     oprC    27.2%     ....      outer membrane protein c
Aq2147    pal     35.1%     ..        peptidoglycan associated lipoprotein
Aq1370    rlpA1   61.1%     ..        rare lipoprotein A
Aq1174    rlpA2   40.6%     ...       rare lipoprotein A
Aq2166    scbA    25.7%     ....      adhesion protein
Aq619     yfeA    28.5%     ....      adhesion B precursor

Peptidoglycan
Gene no.  Gene    Identity  Coverage  Product
Aq1827    alr     33.2%     ....      alanine racemase
Aq1681    amiB    31.0%     ....      N-acetylmuramoyl-L-alanine amidase
Aq2195    bacA    43.1%     ....      undecaprenol kinase
Aq1798    cphA1   25.0%     ...       beta lactamase precursor
Aq974     cphA2   29.4%     ...       beta lactamase precursor
Aq521     ddlA    38.2%     ....      D-alanine:D-alanine ligase
Aq301     glmS    43.2%     ....      glucosamine-fructose-6-phosphate aminotransferase
Aq607     glmU    37.6%     ...       UDP-N-acetylglucosamine pyrophosphorylase
Aq053     mraY    47.5%     ....      phospho-N-acetylmuramoyl-pentapeptide transferase
Aq624     mrcA    33.2%     ....      penicillin binding protein 1A
Aq1281    murA    45.7%     ....      UDP-N-acetylglucosamine 1-carboxyvinyltransferase
Aq520     murB1   35.6%     ....      UDP-N-acetylenolpyruvoylglucosamine reductase
Aq511     murB2   38.9%     ....      UDP-N-acetylenolpyruvoylglucosamine reductase
Aq1360    murC    46.1%     ....      UDP-N-acetylmuramate-alanine ligase
Aq2075    murD    29.3%     ....      UDP-N-acetylmuramoylalanine-D-glutamate ligase
Aq1747    murE    42.9%     ....      UDP-MurNAc-tripeptide synthetase
Aq821     murF    32.3%     ....      UDP-MurNAc-pentapeptide synthetase
Aq1177    murG    30.5%     ....      phospho-N-acetylmuramoyl-pentapeptide transferase
Aq325     murI    43.4%     ....      glutamate racemase
Aq1189    pbpA1   32.2%     ....      penicillin binding protein 2
Aq556     pbpA2   30.3%     ....      penicillin binding protein 2
Aq185     tagD1   52.0%     ....      glycerol-3-phosphate cytidyltransferase
Aq1368    tagD2   67.2%     ...       glycerol-3-phosphate cytidyltransferase
Surface polysaccharides and lipopolysaccharides
Gene no.  Gene    Identity  Coverage  Product
Aq1684    alg               ..        alginate synthesis-related protein
Aq1641    cap               ....      capsular polysaccharide biosynthesis protein
Aq1899    dmt               ....      dolichol-phosphate mannosyltransferase
Aq1772    envA              ....      UDP-3-O-acyl N-acetylglucosamine deacetylase
Aq1757    exbB              ....      biopolymer transport ExbB
Aq1839    exbD              ....      biopolymer transport ExbD
Aq1069    galE              ....      UDP-glucose-4-epimerase
Aq1705    galF              ....      UDP-glucose pyrophosphorylase
Aq908     gmhA              ...       phosphoheptose isomerase
Aq085     kdsA    52.0%     ....      3-deoxy-D-manno-octulosonic acid 8-phosphate synthase
Aq326     kdtA    28.9%     ....      3-deoxy-D-manno-2-octulosonic acid transferase
Aq253     kdtB    46.5%     ....      lipopolysaccharide core biosynthesis protein
Aq1546    kpsF    45.9%     ....      polysialic acid capsule expression protein
Aq692     kpsU    41.3%     ....      3-deoxy-manno-octulosonate cytidylyltransferase
Aq1742    lgtF    35.2%     ....      beta 1,4 glucosyltransferase
Aq604     lpxA    47.7%     ....      acyl-[acyl-carrier-protein]-UDP-N-acetylglucosamine acyltransferase
Aq1427    lpxB    31.6%     ....      lipid A disaccharide synthetase
Aq538     lpxD    43.3%     ....      UDP-3-O-[3-hydroxymyristoyl] glucosamine N-acyltransferase
Aq718     mpg     34.1%     ....      mannose-1-phosphate guanyltransferase
Aq1096    mtfA    34.3%     ...       mannosyltransferase A
Aq515     mtfB    29.0%     ....      mannosyltransferase B
Aq516     mtfC    35.9%     ....      mannosyltransferase C
Aq1335    nse     45.8%     ....      nucleotide sugar epimerase
Aq505     otnA    26.9%     ....      polysaccharide biosynthesis protein
Aq504     otnA’   37.8%     ..        polysaccharide biosynthesis protein (fragment)
Aq1543    rfaC1   30.7%     ....      ADP-heptose:LPS heptosyltransferase
Aq145     rfaC2   28.1%     ....      ADP-heptose:LPS heptosyltransferase
Aq344     rfaD    39.6%     ....      ADP-L-glycero-D-manno-heptose-6-epimerase
Aq565     rfaE    44.0%     ....      ADP-heptose synthase
Aq2115    rfaG    27.1%     ....      glucosyl transferase I
Aq1082    rfbD    53.2%     ....      GDP-D-mannose dehydratase
Aq519     rfe     24.8%     ....      undecaprenyl-phosphate-alpha-N-acetylglucosaminyltransferase
Aq1367    spsI    30.4%     ..        glucose-1-phosphate thymidylyltransferase
Aq518     spsK    49.5%     ...       spore coat polysaccharide biosynthesis protein SpsK
Aq589     xanB    40.9%     ....      mannose-6-phosphate isomerase/mannose-1-phosphate guanyl transferase
Cellular Processes
Cell division
Aq698 acrE acriflavin resistance protein AcrE 24.8%
Aq1275 cafA cytoplasmic axial filament protein 28.5%
Aq523 ftsA cell division protein FtsA 31.9%
Aq936 ftsH cell division protein FtsH 51.1%
Aq1139 ftsW cell division protein FtsW 30.8%
Aq920 ftsY cell division protein FtsY 35.2%
Aq525 ftsZ cell division protein FtsZ 48.6%
Aq761 gidA1 glucose inhibited division protein A 50.2%
Aq691 gidA2 glucose inhibited division protein A 57.5%
Aq1582 gidB glucose inhibited division protein B 39.4%
Aq1718 maf MAF protein 44.9%
Aq1887 mesJ cell cycle protein MesJ 27.7%
Aq878 minC septum site-determining protein MinC 39.4%
Aq1217 minD1 septum site-determining protein MinD 33.1%
Aq877 minD2 septum site-determining protein MinD 54.5%
Aq845 mreB rod shape determining protein MreB 57.4%
Aq025 rodA rod shape determining protein RodA 37.6%
Aq1130 sufI periplasmic cell division protein (SufI) 28.1%
Chaperones
Aq154 ctaB cytochrome c oxidase assembly factor 38.8%
Aq1735 dnaJ1 chaperone DnaJ 41.3%
Aq703 dnaJ2 chaperone DnaJ 45.1%
Aq996 dnaK Hsp70 chaperone DnaK 59.1%
Aq433 grpE heat shock protein GrpE 38.8%
Aq192 hslU chaperone HslU 57.5%
Aq1283 hspC small heat shock protein (class I) 31.0%
Aq1991 htpX heat shock protein X 51.1%
Aq2200 mopA GroEL 64.4%
Aq2199 mopB GroES 56.2%
Detoxification
Aq486 ahpC1 alkyl hydroperoxide reductase 49.2%
Aq858 ahpC2 alkyl hydroperoxide reductase 53.4%
Aq685 arsC arsenate reductase 50.0%
Aq136 cpx cytochrome c peroxidase 48.9%
Aq1005 cutA periplasmic divalent cation tolerance protein 47.0%
Aq1499 sodA superoxide dismutase (Fe/Mn family) 34.2%
Aq1050 sodC1 superoxide dismutase (Cu/Zn) 39.5%
Aq238 sodC2 superoxide dismutase (Cu/Zn) 39.2%
Aq488 tpx thiol peroxidase 39.5%
Motility
Aq833 flgA flagellar protein FlgA
Aq1184 flgB flagellar basal body rod protein FlgB
Aq1183 flgC flagellar biosynthesis FlgC
Aq1859 flgE flagellar hook protein FlgE
Aq2051 flgG1 flagellar hook basal-body protein FlgG
Aq834 flgG2 flagellar hook basal-body protein FlgG
Aq1714 flgH flagellar L-ring protein FlgH
Aq1713 flgI flagellar P-ring protein FlgI
Aq1662 flgK flagellar hook associated protein FlgK
Aq1663 flgL flagellar hook associated protein FlgL
Aq1212 flhA flagellar export protein
Aq2014 flhB flagellar biosynthetic protein FlhB
Aq1214 flhF flagellar biosynthesis FlhF
Aq1998 fliC flagellin
39.4% 30.8% 32.8% 50.4% 31.9% 46.9% 21.9% 27.1% 44.0% 39.8% 28.7% 59.4%
.... .... .... .... .... .... .... .. .... .... .... ....
Nature © Macmillan Publishers Ltd 1998
37.2% 30.8% 40.2% 36.5% 48.2% 34.7% 54.7% 47.2% 63.4%
Aq2001 fliD flagellar hook associated protein FliD 24.3%
Aq1182 fliF flagellar M-ring protein 32.0%
Aq653 fliG flagellar switch protein FliG 35.9%
Aq1595 fliI flagellar export protein 44.6%
Aq1860 fliL flagellar biosynthesis FliL 30.6%
Aq1539 fliN flagellar switch protein FliN 42.9%
Aq1920 fliP flagellar biosynthetic protein FliP 47.7%
Aq1962 fliQ flagellar biosynthesis protein FliQ 45.5%
Aq1961 fliR flagellar biosynthetic protein FliR 29.7%
Aq2002 fliS flagellar protein FliS 30.8%
Aq1003 motA flagellar motor protein MotA 35.0%
Aq1002 motB1 flagellar motor protein MotB 36.8%
Aq1001 motB2 flagellar motor protein MotB-like 27.5%
Secretion
Aq1720 ffh signal recognition particle receptor protein 49.1%
Aq1288 gspD general secretion pathway protein D 27.5%
Aq1474 gspE general secretion pathway protein E 48.8%
Aq418 gspG general secretion pathway protein G 50.7%
Aq955 lepB type-I signal peptidase 33.9%
Aq1837 lsp lipoprotein signal peptidase 37.4%
Aq1271 mpp processing protease 28.7%
Aq747 pilC1 fimbrial assembly protein PilC 37.4%
Aq1285 pilC2 fimbrial assembly protein PilC 28.9%
Aq1601 pilD type 4 prepilin peptidase 34.8%
Aq745 pilT twitching motility protein PilT 51.4%
Aq2151 pilU twitching motility protein 41.6%
Aq1870 secA preprotein translocase SecA subunit 44.9%
Aq973 secD protein export membrane protein SecD 36.0%
Aq1602 secF protein-export membrane protein 41.4%
Aq079 secY preprotein translocase SecY 44.2%
Aq2080 sppA proteinase IV 43.4%
Aq1971 tapB type IV pilus assembly protein TapB 42.2%
Aq1340 tig trigger factor 27.4%
Central Intermediary Metabolism
One-carbon metabolism
Aq1429 metF 5,10-methylenetetrahydrofolate reductase 43.3%
Aq1154 metK S-adenosylmethionine synthetase 49.2%
Aq1180 sahH S-adenosylhomocysteine hydrolase 60.9%
Cytoplasmic polysaccharides
Aq1407 bcsA cellulose synthase catalytic subunit 39.5%
Aq1401 celY endoglucanase fragment 33.0%
Aq721 glgA glycogen synthase 38.1%
Aq722 glgB 1,4-alpha-glucan branching enzyme 56.5%
Aq717 glgP glycogen phosphorylase 37.0%
Aq723 malM 4-alpha-glucanotransferase (amylomaltase) 43.4%
Tri-carboxylic acid cycle
Aq1784 aco aconitase 36.1%
Aq1195 forA1 ferredoxin oxidoreductase alpha subunit 31.5%
Aq1167 forA2 ferredoxin oxidoreductase alpha subunit 32.3%
Aq1196 forB1 ferredoxin oxidoreductase beta subunit 29.6%
Aq1168 forB2 ferredoxin oxidoreductase beta subunit 31.5%
Aq1200 forG1 ferredoxin oxidoreductase gamma subunit 34.5%
Aq1169 forG2 ferredoxin oxidoreductase gamma subunit 34.5%
Aq594 frdA fumarate reductase flavoprotein subunit 51.4%
Aq553 frdB1 reductase iron-sulfur subunit 35.2%
Aq655 frdB2 fumarate reductase iron-sulfur subunit 35.1%
Aq1780 fumB fumarate hydratase (fumarase) C-terminal 46.4%
Aq1679 fumX fumarate hydratase, class I 40.4%
Aq150 gltA citrate synthase 33.0%
Aq1512 icd isocitrate dehydrogenase 46.0%
Aq1782 mdh1 malate dehydrogenase 49.8%
Aq1665 mdh2 malate dehydrogenase 46.9%
Aq1614 oadA oxaloacetate decarboxylase alpha chain 50.1%
Aq1306 sucC1 succinyl-CoA ligase beta subunit 35.1%
Aq1620 sucC2 succinyl-CoA ligase beta subunit 52.9%
Aq1888 sucD1 succinyl-CoA ligase alpha subunit 41.7%
Aq1622 sucD2 succinyl-CoA ligase alpha subunit 65.7%
Phosphate
Aq1351 phoH phosphate starvation-inducible protein 47.1%
Aq1547 ppa inorganic pyrophosphatase 56.5%
Aq891 ppx exopolyphosphatase 33.6%
Polyamines
Aq728 speC ornithine decarboxylase 30.9%
Aq062 speE spermidine synthase 48.4%
Sulfur
Aq1081 cysD sulfate adenylyltransferase 46.7%
Aq1076 rhdA thiosulfate sulfurtransferase 32.3%
Aq1799 rhdA thiosulfate sulfurtransferase 31.7%
Aq455 sor sulfur oxygenase reductase 36.7%
Aq1803 soxB sulfur oxidation protein SoxB 41.3%
Cofactor Biosynthesis
Lipoic acid biosynthesis
Aq1355 lipA
Biotin
Aq170 bioA
Aq975 bioB
Aq557 bioD
Aq626 bioF
Aq1659 bioW
Aq566 birA
Folic acid
Aq2045 folC
Aq1898 folD
Aq239 folE
Aq162 folK
Aq1468 folP
Aq1144 pabB
Aq1606 pabC
Heme
Aq207 cobA
Aq1237 cysG
Aq334 dcuP
Aq816 gsa
Aq1279 hemA
Aq2109 hemB
Aq263 hemC
Aq1424 hemF
Aq2015 hemG
Aq948 hemH
Aq099 hemK
Aq2124 hemN
Molybdopterin
Aq2183 moaA2
Lipoic acid synthetase
48.9%
DAPA aminotransferase biotin synthetase dethiobiotin synthetase 8-amino-7-oxononanoate synthase 6-carboxyhexanoate-CoA ligase (pimeloyl CoA synthase) biotin [acetyl-CoA-carboxylase] ligase
51.7% 42.0% 41.5% 45.1%
folylpolyglutamate synthetase methylenetetrahydrofolate dehydrogenase GTP cyclohydrolase I folate biosynthesis 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase dihydropteroate synthase p-aminobenzoate synthetase aminodeoxychorismate lyase
31.8% 53.2% 57.1%
uroporphyrin-III c-methyltransferase siroheme synthase uroporphyrinogen decarboxylase glutamate-1-semialdehyde aminotransferase glutamyl tRNA reductase (delta-aminolevulinate synthase) porphobilinogen synthase porphobilinogen deaminase oxygen-independent coproporphyrinogen III oxidase protoporphyrinogen oxidase ferrochelatase protoporphyrinogen oxidase oxygen-independent coproporphyrinogen II
.... .... .... .... 38.7% .... 64.5% .... 53.1% .... 33.1% .... 30.3% ... 46.4% .... 32.2% .... 50.2% ....
molybdenum cofactor biosynthesis protein A
47.0%
.... .... .... .... 47.3% .... 37.5% ....
.... .... .... 43.7% .... 45.8% .... 41.5% ... 29.0% .... 52.1% 36.9% 41.4% 56.5%
molybdenum cofactor biosynthesis moaC molybdopterin converting factor subunit 2 molybdopterin-guanine dinucleotide biosynthesis protein B molybdenum cofactor biosynthesis protein A molybdopterin biosynthesis protein MoeB molybdenum cofactor biosynthesis MOG pterin-4a-carbinolamine dehydratase
.. ... 44.4% . 36.8% .... 54.1% .... 55.5% .... 37.9% ....
pantothenate metabolism flavoprotein 3-methyl-2-oxobutanoate hydroxymethyltransferase pantothenate synthetase aspartate 1-decarboxylase
41.2%
panC panD
45.5% 47.4% 46.0%
.... .... .... ....
Pyridine nucleotides
Aq1889 nadA quinolinate synthetase A 44.3%
Aq777 nadB L-aspartate oxidase 36.7%
Aq869 nadC quinolinate phosphoribosyl transferase 47.0%
Aq959 nadE NH(3)-dependent NAD+ synthetase 39.6%
Pyridoxal phosphate
Aq852 pdxA pyridoxal phosphate biosynthetic protein PdxA 36.8%
Aq1423 pdxJ pyridoxal phosphate synthetase 88.2%
Quinones
Aq895 ispB octaprenyl-diphosphate synthase 35.7%
Aq052 ubiA 4-hydroxybenzoate octaprenyltransferase 41.4%
Riboflavin
Aq350 ribA GTP cyclohydrolase II 61.7%
Aq1707 ribC riboflavin synthase alpha chain 45.3%
Aq138 ribD1 riboflavin specific deaminase 46.0%
Aq436 ribD2 riboflavin specific deaminase 42.9%
Aq139 ribF riboflavin kinase 38.4%
Aq132 ribH riboflavin synthase beta subunit 51.0%
Thiamine
Aq1204 thiC thiamine biosynthesis protein 67.1%
Aq1960 thiD HMP-P kinase 40.5%
Aq1366 thiE1 thiamine phosphate synthase 36.3%
Aq558 thiE2 thiamine phosphate synthase 39.5%
Aq2178 thiG thiamine biosynthesis, thiazole moiety 52.5%
Aq2119 thiL thiamine monophosphate kinase 34.5%
glutaredoxin-like protein thioredoxin thioredoxin thioredoxin reductase
33.8% 58.9% 32.2% 39.8%
.... ... .. ....
Aq527 Aq2181 Aq1326
moaC moaE mobB
Aq030 Aq1329 Aq061 Aq049
moeA1 moeB mog phhB
Pantothenate Aq815 Aq1973
dfp panB
Aq2132 Aq476
Thio- and glutaredoxin Aq443 gua Aq1916 trxA1 Aq1811 trxA2 Aq500 trxB
45.0% 39.3%
Energy Metabolism
Aq1342 gph phosphoglycolate phosphatase 33.9%
ATP-Proton Motive Force
Aq679 atpA ATP synthase F1 alpha subunit 64.3%
Aq179 atpB ATP synthase F0 subunit a 36.4%
Aq673 atpC ATP synthase F1 epsilon subunit 37.4%
Aq2038 atpD ATP synthase F1 beta subunit 67.4%
Aq177 atpE ATP synthase F0 subunit c 53.8%
Aq1586 atpF1 ATP synthase F0 subunit b 26.3%
Aq1587 atpF2 ATP synthase F0 subunit b 25.5%
Aq2041 atpG ATP synthase F1 gamma subunit 39.9%
Aq1588 atpH ATP synthase F1 delta chain 28.1%
Dehydrogenases
Aq1362 adh1 alcohol dehydrogenase 35.4%
Aq1240 adh2 alcohol dehydrogenase 28.8%
Aq186 aldH1 aldehyde dehydrogenase 41.9%
Aq227 aldH2 aldehyde dehydrogenase 28.0%
Aq1145 dhaT 1,3 propanediol dehydrogenase 36.6%
Aq232 dhsU flavocytochrome C sulfide dehydrogenase 33.6%
Aq1769 dld1 D-lactate dehydrogenase 45.3%
Aq1234 dmsA DMSO reductase chain A 25.0%
Aq1232 dmsB DMSO reductase chain B 38.4%
Aq1231 dmsC DMSO reductase chain C 29.5%
Aq1051 fdhE formate dehydrogenase formation protein FdhE 25.9%
Aq1039 fdoG formate dehydrogenase alpha subunit 50.0%
Aq1046 fdoH formate dehydrogenase beta subunit 45.7%
Aq1049 fdoI formate dehydrogenase gamma subunit 38.4%
Aq1903 gcsP1 glycine dehydrogenase (decarboxylating) 49.6%
Aq1109 gcsP2 glycine dehydrogenase (decarboxylating) 46.8%
Aq1639 glpC oxido/reductase iron sulfur protein 27.1%
Aq395 hdrA heterodisulfide reductase subunit A 39.7%
Aq400 hdrB heterodisulfide reductase subunit B 32.5%
Aq398 hdrC heterodisulfide reductase subunit C 35.7%
Aq961 hdrD heterodisulfide reductase 29.5%
Aq038 hibD 3-hydroxyisobutyrate dehydrogenase 34.6%
Aq727 ldhA D-lactate dehydrogenase 33.5%
Aq736 lpdA dihydrolipoamide dehydrogenase 37.0%
Aq217 narB nitrate reductase narB 39.1%
Aq206 nirB nitrite reductase (NAD(P)H) large subunit 35.3%
Aq835 nox NADH oxidase 33.1%
Aq024 nsd nucleotide sugar dehydrogenase 47.0%
Aq135 nueM NADH dehydrogenase (ubiquinone) 28.2%
Aq1010 udh dehydrogenase 29.7%
Electron transport
Aq2191 coxA1 cytochrome c oxidase subunit I 42.4%
Aq2192 coxA2 cytochrome c oxidase subunit I 38.1%
Aq2190 coxB cytochrome c oxidase subunit II 27.4%
Aq2188 coxC cytochrome c oxidase subunit III 28.6%
Aq153 ctaA heme O oxygenase 28.1%
Aq042 cyc cytochrome c 25.8%
Aq792 cycB1 cytochrome c552 29.9%
Aq1550 cycB2 cytochrome c552 38.7%
Aq1357 cydA cytochrome oxidase d subunit I 38.8%
Aq1358 cydB cytochrome oxidase d subunit II 31.2%
Aq067 dmsB dimethylsulfoxide reductase chain B 40.2%
Aq235 fccB’ sulfide dehydrogenase, flavoprotein subunit 38.0%
Aq919a fdx1 ferredoxin 37.1%
Aq1171a fdx2 ferredoxin 43.9%
Aq1192a fdx3 ferredoxin 35.0%
Aq108a fdx4 ferredoxin 56.6%
Aq211 fhp flavohemoprotein 43.4%
Aq2096 floX flavodoxin 32.5%
Aq045 petA Rieske-I iron sulfur protein 34.3%
Aq044 petB cytochrome b 38.3%
Aq234 soxF Rieske-I iron sulfur protein 29.0%
Aq2186 sqr sulfide-quinone reductase 41.0%
Glycolysis and gluconeogenesis
Aq484 eno enolase 65.0%
Aq1390 fba fructose-1,6-bisphosphate aldolase class II 39.9%
Aq1065 gap glyceraldehyde-3-phosphate dehydrogenase 59.5%
Aq434 glpK glycerol kinase 51.0%
Aq1744 gpmA phosphoglycerate mutase 27.9%
Aq1634 gspA glycerol-3-phosphate dehydrogenase (NAD+) 40.5%
Aq1708 pfkA phosphofructokinase 49.4%
Aq750 pgi glucose-6-phosphate isomerase 37.8%
Aq118 pgk phosphoglycerate kinase 54.5%
Aq1990 pgmA phosphoglycerate mutase 33.2%
Aq501 pmu phosphoglucomutase/phosphomannomutase 37.8%
Aq2142 ppsA phosphoenolpyruvate synthase 56.3%
Aq1520 pycA pyruvate carboxylase c-terminal domain 46.6%
Aq1517 pycB pyruvate carboxylase n-terminal domain 57.1%
Aq360 timA triose phosphate isomerase 52.2%
Aq046 Aq1305
Aq1580 Aq1334 Aq713 Aq640 Aq969 Aq1907 Aq2163
pyrF pyrG pyrH thy tmk umpS uraP
Hydrogenase
Aq665 hoxZ Ni/Fe hydrogenase B-type cytochrome subunit 40.4%
Aq667 hupD HupD hydrogenase related function 40.9%
Aq666 hupE HupE hydrogenase related function 38.3%
Aq1021 hypA hydrogenase accessory protein HypA 39.8%
Aq671 hypB hydrogenase expression/formation protein B 50.6%
Aq1157 hypD hydrogenase expression/formation protein HypD 56.1%
Aq662 mbhL1 hydrogenase large subunit 50.6%
Aq960 mbhL2 hydrogenase large subunit 44.3%
Aq804 mbhL3 hydrogenase large subunit 27.9%
Aq660 mbhS1 hydrogenase small subunit 66.6%
Aq965 mbhS2 hydrogenase small subunit 51.3%
Aq802 mbhS3 hydrogenase small subunit 36.7%
Aq1591 shyS soluble hydrogenase small subunit 41.6%
Regulation Aq1058 Aq2179 Aq281 Aq1387 Aq1724
acrR1 acrR2 acrR3 arsR degT
Sugar metabolism
Aq968 cbbE2 ribulose-5-phosphate 3-epimerase 47.2%
Aq1658 fucA1 fuculose-1-phosphate aldolase 31.8%
Aq1979 fucA2 fuculose-1-phosphate aldolase 29.7%
Aq498 gnd 6-phosphogluconate dehydrogenase 45.2%
Aq497 gsdA glucose-6-phosphate 1-dehydrogenase 32.3%
Aq1138 rpiB ribose 5-phosphate isomerase B 54.5%
Aq119 talC transaldolase 71.1%
Aq1765 tktA transketolase 52.4%
NADH dehydrogenase
Aq1385 nuoA1 NADH dehydrogenase I chain A 42.0%
Aq1310 nuoA2 NADH dehydrogenase I chain A 44.9%
Aq1312 nuoB NADH dehydrogenase I chain B 60.1%
Aq551 nuoD1 NADH dehydrogenase I chain D 37.7%
Aq1314 nuoD2 NADH dehydrogenase I chain D 42.2%
Aq574 nuoE NADH dehydrogenase I chain E 36.8%
Aq573 nuoF NADH dehydrogenase I chain F 20.5%
Aq437 nuoG NADH dehydrogenase I chain G 35.4%
Aq1315 nuoH1 NADH dehydrogenase I chain H 41.0%
Aq1373 nuoH2 NADH dehydrogenase I chain H 42.1%
Aq1374 nuoH3 NADH dehydrogenase I chain H 38.9%
Aq1317 nuoI1 NADH dehydrogenase I chain I 30.5%
Aq1375 nuoI2 NADH dehydrogenase I chain I 29.2%
Aq1318 nuoJ1 NADH dehydrogenase I chain J 35.4%
Aq1377 nuoJ2 NADH dehydrogenase I chain J 30.6%
Aq1319 nuoK1 NADH dehydrogenase I chain K 51.1%
Aq1378 nuoK2 NADH dehydrogenase I chain K 48.4%
Aq1320 nuoL1 NADH dehydrogenase I chain L 39.0%
Aq866 nuoL2 NADH dehydrogenase I chain L 30.2%
Aq1379 nuoL3 NADH dehydrogenase I chain L 43.1%
Aq1321 nuoM1 NADH dehydrogenase I chain M 43.6%
Aq1382 nuoM2 NADH dehydrogenase I chain M 36.9%
Aq1322 nuoN1 NADH dehydrogenase I chain N 34.1%
Aq1383 nuoN2 NADH dehydrogenase I chain N 32.8%
Aq534 Aq831 Aq490 Aq1207 Aq1418 Aq213 Aq1908 Aq1115 Aq316 Aq905 Aq231 Aq1156 Aq093 Aq1019 Aq672 Aq764 Aq638 Aq1038 Aq702 Aq218 Aq1117 Aq1792 Aq230 Aq164 Aq2069 Aq319 Aq906 Aq844 Aq1496
draG exsB fnr furR1 furR2 glnBi hflX hksP1 hksP2 hksP3 hksP4 hoxX hth hypE hypF iclR lysR1 lysR2 merR nifA ntrC1 ntrC2 ntrC3 ntrC4 obg phoB phoU spoT xylR
Lipid metabolism
Aq2058 aas 2-acylglycerophosphoethanolamine acyltransferase
Aq1206 accA acetyl-CoA carboxylase alpha subunit
Aq1363 accB biotin carboxyl carrier protein
Aq1664 accC1 biotin carboxylase
Aq1470 accC2 biotin carboxylase
Aq445 accD acetyl-CoA carboxyltransferase beta subunit
Aq1717a acpP acyl carrier protein
Aq813 acpS holo-[acyl-carrier protein] synthase
Aq2104 acs acetyl-coenzyme A synthetase
Aq2103 acs’ acetyl-coenzyme A synthetase c-terminal fragment
Aq1249 cds phosphatidate cytidylyltransferase
Aq1737 cfa cyclopropane-fatty-acyl-phospholipid synthase
Aq892 fabD malonyl-CoA:Acyl carrier protein transacylase
Aq1717 fabF 3-oxoacyl-[acyl-carrier-protein] synthase II
Aq1716 fabG 3-oxoacyl-[acyl-carrier-protein] reductase
Aq1099 fabH 3-oxoacyl-[acyl-carrier-protein] synthase III
Aq1552 fabI enoyl-[acyl-carrier-protein] reductase (NADH)
Aq056 fabZ (3R)-hydroxymyristoyl-(acyl carrier protein) dehydratase
Aq999 fadD long-chain-fatty-acid CoA ligase
Aq1638 lplA lipoate-protein ligase A
Aq958 pgsA phosphatidylglycerophosphate synthase
Aq2154 pgsA phosphatidylglycerophosphate synthase
Aq1101 plsX PlsX protein
.... ... .... .... .... .... .... .... .... 61.2% .... 29.2% .... 37.5% .... 42.1% .... 58.4% .... 52.9% .... 47.0% .... 49.6% .... 58.7% .... 30.0% ... 28.1% .. 37.3% .. 38.9% ... 43.7% ....
Purines, Pyrimidines, Nucleotides and Nucleosides
Aq094 nrdA ribonucleotide reductase alpha chain
Aq1505 nrdF ribonucleotide reductase beta chain
Purines
Aq568 deoD purine nucleoside phosphorylase
Aq236 guaA GMP synthase
Aq2023 guaB inosine monophosphate dehydrogenase
Aq544 hpt hypoxanthine-guanine phosphoribosyltransferase
Aq078 kad adenylate kinase
Aq1590 ndk nucleoside diphosphate kinase
Aq1636 prs phosphoribosylpyrophosphate synthetase
Aq1290 purA adenylosuccinate synthetase
Aq597 purB adenylosuccinate lyase
Aq2117 purC phosphoribosylaminoimidazolesuccinocarboxamide synthase
Aq742 purD phosphoribosylamine-glycine ligase
Aq1178 purE phosphoribosylaminoimidazole carboxylase
Aq1175 purF amidophosphoribosyltransferase
Aq1963 purH phosphoribosylaminoimidazolecarboxamide formyltransferase
Aq245 purK phosphoribosyl aminoimidazole carboxylase
Aq1836 purL phosphoribosylformylglycinamidine synthase II
Aq769 purM phosphoribosylformylglycinamidine cyclo-ligase
Aq857 purN phosphoribosylglycinamide formyltransferase
Aq1105 purQ phosphoribosyl formylglycinamidine synthase I
Aq1818 purU formyltetrahydrofolate deformylase
Pyrimidines
Aq410 carA carbamoyl phosphate synthetase small subunit
Aq1172 carB carbamoyl-phosphate synthase large subunit
Aq2101 carB carbamoyl-phosphate synthase, large subunit
Aq2153 cmk cytidylate kinase
Aq1607 dcd deoxycytidine triphosphate deaminase
Aq220 dut deoxyuridine 5’-triphosphate nucleotidohydrolase
Aq409 pyrB aspartate carbamoyltransferase catalytic chain
Aq806 pyrC dihydroorotase
37.1% 57.1% 44.6% 54.4% 56.5% 56.9% 71.2% 30.8% 54.0%
35.0% 36.2%
.... .... .... .... .... .... .... .... .... 52.5% .... 54.2% .... 64.6% .... 42.7% .... 48.2% .... 35.6% .... 49.3% ... 50.0% .... 48.3% .... 51.1% .... 56.3% .... 33.1% 58.4% 65.4% 48.2% 50.0% 48.2% 55.2% 49.2% 52.4%
52.2% 60.7% 63.1% 38.5% 39.5%
.... .... .... .... .... 42.0% .... 37.3% ...
pyrD pyrDB
50.5%
transcriptional regulator (TetR/AcrR family) transcriptional regulator (TetR/AcrR family) transcriptional regulator (TetR/AcrR family) transcriptional regulator (ArsR family) transcriptional regulator (DegT/DnrJ/Eryc1 family) ADP-ribosylglycohydrolase trans-regulatory protein ExsB transcriptional regulator (Crp/Fnr family) transcriptional regulator (FurR family) transcriptional regulator (FurR family) PII-like protein GlnBi GTP-binding protein HflX histidine kinase sensor protein histidine kinase sensor protein histidine kinase sensor protein histidine kinase sensor protein hydrogenase regulation HoxX transcriptional regulator (H-T-H) hydrogenase expression/formation protein transcriptional regulatory protein HypF transcriptional regulator (IclR family) transcriptional regulator (LysR family) transcriptional regulator (LysR family) transcriptional regulator (MerR family) transcriptional regulator (NifA family) transcriptional regulator (NtrC family) transcriptional regulator (NtrC family) transcriptional regulator (NtrC family) transcriptional regulator (NtrC family) GTP-binding protein transcriptional regulator (PhoB-like) transcriptional regulator (PhoU-like) (p)ppGpp 3-pyrophosphohydrolase transcriptional regulator (NagC/XylR family)
... ... .... .... 34.1% .... 32.1% .... 38.5% .... 29.5% .... 37.9% .... 34.6% .... 48.0% .. 40.3% .... 27.7% .. 28.1% .... 23.6% ... 28.2% .... 46.7% .... 50.2% .... 44.3% .... 44.8% .... 30.4% .... 32.8% .... 28.9% .... 32.8% .... 42.8% .... 41.0% .... 40.2% .... 40.0% .... 38.3% .... 54.9% .. 41.6% .... 41.9% .... 47.2% ... 29.3% ....
DNA Replication and Repair
Aq358 dinG ATP-dependent helicase (DinG family) 27.9%
Aq322 dnaA chromosome replication initiator protein DnaA 36.5%
Aq1472 dnaB replicative DNA helicase 40.3%
Aq910 dnaC DNA replication protein DnaC 26.4%
Aq1008 dnaE DNA polymerase III alpha subunit 41.9%
Aq1493 dnaG DNA primase 39.8%
Aq1882 dnaN DNA polymerase III beta chain 32.1%
Aq932 dnaQ DNA polymerase III epsilon subunit 40.0%
Aq1855 dnaX DNA polymerase III gamma subunit 36.6%
Aq1422 dpbF DNA polymerase beta family 39.1%
Aq1693 dplF N-terminus of phage SPO1 DNA polymerase 37.3%
Aq980 gyrA DNA gyrase A subunit 43.6%
Aq1026 gyrB gyrase B 55.2%
Aq2057 helX DNA helicase 49.7%
Aq1484a himA DNA binding protein HU 40.2%
Aq2174 ihfB integration host factor beta subunit 35.8%
Aq1394 lig DNA ligase (ATP dependent) 50.8%
Aq633 ligA DNA ligase (NAD dependent) 45.7%
Aq1578 mutL DNA mismatch repair protein MutL 72.3%
Aq308 mutS1 DNA mismatch repair protein MutS 77.5%
Aq1242 mutS2 DNA mismatch repair protein MutS 37.0%
Aq1449 mutT 8-OXO-dGTPase domain (mutT domain) 46.3%
Aq282 mutY1 endonuclease III 53.6%
Aq172 mutY2 endonuclease III 51.8%
Aq496 mutY3 endonuclease III 43.4%
Aq1629 nfo deoxyribonuclease IV 39.0%
Aq710 nucI thermococcal nuclease homolog 36.4%
Aq1495 ogt O-6-methylguanine-DNA-alkyltransferase 36.9%
Aq1628 pol DNA polymerase I 3’-5’ exo domain 43.2%
Aq1967 polA DNA polymerase I (PolI) 30.5%
Aq1610 radC DNA repair protein RadC 39.0%
Aq2150 recA recombination protein RecA 88.5%
Aq2053 recG ATP-dependent DNA helicase RecG 38.9%
Aq2155 recJ single-strand-DNA-specific exonuclease RecJ 31.8%
Aq561 recN recombination protein RecN 27.7%
Aq1478 recR recombination protein RecR 38.3%
Aq793 rep ATP-dependent DNA helicase REP 33.4%
Aq1886 sbcD ATP-dependent dsDNA exonuclease 29.9%
Aq064 ssb single stranded DNA-binding protein 39.4%
Aq657 topA topoisomerase I 39.6%
Aq1159 topG1 reverse gyrase 41.6%
Aq886 topG2 reverse gyrase 35.1%
Aq686 uvrA repair excision nuclease subunit A 61.0%
Aq1856 uvrB repair excision nuclease subunit B 53.9%
Aq2126 uvrC repair excision nuclease subunit C 32.5%
42.3% 20.6% 37.2% 45.4% 32.3% 46.3% 59.6% 40.4% 44.0% 46.9% 41.6% 30.6% 40.5%
.... .... .... ... .... .. .... .... .... .... .... .... ....
36.1%
Transcription
RNA polymerase and transcription factors
Aq613 deaD ATP-dependent RNA helicase DeaD
Aq357a flgM anti sigma factor FlgM
Aq1218 fliA RNA polymerase sigma factor FliA
Aq259 nusA transcription termination NusA
Aq133 nusB transcription termination NusB
Aq1931 nusG transcription antitermination protein NusG
Aq873 rho transcriptional terminator Rho
Aq070 rpoA RNA polymerase alpha subunit
Aq1939 rpoB RNA polymerase beta subunit
Aq1945 rpoC RNA polymerase beta prime subunit
Aq1490 rpoD RNA polymerase sigma factor RpoD
Aq599 rpoN RNA polymerase sigma factor RpoN
Aq1452 rpoS RNA polymerase sigma factor RpoS
RNA modification
Aq1816 ksgA
Aq1067 miaA
Aq411 pcnB1
Aq2158 pcnB2
Aq221 phpA
Aq894 queA
Aq946 rnc
Aq1955 rnhB
Aq924 rnpH
Aq1661 spoU
Aq1308 tgt
Aq841 trm1
dihydroorotate dehydrogenase dihydroorotate dehydrogenase electron transfer subunit orotidine-5’-phosphate decarboxylase CTP synthetase UMP kinase thymidylate synthase complementing protein thymidylate kinase uridine 5-monophosphate synthase uracil phosphoribosyltransferase
dimethyladenosine transferase tRNA delta-2-isopentenylpyrophosphate (IPP) transferase poly A polymerase poly A polymerase polyribonucleotide nucleotidyltransferase queuosine biosynthesis protein RNase III RNase HII RNase PH rRNA methylase SpoU queuine tRNA-ribosyltransferase N2,N2-dimethylguanosine tRNA
34.7% 37.2% 57.5% 62.1% 30.5% 35.1% 42.1% 42.0% 34.1% 31.0% 29.7% 35.3%
38.2% 28.5% 33.9% 45.0% 46.9% 35.8% 48.4% 64.0% 44.0% 52.6%
Aq1489 Aq749 Aq705 Aq1890 Aq2046 Aq257
trmD truA truB tsnR vacB ygcA
methyltransferase tRNA guanine-N1 methyltransferase pseudouridine synthase I tRNA pseudouridine 55 synthase rRNA methylase VacB protein (ribonuclease II family) RNA methyltransferase (TrmA-family)
34.6% 42.9% 33.1% 38.2% 36.4% 37.9% 28.8%
.... .... .... .... .... .... ....
Translation
Aq2131 fmt methionyl-tRNA formyltransferase 45.7%
Aq247 gatA glutamyl-tRNA(Gln) amidotransferase subunit A 53.6%
Aq461 gatB glutamyl-tRNA(Gln) amidotransferase subunit B 48.8%
Aq2147a gatC glutamyl-tRNA(Gln) amidotransferase subunit C 41.1%
Aq346 pth peptidyl-tRNA hydrolase 48.8%
Aminoacyl tRNA synthetases
Aq1293 alaS alanyl-tRNA synthetase 46.6%
Aq923 argS arginyl-tRNA synthetase 39.4%
Aq1677 aspS aspartyl-tRNA synthetase 51.3%
Aq1068 cysS cysteinyl-tRNA synthetase 45.0%
Aq763 genX lysyl-tRNA synthetase (genX) homolog 38.6%
Aq1221 gltX glutamyl-tRNA synthetase 48.5%
Aq945 glyQ glycyl-tRNA synthetase alpha subunit 61.9%
Aq2141 glyS glycyl-tRNA synthetase beta subunit 37.1%
Aq122 hisS1 histidyl-tRNA synthetase 43.3%
Aq1155 hisS2 histidyl-tRNA synthetase 34.9%
Aq305 ileS isoleucyl-tRNA synthetase 82.1%
Aq351 leuS leucyl-tRNA synthetase alpha subunit 50.7%
Aq1770 leuS’ leucyl-tRNA synthetase beta subunit 47.2%
Aq1202 lysU lysyl-tRNA synthetase 53.2%
Aq1257 metG methionyl-tRNA synthetase alpha subunit 45.0%
Aq422 metG’ methionyl-tRNA synthetase beta subunit 64.2%
Aq953 pheS phenylalanyl-tRNA synthetase alpha subunit 51.9%
Aq1730 pheT phenylalanyl-tRNA synthetase beta subunit 35.4%
Aq365 proS proline-tRNA synthetase 44.1%
Aq298 serS seryl-tRNA synthetase 59.4%
Aq1667 thrS threonyl-tRNA synthetase 48.5%
Aq992 trpS tryptophanyl-tRNA synthetase 38.4%
Aq1751 tyrS tyrosyl-tRNA synthetase 56.2%
Aq1413 valS valyl-tRNA synthetase 33.2%
Ribosomal Proteins
Aq1935 rplA ribosomal protein L01 57.9%
Aq013 rplB ribosomal protein L02 46.9%
Aq009 rplC ribosomal protein L03 53.8%
Aq011 rplD ribosomal protein L04 51.3%
Aq1652 rplE ribosomal protein L05 67.0%
Aq1649 rplF ribosomal protein L06 46.2%
Aq2042 rplI ribosomal protein L09 35.6%
Aq1936 rplJ ribosomal protein L10 36.5%
Aq1933 rplK ribosomal protein L11 71.4%
Aq1937 rplL ribosomal protein L7/L12 75.4%
Aq1877 rplM ribosomal protein L13 60.6%
Aq1654 rplN ribosomal protein L14 59.5%
Aq1642 rplO ribosomal protein L15 57.4%
Aq018 rplP ribosomal protein L16 59.3%
Aq069 rplQ ribosomal protein L17 48.7%
Aq1648 rplR ribosomal protein L18 62.7%
Aq1954 rplS ribosomal protein L19 59.8%
Aq952 rplT ribosomal protein L20 63.5%
Aq016a rplV ribosomal protein L22 47.3%
Aq012 rplW ribosomal protein L23 52.2%
Aq1653 rplX ribosomal protein L24 50.8%
Aq1644 rpmD ribosomal protein L30 46.4%
Aq1930a rpmG ribosomal protein L33 67.9%
Aq792a rpmI ribosomal protein L35 48.3%
Aq1485 rpsA ribosomal protein S01 32.6%
Aq2007 rpsB ribosomal protein S02 60.3%
Aq017 rpsC ribosomal protein S03 54.0%
Aq072 rpsD ribosomal protein S04 51.9%
Aq1645 rpsE ribosomal protein S05 60.6%
Aq063 rpsF ribosomal protein S06 32.7%
Aq1832 rpsG1 ribosomal protein S07 52.5%
Aq734 rpsG2 ribosomal protein S07 51.9%
Aq1651 rpsH ribosomal protein S08 39.9%
Aq1878 rpsI ribosomal protein S09 50.5%
Aq008 rpsJ ribosomal protein S10 55.9%
Aq073 rpsK ribosomal protein S11 60.7%
Aq735 rpsL1 ribosomal protein S12 78.9%
Aq1834 rpsL2 ribosomal protein S12 78.9%
Aq074 rpsM ribosomal protein S13 61.9%
Aq1651a rpsN ribosomal protein S14 51.6%
Aq226a rpsO ribosomal protein S15 61.6%
Aq123 rpsP ribosomal protein S16 36.6%
Aq020 rpsQ ribosomal protein S17 59.6%
Aq064a rpsR ribosomal protein S18 48.5%
Aq015 rpsS ribosomal protein S19 63.1%
Aq1767 rpsT ribosomal protein S20 40.0%
Aq867a rpsU ribosomal protein S21 38.2%
Translation factors
Aq1364 efp elongation factor P 48.6%
Aq2114 eif initiation factor eIF-2B alpha subunit 58.4%
Aq712 frr ribosome recycling factor 43.0%
Aq001 fusA elongation factor EF-G 91.9%
Aq075a infA initiation factor IF-1 69.1%
Aq2032 infB initiation factor IF-2 48.5%
Aq1777 infC initiation factor IF-3 53.6%
Aq876 prfA peptide chain release factor RF-1 54.8%
Aq1840 prfB peptide chain release factor RF-2 49.9%
Aq1033 selB elongation factor SelB 30.4%
Aq715 tsf elongation factor EF-Ts 35.8%
Aq005 tufA1 elongation factor EF-Tu 74.4%
Aq1928 tufA2 elongation factor EF-Tu 73.9%
Protein modification
Aq731 ccdA cytochrome c-type biogenesis protein 32.0%
Aq579 def polypeptide deformylase 41.4%
Aq2093 dsbC thiol:disulfide interchange protein 27.6%
Aq055 hemX1 cytochrome c biogenesis protein 26.2%
Aq2043 hemX2 cytochrome c biogenesis protein 36.2%
Aq1053 nifS1 FeS cluster formation protein NifS 38.5%
Aq739 nifS2 FeS cluster formation protein NifS 45.5%
Aq1871 pmbA peptide maturation 25.6%
Aq2102 prmA ribosomal protein L11 methyltransferase 35.1%
Aq567 rimI ribosomal-protein-alanine acetyltransferase 37.9%
Aq576 stpK ser/thr protein kinase 30.8%
Aq152 tlpA thiol disulfide interchange protein 37.6%
Proteases
Aq1950 aprV serine protease 26.5%
Aq1672 clpB ATPase subunit of ATP-dependent protease 46.8%
Aq1296 clpC ATP-dependent Clp protease 54.9%
Aq1339 clpP ATP-dependent Clp protease proteolytic subunit 65.4%
Aq1337 clpX ATP-dependent protease ATPase subunit clpX 66.1%
Aq1015 col collagenase 41.3%
Aq801 gcp sialoglycoprotease 45.5%
Aq1671 hslV heat shock protein HslV 57.6%
Aq1450 htrA periplasmic serine protease 38.3%
Aq242 lon Lon protease 50.6%
Aq076 map methionyl aminopeptidase 44.1%
Aq1459 npr neutral protease 27.7%
Aq2099 pepA leucine aminopeptidase 39.5%
Aq1535 pepQ xaa-pro dipeptidase 31.9%
Aq618 pfpI protease I 41.8%
Aq797 prc carboxyl-terminal protease 41.8%
Aq552 sms ATP-dependent protease sms 46.2%
Aq2204 ymxG processing protease 28.3%
Transport
Aq1222  abcT1  ABC transporter  34.7%
Aq620  abcT2  ABC transporter  36.8%
Aq1095  abcT3  ABC transporter (ABC-2 subfamily)  34.4%
Aq1094  abcT4  ABC transporter  37.7%
Aq1097  abcT5  ABC transporter (hlyB subfamily)  45.5%
Aq417  abcT6  ABC transporter  51.8%
Aq413  abcT7  ABC transporter  51.5%
Aq297  abcT8  ABC transporter  49.3%
Aq2160  abcT9  ABC transporter  45.3%
Aq1531  abcT10  ABC transporter  36.4%
Aq2122  abcT11  ABC transporter  42.5%
Aq2137  abcT12  ABC transporter  38.2%
Aq1563  abcT13  ABC transporter (MsbA subfamily)  30.5%
Aq695  acrD1  cation efflux system (AcrB/AcrD/AcrF family)  22.7%
Aq1122  acrD2  cation efflux system (AcrB/AcrD/AcrF family)  32.0%
Aq469  acrD3  cation efflux system (AcrB/AcrD/AcrF family)  34.2%
Aq786  acrD4  cation efflux (AcrB/AcrD/AcrF family)  27.7%
Aq112  amtB  ammonium transporter  49.0%
Aq682  arsA1  anion transporting ATPase  41.5%
Aq343  arsA2  anion transporting ATPase  33.9%
Aq851  corA  Mg(2+) and Co(2+) transport protein  31.1%
Aq724  ctrA1  cation transporting ATPase (E1-E2 family)  30.7%
Aq1445  ctrA2  cation transporting ATPase (E1-E2 family)  28.1%
Aq1125  ctrA3  cation transporting ATPase (E1-E2 family)  43.8%
Aq1132  czcB1  cation efflux system (czcB-like)  23.7%
Aq1331  czcB2  cation efflux system (czcB-like)  26.9%
Aq468  czcB2  cation efflux system (czcB-like)  28.5%
Aq1073  czcD  cation efflux system (CzcD-like)  43.4%
Aq911  ebs  erythrocyte band 7 homolog  50.2%
Aq1062  emrB  major facilitator family transporter  28.3%
Aq1255  feoB  ferrous iron transport protein B  32.6%
Aq1330  gltP  proton/sodium-glutamate symport protein  35.6%
Aq1268  hvsT  high affinity sulfate transporter  29.4%
Aq1863  kch  potassium channel protein  30.1%
Aq1725  lepA  G-protein LepA  59.8%
Aq1229  mffT  transporter (major facilitator family)  37.2%
Aq447  mgtC  Mg(2+) transport ATPase  36.2%
Aq1609  modA  molybdate periplasmic binding protein  38.2%
Aq086  modC  molybdenum transport system permease  44.8%
Aq415  napA1  Na(+)/H(+) antiporter  27.6%
Aq929  napA2  Na(+)/H(+) antiporter  32.7%
Aq2030  napA3  Na(+)/H(+) antiporter  26.8%
Aq215  nasA  nitrate transporter  35.8%
Aq1441  oppA  transporter (extracellular solute binding protein family 5)  37.0%
Aq481  oppB  transporter (OppBC family)  46.2%
Aq1509  oppC  oligopeptide transport system permease  46.2%
Aq2019  pstA  phosphate transport system permease PstA  43.5%
Aq1055  pstB  phosphate transport ATP binding protein  68.1%
Aq2018  pstC  phosphate transport system permease protein C  45.2%
Aq2016  pstS  phosphate-binding periplasmic protein  52.4%
Aq2129  sbf  Na(+) dependent transporter (Sbf family)  34.9%
Aq098  secG  protein export membrane protein SecG  35.7%
Aq2077  snf  Na(+):neurotransmitter symporter (Snf family)  25.7%
Aq2106  ssf  Na(+):solute symporter (Ssf family)  47.4%
Aq1988  tolQ  TolQ homolog  32.5%
Aq1504  trk1  K+ transport protein homolog  40.6%
Aq031  trnS  transporter (Pho87 family)  46.8%
Uncategorized
Aq1023  acuC1  acetoin utilization protein  36.9%
Aq2110  acuC2  acetoin utilization protein  38.6%
Aq158  apfA  AP4A hydrolase  36.6%
Aq458  bcp  bacterioferritin comigratory protein  40.6%
Aq542  bcpC  phosphonopyruvate decarboxylase  37.4%
Aq147  cobW  cobalamin synthesis related protein CobW  29.5%
Aq1303a  cspC  cold shock protein  67.2%
Aq1265  cstA  carbon starvation protein A  33.0%
Aq348  ctc  general stress protein Ctc  34.7%
Aq212  cynS  cyanate hydrolase  39.5%
Aq337  cysQ  CysQ protein  47.4%
Aq528  dedF  phenylacrylic acid decarboxylase  52.4%
Aq148  deoC  deoxyribose-phosphate aldolase  46.6%
Aq2095  dksA  dnaK suppressor protein  35.1%
Aq1994  era1  GTP-binding protein Era  49.7%
Aq1919  era2  GTP-binding protein Era  43.0%
Aq1540  gcpE  GcpE protein  50.1%
Aq1052  gcsH1  glycine cleavage system protein H  28.6%
Aq1657  gcsH2  glycine cleavage system protein H  39.8%
Aq944  gcsH3  glycine cleavage system protein H  36.7%
Aq1108  gcsH4  glycine cleavage system protein H  44.8%
Aq1458  gcvT  aminomethyltransferase (glycine cleavage system T protein)  42.2%
Aq108b  hfq  host factor I  53.5%
Aq101  hly  hemolysin  33.7%
Aq2120  hlyC  hemolysin homolog protein  29.3%
Aq1091  hylA  hemolysin  33.5%
Aq708  hyuA  N-methylhydantoinase A  39.8%
Aq1925  hyuB  N-methylhydantoinase B  43.1%
Aq1579  iagB  invasion protein IagB  38.3%
Aq1983  imp2  myo-inositol-1(or 4)-monophosphatase  36.0%
Aq748  ispA  geranylgeranyl pyrophosphate synthase  40.7%
Aq1739  lytB  LytB protein  43.9%
Aq1977  masA  enolase-phosphatase E-1  42.3%
Aq1560  mglA1  gliding motility protein  42.4%
Aq1823  mglA2  gliding motility protein MglA  34.1%
Aq1789  mviB  ‘virulence factor’ homolog MviB  29.7%
Aq587  neaC  N-ethylammeline chlorohydrolase  42.8%
Aq1820  nfeD  nodulation competitiveness protein NfeD  37.9%
Aq896  nifU  NifU protein  48.3%
Aq1300  omp  outer membrane protein  25.5%
Aq1507  omt  O-methyltransferase  39.5%
Aq967  ostA  organic solvent tolerance protein  22.0%
Aq141  pkcI  protein kinase C inhibitor (HIT family)  59.0%
Aq994  pncA  pyrazinamidase/nicotinamidase  39.1%
Aq057  sfsA  sugar fermentation stimulation protein  27.3%
Aq287  smb  small protein B  52.0%
Aq832  surE  stationary phase survival protein SurE  44.1%
Aq871  thdF  thiophene and furan oxidation protein  45.4%
Aq2021  tldD  TldD protein  40.9%
Aq773  tly  hemolysin  43.8%
Aq629  xcpC  chromosome assembly protein homolog  33.3%
Nature © Macmillan Publishers Ltd 1998