
The complete genome of the hyperthermophilic bacterium Aquifex aeolicus


Gerard Deckert*†, Patrick V. Warren*†, Terry Gaasterland‡, William G. Young*, Anna L. Lenox*, David E. Graham§, Ross Overbeek‡, Marjory A. Snead*, Martin Keller*, Monette Aujay*, Robert Huber‖, Robert A. Feldman*, Jay M. Short*, Gary J. Olsen§ & Ronald V. Swanson*

* Diversa Corporation, 10665 Sorrento Valley Road, San Diego, California 92121, USA
‡ Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois 60439, USA
§ Department of Microbiology, University of Illinois, Urbana, Illinois 61801, USA
‖ Lehrstuhl für Mikrobiologie, Universität Regensburg, W-8400 Regensburg, Germany

Aquifex aeolicus was one of the earliest diverging, and is one of the most thermophilic, bacteria known. It can grow on hydrogen, oxygen, carbon dioxide, and mineral salts. The complex metabolic machinery needed for A. aeolicus to function as a chemolithoautotroph (an organism which uses an inorganic carbon source for biosynthesis and an inorganic chemical energy source) is encoded within a genome that is only one-third the size of the E. coli genome. Metabolic flexibility seems to be reduced as a result of the limited genome size. The use of oxygen (albeit at very low concentrations) as an electron acceptor is allowed by the presence of a complex respiratory apparatus. Although this organism grows at 95 °C, the extreme thermal limit of the Bacteria, only a few specific indications of thermophily are apparent from the genome. Here we describe the complete genome sequence of 1,551,335 base pairs of this evolutionarily and physiologically interesting organism.

Complete genome sequences have been determined for a number of organisms, including Archaea1, Bacteria2–7, and Eukarya8. Here we present and explore the genome sequence of Aquifex aeolicus. With growth-temperature maxima near 95 °C, Aquifex pyrophilus and A. aeolicus are the most thermophilic bacteria known. Although isolated and described only recently9, these species are related to filamentous bacteria first observed at the turn of the century, growing at 89 °C in the outflow of hot springs in Yellowstone National Park10,11. The observation of these macroscopic assemblages would later be instrumental in the drive to culture hyperthermophilic organisms12. The Aquificaceae represent the most deeply branching family within the bacterial domain on the basis of phylogenetic analysis of 16S ribosomal RNA sequences13,14, although analyses of individual protein sequences vary in their placement of Aquifex relative to other groups15–18. The genera in this group, Aquifex and Hydrogenobacter, are thermophilic, hydrogen-oxidizing, microaerophilic, obligate chemolithoautotrophs9,19–21. A. aeolicus (isolated by R.H. and K. O. Stetter) was cultured at 85 °C under an H2/CO2/O2 (79.5:19.5:1.0) atmosphere in a medium containing only inorganic components. A. aeolicus does not grow on a number of organic substrates, including sugars, amino acids, yeast extract or meat extract. Unlike its close relative A. pyrophilus, A. aeolicus has not been shown to grow anaerobically with nitrate as an electron acceptor in the laboratory. From study of the physiology of the organism, several predictions can be made. As an autotroph, A. aeolicus must have genes encoding proteins for one or more modes of carbon fixation and a complete set of biosynthetic genes. As autotrophy is a feature that is distributed throughout the Archaea and Bacteria, most of the associated genes are expected to be of ancient origin and clearly related to those characterized elsewhere. The obligate autotrophy suggests a biosynthetic rather than a degradative character. Oxygen respiration implies the presence of corresponding utilization and tolerance genes. The early divergence of the Aquificaceae inferred from ribosomal RNA sequences leads to several questions. Are the machineries for oxygen usage and tolerance homologous to those found in mitochondria and well studied organisms such as Escherichia coli, or were they invented separately? If there was far less oxygen when the lineage originated, is there evidence for use of alternative oxidants?

† Present addresses: Codex Bioinformatics Services, PO Box 90273, San Diego, California 92169, USA (G.D.); Department of Bioinformatics, SmithKline Beecham Pharmaceuticals, Collegeville, Philadelphia 19426, USA (P.V.W.)

NATURE | VOL 392 | 26 MARCH 1998

Genome

General features of the A. aeolicus genome are listed in Box 1. We classified 1,512 open-reading frames (ORFs) into one of three categories, namely, identified (Table 1), hypothetical, or unknown. Identified ORFs were further classified into one of 57 cellular role categories adapted from Riley22 (Table 1). The relatively high G + C content of the two 16S-23S-5S rRNA operons (65%) is characteristic of thermophilic bacterial rRNAs23. The genome is densely packed: most genes are apparently expressed in polycistronic operons and many convergently transcribed genes overlap slightly. Nonetheless, many genes that are functionally grouped within operons in other organisms, such as the tryptophan or histidine biosynthesis pathways, are found dispersed throughout the A. aeolicus genome or appear in novel operons. Even when they encode subunits of the same enzyme, the genes are often separated on the chromosome (for example, gltB and gltD, the genes encoding the large and small subunits of glutamate synthase). Operon organization of genes for the biosynthesis of amino acids is found in both Archaea and Bacteria but it is not universal in either group. A. aeolicus is extreme in that no two amino acid biosynthetic genes are found in the same operon. In contrast, genes required for electron transport, hydrogenase subunits, transport systems, ribosomal subunits, and flagella are often in functionally related operons in A. aeolicus (Fig. 1). No introns or inteins (protein splicing elements) were detected in the genome.

A single extrachromosomal element (ECE) was identified during sequencing. Sequence redundancy for the total project was calculated to be 4.83. The ECE, however, is significantly over-represented relative to the chromosome; when calculated independently for the final assemblies, redundancies are 4.73 and 8.76 for the chromosome and for the ECE, respectively. The ECE therefore appears to be present at roughly twice the copy number of the chromosome. Although no ORFs on the ECE can be assigned a function with confidence, except for a transposase, two of the predicted proteins show similarity to hypothetical proteins in the Methanococcus jannaschii genome1. One ORF on the ECE is also present in two identical copies on the A. aeolicus chromosome, providing evidence of genetic exchange between the chromosome and the ECE.

Nature © Macmillan Publishers Ltd 1998

Reductive tricarboxylic acid cycle

As an autotroph, A. aeolicus obtains all necessary carbon by fixing CO2 from the environment. An assay for activity of the reductive tricarboxylic acid (TCA) cycle in A. pyrophilus cell extracts showed in vitro activities for each proposed reaction24. The reductive (reverse) TCA cycle fixes two molecules of CO2 to form acetyl-coenzyme A (acetyl-CoA) and other biosynthetic intermediates25. The A. aeolicus genome contains genes encoding malate dehydrogenase, fumarate hydratase, fumarate reductase, succinate-CoA ligase, ferredoxin oxidoreductase, isocitrate dehydrogenase, aconitase and citrate synthase, which together could constitute the TCA pathway. There is no biochemical evidence for alternative carbon-fixation pathways in A. pyrophilus24,25 nor is there sequence evidence for such pathways in A. aeolicus. The TCA cycle is vital as it provides the substrates of many biosynthetic pathways. (It is beyond the scope of this report to detail these biosynthetic pathways, but they seem to be typically bacterial, and candidate genes for all or most of the enzymes have been identified in A. aeolicus.) The central role of the TCA cycle is emphasized by duplication of many of its constituent genes in A. aeolicus. Two genes encode proteins that are similar to malate dehydrogenase (in addition to a lactate dehydrogenase). The fumarate hydratase is split into amino- and carboxy-terminal subunits, as is the case in M. jannaschii1. Unlinked genes encoding two iron–sulphur proteins of fumarate reductase (alternatively succinate dehydrogenase) accompany a single flavoprotein subunit. Two sets of genes resembling succinate-CoA ligase (both the α- and β-subunits) are present. A. aeolicus has two putative operons encoding four-subunit (α, β, γ, δ) 2-acid ferredoxin oxidoreductases; members of this family catalyze reversible carboxylation/decarboxylation of pyruvate, 2-isoketovalerate, or 2-oxoglutarate with varying specificity26. These duplicated genes may encode paralogous proteins with unique substrate specificity, as opposed to redundant functions. For example, a paralogue of succinate-CoA ligase may activate citrate with coenzyme A to form citryl-CoA, which citrate synthase can cleave to produce oxaloacetate and acetyl-CoA.

Gluconeogenesis through the Embden–Meyerhof–Parnas pathway

Growing autotrophically, A. aeolicus must synthesize pentose and hexose monosaccharides from products of the reductive TCA cycle. Pyruvate produced by pyruvate ferredoxin oxidoreductase or by pyruvate carboxylase (oxaloacetate decarboxylase)24 may enter the Embden–Meyerhof–Parnas pathway of glycolysis and gluconeogenesis. Genes encoding fructose-1,6-bisphosphatase, an essential gluconeogenic enzyme in E. coli, have not been identified in the genomes of the autotrophs A. aeolicus or M. jannaschii1, suggesting that an unidentified pathway may exist. The A. aeolicus genome also encodes enzymes of the pentose-phosphate pathway and enzymes for glycogen synthesis and catabolism. We found neither (phospho)gluconate dehydrase nor 2-keto-3-deoxy-(6-phospho)gluconate aldolase of the Entner–Doudoroff pathway.

Respiration

Aquifex species are able to grow by using oxygen concentrations as low as 7.5 p.p.m. (R.H. and K. O. Stetter, unpublished observations). The enzymes for oxygen respiration are similar to those of other bacteria: ubiquinol cytochrome c oxidoreductase (bc1 complex), cytochrome c (three different genes) and cytochrome c oxidase (with two different subunit I genes and two different subunit II genes). The alternative system, with cytochrome bd ubiquinol oxidase, is also present. Clearly, the Aquifex lineage did not independently invent oxygen respiration. This leaves at least three possibilities: consistent with the ability of Aquifex to use very low levels of oxygen, the oxygen-respiration system was highly developed when oxygen had only a small fraction of its present concentration before the advent of oxygenic photosynthesis; contrary to what is implied by the 16S phylogeny, the lineage including Aquifex originated after the rise in atmospheric oxygen; or oxygen respiration developed once, and was then laterally transferred among bacterial lineages and acquired by Aquifex. Many other oxidoreductases are present in addition to those obviously involved in oxygen respiration. The physiological role of most of these oxidoreductases is unknown or ambiguous, but two deserve comment. There is a putative nitrate reductase in the genome, although A. aeolicus has not been observed to perform NO3− respiration, unlike the closely related A. pyrophilus. The nitrate reductase gene is adjacent to a nitrate transporter, and may be involved in nitrogen assimilation rather than respiration. It is also possible that A. aeolicus has a latent ability to respire with nitrate but that the conditions required have not been found. Two gene sequences show strong similarities to Rieske proteins, even though the rest of the ubiquinol cytochrome c oxidoreductase subunits appear only once in the genome. One of these Rieske protein genes is adjacent to a sulphide dehydrogenase subunit, suggesting a role in sulphur respiration.

Oxidative stress

A. aeolicus grows optimally under microaerophilic conditions and consequently possesses various protective enzymes to counter reactive oxygen species, particularly superoxide and peroxide. The genome contains three genes encoding superoxide dismutases, two of the copper/zinc family and one of the iron/manganese family. The latter has also been noted in A. pyrophilus27. One of the copper/zinc superoxide dismutase genes is located in a large gene cluster encoding formate dehydrogenase. No catalase genes were identified. There are several genes in the genome that might encode proteins that catalyze the detoxification of H2O2, including cytochrome c peroxidase, thiol peroxidase, and two alkyl hydroperoxide reductase genes. All of these enzymes require an exogenous reductant and therefore do not evolve O2. However, treatment of A. pyrophilus9 or A. aeolicus biomass with H2O2 results in the rapid evolution of gas bubbles. This catalase activity may result from a novel enzyme that cannot yet be identified by sequence similarity.

Motility

Like A. pyrophilus9, A. aeolicus is motile and possesses monopolar polytrichous flagella. More than 25 genes encoding proteins involved in flagellar structure and biosynthesis have been identified in A. aeolicus (Box 1). However, no homologues of the bacterial chemotaxis system were identified. In enteric bacteria, membrane-bound receptors bind chemoattractants and repellents and modulate the activity of the histidine kinase CheA28. Phosphoryl groups from CheA are transferred to CheY, which then binds to the flagellar switch, altering the direction of flagellar rotation. Homologous chemotaxis systems are present in the archaea Halobacterium salinarum29 and Pyrococcus sp. OT3 (H. Sizuya, personal communication), although the bacterial and archaeal flagellar apparatuses are not homologous30. The M. jannaschii genome also lacks homologues of known genes required for chemotaxis. Thus, either motility in A. aeolicus and M. jannaschii is undirected or input for controlling taxis is mediated through another, unidentified system. The most studied chemotaxis systems respond to sugars and amino acids, although responses to other inputs (for example, metals, redox potential, and light) may also occur. In contrast to all the organisms known to possess the classical chemotactic signal-transduction pathways, both A. aeolicus and M. jannaschii are obligate chemoautotrophs. Chemoautotrophs may respond to a different set of factors, such as concentrations of dissolved gas (CO2, H2 or O2) or another critical parameter such as temperature. In E. coli, the flagellar switch is essential for flagellar structure and function and coupling of chemotaxis signals. But the A. aeolicus genome encodes homologues of only two of the three E. coli proteins that make up the switch, FliG and FliN. Biochemical31 and genetic32 studies implicate the missing FliM protein as the receptor for phosphorylated CheY, the switch signal. The absence of both FliM and CheY in A. aeolicus supports the identification of FliM as the receptor for phosphorylated CheY in E. coli. This result also argues against a direct role for FliM in torque generation.

Figure 1 Linear map of the A. aeolicus circular chromosome. Genes are shown as arrows which denote the direction of transcription and are coloured to denote functional categorization according to the key below the figure. The sequences of the two rRNA gene clusters are identical. Here, the first base of the coding sequence of fusA was arbitrarily assigned as base number 1 as no origin of replication has been identified. ORF numbers are discontinuous because some ORFs representing 100 amino acids or more are not predicted to be coding and are not shown.
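The coordinate conventions behind the chromosome map can be made concrete with a small arithmetic check that is not part of the original analysis. The sketch below takes the rRNA operon and tmRNA coordinates listed in Box 1 and converts each start-end pair into a length and a strand. It assumes that coordinates are 1-based and inclusive and that a start greater than the end marks a feature on the reverse strand; neither convention is stated explicitly in the text, and the names `FEATURES`, `span` and `summarize` are illustrative only.

```python
# Start-end coordinates as printed in Box 1. The labels "A" and "B" are
# hypothetical names used here only to tell the two operons apart.
FEATURES = {
    "rRNA operon A (16S-23S-5S)": (572785, 567770),
    "rRNA operon B (16S-23S-5S)": (1192069, 1197084),
    "tmRNA": (1153844, 1153498),
}


def span(start, end):
    """Return (length in bp, strand) for one start-end pair.

    Assumptions (not stated in the paper): coordinates are 1-based and
    inclusive, and start > end means the feature lies on the reverse strand.
    """
    length = abs(end - start) + 1
    strand = "+" if end >= start else "-"
    return length, strand


def summarize(features=FEATURES):
    """Map each feature name to its (length, strand) tuple."""
    return {name: span(start, end) for name, (start, end) in features.items()}


if __name__ == "__main__":
    for name, (length, strand) in summarize().items():
        print(f"{name}: {length} bp on the {strand} strand")
```

Under these assumptions the two rRNA operons each span 5,016 bp and lie on opposite strands, which is consistent with the Figure 1 caption's statement that the two clusters are identical in sequence, and the tmRNA gene spans 347 bp.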

DNA replication and repair

The A. aeolicus primary replicative DNA polymerase, corresponding to the DNA polymerase III holoenzyme in E. coli, probably consists of a core structure containing α- and ε-subunits, a γ-τ-subunit and an additional member of the γ-τ/δ′-family. A gene encoding a protein homologous to the β-sliding clamp was also found. This minimalistic complex lacks homologous θ-, δ-, χ- and ψ-subunits, as does the Mycoplasma genitalium holoenzyme3. Translation of the 54K (relative molecular mass) γ-τ-ATPase subunit may proceed without a programmed frameshift to produce a protein similar to the N-terminal region of the E. coli γ-subunit. DNA polymerase I is present as separate Klenow fragment and 5′ → 3′ exonuclease subunits, encoded by two non-adjacent ORFs. Although the repair polymerase, DNA polymerase II, has not been found in A. aeolicus, one ORF (Aq1422) encodes a protein similar to the eukaryotic DNA repair polymerase-β. A member of the same family has been identified in Thermus aquaticus33 and Bacillus subtilis.

Transcriptional and translational apparatuses

The transcriptional apparatus of A. aeolicus is similar to that of E. coli and lacks any components specific to the Eukarya or Archaea (Fig. 2). In addition to the core RNA polymerase α-, β-, and β′-subunits, four σ-factors which determine promoter specificity are present (Table 1). Several different families of bacterial transcriptional regulators were also identified, including two-component systems. All of the ribosomal proteins and elongation factors common to other bacteria are present, indicating that all bacteria-specific ribosomal proteins were present in the common ancestor of Aquifex and other bacteria. Also present are the four sel genes required for the cotranslational incorporation of selenocysteine. These latter genes are clustered in a 15-kilobase-pair segment that also encodes the biosynthetic and structural proteins for formate dehydrogenase, the only selenocysteine-containing protein identified. The gene that encodes selenocysteine transfer RNA, selC, is apparently cotranscribed with the genes encoding the formate dehydrogenase structural proteins. A. aeolicus lacks glutaminyl-tRNA and asparaginyl-tRNA synthetases. The genes required for transamidation of glutamyl-tRNAGln are present34. Charging of asparaginyl-tRNA is likely to proceed through the analogous reaction, as shown in halobacteria35, although the gene(s) for that transamidase are unknown. The canonical methionyl- and leucyl-tRNA synthetases have only been seen previously as single polypeptide enzymes; however, in A. aeolicus the homologues appear fragmented into two subunits. In both cases, the genes that encode the N- and C-terminal portions are widely separated on the chromosome. No complete three-dimensional structural data are available for either methionyl- or leucyl-aminoacyl tRNA synthetases, but the subunit organization in the A. aeolicus aminoacyl-tRNA synthetases may reflect domain organization in the homologous proteins.

Figure 2 Histogram representation of the similarity of selected classes of predicted proteins to predicted proteins from the E. coli (EC) and M. jannaschii (MJ) genomes. Predicted A. aeolicus proteins representing each category were independently compared to sets of all potential polypeptides (≥100 amino acids) from the two genomes using FASTA44. If the top scoring alignment covered ≥80% of the length of the A. aeolicus protein, the score was plotted. There were more positives found in the E. coli genome in nearly every category. Hypothetical proteins (those identified by database match but of unknown function) are very similarly represented by M. jannaschii and E. coli. There are a small number of very highly conserved hypotheticals that are shared between A. aeolicus and M. jannaschii. Generally, biosynthetic categories show less discrimination than information-processing categories, which are clearly more E. coli-like. The variation in the apparent rates of evolution in different categories suggests that different phylogenies may be inferred depending on the sequence analysed. Within each graph, correspondence to E. coli is shown in white and M. jannaschii is shown in black. Avg id, average identity; count, number of proteins analysed.
Axes: per cent identity (15–80) against number of proteins displaying similarity.
Panels:
Functionally identified (848 Aquifex ORFs): EC avg id 38%, count 656; MJ avg id 35%, count 379
Amino acid biosynthesis (71 Aquifex ORFs): EC avg id 38%, count 66; MJ avg id 42%, count 58
Translation (98 Aquifex ORFs): EC avg id 44%, count 80; MJ avg id 32%, count 46
Transcription (31 Aquifex ORFs): EC avg id 39%, count 27; MJ avg id 32%, count 6
Hypothetical (255 Aquifex ORFs): EC avg id 32%, count 121; MJ avg id 33%, count 115

Thermophily

The A. aeolicus genome is the second completely sequenced genome of a hyperthermophile. By comparing the A. aeolicus and M. jannaschii genomes and contrasting them with the complete genomes of mesophiles, we can discover whether there are aspects of the genome or the encoded information that are diagnostic of hyperthermophiles. The G + C content of the stable RNAs is clearly indicative of the high growth temperature of the organism. This property can be used to identify stable RNAs against the relatively low G + C background of the A. aeolicus genome. The gene encoding tmRNA (or 10Sa RNA)36, an RNA involved in tagging polypeptides translated from incomplete messenger RNAs for degradation, was located in this way. Two genes for reverse gyrase are present in the genome. This is the only protein known to be present only in thermophiles. Other proteins, currently described as hypotheticals, may be diagnostic of hyperthermophiles but the data sets are not yet large enough to decide this with confidence. Although features of stabilization may not be apparent in any given protein37, a large enough data set may reveal general trends in amino-acid usage that are informative. Particularly important in this regard is inclusion of multiple genomes of hyperthermophiles so as not to allow the idiosyncrasies of a single organism to bias the conclusions. As shown in Table 2, comparison of the amino-acid composition encoded by six genomes shows that use of individual amino acids can vary significantly from genome to genome. The data suggest trends that may be correlated with the thermostability of the encoded proteins. One apparent trend is that the hyperthermophile genomes encode higher levels of charged amino acids on average than mesophile genomes38, primarily at the expense of uncharged polar residues. Glutamine in particular seems to be significantly discriminated against in the hyperthermophiles. Although this observation might be rationalized on the basis of an increased rate of deamidation of this residue at higher temperatures, asparagine does not appear subject to similar discrimination.

Table 2 Comparison of relative amino acid compositions (in percentages) of mesophiles and thermophiles
Mesophiles: H. influenzae, H. pylori, E. coli, Synechocystis. Thermophiles: A. aeolicus, M. jannaschii.

Amino acid  H. influenzae  H. pylori      E. coli        Synechocystis  A. aeolicus    M. jannaschii
A           8.21           6.83           9.55           9.07           5.90           5.54
C           1.03           1.09           1.11           1.01           0.79           1.27
D           4.98           4.77           5.20           5.07           4.32           5.52
E           6.48           6.88           5.91           6.20           9.63           8.67
F           4.46           5.41           3.87           3.75           5.13           4.20
G           6.65           5.76           7.42           7.77           6.75           6.41
H           2.05           2.12           2.26           1.93           1.54           1.43
I           7.10           7.20           5.95           6.31           7.32           10.45
K           6.32           8.94           4.48           4.26           9.40           10.36
L           10.50          11.18          10.56          10.93          10.57          9.38
M           2.44           2.28           2.86           2.12           1.92           2.33
N           4.89           5.83           3.88           3.76           3.60           5.24
P           3.72           3.28           4.41           5.09           4.07           3.38
Q           4.64           3.70           4.42           5.26           2.04           1.44
R           4.47           3.46           5.58           5.18           4.91           3.85
S           5.84           6.81           5.67           5.46           4.79           4.46
T           5.20           4.37           5.35           5.53           4.21           4.06
V           6.68           5.59           7.11           7.10           7.93           6.85
W           1.12           0.70           1.48           1.30           0.93           0.71
Y           3.12           3.68           2.83           2.78           4.13           4.33

Residue class                         Mesophiles   Thermophiles
Charged residues (DEKRH)              24.11        29.84
Polar/uncharged residues (GSTNQYC)    31.15        26.79
Hydrophobic residues (LMIVWPAF)       44.74        43.36

Phylogeny

The placement of the Aquifex lineage as one of the earliest divergences in the eubacterial tree13,14 is interesting because of the insights it could provide into the ancestral eubacterial phenotype, including the hypothesized thermophilic nature of the first bacteria. Protein-based phylogenies often do not support the original rRNA-based placement15,16,18. Thus, the availability of some 1,500 genes from an Aquifex species would seem to offer a definitive resolution of the phylogeny. However, our analyses of ribosomal proteins, aminoacyl-tRNA synthetases, and other proteins do not do so, showing no consistent picture of the organism’s phylogeny. We cannot make a more complete analysis and discussion here, but some observations can be made. These proteins do not yield a statistically significant placement of the Aquifex lineage or of other major eubacterial lineages. This situation partially reflects the inadequacy of some protein sequences as indicators of distant molecular genealogy because of their particular evolutionary dynamic, including the patterns and rates of amino-acid replacements. In some cases (such as the aminoacyl-tRNA synthetases for arginine, cysteine, histidine, proline and tyrosine), the analyses are further complicated by the presence of paralogous genes and/or apparent lateral gene transfers. It seems that a more extensive survey of genes and a better sampling of major eubacterial taxa will be required to confidently confirm or refute an early divergence of the Aquifex lineage.

Box 1 Aquifex aeolicus genome features

General
  Length: 1,551,335 bp
  G + C content: 43.4%
  Protein-coding regions: 93%
  Stable RNA: 0.8%
  Non-coding repeats: (none significant)
  Intergenic sequences: 6.2%

RNA
  Ribosomal RNA (chromosome coordinates)
    16S-23S-5S: 572785-567770
    16S-23S-5S: 1192069-1197084
  Transfer RNA: 44 species (7 clusters, 28 single genes)
  Other RNAs (chromosome coordinates)
    tmRNA: 1153844-1153498

Chromosomal coding sequences
  849 similar to protein of known function (average length 1,066 bp)
  256 similar to protein of unknown function (average length 898 bp)
  407 unknown coding regions (average length 762 bp)
  1,512 total (average length 956 bp)

Extrachromosomal element (ECE)
  Length: 39,456 bp
  G + C content: 36.4%
  Protein-coding regions: 53.5%

ECE-coding sequences
  1 similar to proteins of known function (length 948 bp)
  4 similar to proteins of unknown function (average length 667 bp)
  27 unknown coding regions (average length 648 bp)
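Several of the summary figures in Box 1 and in the Methods can be cross-checked against one another with simple arithmetic. The sketch below is an illustration rather than the authors' procedure: it recomputes the ORF total, the mean ORF length and the coding fraction from the three ORF classes in Box 1, the project-wide sequence redundancy from the read statistics in Methods, and the ECE-to-chromosome redundancy ratio quoted in the text. Three assumptions are mine and are not stated in the paper: the redundancy target is taken to be chromosome plus ECE, random-phase reads are given the same 557 bp mean length as the final assembly, and the expected coverage uses a simple Poisson (Lander–Waterman style) model. Gene overlaps are ignored when summing coding lengths, and all function and constant names are hypothetical.

```python
import math

# Values quoted in Box 1, the Genome section and Methods.
CHROMOSOME_BP = 1_551_335
ECE_BP = 39_456
ORF_CLASSES = {  # class: (number of ORFs, average length in bp)
    "similar to protein of known function": (849, 1066),
    "similar to protein of unknown function": (256, 898),
    "unknown coding region": (407, 762),
}
FINAL_READS = 13_785         # sequences in the final assembly
RANDOM_PHASE_READS = 10_500  # sequences collected in the random phase
MEAN_READ_BP = 557           # average edited read length (final assembly)
CHROMOSOME_REDUNDANCY = 4.73
ECE_REDUNDANCY = 8.76


def orf_summary(classes=ORF_CLASSES, genome_bp=CHROMOSOME_BP):
    """Return (total ORFs, mean ORF length, coding fraction of the genome)."""
    total_orfs = sum(count for count, _ in classes.values())
    coding_bp = sum(count * mean_bp for count, mean_bp in classes.values())
    return total_orfs, coding_bp / total_orfs, coding_bp / genome_bp


def redundancy(reads, mean_read_bp, target_bp):
    """Average fold coverage: total sequenced bases over target length."""
    return reads * mean_read_bp / target_bp


def expected_coverage(fold):
    """Expected covered fraction under a Poisson model (an assumption here)."""
    return 1.0 - math.exp(-fold)


if __name__ == "__main__":
    total, mean_len, coding = orf_summary()
    target_bp = CHROMOSOME_BP + ECE_BP  # assumed redundancy target
    print(f"ORFs: {total}, mean length {mean_len:.0f} bp, coding {coding:.1%}")
    final_fold = redundancy(FINAL_READS, MEAN_READ_BP, target_bp)
    print(f"Project redundancy: {final_fold:.2f}")
    random_fold = redundancy(RANDOM_PHASE_READS, MEAN_READ_BP, target_bp)
    print(f"Random-phase expected coverage: {expected_coverage(random_fold):.1%}")
    print(f"ECE/chromosome ratio: {ECE_REDUNDANCY / CHROMOSOME_REDUNDANCY:.2f}")
```

Under these assumptions the reported values are mutually consistent: the three classes sum to 1,512 ORFs with a mean length of about 956 bp and a coding fraction of about 93%, the project redundancy comes out at about 4.83, the Poisson expectation for the random phase falls between 97% and 98%, and the redundancy ratio of about 1.85 matches the statement that the ECE is present at roughly twice the copy number of the chromosome.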

gels; and second, dye-terminator (ABI Prism FS+) reactions using two pBluescript-specific primers. These reactions were analysed on 36-cm 5% Long-Ranger gels. The sequence fragments were assembled on an Apple Power Macintosh computer using Sequencher (Gene Codes, Ann Arbor, MI), an assembly and editing program. Assembly was typically performed in batches of roughly 200– 400 sequences, and was followed by inspection and editing of the assemblies. All sequences in the set were compared with all others through this process. After assembly, the sequences comprised ,750 contigs at the end of the random phase. Sequences were obtained from both ends of ,200 randomly chosen clones from a fosmid library42,43. These sequences were then assembled with consensus sequences derived from the contigs of random-phase sequences using Sequencher. Gaps between contigs were closed by direct sequencing on fosmids not wholly contained within a contig. The fosmid library thus served a purpose analogous to that of the l-scaffold in other projects1–4. The final eight gaps were closed by direct sequencing of polymerase chain reaction (PCR) products generated with the TaqPlus Long PCR System (Stratagene Cloning Systems, La Jolla, CA). Consequences of reducing the number of sequences in the random phase are the large number of gaps that remain to be closed in the directed phase, and the reduction in overall coverage. To ensure that reduced coverage did not compromise accuracy, ,200 oligonucleotide primers were synthesized to resequence regions of ambiguity identified by visual inspection of the entire assembly. 13,785 sequences, with an average edited read length of 557 base pairs, constitute the final assembly. On the basis of a relatively small number of errors identified during the annotation process, we estimate the error frequency to be ,0.01%, comparable to other published genomic sequence estimates.

8

Gene (ORF + RNA) identification and functional assignment approaches.

Conclusions

Advances in sequencing techniques have allowed us to move beyond studies of single genes to studies of complete genomes only recently2. This rapid advance has created the opportunity to begin to characterize an organism with the full knowledge of the genome in hand. The complete genome summarized in this report represents our first view of A. aeolicus. The challenge now is to ask specific questions in ways which take advantage of the whole-genome data. Beyond studies of any single organism in isolation, complete genomes allow comprehensive comparisons between organisms. For instance, comparisons of the similarity of genes can be made that reveal that genes in different categories vary in their relative conservation (Fig. 2). In addition, genome-wide trends are apparent. For example, why is there not more of a tendency to group functionally related genes (for example, biosynthetic pathways) into operons in A. aeolicus? This was also seen in the genome sequence of the autotroph M. jannaschii1. Is this because the autotrophic lifestyle decreases the need for selective regulation? There also seem to be a few multifunctional, fused proteins in A. aeolicus and M. jannaschii. Although this seems unlikely to be related to autotrophy, it might be associated with extreme thermophily. The large number of diverse genome sequences that will become available in the coming years will allow more detailed correlation of global genomic properties M with particular physiologies. .........................................................................................................................

Methods

Sequencing strategy. The sequencing strategy used to assemble the complete genome was based on the whole genome random (or ‘shotgun’) approach, which has been successfully used for other genomes of similar size1–4. Shotgun sequencing projects are characterized by two phases: an initial completely random phase in which the bulk of the data is collected, followed by a closure phase where directed techniques are used to close gaps and complete the assembly. By pursuing a strategy where only 97% coverage was initially achieved, we were able to limit the number of sequences needed for the random phase to only 10,500 (ref. 39). Sequences were generated from a small insert library constructed in l ZAP II vectors40,41 (average insert length 2.9 kilobase pairs). Two different methods were used for sequencing: first, dye-primer M13-21 and M13 reverse primer ABI Prism CS+ ready reaction kits, analysed on 48-cm 4% polyacrylamide NATURE | VOL 392 | 26 MARCH 1998

Coding regions of the A. aeolicus genome were analysed and assigned primarily using the programs BLASTP44 and FASTA45 to search against a nonredundant protein database. Many analyses were carried out within the context of MAGPIE46,47, an integrated computing environment for genome analysis. The results of these analyses are available for user interpretation, validation, and categorization. Additional ORFs were identified and start sites refined using the program CRITICA (J. H. Badger and G.J.O., unpublished program). Finally, all presumed ‘intergenic regions’ were examined with BLASTX for similarities to known protein sequences48. Transfer RNA genes were identified with the program tRNAscan-SE49.

Received 26 August 1997; accepted 3 February 1998.

1. Bult, C. et al. Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 273, 1058–1073 (1996).
2. Fleischmann, R. D. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269, 496–511 (1995).
3. Fraser, C. M. et al. The minimal gene complement of Mycoplasma genitalium. Science 270, 397–403 (1995).
4. Tomb, J.-F. et al. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388, 539–547 (1997).
5. Himmelreich, R. et al. Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res. 24, 4420–4449 (1996).
6. Kaneko, T. et al. Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res. 3, 109–136 (1996).
7. Blattner, F. R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1462 (1997).
8. Goffeau, A. et al. Life with 6000 genes. Science 274, 546 (1996).
9. Huber, R. et al. Aquifex pyrophilus gen. nov. sp. nov. represents a novel group of marine hyperthermophilic hydrogen oxidizing bacteria. Syst. Appl. Microbiol. 15, 340–351 (1992).
10. Reysenbach, L., Wickham, G. S. & Pace, N. R. Phylogenetic analysis of the hyperthermophilic pink filament community in Octopus Spring, Yellowstone National Park. Appl. Environ. Microbiol. 60, 2113–2119 (1994).
11. Setchell, W. A. The upper temperature limits of life. Science 17, 934–937 (1903).
12. Brock, T. D. The road to Yellowstone—and beyond. Annu. Rev. Microbiol. 49, 1–28 (1995).
13. Burggraf, S., Olsen, G. J., Stetter, K. O. & Woese, C. R. A phylogenetic analysis of Aquifex pyrophilus. Syst. Appl. Microbiol. 15, 353–356 (1992).
14. Pitulle, C. et al. Phylogenetic position of the genus Hydrogenobacter. Int. J. Syst. Bacteriol. 44, 620–626 (1994).
15. Baldauf, S. L., Palmer, J. D. & Doolittle, W. F. The root of the universal tree and the origin of eukaryotes based on elongation factor phylogeny. Proc. Natl Acad. Sci. USA 93, 7749–7754 (1996).
16. Klenk, H.-P., Palm, P. & Zillig, W. in Molecular Biology of the Archaea (eds Pfeifer, F., Palm, P. & Schleifer, K. H.) 139–147 (Vch Pub, 1994).
17. Bocchetta, M. et al. Arrangement and nucleotide sequence of the gene (fus) encoding elongation factor G (EF-G) from the hyperthermophilic bacterium Aquifex pyrophilus: phylogenetic depth of hyperthermophilic bacteria inferred from analysis of the EF-G/fus sequences. J. Mol. Evol. 41, 803–812 (1995).
18. Wetmur, J. G. et al. Cloning, sequencing, and expression of RecA proteins from three distantly related thermophilic eubacteria. J. Biol. Chem. 269, 25928–25935 (1994).
19. Kawasumi, T., Igarashi, Y., Kodama, T. & Minoda, Y. Hydrogenobacter thermophilus gen. nov., sp. nov., an extremely thermophilic, aerobic, hydrogen-oxidizing bacterium. Int. J. Syst. Bacteriol. 34, 5–10 (1984).
20. Kristjansson, J., Ingason, A. & Alfredsson, G. A. Isolation of thermophilic obligately autotrophic hydrogen-oxidizing bacteria, similar to Hydrogenobacter thermophilus, from Icelandic hot springs. Arch. Microbiol. 140, 321–325 (1985).
21. Kryukov, V. R., Savel’eva, N. D. & Pusheva, M. A. Calderobacterium hydrogenophilum gen. nov., sp. nov., an extreme thermophilic bacterium and its hydrogenase activity. Microbiology (Engl. Trans. Mikrobiologiya) 52, 611–618 (1983).
22. Riley, M. Functions of the gene products of Escherichia coli. Microbiol. Rev. 57, 862–952 (1993).
23. Weisburg, W. G., Giovannoni, S. J. & Woese, C. R. The Deinococcus-Thermus phylum and the effect of rRNA composition on phylogenetic tree construction. Syst. Appl. Microbiol. 11, 128–134 (1989).
24. Beh, M., Strauss, G., Huber, R., Stetter, K. O. & Fuchs, G. Enzymes of the reductive citric acid cycle in the autotrophic eubacterium Aquifex pyrophilus and in the archaebacterium Thermoproteus neutrophilus. Arch. Microbiol. 160, 306–311 (1993).
25. Fuchs, G. in Autotrophic Bacteria (eds Schlegel, H. G. & Bowien, B.) 365–382 (Springer, New York, 1987).
26. Mai, X. & Adams, M. W. Characterization of a fourth type of 2-keto acid-oxidizing enzyme from a hyperthermophilic archaeon: 2-ketoglutarate ferredoxin oxidoreductase from Thermococcus litoralis. J. Bacteriol. 178, 5890–5896 (1996).
27. Lim, J. H. et al. Cloning and expression of superoxide dismutase from Aquifex pyrophilus, a hyperthermophilic bacterium. FEBS Lett. 406, 142–146 (1997).
28. Bourret, R. B., Borkovich, K. A. & Simon, M. I. Signal transduction pathways involving protein phosphorylation in prokaryotes. Annu. Rev. Biochem. 60, 401–441 (1991).
29. Rudolph, J., Tolliday, N., Schmitt, C., Schuster, S. C. & Oesterhelt, D. Phosphorylation in halobacterial signal transduction. EMBO J. 14, 4249–4257 (1995).
30. Jarrell, K. F., Bayley, D. P. & Kostyukova, A. S. The archaeal flagellum: a unique motility structure. J. Bacteriol. 178, 5057–5064 (1996).
31. Welch, M., Oosawa, K., Aizawa, S. I. & Eisenbach, M. Effects of phosphorylation, Mg2+, and conformation of the chemotaxis protein CheY on its binding to the flagellar switch protein FliM. Biochemistry 33, 10470–10476 (1994).
32. Sockett, H., Yamaguchi, S., Kihara, M., Irikura, V. M. & Macnab, R. M. Molecular analysis of the flagellar switch protein FliM of Salmonella typhimurium. J. Bacteriol. 174, 793–806 (1992).
33. Motoshima, H. et al. Molecular cloning and nucleotide sequence of the aminopeptidase T gene of Thermus aquaticus YT-1 and its high-level expression in Escherichia coli. Agric. Biol. Chem. 54, 2385–2392 (1990).
34. Curnow, A. W. et al. Glu-tRNAGln amidotransferase: a novel heterotrimeric enzyme required for correct decoding of glutamine codons during translation. Proc. Natl Acad. Sci. USA 94, 11819–11826 (1997).
35. Curnow, A. W., Ibba, M. & Söll, D. tRNA-dependent asparagine formation. Nature 382, 589–590 (1996).
36. Tu, G. F., Reid, G. E., Zhang, J. G., Moritz, R. L. & Simpson, R. J. C-terminal extension of truncated proteins in Escherichia coli with a 10Sa decapeptide. J. Biol. Chem. 270, 9322–9326 (1995).
37. Böhm, G. & Jaenicke, R. Relevance of sequence statistics for the properties of extremophilic proteins. Int. J. Pept. Protein Res. 43, 97–106 (1994).
38. Choi, I.-G. et al. Random sequence analysis of genomic DNA of a hyperthermophile: Aquifex pyrophilus. Extremophiles 1, 125–134 (1997).
39. Lander, E. S. & Waterman, M. S. Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2, 231–239 (1988).
40. Short, J. M., Fernandez, J. M., Sorge, J. A. & Huse, W. D. Lambda ZAP: a bacteriophage lambda expression vector with in vivo excision properties. Nucleic Acids Res. 16, 7583–7600 (1988).
41. Alting-Mees, M. A. & Short, J. M. pBluescript II: gene mapping vectors. Nucleic Acids Res. 17, 9494 (1989).
42. Shizuya, H. et al. Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proc. Natl Acad. Sci. USA 89, 8794–8797 (1992).
43. Kim, U.-J., Shizuya, H., de Jong, P. J., Birren, B. & Simon, M. I. Stable propagation of cosmid sized human DNA inserts in an F factor based vector. Nucleic Acids Res. 20, 1083–1085 (1992).
44. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
45. Pearson, W. R. & Lipman, D. J. Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA 85, 2444–2448 (1988).
46. Gaasterland, T. & Sensen, C. W. MAGPIE: automated genome interpretation. Trends Genet. 12, 76–78 (1996).
47. Gaasterland, T. & Sensen, C. W. Fully automated genome analysis that reflects user needs and preferences. A detailed introduction to the MAGPIE system architecture. Biochimie 78, 302–310 (1996).
48. Gish, W. & States, D. J. Identification of protein coding regions by database similarity search. Nature Genet. 3, 266–272 (1993).
49. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).

Acknowledgements. This work was supported in part by Department of Energy Microbial Genome Program grants (to R.V.S., C. R. Woese and G.J.O.). We thank C. Woese for his cooperation in the analysis of the genome and interest in the project; K. Stetter for continuing interest; G. Frey, J. Holaska, S. Peralta, D. Hafenbrandl, S. Delk, T. Robinson, and J. Arnett for technical assistance; and D. Robertson, J. Stein, I. Sanyal, T. Richardson, G. Hauska, and K. Williams for discussions.

Correspondence should be addressed to R.V.S. (e-mail: [email protected]). Requests for Aquifex aeolicus should be addressed to R.H. (e-mail: [email protected]). The sequences have been deposited with GenBank and assigned accession numbers AE000657 (chromosome) and AE000667 (extrachromosomal element).

Nature © Macmillan Publishers Ltd 1998

NATURE | VOL 392 | 26 MARCH 1998

[Figure 1: genome map; graphic not reproduced. The map labels ORFs by Aq gene number and gene name along the 1,551,335-bp chromosome and along the extrachromosomal element (ece ORFs), with a 5-kb scale bar and markers for rRNA, tRNA and tmRNA genes. ORFs are keyed to the following functional categories: Amino Acid Biosynthesis; Cofactor Biosynthesis; Cell Envelope; Cellular Processes; Central Intermediary Metabolism; Energy Metabolism; Lipid metabolism; Proteases; Purines, Pyrimidines, Nucleotides and Nucleosides; Regulation; Replication and Repair; Transcription; Translation; Transport; Hypothetical; Unknown; Uncategorized.]


Table 1 Aquifex aeolicus Open Reading Frame Identifications. Gene numbers (Aq) correspond to those in Fig. 1. Percentages refer to the identity found in the best FASTA alignment. The percentage of the sequence covered by the alignment is displayed with bullets as follows: 20–40% •, 40–60% ••, 60–80% •••, 80–100% ••••.

Gene    Name   Description   Identity   Coverage

Amino Acid Biosynthesis

Aromatic amino acids
Aq1536  aroA   5-enolpyruvylshikimate-3-phosphate synthetase   43.0%   ••••
Aq081   aroC   chorismate synthase   55.2%   ••••
Aq021   aroD   3-dehydroquinate dehydratase   33.3%   ••••
Aq901   aroE   shikimate 5-dehydrogenase   46.1%   ••••
Aq2177  aroK   shikimate kinase   36.5%   ••••
Aq951   pheA   chorismate mutase/prephenate dehydratase   44.0%   ••••
Aq1548  trpA   tryptophan synthase alpha subunit   44.5%   ••••
Aq706   trpB1  tryptophan synthase beta subunit   68.0%   ••••
Aq1410  trpB2  tryptophan synthase beta subunit   50.0%   ••••
Aq1787  trpC   indole-3-glycerol phosphate synthase   43.3%   ••••
Aq196   trpD1  phosphoribosylanthranilate transferase   45.1%   ••••
Aq209   trpD2  phosphoribosylanthranilate transferase   24.9%   ••••
Aq582   trpE   anthranilate synthase component I   50.0%   ••••
Aq2076  trpF   phosphoribosyl anthranilate isomerase   45.6%   ••••
Aq549   trpG   anthranilate synthase component II   59.2%   ••••
Aq1755  tyrA   prephenate dehydrogenase   36.1%   ••••

Aspartate family
Aq1866  asd    aspartate-semialdehyde dehydrogenase   54.6%   ••••
Aq1969  aspC1  aspartate aminotransferase   53.5%   ••••
Aq2094  aspC2  aminotransferase (AspC family)   55.4%   ••••
Aq421   aspC3  aminotransferase (AspC family)   43.3%   ••••
Aq273   aspC4  aminotransferase (AspC family)   48.5%   ••••
Aq1143  dapA   dihydrodipicolinate synthase   53.1%   ••••
Aq916   dapB   dihydrodipicolinate reductase   44.2%   ••••
Aq547   dapE   succinyl-diaminopimelate desuccinylase   25.8%   ••••
Aq1838  dapF   diaminopimelate epimerase   35.5%   ••••
Aq1208  lysA   diaminopimelate decarboxylase   47.4%   ••••
Aq1152  lysC   aspartokinase   52.2%   ••••
Aq1710  metE   tetrahydropteroyltriglutamate methyltransferase   45.9%   ••••
Aq1812  thrA   homoserine dehydrogenase   40.4%   ••••
Aq1309  thrB   homoserine kinase   38.3%   ••••
Aq608   thrC1  threonine synthase   64.3%   ••••
Aq425   thrC2  threonine synthase   61.9%   ••••

Branched-chain family
Aq451   ilvB   acetolactate synthase large subunit   53.1%   ••••
Aq1245  ilvC   acetohydroxy acid isomeroreductase   64.3%   ••••
Aq837   ilvD   dihydroxyacid dehydratase   58.0%   ••••
Aq1893  ilvE   branched-chain amino acid aminotransferase   40.3%   ••••
Aq1851  ilvH   acetolactate synthase   53.2%   ••••
Aq356   leuA1  2-isopropylmalate synthase   52.1%   ••••
Aq2090  leuA2  2-isopropylmalate synthase   49.9%   ••••
Aq244   leuB   3-isopropylmalate dehydrogenase   58.7%   ••••
Aq940   leuC   large subunit of isopropylmalate isomerase   52.3%   ••••
Aq1398  leuD   3-isopropylmalate dehydratase   56.6%   •••

Glutamate family
Aq2068  argB   acetylglutamate kinase   54.2%   ••••
Aq1879  argC   N-acetyl-gamma-glutamylphosphate reductase   40.6%   ••••
Aq023   argD   N-acetylornithine aminotransferase   49.5%   ••••
Aq1711  argF   ornithine carbamoyltransferase   46.2%   ••••
Aq1140  argG   argininosuccinate synthase   54.9%   ••••
Aq1372  argH   argininosuccinate lyase   46.4%   ••••
Aq970   argJ   glutamate N-acetyltransferase   39.8%   ••••
Aq111   glnA   glutamine synthetase   57.6%   ••••
Aq109   glnB   nitrogen regulatory PII protein   73.2%   ••••
Aq1774  glnE   glutamate ammonia ligase adenylyl-transferase   28.4%   ••••
Aq1565  gltB   glutamate synthase large subunit   44.3%   ••••
Aq2064  gltD   glutamate synthase small subunit gltD   37.7%   ••••
Aq1071  proA   gamma-glutamyl phosphate reductase   47.9%   ••••
Aq1134  proB   glutamate 5-kinase   43.2%   ••••
Aq166   proC   pyrroline carboxylate reductase   35.1%   ••••

Histidine
Aq1303  hisA   phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase   40.9%   ••••
Aq039   hisB   imidazoleglycerolphosphate dehydratase   46.4%   ••••
Aq2084  hisC   histidinol-phosphate aminotransferase   33.7%   ••••
Aq782   hisD   histidinol dehydrogenase   49.9%   ••••
Aq181   hisF   HisF (cyclase)   59.9%   ••••
Aq1613  hisG   ATP phosphoribosyltransferase   40.3%   ••••
Aq732   hisH   amidotransferase HisH   47.7%   ••••
Aq1968  hisIE  phosphoribosyl-ATP pyrophosphohydrolase   43.8%   ••••

Selenocysteine
Aq1031  selA   L-seryl-tRNA(ser) selenium transferase   42.7%   ••••
Aq1030  selD   selenophosphate synthase   37.7%   ••••

Serine family
Aq1556  cysM   cysteine synthase, O-acetylserine (thiol) lyase B   45.8%   ••••
Aq479   glyA   serine hydroxymethyl transferase   62.7%   ••••
Aq1905  serA   D-3-phosphoglycerate dehydrogenase   44.1%   ••••

Cell Envelope

Pili and fimbriae
Aq1433  fimZ   minor pilin   34.9%   ••
Aq1432  ppdD1  pilin   40.6%   ••••
Aq1434  ppdD2  pilin   26.4%   ••••
Aq1435  ppdD3  pilin   28.2%   •••

Lipoproteins and porins
Aq270   lgt    prolipoprotein diacylglyceryl transferase   30.1%   ••••
Aq819   lnt    apolipoprotein N-acyltransferase   25.5%   ••••
Aq652   nlpD1  lipoprotein   25.4%   ••••
Aq1753  nlpD2  lipoprotein NlpD fragment   43.2%   ••••
Aq529   oprC   outer membrane protein c   27.2%   ••••
Aq2147  pal    peptidoglycan associated lipoprotein   35.1%   ••
Aq1370  rlpA1  rare lipoprotein A   61.1%   ••
Aq1174  rlpA2  rare lipoprotein A   40.6%   •••
Aq2166  scbA   adhesion protein   25.7%   ••••
Aq619   yfeA   adhesion B precursor   28.5%   ••••

Peptidoglycan
Aq1827  alr    alanine racemase   33.2%   ••••
Aq1681  amiB   N-acetylmuramoyl-L-alanine amidase   31.0%   ••••
Aq2195  bacA   undecaprenol kinase   43.1%   ••••
Aq1798  cphA1  beta lactamase precursor   25.0%   •••
Aq974   cphA2  beta lactamase precursor   29.4%   •••
Aq521   ddlA   D-alanine:D-alanine ligase   38.2%   ••••
Aq301   glmS   glucosamine-fructose-6-phosphate aminotransferase   43.2%   ••••
Aq607   glmU   UDP-N-acetylglucosamine pyrophosphorylase   37.6%   •••
Aq053   mraY   phospho-N-acetylmuramoyl-pentapeptide-transferase   47.5%   ••••
Aq624   mrcA   penicillin binding protein 1A   33.2%   ••••
Aq1281  murA   UDP-N-acetylglucosamine 1-carboxyvinyltransferase   45.7%   ••••
Aq520   murB1  UDP-N-acetylenolpyruvoylglucosamine reductase   35.6%   ••••
Aq511   murB2  UDP-N-acetylenolpyruvoylglucosamine reductase   38.9%   ••••
Aq1360  murC   UDP-N-acetylmuramate-alanine ligase   46.1%   ••••
Aq2075  murD   UDP-N-acetylmuramoylalanine-D-glutamate ligase   29.3%   ••••
Aq1747  murE   UDP-MurNAc-tripeptide synthetase   42.9%   ••••
Aq821   murF   UDP-MurNAc-pentapeptide synthetase   32.3%   ••••
Aq1177  murG   phospho-N-acetylmuramoyl-pentapeptide-transferase   30.5%   ••••
Aq325   murI   glutamate racemase   43.4%   ••••
Aq1189  pbpA1  penicillin binding protein 2   32.2%   ••••
Aq556   pbpA2  penicillin binding protein 2   30.3%   ••••
Aq185   tagD1  glycerol-3-phosphate cytidyltransferase   52.0%   ••••
Aq1368  tagD2  glycerol-3-phosphate cytidyltransferase   67.2%   •••

Surface polysaccharides and lipopolysaccharides
Aq1684 alg alginate synthesis-related protein
Aq1641 cap capsular polysaccharide biosynthesis protein
Aq1899 dmt dolichol-phosphate mannosyltransferase
Aq1772 envA UDP-3-O-acyl N-acetylglucosamine deacetylase
Aq1757 exbB biopolymer transport exbB
Aq1839 exbD biopolymer transport ExbD
Aq1069 galE UDP-glucose-4-epimerase
Aq1705 galF UDP-glucose pyrophosphorylase
Aq908 gmhA phosphoheptose isomerase
Aq085 kdsA 3-deoxy-d-manno-octulosonic acid 8-phosphate synthase
Aq326 kdtA 3-deoxy-D-manno-2-octulosonic acid transferase
Aq253 kdtB lipopolysaccharide core biosynthesis protein
Aq1546 kpsF polysialic acid capsule expression protein
Aq692 kpsU 3-deoxy-manno-octulosonate cytidylyltransferase
Aq1742 lgtF beta 1,4 glucosyltransferase
Aq604 lpxA acyl-[acyl-carrier-protein]-UDP-N-acetylglucosamine acyltransferase
Aq1427 lpxB lipid A disaccharide synthetase
Aq538 lpxD UDP-3-O-[3-hydroxymyristoyl] glucosamine N-acyltransferase
Aq718 mpg mannose-1-phosphate guanyltransferase
Aq1096 mtfA mannosyltransferase A
Aq515 mtfB mannosyltransferase B
Aq516 mtfC mannosyltransferase C
Aq1335 nse nucleotide sugar epimerase
Aq505 otnA polysaccharide biosynthesis protein
Aq504 otnA’ polysaccharide biosynthesis protein (fragment)
Aq1543 rfaC1 ADP-heptose:LPS heptosyltransferase
Aq145 rfaC2 ADP-heptose:LPS heptosyltransferase
Aq344 rfaD ADP-L-glycero-D-manno-heptose-6-epimerase
Aq565 rfaE ADP-heptose synthase
Aq2115 rfaG glucosyl transferase I
Aq1082 rfbD GDP-D-mannose dehydratase
Aq519 rfe undecaprenyl-phosphate-alpha-N-acetylglucosaminyltransferase
Aq1367 spsI glucose-1-phosphate thymidylyltransferase
Aq518 spsK spore coat polysaccharide biosynthesis protein SpsK
Aq589 xanB mannose-6-phosphate isomerase/mannose-1-phosphate guanyl transferase

.. .... .... .... .... .... .... .... ... 52.0% .... 28.9% .... 46.5% .... 45.9% .... 41.3% .... 35.2% .... 47.7% .... 31.6% .... 43.3% .... 34.1% .... 34.3% ... 29.0% .... 35.9% .... 45.8% .... 26.9% .... 37.8% .. 30.7% .... 28.1% .... 39.6% .... 44.0% .... 27.1% .... 53.2% .... 24.8% .... 30.4% .. 49.5% ... 40.9% ....

Cellular Processes
Cell division
Aq698 acrE acriflavin resistance protein AcrE 24.8% ....
Aq1275 cafA cytoplasmic axial filament protein 28.5% ....
Aq523 ftsA cell division protein FtsA 31.9% ....
Aq936 ftsH cell division protein FtsH 51.1% ....
Aq1139 ftsW cell division protein FtsW 30.8% ...
Aq920 ftsY cell division protein FtsY 35.2% ...
Aq525 ftsZ cell division protein FtsZ 48.6% ....
Aq761 gidA1 glucose inhibited division protein A 50.2% ....
Aq691 gidA2 glucose inhibited division protein A 57.5% ....
Aq1582 gidB glucose inhibited division protein B 39.4% ....
Aq1718 maf MAF protein 44.9% ....
Aq1887 mesJ cell cycle protein MesJ 27.7% ....
Aq878 minC septum site-determining protein MinC 39.4% ..
Aq1217 minD1 septum site-determining protein MinD 33.1% ....
Aq877 minD2 septum site-determining protein MinD 54.5% ....
Aq845 mreB rod shape determining protein MreB 57.4% ....
Aq025 rodA rod shape determining protein RodA 37.6% ....
Aq1130 sufI periplasmic cell division protein (SufI) 28.1% ....

Chaperones
Aq154 ctaB cytochrome c oxidase assembly factor 38.8% ....
Aq1735 dnaJ1 chaperone DnaJ 41.3% ....
Aq703 dnaJ2 chaperone DnaJ 45.1% ....
Aq996 dnaK Hsp70 chaperone DnaK 59.1% ....
Aq433 grpE heat shock protein GrpE 38.8% ....
Aq192 hslU chaperone HslU 57.5% ....
Aq1283 hspC small heat shock protein (class I) 31.0% ....
Aq1991 htpX heat shock protein X 51.1% ....
Aq2200 mopA GroEL 64.4% ....
Aq2199 mopB GroES 56.2% ...

Detoxification
Aq486 ahpC1 alkyl hydroperoxide reductase 49.2% ....
Aq858 ahpC2 alkyl hydroperoxide reductase 53.4% ....
Aq685 arsC arsenate reductase 50.0% ....
Aq136 cpx cytochrome c peroxidase 48.9% ....
Aq1005 cutA periplasmic divalent cation tolerance protein 47.0% ....
Aq1499 sodA superoxide dismutase (Fe/Mn family) 34.2% ....
Aq1050 sodC1 superoxide dismutase (Cu/Zn) 39.5% ....
Aq238 sodC2 superoxide dismutase (Cu/Zn) 39.2% ....
Aq488 tpx thiol peroxidase 39.5% ....

Motility Aq833 Aq1184 Aq1183 Aq1859 Aq2051 Aq834 Aq1714 Aq1713 Aq1662 Aq1663 Aq1212 Aq2014 Aq1214 Aq1998

flgA flgB flgC flgE flgG1 flgG2 flgH flgI flgK flgL flhA flhB flhF fliC

flagellar protein FlgA flagellar basal body rod protein FlgB flagellar biosynthesis FlgC flagellar hook protein FlgE flagellar hook basal-body protein FlgG flagellar hook basal-body protein FlgG flagellar L-ring protein FlgH flagellar P-ring protein FlgI flagellar hook associated protein FlgK flagellar hook associated protein FlgL flagellar export protein flagellar biosynthetic protein FlhB flagellar biosynthesis FlhF flagellin

39.4% 30.8% 32.8% 50.4% 31.9% 46.9% 21.9% 27.1% 44.0% 39.8% 28.7% 59.4%

.... .... .... .... .... .... .... .. .... .... .... ....

Nature © Macmillan Publishers Ltd 1998

37.2% 30.8% 40.2% 36.5% 48.2% 34.7% 54.7% 47.2% 63.4%


Aq2001 fliD flagellar hook associated protein FliD 24.3% ..
Aq1182 fliF Flagellar M-ring protein 32.0% ....
Aq653 fliG flagellar switch protein FliG 35.9% ....
Aq1595 fliI flagellar export protein 44.6% ....
Aq1860 fliL flagellar biosynthesis FliL 30.6% ....
Aq1539 fliN flagellar switch protein FliN 42.9% ...
Aq1920 fliP flagellar biosynthetic protein FliP 47.7% ....
Aq1962 fliQ flagellar biosynthesis protein FliQ 45.5% ....
Aq1961 fliR flagellar biosynthetic protein FliR 29.7% ....
Aq2002 fliS flagellar protein FliS 30.8% ....
Aq1003 motA flagellar motor protein MotA 35.0% ....
Aq1002 motB1 flagellar motor protein MotB 36.8% ....
Aq1001 motB2 flagellar motor protein MotB-like 27.5% ....

Secretion
Aq1720 ffh signal recognition particle receptor protein 49.1% ....
Aq1288 gspD general secretion pathway protein D 27.5% ....
Aq1474 gspE general secretion pathway protein E 48.8% ....
Aq418 gspG general secretion pathway protein G 50.7% ....
Aq955 lepB type-I signal peptidase 33.9% ....
Aq1837 lsp lipoprotein signal peptidase 37.4% ....
Aq1271 mpp processing protease 28.7% ....
Aq747 pilC1 fimbrial assembly protein PilC 37.4% ....
Aq1285 pilC2 fimbrial assembly protein PilC 28.9% ....
Aq1601 pilD type 4 prepilin peptidase 34.8% ....
Aq745 pilT twitching motility protein PilT 51.4% ....
Aq2151 pilU twitching mobility protein 41.6% ....
Aq1870 secA preprotein translocase SecA subunit 44.9% ....
Aq973 secD protein export membrane protein SecD 36.0% ....
Aq1602 secF protein-export membrane protein 41.4% ....
Aq079 secY preprotein translocase SecY 44.2% ....
Aq2080 sppA proteinase IV 43.4% ....
Aq1971 tapB type IV pilus assembly protein TapB 42.2% ....
Aq1340 tig trigger factor 27.4% ....

Central Intermediary Metabolism
One-carbon metabolism
Aq1429 metF 5,10-methylenetetrahydrofolate reductase 43.3% ....
Aq1154 metK S-adenosylmethionine synthetase 49.2% ....
Aq1180 sahH S-adenosylhomocysteine hydrolase 60.9% ....

Cytoplasmic polysaccharides
Aq1407 bcsA cellulose synthase catalytic subunit 39.5% ....
Aq1401 celY endoglucanase fragment 33.0% ..
Aq721 glgA glycogen synthase 38.1% ....
Aq722 glgB 1,4-alpha-glucan branching enzyme 56.5% ....
Aq717 glgP glycogen phosphorylase 37.0% ....
Aq723 malM 4-alpha-glucanotransferase (amylomaltase) 43.4% ....

Tri-carboxylic acid cycle
Aq1784 aco aconitase 36.1% ...
Aq1195 forA1 ferredoxin oxidoreductase alpha subunit 31.5% ...
Aq1167 forA2 ferredoxin oxidoreductase alpha subunit 32.3% ....
Aq1196 forB1 ferredoxin oxidoreductase beta subunit 29.6% ....
Aq1168 forB2 ferredoxin oxidoreductase beta subunit 31.5% ...
Aq1200 forG1 ferredoxin oxidoreductase gamma subunit 34.5% ....
Aq1169 forG2 ferredoxin oxidoreductase gamma subunit 34.5% ...
Aq594 frdA fumarate reductase flavoprotein subunit 51.4% ....
Aq553 frdB1 fumarate reductase iron-sulfur subunit 35.2% ....
Aq655 frdB2 fumarate reductase iron-sulfur subunit 35.1% ....
Aq1780 fumB fumarate hydratase (fumarase) C-terminal 46.4% ....
Aq1679 fumX fumarate hydratase, class I 40.4% ....
Aq150 gltA citrate synthase 33.0% ....
Aq1512 icd isocitrate dehydrogenase 46.0% ....
Aq1782 mdh1 malate dehydrogenase 49.8% ....
Aq1665 mdh2 malate dehydrogenase 46.9% ...
Aq1614 oadA oxaloacetate decarboxylase alpha chain 50.1% ....
Aq1306 sucC1 succinyl-CoA ligase beta subunit 35.1% ....
Aq1620 sucC2 succinyl-CoA ligase beta subunit 52.9% ....
Aq1888 sucD1 succinyl-CoA ligase alpha subunit 41.7% ...
Aq1622 sucD2 succinyl-CoA ligase alpha subunit 65.7% ....

Phosphate
Aq1351 phoH phosphate starvation-inducible protein 47.1% ....
Aq1547 ppa inorganic pyrophosphatase 56.5% ....
Aq891 ppx exopolyphosphatase 33.6% ....

Polyamines
Aq728 speC ornithine decarboxylase 30.9% ....
Aq062 speE spermidine synthase 48.4% ....

Sulfur
Aq1081 cysD sulfate adenylyltransferase 46.7%
Aq1076 rhdA thiosulfate sulfurtransferase 32.3%
Aq1799 rhdA thiosulfate sulfurtransferase 31.7%
Aq455 sor sulfur oxygenase reductase 36.7%
Aq1803 soxB sulfur oxidation protein SoxB 41.3%

.... .... .... .... .... ...

Cofactor Biosynthesis Lipoic acid biosynthesis Aq1355 lipA Biotin Aq170 Aq975 Aq557 Aq626 Aq1659

bioA bioB bioD bioF bioW

Aq566

birA

Folic acid Aq2045 Aq1898 Aq239 Aq162

folC folD folE folK

Aq1468 Aq1144 Aq1606

folP pabB pabC

Heme Aq207 Aq1237 Aq334 Aq816 Aq1279

cobA cysG dcuP gsa hemA

Aq2109 Aq263 Aq1424

hemB hemC hemF

Aq2015 Aq948 Aq099 Aq2124

hemG hemH hemK hemN

Molybdopterin Aq2183 moaA2

Lipoic acid synthetase

48.9%

DAPA aminotransferase biotin synthetase dethiobiotin synthetase 8-amino-7-oxononanoate synthase 6-carboxyhexanoate-CoA ligase (pimeloyl CoA synthase) biotin [acetyl-CoA-carboxylase] ligase

51.7% 42.0% 41.5% 45.1%

folylpolyglutamate synthetase methylenetetrahydrofolate dehydrogenase GTP cyclohydrolase I folate biosynthesis 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase dihydropteroate synthase p-aminobenzoate synthetase aminodeoxychorismate lyase

31.8% 53.2% 57.1%

uroporphyrin-III c-methyltransferase siroheme synthase uroporphyrinogen decarboxylase glutamate-1-semialdehyde aminotransferase glutamyl tRNA reductase (delta-aminolevulinate synthase) porphobilinogen synthase porphobilinogen deaminase oxygen-independent coproporphyrinogen III oxidase protoporphyrinogen oxidase ferrochelatase protoporphyrinogen oxidase oxygen-independent coproporphyrinogen II

.... .... .... .... 38.7% .... 64.5% .... 53.1% .... 33.1% .... 30.3% ... 46.4% .... 32.2% .... 50.2% ....

molybdenum cofactor biosynthesis protein A

47.0%

.... .... .... .... 47.3% .... 37.5% ....

.... .... .... 43.7% .... 45.8% .... 41.5% ... 29.0% .... 52.1% 36.9% 41.4% 56.5%

....

molybdenum cofactor biosynthesis moaC molybdopterin converting factor subunit 2 molybdopterin-guanine dinucleotide biosynthesis protein B molybdenum cofactor biosynthesis protein A molybdopterin biosynthesis protein MoeB molybdenum cofactor biosynthesis MOG pterin-4a-carbinolamine dehydratase

.. ... 44.4% . 36.8% .... 54.1% .... 55.5% .... 37.9% ....

pantothenate metabolism flavoprotein 3-methyl-2-oxobutanoate hydroxymethyltransferase pantothenate synthetase aspartate 1-decarboxylase

41.2%

panC panD

45.5% 47.4% 46.0%

.... .... .... ....

Pyridine nucleotides
Aq1889 nadA quinolinate synthetase A 44.3% ....
Aq777 nadB L-aspartate oxidase 36.7% ...
Aq869 nadC quinolinate phosphoribosyl transferase 47.0% ....
Aq959 nadE NH(3)-dependent NAD+ synthetase 39.6% ....

Pyridoxal phosphate
Aq852 pdxA pyridoxal phosphate biosynthetic protein PdxA 36.8% ....
Aq1423 pdxJ pyridoxal phosphate synthetase 88.2% ....

Quinones
Aq895 ispB octaprenyl-diphosphate synthase 35.7% ....
Aq052 ubiA 4-hydroxybenzoate octaprenyltransferase 41.4% ....

Riboflavin
Aq350 ribA GTP cyclohydrolase II 61.7% ....
Aq1707 ribC riboflavin synthase alpha chain 45.3% ....
Aq138 ribD1 riboflavin specific deaminase 46.0% ....
Aq436 ribD2 riboflavin specific deaminase 42.9% ....
Aq139 ribF riboflavin kinase 38.4% ....
Aq132 ribH riboflavin synthase beta subunit 51.0% ....

Thiamine
Aq1204 thiC thiamine biosynthesis protein 67.1% ....
Aq1960 thiD HMP-P kinase 40.5% ....
Aq1366 thiE1 thiamine phosphate synthase 36.3% ....
Aq558 thiE2 thiamine phosphate synthase 39.5% ....
Aq2178 thiG thiamine biosynthesis, thiazole moiety 52.5% ....
Aq2119 thiL thiamine monophosphate kinase 34.5% ....

glutaredoxin-like protein thioredoxin thioredoxin thioredoxin reductase

33.8% 58.9% 32.2% 39.8%

.... ... .. ....

Aq527 Aq2181 Aq1326

moaC moaE mobB

Aq030 Aq1329 Aq061 Aq049

moeA1 moeB mog phhB

Pantothenate Aq815 Aq1973

dfp panB

Aq2132 Aq476

Thio- and glutaredoxin Aq443 gua Aq1916 trxA1 Aq1811 trxA2 Aq500 trxB

45.0% 39.3%

Energy Metabolism
Aq1342 gph phosphoglycolate phosphatase 33.9% ....

ATP-Proton Motive Force
Aq679 atpA ATP synthase F1 alpha subunit 64.3% ....
Aq179 atpB ATP synthase F0 subunit a 36.4% ....
Aq673 atpC ATP synthase F1 epsilon subunit 37.4% ....
Aq2038 atpD ATP synthase F1 beta subunit 67.4% ....
Aq177 atpE ATP synthase F0 subunit c 53.8% ...
Aq1586 atpF1 ATP synthase F0 subunit b 26.3% ....
Aq1587 atpF2 ATP synthase F0 subunit b 25.5% ....
Aq2041 atpG ATP synthase F1 gamma subunit 39.9% ....
Aq1588 atpH ATP synthase F1 delta chain 28.1% ....

Dehydrogenases
Aq1362 adh1 alcohol dehydrogenase 35.4% ....
Aq1240 adh2 alcohol dehydrogenase 28.8% ....
Aq186 aldH1 aldehyde dehydrogenase 41.9% ....
Aq227 aldH2 aldehyde dehydrogenase 28.0% ....
Aq1145 dhaT 1,3 propanediol dehydrogenase 36.6% ....
Aq232 dhsU flavocytochrome C sulfide dehydrogenase 33.6% ....
Aq1769 dld1 D-lactate dehydrogenase 45.3% ....
Aq1234 dmsA DMSO reductase chain A 25.0% ....
Aq1232 dmsB DMSO reductase chain B 38.4% ...
Aq1231 dmsC DMSO reductase chain C 29.5% .
Aq1051 fdhE formate dehydrogenase formation protein FdhE 25.9% ....
Aq1039 fdoG formate dehydrogenase alpha subunit 50.0% ....
Aq1046 fdoH formate dehydrogenase beta subunit 45.7% ....
Aq1049 fdoI formate dehydrogenase gamma subunit 38.4% ....
Aq1903 gcsP1 glycine dehydrogenase (decarboxylating) 49.6% ....
Aq1109 gcsP2 glycine dehydrogenase (decarboxylating) 46.8% ....
Aq1639 glpC oxido/reductase iron sulfur protein 27.1% ....
Aq395 hdrA heterodisulfide reductase subunit A 39.7% ....
Aq400 hdrB heterodisulfide reductase subunit B 32.5% ....
Aq398 hdrC heterodisulfide reductase subunit C 35.7% .
Aq961 hdrD heterodisulfide reductase 29.5% ....
Aq038 hibD 3-hydroxyisobutyrate dehydrogenase 34.6% ....
Aq727 ldhA D-lactate dehydrogenase 33.5% ....
Aq736 lpdA dihydrolipoamide dehydrogenase 37.0% ....
Aq217 narB nitrate reductase narB 39.1% ....
Aq206 nirB nitrite reductase (NAD(P)H) large subunit 35.3% ...
Aq835 nox NADH oxidase 33.1% ....
Aq024 nsd nucleotide sugar dehydrogenase 47.0% ....
Aq135 nueM NADH dehydrogenase (ubiquinone) 28.2% ....
Aq1010 udh dehydrogenase 29.7% ....

Electron transport
Aq2191 coxA1 cytochrome c oxidase subunit I 42.4% ....
Aq2192 coxA2 cytochrome c oxidase subunit I 38.1% ....
Aq2190 coxB cytochrome c oxidase subunit II 27.4% ....
Aq2188 coxC cytochrome c oxidase subunit III 28.6% ....
Aq153 ctaA heme O oxygenase 28.1% ....
Aq042 cyc cytochrome c 25.8% ...
Aq792 cycB1 cytochrome c552 29.9% ..
Aq1550 cycB2 cytochrome C552 38.7% ....
Aq1357 cydA cytochrome oxidase d subunit I 38.8% ....
Aq1358 cydB cytochrome oxidase d subunit II 31.2% ....
Aq067 dmsB dimethylsulfoxide reductase chain B 40.2% ....
Aq235 fccB’ sulfide dehydrogenase, flavoprotein subunit 38.0% ...
Aq919a fdx1 ferredoxin 37.1% ...
Aq1171a fdx2 ferredoxin 43.9% ..
Aq1192a fdx3 ferredoxin 35.0% ...
Aq108a fdx4 ferredoxin 56.6% ...
Aq211 fhp flavohemoprotein 43.4% ....
Aq2096 floX flavodoxin 32.5% ....
Aq045 petA Rieske-I iron sulfur protein 34.3% ....
Aq044 petB cytochrome b 38.3% ...
Aq234 soxF Rieske-I iron sulfur protein 29.0% ....
Aq2186 sqr sulfide-quinone reductase 41.0% ....

Glycolysis and gluconeogenesis
Aq484 eno enolase 65.0% ....
Aq1390 fba fructose-1,6-bisphosphate aldolase class II 39.9% ....
Aq1065 gap glyceraldehyde-3-phosphate dehydrogenase 59.5% ....
Aq434 glpK glycerol kinase 51.0% ....
Aq1744 gpmA phosphoglycerate mutase 27.9% ....
Aq1634 gspA glycerol-3-phosphate dehydrogenase (NAD+) 40.5% ....


Aq1708 pfkA phosphofructokinase 49.4% ....
Aq750 pgi glucose-6-phosphate isomerase 37.8% ....
Aq118 pgk phosphoglycerate kinase 54.5% ....
Aq1990 pgmA phosphoglycerate mutase 33.2% ....
Aq501 pmu phosphoglucomutase/phosphomannomutase 37.8% ....
Aq2142 ppsA phosphoenolpyruvate synthase 56.3% ....
Aq1520 pycA pyruvate carboxylase c-terminal domain 46.6% ....
Aq1517 pycB pyruvate carboxylase n-terminal domain 57.1% ....
Aq360 timA triose phosphate isomerase 52.2% ....

Aq046 Aq1305

Aq1580 Aq1334 Aq713 Aq640 Aq969 Aq1907 Aq2163

pyrF pyrG pyrH thy tmk umpS uraP

Hydrogenase
Aq665 hoxZ Ni/Fe hydrogenase B-type cytochrome subunit 40.4% ....
Aq667 hupD HupD hydrogenase related function 40.9% ....
Aq666 hupE HupE hydrogenase related function 38.3% ....
Aq1021 hypA hydrogenase accessory protein HypA 39.8% ....
Aq671 hypB hydrogenase expression/formation protein B 50.6% ....
Aq1157 hypD hydrogenase expression/formation protein HypD 56.1% ....
Aq662 mbhL1 hydrogenase large subunit 50.6% ....
Aq960 mbhL2 hydrogenase large subunit 44.3% ....
Aq804 mbhL3 hydrogenase large subunit 27.9% ....
Aq660 mbhS1 hydrogenase small subunit 66.6% ....
Aq965 mbhS2 hydrogenase small subunit 51.3% ....
Aq802 mbhS3 hydrogenase small subunit 36.7% ....
Aq1591 shyS soluble hydrogenase small subunit 41.6% ....

Regulation Aq1058 Aq2179 Aq281 Aq1387 Aq1724

acrR1 acrR2 acrR3 arsR degT

Sugar metabolism
Aq968 cbbE2 ribulose-5-phosphate 3-epimerase 47.2% ....
Aq1658 fucA1 fuculose-1-phosphate aldolase 31.8% ....
Aq1979 fucA2 fuculose-1-phosphate aldolase 29.7% ....
Aq498 gnd 6-phosphogluconate dehydrogenase 45.2% ....
Aq497 gsdA glucose-6-phosphate 1-dehydrogenase 32.3% ....
Aq1138 rpiB ribose 5-phosphate isomerase B 54.5% ....
Aq119 talC transaldolase 71.1% ....
Aq1765 tktA transketolase 52.4% ....

NADH dehydrogenase
Aq1385 nuoA1 NADH dehydrogenase I chain A 42.0% ....
Aq1310 nuoA2 NADH dehydrogenase I chain A 44.9% ....
Aq1312 nuoB NADH dehydrogenase I chain B 60.1% ...
Aq551 nuoD1 NADH dehydrogenase I chain D 37.7% ....
Aq1314 nuoD2 NADH dehydrogenase I chain D 42.2% ....
Aq574 nuoE NADH dehydrogenase I chain E 36.8% ....
Aq573 nuoF NADH dehydrogenase I chain F 20.5% ..
Aq437 nuoG NADH dehydrogenase I chain G 35.4% ..
Aq1315 nuoH1 NADH dehydrogenase I chain H 41.0% ....
Aq1373 nuoH2 NADH dehydrogenase I chain H 42.1% ....
Aq1374 nuoH3 NADH dehydrogenase I chain H 38.9% ....
Aq1317 nuoI1 NADH dehydrogenase I chain I 30.5% ...
Aq1375 nuoI2 NADH dehydrogenase I chain I 29.2% ...
Aq1318 nuoJ1 NADH dehydrogenase I chain J 35.4% ....
Aq1377 nuoJ2 NADH dehydrogenase I chain J 30.6% ....
Aq1319 nuoK1 NADH dehydrogenase I chain K 51.1% ....
Aq1378 nuoK2 NADH dehydrogenase I chain K 48.4% ....
Aq1320 nuoL1 NADH dehydrogenase I chain L 39.0% ....
Aq866 nuoL2 NADH dehydrogenase I chain L 30.2% ...
Aq1379 nuoL3 NADH dehydrogenase I chain L 43.1% ....
Aq1321 nuoM1 NADH dehydrogenase I chain M 43.6% ....
Aq1382 nuoM2 NADH dehydrogenase I chain M 36.9% ....
Aq1322 nuoN1 NADH dehydrogenase I chain N 34.1% ....
Aq1383 nuoN2 NADH dehydrogenase I chain N 32.8% ....

Aq534 Aq831 Aq490 Aq1207 Aq1418 Aq213 Aq1908 Aq1115 Aq316 Aq905 Aq231 Aq1156 Aq093 Aq1019 Aq672 Aq764 Aq638 Aq1038 Aq702 Aq218 Aq1117 Aq1792 Aq230 Aq164 Aq2069 Aq319 Aq906 Aq844 Aq1496

draG exsB fnr furR1 furR2 glnBi hflX hksP1 hksP2 hksP3 hksP4 hoxX hth hypE hypF iclR lysR1 lysR2 merR nifA ntrC1 ntrC2 ntrC3 ntrC4 obg phoB phoU spoT xylR

2-acylglycerophosphoethanolamine acyltransferase acetyl-CoA carboxylase alpha subunit biotin carboxyl carrier protein biotin carboxylase biotin carboxylase acetyl-CoA carboxyltransferase beta subunit acyl carrier protein holo-[acyl-carrier protein] synthase acetyl-coenzyme A synthetase acetyl-coenzyme A synthetase c-terminal fragment phosphatidate cytidylyltransferase cyclopropane-fatty-acyl-phospholipid synthase malonyl-CoA:Acyl carrier protein transacylase 3-oxoacyl-[acyl-carrier-protein] synthase II 3-oxoacyl-[acyl-carrier-protein] reductase 3-oxoacyl-[acyl-carrier-protein] synthase III enoyl-[acyl-carrier-protein] reductase (NADH) (3R)-hydroxymyristoyl-(acyl carrier protein) dehydratase long-chain-fatty-acid CoA ligase lipoate-protein ligase A phosphatidylglycerophosphate synthase phosphatidylglycerophosphate synthase PlsX protein

.... ... .... .... .... .... .... .... .... 61.2% .... 29.2% .... 37.5% .... 42.1% .... 58.4% .... 52.9% .... 47.0% .... 49.6% .... 58.7% .... 30.0% ... 28.1% .. 37.3% .. 38.9% ... 43.7% ....

Lipid metabolism Aq2058 aas Aq1206 Aq1363 Aq1664 Aq1470 Aq445 Aq1717a Aq813 Aq2104 Aq2103

accA accB accC1 accC2 accD acpP acpS acs acs’

Aq1249 Aq1737 Aq892 Aq1717 Aq1716 Aq1099 Aq1552 Aq056

cds cfa fabD fabF fabG fabH fabI fabZ

Aq999 Aq1638 Aq958 Aq2154 Aq1101

fadD lplA pgsA pgsA plsX

Purines, Pyrimidines, Nucleotides and Nucleosides Aq094 nrdA ribonucleotide reductase alpha chain Aq1505 nrdF ribonucleotide reductase beta chain Purines Aq568 Aq236 Aq2023 Aq544 Aq078 Aq1590 Aq1636 Aq1290 Aq597 Aq2117

deoD guaA guaB hpt kad ndk prs purA purB purC

Aq742 Aq1178 Aq1175 Aq1963

purD purE purF purH

Aq245 Aq1836 Aq769 Aq857 Aq1105 Aq1818 Pyrimidines Aq410 Aq1172 Aq2101 Aq2153 Aq1607 Aq220 Aq409 Aq806

37.1% 57.1% 44.6% 54.4% 56.5% 56.9% 71.2% 30.8% 54.0%

35.0% 36.2%

.... .

.... .... .... .... .... .... .... .... .... 52.5% .... 54.2% .... 64.6% .... 42.7% .... 48.2% .... 35.6% .... 49.3% ... 50.0% .... 48.3% .... 51.1% .... 56.3% .... 33.1% 58.4% 65.4% 48.2% 50.0% 48.2% 55.2% 49.2% 52.4%

purK purL purM purN purQ purU

purine nucleoside phosphorylase GMP synthase inosine monophosphate dehydrogenase hypoxanthine-guanine phosphoribosyltransferase adenylate kinase nucleoside diphosphate kinase phosphoribosylpyrophosphate synthetase adenylosuccinate synthetase adenylosuccinate lyase phosphoribosylaminoimidazolesuccinocarboxamide synthase phosphoribosylamine-glycine ligase phosphoribosylaminoimidazole carboxylase amidophosphoribosyltransferase phosphoribosylaminoimidazolecarboxamide formyltransferase phosphoribosyl aminoimidazole carboxylase phosphoribosylformylglycinamidine synthase II phosphoribosylformylglycinamidine cyclo-ligase phosphoribosylglycinamide formyltransferase phosphoribosyl formylglycinamidine synthase I formyltetrahydrofolate deformylase

carA carB carB cmk dcd dut pyrB pyrC

carbamoyl phosphate synthetase small subunit carbamoyl-phosphate synthase large subunit carbamoyl-phosphate synthase, large subunit cytidylate kinase deoxycytidine triphosphate deaminase deoxyuridine 5’triphosphate nucleotidohydrolase aspartate carbamoyltransferase catalytic chain dihydroorotase

52.2% 60.7% 63.1% 38.5% 39.5%

.... .... .... .... .... 42.0% .... 37.3% ...

pyrD pyrDB

50.5%

transcriptional regulator (TetR/AcrR family) transcriptional regulator (TetR/AcrR family) transcriptional regulator (TetR/AcrR family) transcriptional regulator (ArsR family) transcriptional regulator (DegT/DnrJ/Eryc1 family) ADP-ribosylglycohydrolase trans-regulatory protein ExsB transcriptional regulator (Crp/Fnr family) transcriptional regulator (FurR family) transcriptional regulator (FurR family) PII-like protein GlnBi GTP-binding protein HflX histidine kinase sensor protein histidine kinase sensor protein histidine kinase sensor protein histidine kinase sensor protein hydrogenase regulation HoxX transcriptional regulator (H-T-H) hydrogenase expression/formation protein transcriptional regulatory protein HypF transcriptional regulator (IclR family) transcriptional regulator (LysR family) transcriptional regulator (LysR family) transcriptional regulator (MerR family) transcriptional regulator (NifA family) transcriptional regulator (NtrC family) transcriptional regulator (NtrC family) transcriptional regulator (NtrC family) transcriptional regulator (NtrC family) GTP-binding protein transcriptional regulator (PhoB-like) transcriptional regulator (PhoU-like) (p)ppGpp 3-pyrophosphohydrolase transcriptional regulator (NagC/XylR family)

... ... .... .... 34.1% .... 32.1% .... 38.5% .... 29.5% .... 37.9% .... 34.6% .... 48.0% .. 40.3% .... 27.7% .. 28.1% .... 23.6% ... 28.2% .... 46.7% .... 50.2% .... 44.3% .... 44.8% .... 30.4% .... 32.8% .... 28.9% .... 32.8% .... 42.8% .... 41.0% .... 40.2% .... 40.0% .... 38.3% .... 54.9% .. 41.6% .... 41.9% .... 47.2% ... 29.3% ....

DNA Replication and Repair
Aq358 dinG ATP-dependent helicase (DinG family) 27.9% ....
Aq322 dnaA chromosome replication initiator protein DnaA 36.5% ....
Aq1472 dnaB replicative DNA helicase 40.3% ....
Aq910 dnaC DNA replication protein DnaC 26.4% ....
Aq1008 dnaE DNA polymerase III alpha subunit 41.9% ...
Aq1493 dnaG DNA primase 39.8% ...
Aq1882 dnaN DNA polymerase III beta chain 32.1% ....
Aq932 dnaQ DNA polymerase III epsilon subunit 40.0% ....
Aq1855 dnaX DNA polymerase III gamma subunit 36.6% ....
Aq1422 dpbF DNA polymerase beta family 39.1% ....
Aq1693 dplF N-terminus of phage SPO1 DNA polymerase 37.3% ....
Aq980 gyrA DNA gyrase A subunit 43.6% ....
Aq1026 gyrB gyrase B 55.2% ..
Aq2057 helX DNA helicase 49.7% ....
Aq1484a himA DNA binding protein HU 40.2% ....
Aq2174 ihfB integration host factor beta subunit 35.8% ....
Aq1394 lig DNA ligase (ATP dependent) 50.8% ....
Aq633 ligA DNA ligase (NAD dependent) 45.7% ....
Aq1578 mutL DNA mismatch repair protein MutL 72.3% ....
Aq308 mutS1 DNA mismatch repair protein MutS 77.5% ....
Aq1242 mutS2 DNA mismatch repair protein MutS 37.0% ....
Aq1449 mutT 8-OXO-dGTPase domain (mutT domain) 46.3% ....
Aq282 mutY1 endonuclease III 53.6% ....
Aq172 mutY2 endonuclease III 51.8% ....
Aq496 mutY3 endonuclease III 43.4% ....
Aq1629 nfo deoxyribonuclease IV 39.0% ....
Aq710 nucI thermococcal nuclease homolog 36.4% ....
Aq1495 ogt O-6-methylguanine-DNA-alkyltransferase 36.9% ...
Aq1628 pol DNA polymerase I 3’-5’ exo domain 43.2% ....
Aq1967 polA DNA polymerase I (PolI) 30.5% ....
Aq1610 radC DNA repair protein RadC 39.0% ....
Aq2150 recA recombination protein RecA 88.5% ....
Aq2053 recG ATP-dependent DNA helicase RecG 38.9% ....
Aq2155 recJ single-strand-DNA-specific exonuclease RecJ 31.8% ....
Aq561 recN recombination protein RecN 27.7% ....
Aq1478 recR recombination protein RecR 38.3% ....
Aq793 rep ATP-dependent DNA helicase REP 33.4% ....
Aq1886 sbcD ATP-dependent dsDNA exonuclease 29.9% ....
Aq064 ssb single stranded DNA-binding protein 39.4% ...
Aq657 topA topoisomerase I 39.6% ....
Aq1159 topG1 reverse gyrase 41.6% ....
Aq886 topG2 reverse gyrase 35.1% ....
Aq686 uvrA repair excision nuclease subunit A 61.0% ....
Aq1856 uvrB repair excision nuclease subunit B 53.9% ....
Aq2126 uvrC repair excision nuclease subunit C 32.5% ....

42.3% 20.6% 37.2% 45.4% 32.3% 46.3% 59.6% 40.4% 44.0% 46.9% 41.6% 30.6% 40.5%

.... .... .... ... .... .. .... .... .... .... .... .... ....

36.1%

.... .... .... .. .... .... .... .... .... .... ....

Transcription
RNA polymerase and transcription factors
Aq613 deaD ATP-dependent RNA helicase DeaD
Aq357a flgM anti sigma factor FlgM
Aq1218 fliA RNA polymerase sigma factor FliA
Aq259 nusA transcription termination NusA
Aq133 nusB transcription termination NusB
Aq1931 nusG transcription antitermination protein NusG
Aq873 rho transcriptional terminator Rho
Aq070 rpoA RNA polymerase alpha subunit
Aq1939 rpoB RNA polymerase beta subunit
Aq1945 rpoC RNA polymerase beta prime subunit
Aq1490 rpoD RNA polymerase sigma factor RpoD
Aq599 rpoN RNA polymerase sigma factor RpoN
Aq1452 rpoS RNA polymerase sigma factor RpoS
RNA modification
Aq1816 ksgA
Aq1067 miaA
Aq411 pcnB1
Aq2158 pcnB2
Aq221 phpA
Aq894 queA
Aq946 rnc
Aq1955 rnhB
Aq924 rnpH
Aq1661 spoU
Aq1308 tgt
Aq841 trm1

.... .... .... .... .... ... .... .... ....

dihydroorotate dehydrogenase dihydroorotate dehydrogenase electron transfer subunit orotidine-5’-phosphate decarboxylase CTP synthetase UMP kinase thymidylate synthase complementing protein thymidylate kinase uridine 5-monophosphate synthase uracil phosphoribosyltransferase

dimethyladenosine transferase tRNA delta-2-isopentenylpyrophosphate (IPP) transferase poly A polymerase poly A polymerase polyribonucleotide nucleotidyltransferase queuosine biosynthesis protein RNase III RNase HII RNase PH rRNA methylase SpoU queuine tRNA-ribosyltransferase N2,N2-dimethylguanosine tRNA


34.7% 37.2% 57.5% 62.1% 30.5% 35.1% 42.1% 42.0% 34.1% 31.0% 29.7% 35.3%

38.2% 28.5% 33.9% 45.0% 46.9% 35.8% 48.4% 64.0% 44.0% 52.6%


Aq1489 Aq749 Aq705 Aq1890 Aq2046 Aq257

trmD truA truB tsnR vacB ygcA

methyltransferase tRNA guanine-N1 methyltransferase pseudouridine synthase I tRNA pseudouridine 55 synthase rRNA methylase VacB protein (ribonuclease II family) RNA methyltransferase (TrmA-family)

34.6% 42.9% 33.1% 38.2% 36.4% 37.9% 28.8%

.... .... .... .... .... .... ....

Translation
Aq2131 fmt methionyl-tRNA formyltransferase 45.7% ....
Aq247 gatA glutamyl-tRNA(Gln) amidotransferase subunit A 53.6% ....
Aq461 gatB glutamyl-tRNA(Gln) amidotransferase subunit B 48.8% ....
Aq2147a gatC glutamyl-tRNA(Gln) amidotransferase subunit C 41.1% ....
Aq346 pth peptidyl-tRNA hydrolase 48.8% ....

Aminoacyl tRNA synthetases
Aq1293 alaS alanyl-tRNA synthetase 46.6% ....
Aq923 argS arginyl-tRNA synthetase 39.4% ....
Aq1677 aspS aspartyl-tRNA synthetase 51.3% ....
Aq1068 cysS cysteinyl-tRNA synthetase 45.0% ....
Aq763 genX lysyl-tRNA synthetase (genX) homolog 38.6% ....
Aq1221 gltX glutamyl-tRNA synthetase 48.5% ....
Aq945 glyQ glycyl-tRNA synthetase alpha subunit 61.9% ....
Aq2141 glyS glycyl-tRNA synthetase beta subunit 37.1% ....
Aq122 hisS1 histidyl-tRNA synthetase 43.3% ....
Aq1155 hisS2 histidyl-tRNA synthetase 34.9% ....
Aq305 ileS isoleucyl-tRNA synthetase 82.1% ..
Aq351 leuS leucyl-tRNA synthetase alpha subunit 50.7% ....
Aq1770 leuS’ leucyl-tRNA synthetase beta subunit 47.2% ....
Aq1202 lysU lysyl-tRNA synthetase 53.2% ...
Aq1257 metG methionyl-tRNA synthetase alpha subunit 45.0% ....
Aq422 metG’ methionyl-tRNA synthetase beta subunit 64.2% ....
Aq953 pheS phenylalanyl-tRNA synthetase alpha subunit 51.9% ....
Aq1730 pheT phenylalanyl-tRNA synthetase beta subunit 35.4% ....
Aq365 proS proline-tRNA synthetase 44.1% ....
Aq298 serS seryl-tRNA synthetase 59.4% ....
Aq1667 thrS threonyl-tRNA synthetase 48.5% ....
Aq992 trpS tryptophanyl-tRNA synthetase 38.4% ....
Aq1751 tyrS tyrosyl tRNA synthetase 56.2% ....
Aq1413 valS valyl-tRNA synthetase 33.2% ....

Ribosomal Proteins
Aq1935 rplA ribosomal protein L01 57.9% ....
Aq013 rplB ribosomal protein L02 46.9% ....
Aq009 rplC ribosomal protein L03 53.8% ....
Aq011 rplD ribosomal protein L04 51.3% ....
Aq1652 rplE ribosomal protein L05 67.0% ....
Aq1649 rplF ribosomal protein L06 46.2% ....
Aq2042 rplI ribosomal protein L09 35.6% ....
Aq1936 rplJ ribosomal protein L10 36.5% ....
Aq1933 rplK ribosomal protein L11 71.4% ....
Aq1937 rplL ribosomal protein L7/L12 75.4% ....
Aq1877 rplM ribosomal protein L13 60.6% ....
Aq1654 rplN ribosomal protein L14 59.5% ....
Aq1642 rplO ribosomal protein L15 57.4% ....
Aq018 rplP ribosomal protein L16 59.3% ....
Aq069 rplQ ribosomal protein L17 48.7% ....
Aq1648 rplR ribosomal protein L18 62.7% ....
Aq1954 rplS ribosomal protein L19 59.8% ...
Aq952 rplT ribosomal protein L20 63.5% ....
Aq016a rplV ribosomal protein L22 47.3% ....
Aq012 rplW ribosomal protein L23 52.2% ....
Aq1653 rplX ribosomal protein L24 50.8% ....
Aq1644 rpmD ribosomal protein L30 46.4% ..
Aq1930a rpmG ribosomal protein L33 67.9% ....
Aq792a rpmI ribosomal protein L35 48.3% ....
Aq1485 rpsA ribosomal protein S01 32.6% ....
Aq2007 rpsB ribosomal protein S02 60.3% ....
Aq017 rpsC ribosomal protein S03 54.0% ....
Aq072 rpsD ribosomal protein S04 51.9% ....
Aq1645 rpsE ribosomal protein S05 60.6% ...
Aq063 rpsF ribosomal protein S06 32.7% ....
Aq1832 rpsG1 ribosomal protein S07 52.5% ....
Aq734 rpsG2 ribosomal protein S07 51.9% ....
Aq1651 rpsH ribosomal protein S08 39.9% ....
Aq1878 rpsI ribosomal protein S09 50.5% ...
Aq008 rpsJ ribosomal protein S10 55.9% ....
Aq073 rpsK ribosomal protein S11 60.7% ....
Aq735 rpsL1 ribosomal protein S12 78.9% ....
Aq1834 rpsL2 ribosomal protein S12 78.9% ....
Aq074 rpsM ribosomal protein S13 61.9% ....
Aq1651a rpsN ribosomal protein S14 51.6% ....
Aq226a rpsO ribosomal protein S15 61.6% ....
Aq123 rpsP ribosomal protein S16 36.6% ...
Aq020 rpsQ ribosomal protein S17 59.6% ....
Aq064a rpsR ribosomal protein S18 48.5% ....
Aq015 rpsS ribosomal protein S19 63.1% ..
Aq1767 rpsT ribosomal protein S20 40.0% ...
Aq867a rpsU ribosomal protein S21 38.2% ....

Translation factors
Aq1364   efp     elongation factor P  48.6%
Aq2114   eif     initiation factor eIF-2B alpha subunit  58.4%
Aq712    frr     ribosome recycling factor  43.0%
Aq001    fusA    elongation factor EF-G  91.9%
Aq075a   infA    initiation factor IF-1  69.1%
Aq2032   infB    initiation factor IF-2  48.5%
Aq1777   infC    initiation factor IF-3  53.6%
Aq876    prfA    peptide chain release factor RF-1  54.8%
Aq1840   prfB    peptide chain release factor RF-2  49.9%
Aq1033   selB    elongation factor SelB  30.4%
Aq715    tsf     elongation factor EF-Ts  35.8%
Aq005    tufA1   elongation factor EF-Tu  74.4%
Aq1928   tufA2   elongation factor EF-Tu  73.9%

Protein modification
Aq731    ccdA    cytochrome c-type biogenesis protein  32.0%
Aq579    def     polypeptide deformylase  41.4%
Aq2093   dsbC    thiol:disulfide interchange protein  27.6%
Aq055    hemX1   cytochrome c biogenesis protein  26.2%
Aq2043   hemX2   cytochrome c biogenesis protein  36.2%
Aq1053   nifS1   FeS cluster formation protein NifS  38.5%
Aq739    nifS2   FeS cluster formation protein NifS  45.5%
Aq1871   pmbA    peptide maturation  25.6%
Aq2102   prmA    ribosomal protein L11 methyltransferase  35.1%
Aq567    rimI    ribosomal-protein-alanine acetyltransferase  37.9%
Aq576    stpK    ser/thr protein kinase  30.8%
Aq152    tlpA    thiol disulfide interchange protein  37.6%

Proteases
Aq1950   aprV    serine protease  26.5%
Aq1672   clpB    ATPase subunit of ATP-dependent protease  46.8%
Aq1296   clpC    ATP-dependent Clp protease  54.9%
Aq1339   clpP    ATP-dependent Clp protease proteolytic subunit  65.4%
Aq1337   clpX    ATP-dependent protease ATPase subunit clpX  66.1%
Aq1015   col     collagenase  41.3%
Aq801    gcp     sialoglycoprotease  45.5%
Aq1671   hslV    heat shock protein HslV  57.6%
Aq1450   htrA    periplasmic serine protease  38.3%
Aq242    lon     Lon protease  50.6%
Aq076    map     methionyl aminopeptidase  44.1%
Aq1459   npr     neutral protease  27.7%
Aq2099   pepA    leucine aminopeptidase  39.5%
Aq1535   pepQ    xaa-pro dipeptidase  31.9%
Aq618    pfpI    protease I  41.8%
Aq797    prc     carboxyl-terminal protease  41.8%
Aq552    sms     ATP-dependent protease sms  46.2%
Aq2204   ymxG    processing protease  28.3%

Transport
Aq1222   abcT1   ABC transporter
Aq620    abcT2   ABC transporter
Aq1095   abcT3   ABC transporter (ABC-2 subfamily)
Aq1094   abcT4   ABC transporter
Aq1097   abcT5   ABC transporter (hlyB subfamily)
Aq417    abcT6   ABC transporter
Aq413    abcT7   ABC transporter
Aq297    abcT8   ABC transporter
Aq2160   abcT9   ABC transporter
Aq1531   abcT10  ABC transporter
Aq2122   abcT11  ABC transporter
Aq2137   abcT12  ABC transporter
Aq1563   abcT13  ABC transporter (MsbA subfamily)
Aq695    acrD1   cation efflux system (AcrB/AcrD/AcrF family)
Aq1122   acrD2   cation efflux system (AcrB/AcrD/AcrF family)
Aq469    acrD3   cation efflux system (AcrB/AcrD/AcrF family)
Aq786    acrD4   cation efflux (AcrB/AcrD/AcrF family)
Aq112    amtB    ammonium transporter
Aq682    arsA1   anion transporting ATPase
Aq343    arsA2   anion transporting ATPase
Aq851    corA    Mg(2+) and Co(2+) transport protein
Aq724    ctrA1   cation transporting ATPase (E1-E2 family)
Aq1445   ctrA2   cation transporting ATPase (E1-E2 family)
Aq1125   ctrA3   cation transporting ATPase (E1-E2 family)
Aq1132   czcB1   cation efflux system (czcB-like)
Aq1331   czcB2   cation efflux system (czcB-like)
Aq468    czcB2   cation efflux system (czcB-like)
Aq1073   czcD    cation efflux system (CzcD-like)
Aq911    ebs     erythrocyte band 7 homolog
Aq1062   emrB    major facilitator family transporter
Aq1255   feoB    ferrous iron transport protein B
Aq1330   gltP    proton/sodium-glutamate symport protein
Aq1268   hvsT    high affinity sulfate transporter
Aq1863   kch     potassium channel protein
Aq1725   lepA    G-protein LepA
Aq1229   mffT    transporter (major facilitator family)
Aq447    mgtC    Mg(2+) transport ATPase
Aq1609   modA    molybdate periplasmic binding protein
Aq086    modC    Molybdenum transport system permease
Aq415    napA1   Na(+)/H(+) antiporter
Aq929    napA2   Na(+)/H(+) antiporter
Aq2030   napA3   Na(+)/H(+) antiporter
Aq215    nasA    nitrate transporter
Aq1441   oppA    transporter (extracellular solute binding protein family 5)
Aq481    oppB    transporter (OppBC family)
Aq1509   oppC    oligopeptide transport system permease
Aq2019   pstA    phosphate transport system permease PstA
Aq1055   pstB    phosphate transport ATP binding protein
Aq2018   pstC    phosphate transport system permease protein C
Aq2016   pstS    phosphate-binding periplasmic protein
Aq2129   sbf     Na(+) dependent transporter (Sbf family)
Aq098    secG    protein export membrane protein SecG
Aq2077   snf     Na(+):neurotransmitter symporter (Snf family)
Aq2106   ssf     Na(+):solute symporter (Ssf family)
Aq1988   tolQ    TolQ homolog
Aq1504   trk1    K+ transport protein homolog
Aq031    trnS    transporter (Pho87 family)

37.0% 46.2% 46.2% 43.5% 68.1% 45.2% 52.4% 34.9% 35.7% 25.7% 47.4% 32.5% 40.6% 46.8%

Uncategorized
Aq1023   acuC1   acetoin utilization protein
Aq2110   acuC2   acetoin utilization protein
Aq158    apfA    AP4A hydrolase
Aq458    bcp     bacterioferritin comigratory protein
Aq542    bcpC    phosphonopyruvate decarboxylase
Aq147    cobW    cobalamin synthesis related protein CobW
Aq1303a  cspC    cold shock protein
Aq1265   cstA    carbon starvation protein A
Aq348    ctc     general stress protein Ctc
Aq212    cynS    cyanate hydrolase
Aq337    cysQ    CysQ protein
Aq528    dedF    phenylacrylic acid decarboxylase
Aq148    deoC    deoxyribose-phosphate aldolase
Aq2095   dksA    dnaK suppressor protein
Aq1994   era1    GTP-binding protein Era
Aq1919   era2    GTP binding protein Era
Aq1540   gcpE    GcpE protein
Aq1052   gcsH1   glycine cleavage system protein H
Aq1657   gcsH2   glycine cleavage system protein H
Aq944    gcsH3   glycine cleavage system protein H
Aq1108   gcsH4   glycine cleavage system protein H
Aq1458   gcvT    aminomethyltransferase (glycine cleavage system T protein)
Aq108b   hfq     host factor I
Aq101    hly     hemolysin
Aq2120   hlyC    hemolysin homolog protein
Aq1091   hylA    hemolysin
Aq708    hyuA    N-methylhydantoinase A
Aq1925   hyuB    N-methylhydantoinase B
Aq1579   iagB    invasion protein IagB
Aq1983   imp2    myo-inositol-1(or 4)-monophosphatase
Aq748    ispA    geranylgeranyl pyrophosphate synthase
Aq1739   lytB    LytB protein
Aq1977   masA    enolase-phosphatase E-1
Aq1560   mglA1   gliding motility protein
Aq1823   mglA2   gliding motility protein MglA
Aq1789   mviB    ‘virulence factor’ homolog MviB
Aq587    neaC    N-ethylammeline chlorohydrolase
Aq1820   nfeD    nodulation competitiveness protein NfeD
Aq896    nifU    NifU protein
Aq1300   omp     outer membrane protein
Aq1507   omt     O-methyltransferase
Aq967    ostA    organic solvent tolerance protein
Aq141    pkcI    protein kinase C inhibitor (HIT family)
Aq994    pncA    pyrazinamidase/nicotinamidase
Aq057    sfsA    sugar fermentation stimulation protein
Aq287    smb     small protein B
Aq832    surE    stationary phase survival protein SurE
Aq871    thdF    thiophene and furan oxidation protein
Aq2021   tldD    TldD protein
Aq773    tly     hemolysin
Aq629    xcpC    chromosome assembly protein homolog

36.9% 38.6% 36.6% 40.6% 37.4% 29.5% 67.2% 33.0% 34.7% 39.5% 47.4% 52.4% 46.6% 35.1% 49.7% 43.0% 50.1% 28.6% 39.8% 36.7% 44.8%


34.7% 36.8% 34.4% 37.7% 45.5% 51.8% 51.5% 49.3% 45.3% 36.4% 42.5% 38.2% 30.5% 22.7% 32.0% 34.2% 27.7% 49.0% 41.5% 33.9% 31.1% 30.7% 28.1% 43.8% 23.7% 26.9% 28.5% 43.4% 50.2% 28.3% 32.6% 35.6% 29.4% 30.1% 59.8% 37.2% 36.2% 38.2% 44.8% 27.6% 32.7% 26.8% 35.8%

42.2% 53.5% 33.7% 29.3% 33.5% 39.8% 43.1% 38.3% 36.0% 40.7% 43.9% 42.3% 42.4% 34.1% 29.7% 42.8% 37.9% 48.3% 25.5% 39.5% 22.0% 59.0% 39.1% 27.3% 52.0% 44.1% 45.4% 40.9% 43.8% 33.3%
