The EMBO Journal vol.13 no.15 pp.3413-3422, 1994
Crystal structure of the DNA modifying enzyme β-glucosyltransferase in the presence and absence of the substrate uridine diphosphoglucose
Alice Vrielink1, Wolfgang Rüger2, Huub P.C.Driessen3 and Paul S.Freemont4
Protein Structure Laboratory, Imperial Cancer Research Fund, Lincoln's Inn Fields, London WC2A 3PX, UK, 2Arbeitsgruppe Molekulare Genetik, Fakultät für Biologie, Ruhr Universität, Bochum, Germany and 3ICRF Unit of Structural Molecular Biology, Birkbeck College, Malet Street, London WC1E 7HX, UK
1Present address: Department of Biochemistry, McGill University, 3655 Drummond Street, Montreal, Canada
4Corresponding author
Communicated by M.Crumpton
Bacteriophage T4 β-glucosyltransferase (EC 2.4.1.27) catalyses the transfer of glucose from uridine diphosphoglucose to hydroxymethyl groups of modified cytosine bases in T4 duplex DNA forming β-glycosidic linkages. The enzyme forms part of a phage DNA protection system. We have solved and refined the crystal structure of recombinant β-glucosyltransferase to 2.2 Å resolution in the presence and absence of the substrate, uridine diphosphoglucose. The structure comprises two domains of similar topology, each reminiscent of a nucleotide binding fold. The two domains are separated by a central cleft which generates a concave surface along one side of the molecule. The substrate-bound complex reveals clear electron density only for the uridine diphosphate portion of the substrate. The UDPG is bound in a pocket at the bottom of the cleft between the two domains and makes extensive hydrogen bonding contacts with residues of the C-terminal domain only. The domains undergo a rigid body conformational change causing the structure to adopt a more closed conformation upon ligand binding. The movement of the domains is facilitated by a hinge region between residues 166 and 172. Electrostatic surface potential calculations reveal a large positive potential along the concave surface of the structure, suggesting a possible site for duplex DNA interaction.
Key words: DNA modification/enzyme/glucosylation/T-phage/X-ray crystal structure
Introduction
T-even bacteriophage, and in particular T4, have been the subject of extensive biochemical and genetic analyses leading to a detailed molecular understanding of T-phage infection, replication and assembly (Mosig and Eiserling, 1988). The extreme virulence of these phages is reflected in a complete cessation of host macromolecular synthesis immediately after phage infection (for a review see
© Oxford University Press
Rabussay, 1982). Moreover, the host DNA is degraded by phage encoded enzymes (Warner et al., 1970; Mathews et al., 1983). To protect its own genome against phage encoded nucleases and host restriction endonuclease systems, the phage has evolved a specific DNA modification system. In T-even phage this specific DNA modification process involves two steps. First, cytosine is replaced by 5-hydroxymethylcytosine, which is incorporated during DNA synthesis, forming hydroxymethylated DNA (HMC-DNA) (Wyatt and Cohen, 1952; Lamm et al., 1988). As a second step, in a post-replicative mechanism, the hydroxymethylated cytosines are glucosylated forming glucose-HMC-DNA (Revel, 1983): UDP-glucose + HMC-DNA → glucosyl-HMC-DNA + UDP. The enzymes catalysing DNA glucosylation in T4 phage are α-glucosyltransferase (AGT) and β-glucosyltransferase (BGT) (Kornberg et al., 1961; Josse and Kornberg, 1962; Zimmerman et al., 1962). In T2 and T6 phage β-glucosyltransferase is replaced by β-glucosyl-HMC-α-glucosyltransferase (Lehman and Pratt, 1960). While AGT and BGT form α- and β-glycosidic linkages directly to the hydroxymethylcytosine bases respectively, β-glucosyl-HMC-α-glucosyltransferase links a second glucose molecule in a 1,6 linkage to bases which have already been α-glucosylated (Lehman and Pratt, 1960). The glucosylation pattern occurs in a species-specific fashion for each of the three phage strains (Lehman and Pratt, 1960). The glucosylation reaction catalysed by these enzymes involves the transfer of glucose from host-synthesized uridine diphosphoglucose (UDPG) to the hydroxymethyl group of cytosine bases in double-stranded DNA. The genes for these three enzymes have been sequenced and the proteins overexpressed and purified (Gram and Rüger, 1985; Tomaschewski et al., 1985; Winkler and Rüger, 1993).
Sequence comparisons among the three glucosyltransferase enzymes show only limited homology (Tomaschewski et al., 1985; Winkler and Rüger, 1993), suggesting that these enzymes may have different three-dimensional structures, although convergent structural evolution cannot be excluded. Apart from the protective function, glucosylation of phage DNA has also been implicated as having a control function on phage-specific gene expression. Studies have shown that non-glucosylated T4 DNA is significantly more active in stimulating transcription and protein synthesis (Cox and Conway, 1973; Rüger, 1978) and that this control occurs during late gene expression (Dharmalingam and Goldberg, 1979). This control function may involve the specificity of the phage-induced modification of Escherichia coli RNA polymerase (Wu and Geiduschek, 1975) or the structure of the glucosylated DNA template which may result in an altered susceptibility
A.Vrielink et al.
of glucosylated phage DNA to nucleases and other enzymes. Experimental studies have shown that glucosylated DNA is unable to undergo the transition from B to A conformation, whereas non-glucosylated DNA can (Mokulskaya et al., 1966). The glucose group of the modified B-DNA would lie in the major groove and would sterically prevent major groove narrowing, an event characteristic of the B to A transition. Non-glucosylated DNA therefore would have greater structural flexibility than glucosylated DNA, allowing it potentially to adopt a larger number of conformations. For many years DNA modification by cytosine hydroxymethylation and glucosylation has been found only in T-even phage. More recently, however, a similar modification has been observed in the African trypanosome, Trypanosoma brucei, where the modified base has been identified as β-D-glucosyl-hydroxymethyluracil (Gommers-Ampt et al., 1993). The role of such a modification system in T.brucei is unclear, although it has been suggested to be directly involved in the regulation of variant surface glycoprotein (VSG) gene expression (Gommers-Ampt et al., 1993).
This is of particular importance since VSG expression is the primary mechanism for parasitic surface coat replacement, a mechanism which protects the parasite from immune recognition and neutralization (Bernards et al., 1984). The identification of a glucosylated form of DNA in organisms other than phage suggests that this form of DNA modification may be more widespread than has previously been thought and leads to further speculation on mechanisms of DNA protection and control of gene expression.
In order to understand, at the molecular level, this unusual DNA modification process we have solved the crystal structure of T4-phage β-glucosyltransferase. The BGT structure represents the first example of an enzyme which glucosylates double-stranded DNA and provides the basis for studies into the mechanisms of non-specific DNA recognition combined with specific base modification. Furthermore, the BGT structure may provide clues as to the mechanism of the trypanosome-specific DNA modification system which may have important therapeutic consequences.
Fig. 1. Stereo diagram showing the regions of the final 2Fobs-Fcalc electron density maps for β-glucosyltransferase calculated using all reflections between 10 and 2.2 Å and phases from the final model. The contour level used is 1.3 times the standard deviation of each map. (A) A view of the electron density for Ile94 and Tyr95 in the substrate-free structure. (B) A view of the density for uridine diphosphate in the substrate-bound structure.
β-glucosyltransferase
Results and discussion
Electron density map and quality of the model
The final electron density maps for both the substrate-free and substrate-bound models were calculated using the Fourier coefficients (2Fobs-Fcalc), αcalc. The maps show clear electron density for residues 1-67, 75-107 and 123-351. The substrate-free structure includes 184 water molecules and the UDPG-bound model includes positions for the uridine diphosphate portion of the UDPG substrate and 221 water molecules. Two loops within the structure, 68-74 and 108-122, have no visible electron density and thus could not be modelled. Figure 1 shows examples of the electron density map where a clear interpretation of the structure was possible. Ramachandran plots of the two models (not illustrated) show that all non-glycine residues lie inside the energetically allowed regions of φ/ψ space. For both structures Asp205, Asp263, Asp350 and His290 fall in left-handed helical regions. In addition, for the substrate-bound structure, Ser189 and Arg191 also fall in the left-handed helical region. Table I gives the refinement statistics for each of the final models.
Description of the overall structure
A ribbon representation of the overall structure of BGT is shown in Figure 2 with the strands and helices labelled as referred to in the text. The molecule has dimensions 45 × 45 × 55 Å and consists of two domains which adopt similar topologies, as shown in Figure 3. The two domains are separated by a central cleft. The N-terminal domain comprises residues 1-165 and 338-351 and consists of a seven stranded parallel twisted β-sheet surrounded by seven α-helices. The second domain comprises residues 182-316 and consists of a six stranded parallel twisted β-sheet and five α-helices. The two domains are linked by an extended chain region from residue 166-181 and a long α-helix (α12) from residue 317-334. The fold for these topologically similar domains resembles a Rossmann nucleotide binding motif (Rossmann et al., 1975). The C-terminal domain represents a more classical nucleotide binding motif, whereas slight deviations are observed in the N-terminal domain. Although the sequential arrangement of the secondary structure elements of the two domains is similar, a superposition of the domains gives no significant three-dimensional overlap. Indeed, an analysis of the sequences of the two domains does not reveal any significant homology. A topological comparison of the structure of BGT with the DNA modification enzymes, HhaI DNA methyltransferase (Cheng et al., 1993a), guanine DNA methyltransferase (Moore et al., 1994) and adenine-specific DNA methyltransferase (Labahn et al., 1994) shows only limited similarity. Both the HhaI DNA methyltransferase and adenine-specific DNA methyltransferase are monomers comprising two domains, one of which contains a nucleotide binding fold that binds the substrate, S-adenosyl-L-methionine.
UDPG binding
In order to obtain a substrate-bound complex, data were collected using crystals from which UDPG had not been removed. Difference electron density maps using data collected from these crystals, and the substrate-free structure after rigid body and positional refinement, showed clear density only for the uridine diphosphate portion of the substrate. Further refinement and manual inspection
Fig. 2. A ribbon representation of the structure of β-glucosyltransferase drawn using the program MOLSCRIPT (Kraulis, 1991). The secondary structure elements of the protein are labelled as referred to in the text and in Figure 3. The atoms for the UDP portion of the substrate are also shown in ball and stick representation.
Table I. Final crystallographic refinement statistics

                                        Substrate-free structure   Substrate-bound complex
R factor (%)a                           19.4                       19.1
Resolution (Å)                          10-2.2                     10-2.2
No. of reflections                      18758                      20885
No. of protein atoms                    2683                       2697
No. of solvent molecules                184                        221
Average B factor for the model (Å2)     29.7                       21.0
R.m.s. deviation in bond lengths (Å)    0.011                      0.011

aR factor = 100 × Σ|Fo - Fc| / Σ|Fo|
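The R factor defined in the footnote to Table I can be illustrated with a minimal sketch. The function name `r_factor` and the amplitude values are hypothetical and are not data from this study; the sketch also assumes that the calculated amplitudes have already been placed on the same scale as the observed ones.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R factor in percent: 100 * sum|Fo - Fc| / sum|Fo|.

    f_obs and f_calc are equal-length sequences of observed and calculated
    structure factor amplitudes (assumed to be on a common scale).
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    return 100.0 * numerator / denominator


# Hypothetical amplitudes, for illustration only (not data from this study).
example_f_obs = [100.0, 50.0, 25.0, 25.0]
example_f_calc = [90.0, 55.0, 30.0, 20.0]
example_r = r_factor(example_f_obs, example_f_calc)
print(f"R factor = {example_r:.1f}%")
```

In this toy case the summed absolute differences are 25 against summed observed amplitudes of 200, giving an R factor of 12.5%.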
Fig. 3. A schematic representation of the topology of β-glucosyltransferase. The α-helices are depicted by cylinders and the β-strands by arrows. The secondary structure assignments have been determined as defined by Kabsch and Sander (1983). Dotted lines indicate regions of the structure which could not be modelled. The secondary structure elements from each domain are aligned to show their topological similarity, e.g. α1 and α5 are equivalent to α7 and α11 respectively.
of the electron density maps calculated using the Fourier coefficients 3Fobs-2Fcalc and Fobs-Fcalc showed some disconnected electron density near to the terminal phosphate oxygen atoms of UDP. In order to ascertain whether the crystal used for data collection had UDP or UDPG bound to the enzyme, a new crystal was soaked overnight in fresh mother liquor containing 3 mM UDPG and data were collected to 2.2 Å resolution. Inspection of the difference electron density map again showed density only for the UDP portion of the substrate. The fragmented density is situated in a pocket which is of the appropriate size to accommodate a glucose ring. It was not possible from the electron density, however, to determine whether the glucose is present in a disordered conformation or if the sugar ring had been cleaved off by the enzyme. In order to determine whether UDPG or UDP is present, a sample of crystals and mother liquor was analysed by HPLC using a DEAE-cellulose ion exchange column. The results showed only UDPG present in both the mother liquor and in the dissolved crystals. This confirms that the ligand bound to the enzyme in the crystals is UDPG rather than UDP and that the glucose portion of the substrate must be present in a disordered conformation. The UDPG substrate binds to the protein in the cleft between the two domains. The UDP portion can be divided into three regions, the phosphate groups, the ribose ring and the uracil ring, all of which make hydrogen bonding contacts to the C-terminal domain of the protein and to water molecules (Figure 4). Three phosphate oxygen atoms make hydrogen bond contacts to the guanidinium groups of three arginine residues (191, 195 and 269), three water molecules (487, 488 and 489) and the main chain nitrogen atom of Ser189. The region surrounding the phosphate oxygens of the ligand is therefore occupied by a predominance of positively charged side chains, presumably to balance the high negative charge of the phosphate groups. This is in contrast to what has been seen in more classic nucleotide binding proteins, where the negative charge of the phosphates is partially accommodated by a helix dipole (Wierenga et al., 1985, 1986). In nucleotide binding proteins, the close approach of the phosphate groups to the N-terminus of the helix is enabled by the conserved pattern of glycine residues, which is not seen in the sequence of BGT. The uracil ring is held in position through hydrogen bonding contacts between O4 and N3 of the ring and the main chain nitrogen and oxygen atoms of Ile238 respectively. Only O2 of the uracil ring does not make any interaction with the protein. The ribose ring of the ligand adopts a C2'-endo ring pucker and is held in position through hydrogen bond contacts between O2' and O3' and the carboxyl side chain of Glu270. Similar interactions between the ribose hydroxyl groups and the side chains of glutamate or aspartate residues have been observed in a number of structures of nucleotide binding proteins (Eklund et al., 1984; Skarzynski et al., 1987). The UDP ligand is bent slightly away from a fully extended conformation and is held in this conformation by H2O488, which makes hydrogen bonds to O4P of the terminal phosphate group and O3' of the ribose ring.
Comparison of the substrate-free structure with the UDPG complex
Comparison of the substrate-free and substrate-bound structures reveals significant changes as a result of binding uridine diphosphoglucose. The
Fig. 4. A stereo view of the hydrogen bond interactions made between the UDP portion of the substrate and the surrounding protein and water molecules. The UDP portion of the substrate is shown in open bonds and the protein is shown in closed bonds. Three water molecules are shown by double circles. Hydrogen bond interactions are shown by dotted lines.
substrate-free structure adopts an 'open' conformation in a similar fashion to the structures of citrate synthase (Remington et al., 1982; Wiegand et al., 1984), hexokinase (Anderson et al., 1978; Bennett and Steitz, 1978, 1980) and the periplasmic binding proteins, a number of which have been solved in the closed/liganded and in the open/free forms (Spurlino et al., 1991; Kang et al., 1992; Sharff et al., 1992; Oh et al., 1993). Although the structures of periplasmic binding proteins show some overall similarity to BGT in that they contain two α/β domains with a central cavity, no three-dimensional superposition can be obtained. Upon binding of the UDPG substrate in BGT, a conformational change occurs resulting in an approximate 5° rigid body rotation of the C-terminal domain to a closed conformation. Superposition of the residues making up the N-terminal domains of the two structures results in an r.m.s. deviation of 1.4 Å for all α carbon atoms and an r.m.s. deviation of 2.0 Å for the α carbon atoms of the C-terminal domain. A superposition of the two structures is shown in Figure 5. The central region of helix α12 and residues 166-172 between the two domains form the region of the structure responsible for the domain movement, which is designated the hinge region. The electron density of residues 166-172 is poorly defined in the substrate-free structure with temperature factors which are significantly higher than those observed in the UDPG-bound structure. Two salt bridges in the hinge region, between Asp171 and Lys28 and between Lys166 and Glu272, are observed in the UDPG-bound structure but are not present in the substrate-free structure. These salt bridges involve residues from each domain and contribute to the observed domain movement. A comparison of the main chain dihedral angles in this region does not show any significant differences in individual φ/ψ angles upon UDPG binding. Therefore the observed
movement is generated by a combination of a number of small main chain dihedral angle changes along the entire hinge region of the structure rather than large changes to specific residues within the hinge region. Adjacent to the hinge region, residues 173-181 adopt a PPII (polyproline) type helical conformation (Adzhubei and Sternberg, 1993). Interestingly, a hinge region has also been observed in the structures of periplasmic binding proteins (Sack et al., 1989a,b; Sharff et al., 1992; Zou et al., 1993) and is believed to be responsible for mediating the conformational change of the molecule upon ligand binding. It is interesting to note that derivatization with K2Pt(NO2)4 was only possible with crystals which had not had the substrate removed. However, upon soaking with the heavy metal, the cell dimensions were identical to those of the substrate-free crystals. The platinum metal binds to a methionine residue (Met169) in the hinge region and appears to induce the enzyme to dissociate from UDPG and adopt the substrate-free conformation. It may be that the more mobile nature of this hinge region in the substrate-free structure does not allow the metal to bind to Met169. However, in the presence of UDPG, the conformation of this side chain is more ordered, thus enabling K2Pt(NO2)4 to bind in an isomorphous fashion. Apart from this hinge region, a loop (residues 188-195), situated in the C-terminal domain between strand β8 and helix α7, moves significantly as a result of substrate binding (see Figure 5). A number of residues (Arg191, Ser192 and Gly193) within this loop fall in the left-handed helical region of the Ramachandran plot. In the substrate-free structure the electron density for this loop is poorly defined and the temperature factors are high, indicating conformational flexibility. In contrast, the electron density for this loop region in the UDPG-bound structure is well defined and the temperature factors are comparably low
Fig. 5. A stereo diagram showing the superposition of the α carbon atoms for the substrate-free and UDP-bound complex of β-glucosyltransferase. The matrix for the superposition was calculated using only residues in the N-terminal domain (1-66, 77-106, 125-169 and 338-351). After superposition, the r.m.s. separation for all α carbon atoms is 1.4 Å. The r.m.s. separation for the α carbon atoms in the UDP binding domain (182-316) is 2.0 Å. Red indicates the substrate-bound structure and green the substrate-free structure. A van der Waals representation for the UDP portion of the substrate is shown in yellow.
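The r.m.s. separation quoted for the superposed structures can be sketched as follows. This minimal example assumes that the two coordinate sets have already been superposed (here, the fitting matrix would have been derived from the N-terminal domain residues only) and that atoms are listed in the same order in both sets. The function name `rms_separation` and the coordinates are hypothetical and serve only as an illustration.

```python
import math


def rms_separation(coords_a, coords_b):
    """Root mean square distance between paired atoms, in the input units.

    coords_a and coords_b are equal-length sequences of (x, y, z) tuples
    for equivalent atoms, assumed to be already superposed.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total_sq = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total_sq += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total_sq / len(coords_a))


# Hypothetical alpha carbon positions in angstroms (not taken from the models).
free_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
bound_ca = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (3.8, 3.8, 1.0)]
example_rms = rms_separation(free_ca, bound_ca)
print(f"r.m.s. separation = {example_rms:.2f} A")
```

Because the fitting matrix is derived from one domain only, the separation computed over the other domain reflects the relative domain movement rather than internal changes within that domain.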
(the average temperature factors for this loop in the structure of the complex are 22.2 Å2 for main chain atoms and 20.4 Å2 for side chain atoms, whereas in the substrate-free structure the temperature factors are 53.3 Å2 and 53.4 Å2 respectively). Three salt bridge interactions, Arg191-Asp100, Arg191-Asp258 and Arg195-Asp258, are present in the UDPG-bound structure but are not observed in the substrate-free structure. The side chains of Arg191 and Arg195 are also involved in hydrogen bonding interactions with the phosphate oxygen atoms of the substrate. Thus this loop is held in a specific conformation by both salt bridge interactions with other protein side chains and by hydrogen bond interactions with the substrate. An additional region of the structure, corresponding to residues 235-238, shows a significant change upon UDPG binding. The main chain of Ile238 in this region is also involved in hydrogen bonding interactions with the uracil ring of the substrate, as described above. The main chain electron density in this region is poorly defined in the substrate-free structure and the model is characterized by high temperature factors (45.1 Å2 for main chain atoms and 46.5 Å2 for side chain atoms), indicating considerable flexibility. In contrast, the temperature factors for this region in the UDPG-bound structure are significantly lower (22.2 Å2 for main chain atoms and 22.8 Å2 for side chain atoms) and the chain is conformationally fixed by hydrogen bond interactions with the uracil ring. Therefore, the binding of the substrate causes a conformational change in the molecule, resulting in a domain movement. A number of salt bridge interactions are observed between residues in both domains, which presumably contribute to the domain movement, reducing the overall flexibility of the molecule. In addition, a number of regions in the structure become more ordered upon substrate binding as a result of interactions with atoms from the UDP portion of the substrate.
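To give a feel for the temperature factors quoted above, an isotropic B factor can be converted to an r.m.s. atomic displacement through the standard relation B = 8π²⟨u²⟩. The sketch below is an illustration only: the function name `rms_displacement` is hypothetical, and the conversion treats the whole B factor as displacement, whereas refined B factors also absorb static disorder and model error.

```python
import math


def rms_displacement(b_factor):
    """Convert an isotropic B factor (A^2) to an r.m.s. displacement (A).

    Uses B = 8 * pi^2 * <u^2>, so sqrt(<u^2>) = sqrt(B / (8 * pi^2)).
    """
    if b_factor < 0:
        raise ValueError("B factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# Main chain values quoted in the text for the loop comprising residues 188-195.
loop_free = rms_displacement(53.3)   # substrate-free structure
loop_bound = rms_displacement(22.2)  # UDPG-bound structure
print(f"substrate-free: {loop_free:.2f} A, UDPG-bound: {loop_bound:.2f} A")
```

On this simple reading, the drop in main chain temperature factor for the loop corresponds to a decrease in r.m.s. displacement from roughly 0.8 Å to roughly 0.5 Å.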
It should be noted that the observed conformational change between the two structures may be constrained by crystal packing effects which could prevent further changes.
Proposed glucose binding site and implications for catalysis
Although no significant electron density could be observed for the glucose ring of the substrate, its approximate location in the structure can be inferred from the position of the UDP portion of the ligand. The glucose moiety must lie in a pocket near to O5P of the terminal phosphate group of the UDP ligand. The other two oxygen atoms attached to the terminal phosphate group, O4P and O6P, interact with protein residues and water molecules as shown in Figure 5. Some unconnected difference electron density is visible in this pocket. However, it was not possible to unambiguously model the glucose ring into this density. The pocket, bounded by the UDP portion of the substrate and by residues from the N-terminal domain, is exposed to the external solvent environment via a channel extending from the concave surface between the two domains of the molecule. The top of this channel is lined by the side chains of Val18, Pro19, Ser67, Arg102, Leu103, Asn215 and H2O442. The reaction catalysed by BGT involves the transfer of glucose from the substrate, UDPG, to the 5-hydroxymethylcytosine base of phage DNA, with the release of UDP. The terminal phosphate group of UDPG, which acts as the leaving group in the transfer of glucose to the base, is covered by the side chain of Arg191, which makes salt bridge interactions with the side chains of Asp258 and Asp100. It is known that BGT requires the presence of Mg2+ for catalysis (Josse and Kornberg, 1962). Divalent cations have been observed in the structures of a number of phosphodiesterase enzymes and are thought to either activate a nucleophile and/or stabilize the phosphate oxyanion leaving group, as suggested for the mechanism of the 3'-5' exonuclease activity of the Klenow fragment (Freemont et al., 1988). In a similar fashion, a divalent metal ion in BGT could act to stabilize the negative charge on the uridine diphosphate leaving group.
The crystallization conditions for BGT, however, did not contain any magnesium, nor were any metal ions added during the purification (Tomaschewski et al., 1985). Thus,
Fig. 6. Stereo view of the active site region of the substrate-bound form of BGT. The UDP portion of the substrate is shown in open bonds and the protein is shown in closed bonds. The salt bridges between Asp100, Arg191 and Asp258 are shown as dotted lines.
although the protein binds divalent metal ions, none are present in the crystal structure. The BGT structure was inspected for the presence of potential metal binding residues in the vicinity of the terminal phosphate groups of the UDP ligand. Two aspartic acid residues, Asp100 and Asp258, are located near to this phosphate group; however, in the substrate-bound structure both are involved in salt bridge interactions with Arg191, as described above and shown in Figure 6. In addition, the hydroxyl group of Tyr261 lies near to the two carboxyl groups of Asp100 and Asp258 and to the guanidinium side chain of Arg191. These residues could all provide the necessary ligands for coordinating the metal ion. Although the mechanism of glucose transfer to the 5-hydroxymethylcytosine base is at present unknown, the structure of the substrate-bound form of BGT allows us to speculate on possible mechanisms. The transfer reaction could involve a nucleophilic attack on C1 of the glucose ring by the 5-hydroxymethyl group from the modified cytosine base. Two acidic residues, Asp100 and Glu22, are located in the region of the active site and are positioned such that their carboxylate side chains are exposed on either side of the channel extending from the concave surface of the structure. Interestingly, the channel is of the appropriate dimensions to accommodate a cytosine nucleotide and is lined by a number of hydrophobic residues which could provide van der Waals contacts with the aromatic ring of the base (see Figure 7). The carboxylate group of Glu22 lies in a position which would be accessible to the proposed position of the nucleotide base and thus may activate the 5-hydroxymethyl group for nucleophilic attack on the glucose ring. If such an arrangement of nucleotide base, UDPG and metal ion were correct, it would suggest that the 5-hydroxymethylcytosine base loops out of the double helix of DNA in a similar manner to that which has been observed in the structure
Fig. 7. Stereo view towards the concave surface showing the van der Waals surface for the protein in green and the van der Waals surface for the UDP portion of the substrate in pink. A channel can be seen extending from the base of the concave surface into the substrate binding pocket.
Fig. 8. A view of the electrostatic potential surface for β-glucosyltransferase. Red contours correspond to -2kT/e and blue contours correspond to +2kT/e. The calculation was carried out with the program FDCALC (Warwicker and Watson, 1982) using an ionic strength of 0.1 M, pH 6.5 and dielectric constants of 3.0 for the protein and 80.0 for the solvent. The structure of the substrate-bound complex was used for the calculation. The unobserved loop regions and the UDP ligand were omitted from the calculation. A ribbon representation of the molecule is shown in yellow and the UDP ligand is represented by a dot surface.
Table II. Data collection statistics for BGT

Data                     Soak conc.  Soak time  Method of data  Resolution  Total        Independent  % complete  Rmerge  Ranom  Rderiv
                         (mM)        (days)     collection      (Å)         reflections  reflections              (%)a    (%)    (%)b
Substrate-free           -           -          FASTc           2.2         52070        19074        84.3        8.4     -      -
UDPG complex             -           -          FASTc           2.2         74613        21380        94.5        8.6     -      -
K2Pt(NO2)4               1.0         1          Xentronicsd     2.8         21762        8435 (3σ)    76.7        4.0     6.7    18.1
K2AuCl4e                 1.0         1          FASTc           3.0         20972        9088         98.9        8.1     5.4    19.1
K2HgI4                   0.1         3          FASTc           3.0         16751        8940         97.3        4.7     4.9    16.7
K2Pt(NO2)4               1.0         1          FASTc           2.9         28025        9691         95.5        7.9     6.0    20.8
K2Pt(NO2)4 + K2AuCl4e    1 + 1       1 + 3 h    FASTc           3.0         20274        8790         95.2        11.1    6.4    19.4

aRmerge = ΣhΣi|Ih,i - ⟨Ih⟩| / ΣhΣi Ih,i (summed over all intensities).
bRderiv = Σ|Fderiv,h - Fnat,h| / ΣFnat,h (in the resolution range 10-3.0 Å).
cData collected at Glaxo Research Laboratory, Oxford.
dData collected in the Biophysics Department, University of Leeds.
eSoak carried out in 100 mM acetate buffer (pH 5.6).
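The merging statistic reported in Table II compares each repeated intensity measurement with the mean for its reflection, summed over all measurements. A minimal sketch of that calculation is given below. The function name `r_merge`, the dictionary layout and the intensity values are hypothetical and do not reproduce the processing carried out with MADNES or the CCP4 suite.

```python
def r_merge(observations):
    """Rmerge in percent for repeated intensity measurements.

    observations maps a reflection index (h, k, l) to a list of measured
    intensities I(h, i). Rmerge = 100 * sum_h sum_i |I(h, i) - <I(h)>|
    divided by sum_h sum_i I(h, i).
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        if not intensities:
            continue  # skip reflections with no measurements
        mean_intensity = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean_intensity) for i in intensities)
        denominator += sum(intensities)
    if denominator == 0:
        raise ValueError("sum of intensities is zero")
    return 100.0 * numerator / denominator


# Hypothetical measurements, for illustration only (not data from this study).
example_observations = {
    (1, 0, 0): [100.0, 110.0],
    (0, 1, 0): [50.0, 40.0, 60.0],
}
example_rmerge = r_merge(example_observations)
print(f"Rmerge = {example_rmerge:.1f}%")
```

In this toy case the summed absolute deviations are 30 against summed intensities of 360, giving an Rmerge of about 8.3%.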
of the HhaI DNA methyltransferase complex with DNA (Cheng et al., 1993b).
Implications for DNA binding
An inspection of the positions of charged residues along the surface of the protein shows a predominance of lysines and arginines along the concave surface. Eleven positively charged residues are positioned along the surface, Lys16, Lys43, Arg102, Lys149 and Lys150 from the N-terminal domain and Arg217, Lys219, Lys222, Lys225, Lys237 and Lys259 from the C-terminal domain. The side chains of residues 217, 219, 222 and 225 lie along one edge of the concave surface and are not involved in crystal contacts. In contrast, only three negatively charged residues are located along this surface, Asp100, Asp258 and Glu196. The side chains of Asp100 and Asp258 are both involved in salt bridge interactions with Arg191 and therefore do not contribute fully to the overall positive charge along this surface. The positions of this large number of positively charged residues provide strong evidence that the DNA double helix will lie along this concave surface. To further illustrate this, an electrostatic potential surface has been calculated for the molecule using the program FDCALC (Warwicker and Watson, 1982) at pH 6.5 (Figure 8). As expected, the surface shows a significant positively charged region of the molecule along the concave surface, suggesting this to be the position of the DNA. The dipole moments of helices α1, α7, α9 and α3 may also contribute to the positive electrostatic potential surface. β-Glucosyltransferase does not recognize any specific nucleotide sequence, and it may only be necessary for the enzyme to recognize the modified base, 5-hydroxymethylcytosine. It is therefore likely that the protein contacts the DNA through interactions with the phosphate backbone and thus a large positively charged surface along the protein would provide a suitable contact surface.
An attempt to model a double helical DNA structure along this surface produced a large number of bad contacts and the modified nucleotide base in a helical conformation was not able to access the active site of the enzyme. One cannot, however, rule out the possibility that conformational changes in the structures of the protein and/or the DNA may occur upon complex formation, as has been observed in a number of structures
of complexes of DNA binding protein with bound DNA (Schultz et al., 1991; Winkler et al., 1993). The inferred position of the glucose ring, deeply buried in the structure, and the presence of a channel extending from the surface to the glucose binding pocket suggest that significant conformational changes must occur in order for the glucose phosphate bond to be accessible to the hydroxymethyl group of the modified cytosine base. As has been suggested above, these changes might occur to the DNA in the form of the reactive base flipping out of the helical structure into the channel or the protein may undergo further changes. Further analysis of the specific interactions and the mechanism await a structure of the complex of BGT with DNA and UDPG in the presence of a divalent cation.
We have determined the structure of the DNA modification enzyme β-glucosyltransferase both in the presence and absence of the substrate, uridine diphosphoglucose. The enzyme represents a novel structure for DNA binding proteins. From a central cleft, a channel extends into the molecule to form the active site where the substrate binds. Upon binding of the substrate a movement of the two domains relative to each other is observed, resulting in a more closed conformation of the structure and increasing the interactions between the two domains. The cleft between the two domains is lined by positively charged residues providing a surface for DNA interaction.
Materials and methods Crystallization, heavy metal derivatives and data collection The enzyme β-glucosyltransferase was crystallized as described by Freemont and Ruger (1988). The crystals were grown in the presence of the substrate, uridine diphosphoglucose. The density of the crystals was measured using a Ficoll 400 (Pharmacia) gradient based on the method described by Westbrook (1985). The density of the gradient was assessed using droplets of toluene and carbon tetrachloride, together with a crystal of known density. The density of BGT was found to be 1.14 g/ml, corresponding to one molecule in the asymmetric unit and a solvent content of 55%. For the heavy atom derivative soaks using K2AuCl4 and K2HgI4 the substrate was removed by a stepwise procedure where the concentration of the substrate was gradually reduced. The cell dimensions of the substrate-bound crystals were a = 151.92 Å, b = 52.26 Å, c = 52.74 Å, while the desoaked crystals had cell dimensions of a = 152.88 Å, b = 52.25 Å, c = 53.66 Å. The observed change in cell dimensions upon removal of the substrate suggests a possible conformational change in the molecule. Derivatization of the desoaked crystals with K2Pt(NO2)4 was unsuccessful. However, soaking substrate-bound crystals in K2Pt(NO2)4 gave a suitable derivative with cell dimensions similar to the native desoaked crystals. Similarly, a double derivative was obtained by first soaking crystals in K2Pt(NO2)4 and then transferring the crystals to a solution containing K2AuCl4. The substrate-free crystals were stored in the mother liquor solution containing 65% saturated ammonium sulfate, 100 mM MES, pH 5.6, and 0.02% sodium azide. Native data were collected using both a Xentronics and a FAST area detector and a rotating anode generator using graphite-monochromatized CuKα radiation. The derivative data sets were collected on an Enraf Nonius FAST television area detector with no crystal cooling. Frames of 0.1° were collected with the crystal-detector distance set at 90 mm. Images from the area detector were evaluated using the program MADNES (Messerschmidt and Pflugrath, 1987). Further processing and scaling were carried out using the CCP4 program suite (Daresbury, UK). Details of the data collection are given in Table II.
Phasing and model building Two heavy atom sites from the Pt(NO2)4 derivative were obtained from a three-dimensional difference Patterson map. The positions and occupancies of these sites as well as the scale and temperature factors relating the derivative data to the native data were refined using the phase refinement program MLPHARE (CCP4 program suite, Daresbury, UK). The anomalous scattering data were used to determine the absolute configuration of the structure. The remaining derivatives were located using difference Fourier maps calculated with the single isomorphous replacement (s.i.r.) phases. Heavy atom refinement and phase calculations were carried out using all reflections. The final multiple isomorphous replacement (m.i.r.) phases used to calculate a 'best' native Fourier map were obtained using the refined parameters from 21 sites for five heavy atom derivatives. Table III gives the final parameters for each of the heavy atom derivatives. A mean figure of merit of 0.59 was calculated for the final m.i.r. phases. Density modification using the solvent-flattening procedure of Wang (1985) was used to improve the quality of the electron density map. A 2.8 Å resolution electron density map was calculated using the combined phases, weighted by the figure of merit. The polypeptide chain was modelled into the electron density map using the graphics program O (Jones et al., 1991). The initial model consisted of 282 amino acid residues (80%), of which 219 were modelled with side chain atoms (62% of the total residues in the structure).
Table III. Heavy atom refinement statistics for BGT
Derivative               No. of heavy atom sites   Rcullis   Phasing power
K2Pt(NO)4                5                         0.68      2.1
K2AuCl4                  3                         0.65      1.6
K2HgI4                   2                         0.95      0.5
K2Pt(NO2)4               4                         0.71      1.5
K2Pt(NO2)4 + K2AuCl4     7                         0.7       1.7
Heavy atom refinement was carried out using the program MLPHARE. The overall figure of merit after heavy atom refinement and phasing was 0.59 on 8605 reflections from 20 to 2.8 Å. The phases were modified by applying three cycles of solvent flattening using a solvent content of 50%.
Crystallographic refinement of the substrate-free structure The initial model was refined with the molecular dynamics program XPLOR (Brunger et al., 1989). The starting crystallographic R factor for the structure was 45.3% on all reflections from 10.0 to 2.8 Å. After applying simulated annealing, using the slow cooling protocol, and refining the overall temperature factor, the R factor was reduced to 27.7%. Electron density maps were calculated using the Fourier coefficients 3Fobs-2Fcalc and Fobs-Fcalc and the model rebuilt by examining the difference electron density. Additional side chains and some segments of the main chain were added to the model. Subsequent rounds of refinement were carried out, replacing the overall B factor by restrained individual B factors and extending the resolution to 2.2 Å. The current model has an R factor of 19.4% using all reflections in the resolution range 10.0-2.2 Å, with an r.m.s. bond length deviation of 0.01 Å. The β-glucosyltransferase coordinates of the substrate-free structure have been deposited in the Protein Data Bank, Brookhaven.
Crystallographic refinement of the substrate-bound complex The coordinates for the substrate-free complex were initially used for the refinement of the UDPG complex. Rigid body refinement was applied to the substrate-free structure using ligand-bound data to 2.8 Å resolution. The structure was divided into three rigid bodies consisting of residues 1-170, 171-317 and 318-351. The starting R factor was 38.1% on all reflections between 10 and 2.8 Å. Rigid body refinement followed by conjugate-gradient minimization and individual B factor refinement was carried out and the model rebuilt, incorporating the UDP portion of the substrate, by examining the difference electron density maps. Coordinates for ligands were obtained by modelling UDP using QUANTA. Subsequent cycles of conjugate-gradient minimization on all reflections in the resolution range 10-2.2 Å (20 885 reflections) followed by manual rebuilding gave a final R factor for the UDP-bound complex of 19.1%. The r.m.s. bond length deviation is 0.01 Å. The coordinates for the UDP ligand complex have been deposited in the Protein Data Bank, Brookhaven.
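The R factors quoted in the refinement sections are not defined explicitly in the text. As a minimal sketch, assuming the conventional crystallographic definition R = Σ||Fobs| - k|Fcalc|| / Σ|Fobs|, the quantity can be computed as below. The function name `r_factor`, the `scale` parameter k and the three amplitudes are hypothetical and serve only as illustration; the sketch does not reproduce the scaling or weighting actually applied by the refinement program.

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """Conventional R factor: sum(| |Fo| - k|Fc| |) / sum(|Fo|).

    f_obs and f_calc are equal-length sequences of structure factor
    amplitudes; scale is the factor k placing f_calc on the scale of f_obs.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(abs(fo) - scale * abs(fc))
                    for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


# Hypothetical amplitudes, not taken from the BGT data sets.
f_obs = [100.0, 50.0, 25.0]
f_calc = [90.0, 55.0, 25.0]
print(f"R = {100 * r_factor(f_obs, f_calc):.1f}%")
# prints: R = 8.6%
```

On this definition, the drop from 38.1% to 19.1% reported for the UDP-bound complex corresponds to roughly halving the summed amplitude disagreement relative to the summed observed amplitudes, although the two values were computed over different resolution ranges.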
Acknowledgements We would like to thank Simon Phillips and Nobutoshi Ito of the University of Leeds and Alan Wonacott of Glaxo Research for providing us with data collection facilities. We would also like to thank Ursula Aschke for isolating and purifying the protein, Michael Gorman for measuring the crystal density, Jim Warwicker for calculating the electrostatic potential surface and Suhail Islam for assistance with his graphics program, PREPI. W.R. wishes to thank the D.F.G. for funding and A.V. wishes to thank the European Community for support in the form of a postdoctoral research fellowship.
References
Adzhubei,A.A. and Sternberg,M.J.E. (1993) J. Mol. Biol., 229, 472-493.
Anderson,C.M., Stenkamp,R.E. and Steitz,T.A. (1978) J. Mol. Biol., 123, 15-33.
Bennett,W.S. and Steitz,T.A. (1978) Proc. Natl Acad. Sci. USA, 75, 4848-4852.
Bennett,W.S. and Steitz,T.A. (1980) J. Mol. Biol., 140, 211-230.
Bernards,A., VanHarten-Loosbroek,N. and Borst,P. (1984) Nucleic Acids Res., 12, 4153-4170.
Brunger,A.T., Kuriyan,J. and Karplus,M. (1987) Science, 235, 458-460.
Cheng,X., Kumar,S., Posfai,J., Pflugrath,J.W. and Roberts,R.J. (1993a) Cell, 74, 299-307.
Cheng,X., Kumar,S., Sha,M. and Roberts,R.J. (1993b) Acta Crystallogr., A49 (Suppl.), 61.
Cox,G.S. and Conway,T.W. (1973) J. Virol., 12, 1279-1287.
Dharmalingam,K. and Goldberg,E.B. (1979) Virology, 96, 393-403.
Eklund,H., Samama,J.P. and Jones,T.A. (1984) Biochemistry, 23, 5982-5996.
Freemont,P.S. and Ruger,W. (1988) J. Mol. Biol., 203, 525-526.
Freemont,P.S., Friedman,J.M., Beese,L.S., Sanderson,M.R. and Steitz,T.A. (1988) Proc. Natl Acad. Sci. USA, 85, 8924-8928.
Gommers-Ampt,J.H., VanLeeuwen,F., DeBeer,A.L.J., Vliegenthart,J.F.G., Dizdaroglu,M., Kowalak,J.A., Crain,P.F. and Borst,P. (1993) Cell, 75, 1129-1136.
Gram,H. and Ruger,W. (1985) EMBO J., 4, 257-264.
Jones,T.A., Zou,J.-Y., Cowan,S. and Kjeldgaard,M. (1991) Acta Crystallogr., A47, 110-119.
Josse,J. and Kornberg,A. (1962) J. Biol. Chem., 237, 1968-1976.
Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577-2637.
Kang,C.H., Shin,W.-C., Yamagata,Y., Gokcen,S., Ames,G.F.-L. and Kim,S.-H. (1992) J. Biol. Chem., 266, 23893-23899.
Kornberg,S.R., Zimmerman,S.B. and Kornberg,A. (1961) J. Biol. Chem., 236, 1487-1493.
Kraulis,P.J. (1991) J. Appl. Crystallogr., 24, 946-950.
Labahn,J., Granzin,J., Schluckebier,G., Robinson,D.P., Jack,W.E., Schildkraut,I. and Saenger,W. (1994) Proc. Natl Acad. Sci. USA, in press.
Lamm,N., Wang,Y., Mathews,C.K. and Ruger,W. (1988) Eur. J. Biochem., 172, 553-563.
Lehman,I.R. and Pratt,E.A. (1960) J. Biol. Chem., 235, 3254-3259.
Mathews,C.K., Kutter,E.M., Mosig,G. and Berget,P.B. (eds) (1983) Bacteriophage T4. American Society for Microbiology, Washington.
Messerschmidt,A. and Pflugrath,J.W. (1987) J. Appl. Crystallogr., 20, 306-315.
Mokulskaya,T.D., Gorlenko,A.M., Zumchuk,L.A., Bogdanova,E.S., Mokulskii,M.A., Goldfarb,D.M. and Khesin,R.B. (1966) Biokhimiya, 31, 749-759.
Moore,M.H., Gulbis,J.M., Dodson,E.J., Demple,B. and Moody,P.C.E. (1994) EMBO J., 13, 1495-1501.
Mosig,G. and Eiserling,F. (1988) In Calendar,R. (ed.), The Bacteriophages. Plenum Publishing Corp., New York, pp. 521-606.
Oh,B.-H., Pandit,J., Kang,C.-H., Nikaido,K., Gokcen,S., Ames,G.F.-L. and Kim,S.-H. (1993) J. Biol. Chem., 268, 11348-11355.
Rabussay,D. (1982) In Cohen,P. and van Heyningen,S. (eds), Molecular Action of Toxins and Viruses. Elsevier Biomedical Press, pp. 219-331.
Remington,S., Wiegand,G. and Huber,R. (1982) J. Mol. Biol., 158, 111-152.
Revel,H.R. (1983) In Mathews,C.K., Kutter,E.M., Mosig,G. and Berget,P.B. (eds), Bacteriophage T4. American Society for Microbiology, Washington, DC, pp. 156-165.
Rossmann,M.G., Liljas,A., Branden,C.-I. and Banaszak,L.J. (1975) In Boyer,P.D. (ed.), The Enzymes. Academic Press, New York, Vol. 11, pp. 61-102.
Ruger,W. (1978) Eur. J. Biochem., 88, 109-117.
Sack,J.S., Saper,M.A. and Quiocho,F.A. (1989a) J. Mol. Biol., 206, 171-191.
Sack,J.S., Trakhanov,S.D., Tsigannik,I.H. and Quiocho,F.A. (1989b) J. Mol. Biol., 206, 193-207.
Sharff,A.J., Rodseth,L.E., Spurlino,J.C. and Quiocho,F.A. (1992) Biochemistry, 31, 10657-10663.
Schultz,S.C., Shields,G.C. and Steitz,T.A. (1991) Science, 253, 1001-1007.
Skarzynski,T., Moody,P.C.E. and Wonacott,A.J. (1987) J. Mol. Biol., 193, 171-187.
Spurlino,J.C., Lu,G.-Y. and Quiocho,F.A. (1991) J. Biol. Chem., 266, 5202-5219.
Tomaschewski,J., Gram,H., Crabb,J.W. and Ruger,W. (1985) Nucleic Acids Res., 13, 7551-7568.
Wang,B.C. (1985) Methods Enzymol., 115, 90-112.
Warner,H.R., Snustad,D.P., Jorgensen,S.E. and Koerner,J.F. (1970) J. Virol., 5, 700-708.
Warwicker,J. and Watson,H.C. (1982) J. Mol. Biol., 157, 671-679.
Westbrook,E.M. (1985) Methods Enzymol., 114, 187-196.
Wiegand,G., Remington,S., Deisenhofer,J. and Huber,R. (1984) J. Mol. Biol., 174, 205-219.
Wierenga,R.K., De Maeyer,M.C.H. and Hol,W.G.J. (1985) Biochemistry, 24, 1346-1357.
Wierenga,R.K., Terpstra,P. and Hol,W.G.J. (1986) J. Mol. Biol., 187, 101-107.
Winkler,M. and Ruger,W. (1993) Nucleic Acids Res., 21, 1500.
Winkler,F.K., Banner,D.W., Oefner,C., Tsernoglou,D., Brown,R.S., Heathman,S.P., Bryan,R.K., Martin,P.D., Petratos,K. and Wilson,K.S. (1993) EMBO J., 12, 1781-1795.
Wu,R. and Geiduschek,E.P. (1975) J. Mol. Biol., 96, 539-562.
Wyatt,G.R. and Cohen,S.S. (1952) Nature, 170, 1072-1073.
Zimmerman,S.B., Kornberg,S.R. and Kornberg,A. (1962) J. Biol. Chem., 237, 512-518.
Zou,J., Flocco,M.M. and Mowbray,S.L. (1993) J. Mol. Biol., 233, 739-752.
Received on March 2, 1994; revised on May 20, 1994