Journal of Biomolecular Structure & Dynamics, ISSN 0739-1102 Volume 22, Issue Number 6, (2005) ©Adenine Press (2005)
Homology Modeling Based Solution Structure of Hoxc8-DNA Complex: Role of Context Bases Outside TAAT Stretch
Sujata Roy, Srikanta Sen*
Molecular Modeling Section, Chembiotek Research International, Bengal Intelligent Park Building, Tower B, Block EP & GP, Salt Lake Electronics Complex, Calcutta 70009, India
http://www.jbsdonline.com
Abstract
Neither the 3D structure of Hoxc8 nor that of the Hoxc8-DNA complex is known. The repressor protein Hoxc8 binds to the TAAT stretch of the promoter of the osteopontin gene and modulates its expression. Overexpression of the osteopontin gene is related to diseases such as osteoporosis, multiple sclerosis, and cancer. In this paper we propose a 3D structure of the Hoxc8-DNA complex obtained by homology modeling and molecular dynamics (MD) simulation in explicit water. The crystal structure (9ant.pdb) of the Antennapedia homeodomain in complex with its DNA sequence was chosen as the template based on (i) high sequence identity (85% for the protein and 60% for the DNA) and (ii) the presence of the TAAT stretch in interaction with the protein. The resulting model was refined by MD simulation for 2.0ns in explicit water. This refined model was then characterized in terms of its structural and interactional features to improve our understanding of the mechanism of Hoxc8-DNA recognition. The interaction pattern shows that the residues Ile195, Gln198, and Asn199, and the bases S2-4TAATG8, are most important for recognition, suggesting the stretch TAATG as the ‘true recognition element’ in the present case. A strong and long-lived water bridge connecting Gln198 and the base of S1-C7, which is complementary to S2-G8, was observed. Our predicted model of the Hoxc8-DNA complex provides features that are consistent with the available experimental data on Hoxc8 and with the general features of other homeodomain-DNA complexes. The predictions based on the model are also amenable to experimental verification.
Key words: Osteopontin-promoter, Repressor protein, Molecular recognition, H-bonds.
Introduction In biological systems, protein-protein and protein-nucleic acid recognition constitutes the molecular basis for many cellular processes like signal transduction, regulation of gene transcription et cetera. Approaches like functional assays and mutational studies provide useful information on the recognition process of such complexes. However a detailed understanding of the physical mechanism of such a biomolecular complex requires the knowledge of the 3D structures of the relevant complex and in most cases of Protein-DNA complexes, the 3D structure is not available. Homology modeling has proved to be useful in generating feasible 3D structures based on sequence similarity with a suitable template protein with known 3D structure (1-5). Currently homology models have also started playing important role in the virtual screening based drug discovery (6-9). Relaxation of 3D models of biomolecular complexes by Molecular Dynamics Simulation in aqueous solvent is known to be useful in improving our understanding of its solution structure and the mechanism of the molecular recognition (10-23). Recently homology modeling has been applied to model protein-DNA complexes like the transcription factor Sp1-DNA complex (24). However sequence similarity of the DNA duplex was not considered in that study. A more realistic approach to use
*Fax: 0091 33 2357 0342; Email: [email protected]
2 Roy and Sen
homology modeling of protein-DNA complexes would be to use the sequence similarity of not only the protein part but also the DNA part. In the present work we have followed this approach. Osteopontin is a phosphorylated glycoprotein that is expressed by the activated macrophages and is present in the extra-cellular matrix (ECM), at the sites of inflammation, ECM of mineralized tissues, and mediates cell-matrix and cell-cell interactions (25, 26). Recently, it has been established that the over expression of the osteopontin gene is related to a number of diseases like osteoporosis, multiple sclerosis, cancer et cetera (25-35). The expression of the osteopontin gene is activated by the bone morphogenic proteins (BMPs) and the signal transduction involves the smad proteins that act as the mediators for the BMP signal (36-41). Hoxc8 is a DNA binding protein that binds to the promoter region of the osteopontin gene and represses the gene transcription. Upon phosphorylation by the BMP receptors, smad1, one of the mediator proteins interacts with another mediator protein smad4 and translocates into the nucleus of the cell. Entering the nucleus smad1 interacts with the repressor protein Hoxc8 and dislodges the bound Hoxc8 from its DNA binding element resulting in the induction of gene transcription (41, 42). The repressor protein Hoxc8 is a homeodomain protein and homeodomain proteins are one of the key families of the eukaryotic DNA-binding motifs that play a central role in gene regulation (27, 43-46). The promoter region of the osteopontin gene is known to consist of several putative homeodomain binding response elements like several copies of TAAT and TATT et cetera (41). However, it has been shown experimentally that among these the Hoxc8 binds only to the response element with the sequence context AGT TAAT GACATC indicating the importance of the sequence context for the recognition process (41). 
In our work we have used the same DNA sequence (5′-AGTTAATGACATC-3′) as was used in the experiment, after removing the lone bases at the ends of the template DNA (41). However, as the 3D structure of neither Hoxc8 nor the Hoxc8-DNA complex is known, no detailed information on the Hoxc8-DNA interaction regarding the molecular recognition and stabilization of the complex is available. The purpose of the present work is to use homology modeling and molecular dynamics simulation techniques to address two major issues: (i) generating a realistic 3D structure of the Hoxc8-DNA complex and (ii) investigating the role of the bases in the context of the recognition element TAAT. In order to develop a homology-based 3D model, we have selected the crystal structure (9ant.pdb) of the Antennapedia homeodomain in complex with its DNA sequence as the most appropriate structural template, based on the high degree of sequence identity between the proteins as well as the DNA parts involved (47, 48). The generated structure was subsequently relaxed by MD simulation in explicit water. We find this approach extremely useful in generating a reasonable 3D structure of the DNA-protein complex in the present case. Our predicted structure shows good agreement with the available experimental data on Hoxc8 and is consistent with the general features of other known homeodomain-DNA complexes (41-50).
Methods
Construction of the Initial Hoxc8-DNA Complex
The amino acid sequence of the wild type Hoxc8 protein (human) was obtained from the NCBI database, and the HOMOLOGY module of Insight-II (Accelrys Inc.) was used to identify a suitable homology-based template crystal structure. We selected 9ant.pdb (48) as the best template for the Hoxc8-DNA complex on the following grounds.
(I) It is the structure of a DNA-protein complex in which the template protein sequence is about 85% identical to the Hoxc8 sequence (see Figure 1a).
(II) The template DNA base sequence is about 60% identical to the target DNA sequence (see Figure 1b).
(III) Most importantly, the template DNA contains the TAAT stretch that interacts directly with the protein in the template, and the amino acid residues interacting with the TAAT bases are common to the template and Hoxc8 sequences.
Such high sequence identity is generally believed to enable homology modeling to generate a highly realistic model (6).
3 Running Title Needed
Figure 1: a). Comparison of the aligned sequences of the template protein, Antennapedia homeodomain (9ant.pdb) and the human Hoxc8 protein. The conserved residues of the Hoxc8 protein are shown in capital letter. Variable amino acid residues are shown as shaded. b). The aligned sequences of the template DNA and the Hoxc8 binding DNA are compared. The nucleotides that are different from the template are shown with shade.
In the process of model building, we first prepared the homology-based model of the Hoxc8 protein using the HOMOLOGY module of the Insight-II software. This generated the atomic coordinates of the Hoxc8 protein part, keeping the atomic coordinates of the residues common to the template and the target intact. The atomic coordinates of the template DNA were then added to generate a Hoxc8-DNA complex in which the protein part corresponds to the target sequence but the DNA part remains identical to the template DNA. The lone bases at the ends of the DNA in the template were removed. In the next step, we removed all the bases that differ from our target base sequence, keeping the backbone part intact, and the atomic coordinates of the corresponding target bases were built by the BUILD facility in CHARMM (51) using the local DNA backbone as the template. The BUILD facility utilizes the topology and relevant parameters to generate the missing coordinates of atoms in the topology of a molecule (51). Thus a rationally constructed starting model of the target Hoxc8-DNA complex was obtained. It may be mentioned here that the HOMOLOGY module generates only the 56-residue-long homeodomain part (Arg153 to Asn208) of Hoxc8. This initial structure was then energy minimized by 5000 steepest descent steps in CHARMM to eliminate any covalent distortion or steric conflicts present in the modeled structure.
Solvation and Energy Minimization of Hoxc8-DNA Complex
Eighteen Na+ counter-ions were added to the energy-minimized modeled complex to make the system electro-neutral. The counter-ions were placed following two different protocols. For the phosphate groups of the DNA that are accessible to solvent, a sodium ion was placed at a distance of 6.0Å from the phosphorus atom, along the line starting from the phosphorus atom and bisecting the line joining the phosphate oxygen atoms. In this way we could place only 8 Na+ ions.
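The geometric rule for the first counter-ion protocol can be illustrated with a short sketch. This is only an illustration and not the CHARMM script used in the work: the function name `place_counterion` and the coordinates are hypothetical, and the two phosphate oxygens are assumed to be the non-bridging ones.

```python
import numpy as np

def place_counterion(p, o1, o2, distance=6.0):
    """Return a point `distance` Å from P along the line from P through
    the midpoint of the two phosphate oxygens (the O-P-O bisector)."""
    p, o1, o2 = (np.asarray(v, dtype=float) for v in (p, o1, o2))
    direction = 0.5 * (o1 + o2) - p          # P -> midpoint of O1 and O2
    norm = np.linalg.norm(direction)
    if norm == 0.0:
        raise ValueError("P coincides with the O1-O2 midpoint")
    return p + distance * direction / norm

# Made-up coordinates (Å), used only to exercise the function.
p = np.array([0.0, 0.0, 0.0])
o1 = np.array([1.0, 1.0, 0.0])
o2 = np.array([1.0, -1.0, 0.0])
ion = place_counterion(p, o1, o2)
```

In a real setup the solvent accessibility of each phosphate group would have to be checked first, which this sketch does not attempt.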
The system was then solvated by placing it at the center of a pre-equilibrated TIP3P water sphere of radius 25Å (52). As the largest radius of the whole complex, excepting the end base pairs of the DNA, is 19.8Å and that of the protein alone is 17.0Å, the water sphere of 25Å effectively leaves a solvent layer of thickness 5Å or more. Any water molecule having its oxygen atom within a distance of 2.8Å from any non-H atom of the complex was deleted. Each of the other 10 sodium ions was placed by replacing the water molecule whose oxygen atom had the highest electrostatic energy. No two sodium ions were placed closer than 5Å to each other, and all the counter-ions were kept unconstrained during energy minimization and MD simulation. The effect of the bulk water that would be present outside the solvent sphere was taken into account through mean-field effects (53). This not only minimized the finite boundary effects but also prevented the evaporation of water molecules from the surface of the solvent sphere. The end base pairs were constrained harmonically with a force constant of 10 kcal/mol/Å² to their H-bonded base-paired configurations in order to prevent unrealistic end effects such as base-pair opening and buckling. On the same grounds, we constrained the positions of the alpha carbon atoms of the end residues relative to the closest phosphorus atom of the DNA in the same way. This partially mimics the effects of the covalent continuity of both ends as in the real case. The resulting system was then energy minimized again by 5000 steepest descent steps, keeping the Hoxc8-DNA complex fixed so that only the water molecules could reorient themselves to eliminate any steric conflict with the complex. As the Ewald method (54, 55) cannot be employed in spherical systems, we used the standard spherical cut-off method with a cut-off value of 12Å, along with the force shift method for the electrostatics, in which the interaction energy was smoothly shifted to zero at a cut-off distance of 11.0Å (56, 57). This cut-off combination is known to produce stable and reliable trajectories (58, 59) and thus appears to be suitable for relaxing the structure of a model complex.
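To make the idea of a force-shifted cut-off concrete, the following sketch evaluates a pairwise Coulomb energy that goes smoothly to zero at the cut-off. It assumes the commonly used force-shift form, E(r) = k q_i q_j / r × (1 − r/r_c)², and a Coulomb constant of about 332.06 kcal·Å/(mol·e²); both are assumptions made for illustration, and the function name is hypothetical rather than part of CHARMM.

```python
def force_shifted_coulomb(q_i, q_j, r, r_cut=11.0, k_coulomb=332.06):
    """Force-shifted Coulomb energy (kcal/mol) for charges in units of e
    and distance r in Å. Assumed form: k*qi*qj/r * (1 - r/r_cut)**2,
    so both the energy and the force vanish at r = r_cut."""
    if r <= 0.0:
        raise ValueError("distance must be positive")
    if r >= r_cut:
        return 0.0
    return k_coulomb * q_i * q_j / r * (1.0 - r / r_cut) ** 2

# Energy of a +1/-1 pair at a few separations approaching the cut-off.
energies = [force_shifted_coulomb(1.0, -1.0, r) for r in (3.0, 6.0, 10.9, 11.0)]
```

The point of the shift is that no discontinuity in energy or force appears as a pair crosses the cut-off, which helps keep trajectories stable.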
A dielectric constant of 1.0 was used for the electrostatic interactions during energy minimization and MD simulation. A further energy minimization of 5000 SD steps was performed without fixing any atom of the complex, and the resulting structure was then relaxed by MD simulation at 300K.
Protocol for MD Simulation of Hoxc8-DNA Complex in Aqueous Solution
MD simulation was performed using the leapfrog algorithm (60) with an integration time step of 2.0fs, employing the software CHARMM version 28 along with the combined nucleic acid and protein parameter set version 27 (61). All bond lengths involving hydrogen were kept fixed using the SHAKE algorithm (62, 63). The pair list of non-bonded interactions was updated every 20 steps. During the MD simulation we first thermalized the system by assigning velocities drawn from a Gaussian distribution at 100K to the atoms of the system and then gradually heated the system over a time period of 10.0ps by assigning velocities in steps at higher temperatures until it reached the target temperature of 300K. The system was subsequently equilibrated at 300K over the next 10.0ps by assigning velocities from a Gaussian distribution at 300K. The simulation was then continued, and atomic velocities were rescaled only if the average temperature fell outside a window of ±5K, to maintain the temperature around 300K. The simulation was continued for a total of 2.0ns and the coordinate frames were saved at intervals of 0.4ps for analysis.
Results and Discussions
Structural Relaxation of the Hoxc8-DNA Complex
Structural relaxation of the DNA-Hoxc8 complex was monitored by analyzing the time evolution of the RMSD of the frames with respect to the initial structure, following the relation

\[ \mathrm{RMSD}(t) = \mathrm{Min}_{trans,rot} \left[ \frac{1}{N} \sum_{i=1}^{N} \left| \mathbf{r}_i(t) - \mathbf{r}_i(t_0) \right|^2 \right]^{1/2} \qquad [1] \]

where r_i(t) and r_i(t_0) represent the position coordinates of the ith atom of the molecule at time t and at t_0 (=0), the beginning of the trajectory, respectively, and N is the number of atoms considered. Min_{trans,rot} indicates that the coordinate set of each frame from the trajectory is translated and rotated to minimize the RMSD with respect to the initial coordinate set. In the present work all the non-hydrogen atoms in the entire complex were considered for superposition and were also used for computing the RMSD for the complex. In the individual cases of the protein and the DNA, the whole complex was superposed in the same way, but the non-H atoms of only Hoxc8 and only the DNA were considered separately for computing the RMSD values in the respective cases.
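The superposition-then-RMSD calculation described above can be sketched with NumPy. The optimal rotation is obtained here with the Kabsch algorithm, which is a standard choice but an assumption on our part, since the text does not state which fitting procedure CHARMM applied; the function name and the test coordinates are hypothetical.

```python
import numpy as np

def rmsd_after_superposition(coords, ref):
    """RMSD (same units as input) between two (N, 3) coordinate sets after
    optimal translation and rotation of `coords` onto `ref` (Kabsch)."""
    coords = np.asarray(coords, dtype=float)
    ref = np.asarray(ref, dtype=float)
    x = coords - coords.mean(axis=0)       # remove translation
    y = ref - ref.mean(axis=0)
    h = x.T @ y                            # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T)) # guard against reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    x_fit = x @ rot.T                      # rotate every atom
    return float(np.sqrt(np.mean(np.sum((x_fit - y) ** 2, axis=1))))

# A rigidly rotated and shifted copy should give an RMSD of about zero.
ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
theta = 0.7
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
moved = ref @ rz.T + np.array([1.0, 2.0, 3.0])
rigid_rmsd = rmsd_after_superposition(moved, ref)
```

To reproduce the component-wise curves, one would fit on all non-H atoms of the complex and then evaluate the deviation over the protein or DNA atoms only, which requires separating the fit from the RMSD evaluation.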
Figure 2a demonstrates that during MD simulation the whole complex gradually relaxed from its initial model structure with increasing RMSD and finally the RMSD remained stable around an average value of 2.0Å over a considerable period of the later part of the trajectory. This indicates that the complex has reached a stable average structure. Moreover, an average RMSD of only 2.0Å implies that the relaxed structure remained reasonably close to the template crystal structure. In addition, Figure 2b shows that not only the complex as a whole but also the molecular components (Hoxc8 and DNA) have individually reached stable equilibrium structures. Average Structure and Its Characterization In order to ensure proper relaxation of the complex we have used only the last 500ps of the trajectory in computing the average structure of the complex. This average structure was then energy minimized by 1000 SD steps with distance dependent dielectric constant in order to remove the distortion induced into the structure due to averaging. It may be mentioned that extensive energy minimization was not done to avoid any artifact that might arise in the process. This energy-minimized average structure of the complex is considered as its representative 3D structure in aqueous environment and is shown in Figure 3. The structural features of the model complex are described below.
Figure 2: a) Time evolution of RMSD of the coordinates of the complex with respect to its initial structure (excluding the end base pairs of DNA and the end residues of Hoxc8). b). Time evolution of the individual molecular components of the complex, DNA (dotted line) and protein (solid line). In computing the RMSD values superposition of the entire complex was used and data at an interval of 2.0ps was used to increase the clarity of the plot.
Figure 3: The stereo view of the energy-minimized structure of Hoxc8-DNA complex (without H-atoms) averaged over the last 500ps. The rectangular yellow ribbon represents the S1 strand of the DNA and the other strand S2 containing the TAAT stretch is shown by saffron ribbon where the TAAT stretch is shown in violet. Hoxc8 protein is shown as solid oval sky blue ribbon where ‘helix-3’ is shown in red.
Hoxc8 Protein Part of the Complex: The modeled Hoxc8 consists of three distinct helices, as in the other known homeodomain proteins including the present template. The first helix (helix-1) extends from Tyr156 to Asn171, the second helix (helix-2) consists of the residues Arg176-Leu186, and the residues Arg191-Asn208 constitute the third helix (helix-3). The third helix is in contact with the DNA major groove (Fig. 3). It is quite clear from the sequence comparison that both the modeled protein and the DNA have large homology with the respective parts of the template complex. Thus, comparison of the structural differences with respect to the template is of interest, as it should give us some idea about the changes induced by the presence of the altered residues and the physical basis of the differences observed. The residue-wise RMSD of the protein was computed by considering the initial energy-minimized model coordinates (before relaxation by MD simulation) as the representative structure of the template complex. However, these initial coordinates contain the residues of Hoxc8 that were not present in the template, and thus those residues were excluded to make the comparison meaningful. We superimposed the initial coordinate set on the average structure by selecting only the conserved residues and nucleotides and then computed separately the RMSD of only the conserved residues of Hoxc8. The RMSD of the individual residues with reference to the template indicates the locations where significant conformational changes have occurred. It is quite apparent in Figure 4a that Thr155, Met202, and Glu207 are associated with significant positional changes compared to the template crystal structure. Detailed analysis shows that the large RMSD values of the side chain (7.2Å) and the backbone part (4.3Å) of the residue Thr155 arise because this residue is exposed to the solvent and, not being a part of any secondary structure, is conformationally rather free. In the case of Met202 the high RMSD (5.3Å) has been identified to be due to a large torsional change (from 100.5° to -87.5°) of the side chain about the bond CG-SD. Moreover, the methyl group of its side chain has significantly less solvent exposure in the relaxed structure compared to the reference model, suggesting that this change was probably caused by the solvophobic effect due to the explicit presence of water. On the other hand, for Glu207 the large RMSD (5.8Å) arose from a torsional change (177.3° to 68.5°) about the bond CB-CG. This also resulted in enhanced solvent exposure. As Glu207 is a charged residue, this conformational change seems to be induced by hydrophilic interaction.
Figure 4: a) The residue-wise RMSD of Hoxc8 in the energy-minimized trajectory averaged structure with respect to the template, excluding the non-conserved residues. Different symbols are used for the backbone and the side chain. b) The average backbone torsion angles ε and ζ of strand S2 of the DNA in the energy-minimized trajectory averaged Hoxc8-DNA complex, compared to those [ε (∆) and ζ] of the energy-minimized template crystal structure.
DNA Part of the Complex: DNA conformations are characterized by the standard torsional angles α, β, γ, δ, ε, ζ (backbone) and χ (glycosidic) (64).
We have calculated the torsion angles for each nucleotide of both the DNA strands of the representative structure (Fig. 3) to characterize its conformation. The calculation shows that for the DNA strand S1, the average value of each torsion angle is reasonably similar for all the nucleotides over the entire strand (data not shown), suggesting a regular helical form adopted by S1. In the case of the other strand (S2) the torsion angles α, γ, δ and χ are found to be reasonably stable for the nucleotides over the strand. However, significantly different values occurred for the angles ζ (= -170°) and ε (= 138°) of the nucleotide S2-T7, while the other nucleotides have values in the ranges (-88° to -114°) and (90° to 112°) respectively (Fig. 4b). It may be pointed out that S2-T7 is a part of the recognition site TAAT. Thus, these considerable differences in the angles ζ and ε for the nucleotide S2-T7 compared to those of the other nucleotides in the same strand seem to correspond to the bend introduced in the S2 strand by the insertion of the third helix of Hoxc8 into the major groove of the DNA. Comparison shows that in the template crystal structure the anomalous value of ζ was associated with S2-T4, which differs from the case of the Hoxc8-DNA complex (Fig. 4b). Moreover, in the Hoxc8-DNA model, the ε angle associated with S2-T7 is also anomalous. Thus, the positions of the kink are different in the template and the Hoxc8-DNA model, indicating that the effect may be base-sequence dependent. Further comparison shows that the positional changes of the conserved bases S2-4TAATG8 that are important for interaction with Hoxc8 are rather small as judged by base-wise RMSDs, which are limited to the range 1.0Å to 1.7Å. On the other hand, relatively larger positional changes, within the range 1.8Å to 2.4Å, were found in the respective backbone regions.
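For readers who wish to recompute backbone torsions such as ε and ζ from coordinates, a minimal dihedral-angle sketch is given below. By the standard nucleic acid convention, ε is defined by the atoms C4′-C3′-O3′-P(i+1) and ζ by C3′-O3′-P(i+1)-O5′(i+1). The function name and the coordinates are hypothetical, and this is not the analysis routine used in the work.

```python
import numpy as np

def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, in the range -180 to 180) defined by
    four points, e.g. C4', C3', O3', P(i+1) for the epsilon torsion."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b1 = p1 - p0
    b2 = p2 - p1
    b3 = p3 - p2
    n1 = np.cross(b1, b2)                  # normal of the first plane
    n2 = np.cross(b2, b3)                  # normal of the second plane
    y = np.linalg.norm(b2) * np.dot(b1, n2)
    x = np.dot(n1, n2)
    return float(np.degrees(np.arctan2(y, x)))

# Made-up points: a trans arrangement and a perpendicular one.
trans = dihedral_deg([1, 0, 0], [0, 0, 0], [0, 0, 1], [-1, 0, 1])
perpendicular = dihedral_deg([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1])
```

Because torsion angles are periodic, averages over a trajectory should be taken with care near ±180°, for example by averaging unit vectors rather than raw angles.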
Figure 5: Protein-DNA contact map. Different symbols represent the contacts between Hoxc8 and the strands S1 and S2 of the DNA, respectively.
Contact Map at the Hoxc8 Protein-DNA Interface: The contact map of the Hoxc8-DNA interface was generated on the basis of an inter-molecular distance cutoff of 3.0Å at the atomic level in the energy-minimized trajectory averaged structure of the Hoxc8-DNA complex modeled here (Fig. 5). Owing to the surface complementarity between helix-3 and the DNA major groove, there is a significant number of contacts between the atoms of both strands of the DNA and the protein at the Hoxc8-DNA interface. The other residues in contact with the DNA belong to helix-1 and helix-2, and it is found that, in contrast to helix-3, helix-1 is in contact with strand S2 only while helix-2 is in contact with strand S1 only. For the DNA, the contact with Hoxc8 is limited to the nucleotides in the ranges S1-T4 to S1-C7 and S2-T4 to S2-G8.
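A distance-cutoff contact map of this kind can be sketched as follows. The function name, the residue labels, and the coordinates are hypothetical and serve only to show the logic: any protein atom within 3.0Å of any DNA atom marks a contact between the corresponding residue and nucleotide.

```python
import numpy as np

def contact_pairs(protein_xyz, protein_res, dna_xyz, dna_res, cutoff=3.0):
    """Return sorted unique (residue, nucleotide) pairs having at least one
    inter-molecular atom pair within `cutoff` Å."""
    protein_xyz = np.asarray(protein_xyz, dtype=float)
    dna_xyz = np.asarray(dna_xyz, dtype=float)
    # All protein-DNA atom-atom distances, shape (n_protein, n_dna).
    dist = np.linalg.norm(protein_xyz[:, None, :] - dna_xyz[None, :, :], axis=-1)
    i_idx, j_idx = np.nonzero(dist <= cutoff)
    return sorted({(protein_res[i], dna_res[j]) for i, j in zip(i_idx, j_idx)})

# Made-up atoms: one close pair and one distant pair.
protein_xyz = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
protein_res = ["Asn199", "Tyr156"]
dna_xyz = [[2.5, 0.0, 0.0], [20.0, 0.0, 0.0]]
dna_res = ["S2-A6", "S1-T4"]
contacts = contact_pairs(protein_xyz, protein_res, dna_xyz, dna_res)
```

For large systems a neighbor-search structure would be preferable to the full distance matrix, but for a 56-residue domain and a short duplex the direct approach is adequate.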
Characterization of the DNA-protein Interaction Pattern of the Modeled DNA-Hoxc8 Complex
Interaction energies between the DNA and the Hoxc8 protein were computed from the non-bonded interactions between them using CHARMM, estimating the electrostatic and the van der Waals interactions separately in order to understand their physical nature. The electrostatic interactions were computed using a distance dependent dielectric constant, as the trajectory was generated with explicit water. In this report we have considered the interaction between Hoxc8 and the DNA bases as the specific interaction and that between Hoxc8 and the DNA backbone as the non-specific interaction.
Interaction of Hoxc8 with DNA Bases: The interaction pattern in Figure 6a clearly indicates that there are only a few significant favorable interactions between the S1 strand of the DNA and Hoxc8. S1-G5 (-1.81 kcal/mol) and S1-A8 (-0.94 kcal/mol) interact favorably, while S1-T4 (2.44 kcal/mol) and S1-C7 (0.87 kcal/mol) are involved in unfavorable interactions with Hoxc8. Further analysis shows that S1-G5 interacts favorably mainly with Lys194, whereas S1-T4 interacts with the same residue Lys194 through an unfavorable electrostatic interaction.
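As an illustration of an electrostatic energy evaluated with a distance dependent dielectric, the sketch below assumes the simple form ε(r) = r (r in Å), so that each pair contributes k q_i q_j / r², with k of about 332.06 kcal·Å/(mol·e²). The exact dielectric scaling and constant used in the CHARMM calculation are not given in the text, so both are assumptions, as are the function name and the test charges.

```python
import numpy as np

def rdie_electrostatic_energy(q_a, xyz_a, q_b, xyz_b, k_coulomb=332.06):
    """Electrostatic energy (kcal/mol) between two atom groups, assuming a
    distance dependent dielectric eps(r) = r, i.e. k*qi*qj/r**2 per pair."""
    q_a = np.asarray(q_a, dtype=float)
    q_b = np.asarray(q_b, dtype=float)
    xyz_a = np.asarray(xyz_a, dtype=float)
    xyz_b = np.asarray(xyz_b, dtype=float)
    r = np.linalg.norm(xyz_a[:, None, :] - xyz_b[None, :, :], axis=-1)
    return float(k_coulomb * np.sum(np.outer(q_a, q_b) / r ** 2))

# Made-up example: unit charges of opposite sign separated by 2 Å.
energy = rdie_electrostatic_energy([1.0], [[0.0, 0.0, 0.0]],
                                   [-1.0], [[2.0, 0.0, 0.0]])
```

Summing such terms over the atoms of one residue and one base, or one backbone unit, gives a per-nucleotide decomposition of the kind plotted in the interaction patterns.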
Figure 6: The patterns of the interaction (specific) between Hoxc8 and the bases of the DNA strands (a) S1 and (b) S2 of the DNA. The patterns of the interaction (non-specific) between Hoxc8 and the backbone of DNA strands (c) S1 and (d) S2 are also presented.
The bases in the stretch 4TAATG8, which contains the recognition element TAAT on the S2 strand of the DNA, are found to interact favorably with the protein (Fig. 6b). S2-T4 (-1.02 kcal/mol) interacts weakly, through several small contributions from nearby residues. Similarly, S2-A5 interacts weakly with Ile195, Trp196, Asn199, and Met202, giving a total of -1.36 kcal/mol. S2-A6 interacts with the protein most strongly (-12.19 kcal/mol), forming two hydrogen bonds with Asn199 (see the H-bond list in Table I). S2-T7 interacts favorably (-3.98 kcal/mol) with Ile195 and Gln198, mostly through van der Waals interactions (Fig. 7a,b). It may be pointed out that the base S2-G8 interacts significantly (-2.11 kcal/mol) with Hoxc8 even though it lies outside the recognition region, towards the 3′ end of the strand S2. The details of the interaction between S2-G8 and the protein at the atomic level are shown in Figure 7c. There is a weak H-bond between S2-G8:O6 and Gln198:NE2-HE22, with a distance DAH=3.3Å and an angle θDHA=126°.
Figure 7: Stereo views highlighting the atomic details of (a) the van der Waals interaction between S2-T7 and Gln198, (b) the van der Waals interaction between S2-T7 and Ile195 and (c) the weak H-bonding between Gln198:NE2 and S2-G8:O6 with DAH=3.3Å and θDHA=126°. The DNA backbones are shown as pink rectangular ribbons and Hoxc8 is represented as an oval cyan ribbon. The side chains of Ile195 and Gln198 and the base S2-T7 are represented by CPK models in (a) and (b), while in (c) a ball and stick model has been used. Other side chains and bases are removed for clarity. Carbon, nitrogen, oxygen and hydrogen atoms are shown in black, blue, red and saffron color respectively.
Thus the sequence context at the 3′ side of TAAT appears to play an important role in the recognition process. The net specific interaction of strand S2 with the protein (-20.27 kcal/mol) dominates over that between S1 and Hoxc8 (-0.03 kcal/mol). It should be pointed out that the interactions were computed ignoring the residue Arg153. Arg153, being the first residue of the modeled part of the protein, is positionally less restricted due to end effects. Owing to this enhanced freedom, the Arg153 side chain is found to interact with several bases, which appears to be an artifact. However, in a recent work (65), it has been shown that the lack of the N-terminal part does not affect the DNA binding specificity of the homeodomain.
Interaction of Hoxc8 with DNA Backbone: In order to identify which region of the DNA is important for the non-specific binding to Hoxc8, the interaction energies of the DNA backbone of both strands with the protein have been computed. The loop region (Tyr173-Thr175) of the protein and part of helix-2 (Arg176-Arg179) interact non-specifically only with the S1 strand. Helix-3 interacts with both the strands S1 and S2 of the DNA. S1 interacts non-specifically through the stretch 3ATGTC7, which is outside the region complementary to the recognition element S2-4TAAT7 (Fig. 6c). The residues Tyr173, Arg176, Arg179, Lys194, Arg201, and Lys205 are observed to interact significantly with the S1 strand. All the interactions are electrostatic in origin. The interactions of Hoxc8 with the backbones of S1-A3 (-12.13 kcal/mol), S1-T4 (-70.23 kcal/mol), S1-G5 (-71.25 kcal/mol), S1-T6 (-39.44 kcal/mol), and S1-C7 (-31.49 kcal/mol) are substantial. The backbone of S1-T4 interacts mostly with Arg176 and Arg179, while that of S1-G5 interacts with Leu174 and Arg201. The backbones of S1-T6 and S1-C7 interact only with Arg201 and Lys205 respectively. Several H-bonds make each of the above-mentioned interactions strong (see Table I). In the case of S2, the backbone of the stretch 3TTAATG8 is found to interact with the protein (Fig. 6d), where S2-T4 (-36.30 kcal/mol), S2-A5 (-23.04 kcal/mol), S2-A6 (-51.36 kcal/mol), and S2-T7 (-41.30 kcal/mol) interact strongly with Hoxc8 through a number of H-bonds involving the residues Tyr156, Gln154, and Arg191 (see Table I). The total non-specific interaction of strand S1 with the protein (-266.21 kcal/mol) dominates over that of S2 with the protein (-168.66 kcal/mol).
Residue-wise Interaction Pattern of Hoxc8 with the DNA: The interaction pattern against the Hoxc8 residues as represented in Figure 8 shows that a significant number of residues are interacting favorably with DNA with considerable energies. Most strongly interacting residues are charged residues like Arg and Lys and obviously the interaction types are electrostatic in nature. These strongly interacting residues come from the three helices helix-1 (Tyr156), helix-2 (Arg176, Arg179), and helix-3 (Arg191, Asn199, Arg201, Lys203, Lys205 and Lys206). Hydrogen Bonds Between DNA and Hoxc8: Direct H-bonding interactions between Hoxc8 and DNA were analyzed by CHARMM using the following criteria. A H-bond acceptor and donor pair is considered to form a hydrogen bond if the acceptor-hydrogen distance (dAH) ≤ 2.4Å and the donor-hydrogen-acceptor angle (θDHA) ≥ 135°. The geometries of the H-bonds are summarized in Table I. Table I indicates that there are several strong H-bonds involving the DNA bases and the protein, which are important for the specificity of the recognition mechanism. However, most of the strongest hydrogen bonds were found to involve the DNA backbone and the protein, and thus contribute to the overall stability of the complex.
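The geometric H-bond criteria stated above (dAH ≤ 2.4Å and θDHA ≥ 135°) can be expressed as a short check. This is a sketch with hypothetical names and made-up coordinates, not the CHARMM analysis command itself.

```python
import numpy as np

def is_hbond(donor, hydrogen, acceptor, max_dist=2.4, min_angle=135.0):
    """True if the acceptor-hydrogen distance is <= max_dist (Å) and the
    donor-hydrogen-acceptor angle is >= min_angle (degrees)."""
    d, h, a = (np.asarray(v, dtype=float) for v in (donor, hydrogen, acceptor))
    v_hd = d - h                           # hydrogen -> donor
    v_ha = a - h                           # hydrogen -> acceptor
    d_ah = np.linalg.norm(v_ha)
    cos_theta = np.dot(v_hd, v_ha) / (np.linalg.norm(v_hd) * d_ah)
    theta = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
    return bool(d_ah <= max_dist and theta >= min_angle)

# Made-up geometries: a linear 1.9 Å bond, and a 3.3 Å / 126° arrangement
# similar to the weak contact described for S2-G8 and Gln198.
strong = is_hbond([-1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.9, 0.0, 0.0])
ang = np.radians(126.0)
weak = is_hbond([-1.0, 0.0, 0.0], [0.0, 0.0, 0.0],
                [-3.3 * np.cos(ang), 3.3 * np.sin(ang), 0.0])
```

Under these thresholds the second geometry is rejected, which is consistent with the text treating the S2-G8 and Gln198 contact as a weak H-bond outside the strict criteria.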
Figure 8: The pattern of the interactions between DNA and Hoxc8 against the residue number of Hoxc8. The strong interactions are with the phosphate backbone of the DNA duplex.
Comparison of the DNA-protein Interaction Patterns for the Template Complex and the Modeled DNA-Hoxc8 Complex
It is interesting to compare the interaction pattern of our model complex with that of the template from which the protein model has been derived. For this purpose, even though the results from a parallel trajectory for 9ant.pdb would be most desirable, we have considered the energy-minimized crystal structure of the template and the modeled Hoxc8-DNA complex (Fig. 3) to make the comparison meaningful. Figure 9 indicates that there are striking similarities in the DNA-protein interaction patterns of these two complexes; for example, the residues Ile195 and Asn199 are most important for DNA binding in both cases. The bases S2-A5, S2-A6, and S2-G8 involved in interaction with the respective protein are also common to these two cases. The crystal structure of the Antennapedia homeodomain-DNA complex indicates a pair of hydrogen bonds between Asn199 and S2-A6, and the same are also present in our model of the Hoxc8-DNA complex. Clearly, these observed similarities are the consequences of the high sequence homology of both the proteins and the DNAs in these two cases. The two hydrogen bonds between Asn199 and S2-A6 are known to be conserved throughout the homeodomain family (43-50). Interestingly, Figure 9 also shows significant dissimilarities. In contrast to the template crystal structure where Gln198 was reported not to be involved in direct interaction with the DNA
Figure 9: Comparison of the specific interaction patterns between the S2 strand of the DNA and the protein for the energy-minimized template crystal structure (dotted line) and the energy-minimized, trajectory-averaged Hoxc8-DNA complex (solid line). The upper sequence represents the template DNA and the lower sequence corresponds to the DNA in the present Hoxc8-DNA model complex. Data for the S1 strand are not presented, as it does not participate significantly in the interaction with Hoxc8.
Figure 10: Stereo view of the water bridge linking Gln198 and the base S1-C7. The oxygen atom of the bridging water is H-bonded to S1-C7:H42 with dAH = 1.8 Å and θDHA = 168° and, on the other side, to Gln198:HE21 with dAH = 1.8 Å and θDHA = 171°. The DNA backbones are shown as pink ribbons and the Hoxc8 protein is shown as a cyan ribbon. The partners of the water bridge, Gln198 and S1-C7, are shown in ball-and-stick representation. Carbon, nitrogen, oxygen, and hydrogen atoms are shown in black, blue, red, and saffron, respectively.
(48), in the Hoxc8-DNA complex Gln198 interacts significantly with the bases S2-T7G8 of the DNA through van der Waals interactions and a weak H-bond. It may further be pointed out that the involvement of Gln198 in interaction with DNA is quite common in homeodomains (43-46). Moreover, comparison of the protein-DNA interaction patterns plotted against the bases indicates that the bases S2-T4 and S2-T7 were not involved in interaction with the protein in the template, while they interact significantly with Hoxc8 (Met202 and Gln198, respectively). As these interactions involve only conserved residues, none of the observed effects is a direct consequence of the differences in the non-conserved residues of the proteins. Thus, these effects seem to be indirect consequences of the structural rearrangements induced by the differences in the non-conserved residues. In this context, it may be mentioned that a recent publication (66) on the MD simulation of the wild type antennapedia-DNA complex (wt/BS2) reported a strong H-bond between Gln50 and S2-G9, while we did not find any such H-bond between the equivalent partners Gln198 and S2-A9 in the case of Hoxc8. Moreover, unlike their observation for the antennapedia homeodomain, we have not found any strong H-bond involving Lys194 (Lys46 in the antennapedia case). In the Hoxc8-DNA complex we found a van der Waals contact between Gln198 and S2-T7, while in their antennapedia case a similar contact was observed between Gln50 and S2-G9.

Extension of the Hoxc8 Binding Site Beyond the TAAT Nucleotide Region

As mentioned earlier, the 3′ flanking base S2-G8 of the S2-4TAAT7 stretch interacts directly with Gln198. Additionally, the analysis of the MD trajectory shows that there is a strong and long-lived water bridge linking S1-C7 and Gln198 (data not shown, as here we have included only the static features). The details of the water bridge are shown pictorially in Figure 10.
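How "long-lived" such a water bridge is can be quantified from a trajectory as an occupancy (the fraction of frames in which both H-bonds of the bridge are present) and as the longest uninterrupted run of bridged frames. The Python sketch below assumes that, for each frame, the two H-bond geometries to the best-placed bridging water have already been extracted; the frame values, thresholds reused from the H-bond criteria, and function names are hypothetical and do not reproduce the authors' analysis.

```python
def bridge_present(d1, theta1, d2, theta2, max_d_ah=2.4, min_theta=135.0):
    """A bridge exists in a frame only if BOTH water H-bonds meet the criteria."""
    first_ok = d1 <= max_d_ah and theta1 >= min_theta
    second_ok = d2 <= max_d_ah and theta2 >= min_theta
    return first_ok and second_ok

def bridge_statistics(frames):
    """frames: iterable of (d1, theta1, d2, theta2) tuples, one per saved frame.

    Returns (occupancy, longest_run) where occupancy is the fraction of
    bridged frames and longest_run is the longest consecutive bridged stretch.
    """
    flags = [bridge_present(*frame) for frame in frames]
    occupancy = sum(flags) / len(flags) if flags else 0.0
    longest_run = current_run = 0
    for bridged in flags:
        current_run = current_run + 1 if bridged else 0
        longest_run = max(longest_run, current_run)
    return occupancy, longest_run

# Hypothetical geometries (Angstrom, degrees) for four frames.
example_frames = [
    (1.8, 168.0, 1.8, 171.0),
    (1.9, 160.0, 2.0, 150.0),
    (2.6, 170.0, 1.8, 171.0),  # first H-bond too long: bridge broken
    (1.8, 168.0, 1.8, 171.0),
]
print(bridge_statistics(example_frames))
```

Multiplying the longest run by the time between saved frames gives an estimate of the bridge lifetime; in a real trajectory the identity of the bridging water may change between frames, which this sketch does not track.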
It is observed that the oxygen atom of the water molecule involved in the bridge is H-bonded to S1-C7:N4-H42 with dAH = 1.8 Å and θDHA = 168° and, on the other side, to Gln198:NE2-HE21 with dAH = 1.8 Å and θDHA = 171°. The H-bonding geometries clearly indicate the strength of both H-bonds. It is interesting to note that the base S1-C7 is the complementary partner of S2-G8. Thus, both the base S2-G8 and its complementary counterpart S1-C7 participate individually and specifically in the interaction with the residue Gln198 of Hoxc8. These observations clearly indicate the importance of S2-G8 and its complementary base S1-C7 in the recognition process. It therefore appears that the actual Hoxc8 binding site is not limited to the base sequence stretch S2-4TAAT7 but extends to S2-4TAATG8. Hence we propose that, in the case of the Hoxc8-DNA complex, S2-4TAATG8 should be considered the true recognition element. It is also known that the promoter region of the osteopontin gene contains several copies of the TAAT stretch, but it has been shown experimentally that Hoxc8 binds only to the response element TAAT at a specific position on the DNA, indicating that the bases around the response element TAAT play important roles in proper recognition (41). Thus, our finding is consistent with the experimental observation.

Concluding Remarks

The present article reports the findings of an investigation on homology modeling of the Hoxc8-DNA complex (human) followed by an MD simulation-based structural refinement. A homology-based, realistic 3D model structure of the wild type Hoxc8-DNA complex has been presented. Using the sequence identity of the part of the DNA that interacts with the protein as a criterion in selecting the template structure greatly facilitated the generation of a reasonable 3D structure of the complex. The low value (2.0 Å) of the average RMSD of the relaxed complex over the stable part of the MD trajectory indicates that the model is stable and is globally close to the template crystal structure.
In spite of that, comparison with the template crystal structure indicates that there are substantial differences at the local level. For example, Gln198 plays an important role in the Hoxc8-DNA complex but does not interact significantly with the DNA in the template. Moreover, the base S2-T7 does not participate in specific interaction with the protein in the template, while in the case of Hoxc8 it interacts significantly with the protein, mainly through favorable van der Waals interactions with Gln198. As no non-conserved residues are directly involved in these observed differences, it seems that they are indirect consequences of the structural rearrangements induced by the non-conserved residues in the protein and the nucleotides in the DNA. The analysis of the interactions at the atomic level suggested that the residues Ile195, Gln198, and Asn199 are the most important for recognition, while for the DNA, the bases in the stretch TAATG were found to be responsible for the specific binding. The interaction pattern, including the H-bonding data, is also consistent with the data available for Hoxc8 and for other homeodomain proteins in general, indicating the realistic nature of the present model. The base S2-G8 at the 3′ end of the TAAT recognition stretch is found to interact directly with the protein through a weak H-bond with the residue Gln198 and, most importantly, S1-C7, the complementary partner of S2-G8, interacts specifically with Gln198 through a strong water bridge. Thus the true recognition element appears to be 4TAATG8 rather than TAAT.

References and Footnotes

1. C. S. Tung. J. Biomol Struct Dyn. 17, 347-354 (1999).
2. A. Tramontano and V. Morea. Proteins: Struct, Func and Genet. 5, 352-368 (2003).
3. E. V. Koudan, J. M. Bujnicki, and E. S. Gromova. Biophys. J. 22, 339-346 (2004).
4. C. S. Tung. Biophys J. 87, 2714-2722 (2004).
5. C. Angsuthanasombat, P. Uawithya, S. Leetachewa, W. Pornwiroon, P. Ounjai, T. Kerdcharoen, G. R. Katzenmeier, and S. Panyim. J Biochem Mol Biol. 31, 304-13 (2004).
6. A. Hillisch, L. F. Pineda, and R. Hilgenfeld. Drug. Disc. Today. 9, 659-669 (2004).
7. Modern Methods of Drug Discovery, Ed. A. Hillisch and R. Hilgenfeld. Birkhäuser Verlag (2003).
8. H. Wieman, T. Kristin, E. Anderssen, and F. Drablos. Mini. Rev in Med Chem. 4, 793-804 (2004).
9. C. Oshiro, E. K. Bradley, J. Eksterowicz, E. Evensen, M. L. Lamb, J. K. Lanctot, S. Putta, R. Stanton, and P. D. J. Grootenhuis. J. Med. Chem. 47, 764-767 (2004).
10. J. A. McCammon and S. C. Harvey. Molecular Dynamics of Proteins and Nucleic Acids. New York: Cambridge University Press (1987).
11. T. Fox and P. A. Kollman. Proteins. 25, 315-334 (1996).
12. S. Sen and L. Nilsson. J. Am. Chem. Soc. 121, 619-631 (1998).
13. P. Auffinger and E. Westhof. Curr Opin Struct Biol. 8, 227-236 (1998).
14. S. Sen and L. Nilsson. Biophys J. 77, 1782-1800 (1999).
15. S. Sen and L. Nilsson. J. Am. Chem. Soc. 123, 7414-7422 (2001).
16. J. Norberg and L. Nilsson. Q Rev Biophys. 36, 257-306 (2003).
17. A. Hamza, M. H. Sarma, and R. H. Sarma. J Biomol Struct Dyn. 20, 751-758 (2003).
18. A. Madhumalar and M. Bansal. Biophys J. 85, 1805-1816 (2003).
19. J. Kua, Y. Zhang, A. C. Eslami, J. R. Butler, and J. A. McCammon. Protein Sci. 12, 2675-2684 (2003).
20. H. Fan and A. E. Mark. Protein Sci. 13, 211-220 (2004).
21. H. Yu, X. Daura, and W. F. Van Gunsteren. Proteins. 54, 116-127 (2004).
22. T. E. Cheatham, III. Curr. Opin. Struct. Biol. 14, 360-367 (2004).
23. A. B. Guliaev, B. Hang, and B. Singer. Nucleic Acids Res. 32, 2844-2852 (2004).
24. E. Marco, R. Garcia-Nieto, and F. Gago. J. Mol. Biol. 328, 9-32 (2003).
25. D. Chabas, S. E. Baranzini, D. Mitchell, C. C. Bernard, S. R. Rittling, D. T. Denhardt, R. A. Sobel, C. Lock, M. Karpuj, R. Pedotti, R. Heller, R. Jorge, J. R. Oksenberg, and L. Steinman. Science. 294, 1731-1735 (2001).
26. E. M. Gravallese. J. Clin. Invest. 112, 147-149 (2003).
27. Y. G. Yueh, D. P. Gardner, and C. Kappen. Proc. Natl. Acad. Sci. USA. 95, 9956-9961 (1998).
28. H. Yoshitake, S. R. Rittling, D. T. Denhardt, and M. Noda. Proc Natl Acad Sci USA. 96, 8156-8160 (1999).
29. G. F. Weber. Biochim. Biophys. Acta. 1552, 61-85 (2001).
30. S. A. Khan, C. A. Lopez-Chua, J. Zhang, L. W. Fisher, E. S. Sorensen, and D. T. Denhardt. J. Cell. Biochem. 85, 728-736 (2002).
31. K. Yumoto, M. Ishijima, S. R. Rittling, K. Tsuji, Y. Tsuchiya, S. Kon, A. Nifuji, T. Uede, D. T. Denhardt, and M. Noda. Proc. Natl. Acad. Sci. USA 99, 4556-4561 (2002).
32. S. Ohshima, H. Kobayashi, N. Yamaguchi, K. Nishioka, M. Umeshita-Sasai, T. Mima, S. Nomura, S. Kon, M. Inobe, T. Uede, and Y. Saeki. Arthritis Rheum. 46, 1094-1101 (2002).
33. N. Yamamoto, F. Sakai, S. Kon, J. Morimoto, C. Kimura, H. Yamazaki, I. Okajaki, N. Seki, T. Fujii, and T. Uede. J. Clin. Invest. 112, 181-188 (2003).
34. M. A. Chellaiah, N. Kizer, R. Biswas, U. Alvarez, J. Stauss-Schoenberger, L. Rifas, S. R. Rittling, D. T. Denhardt, and K. A. Hruska. Mol. Biol. Cell. 14, 173-189 (2003).
35. Q. H. Ye, L. X. Qin, M. Forgues, P. He, J. W. Kim, A. C. Peng, R. Simon, Y. Li, A. I. Robles, Y. Chen, Z. C. Ma, Z. Q. Wu, S. L. Ye, Y. K. Liu, Z. Y. Tang, and X. W. Wang. Nat Med. 9, 416-23 (2003).
36. E. A. Wang, V. Rosen, J. S. D’Alessandro, M. Bauduy, P. Cordes, T. Harada, D. I. Israel, R. M. Hewick, K. M. Kerns, P. LaPan, et al. Proc. Natl. Acad. Sci. USA 87, 2220-2224 (1990).
37. P. H. Francis, M. K. Richardson, P. M. Brickell, and C. Tickle. Development 120, 209-218 (1994).
38. P. A. Hoodless, T. Haerry, S. Abdollah, M. Stapleton, M. B. O’Connor, L. Attisano, and J. L. Wrana. Cell. 85, 489-500 (1996).
39. C. H. Heldin, K. Miyazono, and P. ten Dijke. Nature 390, 465-471 (1997).
40. R. Nishimura, Y. Kato, D. Chen, S. E. Harris, G. R. Mundy, and T. Yoneda. J. Biol. Chem. 273, 1872-1879 (1998).
41. X. Shi, X. Yang, D. Chen, Z. Chang, and X. Cao. J. Biol. Chem. 274, 13711-13717 (1999).
42. X. Yang, X. Ji, X. Shi, and X. Cao. J. Biol. Chem. 275, 1065-1072 (2000).
43. T. B. Kornberg. J. Biol. Chem. 268, 26813-26816 (1993).
44. W. J. Gehring, M. Affolter, and T. Burglin. Ann. Rev. Biochem. 63, 487-526 (1994).
45. E. Fraenkel, M. A. Rould, K. A. Chambers, and C. O. Pabo. J. Mol. Biol. 284, 351-361 (1998).
46. J. M. Passner, H. D. Ryoo, L. Shen, R. S. Mann, and A. K. Aggarwal. Nature 397, 649-651 (1999).
47. M. Billeter, Y. Q. Qian, G. Otting, M. Müller, W. Gehring, and K. Wüthrich. J. Mol. Biol. 234, 1084-1097 (1993).
48. E. Fraenkel and C. O. Pabo. Nature Struct. Biol. 5, 692-697 (1998).
49. A. F. Schier and W. J. Gehring. Nature 356, 804-807 (1992).
50. Y. Q. Qian, G. Otting, M. Billeter, M. Müller, W. Gehring, and K. Wüthrich. J. Mol. Biol. 234, 1070-1083 (1993).
51. B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and M. Karplus. J. Comp. Chem. 4, 187-217 (1983).
52. W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey, and M. Klein. J. Chem. Phys. 7, 926-935 (1983).
53. C. L. Brooks, III and M. Karplus. J. Chem. Phys. 79, 6312-6325 (1983).
54. T. Darden, D. M. York, and L. G. Pedersen. J. Chem. Phys. 98, 10089-10092 (1993).
55. U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee, and L. G. Pedersen. J. Chem. Phys. 103, 8577-8593 (1995).
56. P. J. Steinbach and B. R. Brooks. J. Comp. Chem. 15, 667-683 (1994).
57. P. Auffinger and D. L. Beveridge. Chem. Phys. Lett. 234, 413 (1995).
58. T. E. Cheatham, J. L. Miller, T. Fox, T. Darden, and P. A. Kollman. J. Am. Chem. Soc. 117, 4193-4194 (1995).
59. J. Norberg and L. Nilsson. Biophys. J. 79, 1537-1553 (2000).
60. R. W. Hockney. Comp. Phys. 9, 135-211 (1970).
61. N. Foloppe and A. D. Mackerell, Jr. J. Comp. Chem. 56, 86-104 (2000).
62. W. F. Van Gunsteren and H. J. C. Berendsen. Mol. Phys. 34, 1311-1327 (1977).
63. J. P. Ryckaert, G. H. Ciccotti, and H. J. C. Berendsen. J. Comput. Phys. 23, 327-341 (1997).
64. W. Saenger. Principles of Nucleic Acid Structure. New York: Springer Verlag (1988).
65. J. K. Montclare and A. Schepartz. J. Am. Chem. Soc. 125, 3416-3417 (2003).
66. A. Gutmanas and M. Billeter. Proteins: Struct, Func and Bioinfo. 57, 772-782 (2004).
Date Received: October 18, 2004
Communicated by the Editor Krystyna Zakrzewska