Micron, Vol. 25, No. 5, pp. 439-446, 1994
Pergamon
Copyright © 1994 Elsevier Science Ltd. Printed in Great Britain. All rights reserved.
0968-4328/94 $7.00 + 0.00
0968-4328(94)00033-6
Testing the Quality of Electron Microscope Mapping Data for DNA Molecules with Sequence-specific Ligands
ALEXEY A. PODTELEZHNIKOV, ALEXEY V. KURAKIN, ALEXANDER V. VOLOGODSKII and DMITRY I. CHERNY*
Institute of Molecular Genetics, Russian Academy of Sciences, Kurchatov's Sq., 123182 Moscow, Russia
(Received 12 February 1994; Revised 11 July 1994)
Abstract: A procedure is proposed for testing Electron Microscope (EM) mapping data for DNA molecules with site-specifically bound ligands. Because the ends of a DNA molecule are difficult to distinguish on electron micrographs, the true orientations of the molecules are not known. This makes it difficult to obtain correct maps by alignment and to check the validity of those maps. For these reasons we carried out a computer simulation of the EM study of double-stranded DNA molecules with site-specifically bound ligands. Because the true orientations of the simulated DNA molecules were known, we could examine their final orientations after alignment. We used the number of improperly oriented molecules as the quantitative measure of map quality. A detailed investigation based on this parameter allowed us to develop a criterion for map validity and to suggest a procedure for testing the alignment of real DNA molecules. This procedure involves repeated randomization of the initial orientations of the DNA molecules and careful analysis of the resulting maps. Most of the molecular, statistical and experimental parameters inherent to EM investigation of site-specific binding, namely the number of specific binding sites (N), the mean number of bound ligands per molecule (A), the length of the DNA molecules (L), the specific/non-specific binding ratio (K) and the relative standard deviation of DNA molecule lengths ($H_L$), were tested for their influence on the quality of EM mapping data. An empirical equation for the limiting values of these parameters was found, which allows the success of EM mapping to be predicted.
Key words: Electron microscopy, DNA, DNA-protein interaction, DNA-ligand interaction, mapping, alignment.
INTRODUCTION
Electron microscopy has proved to be an extremely valuable method for studying site-specific interactions of proteins and synthetic oligonucleotides with double-stranded DNA (Giacomoni et al., 1977; Kadesh et al., 1980; Cherny and Alexandrov, 1982; Kramer et al., 1987; Theveny et al., 1987a; Mignotte et al., 1988; Johannssen, 1988; Kurakin et al., 1991; Le Cam et al., 1991; Cherny et al., 1993a,b). EM sheds light on the process of specific complex formation and makes it possible to determine the positions of the specific binding sites (Vollenweider and Szybalskii, 1978; Cherny and Alexandrov, 1982; Mignotte et al., 1988; Kramer et al., 1987; Le Cam et al., 1991) and the affinities of the ligand for the specific sites over a range of two orders of magnitude (Williams and Chamberlin, 1977; Kadesh et al., 1980; Cherny and Alexandrov, 1982). In addition, EM allows determination of the absolute values of the association and dissociation rate constants (Williams and Chamberlin, 1977; Giacomoni et al., 1977; Kadesh et al., 1980), as well as detection of local violations of the DNA helix in the binding site, if present (Travers, 1989; Le Cam et al., 1991). Recently, EM methods have been shown to provide tools for gene and genome mapping (Kurakin et al., 1991; Cherny et al., 1993a,b; Revet et al., 1993; Cherny et al., 1994).
* Corresponding author.
The first step in the analysis of the site-specific binding of any EM-visible ligand to DNA is the determination of the positions of the specific sites along the DNA molecule (i.e. mapping). This information can be obtained from the binding map, which consists of peaks of various heights and widths. The center of each peak corresponds to the position of a specific site (or of several sites, if they are too close together to be resolved), while its size is assumed to reflect the affinity of the ligand for that particular site. If each of a finite number of molecules is represented as an array of 0s and 1s, where 1s correspond to the DNA-ligand complexes, the binding map can be obtained as the sum of these arrays taken in their proper orientations. However, several factors make it difficult to obtain the true binding map: the ends of DNA molecules are indistinguishable on EM micrographs, ligands also bind nonspecifically, the ligand's affinity varies among the specific sites, and the measurements carry experimental errors. Taken together, these factors make EM mapping somewhat ambiguous and raise the problem of validating the alignment. Several EM mapping techniques have been described to avoid this ambiguity. One of them relied on specific labeling of only one end of the DNA molecules (Theveny and Revet, 1987b). In another, the authors used two sets of linearized DNA fragments cut at two different unique sites (Naumova et al., 1981). However, these techniques have been tested only on short DNA fragments, and it is practically impossible to apply them to long pieces of genomic DNA.
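As a minimal illustration of why orientation matters, the Python sketch below sums 0/1 arrays after reversing those that are known to be read from the wrong end. The function name `binding_map` and the toy arrays are hypothetical; the known orientations are available only for simulated data. Summing in the proper orientations gives a single peak, whereas ignoring orientation splits it between a position and its mirror image.

```python
def binding_map(arrays, reverse):
    """Sum 0/1 DNA-arrays, reversing array k first when reverse[k] is True."""
    total = [0] * len(arrays[0])
    for arr, rev in zip(arrays, reverse):
        for i, v in enumerate(arr[::-1] if rev else arr):
            total[i] += v
    return total


# Three molecules with a ligand in segment 1 (the third also carries a
# non-specific ligand in segment 4); the second was traced from the other end.
arrays = [[0, 1, 0, 0, 0],
          [0, 0, 0, 1, 0],
          [0, 1, 0, 0, 1]]
print(binding_map(arrays, [False, True, False]))   # proper orientations: [0, 3, 0, 0, 1]
print(binding_map(arrays, [False, False, False]))  # orientation ignored:  [0, 2, 0, 1, 1]
```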
Conventional analysis of EM data involves a multi-step comparison of DNA molecules in two orientations (forward and backward) to find the best fit, normally performed on a computer with in-house software. A number of algorithms for the alignment of partially denatured DNA molecules have been described (Young et al., 1974; Borovik et al., 1980; van Dijken and Coetzee, 1981) and analyzed for reproducibility, and they have been found to give reliable results (van Dijken and Coetzee, 1981). Over the years we have used electron microscopy to study various features of DNA and its complexes with sequence-specific ligands (Borovik et al., 1980; Naumova et al., 1981; Cherny and Alexandrov, 1982; Lyamichev et al., 1983; Kurakin et al., 1991; Cherny et al., 1993a,b). For these studies we developed in-house software with an alignment algorithm practically identical to the one used in this work, and thus accumulated considerable experience in EM mapping. Occasionally we encountered collections of DNA molecules that could not be aligned properly. This could be attributed to an inadequate combination of the experimental parameters mentioned above. To resolve this uncertainty, we carried out a thorough computer simulation of the EM study of DNA molecules carrying site-specifically bound ligands. Knowing the true orientations of the simulated DNA molecules allowed us to inspect their final orientations after alignment. Molecular, statistical and experimental factors inherent to such DNA-ligand complexes were examined for their influence on obtaining valid maps. We propose a procedure and an empirical equation for testing the quality of EM mapping data for real DNA molecules with site-specifically bound ligands.
MATERIALS AND METHODS
Simulation of the DNA-ligand complexes
Each DNA molecule with bound ligands was represented as a linear array of length m, referred to below as a DNA-array, consisting of 0s and 1s, where 0 denotes the absence and 1 the presence of a bound ligand. The array length m was usually 100. This representation corresponds to normalizing the measured length of each DNA molecule to unity and dividing it into m segments of equal length, with each segment that carries a bound ligand set to 1. In the first stage we simulated the DNA-arrays with bound ligands. The ligands were placed on the DNA-arrays according to the distribution of binding probabilities described by Lukashin et al. (1976). The positions of the specific sites were chosen arbitrarily. The mean number of bound ligands per DNA molecule (A) was chosen as a parameter. The distribution of binding probabilities was determined by the positions
of the specific binding sites and the specific/non-specific binding ratio (K). Although this procedure placed the ligands on the DNA-arrays irreversibly, it was considered adequate because the resulting levels of specific and non-specific binding were correct. In the second stage we introduced measurement errors. As a parameter we used the relative standard deviation $H_L$ of the total length L of the DNA molecules, $H_L = S_L/L$, where $S_L$ is the standard deviation of L. It has been found empirically that $S_L = C\,L^{1/2}$, where C is a constant (Davidson et al., 1971). The value of C varies about 2-fold between EM procedures (Davidson et al., 1971; Hirsh and Schlief, 1976; Borovik and Cherny, unpublished observations). Variation in DNA length produces peaks of finite width on the map; the width of a peak reflects the error in determining the position of the ligand(s). The alignment is usually performed with DNA molecules of normalized length. For such molecules the dispersion of the peak for a single binding site is given by

$$D_Z = H_L^2\,Z(1 - Z), \qquad (1)$$

where $D_Z$ is the dispersion (variance) of the peak and Z is the position of its center, 0 < Z < 1 (Cherny and Alexandrov, 1982). It follows from eqn (1) that the widest peak lies at the center of the map (Z = 0.5), where its standard deviation is $H_L/2$, i.e. half the relative standard deviation of the total length of the DNA molecules. To simulate these errors we applied an iterative procedure to each DNA-array of constant length. The bound ligand closest to one end, at position $Z_1$, was shifted randomly with the dispersion $D_{Z_1}$ given by eqn (1). Each subsequent bound ligand was shifted in two steps. First, it was shifted in the same directions as all the preceding ligands, by smaller amounts proportional to their relative positions. Second, it was shifted at random, so that the dispersion of its total shift equalled that given by eqn (1). These successive shifts were needed not only to avoid inverting the order of closely located ligands, but also to take into account the correlation between their shifts in the normalized DNA-array. The orientations of the DNA-arrays at this point were stored and treated as the correct ones. In the last stage all the DNA-arrays were arbitrarily re-oriented to give a random set of initial orientations. Thus, after alignment, the number of improperly oriented DNA-arrays could be determined.
Alignment procedure
The main idea of any orientation algorithm is the conjecture that the scalar product of a DNA-array with the map is maximal for the correct orientation of the DNA-array. Accordingly, in each orientation cycle we calculated a current map simply by summing all DNA-arrays in their current orientations (i.e. we summed the 0s and 1s at corresponding positions), then smoothed the map (see below) and calculated the scalar
product for each DNA-array in two orientations, forward and backward, with the current smoothed map. The orientation that gave the higher scalar product was used for the next current map. The first current map was obtained from the DNA-arrays in their initial orientations. The procedure was stopped when no DNA-array was inverted during a cycle, and the last current map was then analysed as the binding map. The map was smoothed by convolution with a Gaussian of dispersion $D_Z$ given by eqn (1), with $H_L$ set to 0.02. The smoothing was used to speed up the alignment procedure.
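The alignment cycle described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' software: the function names `smooth` and `align`, the segment-centre convention z = (j + 0.5)/m, the rule that ties keep the current orientation, the cycle cap `max_cycles` and the minimum kernel width `min_sigma` are our own choices. The smoothing kernel takes its variance from eqn (1) at the position of each map entry.

```python
import math


def smooth(profile, h=0.02, min_sigma=0.5):
    """Gaussian smoothing whose variance follows eqn (1): D_Z = h^2 Z (1 - Z).

    Each map value at segment j is spread over the segments with a normalised
    Gaussian; sigma is converted from unit length to segments and floored at
    min_sigma (an assumption, so that the map ends are still smoothed).
    """
    m = len(profile)
    out = [0.0] * m
    for j, value in enumerate(profile):
        if value == 0:
            continue
        z = (j + 0.5) / m  # centre of segment j on the unit length
        sigma = max(h * math.sqrt(z * (1.0 - z)) * m, min_sigma)
        kernel = [math.exp(-0.5 * ((i - j) / sigma) ** 2) for i in range(m)]
        norm = sum(kernel)
        for i, w in enumerate(kernel):
            out[i] += value * w / norm
    return out


def align(arrays, h=0.02, max_cycles=100):
    """Iteratively orient 0/1 DNA-arrays against their own smoothed sum.

    Returns (flipped, binding_map); flipped[k] is True if array k ended up
    reversed relative to its initial orientation.
    """
    current = [list(a) for a in arrays]
    flipped = [False] * len(current)
    for _ in range(max_cycles):
        # Current map: sum of all arrays in their current orientations, smoothed.
        current_map = smooth([sum(col) for col in zip(*current)], h)
        changed = 0
        for k, arr in enumerate(current):
            forward = sum(a * s for a, s in zip(arr, current_map))
            backward = sum(a * s for a, s in zip(reversed(arr), current_map))
            if backward > forward:  # ties keep the current orientation
                current[k] = arr[::-1]
                flipped[k] = not flipped[k]
                changed += 1
        if changed == 0:  # no array was inverted in this cycle: stop
            break
    binding_map = [sum(col) for col in zip(*current)]
    return flipped, binding_map
```

Because reversing every array at once leaves the problem unchanged, a routine of this kind can recover orientations only relative to one another; with simulated data the result should be compared with the true orientations up to this global mirror image.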
Computer experiment
All the parameters of a real EM experiment on site-specific mapping that can influence the final result can be divided into two groups: those that the experimenter cannot significantly affect and those that can be chosen during the experiment. The first group comprises the number of specific binding sites (N), the length of the DNA molecule (L), the relative standard deviation of the DNA molecule lengths ($H_L$) and the specific/non-specific binding ratio (K). The last parameter is the ratio of the association constants for specific and non-specific binding. For the computer experiments we used the ratio K/L, which is more convenient than K and L separately. By varying N from 4 to 32 and K/L from 0.25 to 16 we obtained a set of distributions of binding probabilities for the simulation of the DNA-ligand complexes. The K/L range of 0.25-16 corresponded to an L range of 1,000-16,000 bp and a K range of 250-16,000. The positions of the binding sites were chosen arbitrarily for each value of N and were not changed while the other parameters were varied. $H_L$ was varied from 0 to 0.09. We also assigned to this group the mean number of bound ligands per DNA molecule (A), although in general it can be chosen by the experimenter; we used three values of A: 0.5N, N and 2N. For each set of these parameters (N, L, K, $H_L$ and A), 8 independent collections of DNA-arrays were generated and aligned as described above. The number of improperly oriented molecules was determined for each collection, and its mean value (n) was calculated. We studied the dependence of n on the parameters above, on the assumption that it would be useful for predicting map validity quantitatively. The initial orientations of the DNA-arrays in each collection were usually randomized more than five times.
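The generation step of this computer experiment can be sketched as below. It is a simplified stand-in, not the authors' code. The binding model (non-specific weight 1/m per segment plus K/L per specific site, a Poisson-distributed number of ligands with mean A, and merging of ligands that fall into the same segment) is our assumption in place of the Lukashin et al. (1976) distribution. The length errors are drawn from a Dirichlet gap model that we swap in for the iterative shifting scheme, because it reproduces the variance of eqn (1), correlates neighbouring shifts and never inverts the order of ligands. All function names are hypothetical. The alignment step itself is left out: `count_improper` takes the orientation flips returned by whatever alignment routine is used and, as a further assumption, treats a mirror image of the whole collection as correct.

```python
import math
import random


def poisson(lam, rng):
    """Poisson variate by Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def shift_positions(z, h, rng):
    """Correlated, order-preserving length errors with Var(Z) = h^2 Z (1 - Z).

    Assumption: the gaps between successive ligands (and the two ends) form a
    Dirichlet vector (normalised gamma variates), so each cumulative position
    is Beta(a Z, a (1 - Z)) with a = 1/h^2 - 1, matching eqn (1).
    """
    if h <= 0 or not z:
        return sorted(z)
    if h >= 1:
        raise ValueError("h must be below 1")
    a = 1.0 / h ** 2 - 1.0
    edges = [0.0] + sorted(z) + [1.0]
    gaps = [rng.gammavariate(a * (right - left), 1.0) if right > left else 0.0
            for left, right in zip(edges, edges[1:])]
    total, acc, shifted = sum(gaps), 0.0, []
    for g in gaps[:-1]:
        acc += g
        shifted.append(acc / total)
    return shifted


def simulate_collection(n_mol, m, sites, k_over_l, mean_ligands, h, rng):
    """One collection of DNA-arrays plus the true orientation of each.

    Hypothetical binding model: a ligand falls in segment s with weight 1/m
    (non-specific) plus K/L for each specific site in s; the number of
    ligands per molecule is Poisson with mean A (= mean_ligands).
    """
    weights = [1.0 / m] * m
    for s in sites:
        weights[s] += k_over_l
    arrays, reversed_flags = [], []
    for _ in range(n_mol):
        bound = rng.choices(range(m), weights=weights, k=poisson(mean_ligands, rng))
        z = [(s + 0.5) / m for s in sorted(set(bound))]  # ligands in one segment merge
        arr = [0] * m
        for zz in shift_positions(z, h, rng):
            arr[min(int(zz * m), m - 1)] = 1
        rev = rng.random() < 0.5  # random initial orientation
        arrays.append(arr[::-1] if rev else arr)
        reversed_flags.append(rev)
    return arrays, reversed_flags


def count_improper(true_reversed, flipped_by_alignment):
    """Improperly oriented arrays; a mirror image of the whole set counts as correct."""
    wrong = sum(t != f for t, f in zip(true_reversed, flipped_by_alignment))
    return min(wrong, len(true_reversed) - wrong)
```

Repeating `simulate_collection` for 8 collections per parameter set, aligning each and averaging `count_improper` would give a mean value corresponding to n in the text.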
It should be noted that, for a given collection, the number of improperly oriented DNA-arrays after alignment depends only on their initial orientations, because these determine the first current map (see Alignment procedure). The remaining parameters are chosen at the measurement stage: the number of DNA molecules considered (i.e. the sample size) and the length of the DNA-array (m). Obviously, the more DNA molecules are considered, the better the map. However, for technical reasons it is not convenient to consider more than 100 molecules or to divide DNA molecules into more than 100 segments. Unless otherwise stated, we used a value of 100 for both parameters.
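At the measurement stage each traced molecule is reduced to a DNA-array of m segments. The sketch below assumes, hypothetically, that ligand positions are recorded as distances from an arbitrarily chosen end, in the same units as the measured contour length; the function name `to_dna_array` is ours. Since the end is arbitrary, the resulting arrays still have to be aligned.

```python
def to_dna_array(ligand_positions, contour_length, m=100):
    """Convert measured ligand positions into a 0/1 DNA-array of m segments."""
    array = [0] * m
    for x in ligand_positions:
        z = x / contour_length  # normalise the molecule length to unity
        if 0.0 <= z <= 1.0:  # ignore positions outside the traced molecule
            array[min(int(z * m), m - 1)] = 1  # z == 1 falls into the last segment
    return array


print(to_dna_array([120.0, 455.5], 910.0, m=10))  # [0, 1, 0, 0, 0, 1, 0, 0, 0, 0]
```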
Electron microscopy
We used the data from the EM study of the binding of the methylase BspRI (Mw 55 kDa, recognition site GGCC) to the linear plasmid DNA pUC19/EcoRI, published by Kurakin et al. (1991) (see Figs 2B and 3B). The complexes were obtained by incubating DNA (10 µg/ml) and enzyme (10 µg/ml) in a buffer containing 20 mM Tris-HCl, pH 7.5, 50 mM NaCl, 1 mM DTT, 1 mM Na3EDTA and 50 µM S-adenosylhomocysteine at 37°C for 40 min. Heparin was then added to a final concentration of 10 µg/ml for 2 min, and finally 20 volumes of 10 mM Tris-HCl, pH 7.5, 50 mM NaCl were added. The complexes were adsorbed onto carbon films activated by glow discharge in pentylamine, according to Dubochet (1971). The samples were stained with a 1% aqueous solution of uranyl acetate and shadowed with Pt/C (95/5). The micrographs of DNA-methylase complexes were digitized and analyzed with the aid of an HP9825B computer.