Structural basis of ICF-causing mutations in the methyltransferase ...

1 downloads 0 Views 3MB Size Report
(Cameron et al., 1999). In the murine knock-out model, ...... Roberts,R.J., Myers,P.A., Morrison,A. and Murray,K. (1976) J. Mol. Biol.,. 103, 199–208. Robertson ...
Protein Engineering vol.15 no.12 pp.1005–1014, 2003

Structural basis of ICF-causing mutations in the methyltransferase domain of DNMT3B

Ilkka Lappalainen1,2 and Mauno Vihinen1,3,4 1Institute of Medical Technology, FIN-33014 University of Tampere, 2Department of Biosciences, Division of Biochemistry, P.O. Box 56, FIN-00014 University of Helsinki and 3Research Unit, Tampere University

Hospital, FIN-30520 Tampere, Finland 4To

whom correspondence should be addressed, at the first address. E-mail: [email protected]

Mutations in the gene encoding for a de novo methyltransferase, DNMT3B, lead to an autosomal recessive Immunodeficiency, Centromeric instability and Facial anomalies (ICF) syndrome. To analyse the protein structure and consequences of ICF-causing mutations, we modelled the structure of the DNMT3B methyltransferase domain based on Haemophilus haemolyticus protein in complex with the cofactor AdoMet and the target DNA sequence. The structural model has a two-subdomain fold where the DNA-binding region is situated between the subdomains on a surface cleft having positive electrostatic potential. The smaller subdomains of the methyltransferases differ in length and sequences and therefore only the target recognition domain loop was modelled to show the location of an ICF-causing mutation. Based on the model, the DNMT3B recognizes the GC sequence and flips the cytosine from the double-stranded DNA to the catalytic pocket. The amino acids in the cofactor and target cytosine binding sites and also the electrostatic properties of the binding pockets are conserved. In addition, a registry of all known ICF-causing mutations, DNMT3Bbase, was constructed. The structural principles of the pathogenic mutations based on the modelled structure and the analysis of χ angle rotation changes of mutated side chains are discussed. Keywords: database/disease-causing mutations/DNMT3B/ICF syndrome/molecular modeling Introduction DNMT3B forms the mammalian family of DNA cytosine5-methyltransferases (m5C-MTases) together with DNMT1, DNMT2, DNMT3A and DNMT3L [reviewed by Bestor (Bestor, 2000)]. These enzymes catalyse the transfer of a methyl group from S-adenosyl-L-methionine (AdoMet) to the C5 position of cytosine, except for DNMT2 that has no methyltransferase activity (Okano et al., 1998). In mammals and other vertebrates, DNA methylation occurs predominantly at CpG dinucleotides. The effects of DNA methylation include transcriptional repression by methylation of promoter regions (Jones, 1996), formation of compact chromatin structures (Kass et al., 1997), X chromosome inactivation (Panning and Jaenisch, 1998) and imprinting control (Li et al., 1993; Bartolomei and Tilghman, 1997). The CpG dinucleotides are also well-known mutational hotspots in many diseases such as cancers and primary immunodeficiencies (Ollila et al., 1996; Warnecke and Bestor, 2000; Vihinen et al., 2001). Recently, the human DNMT3B gene was mapped to 20q11.2 © Oxford University Press

(Robertson et al., 1999; Xie et al., 1999), a locus associated with Immunodeficiency, Centromeric instability and Facial anomalies (ICF) syndrome (Wijmenga et al., 1998). The ICF syndrome (OMIM 242860) is a rare autosomal recessive disease characterized by an immunodeficiency in combination with facial dysmorphisms, including epicanthic folds, telecanthus, a flat nasal bridge, macroglossia with protruding tongue and mild micrognathia (Hulten, 1978; Tiepolo et al., 1979). All ICF patients have chromosomal instability involving chromosomes 1, 9 and/or 16 (Jeanpierre et al., 1993). These DNA regions contain classical satellites II and III, which are the major components of constitutive heterochromatin and are normally heavily methylated (Schuffenhauer et al., 1995; Miniou et al., 1997). In addition to the ICF syndrome, DNMT3B also plays a role in different forms of cancer (Robertson et al., 1999; Xie et al., 1999; Kanai et al., 2001; Mizuno et al., 2001; Saito et al., 2001). The expression level of DNMT3B was also shown to correlate with the methylation stage of the p15INAK4B tumour suppressor gene in acute myelogenous leukaemia (Mizuno et al., 2001). The hypermethylation of the promotor regions of tumor supressor genes is frequently associated with the inactivation of these genes (Cameron et al., 1999). In the murine knock-out model, inactivation of Dnmt3a or Dnmt3b blocks de novo methylation in embryonic stem (ES) cells and early embryos, but has no effect on the maintenance of imprinted methylation patterns (Okano et al., 1999). Both proteins are localized to the replication foci containing heterochromatin in the murine ES cells, whereas in more differentiated embryonic fibroblasts only the Dnmt3a is targeted to the foci (Bachman et al., 2001). The disruption of the Dnmt3L gene has been shown to cause the failure of maternally methylated imprints whereas the global methylation levels were not affected (Bourc’his et al., 2001). These results indicate a vital role for the DNMT3B protein in the proper regulation of stage-specific genes especially during lymphocyte maturation. The N-terminal region of DNMT3B contains the PWWPand PHD-type zinc finger domains in addition to a large unconserved region with an assumed role in targeting the protein to pericentromeric heterochromatin regions (Bachman et al., 2001). The catalytic m5C-MTase domain is located at the C-terminus of the protein (Figure 1A). The mouse Dnmt3b PWWP domain structure has been solved and shown to bind DNA (Qiu et al., 2002). The PHD domains have been suggested to mediate protein–protein interactions and to bind to DNA with the positively charged N-terminal residues (Jacobson and Pillus, 1999; Palena et al., 2001). Recently, the PHD domains of the mouse Dnmt3a and Dnmt3b proteins were shown to repress transcription independent of the methylation activity (Bachman et al., 2001). The N-terminal residues preceding the PHD domain were also shown to bind SUMO-1 and Ubc9 proteins (Kang et al., 2001). Mutations in the PHD domains of ATRX (Gibbons et al., 1997; Rinderle et al., 1999) and ING1 (Gunduz et al., 2000) as well as the deletion of the 1005

I.Lappalainen and M.Vihinen

1006

ICF-causing mutations in the methyltransferase domain of DNMT3B

domain from AIRE (Nagamine et al., 1997; The Finnish– German APECED Consortium, 1997), MLL, CBP, MOZ and AF10 (Gunduz et al., 2000) result in severe diseases. The DNMT3B PHD-type leuzine zipper is similar to the recently solved KAP-1 PHD domain (Capili et al., 2001). All the currently known ICF-causing mutations are located in the variable N-terminal region as well as in the m5C-MTase domain (Figure 1A). The crystal structure of bacterial DNA methyltransferases from Haemophilus haemolyticus (M. HhaI) and Haemophilus aegyptius (M. HaeIII) have been solved alone and in complex with cofactor and target DNA sequences (Klimasauskas et al., 1994; Reinisch et al., 1995; O’Gara et al., 1996a,b). Recently, the structure of the human DNMT2 m5C-MTase domain was solved and found to be very similar to the bacterial methyltransferases (Dong et al., 2001). In the amino acid level, both bacterial and eukaryotic enzymes have six strongly conserved and four less conserved sequence motifs (I–X) (Posfai et al., 1989; Kumar et al., 1994). Comparison of the two bacterial and the human m5C-MTase structures revealed a common two-domain architecture. Nine of the conserved motifs that are located in the larger subdomain are involved in cofactor and target cytosine binding. The smaller subdomain is formed of the residues located between the motifs VIII and IX (Figure 1B). Smaller domains share low sequence and structural similarity, which allows specific interactions between the DNA and protein and thereby recognition of different DNA sequences. To understand the structural and functional consequences of ICF-causing mutations, we modelled the DNMT3B m5CMTase domain structure using the bacterial M. HhaI crystal structure as the template. The geometry of the catalytic subdomain is very similar to all the known structures. Because the smaller subdomains differ in length and structure, it was possible to model only the DNA binding segment of this domain. The model was built to illustrate the location of the only ICF-causing mutation reported from the minor subdomain. In addition, a mutation database, DNMT3Bbase, was developed. The registry contains clinical information and immunological phenotype of the ICF patients along with the description of the gene defect. The missense mutations were analysed by rotating the side chain χ-angles to observe the local effects to the structure, function and interactions. The structural consequences of the disease-causing mutations as well as the alternative variants of DNMT3B are discussed based on the model. Materials and methods DNMT3B mutation database A submission program was developed to submit mutation and patient information to DNMT3Bbase. The registry was built according to the guidelines adopted in BTKbase (Vihinen et al., 1999) and is freely available at http://bioinf.uta.fi/ DNMT3Bbase/. The aim was to provide information about the

ICF patients and their genetic defects on the Internet. The DNMT3Bbase entry contains four main items: identification of the patient and mutation, reference either to published article(s) or a submitting physician, mutation information and data related to disease. The database has a systematic design, which allows the use of the MUTbase program package (Riikonen and Vihinen, 1999) to generate new information and to distribute the data on the Internet. The DNMT3Bbase is a patient-based mutation registry, in which a patient forms an entry and which facilitates also the coding of diseaserelated parameters. The entries can be analysed with the provided tools or with the SRS program (Etzold and Argos, 1993). The DNMT3Bbase web site also includes the results of the analysis of effects of mutations based on the modelled structure. Molecular modelling of the DNMT3B m5C-MTase structure The previously published alignments (Posfai et al., 1989; Kumar et al., 1994) were studied although only those sequences related to the solved structures are presented here. The methyltransferase sequences of human DNMT2 (TrEMBL code: O14717), H.aegyptius (SwisProt code: P20589), H.haemolyticus (P05102) and human DNMT3B (Q9UBC3) were aligned using Clustal W (Thompson et al., 1994). The final alignment was achieved through manual adjustments, based on the conserved motifs, catalytically important residues and secondary and tertiary structures (Figure 1B). The DNMT3B methylase domain was modelled based on the crystal structure of M. HhaI methyltranseferase at 2.05 Å (PDB entry 6MHT) (Kumar et al., 1997). The model was built by using InsightII (Accelrys, San Diego, CA). The six insertions and the three deletions were modelled by searching loops from a database containing either most of the PDB structures or an unbiased selection of PDB (Boberg et al., 1995). Several residues were taken as anchor points from both ends of the loop and the inserted loop was chosen so that it fitted the structure and had a low root mean square deviation from the anchor points. The model was refined by energy minimization with the program Discover (Accelrys) using the Amber force field in a stepwise manner. First, only the hydrogens were allowed to move, then the side chains of the insertions and deletions were relaxed while the rest of the molecule was fixed. In the next steps, the borders and the conserved regions and then only the Cα atoms were harmonically constrained. In the smaller subdomain, the TRD loop was minimized while only the hydrogens were allowed to relax in the rest of the subdomain. The model was evaluated by using program PROCHECK (Laskowski et al., 1993) and all parameters were found to be within the normal tolerances. Electrostatic surface potential The electrostatic surface potentials were calculated for the model and template with the program Delphi (Accelrys). The cofactor and the DNA was excluded from the template and only the modelled area of the DNMT3B protein was used in calculations.

Fig. 1. (A) A schematic depiction of distribution of the five DNMT3B splice variants. The identified ICF-causing mutations are presented on the DNMT3B1 according to convention used in BTKbase. The insertions are marked with an ‘@’ symbol in front of the mutation site. The number of added residues is given after the ‘⫹’ sign in the inframe insertion. If mutation causes a stop codon it is marked with X and the number of the codon with the newly inserted stop codon. The vertical arrows indicate the mutation positions. (B) The sequence alignment of the m5C-MTase domains of the human DNMT3B and DNMT2 and the bacterial enzymes M. HhaI and M. HaeIII. The α-helices are coloured red, β-strands blue and the conserved threonine involved in the flipping of the target cytosine in the TRD loop green. The M. HhaI secondary structure nomenclature is used although the secondary structures are coloured as in DSSP.

1007

I.Lappalainen and M.Vihinen

1008

ICF-causing mutations in the methyltransferase domain of DNMT3B

Analysis of missense mutations The mutated residues were examined by rotating over the full range of side chain χ angles to determine if the structure could accommodate the new side chain without significant changes to the conformation. Only the substituted side chain was allowed to move during the analysis. The rotatable side chain was created with PREKIN 5.93 and an automated sampling of torsional angles was done with the Autobondrot procedure under PROBE 2.80 as described previously (Word et al., 2000). The side chain conformations of mutated residues were summarized using the KIN2DCONT and KIN3DCONT programs. The A766P mutation was not included in the analyses as the proline alters the backbone configuration. The H814R mutation was surveyed in three runs with fixed χ1 of –65, –177 and ⫹62°. The χ angles were increased by 5° in each step. The acceptable conformations for a mutated side chain have a total score above –1.0 (Lovell et al., 2000). Each mutation was then visualized by using the MAGE 5.93 display program together with the all atom contacts calculated with PROBE 2.80 to study the positive van der Waals surfaces and possible unfavourable electrostatic effects. All the programs used were obtained from http://kinemage.biochem.duke.edu. Results DNMT3B mutation database The reported ICF-causing mutations (Hansen et al., 1999; Xu et al., 1999) were collected into a database, DNMT3Bbase. Currently, 14 mutation entries from 13 unrelated families showing 16 unique genotypes are listed (Figure 1A). In addition to the mutations, the registry contains information also about symptoms, age at diagnosis and various parameters from the patients. The registry was constructed according to the concepts used in BTKbase, an X-linked agammaglobulinaemia (XLA) mutation registry (Vihinen et al., 2001), by using the MUTbase system (Riikonen and Vihinen, 1999). DNMT3Bbase is freely accessible at http://bioinf.uta.fi/DNMT3Bbase. Researchers are encouraged to send their patient and mutation information to the registry. The DNMT3Bbase provides additional calculated information, such as alterations to restriction enzyme patterns due to mutations and mutation statistics. Sequence analyses Residues 575–739 and 801–853 of DNMT3B form the larger lobe and the smaller lobe is composed of residues 740–800 (Figure 1B). The secondary structures of the DNMT3B protein were predicted with the Internet server Jpred2 at http:// jura.ebi.ac.uk:8888 (Cuff et al., 1998) and the secondary structure information of M. HhaI, M. HaeIII and DNMT2 was taken from the DSSP database (Kabsch and Sander, 1983). According to the sequence alignment, the predicted secondary structural elements of DNMT3B match those for M. HhaI and M. HaeIII enzymes and DNMT2. In addition, nine of the 10

conserved motifs align sequentially with the other m5CMTases. The motif VII is absent as the structure formed by the αD-helix and β5-strand is seven residues shorter in DNMT3B. This motif is not fully conserved among the m5CMTases. A deletion of three residues precedes the αD-helix in both DNMT3B and DNMT2 and the modelled target recognition domain (TRD) loop contains a deletion of one residue. The DNMT3B model has six insertions compared with the M. HhaI methyltransferase (Figure 1B). The first insertion is of two residues in the loop αA/β2 and the second one of two residues precedes the catalytically important D627 (D60 of M. HhaI) in the loop connecting the helices αB and αB⬘. The loop between the αB⬘-helix and β3-strand contains the third insertion of three residues in the C-terminus of the loop. The active-site loop between the β3-strand and αC-helix contains an extra proline. The largest insertion of seven residues is in the loop connecting αC to β4 and the last insertion of two residues is located in the loop αE/αF, in both DNMT3B and DNMT2. Structural model The sequence identity of DNMT3B, the two bacterial and human proteins with structure is 13–18% for the whole domain and 19–21% without the variable minor domain. The DNMT2 structure also lacks residues from the catalytic loop. The m5CMTase domain of the DNMT3B protein was therefore modelled based on the M. HhaI structure having the best resolution (2.05 Å) in the presence of the AdoMet cofactor and the target DNA sequence containing 4⬘-thio-2⬘-deoxycytidine (Kumar et al., 1997). The modification of the target cytosine inhibits the enzyme activity, but does not prevent DNA recognition or disrupt the structure of the protein. The M. HhaI folds into two subdomains between which the DNA is bound. The DNMT3B m5C-MTase model has a similar positive DNA binding surface as in the M. HhaI structure (Figure 3). The residues essential for the catalytic activity and binding of the cofactor are located close to each other on the surface of the larger subdomain and these binding pockets show good electrostatic and spatial similarity (Figures 2C and 3). The M. HhaI recognizes the GCGC sequence and flips the inner cytosine out of the DNA helix into the catalytic site for methylation (Roberts et al., 1976; Klimasauskas et al., 1998). The residues involved in this process are located in the smaller subdomain. They form the TRD loop, which runs parallel with the target DNA strand and serves as a scaffold for conformational processing of the bound substrate (Cheng and Blumenthal, 1996). The comparison of the DNMT2, M. HhaI and M. HaeIII structures illustrates high sequence and structural similarity in the larger subdomains. The core of the subdomain is composed of a six-stranded β-sheet sandwiched between two α-helices

Fig. 2. (A) and (B) stereo views of the modelled structure of the DNMT3B m5C-MTase domain. Superimposition of the template protein M. HhaI (green) and the DNMT3B m5C-MTase domain (magenta) on the left. The target cytosine is coloured cyan and the cofactor orange. On the right are shown all the known missense mutations (white) leading to ICF. The active-site loop in the larger subdomain and the TRD loop of the smaller subdomain are shown as yellow and orange, respectively. (C) Detailed view of the target cytosine and cofactor-binding sites. The residues involved in the binding and recognition of the target cytosine (cyan) and cofactor (orange) in the M. HhaI structure are shown in green. The corresponding residues in the DNMT3B are coloured magenta. (D) Structure of the TRD loop binding to the target DNA. The M. HhaI TRD loop (green) superimposed with the M. HaeIII (red) and the modelled DNMT3B (magenta). The target DNA sequence (cyan) is from the M. HhaI structure. (E) The MAGE/PROBE display of the V606A mutation showing the hydrophobic contact surface dots of the cofactor (red) and the valine from two angles. The contact surface after substitution is circled. (F) 3D contour map summarizing the M818 side chain conformations in the model of the DNMT3B V818M mutant. The MAGE/PROBE analyses of all the mutants with contours are available in the DNMT3Bbase.

1009

I.Lappalainen and M.Vihinen

Fig. 3. The electrostatic surface potentials of the M. HhaI and DNMT3B proteins. Negative potential is shown in red and positive potential in blue. (A) M. HhaI bound to the target DNA (cyan) and cofactor (orange). The orientation is the same as in Figure 2. (B) Model of DNMT3B in the same orientation as M. HhaI. The cofactor (orange) is superimposed from the M. HhaI structure. Only the modelled TRD loop is shown and therefore part of the positive DNA binding area is missing. (C) and (D) side views of the M. HhaI and DNMT3B proteins, respectively. ICF-causing missense mutations (H814R, D817G, V818M) clustered around a positive surface are shown.

(C and D) on one side and two on the other (A and G). The αB-helix runs across the sheet in front of the sandwich (Cheng et al., 1993). The DNMT3B model shares the same geometry and most of the amino acids involved in catalysis are structurally conserved (Figure 2C). In the M. HhaI structure, the target cytosine is buried in the catalytic pocket as the active-site loop (78–100 in the M. HhaI) folds on top of it. This long loop contains the catalytically important cysteine in the N-terminus 1010

and four conserved amino acids involved in DNA binding (Cheng and Blumenthal, 1996). The DNMT3B active site loop (648–669) has a similar fold despite the additional proline in the middle of the loop. The other insertions and deletions located in the larger subdomain are not situated in the ligand binding areas. The smaller subdomains of m5C-MTases differ in length and sequence (Lauster et al., 1989; Posfai et al., 1989). The

ICF-causing mutations in the methyltransferase domain of DNMT3B

DNMT2 and M. HhaI structures share a similar propeller-like β-strand core surrounded by several α-helices in both structures (Cheng et al., 1993; Dong et al., 2001). The structure of the M. HaeIII smaller subdomain consists mostly of coils except for the four short β-strands and a short α-helix preceding the TRD loop (Reinisch et al., 1995). The TRD loop is the only structurally conserved segment in the smaller subdomains among M. HhaI, M. HaeIII and DNMT2. As the sequence identity and the number of secondary structures as well as their position differ among all the known structures and DNMT3B, we constructed a model only for the TRD loop based on the M. HhaI structure to show the position of the only ICF-causing mutation found from the smaller subdomain (Figure 2D). The loop contains the functionally important T773 (T250 of M. HhaI) and also two other residues involved in DNA recognition in the M. HhaI and M. HaeIII structures (see below). Discussion Residues involved in target cytosine and AdoMet binding The target cytosine and cofactor binding sites are located on the surface of the large subdomain and are conserved in all four proteins (Figure 2C). Residues F18, P80 and L100 form one side of the cofactor-binding pocket in the M. HhaI structure (Cheng et al., 1993). All three residues are conserved in M. HaeIII, DNMT2 and DNMT3B. The wall of the opposite side of the binding pocket is formed by W41 in M. HhaI but it is substituted by tyrosine in M. HaeIII and by valine in DNMT2 and DNMT3B. The tryptophan is poorly conserved among the m5C-MTases and it is often replaced by a hydrophobic amino acid (Cheng et al., 1993). Three charged residues are located in the AdoMet binding pocket. The side chain of E40 buried in the hydrophobic binding pocket interacts with the purine moiety of the AdoMet in the M. HhaI. The side chains of D60 and Q82 bind to amino and carboxyl groups of the methionine moiety, respectively (Cheng et al., 1993). Q82 is replaced by an asparagine in DNMT3B and in DNMT2 E40 and D60 correspond to the aspartate and threonine, respectively. The interaction between M. HhaI and DNA is dynamic. The target cytosine has two conformations. It is either flipped out from the double helix to the catalytic pocket near the AdoMet binding site or remains in the stacked state (Klimasauskas et al., 1998). The binding of the cofactor induces the conformational change of the active site loop locking the cytosine to the catalytic site (Cheng et al., 1993; Klimasauskas et al., 1994, 1998). In the M. HhaI–DNA– AdoMet structure, R165 and E119 bind to the cytosine allowing the C81 to attack C6 of cytosine (Figure 2C). This results in the addition of a methyl group to the C5 position of cytosine followed by elimination of the C5 proton and release of the covalent intermediate [reviewed by Kumar et al. (Kumar et al., 1994)]. The corresponding residues are conserved in M. HaeIII, DNMT2 and DNMT3B. Residues involved in DNA binding The DNA is bound to the positively charged cleft formed by the large and small subdomains (Figure 3). The m5C-MTases do not show binding specificity for the flippable base itself (Klimasauskas and Roberts, 1995; Yang et al., 1995). Instead, a number of specific interactions occur between the smaller domains and the major groove of the DNA. The residues involved in recognition are generally not conserved and the types of contacts differ between M. HhaI and M. HaeIII

(Reinisch et al., 1995). However, some features are shared. The TRD loop binds to the two phosphates preceding the target cytosine on the methylated strand (O’Gara et al., 1998). The 5⬘-terminal phosphate is bound by the Y242 and R229 (Figure 2D) in M. HhaI and M. HaeIII, respectively. The corresponding residues do not show conservation either in DNMT2 (Q284) or DNMT3B (A766). The other phosphate is recognized by a threonine (T250 of M. HhaI) in both structures. This residue is conserved among the m5C-MTases. It is involved in conformational change of the target base backbone as the base flipping occurs (Vilkaitis et al., 2000). Another shared feature is the recognition of guanine 5⬘ to the flipped cytosine by R240 and R227 in M. HhaI and M. HaeIII, respectively. The arginine (R764) is conserved also in DNMT3B, indicating that the target sequence contains the GC-dinucleotide. The minor groove of the DNA faces the larger subdomain in the solved m5C-MTases structures. The residues from the active site loop (S85, K89, K91 and R97) and R165 from the catalytic pocket interact with three phosphates of the target DNA strand (Cheng et al., 1993). S85 and K91 are conserved in DNMT3B. R661 of DNMT3B most likely corresponds to K89 of M. HhaI as the insertion of the proline in the middle of the active site loop changes the local structure of the loop. R97 in the C-terminus of the active site loop is conserved in M. HaeIII and DNMT2. The corresponding residue is substituted with a smaller and negatively charged threonine in DNMT3B. Structural basis of ICF-causing mutations Three-dimensional structural models have provided new insights into, e.g., ligand binding and interpretation of structural consequences of the mutations in the genes responsible for different diseases (Lappalainen et al., 2000; Matias et al., 2000; Rong et al., 2000; Mao et al., 2001). Here, the ICFcausing mutations in the human DNMT3B gene have been explained with the models of the three-dimensional structure of the DNMT3B m5C-MTase domain (Figure 1A). The effects of missense mutations to the function, interactions or structure of the protein were analysed by rotating the side chain χ angles to investigate whether the new side chain could be accommodated. As changes of the main chain often lead to large and unpredictable effects in the structure, the structural predictions were focused on the possible local alterations. A585 is situated at the end of a loop connecting the β1strand to the αA-helix (Figure 2B). The structure of this loop is essential for a tight turn to allow correct conformation for the adenine ring of the cofactor (Cheng et al., 1993). The substitution of alanine by valine in the modelled structure results in clashes with I584, Y588 and H618 residues in all rotamer conformations, causing unpredictable changes to the backbone conformation of the loop. A603, located at the β2-strand, is surrounded by three buried charged amino acids in DNMT3B. The side chain of K600 from the β2-strand forms a hydrogen bond with the hydroxyl group of S579 and the carboxylic group of D582 closes off the surface. Charged residues also surround the residue corresponding to A603 in DNMT2 and M. HhaI structures. The difference between the DNMT3B, DNMT2 and M. HhaI structures surrounding A603 is that Y43 (DNMT2) and Y49 (M. HhaI) side chains from the αB-helix occupy the same space as K600 (DNMT3B). The matching tyrosine is substituted by a G614 in the DNMT3B and a small residue in M. HhaI and 1011

I.Lappalainen and M.Vihinen

DNMT2 structures replaced the corresponding residue of K600. In the t-rotamer conformation, the hydroxyl group of the mutated threonine faces the carboxylic group of D582 and the hydroxyl group of S579 whereas the Cβ atom of the threonine interacts with K600. The corresponding A609T mutation in the mouse Dnmt3b protein shows ⬍2% of the normal catalytic activity but the stability of the mutated protein was not reported (Gowher and Jeltsch, 2002). Although the side chain of threonine can fit into the structure, the neutralization of the charged side chain could cause severe distortion to the local structure by affecting the protein stability or cofactor binding by misplacing the conserved F581 (Figure 2C). The third mutation affecting the cofactor binding is V606A. The corresponding W41 in the M. HhaI structure (Figure 2C) forms one side of the hydrophobic platform for the purine ring of AdoMet by lying parallel to it (Cheng et al., 1993). Recently, it was shown that the binding of AdoMet by M. HhaI occurs in two phases (Swaminathan et al., 2002). Both stages of binding were conserved also in the M. HhaI W41I mutant, indicating a similar dual recognition mode for the DNMT3B protein (Swaminathan et al., 2002). The primary binding is driven by the formation of specific hydrogen bonds and van der Waals interactions whereas the secondary weak entropic binding originates from the isolation of non-polar residues from the solvent. The substitution of valine by a smaller side chain in DNMT3B does not damage the structure of the binding site but reduces the critical hydrophobic surface between the mutated side chain and the cofactor by half (Figure 2E). The residues surrounding V606 in the model cannot compensate for the loss of hydrophobic surface. The G663S and L664T mutations are situated in the Cterminal part of the active-site loop. The substitution of glycine by serine in DNMT3B and also in both substrate bound/ unbound M. HhaI structures (6MHT/1HMY) causes collision with the surrounding residues. In the wild-type structure, the L664 side chain is surrounded by several hydrophobic residues (F673, Y665, I710 and Cα–Cε atoms of R707). The hydrophobic interactions with these amino acids are lost or disturbed by the introduction of threonine. In the best conformation, the γmethyl group of the t-rotamer exhibits van der Waals interactions with the aromatic ring of Y665, but the hydroxyl group interacts with the oxygen of the backbone carbonyl group. In the p-rotamer, the hydroxyl group has no restrictions but all the favourable van der Waals interactions are lost. In the third narrow acceptable conformation with χ1 ⫽ –45°, close to mrotamer, the γ-methyl group is stacked between the aromatic ring of F673 and backbone carbonyl group. The hydroxyl group is orientated into the hydrophobic pocket. Based on the results, the G663S and L664T mutations affect the flexibility and packing of the active-site loop, which is crucial for locking the target cytosine into the concave catalytic site. The equivalent mutations in the Dnmt3b protein showed only 5– 10% of wild-type activity (Gowher and Jeltsch, 2002). The V726G substitution is located in the loop connecting β5 and β6 strands, an area in the DNMT3B that differs from the known methylase structures (Figure 1B). Furthermore, the residues located in this loop face the smaller subdomain in the M. HhaI, M. HaeIII and DNMT2 structures. As a result, the analysis of the DNMT3B model does not provide a reliable structural interpretation for the pathogenicity of the V726G substitution, but the residue might be important for interactions within the protein. The corresponding V726G mutation in 1012

the Dnmt3b protein had no catalytic activity (Gowher and Jeltsch, 2002). The V699G mutation affects the highly conserved ENVmotif at the end of the β4-strand. This region is vital for the structure of the DNA binding region due to interactions with the TRD loop and catalytic pocket (Cheng et al., 1993). The side chain of V699 packs against the hydrophobic parts of S655, R733 and the catalytic C651. The γ1-methyl group of valine forms a hydrogen bond with the Oδ1 atom of N698. The mutation seems to cause ICF syndrome by eliminating the coordinating interactions. The H814R, D817G and V818M mutations are clustered in the loop between αE and αF helices. They appear on the border of a large positively charged surface next to the DNA and cofactor binding areas (Figure 3D). The H814R substitution replaces the imidazole ring of the histidine by a longer and more flexible side chain, which is able to move extensively on the surface. The M818 side chain has also a wide range of acceptable conformations with positive van der Waals contacts (Figure 2F). Although the methionine holds a long and hydrophobic side chain, the thioether group is a known nucleophile. The side chain of D817 points outwards from the protein and does not make any specific interactions with the surrounding residues. The corresponding residues of M. HhaI are also situated on the equivalent surface and are not involved in DNA binding or intramolecular interactions between the subdomains. The results indicate that these residues could be important for the solubility of the protein or for function as they affect only a small area at the surface of the molecule. The mouse D823G and V824M mutations corresponding to D817 and V818 showed only 7–10% of the wild-type activity (Gowher and Jeltsch, 2002). Additional biochemical data are required for a detailed structural explanation of the pathogenic consequences of these mutations. Currently, only one missense mutation has been described from the smaller subdomain of the DNMT3B. A766 is located in the TRD loop, the only structurally conserved region of smaller subdomains among the known methylase structures. The corresponding residue is involved in binding of the target sequence M. HhaI and the mutation of A766P in DNMT3B presumably affects the target DNA recognition and base flipping by altering the structure of the loop (Figure 2D). In addition to missense mutations, one insertion, two intron and two nonsense mutations have been described. The nonsense mutations truncate the protein before the PWWP domain, leading to inactive and most likely unstable proteins (Figure 1A). The insertion of one base leads to a frameshift and truncation of protein after W158 (Xu et al., 1999). A point mutation in the intron causes an alternative splice site and introduces a new stop codon in the middle of the smaller subdomain (Xu et al., 1999). Another point mutation in the intron 22 has been found from two unrelated families (Hansen et al., 1999). It causes an in-frame insertion of three residues, STP, after R807 in the C-terminus of the αE-helix. The insertion appears in a conserved hydrophobic cluster of residues and most likely affects the fold of the α-helix and also the scaffolding of the whole protein. Alternative forms of human DNMT3B protein Five different human DNMT3B splice variants have been described and their role as active enzymes hypothesized (Robertson et al., 1999; Xie et al., 1999). The variants are expressed in a tissue-specific manner and several of these

ICF-causing mutations in the methyltransferase domain of DNMT3B

forms are upregulated in tumours more frequently than others (Figure 1A). All the isoforms, except for DNMT3B1, contain a small deletion between the PWWP and PHD domains. The DNMT3B2 is most likely an active enzyme as the murine Dnmt3b2 was shown to be active in vitro (Robertson et al., 1999) and the Dnmt3b2 and DNMT3B2 sequences show identity of 86% for the whole proteins and 94% for the methyltransferase domains. The DNMT3B3 variant has an additional deletion of 64 residues, leading to an exclusion of the smaller subdomain. The DNMT3B4 and DNMT3B5 isoforms also lack the minor subdomain in addition to the αE– αF–αG segment from the larger subdomain. The DNMT3B3 protein is ubiquitously expressed in normal tissues in addition to most analysed tumours (Robertson et al., 1999). Further, the DNMT3B3 and DNMT3B4 proteins are the major splice variants in hepatocellular carcinomas (HCCs) and the overexpression of DNMT3B4 leads to hypomethylation of pericentromeric satellite regions in precancerous conditions and in HCCs (Saito et al., 2002). These results indicate an important role for the DNMT3B isoforms in vivo. The deletion of the minor subdomain could represent a natural way to change function of the protein by inactivating the methyltransferase activity. The mouse Dnmt3b3 was shown to be catalytically inactive in vitro (Aoki et al., 2001) and the N-terminal domains of DNMT3B have been shown to bind DNA or are involved in targeting to heterochromatin independently of methyltransferase activity. The known methyltransferase structures show only a few specific interactions between the catalytic subdomain and DNA. The larger subdomain of DNMT3B was shown to have a large positively charged surface that might bind DNA when the minor subdomain is missing. We also analysed the dbSNP database (http://www. ncbi.nlm.nih.gov/SNP/) to evaluate single nucleotide polymorphisms (SNPs). As of October 22, 2002, the database contained 75 different SNPs for DNMT3B. Of these, 20 are located in the untranslated regions, 52 in the introns and only three in the coding region. None of the intron SNPs were found to change a nucleotide reported in the DNMT3base and the coding region SNPs are silent (A14A, Y558Y and N406N). Only 5/75 of the listed SNPs contained heterozygosity data. As a summary, the DNMT3B m5C-MTase domain was modelled and the electrostatic surface potential calculated for the larger subdomain. The comparison of the modelled and known structures suggests similar fold for larger subdomains with highly conserved residues involved in cofactor and target cytosine binding. The smaller subdomain of DNMT3B differs from all the known structures apart from the TRD loop involved in recognizing and flipping of the target cytosine. The ICF-causing mutations identified thus far were interpreted in structural terms based on the model. In addition to the ICF syndrome, the DNMT3B protein has been linked to several forms of cancer. Therefore, the structural model of the DNMT3B m5C-MTase domain presented here may be used as a starting point for further analyses of rapidly accumulating variation data or solid pharmacodesign. Acknowledgements Financial support from the Academy of Finland, the Medical Research Fund of Tampere University Hospital and the Sigrid Juselius Foundation is gratefully acknowledged.

References Aoki,A., Suetake,I., Miyagawa,J., Fujio,T., Chijiwa,T., Sasaki,H. and Tajima,S. (2001) Nucleic Acids Res., 29, 3506–12.

Bachman,K.E., Rountree,M.R. and Baylin,S.B. (2001) J. Biol. Chem., 276, 32282–32287. Bartolomei,M.S. and Tilghman,S.M. (1997) Annu. Rev. Genet., 31, 493–525. Bestor,T.H. (2000) Hum. Mol. Genet., 9, 2395–2402. Boberg,J., Salakoski,T. and Vihinen,M. (1995) Protein. Eng., 8, 501–503. Bourc’his,D., Xu,G.L., Lin,C.S., Bollman,B. and Bestor,T.H. (2001) Science, 294, 2536–2539. Cameron,E.E., Baylin,S.B. and Herman,J.G. (1999) Blood, 94, 2445–2451. Capili,A.D., Schultz,D.C., Rauscher,I.F. and Borden,K.L. (2001) EMBO J., 20, 165–177. Cheng,X. and Blumenthal,R.M. (1996) Structure, 4, 639–645. Cheng,X., Kumar,S., Posfai,J., Pflugrath,J.W. and Roberts,R.J. (1993) Cell, 74, 299–307. Cuff,J.A., Clamp,M.E., Siddiqui,A.S., Finlay,M. and Barton,G.J. (1998) Bioinformatics, 14, 892–893. Dong,A., Yoder,J.A., Zhang,X., Zhou,L., Bestor,T.H. and Cheng,X. (2001) Nucleic Acids Res., 29, 439–448. Etzold,T. and Argos,P. (1993) Comput. Appl. Biosci., 9, 59–64. Gibbons,R.J. et al. (1997) Nature Genet., 17, 146–148. Gowher,H. and Jeltsch,A. (2002) J. Biol. Chem., 277, 20409–20414. Gunduz,M., Ouchida,M., Fukushima,K., Hanafusa,H., Etani,T., Nishioka,S., Nishizaki,K. and Shimizu,K. (2000) Cancer Res., 60, 3143–3146. Hansen,R.S., Wijmenga,C., Luo,P., Stanek,A.M., Canfield,T.K., Weemaes,C.M. and Gartler,S.M. (1999) Proc. Natl Acad. Sci. USA, 96, 14412–14417. Hulten,M. (1978) Clin. Genet., 14, 294. Jacobson,S. and Pillus,L. (1999) Curr. Opin. Genet. Dev., 9, 175–184. Jeanpierre,M., Turleau,C., Aurias,A., Prieur,M., Ledeist,F., Fischer,A. and Viegas-Pequignot,E. (1993) Hum. Mol. Genet., 2, 731–735. Jones,P.A. (1996) Cancer Res., 56, 2463–2467. Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577–2637. Kanai,Y., Ushijima,S., Kondo,Y., Nakanishi,Y. and Hirohashi,S. (2001) Int. J. Cancer, 91, 205–212. Kang,E.S., Park,C.W. and Chung,J.H. (2001) Biochem. Biophys. Res. Commun., 289, 862–868. Kass,S.U., Landsberger,N. and Wolffe,A.P. (1997) Curr. Biol., 7, 157–165. Klimasauskas,S. and Roberts,R.J. (1995) Nucleic Acids Res., 23, 1388–1395. Klimasauskas,S., Kumar,S., Roberts,R.J. and Cheng,X. (1994) Cell, 76, 357–369. Klimasauskas,S., Szyperski,T., Serva,S. and Wuthrich,K. (1998) EMBO J., 17, 317–324. Kumar,S., Cheng,X., Klimasauskas,S., Mi,S., Posfai,J., Roberts,R.J. and Wilson,G.G. (1994) Nucleic Acids Res., 22, 1–10. Kumar,S., Horton,J.R., Jones,G.D., Walker,R.T., Roberts,R.J. and Cheng,X. (1997) Nucleic Acids Res., 25, 2773–2783. Lappalainen,I., Giliani,S., Franceschini,R., Bonnefoy,J.Y., Duckett,C., Notarangelo,L.D. and Vihinen,M. (2000) Biochem. Biophys. Res. Commun., 269, 124–130. Laskowski,R.A., MacArthur,M.W., Moss,D.S. and Thornton,J.M. (1993) J. Appl. Crystallogr., 26, 283–291. Lauster,R., Trautner,T.A. and Noyer-Weidner,M. (1989) J. Mol. Biol., 206, 305–312. Li,E., Beard,C. and Jaenisch,R. (1993) Nature, 366, 362–365. Lovell,S.C., Word,J.M., Richardson,J.S. and Richardson,D.C. (2000) Proteins, 40, 389–408. Mao,C., Zhou,M. and Uckun,F.M. (2001) J. Biol. Chem., 276, 41435–41443. Matias,P.M. et al. (2000) J. Biol. Chem., 275, 26164–26171. Miniou,P., Bourc’his,D., Molina Gomes,D., Jeanpierre,M. and ViegasPequignot,E. (1997) Cytogenet. Cell Genet., 77, 308–313. Mizuno,S., Chijiwa,T., Okamura,T., Akashi,K., Fukumaki,Y., Niho,Y. and Sasaki,H. (2001) Blood, 97, 1172–1179. Nagamine,K. et al. (1997) Nature Genet., 17, 393–398. O’Gara,M., Klimasauskas,S., Roberts,R.J. and Cheng,X. (1996a) J. Mol. Biol., 261, 634–645. O’Gara,M., Roberts,R.J. and Cheng,X. (1996b) J. Mol. Biol., 263, 597–606. O’Gara,M., Horton,J.R., Roberts,R.J. and Cheng,X. (1998) Nature Struct. Biol., 5, 872–877. Okano,M., Xie,S. and Li,E. (1998) Nucleic Acids Res., 26, 2536–2540. Okano,M., Bell,D.W., Haber,D.A. and Li,E. (1999) Cell, 99, 247–257. Ollila,J., Lappalainen,I. and Vihinen,M. (1996) FEBS Lett., 396, 119–122. Palena,C.M., Tron,A.E., Bertoncini,C.W., Gonzalez,D.H. and Chan,R.L. (2001) J. Mol. Biol., 308, 39–47. Panning,B. and Jaenisch,R. (1998) Cell, 93, 305–308. Posfai,J., Bhagwat,A.S., Posfai,G. and Roberts,R.J. (1989) Nucleic Acids Res., 17, 2421–2435. Qiu,C., Sawada,K., Zhang,X. and Cheng,X. (2002) Nature Struct. Biol., 9, 217–224.

1013

I.Lappalainen and M.Vihinen Reinisch,K.M., Chen,L., Verdine,G.L. and Lipscomb,W.N. (1995) Cell, 82, 143–153. Riikonen,P. and Vihinen,M. (1999) Bioinformatics, 15, 852–859. Rinderle,C., Christensen,H.M., Schweiger,S., Lehrach,H. and Yaspo,M.L. (1999) Hum. Mol. Genet., 8, 277–290. Roberts,R.J., Myers,P.A., Morrison,A. and Murray,K. (1976) J. Mol. Biol., 103, 199–208. Robertson,K.D., Uzvolgyi,E., Liang,G., Talmadge,C., Sumegi,J., Gonzales,F.A. and Jones,P.A. (1999) Nucleic Acids Res., 27, 2291–2298. Rong,S.B., Valiaho,J. and Vihinen,M. (2000) Mol. Med., 6, 155–164. Saito,Y., Kanai,Y., Sakamoto,M., Saito,H., Ishii,H. and Hirohashi,S. (2001) Hepatology, 33, 561–568. Schuffenhauer,S., Bartsch,O., Stumm,M., Buchholz,T., Petropoulou,T., Kraft,S., Belohradsky,B., Hinkel,G.K., Meitinger,T. and Wegner,R.D. (1995) Hum. Genet., 96, 562–571. Swaminathan,C.P., Sankpal,U.T., Rao,D.N. and Surolia,A. (2002) J. Biol. Chem., 277, 4042–4049. The Finnish–German APECED Consortium (1997) Nature Genet., 17, 399– 403. Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) Nucleic Acids Res., 22, 4673–4680. Tiepolo,L., Maraschio,P., Gimelli,G., Cuoco,C., Gargani,G.F. and Romano,C. (1979) Hum. Genet., 51, 127–137. Vihinen,M., Kwan,S.P., Lester,T., Ochs,H.D., Resnick,I., Va¨ liaho,J., Conley,M.E. and Smith,C.I.E. (1999) Hum. Mutat., 13, 280–285. Vihinen,M. et al. (2001) Adv. Genet., 43, 103–188. Vilkaitis,G., Dong,A., Weinhold,E., Cheng,X. and Klimasauskas,S. (2000) J. Biol. Chem., 275, 38722–38730. Warnecke,P.M. and Bestor,T.H. (2000) Curr. Opin. Oncol., 12, 68–73. Wijmenga,C. et al. (1998) Am. J. Hum. Genet., 63, 803–809. Word,J.M., Bateman,R.C.,Jr, Presley,B.K., Lovell,S.C. and Richardson,D.C. (2000) Protein Sci., 9, 2251–2259. Xie,S., Wang,Z., Okano,M., Nogami,M., Li,Y., He,W.W., Okumura,K. and Li,E. (1999) Gene, 236, 87–95. Xu,G.L., Bestor,T.H., Bourc’his,D., Hsieh,C.L., Tommerup,N., Bugge,M., Hulten,M., Qu,X., Russo,J.J. and Viegas-Pequignot,E. (1999) Nature, 402, 187–191. Yang,A.S., Shen,J.C., Zingg,J.M., Mi,S. and Jones,P.A. (1995) Nucleic Acids Res., 23, 1380–1387. Received January 21, 2002; revised October 3, 2002; accepted October 10, 2002

1014