Molecular Cell, Vol. 10, 1129–1137, November, 2002, Copyright 2002 by Cell Press
Diabetes Mutations Delineate an Atypical POU Domain in HNF-1α
Young-In Chi, J. Daniel Frantz², Byung-Chul Oh, Lone Hansen, Sirano Dhe-Paganon, and Steven E. Shoelson¹
Joslin Diabetes Center and Department of Medicine, Harvard Medical School, Boston, Massachusetts 02215
Summary

Mutations in Hnf-1α are the most common Mendelian cause of diabetes mellitus. To elucidate the molecular function of a mutational hotspot, we cocrystallized human HNF-1α 83–279 with a high-affinity promoter and solved the structure of the complex. Two identical protein molecules are bound to the promoter. Each contains a homeodomain and a second domain structurally similar to POU-specific domains that was not predicted on the basis of amino acid sequence. Atypical elements in both domains create a stable interface that further distinguishes HNF-1α from other flexible POU-homeodomain proteins. The numerous diabetes-causing mutations in HNF-1α thus identified a previously unrecognized POU domain, which was used as a search model to identify additional POU domain proteins in sequence databases.

Introduction

The identification of the MODY (maturity-onset diabetes of the young) genes has been a noteworthy success in what has otherwise been a difficult search for the genetic basis of type 2 diabetes. MODY is distinct from typical type 2 diabetes in that its inheritance is autosomal dominant and clinical onset usually occurs before 25 years of age. Six MODY genes, referred to as MODY1–6, have been identified: Hnf-4α, Gck, Hnf-1α, Pdx-1, Hnf-1β, and Neuro-D1/Beta-2, respectively (Hattersley, 1998; Froguel and Velho, 2001; Fajans et al., 2001). Mutations in Hnf-1α (MODY3) are the most common of the known Mendelian causes of diabetes (Yamagata et al., 1996; Frayling et al., 1997; Iwasaki et al., 1997; Urhammer et al., 1997; Glucksmann et al., 1997; Hansen et al., 1997; Kaisaki et al., 1997; Yamada et al., 1997, 1999; Hattersley, 1998; Chevre et al., 1998; Moller et al., 1998; Vaxillaire et al., 1999; Yoshiuchi et al., 1999; Lehto et al., 1999; Hegele et al., 1999; Ng et al., 1999; Ellard, 2000; Bjorkhaug et al., 2000).
First identified as a key transcription factor in liver (Courtois et al., 1987), HNF-1α (hepatocyte nuclear factor-1α) is also expressed in kidney and throughout the gastrointestinal tract (Mendel and Crabtree, 1991). The fact that diabetes is the clinically apparent phenotype in patients with Hnf-1α mutations (Yamagata et al., 1996; Ellard, 2000) appears to be due to altered gene expression in pancreatic β cells, which leads to impaired insulin synthesis and secretion and abnormal β cell development. Specifically implicated genes include those that encode insulin, the glucose transporter Glut2, neutral and basic amino acid transporters, and mitochondrial enzymes such as pyruvate dehydrogenase (Emens et al., 1992; Shih et al., 2001). The expression of other islet-enriched transcription factors is altered in Hnf-1α−/− mice, including the MODY gene products HNF-4α, Pdx-1, and Neuro-D1/Beta-2 (Shih et al., 2001), suggesting a complex interrelationship and hierarchical network of transcriptional elements in pancreatic β cells (Duncan et al., 1998).

¹Correspondence: [email protected]
²Present address: Vertex Pharmaceuticals, Cambridge, Massachusetts 02139.

HNF-1α has been subdivided into three functional regions: an amino-terminal dimerization domain (residues 1–32), a DNA binding motif containing an atypical homeodomain (residues 203–276), and a carboxyl-terminal transactivation domain (residues 281–631) (Mendel and Crabtree, 1991). Three-dimensional structures have been reported for the dimerization domain and the homeodomain. The dimerization domain forms a four-helix bundle, with each identical protomer contributing two α helices separated by a turn (Hua et al., 2000; Rose et al., 2000a, 2000b; Narayana et al., 2001). The three α helices of the HNF-1α homeodomain are superimposable on the three helices of other homeodomains, although HNF-1α has an extended loop between the second and third helices relative to the canonical homeodomain fold (Ceska et al., 1993; Schott et al., 1997). Structures of HNF-1α bound to promoter DNA have not been reported.

Our attention focused on regions of HNF-1α where diabetes-associated missense mutations cluster, because single amino acid substitutions are informative measures of protein function.
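Figure 1A plots mutations per codon for each structural element. As a minimal sketch of that arithmetic, assuming one simply counts mutated residue positions inside each region and divides by the region length in codons, the residue boundaries quoted in the text can be encoded as follows. The `DOMAINS` table and the `mutations_per_codon` function are hypothetical names introduced for illustration, and the positions in the usage block are made up rather than taken from the MODY3 data set.

```python
from typing import Dict, Iterable, Tuple

# Region boundaries (1-based, inclusive residue numbers) as given in the text.
# "region 91-185" is the mutation-rich stretch N-terminal to the homeodomain.
DOMAINS: Dict[str, Tuple[int, int]] = {
    "dimerization domain": (1, 32),
    "region 91-185": (91, 185),
    "homeodomain": (203, 276),
    "trans-activation domain": (281, 631),
}


def mutations_per_codon(positions: Iterable[int],
                        domains: Dict[str, Tuple[int, int]] = DOMAINS) -> Dict[str, float]:
    """Return (mutated positions in region) / (codons in region) for each region."""
    unique_positions = set(positions)  # count each mutated residue only once
    rates: Dict[str, float] = {}
    for name, (start, end) in domains.items():
        n_codons = end - start + 1
        hits = sum(1 for p in unique_positions if start <= p <= end)
        rates[name] = hits / n_codons
    return rates


if __name__ == "__main__":
    # Hypothetical mutated positions, for illustration only.
    example = [10, 20, 100, 210, 215, 300]
    for region, rate in mutations_per_codon(example).items():
        print(f"{region}: {rate:.3f} mutations/codon")
```

Counting unique positions matches a "mutated residues per codon" reading of the figure; counting each family separately would instead weight recurrent mutations more heavily.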
Mutation rates are highest within the homeodomain (203–276) and an extended region amino-terminal to it (91–185), intermediate within the dimerization domain, and lowest within the intervening sequences and the carboxy-terminal trans-activation domain (Figure 1A). The region bounded by residues 91 and 185 lacks extended sequence homology with other transcription factors. Nevertheless, this region is thought to bind DNA and provide specificity to the interaction (Tomei et al., 1992). To determine potential modes of DNA recognition and gain a more complete picture of HNF-1α function, and of dysfunction due to diabetes-associated mutations, we crystallized the mutational hotspot in complex with a high-affinity promoter and solved its structure.

Figure 1. Mutation Rate and Structure-Based Sequence Alignment
(A) Mutations per codon (vertical black bars) are plotted versus linear sequence. Mutation rate (colored boxes) was plotted per element of predicted structure (blue, dimerization domain [Dim]; green, DNA binding domains; red, linkers and trans-activation domain). (B) Schematic representation of HNF-1α domain organization. (C) Structure-based sequence alignment of POU-homeodomains: human HNF-1α and HNF-1β (this work), human Oct-1 (Klemm et al., 1994), and rat Pit-1 (Jacobson et al., 1997). Individual α helices are labeled (POUS, Pα1–5; POUH, Hα1–3) and shaded green. Key contacts with DNA that are conserved among HNF-1α, HNF-1β, Oct-1, and Pit-1 are white on a black background. Red stars denote residues of HNF-1α that interact at the POUS domain/POUH domain interface. MODY3 mutations are noted in black above the HNF-1α sequence; mutation A241T in HNF-1β is shown in red. Three additional sequences on the light gray background, from human (accession number AAH09259), mouse (AAH02212), and C. elegans (T33839), are predicted to contain POU domains based on predicted structural similarity to HNF-1α and conservation of relevant key residues (see Figure 5).

Results and Discussion

Domain Organization of HNF-1α

Recombinant human HNF-1α 83–279 and synthetic dsDNA were prepared and purified by conventional methods. In the absence of DNA, the protein is monomeric in solution. Gel mobility shift assays showed that binding to DNA was maximal at >2-fold molar excess of HNF-1α protein (data not shown), indicating that complexes contained protein and DNA in a stoichiometric ratio of 2:1. Crystals were generated and the structure of the complex was solved by multiple isomorphous replacement methods. Data and refinement statistics are provided in Table 1.

Two identical protein molecules are bound to the DNA (Figure 2A). Each protein contains two distinct helical domains. Residues 203–279 form a homeodomain (POUH), which is nearly identical to previously solved structures of the isolated HNF-1α homeodomain (Ceska et al., 1993; Schott et al., 1997). Three α helices form the compact domain. The 21 residue “insertion” between helices Hα2 and Hα3 (residues 238–258), relative to the canonical homeodomain fold, extends Hα2 by 8 residues and increases by 13 residues the length of the loop between Hα2 and Hα3 (Figures 1 and 2B). Unlike the Antennapedia and Engrailed homeodomains, whose N termini are ordered upon DNA binding (Kissinger et al., 1990; Otting et al., 1990), there are no major differences between previous structures of the isolated, unbound
domain and ours that is DNA bound. With the exception of the insertion, the homeodomains in our structure readily superimpose with homeodomains of other proteins (Figure 2B); rmsd values for superposition of backbone atoms of the HNF-1α homeodomain with those of Pit-1 and Oct-1 of the POU family are 1.17 and 1.28 Å, respectively (Klemm et al., 1994; Jacobson et al., 1997).

The second well-defined protein domain in our structure contains five α helices encompassing residues 91–181. Like the homeodomain, its intimate contact with DNA implies functionally relevant recognition. HNF-1α backbone atoms from helices Pα2–Pα5 superimpose on the four helices of Pit-1 and Oct-1 POU-specific domains with rmsd values of 1.60 and 1.62 Å, respectively (Figure 2B). Therefore, despite a lack of sequence homology with other POU-specific (POUS) domains, homologous protein folds and modes of DNA recognition (see below) clearly establish HNF-1α as a new member of the POU domain family of transcription factors. Although Pα1 is absent in “classical” POUS domains, it is an integral part of the HNF-1α domain that is critical for protein stability. Residues from Pα1 (V103, V104, L107, and L108), Pα2 (V115 and M118), and Pα5 (T156, A160, Y163, T164, and V167) form a hydrophobic patch that anchors Pα1 to the body of the POUS domain. The linker between POUS and POUH domains (residues 182–200) is disordered. Nevertheless, connectivity between domains was established based on crystal packing and distance considerations (Figure 2A). The presence of neighboring molecules in the crystal lattice precludes the alternative configuration.

Structure of the Atypical POU Domain in HNF-1α 1131

Table 1. Data and Refinement Statistics

Diffraction data
Crystal/derivative  Resolution (Å)  Completeness (%)ᵇ  Redundancy  Rsym (%)ᶜ
Native              30–2.6          93.4               3.23        5.2
K2PtCl4             30–2.8          82.0               3.41        9.1
(EtHgO)HPO2         30–3.0          86.2               3.73        5.6
SeMet               30–2.8          83.6               4.06        7.3
Iodo1ᵃ              30–3.2          83.7               3.56        6.2
Iodo2ᵃ              30–3.0          83.0               3.14        5.4
Iodo3ᵃ              30–3.0          79.7               2.73        7.1

Isomorphous replacement
Crystal/derivative  Riso (%)ᵈ  Number of sites  Phasing powerᵉ  Rcullis
Native              –          –                –               –
K2PtCl4             20.6       2                0.82            0.89
(EtHgO)HPO2         40.9       2                0.78            0.93
SeMet               14.5       4                1.09            0.83
Iodo1ᵃ              20.8       1                1.60            0.74
Iodo2ᵃ              21.1       2                1.66            0.70
Iodo3ᵃ              23.0       3                1.32            0.77
Mean FOM to 2.8 Å (centric/acentric): 0.639/0.448

Final model statistics
Resolution                                   20–2.6 Å
Reflections                                  12,593
R factor (%)                                 25.4
Rfree (%)ᶠ                                   31.2
Non-hydrogen atoms per asymmetric unit
  Protein (two molecules)                    2832
  DNA                                        855
  Water                                      139
<B>                                          58.9 Å²
Rms deviation, bond lengths/bond angles      0.007 Å/1.20°

P4₁ crystal form; unit cell a = b = 50.39 Å and c = 207.27 Å.
ᵃ For the Iodo1, Iodo2, and Iodo3 derivatives, thymine was replaced with iodouracil at position 24; positions 2 and 24; and positions 2, 22, and 24, respectively.
ᵇ Due to anisotropy in the diffraction pattern, completeness drops toward high resolution.
ᶜ Rsym = Σ|Iobs − <I>|/Σ<I>, calculated for all data.
ᵈ Riso = Σ|FPH − FP|/ΣFPH, where FPH and FP are the derivative and native structure factor amplitudes, respectively.
ᵉ Phasing power (isomorphous) = <|FH|>/rms(ε), where <|FH|> is the mean calculated amplitude for the heavy-atom model and ε is the lack-of-closure error.
ᶠ 5% of the reflection data were excluded from refinement.

Figure 2. Structure of the HNF-1α/DNA Complex
(A) Two protein molecules (1 and 2) are bound to dsDNA. The five α helices of each POUS domain and three α helices in each POUH domain are labeled Pα1–5 and Hα1–3, respectively, as in Figure 1. (B) Comparison of the HNF-1α, Oct-1, and Pit-1 domains. Domains from HNF-1α, colored mauve, are positioned as they occur in the structure. POUH and POUS domains from Oct-1 (blue) and Pit-1 (turquoise) were independently superimposed. Arrows point to “atypical” extensions from the HNF-1α domains, not present in Oct-1 and Pit-1, that interact with each other.
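The statistics defined in the footnotes to Table 1 are simple ratios of sums. The sketch below implements them directly from those definitions, assuming that symmetry-equivalent observations have already been merged to a common unique (h, k, l) index and that heavy-atom amplitudes and lack-of-closure errors are supplied as plain lists. The function names are illustrative; this is not how HKL, CCP4, or SHARP compute these values internally.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean
from typing import Dict, Iterable, List, Sequence, Tuple

HKL = Tuple[int, int, int]


def r_sym(observations: Iterable[Tuple[HKL, float]]) -> float:
    """Rsym = sum|Iobs - <I>| / sum <I>, both sums running over every observation."""
    groups: Dict[HKL, List[float]] = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)
    numerator = 0.0
    denominator = 0.0
    for intensities in groups.values():
        i_mean = mean(intensities)
        numerator += sum(abs(i - i_mean) for i in intensities)
        denominator += i_mean * len(intensities)  # <I> counted once per observation
    return numerator / denominator


def r_iso(f_ph: Sequence[float], f_p: Sequence[float]) -> float:
    """Riso = sum|FPH - FP| / sum FPH (derivative vs. native amplitudes)."""
    return sum(abs(d - n) for d, n in zip(f_ph, f_p)) / sum(f_ph)


def phasing_power(f_h_calc: Sequence[float], lack_of_closure: Sequence[float]) -> float:
    """Isomorphous phasing power = <|FH|> / rms(epsilon)."""
    mean_fh = mean([abs(f) for f in f_h_calc])
    rms_eps = sqrt(mean([e * e for e in lack_of_closure]))
    return mean_fh / rms_eps
```

Multiplying `r_sym` or `r_iso` by 100 gives the percentages reported in Table 1.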
A Stable Interface between POUS and POUH Domains

A significant contact between the POUS and POUH domains in the monomer buries 297 Å² at each protein surface (Figure 3A). The unique HNF-1α homeodomain insertion interacts at this interface with POUS domain residues from the C-terminal extension of Pα5, the C terminus of Pα2, and Pα3 (Figures 1 and 2). Specific interactions include side chain–side chain (N127:K205, E132:N202), side chain–backbone (Q125:G245, N127:K205, Q176:G253, Q243:Q125, R244:H126, N257:N127), and backbone–backbone (Q175:G253) hydrogen bonds. Side chains from residues V173, F177, V246, P248, and A251 cluster to form a hydrophobic patch between the domains. Elements not present in classical POUH and POUS domains thus contribute to the interface that appears to fix the orientations of the HNF-1α domains.

The POUH and POUS domains of each protomer bind the same face of DNA. The second protein molecule binds in the same fashion to the opposite DNA face and in the reverse orientation. Intermolecular interactions between the two HNF-1α protomers are limited. Superficially, this mode of DNA recognition resembles that of Pit-1, which also binds palindromic DNA as a dimer (Jacobson et al., 1997; Scully et al., 2000). However, the intramolecular interface between the POUS and POUH domains of Pit-1 is much smaller (Figure 3A), with 92 and 67 Å² buried for complexes with prolactin and growth hormone promoters, respectively (Scully et al., 2000). In the case of Pit-1, flexibility between the POUS and POUH domains provides an allosteric mechanism for differential recognition of discrete promoters, which is thought to facilitate its association with distinct subsets of coactivators and corepressors. The more extended interface between the POUS and POUH domains of HNF-1α suggests the opposite: rigidity rather than flexibility might be necessary for normal function.

Several sites were mutated to test whether perturbation of the interface alters function. Expression of wild-type HNF-1α enhanced transcriptional activity 25- to 50-fold over basal in a luciferase-based assay, depending on the amount of pcDNA(HNF-1α) used in the transfections (Figure 3B). As a positive control for loss of function, S142F, a MODY3 mutation within the DNA binding surface of the POUS domain, reduced activity by roughly 80%. As negative controls, substitutions at sites predicted to be unimportant for transcriptional activity, I186Q and T190Q within the flexible interdomain linker and F177S at the periphery of the POUS-POUH domain interface, were found to have no deleterious effect.

Additional substitutions tested whether disruption of the interface influenced transcriptional activity. The hydrogen bond between the side chains of N127 and K205, in the POUS and POUH domains, respectively, was targeted first. N127W abolished transcriptional activity (Figure 3B), likely due to a combination of loss of the hydrogen bond and steric effects. K205Q, a MODY3 mutation, reduced transcriptional activity only to about one-half of that seen with the wild-type protein, perhaps because glutamine retains the capacity to form hydrogen bonds. The hydrogen bond between E132 and N202, also in the POUS and POUH domains, respectively, was similarly perturbed. E132K, another MODY3 mutation, abolished transcriptional activity, and N202D reduced it by 70%. The side chain of N257 in the POUH domain hydrogen bonds the backbone of the POUS domain; N257W reduced transcriptional activity by 70%. As a final test, we substituted a residue within the small hydrophobic cluster at the interface: V246D reduced transcriptional activity by 75%. In all cases, protein expression levels assessed by Western blotting were essentially identical to wild-type (Figure 3B). Since every substitution at the interface perturbed transcriptional activity, we conclude that rigidity, as opposed to flexibility, between the POUS and POUH domains of HNF-1α is necessary for normal function. This is in distinct contrast with the situation for Pit-1, and presumably other POU domain proteins related to Pit-1, where flexibility is critical.

Figure 3. The Interface between POUS and POUH Domains
(A) Interfaces viewed as “open books.” POUS and POUH domains were rotated 90° about a vertical axis, in opposite directions, to expose the buried surfaces between them. Contact residues are numbered and colored blue, or red if associated with MODY3. HNF-1α is at the top; Pit-1 is in complex with the growth hormone (GH) or prolactin (Prl) promoters (Scully et al., 2000). (B) Transcriptional activation by HNF-1α and variants with substitutions at the POUS-POUH domain interface. Standard luciferase-based transcriptional reporter assays were conducted using HeLa cells transfected with 0.1 or 0.5 μg of wild-type (wt) or mutated pcDNA(HNF-1α) and the luciferase reporter construct. Data from multiple experiments (± SEM) were plotted as relative luciferase activity according to the position of the substituted residue in the POUS domain, flexible interdomain linker (ID), or POUH domain. MODY3 mutations are indicated. The Western blot (top) compares expression levels of wt and mutated forms of HNF-1α.

Figure 4. DNA Interactions and MODY3 Mutations
(A) Schematic summary of protein-DNA contacts. Residues from molecules 1 and 2 are in pink and green, respectively. POUH and POUS domain residues are in rectangles and ovals, respectively; filled shapes are MODY3 mutations. DNA bases within the consensus recognition motif are colored blue. (B) MODY3 mutations. Ribbon representation of HNF-1α in pink and DNA in orange. Side chains of residues affected by diabetes-associated missense mutations are displayed in green (predominantly affecting DNA binding), yellow (interdomain interactions), or blue (protein stability).

Table 2. Functional Characterization of MODY3 Mutations
Disrupt DNA interactions: A116Vᵃ; P129Tᵃ,ᵇ; R131Q (4), R131W (2); E132Kᵃ,ᵇ; S142F; H143Y (2); Q146K; H147Rᵇ; K158N; R159Q (3), R159W (2)ᵃ,ᵇ; R203C, R203Hᵃ; K205Q; Y218Cᵃ,ᵇ; R229Q (2), R229Pᵃ,ᵇ; T260Mᵇ; R263C (2); R271W (2); R272C, R272H (4)
Prevent nuclear localization: R200Q, R200W; R203C, R203Hᵃ
Perturb protein stability: L107R; A116Vᵃ; Y122C (2); I128N; P129Tᵃ; V133M; R159Q (3), R159W (2)ᵃ; A161T; Y218Cᵃ; R229Q (2), R229Pᵃ; E240Qᵃ; C241R (3), C241Gᵃ; V259D
Perturb POUS-POUH interactions: E132Kᵃ; K205Q; E240Qᵃ,ᵇ; C241R (3), C241Gᵃ,ᵇ
Others: A98V; K117E; G191D
Substituted residues are categorized according to predicted effect. ᵃ Residues predicted to affect two functions. ᵇ Residues predicted to affect function indirectly through perturbations in local environment. Numbers in parentheses refer to frequency (unrelated families with the same mutation).
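Because several substitutions appear in more than one functional class of Table 2, it can help to query the table programmatically. The sketch below transcribes Table 2 as a dictionary (frequencies and footnote marks omitted) and inverts it so that each substitution maps to every class it is listed under. The names `TABLE_2`, `build_index`, and `classes_of` are hypothetical, introduced only for this illustration.

```python
from typing import Dict, List, Set

# Table 2 transcribed as functional class -> substitutions.
TABLE_2: Dict[str, List[str]] = {
    "DNA interactions": ["A116V", "P129T", "R131Q", "R131W", "E132K", "S142F", "H143Y",
                         "Q146K", "H147R", "K158N", "R159Q", "R159W", "R203C", "R203H",
                         "K205Q", "Y218C", "R229Q", "R229P", "T260M", "R263C", "R271W",
                         "R272C", "R272H"],
    "nuclear localization": ["R200Q", "R200W", "R203C", "R203H"],
    "protein stability": ["L107R", "A116V", "Y122C", "I128N", "P129T", "V133M", "R159Q",
                          "R159W", "A161T", "Y218C", "R229Q", "R229P", "E240Q", "C241R",
                          "C241G", "V259D"],
    "POUS-POUH interactions": ["E132K", "K205Q", "E240Q", "C241R", "C241G"],
    "other": ["A98V", "K117E", "G191D"],
}


def build_index(table: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert the table so each substitution maps to every class it appears in."""
    index: Dict[str, Set[str]] = {}
    for functional_class, substitutions in table.items():
        for substitution in substitutions:
            index.setdefault(substitution, set()).add(functional_class)
    return index


def classes_of(substitution: str, index: Dict[str, Set[str]]) -> Set[str]:
    """Return the predicted functional classes, or an empty set if not listed."""
    return index.get(substitution.upper(), set())


INDEX = build_index(TABLE_2)
```

For example, `classes_of("E240Q", INDEX)` returns both the protein stability and POUS-POUH interaction classes, consistent with the two-function footnote for that residue.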
Figure 5. Electrostatic Surface Potentials of HNF-1α and Additional Putative POU Domains
Models of human HNF-1β, human AAH09259, and C. elegans T33839 were based on the structure of HNF-1α using the program MODELLER. Electrostatic surface potentials were shaded from red (−7 kT/e) to blue (+7 kT/e) using GRASP. The structure of HNF-1α bound to DNA and the models of HNF-1β, AAH09259, and T33839 are similarly oriented, with putative DNA binding surfaces either facing the viewer (top panels) or rotated away by 180° (bottom panels). The modeled structure of murine AAH02212 (data not shown) is highly similar to that of AAH09259.
DNA Recognition by HNF-1α

The synthetic double-stranded DNA used in our study corresponds to the consensus motif for optimal binding (Mendel and Crabtree, 1991; Tronche et al., 1997). The hallmarks of DNA-homeodomain and DNA-POUS domain interactions are present in our structure (Figures 1, 2, and 4). Helix Hα3 of the homeodomain is situated in the major groove, oriented perpendicular to the long axis of the DNA. Residues N270 and R272 within the conserved WFXNXR motif of Hα3 form bidentate contacts with adenine and backbone DNA interactions, respectively (Wolberger, 1996). Another hallmark residue, R203 in the N-terminal arm of the homeodomain, forms both base-specific and DNA-backbone hydrogen bonds within the minor groove.

Characteristic POUS domain interactions with DNA are also observed. Because of the additional helix (Pα1), the helix-turn-helix DNA binding motif in HNF-1α is formed by Pα3 and Pα4 instead of Pα2 and Pα3 as in other POUS domains. Invariant glutamines Q130 and Q141 (Figures 1 and 4A) create a network of hydrogen bonds with DNA. Bidentate hydrogen bonds between the Q141 side chain and adenine, a hallmark of POUS-DNA interactions, are accepted from N6 and donated to N7; Q130 forms hydrogen bonds with the backbone phosphodiester oxygen of the same base (Figure 4A). The interaction is further stabilized by the hydrogen bond between NE2 of Q141 and OE1 of Q130. Additional POUS-DNA interactions are distinct from those made by Oct-1 and Pit-1, including an extensive interaction with S142 and interactions with R131, H143, Q146, and K158, which are mutated in patients with Mendelian diabetes.

Rationalizing Diabetes Mutations

Nonsense and frameshift mutations found in MODY3 are distributed sporadically throughout the Hnf-1α coding sequence. In contrast, missense mutations and the encoded single amino acid substitutions, which are more instructive site-specific measures of protein function, cluster strikingly within the regions encoding residues 1–32 and 98–272 (Figure 1). The former is a dimerization domain and site of DCoH binding (Nicosia et al., 1990; Hua et al., 2000; Rose et al., 2000a). Thirty-four of forty-five distinct single amino acid substitutions (76%) are present within the latter region, which corresponds to just 29% of the 648 residue protein sequence (Figure 1). The distribution of diabetes-associated mutations is equivalently high throughout the POUS and POUH sequences (Figures 1 and 4; Table 2).

We have subdivided the substitutions according to functional classes predicted to affect DNA binding, POUS-POUH domain interactions, protein stability, or nuclear localization (Table 2). The largest class affects DNA binding, either through direct interactions or indirectly by perturbing the local environment. MODY3 mutations have been found in 9 of the 16 residues that bind DNA directly (Figures 4A and 4B). S142F and Q146K in the POUS domain and R203C/H in the POUH domain disrupt base-specific hydrogen bonds with DNA, whereas cationic side chain–phosphate backbone interactions are disrupted by the R131Q/W, H143Y, and K158N substitutions in the POUS domain and the R203C/H, K205Q, R263C, and R272C/H substitutions in the POUH domain (Figure 4A). Substitutions predicted to disrupt DNA recognition indirectly, through perturbations in the local environment, are also distributed throughout the POUS (A116V, P129T, H147R, R159Q/W) and POUH (Y218C, R229Q/P, T260M, R271W) domains. Many of the substituted proteins predicted by our structure to have reduced DNA binding show decreased function in in vitro transcriptional assays (Figure 3B) (Vaxillaire et al., 1999; Yamada et al., 1999; Yang et al., 1999).

MODY3 mutations predicted to affect POUS-POUH domain interactions include E132K and K205Q at the domain interface, which, as noted above, interfere with hydrogen bonds to N202 and N127, respectively. Substitutions at these sites diminish transcriptional activity (Figure 3B). Although POUH domain residues E240 and C241 do not contact the POUS domain, the MODY3 substitutions E240Q, C241R, and C241G may perturb the interface. E240 forms a salt bridge with R244, which also hydrogen bonds the backbone carbonyl of Q125 in the POUS domain; E240Q disrupts the salt bridge and thus perturbs interdomain interactions. C241 forms a hydrogen bond with the backbone carbonyl of A251, in the “atypical” extension of this POUH domain. Since C241R and C241G disrupt this interaction, these MODY3 substitutions are predicted to perturb POUS-POUH interactions and reduce protein stability.

A cluster of basic residues at the amino terminus of the homeodomain generally serves as a nuclear localization signal (NLS) (Boulikas, 1994).
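Basic clusters of this kind are easy to flag computationally. The sketch below slides a fixed window along a protein sequence, keeps windows in which at least `min_basic` residues are Lys or Arg, merges overlapping windows, and trims each merged stretch to its first and last basic residue. The window size and threshold are arbitrary illustrative choices rather than a validated NLS predictor, and `basic_clusters` is a hypothetical name. Applied to the HNF-1α segment KKGRRNRFK (residues 197–205), it recovers exactly that stretch.

```python
from typing import List, Tuple

BASIC = set("KR")  # Lys and Arg; His is deliberately excluded in this sketch


def basic_clusters(seq: str, window: int = 8, min_basic: int = 5,
                   offset: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) residue ranges, numbered from `offset`, of merged
    windows that contain at least `min_basic` Lys/Arg residues."""
    if min_basic < 1:
        raise ValueError("min_basic must be at least 1")
    seq = seq.upper()
    spans: List[Tuple[int, int]] = []  # 0-based inclusive index ranges
    for i in range(len(seq) - window + 1):
        if sum(aa in BASIC for aa in seq[i:i + window]) >= min_basic:
            j = i + window - 1
            if spans and i <= spans[-1][1] + 1:
                spans[-1] = (spans[-1][0], j)  # overlapping window: extend cluster
            else:
                spans.append((i, j))
    result: List[Tuple[int, int]] = []
    for i, j in spans:
        # Trim non-basic residues from both ends of the merged stretch.
        while seq[i] not in BASIC:
            i += 1
        while seq[j] not in BASIC:
            j -= 1
        result.append((i + offset, j + offset))
    return result


print(basic_clusters("KKGRRNRFK", offset=197))  # [(197, 205)]
```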
Residues within the HNF-1α sequence KKGRRNRFK (197–205) undoubtedly serve this purpose. MODY3 mutations R200Q, R200W, R203C, and R203H within the putative NLS are thus predicted to hinder nuclear translocation. Additional mutations predicted to disrupt protein folding or stability may lead to the accumulation of misfolded protein or to premature degradation. Some mutations, such as I128N and V259D in the POUS and POUH domains, respectively, perturb the hydrophobic core, whereas Y122C and Y218C disrupt a hydrogen bonding network critical to POUH domain stability. All four of the major functional classes of naturally occurring mutations seen in homeodomain proteins (D’Elia et al., 2001) are thus represented in HNF-1α as MODY3 mutations.

Additional Potential POU Domain Proteins

Genome-wide scans that document ~15 POU-homeodomain transcription factors in humans, and a third as many in flies and worms (Venter et al., 2001; Lander et al., 2001), do not take HNF-1α and HNF-1β into account. We devised the following algorithm to identify additional potential POU domain proteins in available databases. The program SMART was used to identify all proteins with homeodomains. Secondary structure predictions using programs in the NPS@ collection were then used
to identify those homeodomain proteins with appropriately sized helical domains amino-terminal to the predicted homeodomains. ClustalW and Multalin were used to align the likeliest sequences. Notably, the programs correctly aligned the key DNA contacts of several sequences without investigator intervention (for example, see Figure 1, accession numbers AAH09259, AAH02212, and T33839). Three-dimensional structures modeled on HNF-1α with MODELLER and analyzed with GRASP indicate that the sequences fold appropriately, with hydrophobic residues at the core and clusters of basic residues poised for DNA recognition on the surface (Figure 5). Within the basic clusters, the side chains of the strictly conserved glutamines are positioned at the POUS domain surfaces. The predicted POU domains in these proteins are flanked by serine-rich regions, much like the trans-activation domains found in classical POU domain proteins. These findings strongly suggest that additional, previously unrecognized POU domain proteins await discovery in available databases.

Conclusion

Our crystal structure reveals the site-specific mechanism for DNA recognition by HNF-1α and shows why function is impaired by MODY3 mutations. In addition, our findings extend the family of POU-homeodomain proteins and demonstrate why the POUS domains in HNF-1α and HNF-1β had not been identified previously: they lack extended sequence homology with other POU domain proteins, and each contains an extra helix (Pα1) and an extension of the C-terminal helix (Pα5) relative to other POUS domains. Our findings further show why the HNF-1α homeodomain contains an atypical insertion: it interacts with a complementary insertion in the POUS domain to stabilize the interface for optimal transcriptional efficiency. Our findings predict that other POU domain proteins might also exist, and we have identified potential examples in humans, mice, and worms.
Unraveling the mysteries of early development, cell fate, and the control of terminal differentiation relies on the recognition of relevant molecules. Our findings expand the family of POU domain transcription factors that regulate critical steps of these events and identify the molecular basis for diseases associated with defects in them.

Experimental Procedures

Preparation and Purification of Protein and DNA

Fragments of human HNF-1α cDNA were subcloned by PCR into a pGEX4T-1 vector (Pharmacia). The expressed GST-fusion proteins were isolated using glutathione-agarose in the presence of 0.5 M NaCl to prevent protein aggregation and nonspecific binding to bacterial DNA. HNF-1α (residues 83–279) was liberated with thrombin and further purified by ion exchange chromatography (Mono-Q FPLC). Tritylated oligonucleotides were purified by reverse phase HPLC, excess buffer was removed using Pharmacia HiTrapQ, and the trityl groups were removed with 80% acetic acid; the deprotected oligonucleotides were precipitated with 75% ethanol, dissolved in water, and lyophilized. Double-stranded DNA was generated by heating equimolar amounts of complementary oligonucleotides to 95°C for 10 min and slowly cooling to 4°C. Although many DNA fragments were tested, crystals were reproducibly obtained using the blunt-ended 21-mer shown (HNF-1α recognition motif is underlined). Heavy atom derivatives were prepared by substituting the italicized thymines (positions 2, 22, and 24; Table 1, footnote a) with iodouracil.
1   5′-C T T G G T T A A T A A T T C A C C A G A-3′   21
42  3′-G A A C C A A T T A T T A A G T G G T C T-5′   22
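Because the duplex is printed as two strands with end labels (1–21 on the top strand, 22–42 on the bottom strand), a short check can confirm that the strands are fully complementary and that the iodouracil positions listed in Table 1, footnote a (2, 22, and 24) are all thymines. This is a sketch: the constant and function names are introduced here for illustration, and the numbering convention (bottom strand numbered 22 at its 5′ end through 42 at its 3′ end) is inferred from the labels beside each strand.

```python
from typing import Dict

# The two strands of the crystallization duplex, as printed above.
TOP_5_TO_3 = "CTTGGTTAATAATTCACCAGA"      # nucleotides 1-21, read 5'->3'
BOTTOM_3_TO_5 = "GAACCAATTATTAAGTGGTCT"   # printed 3'->5', i.e. nucleotides 42 down to 22

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def number_duplex(top: str, bottom_3_to_5: str) -> Dict[int, str]:
    """Map nucleotide numbers to bases: top strand 1..n (5'->3'),
    bottom strand n+1..2n read 5'->3' (right to left as printed)."""
    numbering = {i + 1: base for i, base in enumerate(top)}
    bottom_5_to_3 = bottom_3_to_5[::-1]
    n = len(top)
    numbering.update({n + 1 + i: base for i, base in enumerate(bottom_5_to_3)})
    return numbering


nt = number_duplex(TOP_5_TO_3, BOTTOM_3_TO_5)
print(reverse_complement(TOP_5_TO_3) == BOTTOM_3_TO_5[::-1])  # True: strands pair fully
print([nt[p] for p in (2, 22, 24)])                           # ['T', 'T', 'T']
```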
Crystallization and Data Collection

Bipyramidal crystals were grown at room temperature using the hanging drop vapor diffusion method; 2 μl drops containing a mixture of protein (20–30 mg/ml) and DNA (0.6 equivalents) and an equal volume of reservoir solution were equilibrated against 500 μl of reservoir solution (100 mM imidazole, pH 8.5, and 30%–33% PEG 8K). Crystals with approximate dimensions of 0.2 × 0.3 × 0.3 mm grew within 1 week. Heavy atom derivatives were prepared by conventional soaking methods or by substitution of protein Met residues or DNA thymines with selenomethionine or iodouracil, respectively. The crystals were transferred stepwise to mother liquor containing 30% (v/v) glycerol and flash-frozen in a liquid nitrogen stream, and data were collected at 100 K at APS (19-ID) or NSLS (X12C and X4A).
Structure Solution and Refinement

Data were processed using HKL programs (Otwinowski and Minor, 1997), and the structure was determined by multiple isomorphous replacement (MIR). Heavy atom positions were identified by difference Patterson and difference Fourier methods using programs in CCP4 (CCP4, 1994), and heavy atom refinement and phase calculations were performed with SHARP (LaFortelle and Bricogne, 1997). Model building was done with O (Jones et al., 1991), and the backbone geometry was regularly checked against a structural database with the pep-flip option. Refinement was performed by simulated annealing using CNS (Brunger et al., 1998), with restraints placed on bond lengths, bond angles, nonbonded contacts, and temperature factors of neighboring atoms. Pseudo-noncrystallographic symmetry (pseudo-NCS) restraints were applied only to the protein molecules, and a bulk-solvent correction was applied. σA-weighted 2Fo−Fc maps as well as omit maps were calculated at regular intervals to allow manual rebuilding. Solvent water molecules were added conservatively at appropriate sites, based on peaks higher than 3σ in σA-weighted Fo−Fc maps. The inclusion of individual atomic temperature factors and the removal of pseudo-NCS restraints during the final stages of refinement were accompanied by a substantial decrease in Rfree (Table 1).
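Footnote f of Table 1 states that 5% of the reflections were excluded from refinement so that Rfree could be monitored. A minimal sketch of that bookkeeping follows, assuming reflections are simply indexed from 0 to n−1 and that the test set is drawn uniformly at random with a fixed seed. CNS flags its test set with its own tools, and nothing here reproduces them; the function names are hypothetical. The R factor is the conventional Σ||Fobs| − |Fcalc||/Σ|Fobs|, evaluated separately over the working and test sets.

```python
import random
from typing import List, Sequence, Tuple


def split_free_set(n_reflections: int, fraction: float = 0.05,
                   seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly assign `fraction` of reflection indices to the test (free) set."""
    rng = random.Random(seed)
    n_free = round(n_reflections * fraction)
    free = set(rng.sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float],
             indices: Sequence[int]) -> float:
    """R = sum||Fobs| - |Fcalc|| / sum|Fobs| over the given reflection indices."""
    num = sum(abs(abs(f_obs[i]) - abs(f_calc[i])) for i in indices)
    den = sum(abs(f_obs[i]) for i in indices)
    return num / den


# Illustrative size only: 12,593 is the reflection count listed in Table 1.
work, free = split_free_set(12593)
print(len(work), len(free))  # 11963 630
```

Evaluating `r_factor` over `work` gives R and over `free` gives Rfree; only the working set would be used to drive refinement.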
Transcription Assays

Human Hnf-1α was subcloned into the pcDNA 3.1/V5-HisA vector (Invitrogen), and three copies of the HNF-1α binding element from the β-fibrinogen promoter (pGL3-(28)3) were subcloned into the firefly luciferase reporter vector pGL3-Basic (Promega). The “megaprimer” method of PCR was used to create specific substitutions (Sarkar and Sommer, 1990); all sequences were verified. HeLa cells in 12-well plates were transfected using FuGENE (3–4 μl per μg DNA; Roche) and a total of 2.5 μg DNA (1.0 μg pGL3-(28)3 reporter, 0.1–0.5 μg of wild-type or mutated pcDNA3.1HNF-1α, 0.1 μg of pRL-TK expressing control Renilla luciferase, and the remainder pcDNA3.1V5HisA). Cells were lysed 36–48 hr after transfection, and relative luciferase activity was determined as the ratio of firefly to Renilla luciferase activity.
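The readout described above reduces to two normalizations: each well's firefly signal is divided by its Renilla signal to correct for transfection efficiency, and these ratios are then expressed relative to a reference condition (for example, reporter alone for fold activation over basal, or wild-type HNF-1α when comparing mutants), with results summarized as mean ± SEM. The sketch below shows one way to do that arithmetic. The function names are hypothetical, the numbers in the usage block are made up, and the text does not specify the normalization used for Figure 3B beyond the firefly/Renilla ratio.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def firefly_renilla_ratios(firefly: Sequence[float], renilla: Sequence[float]) -> List[float]:
    """Normalize each well's firefly signal to its Renilla transfection control."""
    return [f / r for f, r in zip(firefly, renilla)]


def relative_activity(sample: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Express normalized ratios relative to the mean of a reference condition."""
    ref = mean(reference)
    return [x / ref for x in sample]


def mean_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (requires at least two values)."""
    return mean(values), stdev(values) / sqrt(len(values))


if __name__ == "__main__":
    # Made-up triplicate readings, for illustration only.
    basal = firefly_renilla_ratios([100.0, 100.0, 100.0], [100.0, 100.0, 100.0])
    wt = firefly_renilla_ratios([5000.0, 5200.0, 4800.0], [100.0, 100.0, 100.0])
    mutant = firefly_renilla_ratios([1000.0, 1000.0, 1000.0], [100.0, 100.0, 100.0])
    print("fold over basal:", mean_sem(relative_activity(wt, basal)))
    print("fraction of wild-type:", mean_sem(relative_activity(mutant, wt)))
```

In this made-up example the mutant retains 20% of wild-type activity, which is how a statement such as "reduced activity by roughly 80%" would be expressed in these terms.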
Acknowledgments

We thank Andrzej Krolewski and Ron Shigeta for helpful discussions, Aneel Aggarwal (Mount Sinai School of Medicine) for coordinates for Pit-1 complexes prior to deposition, the staff at APS (19-ID) and NSLS (X12C and X4A) for help with data collection, and Marco Pontoglio (Institut Pasteur) for assistance in setting up the transcriptional reporter assay. Supported by NIH grant R01 DK43123 (S.E.S.), postdoctoral fellowships from the Juvenile Diabetes Foundation (Y.-I.C.) and the Mary K. Iacocca Foundation (Y.-I.C., S.D.), and a Burroughs Wellcome Fund Scholar Award in Experimental Therapeutics (S.E.S.).
Received: November 13, 2001 Revised: September 16, 2002
References
Bjorkhaug, L., Ye, H., Horikawa, Y., Sovik, O., Molven, A., and Njolstad, P.R. (2000). MODY associated with two novel hepatocyte nuclear factor-1alpha loss-of-function mutations (P112L and Q466X). Biochem. Biophys. Res. Commun. 279, 792–798.
Boulikas, T. (1994). Putative nuclear localization signals (NLS) in protein transcription factors. J. Cell. Biochem. 55, 32–58.
Brunger, A.T., Adams, P.D., Clore, G.M., Delano, W.L., Gros, P., Grosse-Kunstleve, R.W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N.S., et al. (1998). Crystallography and NMR system: a new software suite for macromolecular structure determination. Acta Crystallogr. D 54, 905–921.
Ceska, T.A., Lamers, M., Monaci, P., Nicosia, A., Cortese, R., and Suck, D. (1993). The X-ray structure of an atypical homeodomain present in the rat liver transcription factor LFB1/HNF1 and implications for DNA binding. EMBO J. 12, 1805–1810.
Chevre, J.C., Hani, E.H., Boutin, P., Vaxillaire, M., Blanche, H., Vionnet, N., Pardini, V.C., Timsit, J., Larger, E., Charpentier, G., et al. (1998). Mutation screening in 18 Caucasian families suggest the existence of other MODY genes. Diabetologia 41, 1017–1023.
CCP4 (Collaborative Computational Project 4) (1994). The CCP4 suite: programs for protein crystallography. Acta Crystallogr. D 50, 760–776.
Courtois, G., Morgan, J.G., Campbell, L.A., Fourel, G., and Crabtree, G.R. (1987). Interaction of a liver-specific nuclear factor with the fibrinogen and alpha 1-antitrypsin promoters. Science 238, 688–692.
D'Elia, A.V., Tell, G., Paron, I., Pellizzari, L., Lonigro, R., and Damante, G. (2001). Missense mutations of human homeoboxes: A review. Hum. Mutat. 18, 361–374.
Duncan, S.A., Navas, M.A., Dufort, D., Rossant, J., and Stoffel, M. (1998). Regulation of a transcription factor network required for differentiation and metabolism. Science 281, 692–695.
Ellard, S. (2000). Hepatocyte nuclear factor 1 alpha (HNF-1α) mutations in maturity-onset diabetes of the young. Hum. Mutat. 16, 377–385.
Emens, L.A., Landers, D.W., and Moss, L.G. (1992). Hepatocyte nuclear factor 1α is expressed in a hamster insulinoma line and transactivates the rat insulin I gene. Proc. Natl. Acad. Sci. USA 89, 7300–7304.
Fajans, S.S., Bell, G.I., and Polonsky, K.S. (2001). Molecular mechanisms and clinical pathophysiology of maturity-onset diabetes of the young. N. Engl. J. Med. 345, 971–980.
Frayling, T.M., Bulman, M.P., Ellard, S., Appleton, M., Dronsfield, M.J., Mackie, A.D., Baird, J.D., Kaisaki, P.J., Yamagata, K., Bell, G.I., et al. (1997). Mutations in the hepatocyte nuclear factor-1alpha gene are a common cause of maturity-onset diabetes of the young in the UK. Diabetes 46, 720–725.
Froguel, P., and Velho, G. (2001). Genetic determinants of type 2 diabetes. Recent Prog. Horm. Res. 56, 91–105.
Glucksmann, M.A., Lehto, M., Tayber, O., Scotti, S., Berkemeier, L., Pulido, J.C., Wu, Y., Nir, W.J., Fang, L., Markel, P., et al. (1997). Novel mutations and a mutational hotspot in the MODY3 gene. Diabetes 46, 1081–1086.
Hansen, T., Eiberg, H., Rouard, M., Vaxillaire, M., Moller, A.M., Rasmussen, S.K., Fridberg, M., Urhammer, S.A., Holst, J.J., Almind, K., et al. (1997). Novel MODY3 mutations in the hepatocyte nuclear factor-1alpha gene: evidence for a hyperexcitability of pancreatic beta-cells to intravenous secretagogues in a glucose-tolerant carrier of a P447L mutation. Diabetes 46, 726–730.
Hattersley, A.T. (1998). Maturity-onset diabetes of the young: clinical heterogeneity explained by genetic heterogeneity. Diabet. Med. 15, 15–24.
Hegele, R.A., Hanley, A.J., Zinman, B., Harris, S.B., and Anderson, C.M. (1999). Youth-onset type 2 diabetes (Y2DM) associated with HNF1A S319 in aboriginal Canadians. Diabetes Care 22, 2095–2096.
Hua, Q.X., Zhao, M., Narayana, N., Nakagawa, S.H., Jia, W., and Weiss, M.A. (2000). Diabetes-associated mutations in a beta-cell transcription factor destabilize an antiparallel "mini-zipper" in a dimerization interface. Proc. Natl. Acad. Sci. USA 97, 1999–2004.
Iwasaki, N., Oda, N., Ogata, M., Hara, M., Hinokio, Y., Oda, Y., Yamagata, K., Kanematsu, S., Ohgawara, H., Omori, Y., et al. (1997). Mutations in the hepatocyte nuclear factor-1alpha/MODY3 gene in Japanese subjects with early- and late-onset NIDDM. Diabetes 46, 1504–1508.
Jacobson, E.M., Li, P., Leon-del-Rio, A., Rosenfeld, M.G., and Aggarwal, A.K. (1997). Structure of Pit-1 POU domain bound to DNA as a dimer: unexpected arrangement and flexibility. Genes Dev. 11, 198–212.
Jones, T.A., Zou, J.Y., Cowan, S.W., and Kjeldgaard, M. (1991). Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr. A 47, 110–119.
Kaisaki, P.J., Menzel, S., Lindner, T., Oda, N., Rjasanowski, I., Sahm, J., Meincke, G., Schulze, J., Schmechel, H., Petzold, C., et al. (1997). Mutations in the hepatocyte nuclear factor-1alpha gene in MODY and early-onset NIDDM: evidence for a mutational hotspot in exon 4. Diabetes 46, 528–535.
Kissinger, C.R., Liu, B.S., Martin-Blanco, E., Kornberg, T.B., and Pabo, C.O. (1990). Crystal structure of an engrailed homeodomain-DNA complex at 2.8 Å resolution: a framework for understanding homeodomain-DNA interactions. Cell 63, 579–590.
Klemm, J.D., Rould, M.A., Aurora, R., Herr, W., and Pabo, C.O. (1994). Crystal structure of the Oct-1 POU domain bound to an octamer site: DNA recognition with tethered DNA-binding modules. Cell 77, 21–32.
LaFortelle, E., and Bricogne, G. (1997). Maximum-likelihood heavy atom parameter refinement in the MIR and MAD methods. Methods Enzymol. 276, 472–494.
Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., et al. (2001). Initial sequencing and analysis of the human genome. Nature 409, 860–921.
Lehto, M., Wipemo, C., Ivarsson, S.A., Lindgren, C., Lipsanen-Nyman, M., Weng, J., Wibell, L., Widen, E., Tuomi, T., and Groop, L. (1999). High frequency of mutations in MODY and mitochondrial genes in Scandinavian patients with familial early-onset diabetes. Diabetologia 42, 1131–1137.
Mendel, D.B., and Crabtree, G.R. (1991). HNF-1, a member of a novel class of dimerizing homeodomain proteins. J. Biol. Chem. 266, 677–680.
Moller, A.M., Dalgaard, L.T., Pociot, F., Nerup, J., Hansen, T., and Pedersen, O. (1998). Mutations in the hepatocyte nuclear factor-1alpha gene in Caucasian families originally classified as having Type I diabetes. Diabetologia 41, 1528–1531.
Narayana, N., Hua, Q., and Weiss, M.A. (2001). The dimerization domain of HNF-1alpha: structure and plasticity of an intertwined four-helix bundle with application to diabetes mellitus. J. Mol. Biol. 310, 635–658.
Ng, M.C., Cockburn, B.N., Lindner, T.H., Yeung, V.T., Chow, C.C., So, W.Y., Li, J.K., Lo, Y.M., Lee, Z.S., Cockram, C.S., et al. (1999). Molecular genetics of diabetes mellitus in Chinese subjects: identification of mutations in glucokinase and hepatocyte nuclear factor-1alpha genes in patients with early-onset type 2 diabetes mellitus/MODY. Diabet. Med. 16, 956–963.
Nicosia, A., Monaci, P., Tomei, L., De Francesco, R., Nuzzo, M., Stunnenberg, H., and Cortese, R. (1990). A myosin-like dimerization helix and an extra-large homeodomain are essential elements of the tripartite DNA binding structure of LFB1. Cell 61, 1225–1236.
Otting, G., Qian, Y.Q., Billeter, M., Muller, M., Affolter, M., Gehring, W.J., and Wuthrich, K. (1990). Protein-DNA contacts in the structure of a homeodomain-DNA complex determined by nuclear magnetic resonance spectroscopy in solution. EMBO J. 9, 3085–3092.
Otwinowski, Z., and Minor, W. (1997). Processing of X-ray diffraction data collected in oscillation mode. Methods Enzymol. 276, 307–326.
Rose, R.B., Bayle, J.H., Endrizzi, J.A., Cronk, J.D., Crabtree, G.R., and Alber, T. (2000a). Structural basis of dimerization, coactivator recognition and MODY3 mutations in HNF-1alpha. Nat. Struct. Biol. 7, 744–748.
Rose, R.B., Endrizzi, J.A., Cronk, J.D., Holton, J., and Alber, T. (2000b). High-resolution structure of the HNF-1alpha dimerization domain. Biochemistry 39, 15062–15070.
Sarkar, G., and Sommer, S.S. (1990). The "megaprimer" method of site-directed mutagenesis. Biotechniques 8, 404–407.
Schott, O., Billeter, M., Leiting, B., Wider, G., and Wuthrich, K. (1997). The NMR solution structure of the non-classical homeodomain from the rat liver LFB1/HNF1 transcription factor. J. Mol. Biol. 267, 673–683.
Scully, K.M., Jacobson, E.M., Jepsen, K., Lunyak, V., Viadiu, H., Carriere, C., Rose, D.W., Hooshmand, F., Aggarwal, A.K., and Rosenfeld, M.G. (2000). Allosteric effects of Pit-1 DNA sites on long-term repression in cell type specification. Science 290, 1127–1131.
Shih, D.Q., Screenan, S., Munoz, K.N., Philipson, L., Pontoglio, M., Yaniv, M., Polonsky, K.S., and Stoffel, M. (2001). Loss of HNF-1alpha function in mice leads to abnormal expression of genes involved in pancreatic islet development and metabolism. Diabetes 50, 2472–2480.
Tomei, L., Cortese, R., and De Francesco, R. (1992). A POU-A related region dictates DNA binding specificity of LFB1/HNF1 by orienting the two XL-homeodomains in the dimer. EMBO J. 11, 4119–4129.
Tronche, F., Ringeisen, F., Blumenfeld, M., Yaniv, M., and Pontoglio, M. (1997). Analysis of the distribution of binding sites for a tissue-specific transcription factor in the vertebrate genome. J. Mol. Biol. 266, 231–245.
Urhammer, S.A., Fridberg, M., Hansen, T., Rasmussen, S.K., Moller, A.M., Clausen, J.O., and Pedersen, O. (1997). A prevalent amino acid polymorphism at codon 98 in the hepatocyte nuclear factor-1α gene is associated with reduced serum C-peptide and insulin responses to an oral glucose challenge. Diabetes 46, 912–916.
Vaxillaire, M., Abderrahmani, A., Boutin, P., Bailleul, B., Froguel, P., Yaniv, M., and Pontoglio, M. (1999). Anatomy of a homeoprotein revealed by the analysis of human MODY3 mutations. J. Biol. Chem. 274, 35639–35646.
Venter, J.C., Adams, M.D., Myers, E.W., Li, P.W., Mural, R.J., Sutton, G.G., Smith, H.O., Yandell, M., Evans, C.A., Holt, R.A., et al. (2001). The sequence of the human genome. Science 291, 1304–1351.
Wolberger, C. (1996). Homeodomain interactions. Curr. Opin. Struct. Biol. 6, 62–68.
Yamada, S., Nishigori, H., Onda, H., Utsugi, T., Yanagawa, T., Maruyama, T., Onigata, K., Nagashima, K., Nagai, R., Morikawa, A., et al. (1997). Identification of mutations in the hepatocyte nuclear factor (HNF)-1 alpha gene in Japanese subjects with IDDM. Diabetes 46, 1643–1647.
Yamada, S., Tomura, H., Nishigori, H., Sho, K., Mabe, H., Iwatani, N., Takumi, T., Kito, Y., Moriya, N., Muroya, K., et al. (1999). Identification of mutations in the hepatocyte nuclear factor-1alpha gene in Japanese subjects with early-onset NIDDM and functional analysis of the mutant proteins. Diabetes 48, 645–648.
Yamagata, K., Oda, N., Kaisaki, P.J., Menzel, S., Furuta, H., Vaxillaire, M., Southam, L., Cox, R.D., Lathrop, G.M., Boriraj, V.V., et al. (1996). Mutations in the hepatocyte nuclear factor-1α gene in maturity-onset diabetes of the young (MODY3). Nature 384, 455–458.
Yang, Q., Yamagata, K., Yamamoto, K., Miyagawa, J., Takeda, J., Iwasaki, N., Iwahashi, H., Yoshiuchi, I., Namba, M., Miyazaki, J., et al. (1999). Structure/function studies of hepatocyte nuclear factor-1alpha, a diabetes-associated transcription factor. Biochem. Biophys. Res. Commun. 266, 196–202.
Yoshiuchi, I., Yamagata, K., Yang, Q., Iwahashi, H., Okita, K., Yamamoto, K., Oue, T., Imagawa, A., Hamaguchi, T., Yamasaki, T., et al. (1999). Three new mutations in the hepatocyte nuclear factor-1alpha gene in Japanese subjects with diabetes mellitus: clinical features and functional characterization. Diabetologia 42, 621–626.

Accession Numbers The atomic coordinates have been deposited in the Protein Data Bank under accession code 1IC8.