REVIEW ARTICLE

Structure and function of KH domains

Roberto Valverde1, Laura Edwards2 and Lynne Regan1,3

1 Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT, USA
2 Department of Molecular and Cellular Developmental Biology, Yale University, New Haven, CT, USA
3 Department of Chemistry, Yale University, New Haven, CT, USA

Keywords
fragile X mental retardation; interaction motif; KH domains; K homology domain; noncrystallographic symmetry; protein motif; RNA-binding; RNA-binding protein; RNA-recognition; solvent accessibility

Correspondence
L. Regan, Yale University, 266 Whitney Avenue, New Haven, CT 06520, USA
Fax: +1 203 432 3104
Tel: +1 203 432 9843
E-mail: [email protected]

(Received 3 January 2008, revised 18 February 2008, accepted 14 March 2008)

doi:10.1111/j.1742-4658.2008.06411.x

The hnRNP K homology (KH) domain was first identified in the protein human heterogeneous nuclear ribonucleoprotein K (hnRNP K) 14 years ago. Since then, KH domains have been identified as nucleic acid recognition motifs in proteins that perform a wide range of cellular functions. KH domains bind RNA or ssDNA, and are found in proteins associated with transcriptional and translational regulation, along with other cellular processes. Several diseases, e.g. fragile X mental retardation syndrome and paraneoplastic disease, are associated with the loss of function of a particular KH domain. Here we discuss the progress made towards understanding both general and specific features of the molecular recognition of nucleic acids by KH domains. The typical binding surface of KH domains is a cleft that is versatile but that can typically accommodate only four unpaired bases. Van der Waals forces and hydrophobic interactions and, to a lesser extent, electrostatic interactions, contribute to the nucleic acid binding affinity. ‘Augmented’ KH domains or multiple copies of KH domains within a protein are two strategies that are used to achieve greater affinity and specificity of nucleic acid binding. Isolated KH domains have been seen to crystallize as monomers, dimers and tetramers, but no published data support the formation of noncovalent higher-order oligomers by KH domains in solution. Much attention has been given in the literature to a conserved hydrophobic residue (typically Ile or Leu) that is present in most KH domains. The interest derives from the observation that an individual with this Ile mutated to Asn, in the KH2 domain of fragile X mental retardation protein, exhibits a particularly severe form of the syndrome. The structural effects of this mutation in the fragile X mental retardation protein KH2 domain have recently been reported. We discuss the use of analogous point mutations at this position in other KH domains to dissect both structure and function.

Abbreviations
BPS, branchpoint sequence; dFXRP, Drosophila fragile X-related protein; FBP, FUSE-binding protein; FMRP, fragile X mental retardation protein; FUSE, ssDNA far-upstream element; FXRP, fragile X-related protein; hFMRP, human fragile X mental retardation protein; hnRNP K, human heterogeneous nuclear ribonucleoprotein K; KH, hnRNP K homology; KSRP, K homology splicing regulator protein; NCS, noncrystallographic symmetry; PCBP, poly(C)-binding protein; PSI, P-element somatic inhibitor protein; SF1, splicing factor 1; Y2H, yeast two-hybrid.

FEBS Journal 275 (2008) 2712–2726 © 2008 The Authors Journal compilation © 2008 FEBS

Introduction

The hnRNP K homology (KH) domain was named for the human heterogeneous nuclear ribonucleoprotein K (hnRNP K), the first protein in which the motif was identified [1]. The KH motif consists of approximately 70 amino acids, and is found in a diverse variety of proteins in archaea, bacteria and eukaryota [1,2]. Typically, KH domains are found in multiple copies: two in fragile X mental retardation protein (FMRP) [3–5], three in hnRNP K [1,6], and 14 in vigilin [7,8]. There are, however, a few examples of proteins with single KH motifs; Mer1p [1,9] and Sam68 [10] each have just one. The typical function of KH domains, whether they are present in single or multiple copies, is RNA or ssDNA recognition.

When present in a protein in multiple copies, KH domains can function independently or cooperatively. In ssDNA far-upstream element (FUSE)-binding protein (FBP), for example, the KH3 and KH4 domains are separated by a flexible Gly-rich linker with no interdomain contacts [11]. Each KH domain binds to a segment of ssDNA, with a linker of noncontacted ssDNA between the two segments [12]. By contrast, the two KH domains of NusA have an extensive interdomain contact area, and bind an extended segment of RNA that runs across both domains [13–15]. KH modules are found in many different proteins, which are involved in a myriad of biological processes, including splicing, transcriptional regulation, and translational control.

Two folds, one motif

Grishin pointed out that there are actually two different versions of the KH motif, which he named the type I and type II KH folds (Fig. 1) [2]. The type I fold is typically found in eukaryotic proteins, whereas the type II fold is typically found in prokaryotic proteins. Although type I and type II folds both share a ‘minimal KH motif’ in the linear sequence, the three-dimensional arrangement of the secondary structural elements is different. In the type I fold, a β-sheet composed of three antiparallel β-strands is abutted by three α-helices (α1, α2, and α′). The β-sheet in type I KH domains consists of three β-strands in the order β1, β′ and β2. The β1-strand and β2-strand are parallel to each other, and the β′-strand is antiparallel to both (Fig. 1). This all-antiparallel arrangement of strands distinguishes the type I KH fold from the type II KH fold, in which the β1-strand and β2-strand are adjacent and parallel to each other, and the β′-strand is adjacent and antiparallel to the β1-strand (Fig. 1). The length and sequence of the variable loop differ between KH domains, be they type I or type II (the variable loop is shown as a dotted line in Fig. 1). Variable loop lengths from three to over 60 residues are known. All typical KH domains have a GXXG loop (shown in white in Fig. 1) [2], although this is sometimes altered or interrupted in divergent KH domains [16].

Not only is the order of secondary structural elements in individual eukaryotic type I KH domains different from that in prokaryotic type II KH domains, but the relative orientation of tandem type I versus type II KH domains is also quite different. The comparison is limited, however, because the structure of only one of each type of tandem KH domain has been published. Here we compare the structures of the tandem KH1–KH2 domains from protein NusA (Protein Data Bank entry 2ASB) [14,15] and from human FMRP (hFMRP) (Protein Data Bank entry 2QND) [17] as examples of tandem prokaryotic (type II) KH domains and tandem eukaryotic (type I) KH domains, respectively. In NusA, an unstructured six amino acid linker connects KH1 to KH2, and an area of ~1380 Å² is buried at the interface between the β-sheet of KH1 and the α-helices (α′ and α2) of KH2 (Fig. 2A). By contrast, in hFMRP(KH1–KH2Δ), the α′-helix of KH1 is linked to the β1-strand of KH2 by a single residue, Glu280, which adopts non-β non-α phi/psi angles to accomplish this tight connection, which contains minimal interdomain contacts between aliphatic residues from the α1-helix of KH1 and the β-sheet of KH2 [17,18].

Fig. 1. Type I and type II KH domain folds. Stylized representations of (A) the type I KH domain (eukaryotic) and (B) the type II KH domain (prokaryotic). The labeling of secondary structure elements is according to standard KH nomenclature [2]. The dotted line connecting the β2-strand and β′-strand represents the variable loop. The white line connecting the α1-helix and the α2-helix represents the GXXG loop.

Fig. 2. The orientation of individual KH domains in tandem type I and type II arrays. Schematics are based on the crystal structures of the KH1–KH2 domains of NusA [type II (A)] (Protein Data Bank entry 1KOR) and fragile X mental retardation protein [type I (B)] (Protein Data Bank entry 2QND). Each domain is represented as an oval with the β-sheet side colored solid black and the abutting α-helices striped.

Evolutionary relationships between KH domains

Type I domains are found in multiple copies in eukaryotic proteins, whereas type II KH domains are typically found as single copies in prokaryotic proteins. Here, therefore, we discuss eukaryotic proteins. Within a family of KH proteins with multiple KH domains (i.e. type I KH domains), the KH1 domain is always more similar to other KH1 domains in different proteins than to the KH2 and KH3 domains in the same protein. Similar relationships are seen for KH2 and KH3 domains: they are more similar to other KH2 and KH3 domains, respectively, than they are to each other or to KH1 domains (Fig. 3). This relationship holds true in all families and between species, from those within which the like-pairs of domains have very high identity [over 95% in the Nova and poly(C)-binding protein families], to those within which like-pairs of domains have much lower identity (around 50% in the FXR family; Fig. 3).

From this observation, a number of hypotheses about the origin and evolution of the KH domains may be proposed. If multiple KH domains arose as a result of a gene duplication event, the results cited above suggest that duplication occurred before the divergent evolution of the members of each protein family. Alternatively, one could speculate that the interdomain identities are a result of convergent evolution of different domains in a parent protein, before subsequent evolutionary divergence produced different members of the family.

Nucleic acid binding by KH domains – general features

The structures of KH domains in complex with their cognate nucleic acid ligand are mostly of type I domains from eukaryotic proteins, which function in transcriptional and translational regulation. The only structures of type II KH domains in complex with nucleic acid ligand are of the bacterial protein NusA [15] (Protein Data Bank entries 2ATW and 2ASB). Although the total number of structures in the Protein Data Bank of KH domains bound to cognate nucleic acid ligand is small, some common features of nucleic acid recognition emerge among them.

The RNA or DNA is bound in an extended, single-stranded conformation across one face of the KH domain, between the α1-helix, the α2-helix and the GXXG loop on the ‘left’, and the β2-strand and the variable loop on the ‘right’ (Fig. 4A). Together, these secondary structural elements form a binding cleft that accommodates four bases. Note that the secondary structure elements that shape the binding cleft comprise, in part, the core motif found in type I and type II domains. The variable loop in type II KH domains, however, is located at the bottom of the binding cleft (Fig. 4A). The center of the binding pocket tends to be hydrophobic, with a variety of additional specific interactions stabilizing the complex. Nucleic acid base-to-protein aromatic side-chain stacking interactions, which are prevalent in other types of single-stranded nucleic acid binding motifs [19,20], are notably absent in KH domain nucleic acid recognition. In some complexes, the bases in the ssDNA or RNA bound by the KH domain stack with each other (Fig. 4B), whereas in other examples there is no base stacking.

An adenine–backbone interaction is a feature seen in some KH domain–nucleic acid structures (Fig. 4C). Examples are (relevant adenine in bold) A42–G43–A44–A45 in NusA KH1, C48–A49–A50–U51 in NusA KH2 [15], U12–C13–A14–C15 in Nova-2 KH3 [21], and U6–A7–A8–C9 in splicing factor 1 (SF1) [22]. The adenine bases hydrogen bond to the protein backbone, mimicking a Watson–Crick base pairing pattern. Superimposing the NusA KH1 domain and ribonucleotides 42–46 on the NusA KH2 domain and ribonucleotides 48–53 reveals that the adenine bases of A44 and A50 make exactly equivalent hydrogen bonds to the protein backbone [15].


Pairwise sequence identity (%), FMRP family

            FMRP   FMRP   FXR1   FXR1   FXR2   FXR2   dmFMR1 dmFMR1
            KH1    KH2    KH1    KH2    KH1    KH2    KH1    KH2
FMRP KH1    100.0  21.7   82.0   20.3   64.3   21.7   54.4   22.5
FMRP KH2           100.0  23.0   55.2   20.0   53.7   22.1   43.1
FXR1 KH1                  100.0  17.7   68.6   22.9   58.8   24.0
FXR1 KH2                         100.0  19.0   82.0   23.1   65.3
FXR2 KH1                                100.0  22.8   58.8   26.7
FXR2 KH2                                       100.0  25.6   62.5
dmFMR1 KH1                                            100.0  22.6
dmFMR1 KH2                                                   100.0

Pairwise sequence identity (%), Nova family

            NOVA-1 NOVA-1 NOVA-1 NOVA-2 NOVA-2 NOVA-2
            KH1    KH2    KH3    KH1    KH2    KH3
NOVA-1 KH1  100.0  35.3   40.3   95.5   32.4   38.8
NOVA-1 KH2         100.0  37.3   36.8   86.3   35.8
NOVA-1 KH3                100.0  36.8   34.3   90.9
NOVA-2 KH1                       100.0  34.3   37.3
NOVA-2 KH2                              100.0  35.4
NOVA-2 KH3                                     100.0

Pairwise sequence identity (%), PCBP family

            PCB1   PCB1   PCB1   PCB2   PCB2   PCB2   PCB3   PCB3   PCB3   PCB4   PCB4   PCB4
            KH1    KH2    KH3    KH1    KH2    KH3    KH1    KH2    KH3    KH1    KH2    KH3
PCB1 KH1    100.0  33.8   35.4   95.2   35.4   33.8   88.7   35.4   36.4   74.2   35.4   33.8
PCB1 KH2           100.0  33.8   33.8   93.8   35.4   32.3   84.6   38.5   35.4   76.9   33.8
PCB1 KH3                  100.0  32.3   31.0   92.1   36.9   35.4   84.1   35.4   36.9   66.7
PCB2 KH1                         100.0  35.4   30.8   90.3   33.8   33.3   69.4   33.8   30.8
PCB2 KH2                                100.0  32.4   33.8   89.2   40.0   33.8   80.0   35.4
PCB2 KH3                                       100.0  35.4   36.9   84.1   35.4   38.5   71.4
PCB3 KH1                                              100.0  33.4   36.9   75.8   36.9   35.4
PCB3 KH2                                                     100.0  40.0   35.4   86.2   36.9
PCB3 KH3                                                            100.0  36.9   41.5   68.3
PCB4 KH1                                                                   100.0  33.8   33.8
PCB4 KH2                                                                          100.0  33.8
PCB4 KH3                                                                                 100.0

Fig. 3. Table showing sequence identities of KH domains within protein families. Data for the FMRP, Nova and PCBP families are shown. For each family, the sequences of individual KH domains were aligned with KH domains at different positions in the same protein, and KH domains at the same position in different proteins. The highest percentage identities were consistently those between KH domains at the same position in different members of a protein family (highlighted in purple).
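The pattern described in the caption can be checked directly from the tabulated values. The following is a minimal sketch in Python, not part of the original analysis: it takes the FMRP-family identities from the table in Fig. 3, splits the pairwise values into same-position pairs (e.g. KH1 versus KH1) and different-position pairs (KH1 versus KH2), and compares the two groups. The helper names `identity` and `position` are illustrative assumptions.

```python
from itertools import combinations
from statistics import mean

# FMRP-family domains, in the row/column order of the Fig. 3 table.
labels = ["FMRP KH1", "FMRP KH2", "FXR1 KH1", "FXR1 KH2",
          "FXR2 KH1", "FXR2 KH2", "dmFMR1 KH1", "dmFMR1 KH2"]

# Upper triangle of the identity matrix (%), one list per row,
# each row starting at its own diagonal entry.
upper = [
    [100.0, 21.7, 82.0, 20.3, 64.3, 21.7, 54.4, 22.5],
    [100.0, 23.0, 55.2, 20.0, 53.7, 22.1, 43.1],
    [100.0, 17.7, 68.6, 22.9, 58.8, 24.0],
    [100.0, 19.0, 82.0, 23.1, 65.3],
    [100.0, 22.8, 58.8, 26.7],
    [100.0, 25.6, 62.5],
    [100.0, 22.6],
    [100.0],
]

def identity(i, j):
    """Percentage identity between domains i and j (order-independent)."""
    i, j = sorted((i, j))
    return upper[i][j - i]

def position(label):
    """Return the domain position, e.g. 'KH1', from a label such as 'FXR1 KH1'."""
    return label.split()[-1]

same, different = [], []
for i, j in combinations(range(len(labels)), 2):
    bucket = same if position(labels[i]) == position(labels[j]) else different
    bucket.append(identity(i, j))

print(f"same position:      n={len(same)}, min={min(same)}, mean={mean(same):.1f}")
print(f"different position: n={len(different)}, max={max(different)}, mean={mean(different):.1f}")
```

For this family, the lowest same-position identity (43.1%) is still well above the highest different-position identity (26.7%), which is the relationship the caption describes. The same loop could be applied to the Nova and PCBP matrices.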

KH domains bind ssDNA and RNA with low micromolar affinity. For example, the Kd values of the KH domain of the SF1–RNA complex and the hnRNP K KH3 domain–DNA complex are 3 μM and 1 μM, respectively [22,23]. The clustering of KH domains increases nucleic acid recognition and specificity [24]; the four tandem KH domains of P-element somatic inhibitor protein (PSI), for example, bind ligand cooperatively [25]. The KH1–KH2 domains of NusA (Protein Data Bank entries 2ATW and 2ASB) form an uninterrupted recognition surface that binds RNA with nanomolar affinity [15]. Together, the third and fourth KH domains of the K homology splicing regulator protein (KSRP) bind RNA ligand more tightly than each does separately [26].
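To give a feel for what micromolar versus nanomolar affinity means in practice, the sketch below converts a dissociation constant into a site occupancy and a standard binding free energy. It assumes simple 1:1 binding with the free ligand concentration approximately equal to the total, which is a textbook idealization rather than anything established for these complexes. The function names and the 1 nM entry are illustrative assumptions; the text above reports only that NusA binds with nanomolar affinity, without a specific value.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)

def fraction_bound(ligand_conc_molar, kd_molar):
    """Occupancy of a single site: [L] / (Kd + [L]), assuming free [L] ~ total [L]."""
    return ligand_conc_molar / (kd_molar + ligand_conc_molar)

def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy, RT ln(Kd / 1 M), in kJ/mol."""
    return R_KJ * temperature_k * math.log(kd_molar)

examples = [
    ("SF1 KH (Kd = 3 uM)", 3e-6),
    ("hnRNP K KH3 (Kd = 1 uM)", 1e-6),
    ("hypothetical 1 nM binder", 1e-9),  # assumed value, for comparison only
]

ligand = 1e-6  # 1 uM nucleic acid, an arbitrary illustrative concentration
for name, kd in examples:
    theta = fraction_bound(ligand, kd)
    dg = binding_free_energy(kd)
    print(f"{name}: fraction bound at 1 uM = {theta:.3f}, dG = {dg:.1f} kJ/mol")
```

Under these assumptions, a site is half occupied when the ligand concentration equals Kd, and each 1000-fold decrease in Kd corresponds to roughly 17 kJ/mol of additional binding free energy at 25 °C.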

Finally, where the structures of both the KH–nucleic acid complex and free KH domain have been determined, ligand binding produces little or no structural change in the protein as determined by our analysis [27] and concluded in [15,21,28].

Nucleic acid recognition by KH domains – specific examples

NMR structure of the KH3 domain of hnRNP K with ssDNA bound

The type I KH3 domain of the transcriptional regulator hnRNP K binds to a 10mer ssDNA, specifically recognizing the tetrad 5′-dTCCC (Fig. 5) [23] (Protein Data Bank entry 1J5K). The authors propose that the complex is stabilized by methyl-to-oxygen hydrogen bonds between three Ile side-chains and the O2 and N3 atoms of the two central cytosine bases. Methyl-to-oxygen hydrogen bonds are uncommon and weak, but not without precedent [29,30]. Additional interactions that stabilize the complex include protein backbone and side-chain hydrogen bonds to bases, and electrostatic interactions between positively charged side-chains on the protein and the phosphate backbone of the nucleic acid.

Fig. 4. Common features of KH domain–nucleic acid interactions. (A) Type I KH domain; the binding cleft comprises the secondary structural elements α1-helix, GXXG loop, α2-helix, β2-strand, and variable loop (colored green), and recognizes four nucleotides (cyan sticks). The green dotted line represents the location of the variable loop in type II KH domains. (B) Nucleic acid bases of the ligand stacking with each other. Coordinates from Protein Data Bank entry 1J5K were used in (A) and (B), and coordinates from Protein Data Bank entry 2ASB were used in (C).

Fig. 5. Solution structure of the KH3 domain of hnRNP K bound to ssDNA. The third KH domain of hnRNP K (Protein Data Bank entry 1J5K) recognizes a tetrad of sequence 5′-dTCCC (purple sticks). Regions on the protein that are in contact with the nucleic acid ligand are colored green (hydrophobic) and cyan (polar). The sugar phosphate backbone curves around the α1-helix near the GXXG loop before proceeding parallel to the α2-helix. The first base sits on top of the α1-helix, and the 5′-dCCC bases of the tetrad fill the interior of the predominantly hydrophobic cleft and base stack with each other (see Fig. 4B). The ends of the ssDNA sugar backbone are stabilized by electrostatic interactions with positively charged residues that line the ridge of the cleft on the GXXG loop and α2-helix.

Poly(C)-binding proteins

Poly(C)-binding proteins (PCBPs) contain three type I KH domains, which appear to function independently, because they are separated by long linkers: KH1–(16 amino acid spacer)–KH2–(67 to > 100 amino acid spacer)–KH3. They bind to poly(C)-rich DNA and RNA sequences and function in a diverse range of cellular processes, including mRNA stabilization, translational activation, and translational silencing [31,32].

Crystal structures have been solved of the PCBP2 KH1 in complex with a 12-nucleotide ssDNA and with its RNA equivalent (Protein Data Bank entries 2PQU and 2PQY, respectively) [33]. In both the ssDNA and RNA complexes, the 12 nucleotides correspond to two repeats of the human C-rich strand telomeric DNA, 5′-AACCCTAACCCT-3′ (a single repeat is underlined, and the core recognition sequence is in bold). The asymmetric unit of both ssDNA and RNA crystals contains two KH1 molecules tethered by one oligonucleotide ligand. The crystal structures of PCBP2 KH1 in complex with either 12-nucleotide ssDNA or its equivalent RNA are similar, with no indication that the 2′-hydroxyl groups of the RNA are involved in interactions with the protein (Fig. 6A). The CCCT/U tetranucleotide motif constitutes the core recognition sequence.

Interestingly, however, when PCBP2 KH1 was crystallized with a seven-nucleotide single repeat ssDNA ligand 5′-AACCCTA-3′ (core recognition sequence in bold), a different ‘register’ of the nucleic acid–protein complex was observed [28] (Protein Data Bank entry 2AXY, shown in Fig. 6B). In all structures, the nucleic acid was in the ‘typical’ cleft, but its position relative to the protein was shifted up by one base in the 5′-direction in the seven-nucleotide structure (ACCC versus CCCT; Fig. 6A–C). The first position of the core recognition motif sits on top of the α1-helix, and then the phosphate backbone of the next two nucleotides interacts with the α1-helix and the GXXG motif on the left, and the β2-strand and the variable loop on the right. Base stacking is observed between the third and fourth position nucleotides of the core recognition sequence.

The recently solved high-resolution structure of the third KH domain of PCBP2 bound to ssDNA, 5′-dAACCCTA-3′ [34] (Protein Data Bank entry 2P2R), is similar to previous structures of the first KH domain of PCBP2. However, because the crystals diffracted to ultra-high resolution, hydrogen bonding and water molecules mediating protein–DNA contacts were observed that previously could not be resolved in other crystal structures. Specifically, the binding cleft is occupied by the tetrad 5′-CCCT-3′, with direct and water-mediated contacts stabilizing the last two bases, and protein–nucleic acid contacts to two additional bases beyond the binding cleft were seen. Also of interest is the observation that in different crystal forms, the KH domains of PCBP2 were either monomeric or were a crystal-contact-mediated dimer (see section on KH dimers).

Fig. 6. Crystal structures of the first KH domain from PCBP2 in complex with ssDNA. The first KH domain of PCBP2 recognizes the tetrad sequence 5′-dCCCT [(A) Protein Data Bank entry 2PQU] and 5′-dACCC [(B) Protein Data Bank entry 2AXY]. Polar and hydrophobic residues that make contacts with nucleic acid (purple sticks) are colored cyan and green, respectively. Waters (gray spheres) that bridge protein and ssDNA contacts were unambiguously resolved in the high-resolution structure in (B). Both structures are representative molecules within the asymmetric unit. In (C), the tetrad sequence (purple letters) of each structure is aligned with respect to the seven-nucleotide single repeat ssDNA ligand. The register of the sequence is shifted in the 5′-direction in (B). In both structures, the nucleotide at the 5′-end of the ssDNA strand sits on the top of the α1-helix, and is stabilized by contacts that can recognize an adenine or cytosine nucleotide. The central cytosine bases of the tetrad sequence occupy the hydrophobic interior of the binding cleft. The last nucleotide at the 3′-end of the ssDNA strand (dC in 2AXY; dT in 2PQU) participates in base-stacking interactions with the preceding cytosine base.

RNA recognition by a single KH domain in cooperation with a QUA2 domain

SF1 specifically recognizes the intron branchpoint sequence (BPS) UACUAAC in pre-mRNA transcripts [35], with KH domain binding augmented by additional interactions with a C-terminal helix known as the QUA2 domain (labeled in Fig. 7) [36]. The RNA adopts an extended single-stranded conformation, and is bound in a hydrophobic groove between QUA2, the GXXG loop and the variable loop of the KH domain [22] (Fig. 7; Protein Data Bank entry 1K1G). The QUA2 region recognizes the 5′-nucleotides of the BPS (ACU), with the α1-helix and α2-helix and the β2-strand of the KH domain region interacting with the next nucleotides of the RNA in ‘typical’ fashion. A large surface area of predominantly aliphatic hydrophobic residues is buried at the protein–RNA interface. In addition, positively charged side-chains undergo electrostatic interactions with the solvent-exposed phosphate backbone. Protein contacts to the 3′-end of the RNA are provided by the variable loop and the β2-strand. Binding of the seven-nucleotide RNA BPS requires both the QUA2 and KH regions.

Another example of an augmented KH domain is the fourth KH domain of KSRP [26], which contains a novel fourth β-strand that is located adjacent and angled to the β1-strand and contributes to the stability of the protein (Protein Data Bank entry 2HH2). It is not yet known whether the fourth β-strand is involved in contacts with RNA [26].

Fig. 7. Solution structure of the QUA2 and KH domains of SF1 in complex with RNA. The QUA2 and KH domains of SF1, together, recognize RNA BPS 5′-UACUAAC (blue sticks; Protein Data Bank entry 1K1G). Protein side-chains making polar and hydrophobic contacts with RNA are colored cyan and green, respectively. The QUA2 domain (labeled) abuts the α2-helix of the KH domain, giving rise to an expanded contact with RNA, with the five nucleotides at the 5′-end of the RNA contacting the QUA2 domain, exclusively. The base of Ura6 is buried between the α1-helix and the QUA2 helix. The RNA then continues in single-stranded, extended conformation into the ‘typical’ KH groove. Finally, the RNA loops over to the right and makes contact with the β2-strand. Note also the very long variable loop, 24 amino acids, which loops back over the RNA from the right.

X-ray structure of Nova-2 KH3 plus SELEX RNA

The X-ray structure of the KH3 domain of Nova-2 bound to an in vitro selected stem–loop RNA containing the 5′-UCAC-3′ core recognition sequence has been solved [21] (Fig. 8). This structure is something of an ‘outlier’, because the nucleic acid has a double-stranded hairpin stretch (not shown in Fig. 8), which may be a consequence of stability requirements for selection in vitro [37]. The stem of the hairpin adopts the A-form double-helical conformation, with four Watson–Crick base pairs (G1–C20, A2–U19, G3–C18, G4–C17) and a single hydrogen bond between A5 and C16 (N1–O2 = 2.4 Å). The extended target RNA (A11, U12, C13, A14, C15) lies upon a hydrophobic platform (formed by the α1-helix and the edge of the β2-strand), where it contacts both the invariant GXXG motif and the variable loop.

Fig. 8. Crystal structure of Nova-2 KH3 bound to SELEX RNA. The third KH domain of the protein Nova-2 binds to the tetranucleotide sequence 5′-UCAC (blue sticks; Protein Data Bank entry 1EC6), which is part of the larger SELEX RNA. Protein side-chains making polar and hydrophobic contacts with RNA are shown in cyan and green, respectively. U12–C13–A14 rests on a hydrophobic platform formed by the α1-helix and the β2-strand. Electrostatic interactions between protein side-chains, nucleic acid bases and the sugar phosphate backbone further stabilize the complex. Bases A14 and C15 participate in base-stacking interactions with each other. The 2′-hydroxyl groups of the tetrad hydrogen bond with protein or other bases, making it unlikely that ssDNA could bind tightly to this KH domain.

Nucleic acid binding by tandem but independent KH domains – NMR structure of the KH3 and KH4 domains of FBP in complex with FUSE ssDNA

FUSE-binding protein has four KH domains, which are separated by linkers of varying lengths [11]. FBP regulates c-myc expression by binding to FUSE [38]. The NMR structure of a complex between the KH3 and KH4 domains of FBP and a 29-base ssDNA fragment from FUSE [12] shows that each KH domain binds to a separate 9- to 10-base segment of ssDNA (Fig. 9). The KH domains are connected by a flexible Gly-rich linker, and behave independently. In addition, the two ssDNA segments to which the KH domains bind are themselves separated by a five-base linker of ssDNA. There are no protein contacts between the KH domains, and the linker DNA is not in contact with protein. In both KH domains, the ssDNA is bound in the typical extended orientation, in the groove between the α1-helix and α2-helix plus the GXXG loop on one side, and the β2-strand and the variable loop on the other. The center of the groove is hydrophobic, and the edges are hydrophilic and charged, with the narrow binding site (10 Å) favoring pyrimidines over purines.

NusA – crystal structure of tandem type II KH domains

NusA regulates transcriptional elongation, pausing, termination and antitermination in prokaryotes [39–41]. The protein contains two tandem type II KH domains, which are connected by a short six-residue linker [14,15]. This short linker, combined with a tight turn between the domains, results in a structure in which the two KH domains are in contact and form an extended and continuous surface for RNA binding. NusA binds with high affinity and specificity to BoxB–BoxA–BoxC antitermination sequences within the leader region of the rRNA operon [15]. Ligand binding produces no change in the structure or relative orientation of the KH domains (Protein Data Bank entries 1KOR and 2ASB) [27]. The ssRNA is bound in an extended conformation and is in contact with large areas on both KH domains (Fig. 10).

Despite having type II connectivity, each KH domain of NusA contains a ‘typical’ binding cleft. The variable loop, however, hangs at the bottom of the cleft (Fig. 4A) instead of up and across from the GXXG loop, as in type I KH domains. The 5′-end of the RNA (bases A42 through A45) is buried in and across the groove between the α1-helix and α2-helix and the β2-strand of KH1. Intimate contacts between protein and RNA continue across the cusp of the KH1 and KH2 domains. C46 binds to the loop connecting the β′-strand and α′-helix of KH2, and U47 and C48 make contacts with the α1-helix and the GXXG loop of KH2. Finally, the nucleotides at the 3′-end of the RNA (A49–A52) pack against the groove comprising the α1-helix and α2-helix and the β2-strand of KH2. Hydrogen bonds to both amino acid side-chains and the protein backbone, electrostatic and polar interactions and, to a lesser extent, hydrophobic interactions between bases and nonaromatic amino acid side-chains stabilize the protein–RNA complex.

The interaction of the NusA tandem KH domains with RNA is quite different from that seen in the double KH domain of FBP bound to ssDNA from FUSE, the only other structure of a double KH domain bound to a nucleic acid target. In FBP, the two KH domains are connected by a flexible 30-residue Gly-rich linker and behave like beads on a string [12]. In the protein–DNA complex, each KH domain interacts with a separate ssDNA recognition sequence, and a five-nucleotide noninteracting spacer separates the two bound DNA recognition sequences. Although in both examples the coupling of two RNA-binding domains will effectively increase the specificity and affinity of the RNA–protein interaction, the two different binding modes have very different consequences for the type and length of RNA bound.

KH crystal dimers – a tenuous relationship Crystallographic data Different KH domains crystallize as monomers, dimers, or tetramers. This and other observations have

FEBS Journal 275 (2008) 2712–2726 ª 2008 The Authors Journal compilation ª 2008 FEBS

2719

Structure and function of KH domains

R. Valverde et al.

Fig. 9. Solution structure of the FBP KH3–KH4 domain bound to ssDNA. The third and fourth KH domains of FBP recognize ssDNA 5¢-dTTTT (A) and 5¢-ATTC (B), respectively. In both domains, the binding cleft makes hydrophobic contacts with the ssDNA bases, and polar residues lining the edge of the cleft contact the sugar phosphate backbone. The bases of the DNA ligand stack with each other, with the methyl groups of thymine pointing away from the binding cleft. Both domains behave independently. Although both the KH domains and both the DNA-binding sites were present as a single unit, neither the Gly-rich protein linker nor the noncontacted ssDNA were resolved.

2720

led to the proposal that the functional form of certain KH domains may involve noncovalent dimers or higher-order oligomers. Here we review the data. Crystals of the single KH3 domain of the protein Nova-2 contain four KH molecules per asymmetric unit (Protein Data Bank entry 1DTJ) related by pseudo-222 noncrystallographic symmetry (NCS; Fig. 11A) with two different surfaces on each KH domain mediating protein–protein contacts (Fig. 11B,C). One protein– protein interface comprises primarily two b1-strands from two KH domains related by two-fold NCS. This arrangement creates an augmented antiparallel b-sheet stabilized by cross-strand side-chain interactions [42] and a buried surface area of 890 A˚2 [18,43] (reported as 950 A˚2 in [44]) (Fig. 12A). The other interface comprises two a¢-helices with an  500 packing angle [45] of the two KH domains related by NCS that buries 1000 A˚ (reported as 1250 A˚2 in [44]; Fig. 11C). Interestingly, the same KH domain in complex with a SELEX RNA crystallizes with only two KH molecules in the asymmetric unit related by NCS [21]. The two KH molecules interact through related a¢-helices and bury 1000 A˚2 (Fig. 12B). This arrangement is identical to the protein–protein interactions observed in crystals of apo-Nova (Fig. 11C). Crystals of the first KH domain of PCBP2 in complex with ssDNA contain two identical dimer complexes per asymmetric unit related by two-fold NCS [28] (Protein Data Bank entry 2AXY; Fig. 13A). The dimer buries 1890 A˚2, and as in the protein–protein interface depicted in Fig. 12A, an augmented antiparallel b-sheet is formed by symmetry-related b1-strands and further stabilized by interactions between a¢-helices (Fig. 13B). This dimeric arrangement is reproduced in crystals of two PCBP2 KH1 molecules tethered by one ssDNA or RNA ligand [33] (Protein Data Bank entries 2PQU and 2PYQ). 
In the cocrystal structure of the third KH domain of human PCBP2 with DNA [34], however, no protein–protein contacts were observed in the crystal. Instead, crystal contacts were formed solely by base-stacking interactions of DNA molecules from adjacent asymmetric units. A1 of the heptanucleotide stacks on C3 of a symmetry-related DNA and vice versa. For neither the apo nor the nucleic acid-bound forms of these KH domains are there published solution data supporting the idea that they exist as dimers or higher-order oligomers in solution [17,44], nor have dimers or higher-order oligomers been shown to be of functional significance in vivo.
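The buried surface areas quoted above are differences in solvent-accessible surface area between the separated partners and the complex, in the spirit of the accessibility definition of Lee and Richards [43]. The sketch below illustrates that bookkeeping with a simple numerical point-sampling scheme (a Shrake–Rupley style approximation, which is not necessarily the algorithm used in the cited studies). The 1.4 Å probe radius, the number of sample points, the function names, and the toy single-sphere "chains" are all assumptions made for illustration. The function returns the total area buried on both partners; reported values can differ with the probe radius, the atomic radii, and whether the total or the per-partner area is quoted.

```python
import math


def sphere_points(n=200):
    """Roughly uniform points on a unit sphere (golden-spiral construction)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        phi = i * golden
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def sasa(atoms, probe=1.4, n=200):
    """Approximate solvent-accessible surface area in Å^2.

    atoms: list of (x, y, z, radius) tuples in Å.
    probe: assumed solvent probe radius in Å.
    """
    unit = sphere_points(n)
    total = 0.0
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                cut = rj + probe
                # A sample point inside a neighbour's expanded sphere is buried.
                if (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < cut * cut:
                    break
            else:
                exposed += 1
        total += 4.0 * math.pi * big_r * big_r * exposed / n
    return total


def buried_surface_area(chain_a, chain_b, probe=1.4, n=200):
    """Total area buried on both partners: SASA(A) + SASA(B) - SASA(A+B)."""
    separate = sasa(chain_a, probe, n) + sasa(chain_b, probe, n)
    return separate - sasa(chain_a + chain_b, probe, n)


# Toy example: two single-atom "chains" (hypothetical coordinates and radii).
chain_a = [(0.0, 0.0, 0.0, 1.8)]
chain_b = [(3.0, 0.0, 0.0, 1.8)]
print(f"buried surface area: {buried_surface_area(chain_a, chain_b):.1f} Å^2")
```

With real coordinates, the atom lists would be built from the two NCS-related chains of a Protein Data Bank entry. A large value from such a calculation describes crystal packing only and does not by itself indicate a stable dimer in solution.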

FEBS Journal 275 (2008) 2712–2726 © 2008 The Authors. Journal compilation © 2008 FEBS


Fig. 10. Crystal structure of tandem type II KH domains of NusA in complex with RNA. The tandem KH1–KH2 domains of NusA recognize RNA ligand 5′-GAACUCAAUAG. (A) The KH1–KH2 domains of NusA bound to cognate RNA ligand (Protein Data Bank entry 2ASB). The RNA–protein contact surface spans both domains. In particular, A45 makes contacts with residues in both KH1 and KH2. Additional polar contacts with 2′-hydroxyls specify RNA recognition. The KH1 and KH2 domains are shown separately in (B) and (C), respectively. Type II KH domains are connected differently. The variable loop, for example, is located at the bottom and to the left of the binding cleft. Although the connection of type II KH domains is different, the structural elements that comprise the binding cleft are the same as in type I domains, and likewise accommodate four nucleotides.

Fig. 11. Protein–protein interfaces in Nova-2 KH3 in crystals. This figure is an adaptation of Figs 6 and 7 from Lewis et al. [44], using Protein Data Bank coordinates 1DTJ. (A) Contents of the asymmetric unit with the two-fold NCS axis labeled. The tetrameric arrangement of molecules produces two protein–protein interfaces. (B) One protein–protein interface generated by two-fold NCS. (C) The other protein–protein interface, also generated by two-fold NCS.

In crystals of the tandem KH domains from human FMRP, there are also two molecules in the asymmetric unit related by NCS [17] (Protein Data Bank entry 2QND). Contacts between NCS-related β2-strands and, to a lesser extent, α1-helices bury 2100 Å² (Fig. 14A). This β-sheet augmentation is similar to that seen with apo-Nova-2 KH3 and PCBP2 KH1, but its interface comprises primarily β2–β2 and not β1–β1 interactions (compare Figs 12 and 13 with Fig. 14B). When the C2 operation is applied to the asymmetric unit, another interface is formed between neighboring KH domains. This interface is mediated by symmetry-related α′-helices, as seen in crystals of RNA-bound Nova-2 KH3 (Figs 11C and 12B), and buries 1200 Å², significantly less than the interface within the asymmetric unit. In summary, two interfaces are commonly observed in the crystals: (a) helix–helix packing between symmetry-related α′-helices with a ≈ 50° packing angle, as seen in the Nova-2 KH3–RNA structure; and (b) β-sheet augmentation achieved by contacts between symmetry-related β1- or β2-strands, as seen in Nova-2 KH3, hFMRP (KH1–KH2D), and PCBP2 KH1. Caution is advised in extrapolating from crystal structures to predict the solution oligomeric state of KH domains. Although several KH domains form


dimers in the crystal, in several cases different crystallization constructs and conditions give rise to different crystal forms in which the KH domain is monomeric or has different crystal packing contacts. In the crystal structure of the tandem KH1–KH2 domains from hFMRP, a crystallographic dimer with the most buried surface area of all previous KH crystal dimers is observed. Solution analytical ultracentrifugation measurements, however, clearly showed that the protein is monomeric in solution [17].

Fig. 12. Schematic representation of protein–protein surfaces of free and RNA-bound Nova-2 crystals. The schematic in (A) and (B) is based on the protein–protein interactions shown in Fig. 11B,C, respectively. Salient secondary structure elements are labeled. Cross-strand side-chain interactions are shown in open and closed circles.

Biochemical studies

Git and Standart [46] investigated the potential for the four KH domains of the protein Vg1RBP to interact with each other noncovalently. The results were somewhat ambiguous: with dimethyl suberimidate as a crosslinking agent, dimers and higher-order oligomers were formed in solution, but in the absence of the crosslinking agent, association of the KH domains was observed only in the presence of RNA. Chen et al. [47] investigated the possibility of self-association of the protein Sam68, which contains a single KH domain. They showed that Sam68 self-associated in vivo [a c-myc-tagged Sam68 could coimmunoprecipitate a non-c-myc-tagged Sam68, and Sam68–Sam68 gave a positive signal in a yeast two-hybrid (Y2H) assay]. However, the KH domain alone neither self-associated nor bound RNA. Ramos et al. [48] investigated the potential for the KH3 domain of Nova-2 to self-associate in vitro by performing limited equilibrium ultracentrifugation experiments, from which they estimated that 10–20% dimer may be present, which would correspond to a dissociation constant of about 300 µM. The authors also reported a concentration-dependent increase of the rotational correlation times, but these data were not analyzed quantitatively with respect to either size or dissociation constant. Kim et al. [49] investigated possible homoprotein and heteroprotein associations between hnRNPs. They used the Y2H assay to show that the full-length proteins formed specific homocomplexes and heterocomplexes. Then they used a Y2H assay to map which parts of the large proteins were involved in associations. For hnRNP K, the N-terminal two-thirds of the protein, spanning KH1, KH2, and the junction between KH2 and KH3, was required for interactions with hnRNP E2, I, K, and L. Deletion of the junction sequence including the Pro-rich regions (but not the KH domain) abolished protein–protein interactions, and the region spanning the junction sequence and KH3 domain was not sufficient for the protein–protein interaction. In other words, these results do not map the interacting region to the KH domains. Again, we caution against any model in which isolated KH domains are proposed to form stable dimers in solution. There are no data to support this hypothesis. One cannot, however, exclude the possibility that a full-length protein containing a KH domain(s) may have a dimerization interface that includes KH-mediated contacts.

Fig. 13. Schematic representation of protein–protein interfaces in the structure of PCBP. (A) was generated using Protein Data Bank coordinates 2AXY, and is oriented looking down the NCS axis that generates the dimeric arrangement of molecules. The schematic in (B) shows the crystal contacts stabilizing the protein–protein interaction.

Fig. 14. Dimeric arrangement of hFMRP (KH1–KH2D) molecules. The crystals of hFMRP (KH1–KH2D) contain two copies in the asymmetric unit related by two-fold NCS (Protein Data Bank coordinates 2QND). The strands of one chain are represented as open arrows, and the symmetry-related strands are shaded. Hydrophobic and polar side-chains are shown in closed and open circles, respectively. This orientation creates an augmented β-sheet composed of six antiparallel strands. This arrangement of KH molecules buries a total surface area of 2100 Å² and is stabilized by cross-strand side-chain interactions.

Fragile X mental retardation syndrome: devastating effects of a single point mutation within a KH domain

Fragile X mental retardation syndrome is the most common form of inherited mental impairment in humans. For all fragile X individuals, the underlying cause of the syndrome is lack of functional FMRP. In the majority of cases, FMRP is not made because a CGG repeat expansion in the 5′-UTR of the gene encoding it is hypermethylated, causing both chromosomal fragility and transcriptional silencing [50]. FMRP is a putative RNA-binding protein with two tandem KH domains [3–5]. A particularly pernicious case of fragile X syndrome was identified in a boy who did not have the CGG expansion, and who produced normal levels of FMRP, but who had a single mutation within the KH2 domain of FMRP: Ile304 was mutated to Asn (Ile304 → Asn) [51,52]. Since the clinical description of the consequences of the Ile304 → Asn mutation, various efforts have been undertaken to determine the effects of the mutation on FMRP structure and function. However, until recently, all have been inconclusive and even contradictory. For example, the mutation has been proposed to abrogate RNA binding, have no effect on RNA binding, completely unfold the KH domain, have no effect on protein structure, be buried in the hydrophobic core, be solvent-exposed, and be involved in direct interactions with RNA [21,53,54]. The lack of a consensus can be attributed, at least in part, to the extrapolation of data from other KH domains to the KH1–KH2 domains of FMRP. The structure of the tandem KH1–KH2 domains of hFMRP provided the first crystallographic description of the structural environment of the Ile304 residue [17]. It revealed that Ile304 is located in the main hydrophobic core of the KH2 domain, which comprises buried hydrophobic residues from the hydrophobic face of the β1-strand and β2-strand and in part the α1-helix and α2-helix. Ile304 is completely solvent-inaccessible, except for a single atom, Ile304-Cγ2, whose solvent accessibility is less than one-third that of an Ile-Cγ2 atom in a Gly-Ile-Gly extended chain [17].
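The comparison with a Gly-Ile-Gly extended chain is a relative-accessibility calculation: each atom's accessible area in the folded protein is divided by its area in a fully exposed reference peptide. A minimal sketch of that normalization follows, assuming per-atom areas have already been computed by a surface program. The function names, the one-third cutoff parameter, and every number in the example are hypothetical placeholders chosen for illustration; they are not values taken from [17].

```python
def relative_accessibility(sasa_in_protein, sasa_in_reference):
    """Ratio of an atom's accessible area in the folded protein to its
    area in an extended reference peptide (both in Å^2)."""
    if sasa_in_reference <= 0.0:
        raise ValueError("reference accessibility must be positive")
    return sasa_in_protein / sasa_in_reference


def summarize_residue(protein_sasa, reference_sasa, cutoff=1.0 / 3.0):
    """Per-atom relative accessibility plus a below-cutoff flag.

    protein_sasa, reference_sasa: dicts mapping atom name -> area in Å^2.
    cutoff: assumed threshold; one-third mirrors the comparison in the text.
    """
    summary = {}
    for atom, area in protein_sasa.items():
        rel = relative_accessibility(area, reference_sasa[atom])
        summary[atom] = (rel, rel < cutoff)
    return summary


# Placeholder areas (Å^2) invented for illustration only; NOT data from [17].
ile_in_protein = {"CB": 0.0, "CG1": 0.0, "CG2": 5.0, "CD1": 0.0}
ile_in_gly_ile_gly = {"CB": 10.0, "CG1": 20.0, "CG2": 50.0, "CD1": 60.0}

for atom, (rel, below) in summarize_residue(ile_in_protein, ile_in_gly_ile_gly).items():
    print(f"{atom}: relative accessibility {rel:.2f}, below one-third: {below}")
```

Normalizing against an extended tripeptide makes side chains of different sizes comparable, which is why a residue can be described as buried even when one of its atoms retains a small exposed area.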
Ile304 could only make significant contacts with RNA if substantial structural rearrangements occurred upon binding, which is not typical for KH domains (see fig. 5A in [17]). If a polar Asn residue were substituted for Ile at position 304, one would expect that the integrity of the hydrophobic core would be perturbed. Such a structural perturbation is indeed observed, as evidenced by substantial changes in the structure and a decrease in the stability of the KH1–KH2 domains of hFMRP containing the Ile304 → Asn mutation.

Even within the same KH domain family, however, the Ile304 → Asn equivalent mutation can have different effects on protein structure, ranging from modest to significant perturbation, despite the predicted similar local environment of Ile304. For example, higher vertebrates have two autosomal paralogs, fragile X-related protein (FXRP) 1 and FXRP2, which have similar expression patterns and domain organization to FMRP [55]. The main hydrophobic core of the KH2 domain of human FMRP comprises residues, including Ile304, that are identical in FXRP1 and FXRP2 [17]. From these observations, it follows that introducing the equivalent Ile304 → Asn mutation in FXRP1 KH1–KH2 would result in loss of secondary structure, and indeed it does, as confirmed by CD spectroscopy (data not shown). Amphibians and arthropods have one FMRP-like ortholog, of which Drosophila FXRP (dFXRP) has been most studied [56,57]. dFXRP is 46.9% identical to hFMRP in the KH1–KH2 region, yet the equivalent mutation (Ile307 → Asn) has relatively modest effects on protein secondary structure [58].

The Ile304 → Asn mutation has also been studied in the context of other tandem KH domain-containing proteins. Drosophila PSI has four tandem KH domains, PSI KH1–KH4, that bind pre-mRNA cooperatively. As with dFXRP, introducing the Ile304 → Asn equivalent mutation into each KH domain has relatively subtle effects on secondary structure [25]. Leu28 in the KH3 domain of the protein Nova-2 is structurally equivalent to Ile304 in FMRP. A study by Lewis et al. [44] found that the Ile304 → Asn mutation perturbs the structure of Nova-2 KH3 and would destabilize the hydrophobic core of the KH2 domain of FMRP. This group subsequently reported that introducing an isostructural Asn in place of Leu28 would alter the electrostatic properties of a hydrophobic platform, stabilizing the RNA ligand on the protein, without changing the hydrophobic interior of the domain [21]. The Cα backbones of the structures of both free and RNA-bound Nova-2 KH3 are essentially the same (compare Protein Data Bank entries 1DTJ and 1EC6) [27], signifying that the protein backbone does not move in the presence of RNA ligand.
The main hydrophobic core of Nova-2 KH3 comprises residues that are similar but not identical to the residues in the main core of fragile X KH domains. Analysis of the RNA-bound structure of Nova-2 KH3 reveals that the atoms of Leu28 are buried except for Cβ, Cγ, and Cδ1, whose combined solvent accessibility change upon RNA binding is less than 1% of the total surface area buried when RNA binds. Introducing a Leu28 → Asn mutation would more likely affect the hydrophobic core of the protein. We caution that the effects of mutating Ile304 differ from one context to another. The Ile304 → Asn equivalent mutation unfolds the first KH domain of FMRP, for example, but has lesser structural effects on the Drosophila proteins FXRP [58] and PSI [25].

Conclusions

The nucleic acid-binding activity of KH domains is central to many cellular processes. Nucleic acid recognition by KH domains is unique. Unlike RNA recognition motifs, which recognize a diversity of RNA lengths, the binding cleft of KH domains is versatile but accommodates only four nucleic acid bases. When more specificity is required, beyond that possible with a single KH domain, an augmented recognition surface may be achieved either by multiple tandem KH domains or by including neighboring structural motifs. KH domains are well-tuned motifs that balance functional diversity and specificity, and are thus widely utilized in biology.

References

1 Siomi H, Matunis MJ, Michael WM & Dreyfuss G (1993) The pre-mRNA binding K protein contains a novel evolutionarily conserved motif. Nucleic Acids Res 21, 1193–1198.
2 Grishin NV (2001) KH domain: one motif, two folds. Nucleic Acids Res 29, 638–643.
3 Siomi H, Siomi MC, Nussbaum RL & Dreyfuss G (1993) The protein product of the fragile X gene, FMR1, has characteristics of an RNA-binding protein. Cell 74, 291–298.
4 Ashley CT, Wilkinson KD, Reines D & Warren ST (1993) FMR1 protein: conserved RNP family domains and selective RNA binding. Science 262, 563–566.
5 O’Donnell WT & Warren ST (2002) A decade of molecular studies of fragile X syndrome. Annu Rev Neurosci 25, 315–328.
6 Ostareck LA & Ostareck DH (2004) Control of mRNA translation and stability in haematopoietic cells: the function of hnRNPs K and E1/E2. Biol Cell 96, 407–411.
7 McKnight GL, Reasoner J, Gilbert T, Sundquist KO, Hokland B, McKernan PA, Champagne J, Johnson CJ, Bailey MC, Holly R et al. (1992) Cloning and expression of a cellular high density lipoprotein-binding protein that is up-regulated by cholesterol loading of cells. J Biol Chem 267, 12131–12141.
8 Currie JR & Brown T (1999) KH domain-containing proteins of yeast: absence of a fragile X gene homologue. Am J Med Genet 84, 272–276.
9 Spingola M, Armisen J & Ares MJ (2004) Mer1p is a modular splicing factor whose function depends on the conserved U2 snRNP Snu17p. Nucleic Acids Res 32, 1242–1250.


10 Lukong KE & Richard S (2003) Sam68, the KH domain-containing superSTAR. Biochim Biophys Acta 1653, 73–86.
11 Duncan R, Bazar L, Michelotti G, Tomonaga T, Krutzsch H, Avigan M & Levens D (1994) A sequence-specific, single-stranded binding protein activates the far upstream element of c-myc and defines a new DNA-binding motif. Genes Dev 8, 465–480.
12 Braddock DT, Louis JM, Baber JL, Levens D & Clore GM (2002) Structure and dynamics of KH domains from FBP bound to single-stranded DNA. Nature 415, 1051–1056.
13 Gibson TJ, Thompson JD & Heringa J (1993) The KH domain occurs in a diverse set of RNA-binding proteins that include the antiterminator NusA and is probably involved in binding nucleic acid. FEBS Lett 324, 361–366.
14 Gopal B, Haire LF, Gamblin SJ, Dodson EJ, Lane AN, Papavinasasundaram KG, Colston MJ & Dodson G (2001) Crystal structure of the transcription elongation/anti-termination factor NusA from Mycobacterium tuberculosis at 1.7 Å resolution. J Mol Biol 314, 1087–1095.
15 Beuth B, Pennell S, Arnvig KB, Martin SR & Taylor IA (2005) Structure of a Mycobacterium tuberculosis NusA-RNA complex. EMBO J 24, 3576–3587.
16 Brykailo MA, Corbett AH & Fridovich-Keil JL (2007) Functional overlap between conserved and diverged KH domains in Saccharomyces cerevisiae SCP160. Nucleic Acids Res 35, 1108–1118.
17 Valverde R, Pozdnyakova I, Kajander T & Regan L (2007) Fragile X mental retardation: the structure of the KH1–KH2 domains of fragile X mental retardation protein. Structure 15, 1090–1098.
18 Collaborative Computational Project Number 4 (1994) The CCP4 Suite: programs for protein crystallography. Acta Crystallogr D 50, 760–763.
19 Nagai K (1996) Protein–RNA complexes. Curr Opin Struct Biol 6, 53–61.
20 Stefl R, Skrisovska L & Allain FH (2005) RNA sequence- and shape-dependent recognition by protein in the ribonucleoprotein particle. EMBO Rep 6, 33–38.
21 Lewis HA, Musunuru K, Jensen KB, Edo C, Chen H, Darnell RB & Burley SK (2000) Sequence-specific RNA binding by a Nova KH domain: implications for paraneoplastic disease and the fragile X syndrome. Cell 100, 323–332.
22 Liu Z, Luyten I, Bottomley MJ, Messias AC, Hounginou-Molangao S, Sprangers R, Zanier K, Kramer A & Sattler M (2001) Structural basis of recognition of the intron branch site RNA by splicing factor 1. Science 294, 1098–1101.
23 Braddock DT, Baber JL, Levens D & Clore GM (2002) Molecular basis of sequence-specific single-stranded DNA recognition by KH domains: solution structure of a complex between hnRNP K KH3 and single-stranded DNA. EMBO J 21, 3476–3485.
24 Lunde BM, Moore C & Varani G (2007) RNA-binding proteins: modular design for efficient function. Nat Rev Mol Cell Biol 8, 479–490.
25 Chmiel NH, Rio DC & Doudna JA (2006) Distinct contributions of KH domains to substrate binding affinity of Drosophila P-element somatic inhibitor protein. RNA 12, 283–291.
26 Garcia-Mayoral MF, Hollingworth D, Masino L, Diaz-Moreno I, Kelly G, Gherzi R, Chou CF, Chen CY & Ramos A (2007) The structure of the C-terminal KH domains of KSRP reveals a noncanonical motif important for mRNA degradation. Structure 15, 485–498.
27 Jones TA, Zou JY, Cowan SW & Kjeldgaard M (1991) Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr A 47, 110–119.
28 Du Z, Lee JK, Tjhen R, Li S, Pan H, Stroud RM & James TL (2005) Crystal structure of the first KH domain of human poly(C)-binding protein-2 in complex with a C-rich strand of human telomeric DNA at 1.7 Å. J Biol Chem 280, 38823–38829.
29 Senes A, Ubarretxena-Belandia I & Engelman DM (2001) The Cα–H…O hydrogen bond: a determinant of stability and specificity in transmembrane helix interactions. Proc Natl Acad Sci USA 98, 9056–9061.
30 Vargas R, Garza J, Dixon DA & Hay BP (2000) How strong is the Cα–H…O=C hydrogen bond? J Am Chem Soc 122, 4750–4755.
31 Makeyev AV & Liebhaber SA (2002) The poly(C)-binding proteins: a multiplicity of functions and search for mechanisms. RNA 8, 265–278.
32 Gamarnik AV & Andino R (2000) Interactions of viral protein 3CD and poly(rC) binding protein with the 5′ untranslated region of the poliovirus genome. J Virol 74, 2219–2226.
33 Du Z, Lee JK, Fenn S, Tjhen R, Stroud RM & James TL (2007) X-ray crystallographic and NMR studies of protein–protein and protein–nucleic acid interactions involving the KH domains from human poly(C)-binding protein-2. RNA 13, 1043–1051.
34 Fenn S, Du Z, Lee JK, Tjhen R, Stroud RM & James TL (2007) Crystal structure of the third KH domain of human poly(C)-binding protein-2 in complex with a C-rich strand of human telomeric DNA at 1.6 Å resolution. Nucleic Acids Res 35, 2651–2660.
35 Berglund JA, Chua K, Abovich N, Reed R & Rosbash M (1997) The splicing factor BBP interacts specifically with the pre-mRNA branchpoint sequence UACUAAC. Cell 89, 781–787.
36 Vernet C & Artzt K (1997) STAR, a gene family involved in signal transduction and activation of RNA. Trends Genet 13, 479–484.


37 Tuerk C & Gold L (1990) Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 249, 505–510.
38 Michelotti GA, Michelotti EF, Pullner A, Duncan RC, Eick D & Levens D (1996) Multiple single-stranded cis elements are associated with activated chromatin of human c-myc gene in vivo. Mol Cell Biol 16, 2656–2669.
39 Gibson TJ, Thompson JD & Heringa J (1993) The KH domain occurs in a diverse set of RNA-binding proteins that include the antiterminator NusA and is probably involved in binding to nucleic acid. FEBS Lett 324, 361–366.
40 Linn T & Greenblatt J (1992) The NusA and NusG proteins of Escherichia coli increase the in vitro readthrough frequency of a transcriptional attenuator preceding the gene for the beta subunit of RNA polymerase. J Biol Chem 267, 1449–1454.
41 Borukhov S, Lee J & Laptenko O (2005) Bacterial transcription elongation factors: new insights into molecular mechanism of action. Mol Microbiol 55, 1315–1324.
42 Merkel JS, Sturtevant JM & Regan L (1999) Sidechain interactions in parallel beta sheets: the energetics of cross-strand pairings. Structure 7, 1333–1341.
43 Lee B & Richards FM (1971) The interpretation of protein structures: estimation of static accessibility. J Mol Biol 55, 379–400.
44 Lewis HA, Chen H, Edo C, Buckanovich RJ, Yang YY, Musunuru K, Zhong R, Darnell RB & Burley SK (1999) Crystal structures of Nova-1 and Nova-2 K-homology RNA-binding domains. Structure 7, 191–203.
45 Chothia C, Levitt M & Richardson D (1981) Helix to helix packing in proteins. J Mol Biol 145, 215–250.
46 Git A & Standart N (2002) The KH domains of Xenopus Vg1RBP mediate RNA binding and self association. RNA 8, 1319–1333.
47 Chen T, Damaj BB, Herrera C, Lasko P & Richard S (1997) Self-association of the single-KH-domain family members Sam68, GRP33, GLD-1, and Qk1: role of the KH domain. Mol Cell Biol 17, 5707–5718.
48 Ramos A, Hollingworth D, Major SA, Adinolfi S, Kelly G, Muskett FW & Pastore A (2002) Role of dimerization in KH/RNA complexes: the example of Nova KH3. Biochemistry 41, 4193–4203.
49 Kim JH, Hahm B, Kim YK, Choi M & Jang SK (2000) Protein–protein interaction among hnRNPs shuttling between nucleus and cytoplasm. J Mol Biol 298, 395–405.
50 Jin P & Warren ST (2000) Understanding the molecular basis of fragile X syndrome. Hum Mol Genet 9, 901–908.
51 De Boulle K, Verkerk AJ, Reyniers E, Vits L, Hendrickx J, Van Roy B, van den Bos F, de Graaff E, Oostra BA & Willems PJ (1993) A point mutation in the FMR-1 gene associated with fragile X mental retardation. Nat Genet 3, 31–35.
52 Feng Y, Absher D, Eberhart D, Brown V, Malter H & Warren S (1997) FMRP associates with polyribosomes as an mRNP, and the I304N mutation of severe fragile X syndrome abolishes this association. Mol Cell 1, 109–118.
53 Musco G, Kharrat A, Stier G, Fraternali F, Gibson TJ, Nilges M & Pastore A (1997) The solution structure of the first KH domain of FMR1, the protein responsible for the fragile X syndrome. Nat Struct Biol 4, 712–716.
54 Ramos A, Hollingworth D & Pastore A (2003) The role of a clinically important mutation in the fold and RNA-binding properties of KH motifs. RNA 9, 293–298.
55 Tamanini F, Willemsen R, van Unen L, Bontekoe C, Galjaard H, Oostra BA & Hoogeveen AT (1997) Differential expression of FMR1, FXR1 and FXR2 proteins in human brain and testis. Hum Mol Genet 6, 1315–1322.
56 Wan L, Dockendorff TC, Jongens TA & Dreyfuss G (2000) Characterization of dFMR1, a Drosophila melanogaster homolog of the fragile X mental retardation protein. Mol Cell Biol 20, 8536–8547.
57 Zarnescu DC, Shan G, Warren ST & Jin P (2005) Come FLY with us: toward understanding fragile X syndrome. Genes Brain Behav 4, 385–392.
58 Pozdnyakova I & Regan L (2005) New insights into fragile X syndrome. Relating genotype to phenotype at the molecular level. FEBS J 272, 872–878.
