Colworth Medal Lecture - Biochemical Society Transactions

The emergence of protein complexes: quaternary structure, dynamics and allostery

Tina Perica*, Joseph A. Marsh*, Filipa L. Sousa*, Eviatar Natan*, Lucy J. Colwell*, Sebastian E. Ahnert† and Sarah A. Teichmann*1

*MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, U.K., and †Theory of Condensed Matter, Cavendish Laboratory, University of Cambridge, JJ Thomson Avenue, Cambridge CB3 0HE, U.K.

Colworth Medal Lecture. Delivered at the Biochemical Society Centenary Event held at the Royal Society, London, on 16 December 2011. Sarah Teichmann

Abstract

All proteins require physical interactions with other proteins in order to perform their functions. Most of them oligomerize into homomers, and a vast majority of these homomers interact with other proteins, at least part of the time, forming transient or obligate heteromers. In the present paper, we review the structural, biophysical and evolutionary aspects of these protein interactions. We discuss how protein function and stability benefit from oligomerization, as well as evolutionary pathways by which oligomers emerge, mostly from the perspective of homomers. Finally, we emphasize the specificities of heteromeric complexes and their structure and evolution. We also discuss two analytical approaches increasingly being used to study protein structures as well as their interactions. First, we review the use of biological networks and graph theory for analysis of protein interactions and structure. Secondly, we discuss recent advances in techniques for detecting correlated mutations, with the emphasis on their role in identifying pathways of allosteric communication.

Key words: allostery, dynamics, oligomerization, p53, protein complex, quaternary structure.

Abbreviations used: Arel, relative solvent-accessible surface area; DNE, dominant-negative effect; GPCR, G-protein-coupled receptor; MSA, multiple sequence alignment; NAGK, N-acetyl-L-glutamate kinase; RE, response element; SCA, statistical coupling analysis; TF, transcription factor.

1 To whom correspondence should be addressed (email [email protected]).

Biochem. Soc. Trans. (2012) 40, 475–491; doi:10.1042/BST20120056

Introduction

Over the last two decades, it has become clear that most biological functions can only be described using a system of interacting biological molecules. This observation has shifted the focus of biological science from classical molecular biology, with its focus on individual biological molecules, towards systems biology. In order to understand and analyse these systems, new computational and experimental methods are emerging. We therefore begin the present review with a short overview of the concepts of biological modularity, networks and graph theory, which are being applied to increasingly more types of molecular data. Further on, we review the structural, biophysical and evolutionary aspects of these interactions. We start by discussing adaptation and oligomerization, highlighting examples from the literature showing how protein function and/or stability benefit from proteins forming oligomers. Then, we use the example of p53 to show that oligomerization can, at the same time, affect a protein in both an advantageous and a detrimental manner. The ubiquitous nature of oligomerization cannot be understood without considering the evolutionary pathways of protein complexes. We therefore discuss how protein interactions evolve faster than protein folds and how easily novel symmetrical interactions emerge. Once a protein complex is sufficiently populated, it is continuously modified and optimized by natural selection. The idea of detecting coevolving residues, which one would expect to find in such evolutionary dynamic systems, has received considerable attention over the years. We identify some recent advances in techniques for detecting such residues, which have significant implications for understanding and predicting protein structure, function and interactions. Interactions between proteins are central to almost every biological process, and the importance of oligomerization for the evolution of allostery was recognized very early on [1]. 
We review the classical view of protein allostery and its relationship to the different conformational states of protein subunits. Furthermore, we discuss protein allostery in the context of protein dynamics and the energetic coupling of distal sites, and address in detail the specific example of the lac repressor.

The protein complexes we survey throughout the majority of the present review are oligomers of identical subunits, i.e. homomers. Although global surveys at the level of protein structure analysis and functional genomics experiments indicate that homomerization is ubiquitous [2–4], the vast majority of these homomers assemble into higher-order heteromeric complexes in a cellular context. The ubiquitous nature of physical interactions between proteins is conveyed particularly powerfully by the network representation of these interactions (Figure 1, level 1). At the same time, our structural understanding of heteromers lags far behind that of homomers. Fortunately, many of the principles of protein interaction and evolution we discuss throughout the present review, such as epistasis, co-evolution and indirect allosteric mutations, are shared between homomers and heteromers. In addition, principles of symmetry and allosteric mechanisms apply equally to homomers and heteromers. We therefore conclude with a discussion of heteromeric complexes and the structural and evolutionary diversity afforded by the incorporation of multiple distinct subunit types.

© The Authors Journal compilation © 2012 Biochemical Society. Biochemical Society Transactions (2012) Volume 40, part 3

Protein modularity and networks

Although Nature does not function as an engineer [5], many of its systems share an important principle with engineered networks: modularity [6]. Modularity of natural systems can be studied from different perspectives, and, in the present review, we examine structural features of physical protein interactions. Protein complexes constitute functional modules, as members of a protein complex engage in stronger interactions within the complex than with external components, and a protein complex can be reconstituted in a functional form independent of the rest of the network [7]. An enzymatic protein complex, e.g. thiocyanate hydrolase (see Figure 6a), concentrates different molecular functions, and a signal transduction system, e.g. the MAPK (mitogen-activated protein kinase)/ERK (extracellular-signal-regulated kinase) signalling pathway, is an extended module, isolated by the specificity of its interactions. In both cases, complex formation enables the proteins to perform an important biological function. There are both functional and structural differences between these two types of protein complexes, and they are often grouped as obligate and transient complexes respectively. However, it may be more appropriate to think in terms of ranges of protein interactions: from obligate-to-transient, strong-to-weak or rigid-to-dynamic interactions [8]. Insulation of functional modules from each other prevents potentially harmful cross-talk, yet high connectivity of protein interaction modules enables one function to influence and thus regulate another, an important feature of biological modules [9]. Nature, being a rather parsimonious tinkerer, also connects its networks through protein reuse, whereby a protein participates in more than one complex. A protein might, for example, perform different functions if it is expressed in different tissues or have a second, moonlighting, role [10]. Even an organism as simple as Mycoplasma pneumoniae has extensive physical interconnections between approximately one-third of its heteromeric complexes [2].

The modularity of biological systems facilitates their representation as networks, enabling the use of graph theory to systematically and quantitatively analyse these highly complex systems. Protein system networks can be analysed at three levels, two of which involve interactions between proteins, whereas the third involves interactions between individual residues (Figure 1). The highest, most coarse-grained, level describes large-scale protein–protein interactions, such as the interaction network of the yeast proteome shown in Figure 1. The information needed to build such networks is obtained through high-throughput experiments, such as yeast two-hybrid [11] or TAP (tandem affinity purification)-tag purification combined with MS [12]. The next two levels of interaction require structural information. The middle level of protein networks (for example, the ATP synthase protein complex network in Figure 1) also represents interactions between proteins. In contrast with top-level protein–protein interaction networks, it describes only a subset of interactions, but contains information about stoichiometry, and the interface size and symmetry. Finally, at the most detailed level, atomic contacts between residues in a single protein or protein complex can also be represented as a network of residue contacts (for example, the ATP synthase residue contact network in Figure 1). Residue interaction graphs have been used to systematically represent protein folds [13], identify functional residues [14] or analyse protein dynamics [15]. Later in the present review, we discuss in more detail how residue–residue contact graphs can be used to study conformational changes linked to allostery.
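The residue-level representation described above can be sketched in a few lines. This is a minimal illustration only, assuming one representative point per residue (for example the Cα atom) and a distance cutoff of 8 Å; the cutoff, the toy coordinates and the helper names `contact_graph` and `interface_size` are assumptions introduced here, not values taken from the review or from PDB entry 2XOK.

```python
import math
from itertools import combinations


def contact_graph(coords, cutoff=8.0):
    """Return adjacency sets linking residues whose points lie within `cutoff` angstroms."""
    graph = {name: set() for name in coords}
    for (a, xyz_a), (b, xyz_b) in combinations(coords.items(), 2):
        if cutoff >= math.dist(xyz_a, xyz_b):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def interface_size(graph):
    """Count contacts whose two residues sit on different chains (first label character)."""
    return sum(
        1
        for a, neighbours in graph.items()
        for b in neighbours
        if b > a and a[0] != b[0]  # b > a counts each undirected edge once
    )


# Hypothetical coordinates (angstroms): three residues on chain A, one on chain B.
toy_coords = {
    "A1": (0.0, 0.0, 0.0),
    "A2": (3.8, 0.0, 0.0),
    "A3": (7.6, 0.0, 0.0),
    "B1": (0.0, 6.0, 0.0),
}

graph = contact_graph(toy_coords)
degree = {residue: len(neighbours) for residue, neighbours in graph.items()}
print(degree)
print("inter-chain contacts:", interface_size(graph))
```

Counting only the contacts that cross chains gives a crude stand-in for the interface-size edge weight used at the middle (subunit) level, which is one way the residue-level and subunit-level networks relate to each other.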

Figure 1. Different network representations of protein interactions. (1) Yeast protein–protein interaction network (based on data taken from [12]). (2) F1-c10 subcomplex of yeast ATP synthase protein complex network (based on PDB code 2XOK). Edge thickness reflects the relative sizes of interfaces between subunits. (3) F1-c10 subcomplex of yeast ATP synthase residue contact network (based on PDB code 2XOK).

Networks representing interactions between subunits in protein complexes

A protein complex can be represented as a weighted contact graph of subunits, in which the edge weights can represent the size of the intermolecular interfaces (Figure 1, level 2). Networks of individual protein complexes contain a small number of nodes and edges compared with networks from the other two levels, but can still be very powerful. For example, the network representation of protein complexes allows simple graph matching between different complexes and hence construction of a hierarchical classification of protein complexes, as in the 3DComplex database [4]. Protein complex networks also represent a framework for more sophisticated analyses that provide deeper insight into the symmetry and modularity of complexes. Specifically, the complexity of protein quaternary structure can be measured quantitatively by determining the minimum amount of information necessary to describe the protein complex in terms of self-assembling units [16]. To do this, the proteins are treated as building blocks with a set of specific pairwise binary attractive interactions which undergo a stochastic assembly process. The set of building blocks that corresponds to the most concise description can be found by employing an algorithm that labels the building blocks iteratively [16]. This approach is based on the concept of Kolmogorov complexity [17,18], which can be extended to physical structures [16,19]. Formally, the Kolmogorov complexity of a given string of data is the length of the shortest program on a Universal Turing Machine that reproduces it. Just as a string of data can be described using different programs, a physical structure can be described by different sets of self-assembly building blocks and their interactions. Hence the simplest such set of building blocks, i.e. that which requires the shortest description, gives us a quantitative measure of the complexity of the structure. Moreover, this description will highlight any symmetry, and, more generally, any modularity, present in the structure, as the minimization of the assembly description must take into account any repeated sets of building blocks that are connected in the same way within the structure. Using this approach, we showed that in terms of the 3DComplex quaternary structure topologies [4], most protein complexes are modular, and simple topologies are much more frequent than complex ones [16].
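The link between repetition and short descriptions can be illustrated with an off-the-shelf compressor. Kolmogorov complexity itself is not computable, so the sketch below uses the length of a zlib-compressed string as a rough upper-bound proxy; it is not the iterative building-block labelling algorithm of [16], and the function name `description_length` and both test strings are assumptions made for this illustration.

```python
import random
import zlib


def description_length(text: str) -> int:
    """Bytes needed for a zlib-compressed encoding: a crude upper-bound proxy only."""
    return len(zlib.compress(text.encode("ascii"), 9))


# A highly repetitive ("modular") string of 1000 characters.
repetitive = "AB" * 500

# An irregular string of the same length, drawn from a 20-letter alphabet.
rng = random.Random(0)  # fixed seed so the example is reproducible
irregular = "".join(rng.choice("ACDEFGHIKLMNPQRSTVWY") for _ in range(1000))

print("repetitive:", description_length(repetitive), "bytes")
print("irregular: ", description_length(irregular), "bytes")
```

The repetitive string compresses to far fewer bytes than the irregular one of equal length, which mirrors the argument above: a structure built from repeated, identically connected units admits a shorter assembly description.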

Adaptive aspects of homomeric protein complexes

Summarizing data from large-scale protein–protein interaction screens using the network representation discussed above demonstrates the ubiquitous nature of physical interactions between proteins (Figure 1, level 1). Most proteins form homomers [2,4], and a vast majority of those homomers interact with other proteins, at least part of the time [20]. This system of interactions constitutes biological units and gives them modular properties, which impose a certain degree of independence and therefore evolvability. This means that the modules of physically interacting proteins are first conserved and then occasionally reshuffled by evolution. We start this section by describing diverse examples where oligomerization benefits protein function and/or stability. We then use the extensive literature on p53 biophysics and function to show how oligomerization can be both advantageous and disadvantageous depending on which properties are considered. This equilibrium between function, stability and evolvability is characteristic of proteins, which brings us to the main topic of the following section on evolutionary dynamics of protein interactions.

Homomers and protein function

In some cases, homomeric oligomerization facilitates the formation of an active site from residues contributed by more than one polypeptide chain. There are many examples of enzymes for which this occurs, such as dihydrodipicolinate synthase [21] and HIV protease [22]. In the case of HIV protease, a face-to-face homodimeric interface presented a simple way of evolving a symmetric active site. New protein interfaces emerge in evolution much more easily than new folds; enabling a protein interaction therefore provides a good way of bringing about new structural combinations of amino acids. Whatever the initial trigger might be, once an active site in the interface has emerged, the protein oligomer is ‘trapped’ by selection pressure, and the interface is conserved.

There are also less direct ways in which oligomerization can enable the molecular function of a protein, for example by optimizing the protein dynamics, as association into higher oligomeric states affects the collective motions of the protein (as reviewed in [23]). Monomeric TIM (triose phosphate isomerase) contains all of the residues forming the active site, but, although there is no co-operativity between the subunits, only the homodimer is enzymatically active. The proposed mechanism involves transmission of the dynamics of the dimeric interface to the loop covering the active site [24]. Oligomerization has also been proposed to benefit enzymes catalysing oxidation processes, such as bacterial periplasmic hydrogenases. These enzymes are continually inactivated under aerobic conditions, as O2 oxidizes the active-site Ni(II), and then reactivated by electrons replenished from the surrounding membrane molecules, such as reduced quinols [25]. However, a more efficient way of reducing the active-site Ni(II) is to draw electrons directly from another protein, a mechanism that has been shown to apply to certain hydrogenases [26].

Independently of whether the activity of an enzyme is directly or indirectly connected to oligomerization, oligomerization provides a potential means of regulation, as it is highly dependent on protein concentration [27]. There are many more examples where oligomerization is directly involved in protein function, including the dimerization of caspase 9 and GPCRs (G-protein-coupled receptors) (reviewed in [27]).
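The concentration dependence mentioned above can be made concrete with the simplest possible case, a monomer–dimer equilibrium. The sketch below assumes a single dissociation constant Kd = [M]²/[D] and mass conservation Ct = [M] + 2[D]; the function name `dimer_fraction` and the example concentrations are hypothetical, and real oligomers (and cellular conditions) are of course more complicated.

```python
import math


def dimer_fraction(total_conc: float, kd: float) -> float:
    """Fraction of protomers found in dimers for a monomer-dimer equilibrium.

    Solves 2*M**2 + kd*M - kd*total_conc = 0 for the free monomer
    concentration M, which follows from Kd = M**2 / D and Ct = M + 2*D.
    Both arguments must use the same concentration units.
    """
    monomer = (-kd + math.sqrt(kd * kd + 8.0 * kd * total_conc)) / 4.0
    return 1.0 - monomer / total_conc


# Hypothetical example: Kd fixed at 1 (arbitrary units), total protein varied.
for total in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"Ct = {total:>6}: {dimer_fraction(total, 1.0):.3f} of protomers in dimers")
```

In this idealized model, half of the protomers are dimeric when the total protomer concentration equals Kd, and the dimeric fraction rises steadily with concentration, which is why changing expression level is a plausible regulatory handle.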

Returning to the concept of modularity, protein oligomerization is a parsimonious way of increasing complexity. When explaining allostery, Monod et al. [1] made the point that having a symmetrical oligomer enhances the sensitivity of selection, since every mutation counts twice. The concepts of allostery and co-operativity in the context of protein oligomerization are discussed in further detail later in the present review.

Homomers and protein stability

Larger proteins have a lower surface/core ratio and thus Goodsell and Olsen [28] argue that their extensive internal interactions and reduced solvent-exposed surface area make them more stable against denaturation. That is why small proteins often have to resort to disulfide bonds or specific metal-binding sites to achieve stability. Large proteins, however, are difficult to maintain, and the same effect can be achieved by oligomerization [28]. Packing of atoms in the protein interfaces is, in principle, the same as their packing in the protein core [29], and both protein folding and interface formation are governed by the same biophysical principles [30].

Oligomerization is potentially a simple way to increase stability, but an extensive analysis of protein structures from Thermotoga maritima, a thermophilic organism, showed that oligomerization plays a minor role in adaptation to high-temperature conditions [31]. Other mechanisms, including an increase in the numbers of salt bridges and structural compactness, seem to be more significant. There are, however, numerous anecdotal examples where either a loss of interaction decreased the stability of the protein or, conversely, decreased stability was compensated for by oligomerization. Inactivating mutations in the human homotetrameric fructose-bisphosphate aldolase B cause hereditary fructose intolerance. One of the most common point mutations associated with the disease, A149P, turns the protein into a dimer with decreased thermal stability [32] (Figure 2).

An extreme case of overlap between protein folding and oligomerization is a mechanism of oligomerization called domain swapping [33]. Domain swapping is, in many aspects, a process similar to aggregation [34,35], but domain-swapped oligomers can stay functional, as numerous examples show [36].
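The surface/core argument at the start of this section can be illustrated with a deliberately crude geometric model. The spherical shape, the radius r and the subunit count n below are assumptions made here for illustration; they are not quantities taken from [28], and real proteins and complexes are not spheres.

```latex
\[
\frac{S}{V} \;=\; \frac{4\pi r^{2}}{\tfrac{4}{3}\pi r^{3}} \;=\; \frac{3}{r},
\qquad
r_{n} = n^{1/3}\, r_{1}
\;\Longrightarrow\;
\left(\frac{S}{V}\right)_{n} \;=\; n^{-1/3}\left(\frac{S}{V}\right)_{1}.
\]
```

Under this idealization, n subunits that pack into one compact body of n times the volume expose n^(-1/3) of the surface per unit volume of an isolated subunit, roughly 0.63 for a tetramer. The point is only qualitative: assembling several small chains can reduce relative solvent exposure in the same direction as making one larger chain.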

Figure 2. Wild-type human aldolase B and A149P aldolase mutant. Wild-type aldolase is a homotetramer and an A149P substitution (in cyan) turns the protein into an inactive dimer. In wild-type aldolase, Ala149 is buried in the interior and is not part of the tetrameric interface (purple surface on the yellow half of the tetramer). An alanine-to-proline mutation causes local disorder, which is illustrated by a loss of numerous residue–residue contacts, which indirectly disrupts the interaction.

Advantages and disadvantages of oligomerization: a case study of the p53 TF (transcription factor)

Oligomerization of TFs

Many prokaryotic and eukaryotic TFs act as homo- or hetero-dimers [37]. One benefit of dimerization is higher affinity and specificity of DNA binding. Homodimeric TFs bind DNA on both strands in a clamp-like manner and a dimer binds twice the number of bases compared with a monomer [38]. In addition, TF dimerization enables a negative autoregulation mechanism, which reduces genetic network noise levels. Proteins can either dimerize upon DNA binding (monomeric pathway) or bind DNA as preformed dimers (dimeric pathway), with the latter providing a better noise-reduction mechanism [39].

Although dimerization is common among TFs, higher-order oligomerization also occurs. HSFs (heat-shock factors) act as trimers [40], whereas the master regulator p53 acts as a tetramer. Tetrameric p53 has evolved from a dimer, as can be deduced from its ancestral homologues [41] and its symmetry [3]. Above, we have discussed several aspects of oligomerization, from stability to function. In this section, we explore which of these may explain the adaptive advantages of p53 tetramerization.

p53 DNA recognition

Four p53 DNA-binding domains bind a canonical four-site (→← →←) RE (response element) [42]. Full-length p53 first dimerizes in solution via its C-terminal oligomerization domain [43]. Then the two p53 core domains bind a DNA RE half-site, forming an additional symmetrical (dimeric) interface, which is stabilized by both protein–protein and protein–DNA interactions, and finally a translational (tetrameric) interface with the dimer bound to the second half-site [44] (Figure 3). The co-operativity of DNA binding comes from protein–protein interactions between the core domains of the two dimers, as the oligomerization domain mutant still binds DNA co-operatively, although with decreased affinity [45].

It was shown previously that the p53 ‘transcriptional universe’ also includes non-canonical half- (→←) and three-quarter (→← →) RE sites [46]. Although two p53 proteins bind to a half-site, it is not a simple one-monomer-to-one-quarter-site binding, as some p53 residues interact with the quarter-site covered mostly by the other p53 molecule. Also, similarly to other TFs, protein–protein interactions comprise an essential part of the DNA–protein complex. It is thus not surprising that a p53 dimer does not bind a quarter-site RE (E. Natan and A.R. Fersht, unpublished work) and there are no known quarter-site REs (→). This range of REs, which introduces considerable variation and evolutionary flexibility into the system, is attributed to the tetrameric and co-operative nature of p53 DNA binding. Existence of non-canonical sites also implies that tetramerization allows for a certain degree of mutational robustness. If p53 can functionally bind a three-quarter site, it could as well bind a four-site RE with one quarter-site mutated. This claim, however, comes with several open questions. What is the oligomeric state of p53 at the mutant four-site RE? Will it be a tetramer with one p53 molecule loosely bound (Figure 3)? And what would be the efficiency of its transcriptional regulation?

Figure 3. Oligomerization of p53 TF. (a) Tetrameric p53 bound to its four-site RE (PDB code 3KZ8). The protein tetramerizes via its C-terminal oligomerization domains and via the DNA-binding domains upon DNA binding. (b) Effects of mutations in the DNA RE on binding of a monomeric or a tetrameric TF. PU.1 is a monomeric TF (PDB code 1PUE) and a member of the Ets TF family [158]. A single mutation in the DNA-binding domain or in the DNA RE can abolish binding. In the case of a tetrameric TF, such as p53, which binds to a four-site RE (PDB code 3KZ8), a mutation in one of the sites (or one of the subunits) will also decrease the binding affinity of the complex. However, the oligomerization of the protein, through both the tetramerization and the DNA-binding domain, will still promote DNA binding. (c) Mutational robustness of a tetrameric p53, compared with a theoretical dimeric p53. Cell homoeostasis is highly dependent on the concentration of active p53 [159], and the co-operative nature of tetrameric DNA binding enables the cell to maintain levels of functional p53 well over 50% of normal activity for many types of mutations.

Robustness to mutations and the dominant-negative effect in oligomers

Is an oligomer composed of wild-type and mutant subunits sufficiently functional or, in the case of p53, can heteromeric p53 be tolerated in a manner similar to a heterogeneous RE site? A similar general question was raised early on, discussing the relationship between different alleles in a heterozygous cell and the degree of the overall functionality as a function of oligomerization [47]. Most p53 mutations related to cancer can be divided into class I mutants, which affect the residues involved in DNA binding, but not the protein conformation, and class II mutants, which change the native p53 conformation. When a class I mutation in one of the p53 alleles occurs, the expression levels of both the mutant and the wild-type p53 will be, at least initially, equal. Slow dissociation kinetics and high interface affinity [43,48] mean that p53 dimerizes co-translationally [49,50]. One can thus expect only three types of tetrameric species in a heterozygous cell: wt4, wt2mut2 and mut4, in a 1:2:1 ratio and with mutational robustness depending on the activity of the wt2mut2 tetramer. In theory, any activity of the wt2mut2 species higher than half of the wt4 activity means that tetramerization plays a role in buffering mutations (Figure 3c).

It has long been emphasized that many p53 mutations have a DNE (dominant-negative effect) [51]. However, in most of the earlier experiments, the mutant was overexpressed ([52], and references therein, [53]), and such an excess of the mutant in the cell cannot take place without the loss of the wild-type allele [52]. More recent experiments show that, often, when the two alleles are expressed equally, a wild-type phenotype is observed [54,55]. Furthermore, Demidenko et al. [56] hypothesize that some pairs of different mutants, e.g. those affecting different p53 domains, can complement each other and exhibit a wild-type phenotype when expressed in the same cell. The authors provide an example where one p53 mutant is DNA-binding-deficient (mutDNA) and the other is transactivation-deficient, i.e. cannot bind its transcription co-activators (mutTrans). If the mutants form a mutDNA2mutTrans2 tetramer, the transactivation-deficient dimer can bind DNA and, through protein–protein interactions, facilitate the low-affinity binding of the DNA-binding-deficient dimer (Figure 3b). At the same time, the DNA-binding-deficient mutant can successfully transactivate. In short, the structural assembly, which enables p53 to bind DNA co-operatively, can also allow the complex to overcome deficiencies.

Some p53 mutants do, however, exhibit a strong DNE and enable complete inactivation of p53. These mutants are often structural, i.e. their mutations cause protein instability [57–60]. For example, a mutation in the hydrophobic core of p53 would show a strong DNE, as it causes aggregation of the mutant protein, which in turn induces severe instability of the wild-type protein. On the other hand, tumours that carry p53 with DNA-contact mutations, which normally do not affect the native conformation, show higher rates of loss of heterozygosity [61], probably because their negative effect is not dominant.

In summary, p53 oligomerization confers mutational robustness, which increases adaptation and thus fitness [62], but most p53 research is conducted in the context of cancer, and thus cannot be considered in terms of fitness. However, p53 has more fundamental functions, which can be discussed in the adaptive context, e.g. in metabolism [63] or stem cell differentiation [64,65]. Furthermore, when discussing oligomerization and mutational robustness, one also needs to take into account the effects of loss of heterozygosity, i.e. loss of one of the alleles. These events are less common than point mutations, but oligomerization might, in those cases, be detrimental. For a monomer, the levels of a functional protein will fall to approximately 50%, but, for an oligomer, the levels can fall much lower. Owing to p53 autoregulation and its dependency on the cellular concentration threshold for oligomer formation, levels of functional p53 fall to 25% upon loss of heterozygosity [66].
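The 1:2:1 argument in this section is simple enough to check by arithmetic. The sketch below assumes that preformed wt2 and mut2 dimers are present in equal amounts and pair at random, that wt4 has activity 1 and that mut4 has activity 0; the zero activity of mut4, the function names `tetramer_species` and `total_activity`, and the example activity values for the mixed tetramer are illustrative assumptions, not measurements.

```python
from fractions import Fraction


def tetramer_species(wt_fraction=Fraction(1, 2)):
    """Species fractions when preformed wt2 and mut2 dimers pair at random."""
    p = Fraction(wt_fraction)
    q = 1 - p
    return {"wt4": p * p, "wt2mut2": 2 * p * q, "mut4": q * q}


def total_activity(hetero_activity, wt_fraction=Fraction(1, 2)):
    """Cellular activity relative to an all-wild-type cell.

    Assumes wt4 activity = 1 and mut4 activity = 0; `hetero_activity` is the
    activity of the wt2mut2 tetramer relative to wt4.
    """
    species = tetramer_species(wt_fraction)
    return species["wt4"] + species["wt2mut2"] * Fraction(hetero_activity)


print(tetramer_species())                     # wt4 : wt2mut2 : mut4 = 1/4 : 1/2 : 1/4
for activity in (Fraction(0), Fraction(1, 2), Fraction(4, 5)):
    print(activity, "->", total_activity(activity))
```

With these assumptions the total is 1/4 + a/2, where a is the relative activity of the wt2mut2 tetramer. The total exceeds one half exactly when a exceeds one half, which is the buffering threshold stated in the text; a fully dominant-negative mutant (a = 0) would leave only one quarter of normal activity.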

Local concentration: increase in binding frequency and aggregation tendency

Close proximity of similar hydrophobic patches can lead to aggregation and amyloids [67]. There is evidence that homologous adjacent domains of multidomain proteins are under selection pressure to reduce misfolding on the basis of their rapid sequence divergence [68]. In the case of multidomain homo-oligomers, interactions via the oligomerization domains will inevitably increase the local concentration of other domains. If these domains have unstable or disordered regions, as is the case with p53, their proximity could impair stability. Thermodynamic stability experiments show that full-length p53 is less stable than its monomeric constructs, but, at the same time, more stable than a tetrameric construct lacking only the disordered N- and C-termini [48]. Five highly conserved amino acids at the N-terminus are sufficient to restore stability to the levels of the full-length protein. This short fragment interacts with the unstable DNA-binding domain and is not required for DNA binding, but rather seems to act as a guardian, which protects the homomer from accelerated aggregation.

In summary, above we have surveyed oligomerization of p53 in terms of its (i) function, (ii) mutational robustness, and (iii) stability. Tetramerization enables the co-operative nature of p53 DNA binding and a certain degree of mutational robustness, which are both advantageous for regulation of transcription and protein interactions. At the same time, tetramerization of a multidomain protein increases the local concentration of protein domains, which do not need to interact. In the case of p53, a short stabilizing region compensates for this disadvantage. Sometimes the scenario is the opposite: oligomerization increases stability, but at the same time impairs function. The interplay between stability and function is characteristic of evolvable proteins. In the following section, we survey how protein interactions emerge and how they evolve.

Evolutionary pathways and evolutionary dynamics of protein complexes

When discussing different adaptive aspects of oligomerization, it would be more correct to refer to them as adaptive benefits rather than reasons for oligomerization. As André et al. [69] have emphasized, evolution can only optimize a complex that is already significantly populated, meaning that its binding energy first needs to be high enough to overcome the loss in entropy. Indeed, as mentioned earlier, protein interactions emerge in evolution with relative ease in comparison with evolution of new enzymatic functions or new structures [70]. Beltrao and Serrano [71] estimate that, in the Homo sapiens protein interaction network, approximately 1000 interactions change, or rewire, per 1 million years. It was shown recently that protein interfaces require, on average, only a few mutations to evolve from a typical protein surface patch [72].

Interestingly, when searching for these easily evolvable protein interactions, a clear trend is obvious: low-energy complexes are highly enriched in symmetrical complexes composed of structurally similar (or identical) subunits, both in simulated [69,73] and real systems [4]. Homo-oligomers represent a large fraction of protein complexes, as shown by analysis of known protein complex structures [4], as well as systematic analyses of protein interaction networks [2]. Symmetry clearly plays an important part in the evolution of interactions. For example, there is a clear enrichment of complexes with self-complementary interfaces (complexes with dihedral symmetry) over those with asymmetric face-to-back interfaces (complexes with cyclic symmetry) [3]. One of the reasons could be that the self-complementary interfaces in complexes with dihedral symmetry are stronger on average, since each mutation that contributes to an increase in binding energy counts twice [74].

Oligomeric state and indirect mutational pathways

There have been some impressive achievements in the de novo engineering of protein interactions [75–77], and their focus has been on engineering an interaction by designing a novel interface, with the mutations introduced directly at the interface positions. However, fructose-bisphosphate aldolase B illustrates how a mutation outside the interface can also indirectly play an important role in the evolution of interactions. A point mutation (A149P) changes the protein from a native tetramer to a non-functional dimer with decreased thermostability (Figure 2). Intriguingly, the mutation is not in the interface. Malay et al. [32] solved the structures of the mutant at two different temperatures (4°C and 18°C) and showed how mutation from alanine to proline causes local disorder, which propagates to other regions of the structure. This structural perturbation influences the active site of the enzyme, making it less active, even at low temperatures. The same mutation also explains the loss of tetrameric state. Disorder is introduced to the 110–129 loop, which forms the tetrameric interface in the wild-type enzyme [32]. One could thus say that this mutation has an allosteric knock-on effect on the oligomeric state (Figure 2), analogous to allosteric effects of small molecules, which are known to affect protein–protein interactions. In this particular example, the allosteric mutation has a highly deleterious effect, but this does not preclude the opposite case in which mutations indirectly influence oligomerization in a way that is advantageous. Cases where several mutations would be involved raise interesting questions about possible epistatic interactions between residues that may be distant in three-dimensional space.

Biochemical Society Transactions (2012) Volume 40, part 3
© The Authors Journal compilation © 2012 Biochemical Society

Evolutionary pathways of oligomerization and residue co-evolution

Formation of both homomeric and heteromeric complexes requires cognate binding patches to exist. Mutations on the protein surface may enable new complexes to form, and this change may be consolidated by subsequent mutations at other sites. In addition, mutations that occur in the interior of a protein can change the physicochemical properties of the binding surfaces, such as their conformations, potentially destabilizing the interactions they make, as illustrated by the aldolase B example in Figure 2. In some cases, these interactions can be rescued by compensatory mutations that occur, or pre-exist, at other sites. Such groups of mutations will then be selected for in evolution. This scenario is termed 'co-evolution' of residues, and can be important in the evolution of a protein's ability to engage in different complexes.

Although, in some cases, we understand the effects of individual amino acid changes, little is known about how groups of two or more amino acids mutate to control phenotypes such as interaction specificity. This is partly due to the complexity of the phylogenetic models required, and partly due to the complexity of protein structures. For example, even in the best-studied allosteric proteins, as is discussed below, the mechanism by which induced changes propagate through the molecule is not well understood.

If groups of amino acids co-evolve to control protein phenotypes, then we would expect to observe correlations between the mutation patterns of such a group of amino acids in MSAs (multiple sequence alignments). While detecting correlations between groups of residues is prohibitive both computationally and in terms of the number of sequences required, many algorithms have been developed to detect co-evolving pairs of residues from MSAs [78–85]. It is hypothesized that highly correlated residues are likely to be in close structural proximity to each other.
Indeed, recent statistical work has found that correlated residue pairs among Drosophila species are more likely to be close in sequence [86], and that structurally proximal residue pairs that change between human and rat tend to co-evolve [87].

Applying co-evolution methods to the prediction of intermolecular interactions requires the assembly of MSAs of potentially interacting pairs. Although early results were not encouraging [83,88–90], a more recent correlation analysis was used to predict specificity-determining residues between histidine kinases and response regulators [84]. The set of specificity-determining residues identified in this system contains examples of residues that co-evolve with more than one other residue, suggesting the presence of higher-order interactions between groups of amino acids. A similar hypothesis was put forward in work identifying intramolecular correlations in the serine protease family of enzymes; in this study, it was experimentally demonstrated that distinct groups of amino acids were able to modify different protein phenotypes [91].

In a correlation analysis that considers pairs of amino acids, co-evolution of a group of amino acids will result in high correlation scores between all possible pairs in the group. Conversely, if we consider a chain of co-evolving amino acids where, for example, residues i and j co-evolve because they are structurally proximal, as do residues j and k, then, in many cases, we will see a transitive correlation between residues i and k, which may be structurally distal. Given that the correlation scores assigned to each pair of residues are known to depend strongly on their respective conservation in the MSA, these induced, or transitive, correlations may result in a high rate of false-positive predictions [92].

Methods from statistical physics or probability theory can be used to build a global model of the protein sequence that allows such transitive correlations to be removed from the set of observed correlations. The algorithms developed previously are local algorithms: they measure the correlation Cij between each pair of residues i and j, independent of the alignment context (e.g. Cij will not change if column k of the alignment is changed).
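To make the notion of a local score concrete, the following minimal sketch computes one common choice of pairwise correlation, the mutual information between two alignment columns. The choice of mutual information, the four-sequence toy alignment and the function names are assumptions made for illustration and do not correspond to any specific method in [78–85].

```python
import math
from collections import Counter

def column(msa, i):
    """Return column i of an alignment given as a list of equal-length strings."""
    return [seq[i] for seq in msa]

def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i, col_j = column(msa, i), column(msa, j)
    count_i, count_j = Counter(col_i), Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    score = 0.0
    for (a, b), n_ab in count_ij.items():
        p_ab = n_ab / n
        score += p_ab * math.log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return score

# Hypothetical toy alignment: columns 0 and 1 co-vary, column 2 is conserved,
# column 3 varies independently of column 0.
msa = ["AKGL", "AKGV", "DEGL", "DEGV"]
mi_01 = mutual_information(msa, 0, 1)
mi_03 = mutual_information(msa, 0, 3)

# Changing column 2 leaves the score for columns 0 and 1 untouched,
# which is what makes this a local measure.
msa_changed = ["AKTL", "AKSV", "DEGL", "DEGV"]
mi_01_changed = mutual_information(msa_changed, 0, 1)
print(mi_01, mi_03, mi_01_changed)
```

Because the score for columns 0 and 1 is computed from those two columns alone, it cannot distinguish a direct coupling from one mediated by a third position.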
In contrast, in a global method, the correlation score assigned to each pair of residues depends on the rest of the alignment. An implementation of these ideas that used a Bayesian network method to build a global probability model and remove transitive correlations was able to significantly improve prediction of protein–protein interactions from sequence alignments. Subsequent work has applied these ideas to the prediction of residue–residue interaction in both the inter- and intra-molecular cases. Removing the transitive correlations results in an orders-of-magnitude reduction in the level of false-positive predictions [93–96].

The implementation of global probability models that successfully identify co-evolving residues raises a number of important questions. Using a global probability model dramatically improved both contact prediction and prediction of protein–protein interaction partners by removing chains of co-varying residues. Previously, it has been suggested that such chains of co-varying residues are themselves key to understanding allosteric mechanisms, and it is not yet clear how to resolve these two points of view.

Another important unresolved question involves the concerted action of groups of co-evolving residues. In each of the global models implemented to date, higher-order interactions between groups of amino acids are explicitly excluded from the model [96]. However, in the mutational analyses described above, it was shown that groups of amino acids control protein phenotypes such as interaction specificity and substrate specificity [84,91]. In particular, the analysis of the interfaces between homomeric or heteromeric complexes suggests that a number of amino acids are likely to be involved in co-ordinating interaction specificity [4,97]. Recent work on HIV identified such a correlated group of amino acids in the Gag protein within which multiple mutations were less likely to be tolerated by the virus [98]. Members of this group were found to lie on the interfaces between molecular subunits of the viral capsid, suggesting that mutations within the group destabilize the structural interactions necessary for viral function. This suggests that further application of correlated mutation analysis to proteins involved in complex formation might yield interesting insights into the evolutionary constraints that interface residues are required to satisfy, and help in the development of mutational strategies for protein complex engineering.

Analysis of residue co-evolution has been applied to prediction and engineering of allosteric pathways, as well as protein interfaces [81,99]. In many cases, the allosteric pathways are intimately linked to oligomerization, and, in the next section, we discuss this connection.
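The removal of transitive correlations by a global model, described earlier in this section, can be illustrated with a deliberately simplified continuous analogue. The sketch below is not the Bayesian network or statistical physics models of [93–96]; it assumes three Gaussian-like variables arranged in a chain i–j–k with a hypothetical direct coupling r = 0.8, and uses partial correlations obtained from the inverse of the correlation matrix as the global score.

```python
def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix (list of lists)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if r == c else 0.0 for c in range(n)]
           for r, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlations(corr):
    """Correlation of each pair after conditioning on all other variables."""
    prec = invert(corr)
    n = len(corr)
    return [[1.0 if a == b else -prec[a][b] / (prec[a][a] * prec[b][b]) ** 0.5
             for b in range(n)] for a in range(n)]

r = 0.8  # assumed direct coupling between neighbours in the chain i - j - k
corr = [[1.0, r, r * r],
        [r, 1.0, r],
        [r * r, r, 1.0]]  # i and k are correlated only through j
partial = partial_correlations(corr)
print("observed correlation i-k:", corr[0][2])
print("partial correlation i-k: ", round(partial[0][2], 6))
print("partial correlation i-j: ", round(partial[0][1], 6))
```

In this toy chain the observed i–k correlation is substantial, yet the partial correlation vanishes once j is accounted for, while the direct i–j coupling survives. The score for each pair now depends on every other column, which is the defining property of a global method.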

Allostery and protein oligomerization

Allostery is fundamental to many biochemical pathways, from cell signalling to metabolic regulation [100], and a recently developed Allosteric Database [101] classifies more than 300 proteins as allosteric. Although some of the proteins in this database are monomeric, the majority form homo- or hetero-oligomers. In this section, we discuss why oligomerization is so common among allosteric proteins.

Allostery is intimately associated with intrinsic flexibility and intra/inter-molecular communication between different parts of a protein. Structures of many allosteric proteins are a combination of semi-rigid domains connected by flexible hinges. Such assembly of subunits constrains the intrinsic dynamics of a monomer, but, on the other hand, creates novel collective motions by means of intersubunit communication. This communication between different parts of the structure, or in this case, different subunits of an oligomer, enables a high level of allosteric control.

An interesting example where oligomerization promotes allostery is NAGK (N-acetyl-L-glutamate kinase), a member of the amino-acid kinase family [102]. Depending on the organism, NAGKs can be hexameric and allosterically regulated by arginine, or homodimeric and arginine-insensitive. The formation of an extra trimeric interface in the hexamer provides an additional mode of collective motion of the dimeric blocks, in turn promoting allosteric communication [102] (Figure 4a). In summary, a combination of selective dynamics in monomers and their assembly in a complex seems to have been recurrently optimized to achieve modes of motion necessary for allostery. Even though it may be clear that an oligomeric arrangement can promote additional communications essential for allostery, the atomic mechanisms and pathways by which this occurs remain elusive.

Allosteric mechanisms

Two early models explained allostery through conformational changes of homodimeric proteins. The MWC (Monod–Wyman–Changeux) model introduced the concept of equilibrium between different conformations assuming concerted and symmetrical changes from one structure to another [1]. In contrast, the KNF (Koshland–Némethy–Filmer) model proposed sequential structural changes upon ligand binding, introducing the idea of different conformations between the initial and final state [103].

In recent years, our understanding of allostery has changed and the boundaries between these two seemingly exclusive models have blurred. Proteins are dynamic and exist in an ensemble of conformations, populated according to their Boltzmann distribution [104]. More than one conformation may be able to bind a ligand, which can, in turn, promote subsequent rearrangements and shift a population distribution. Debate remains, however, about the mechanism of allosteric communication, especially regarding the relationship between the emergence of the binding conformation and the binding event itself. There are numerous examples showing that induced-fit or conformational selection, or both, play important roles in allostery, and several explanations have been put forward showing predominance of one mechanism over the other [105–107]. At the macroscopic level, the main difference is whether the ligand binds to a pre-existing, sparsely populated conformation resembling the bound state (conformational selection) or, instead, binding to the unbound state promotes structural changes (induced-fit). The conformational selection model has different implications in terms of the evolutionary constraints on individual residues compared with the induced-fit model, as discussed further below.
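As a minimal sketch of the concerted equilibrium described above, the standard MWC expressions for fractional saturation and for the fraction of protein in the high-affinity (R) state can be evaluated directly. The parameter values below (two sites, L = 100, c = 0.01) are illustrative assumptions rather than measurements for any particular protein, and the function names are hypothetical.

```python
def mwc_saturation(alpha, L, c, n):
    """Fractional ligand saturation in the MWC model.

    alpha: ligand concentration divided by K_R (dissociation constant of R state)
    L:     [T]/[R] equilibrium ratio in the absence of ligand
    c:     K_R / K_T (c < 1 means the ligand prefers the R state)
    n:     number of identical binding sites
    """
    bound = (alpha * (1 + alpha) ** (n - 1)
             + L * c * alpha * (1 + c * alpha) ** (n - 1))
    total = (1 + alpha) ** n + L * (1 + c * alpha) ** n
    return bound / total

def fraction_r_state(alpha, L, c, n):
    """Fraction of protein molecules in the R conformation."""
    r_term = (1 + alpha) ** n
    return r_term / (r_term + L * (1 + c * alpha) ** n)

# Illustrative parameters only.
N_SITES, L0, C_RATIO = 2, 100.0, 0.01
for alpha in (0.0, 1.0, 10.0, 100.0):
    y = mwc_saturation(alpha, L0, C_RATIO, N_SITES)
    r = fraction_r_state(alpha, L0, C_RATIO, N_SITES)
    print(f"alpha={alpha:6.1f}  saturation={y:.3f}  fraction R={r:.3f}")
```

With these numbers the R state is rare without ligand and becomes dominant as ligand is added, which is the population shift that the ensemble view of allostery generalizes.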

Allosteric pathways

The idea of an allosteric pathway is related to the allosteric mechanism, and refers to a potential structural pathway that energetically couples two binding sites. The conformational selection mechanism allows for multiple pathways, but the induced-fit model implies a much narrower and more structured route of allosteric communication. Conformational and dynamic transitions within a protein have been successfully probed by mutagenesis and double-cycle mutants [108], and many computational approaches have been used to correlate protein structure to structural or dynamic coupling between residues (see [109] for a review).

Most information regarding allosteric pathways comes from high-resolution structures, which can be represented as networks of residue contacts as in Figure 1, level 3. These networks can then be used to analyse structural changes of allosteric proteins in different states. This approach has been used to detect key residues involved in the allosteric communication [110], and global communication networks involving tertiary and quaternary motions within a protein complex [111]. In general, network representation methods show that residue contact structural networks form 'small world' networks with high local residue linkage and sparser long-range connectivity determined by specific residue clusters with critical roles in allosteric communication [112–115] (Figure 4b). However, as proposed by Cooper and Dryden [116] and seen in the case of dimeric bacterial methionine regulator protein [117,118], allostery can occur without significant changes in the backbone conformation of a protein. This can make the identification of networks of residues that mediate the allosteric communication between distal sites even more challenging and, since the conformational change might be very subtle, even more useful.

Figure 4 Evolution of allosteric complexes
(a) Hexameric Pseudomonas aeruginosa NAGK (PaNAGK) (PDB code 2BUF) and dimeric Escherichia coli NAGK (EcNAGK) (PDB code 2X2W). (b) Network representation of E. coli LacI repressor (EcLacI). The residue contact map was constructed using RING [160] and represented using CMView [161]. The Cα atoms of each residue are represented as black spheres and the edges connecting them represent non-covalent interactions. Residues determined by targeted molecular dynamics (TMD) [129] as involved in the three allosteric pathways are coloured red, green and blue respectively. (c) E. coli LacI repressor (EcLacI) in the anti-inducer (left, PDB code 1EFA) and apo form (right). The apo form is modelled using a structure of a periplasmic binding domain (PDB code 1LBI) (light blue) and a DNA-binding domain (PDB code 1LQC model 19) (dark blue). Black ellipses mark interfaces between monomers.

It has been suggested that networks of evolutionarily correlated residue pairs provide information about pathways of allosteric communication both within protein monomers, and between members of protein complexes. For example, in the GroEL–GroES chaperonin system, networks of residues in GroES coupled to residues in GroEL were identified via a correlated mutation analysis [90]. These networks involve both short- and long-range intersubunit coupling, and were suggested to reflect pathways of information transfer between distal locations. Similarly, an SCA (statistical coupling analysis) was used to identify chain-like networks of amino acid interactions within single PDZ domains that link active-site residues with distant sites [81], where non-additive binding energy from double mutant cycles was shown to correlate with SCA pairwise correlation scores. A network of residues linking the ligand-binding pocket of GPCRs to conformational changes that occur at the G-protein-binding sites was also identified using SCA [99]. Perhaps most pertinent to the question of allostery is the use of SCA to identify a physically connected pathway of packing interactions that links the allosteric haem pairs across the tetramerization interface in haemoglobin [99].
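The residue contact networks discussed in this section can be sketched in a few lines. The example below builds a contact graph from Cα coordinates and computes the two quantities usually used to characterize small-world behaviour, the mean clustering coefficient and the mean shortest path length. It is an illustration under stated assumptions: the coordinates are a synthetic ideal α-helix rather than a real structure, the 7 Å cutoff is an arbitrary choice, and the function names are hypothetical and unrelated to RING or CMView.

```python
import math
from collections import deque

def ideal_helix(n_res, radius=2.3, rise=1.5, turn_deg=100.0):
    """Synthetic C-alpha trace of an ideal alpha-helix (angstroms)."""
    coords = []
    for i in range(n_res):
        angle = math.radians(turn_deg * i)
        coords.append((radius * math.cos(angle), radius * math.sin(angle), rise * i))
    return coords

def contact_graph(coords, cutoff=7.0):
    """Connect residues whose C-alpha atoms lie within `cutoff` angstroms."""
    graph = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                graph[i].add(j)
                graph[j].add(i)
    return graph

def mean_clustering(graph):
    """Average fraction of a node's neighbour pairs that are themselves linked."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in graph[a])
        total += links / (k * (k - 1) / 2)
    return total / len(graph)

def mean_path_length(graph):
    """Average shortest path (in edges) over all reachable ordered pairs."""
    total, pairs = 0, 0
    for src in graph:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

graph = contact_graph(ideal_helix(20))
clustering = mean_clustering(graph)
path_length = mean_path_length(graph)
print(f"clustering={clustering:.3f}  mean path length={path_length:.3f}")
```

A single helix gives only local contacts, so it shows high clustering without the long-range shortcuts of a folded domain; applying the same functions to coordinates from a real structure would be needed to see the short path lengths that define a small-world network.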

Allostery in the lac repressor

The bacterial LacI transcription regulator, i.e. the lac repressor, controls lactose metabolic enzymes and is a classical allosteric protein. It is also an example where flexibility and oligomerization have been evolutionarily optimized to ensure diverse specificities and allosteric control. LacI belongs to the LacI/GalR family of dimeric or tetrameric allosteric regulators, where each monomer comprises a DNA-binding domain, a linker region and a so-called periplasmic small-ligand-binding domain (Figure 4c). The DNA-bound form of the protein is always dimeric [119], although the tetramer (dimer of dimers) is also thought to be functionally significant in mediating looping between binding sites separated in sequence. Numerous high-resolution structures in different conformational states [120,121] and a plethora of genetic and biochemical data make LacI one of the most studied proteins. The existence of LacI point mutations with inverse phenotypic response relative to the wild-type [122] and the different ligand phenotypic responses of paralogous families highlight the intrinsic evolvability of LacI.

Different methods identified the linker region as playing a key role in the functional rearrangement of the two domains and in the allosteric communication between them [120,123,124]. The existence of such a region is now proposed to be a fundamental feature of many allosteric proteins [125]. Furthermore, rearrangements in the dimeric interface seem to form part of a primary allosteric pathway illustrating the role of oligomerization [111,126–129]. Additionally, two other allosteric pathways were identified by targeted molecular dynamics [129] and are also represented in Figure 4. The architecture of the allosteric pathways of LacI, consisting of domains connected by a hinge and an intersubunit interface, is also seen in other proteins, such as the nucleotide-binding region of Hsp70 (heat-shock protein 70) or human/rat DNA polymerase β [130]. Although it may be premature to generalize, it is beginning to appear that this pathway architecture may be a common principle across many allosteric proteins.

Flexibility, order–disorder transitions and interface communication between domains and subunits appear to be recurring themes in the allosteric mechanisms of many proteins. These patterns create specific evolutionary constraints on families with allostery. The LacI family of allosteric transcriptional regulators are homomeric, and allostery is often linked to symmetric complexes. However, these can be heteromeric, as in the case of haemoglobin or ATCase (aspartate transcarbamoylase), for instance. In the next section, we shift our focus from structure, evolution and allostery of homomers to the structure and evolution of heteromers.

Figure 5 Number of published heteromeric and homomeric crystal structures over time, as well as the fraction of complexes that are heteromeric
These values are based upon crystal structures in the PDB on 1 October 2011, excluding nucleic acid-containing complexes and only considering polypeptide chains >50 residues in length.

Structural and evolutionary diversity of heteromeric complexes

As mentioned above, the focus of the present review so far has been mainly on homomeric protein complexes. One of the reasons for this bias is that, to date, the structures of many more homomeric than heteromeric protein complexes have been determined. In fact, while the number of published structures of protein complexes has been rapidly increasing, the fraction of those from heteromers has declined substantially since the 1980s (Figure 5). Although seemingly in contrast with observations that a large fraction of proteins participates in heteromeric protein–protein interactions in vivo [20], this probably reflects biases in structure-determination methodologies. Whereas early structural studies focused primarily on proteins purified directly from cell extracts, which included many heteromeric complexes, the subsequent adoption of recombinant protein technologies encouraged studies of individual gene products. This trend may have increased in recent years given the huge output of structural genomics projects. Although progress is being made on the high-throughput modelling of heteromeric complexes from monomeric or homomeric structures [131,132], tremendous opportunities remain in structural biology for the characterization of new heteromers.

Owing to their multiple subunit types, heteromers can adopt a far greater range of quaternary structures than is possible for homomers (Figure 6a). At the simplest level, the different subunits of a heteromeric complex can be homologous with each other and adopt homomer-like quaternary structure, as is the case for haemoglobin, which resembles a homotetramer [133]. Alternatively, heteromers can have more complicated, but still highly symmetrical, topologies, such as thiocyanate hydrolase which repeats its three different subunits four times each [134]. Some heteromers are highly asymmetrical, such as RNA polymerase II, which contains one copy of each of its ten different heteromeric subunits [135].
Finally, there are heteromers with both symmetrical and asymmetrical regions, such as ATP synthase, in which a cyclic ring of ten c subunits is connected via an asymmetric stalk region to another subcomplex containing three α and three β subunits [136]. The symmetry mismatch between the different regions of this complex is thought to be crucial for its functioning as a rotary motor.

Figure 6 Abundance and diversity of heteromeric complexes
(a) Diversity of heteromer quaternary structure in four complexes: human deoxyhaemoglobin (PDB code 2HHB), a heteromer with homologous subunits formed by gene duplication followed by divergence; Thiobacillus thioparus thiocyanate hydrolase (PDB code 2DD4), a heteromer with a more complex but D2 symmetrical topology; F1-c10 subcomplex of yeast ATP synthase (PDB code 2XOK), a heteromer with two regions of mismatched cyclic symmetry separated by an asymmetric stalk region; and yeast RNA polymerase II (PDB code 1I50), a heteromer with a highly asymmetric topology. (b) Gene fusion leads to three different forms of the urease complex in different species: Klebsiella aerogenes (PDB code 1KRA), Helicobacter pylori (PDB code 1E9Y) and Canavalia ensiformis (PDB code 3LA4). Note that only one-quarter of the full tetrahedral H. pylori complex is shown, which corresponds to the C3 symmetry of the other two. In all protein complex networks, edge thickness reflects the relative sizes of interfaces between subunits.

Not only are heteromers more diverse than homomers in their quaternary structures, but also their subunits are significantly more flexible [137]. Flexibility is intimately related to complex formation, as more flexible proteins tend to undergo larger conformational changes upon binding [137,138]. The increased flexibility of heteromers may be indicative of fundamentally different interaction mechanisms for heteromers compared with homomers. Homomers are inherently symmetrical, and thus each subunit in a complex must undergo the same conformational change upon binding. However, interactions between heteromeric subunits can involve asymmetric conformational changes (e.g. one subunit keeps the same conformation upon binding, while the other experiences a large change). These asymmetric interactions are more likely to require flexible subunits that can adjust their conformations to the structures of their binding partners.

An extreme case of asymmetric binding and subunit flexibility involves the intrinsically disordered proteins, which have received considerable attention in recent years because of their associations with important biological functions as well as various diseases [139]. These proteins can be partially or fully disordered in isolation, yet are often observed to fold upon complex formation [140]. Increasing intrinsic disorder has also been correlated with the size of protein complexes [141]. Moreover, not all intrinsically disordered proteins become fully folded upon binding. Instead, there exist 'fuzzy' complexes, in which subunits can retain significant flexibility or disorder, despite being bound [142]. Recent ensemble-modelling strategies using diverse experimental data have begun to shed light on the fascinating structural properties of these highly dynamic complexes [143–145].

We recently introduced a simple structural measure useful for studying protein flexibility called the relative solvent-accessible surface area (Arel) [137]. We found that Arel could be used to predict both intrinsic flexibility and the magnitude of conformational changes upon binding from the structures of either free proteins or bound subunits. Furthermore, intrinsically disordered subunits could be identified from bound complexes by their very high Arel values. We anticipate considerable utility for Arel, both in studies of protein flexibility and conformational changes and as a general tool for structural characterization.

Compared with homomers, heteromers also have more diverse evolutionary mechanisms. Whereas homomer quaternary structure can generally only evolve by changing the number of repeated subunits along defined symmetry-related pathways [3], heteromers can also gain or lose distinct subunits, allowing evolution to proceed in an orthogonal dimension.
This can be through the evolution of new interfaces to bind new subunits, or through gene duplications, as is likely to be the case for haemoglobin (Figure 6a), where the gene encoding a homomeric subunit gradually diverges into two increasingly dissimilar genes encoding heteromeric subunits [146]. An especially interesting evolutionary mechanism relevant to heteromers is gene fusion, in which two separate genes become fused into one. Since gene fusion often occurs between genes encoding interacting proteins [147,148], this provides a way for the number of distinct subunits in a complex to be reduced, while leaving the overall structure largely unchanged. For example, structures of three different forms of the urease complex have been published (Figure 6b). In Klebsiella aerogenes, the subunits are encoded by three different genes [149], two of which become fused in the Helicobacter pylori form of the enzyme [150]. Finally, all three genes are fused in the homomeric jack bean (Canavalia ensiformis) urease [151]. Thus fusion represents a mechanism by which a heteromer can evolve into a homomer. The reverse process, gene fission, is also possible as a mechanism for increasing the number of subunits, although it is much less frequent than fusion [152].

Conclusions and perspectives

In the present review, we have discussed structure, evolution and allostery of homomers and heteromers. Both types of complexes are based on one or more cognate interfaces. An interesting and rarely addressed question is what distinguishes a native functional interface from a spurious non-functional interaction, of which there must be many in the crowded cell. Meenan et al. [153] recently managed to 'trap' a non-functional interface using a clever disulfide cross-link. This allowed them to compare two homologous interfaces with seven orders of magnitude difference in binding affinity. The authors identify subtle and indirect structural differences, which distinguish an interface with femtomolar binding Kd from a homologous 'non-interface' with a Kd in the micromolar range [153]. At the same time, some biologically interesting complexes, e.g. ones formed by ubiquitin, have binding affinities in the high micromolar range [154]. This illustrates how important it is to discuss protein interactions and interfaces in a cellular and biological context, ideally taking into account local protein concentrations. Although individual researchers are aware of this, and the differences between transient (e.g. signalling) and obligate interactions have been known for a long time [8,155,156], the community would greatly benefit from a systematic approach combining biophysical and structural data on protein interfaces with quantitative proteomics data on protein concentrations [157].

Finally, it is worth emphasizing that the future of research on the biophysics and evolution of protein interactions is bright. There are exciting developments related to both experimental and computational techniques for studying the evolutionary and biophysical principles of protein complexes. These include next-generation sequencing, which has produced an explosion of data on protein sequence families, opening a new frontier in protein sequence analysis, as well as more sensitive, comprehensive and novel proteomics methods. Together with increased computational power, this is catalysing new bioinformatics and molecular evolution methods for interpreting the data. Cheap and fast ways of synthesizing DNA mean that it is easier to produce engineered proteins, which makes analysis of mutants more accessible. Together, these and other technologies will usher in a new era in our understanding of proteins and their interactions.
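The point made above about weighing binding affinity against local protein concentration can be illustrated with the textbook equilibrium for a one-to-one complex, A + B ⇌ AB. The sketch below is only an illustration: the Kd of 100 µM and the two concentrations are hypothetical values chosen to contrast a dilute and a locally concentrated setting, and the function names are assumptions.

```python
import math

def complex_concentration(a_total, b_total, kd):
    """Equilibrium [AB] for A + B <-> AB; all arguments in the same units."""
    s = a_total + b_total + kd
    return (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0

def fraction_a_bound(a_total, b_total, kd):
    """Fraction of A found in the complex at equilibrium."""
    return complex_concentration(a_total, b_total, kd) / a_total

KD_UM = 100.0  # hypothetical weak interaction, in micromolar
dilute = fraction_a_bound(1.0, 1.0, KD_UM)         # 1 uM of each partner
crowded = fraction_a_bound(1000.0, 1000.0, KD_UM)  # 1 mM local concentration
print(f"fraction bound at 1 uM: {dilute:.3f}")
print(f"fraction bound at 1 mM: {crowded:.3f}")
```

Under these assumptions the same weak interface is almost unpopulated in dilute solution but mostly occupied at high local concentration, which is why affinity alone does not decide whether an interface is biologically relevant.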

Acknowledgements

We thank Sjors H.W. Scheres for a critical reading of the paper and helpful comments.

Funding

T.P. is supported by a Laboratory of Molecular Biology Medical Research Council (LMB-MRC) Scholarship. J.A.M. is supported by a Long-Term Fellowship from the Human Frontier Science Program. F.L.S. is supported by a fellowship from Fundação para a Ciência e a Tecnologia [grant number SFRH/BPD/73058/2010]. L.J.C. is supported by an Engineering and Physical Sciences Research Council fellowship [grant number EP/H028064/1]. S.E.A. is supported by the Royal Society. S.A.T. is supported by the Medical Research Council [grant number U105161047] and the European Research Council.

23

24

25

References 1 2

3

4

5 6 7

8 9 10 11

12

13

14

15

16

17

18 19 20

21

22

 C The









Biochemical Society Transactions (2012) Volume 40, part 3









Received 1 March 2012 doi:10.1042/BST20120056

© The Authors Journal compilation © 2012 Biochemical Society
