Current Pharmaceutical Biotechnology, 2008, 9, 57-66
57
Predicting 3D Structures of Protein-Protein Complexes Ilya A. Vakser*1,2 and Petras Kundrotas1 1
Center for Bioinformatics and 2Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas 66047, USA Abstract: The protein-protein docking problem is one of the focal points of activity in computational structural biology. Adequate computational techniques for structural modeling of protein interactions are important because of the growing number of known protein structures, particularly in the context of structural genomics. The protein docking methodology offers tools for fundamental studies of protein interactions and provides structural basis for drug design. The paper presents a critical review of the existing protein-protein docking approaches in view of the fundamental principles of protein recognition.
Keywords: Protein recognition, structural bioinformatics, protein modeling, docking. INTRODUCTION The protein-protein docking problem is one of the focal points of activity in computational structural biology. The 3D structure of a protein-protein complex, generally, is more difficult to determine experimentally than the structure of an individual protein. Adequate computational techniques to model protein interactions are important because of the growing number of known protein 3D structures, particularly in the context of structural genomics [1, 2]. The protein docking techniques offer tools for fundamental studies of protein interactions and provide structural basis for drug design. Computational structural approaches to molecular recognition were introduced in early 70’s by Scheraga and coworkers [3] for small ligand interactions with proteins. Specifically, protein-protein docking techniques were pioneered in 1978 by Wodak and Janin [4] and Greer and Bush [5]. Since then, the field has grown substantially, especially starting from early 90’s, through the development of powerful docking algorithms, rapid progress in computer hardware, and significant expansion of available experimental data on structures of protein-protein complexes. Nevertheless, our understanding of the principles of protein recognition is still limited. With the rapid advances in experimental and computational determination of structures of individual proteins, the importance of modeling of protein 3D interactions increases. We now face the challenge of structural modeling of protein-interaction networks on the genome scale, requiring much more powerful docking methodologies. In this paper, we describe the structural and physicochemical foundations of modern protein-docking techniques, the fundamental elements of docking, and provide an
*Address Correspondence to these Authors at the Center for Bioinformatics and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas 66047, USA; E-mails:
[email protected] and
[email protected] 1389-2010/08 $55.00+.00
overview of existing docking approaches and future directions in methodological developments, especially in light of the challenges of genomics/proteomics. PROTEIN RECOGNITION Protein-protein complex formation can be viewed from either a more physical perspective as a minimization of the free energy of the system, or from a more empirical point of view as a match between various phenomenological structural and/or physicochemical motifs (so called “recognition factors”). In living organisms, proteins recognize their partners among many other proteins and bind in a specific way in short physiological timeframes. Given the complexity of the system, from either the physical or empirical points of view, the formation of a protein-protein complex is a remarkable event, based on the nature’s super-efficient “energy-minimization protocol” and guided by long-range and short-range recognition factors. Modern methods of protein docking are based on our efforts to simulate and navigate the intermolecular energy landscape, and on our current understanding of the recognition factors governing complex formation. Parallels Between Protein Recognition and Protein Folding Principles of protein folding and protein recognition are the basis for understanding life processes at the molecular level. They also provide the foundation for the algorithms of predicting protein structure and interactions. The underlying physical principles that determine the structure of individual proteins and the structure of protein complexes are identical. The physicochemical and structural principles of protein interactions will be discussed in greater detail later. Here we mention them briefly within the context of binding and folding similarity. Databases of co-crystallized protein-protein complexes are used to study the interface properties and derive relevant principles. Statistically derived residue-residue and atomatom preferences for protein-protein interfaces were found © 2008 Bentham Science Publishers Ltd.
Vakser and Kundrotas
58 Current Pharmaceutical Biotechnology, 2008, Vol. 9, No. 2
similar to those in protein cores [6-11]. A major role of hydrophobicity in protein folding is well established [12, 13]. Studies of protein-protein interfaces confirm the importance of hydrophobicity in complex formation as well [6, 14-17]. The importance of the concept of the energy funnel, first demonstrated for protein folding [18, 19], has been expanded to the intermolecular energy landscape in protein-protein interactions [20-27]. Tight packing of structural elements inside proteins is one of the fundamental concepts in our understanding of protein structures (e.g., see [28-30]). The same concept of compactness applies to protein-protein interfaces as well (see, Geometric and physicochemical complementarity section). Geometric and Physicochemical Complementarity A tight geometric complementarity between interacting protein surfaces is a cornerstone of protein-protein docking methodology since its inception in 1978 [4]. Systematic database analysis of the rapidly growing number of cocrystallized protein-protein complexes provides an increasing amount of evidence supporting this concept [17, 31]. A number of investigations of packing and buried surface area at protein-protein interfaces [29, 32, 33] supported the general conclusion that the interacting proteins have a high degree of surface complementarity, but indicated that there is a significant variation in this regard between different complexes. For example, packing at the antigen-antibody interface is relatively loose [32, 34]. The contact surface area in protein-protein complexes generally varies from 500 to 5000 Å2 with many complexes having even larger contact areas [31, 35]. Most protein-protein interfaces are found to be more hydrophobic than exposed areas [14-16, 36]. Hydrophobic amino acid residues tend to be enriched in the interface in hydrophobic patches of 200-400 Å2 [6, 37, 38]. A high degree of electrostatic and hydrogen-bonding complementarity is also observed for protein-protein interfaces [33, 39-41]. Structural Recognition Motifs and Hot Spots The receptor (the larger protein in the complex) sites are often concave [42-45]. The binding surface is also known to be more conserved than the non-binding surface. A degree of residues conservation and evolutionary importance is an indicator of the binding and/or functional region [46-50]. It has also been determined that entropic properties of the binding site are different from those of the non-binding surface [51, 52]. The binding “hot spots” theory indicates existence of a small number (e.g., three) of interface residues that are key to binding. They are usually positioned in the middle of the interface, are inaccessible to solvent, complementary to other “hot spot” residues across the interface, are evolutionary conserved, and keep their conformation upon binding [5256]. Large-Scale Recognition Factors An important insight into the basic rules of protein recognition is provided by the studies of large-scale structural recognition factors, such as correlation of the antigenicity of surface areas with their accessibility to large probes [57], role of the surface clefts [58], binding-site characterization
based on geometric criteria [42, 43, 45], study of the “lowfrequency” surface properties [59], recognition of proteins deprived of atom-size structural features [60-63], and backbone complementarity in protein recognition [64]. The practical importance of the large-scale recognition factors for docking methodologies is that they often allow one to ignore local structural inaccuracies (e.g., those caused by conformational changes of the partners upon complex formation). Intermolecular energy landscape The existence of the large-scale structural recognition factors in protein association has to do with the funnel-like intermolecular energy landscape. The concept of the funnellike energy landscapes has had a significant impact on the understanding of protein folding [19]. The kinetics of the amino-acid chain folding into a unique 3D structure is impossible to explain using “flat” energy landscapes, where minima are located on the energy “surface” that do not favor the native structure (so called “golf-course” landscapes). The general slope of the energy landscape towards the native structure (“the funnel”) explains the kinetics of protein folding. It also provides the basis for protein-structure prediction. The basic physicochemical and structural principles of protein binding are similar, if not identical, to those of protein folding. Thus, the funnel concept can be naturally extended to intermolecular energy [20, 22]. As in protein folding, this concept is necessary to explain the kinetics data for protein/protein association. The existence of a funnel in protein/protein interactions is supported by considerations regarding long-range electrostatic and/or hydrophobic “steering forces” and the geometry of proteins [65, 66], energy estimates for near-native complex structures [67], and binding mechanism that involves protein folding [21]. It has been shown that simple energy functions, including coarse-grain (low-resolution) models, reveal major landscape characteristics. The large-scale, systematic studies of protein-protein complexes confirmed the existence of the intermolecular binding funnel [22, 63] and further revealed that the number of distinct energy basins is small and that they are wellformed (funnel like) and correlated with known binding modes [25]. The characteristic funnel size is 6 – 8 Å interface RMSD and correlates with the type of the complex (e.g. larger for enzyme-inhibitor, smaller for antigen-antibody, etc. [27]). Important direction for designing better docking procedures is smoothing of the landscape, by reducing its ruggedness [24, 68, 69], amplifying the funnel/local minima depth ratio [15], etc. PRINCIPLES OF DOCKING Docking vs. Binding The only computational approaches that directly model physical interactions between proteins are docking and binding simulations [66]. Docking approaches, as opposed to binding simulations, are not concerned with modeling of real binding pathways, but rather focus on the final configuration(s) of the complex, i.e. equilibrium vs. kinetics. This makes docking computationally efficient and allows scanning of a wide variety of potential matches. Binding simulations potentially provide deeper insight into the mechanism of complex formation, assuming a correct force field and adequate sampling. Docking and binding simulations address
Predicting 3D Structures of Protein-Protein Complexes
modeling of protein-protein interactions from different perspectives and are complementary to each other. In this paper, we focus exclusively on docking methodologies. Protein/Protein vs. Protein/ligand and Protein/DNA Docking Proteins naturally interact not only with other proteins, but also with small ligands, as well as with DNA and other biopolymers. These interactions share the same physical and empirical principles of molecular recognition. For practical purposes, however, the relative importance of major recognition factors, as well as the docking strategy, is different. In small ligand/protein interactions, the small ligand size reduces the relative importance of surface complementarity, especially in the lower-resolution, first approximation aspect. Correspondingly, from the start, it elevates the relative importance of other physicochemical factors (hydrophobicity, electrostatics, and hydrogen bonding), structural-recognition motifs on the receptor site, as well as the ligand’s flexibility. These aspects are reflected in the docking strategy. Another major difference with the protein-protein case in docking strategy is that for the small ligand/protein case, the binding site on the receptor is often known or presumed. The main, and often the only, goal remaining is to determine the intricate details of the ligand/receptor interactions. Protein/protein docking usually requires prediction of the general docking mode for the two proteins, which combined with the large interaction interfaces makes determination of the intricate interatomic details impossible in existing docking approaches. In the protein/DNA case, the two major differences from the protein/protein case are the ultimate flexibility of nucleic acids except in multimeric helices, and related to it, the general absence of structural recognition factors beyond the local sequence of the nucleic acids. On the other hand, the binding site on the protein may be derived from known cocrystallized protein/DNA complexes. Also protein/DNA docking may involve a reduced dimensionality of the docking space, when the protein molecule basically slides along the DNA helix until a matching combination of the nucleic acids is found. The practice of protein/DNA docking, although very important, is still currently limited. Small ligand/protein docking enjoys huge popularity in both industrial and academic communities, and its practice preceded protein/protein docking. Realistically, although the small ligand/protein and protein/protein docking share many principles, algorithms, and procedures, they have diverged into two distinct fields. In this paper, we focus on protein/protein docking. Template-Based vs. ab Initio Docking Protein folding and protein binding are based on the same physical principles and thus share many aspects of empirical modeling. Computational techniques for structural modeling of protein-protein interactions often follow computational approaches to individual protein structures in a number of aspects and have been rapidly developing in recent decades, both in terms of methodology and computing power. Computational prediction of individual protein structures has de-
Current Pharmaceutical Biotechnology, 2008, Vol. 9, No. 2
59
veloped from the “first principles” force fields approaches to the currently dominating “knowledge-based” modeling. The two main reasons for this paradigm shift are: on the one hand, the inability of the existing first-principles methods to deliver reliable solutions, and on the other hand, the explosive growth of experimentally determined knowledge on the protein structures. The knowledge-based modeling dominates the individual protein structure prediction field in most of its aspects, from comparative modeling, based on the knowledge of the templates, to new fold predictions, based largely on knowledge-based potentials [70]. Protein docking field is substantially younger than the individual structure prediction and is much less advanced on the transition path from “first principles” to “knowledgebased” approaches. The same two factors as in the evolution of the individual structures prediction are responsible for the current much lesser role of knowledge-based techniques in protein docking [71]. First, the first-principles approaches, based on physical force fields, have been relatively successful in prediction of protein complexes, as opposed to prediction of individual structures. The reason is that in the docking first approximation (rigid-body docking) that works in many cases, the number of the degrees of freedom is only six, which is incomparable with the number of degrees of freedom in prediction of individual proteins structures at any meaningful level of approximation. Second, the amount of experimentally determined information on the structure of the protein-protein complexes, based primarily on X-ray structures, is by far less than that on the individual protein structures. It is also widely believed that the majority of functional protein/protein interactions are transient, and thus do not form complexes stable enough for crystallization. Thus, the pool of protein/protein structural templates is limited and heavily biased towards multisubunit proteins. However, with the advances in the experimental determination of protein complexes the situation is changing. New docking approaches are emerging that involve knowledge-based techniques ranging from template-based docking [2, 72], including such important applications as predictions of the existence of an interaction [73], to ab initio docking with knowledge-based potentials [74-76]. In reflection of the growing importance of this subject, the latest Modeling of Protein Interactions meeting (http://www.bioinformatics.ku.edu/conferences) was largely devoted to exploration of community-wide efforts for largescale experimental determination of the structures of proteinprotein complexes and the new horizons these efforts would open for the docking field. The meeting hosted top experimental and computational experts in the field and underscored the feasibility and the need for such an effort, as well as the necessity to develop knowledge-based docking techniques to meet the challenge. DOCKING METHODOLOGY Energy minimization of structures is a very difficult problem. The large molecular sizes and the rugged nature of the energy landscape with many local minima result in a challenging global-optimization problem. The objective is to find conformation with the lowest energy, which should correspond to the native structure. Thus, the task is split in two:
60 Current Pharmaceutical Biotechnology, 2008, Vol. 9, No. 2
development of an objective function (force field) as a representative of the energy surface that has its global minimum at the native conformation, or close to it, and the procedure used to locate that global minimum. The typical tradeoff is between the quality/complexity of the force field, and the realistic possibility of locating the global minimum of a more complex objective function. This dichotomy of function – search procedure (engine) is often complemented by a third stage – post-processing, aimed at improving the signal (correct prediction)-to-noise (false-positive prediction) ratio of the results using “scoring” functions that are too expensive computationally to be included in the main search procedure. Force Fields Force fields in existing protein/protein docking procedures widely range between the most trivial ones (e.g., an empirical step-function approximation of the Lennard-Jones potential, which exclusively employs digitized steric complementarity [68] to full-scale “standard” force fields. The difference between the complex and the simple force fields often reflects two different paradigms of docking strategy – generating a large number of candidate matches based on accurate force fields and finding the correct match at the post-processing (refinement and scoring) stage, as opposed to locating the approximate position of the proteins within the complex using a simple force field and bringing in the details at the subsequent refinement stage using a more complex force field. These two paradigms are conceptually similar in the second stage (post-processing/refinement), but differ in the first stage of generating the candidate matches. An important development in docking force fields has been introduction of intermolecular statistical potentials [8, 9, 11, 74, 77, 78] as opposed to the “first principles” force fields. Sampling ab Initio Sampling of the docking degrees of freedom is a crucial step in docking methodology. The docking degrees of freedom involve six external degrees of the rigid body movement (3 translation coordinates and 3 angles of rotation), as well as internal degrees of freedom, which determine the conformation of the proteins. To make the number of the internal degrees of freedom manageable, approximations are essential. The “rigid-body” approximation leaves only the external degrees of freedom and approximates internal de-
Fig. (1). Cross section through the 3D representation of the molecules.
Vakser and Kundrotas
grees of freedom by making the protein “body” soft and thus tolerant to local structural mismatches. The rigid-body approximation is adequate for docking separated proteins from co-crystallized complexes (no structural changes upon complex formation), low-resolution docking of the unbound/inaccurate structures, as well as in some cases of high-resolution docking of unbound proteins. For an atomic resolution docking of unbound structures, however, some form of explicit processing of internal degrees of freedom is required in general. It may be assumed (although not proven by systematic studies) that for most complexes of unbound proteins atomic resolution accuracy in docking may be achieved by properly designed limited conformational search of the surface side chains. The existing search procedures used in the sampling of the docking degrees of freedom vary dramatically, from exhaustive search on a grid to Monte Carlo, molecular dynamics, and genetic algorithms. An important direction in protein docking is based on the correlation technique by the fast Fourier transformation (FFT), introduced in 1992 by Katchalski-Katzir et al. [79]. It predicts the structure of a complex by maximizing surface overlap between the two molecular images. The images are digital 3D representations of molecular shape which distinguish between the surface and the interior of a molecule (Fig (1)). The algorithm is based on the correlation between these images, which is calculated rapidly using fast Fourier transformation. Correlation peaks found by the procedure indicate geometric match and thus represent a potential complex. The procedure is equivalent to the full six-dimensional search (three translations and three rotations of the ligand) but much faster by design. The approach has been implemented in a large number of algorithms that share the same paradigm of sampling of three or more external docking coordinates by integral operations, but differ in representation of molecules, structural tolerance, and physicochemical characterization of surfaces. Other grid-based algorithms perform the search explicitly, but contain empirical structure-based filters that reduce the search space [80-82]. A docking algorithm developed by Nussinov, Wolfson, and co-workers [83-85] represents protein surfaces by a reduced set of critical points that contains roughly the same information about the surface as a more dense surface representation. The triplets or pairs of the critical points with associated normals representing the proteins are matched using
Predicting 3D Structures of Protein-Protein Complexes
an efficient computational algorithm based on computer vision techniques. An alternative approach to matching surface points is based on the genetic algorithm [86]. Abagyan and co-workers [87, 88] and Scheraga and coworkers [89] developed flexible docking algorithms that use Monte Carlo-based energy-minimization protocols, which include both external and internal coordinates. The method developed by Baker and colleagues [74] docks two proteins using a Monte Carlo protocol with low-resolution residuescale statistical potentials, followed by a high-resolution refinement. A simple long-range electrostatic guidance method based on the balance of intermolecular atom-atom forces was developed [90]. In case of objective functions designed to be sufficiently realistic, the global minimum is often never found due to time constraints and the complexity of the energy landscape, and the minimization determines only local minima which can be located far away from the native conformations. Thus, global optimization methodologies for complex objective functions in protein docking remain of high priority despite the multitude of existing techniques. There is a consensus among researchers that progress in solving structureprediction problems depends on the presence of certain properties of the native energy landscape (which must be reflected in the objective function landscape). Most importantly, equilibrium energy landscapes in folding and docking often reveal a funnel-like character of the energy landscape near the native structures [20-22, 63, 91-94]. Several protein recognition techniques use smoothed potential functions [68, 95-99]. These methods transform the original objective function in such a way that the number of local minima is significantly decreased, the energetic barriers between minima reduced, and global optimization is corre-
Current Pharmaceutical Biotechnology, 2008, Vol. 9, No. 2
61
spondingly much easier. The underlying idea is to transform minimum found on the smoothed function gradually back to the original function in a way that allows one to track the location of the global minima. Sampling Template-Based The starting point in homology based approaches to protein-protein docking is submitting a pair of query (target) sequences to a procedure that searches for homologous sequences in a database of putative template sequences performing alignment of query and template sequences (Fig (2)). The template sequences should have their 3D structure known and therefore the search is usually performed against the non-redundant set of sequences from the PDB, sometimes with an intermediate search against non-redundant sequence database. After creating the list of homologous template sequences for both query sequences separately, one searches both lists for the template sequences that belong to the same complex. Alignment protocols often produce alignments that are not suitable for building of reliable 3D models. Thus the initial list of templates should be filtered according to certain criteria thereby creating the list of common templates appropriate for further model building. These criteria may vary depending on a particular task and/or alignment protocol, but in general they have to include a measure of statistical significance [100]. For the PSI-BLAST alignment protocol such measure is e-value; however other alignment methods may utilize a normalized raw score. An important measure of alignment quality is alignment of the part of query sequence that is anticipated to be at the complex interface. For instance, local alignment protocols (PSI-BLAST) may produce alignments that are statistically
Fig. (2). Schematic representation of sequence homology approach to 3D modeling of protein-protein complexes.
Vakser and Kundrotas
62 Current Pharmaceutical Biotechnology, 2008, Vol. 9, No. 2
significant but cover only a fraction of the query sequence. This may be detrimental for modeling of complex structures since aligned part of the query sequence may be located far from the interface. Global alignment protocols (e.g., profileto-profile alignments) may result in long stretches of deletions and/or insertions in the query sequence at the complex interface. Since such stretches are built ab initio, this significantly reduces the model quality because of potential steric clashes at the interface. Thus it is important to have a reliable method of predicting putative interfacial patches from the sequence alone. The identified pairs of “good” alignments are then submitted to 3D modeling software which builds structures of both monomers based on the templates. It is possible that monomers of the query complex would be modeled on the different parts of the same monomer of the template structure, especially when a local alignment protocol is used. However such cases are rare [101]. The final step is constructing the model of the complex from the models of separate monomers (often by simply joining the PDB files of individual models). The steric clashes at the interface are typically small and may be removed by minimization. A number of template-based techniques have their roots in predicting 3D structures of individual protein molecules. Skolnick’s group applied threading protocol, developed for predicting 3D structures of individual proteins, separately to each monomer of the complex, with added rethreading of both partners in a putative complex [73]. Evaluation of threading results was enhanced by utilizing structural alignment to assess the percentage of correctly assigned monomer and dimer templates [102]. Jackson et al. [103] complemented the search for homologous sequence in the pool of putative templates by the search for conserved structural patterns at the interface. Structure and sequence similarities between known interacting proteins were used to increase sensitivity and specificity of fold and family assignment [104]. Although this work was not directly related to modeling of protein-protein interactions, it demonstrated that proteins, similar in sequence in structure, interact in a similar manner. The method was later applied to predicting domain interactions in PSD-95 protein [105] and to modeling of host-pathogen interactions [106]. Post-Processing: Scoring and Refinement A variety of approaches to scoring have been suggested in the literature. These vary from smoothed potentials (as crude as smoothing the Leonard-Jones potential in GRAMM [68]) to molecular dynamics simulations with elaborate force fields and explicit solvation. The low-resolution (unrefined) scoring is supposed to improve the hierarchy of the matches from the scan (global docking search) output before the refinement. Scoring and refinement may be distinguished in the sense that scoring reevaluates the values of the objective function without changing the coordinates, whereas refinement moves the structures in the coordinates space. This distinction to a certain degree may be arbitrary – some scoring components may become part of the refinement. Moreover, some of the unrefined scoring components may be incorporated in the docking search (sampling). Thus one can envision that in the
future this low-resolution/unrefined scoring stage may disappear divided between the two search procedures – the scan and the refinement. However, at the present stage of the docking methodology development, this stage is necessary from the implementation standpoint, as well as for the evaluation of different scoring components that can be later included in the docking scan and the refinement searches. There might be also a separate high-resolution/refined scoring stage because some scoring components may work differently for low-resolution (structural clashes, gaps) and high-resolution structures. Examples of low-resolution scoring components are: across-interface statistical propensities such as environment dependent residue-residue potentials, structural motifs coupling, energy basin characteristics that help identify the funnel (size, ruggedness, and slope), etc. At the refinement stage, the degrees of freedom in the minimization protocol often are the rigid-body movements of the ligand plus selected torsional angles of the side chains and, potentially, of the more flexible elements of the backbone at the putative interface (e.g., loops).The structural changes upon binding involve both the side-chain and the main-chain movement. A reliable modeling of the backbone movement presents a significant computational challenge. However, the analysis of the DOCKGROUND datasets (http://dockground.bioinformatics.ku.edu) indicates that only 15% of complexes have significant (> 3 Å C RMSD) changes upon complex formation. Thus, docking approaches which do not take into account backbone movements are expected to be applicable to the majority of protein complexes. The refinement may become computationally expensive, depending on the approach and the required degree of structural details. Thus refinement has to be flexible, terminating at certain structural resolution, if the computational time becomes a factor. The high-resolution/refined scoring typically contains scoring components that are sensitive to the structural imperfections and/or are too computationally expensive to be included in the refinement search (e.g., advance electrostatics calculations, etc.). LARGE-SCALE DOCKING Direct experimental approaches to structure determination (primarily, X-ray crystallography and NMR) are developing fast. However, they are capable of determining only a fraction of all protein structures. Thus, the structures of most individual proteins in genomes have to be modeled by highthroughput modeling approaches [107, 108]. The growing availability of the experimentally determined structures of representative protein folds makes the template-based modeling of the majority of proteins in genomes quite realistic. The limitations of the direct experimental techniques are even more evident in the case of the structures of proteinprotein complexes, which are, in general, more difficult to determine than the structures of individual proteins. The fact that most individual protein structures from the genome will be models makes the computational docking approaches indispensable for the direct 3D analysis of probable protein/protein interactions in genomes. The number of potential protein/protein interactions and the nature of protein structures to be docked impose strong
Predicting 3D Structures of Protein-Protein Complexes
requirements on the docking techniques. Because of the large number of proteins to dock, docking has to be fast. At the same time, since the majority of individual protein structures in a genome are models, the docking has to be capable of predicting complexes of modeled proteins. The major difference between an experimental (X-ray) protein 3D structure and a model, in general, is the substantially lower accuracy of the latter [109]. The accuracy of the protein models may vary significantly, based on the availability of an appropriate structural template and the degree of targettemplate similarity, from ~1 Å RMSD (high-sequence similarity to templates) to >6 Å RMSD (low-sequence similarity to templates, or no templates). Thus, the docking procedure has to be capable of tolerating very significant structural inaccuracies. Obviously, docking cannot yield greater precision than the precision of the participating protein structures. Even the low precision of ~10 Å relative displacement of the two proteins, however, results in meaningful predictions of the binding interfaces and the gross structural features of the complex. Vakser et al. reported [63] the application of ab initio docking at low resolution to X-ray protein structures from a non-redundant database of co-crystallized protein/protein complexes. The results of the study were further analyzed in a subsequent report [22] using various statistical models. The same techniques have been applied to docking of protein models of different accuracies [110]. To simulate the precision of protein models, all proteins in the protein/protein database were structurally modified in the range of 1 – 10 Å RMSD, with 1 Å intervals. A sophisticated procedure was specifically designed and implemented for that purpose. All resulting models of the proteins were docked. The statistical significance of the docking was analyzed, and the results were correlated with the precision of the models. The data showed that even highly imprecise protein models (>6 Å RMSD) still yield structurally meaningful docking results, that are accurate enough to predict binding interfaces and to serve as starting points for further structural analysis. An example of docking protein models of low accuracy is
Current Pharmaceutical Biotechnology, 2008, Vol. 9, No. 2
63
shown in Fig (3). The study demonstrated the applicability of existing ab initio docking techniques to genome-wide modeling of protein/protein interactions. Although currently, primarily due to the lack of the structural templates, the docking methodology is largely ab initio (not knowledge-based), the knowledge-based techniques promise greater accuracy and precision. Even with the current very limited amount of available protein-protein structural templates, the template-based approaches to genomewide structural modeling of protein complexes are quite successful [1, 73, 111]. DOCKING IN TERTIARY-STRUCTURE PREDICTION The interaction of secondary structure elements in protein structures may be formulated in terms of docking, even though docking is traditionally considered to be a problem of matching two separate molecules. The main difference in matching secondary structure elements and matching separate molecules is in the constraints imposed by the environment. A number of studies explored the applicability of docking to secondary structure packing [30, 112-115]. A multiplicity of physicochemical factors obviously plays a role in the packing of secondary structure elements in proteins and in the formation of protein complexes. However, the well-known tight packing of structural elements suggests the importance of the geometric fit. Steric complementarity in protein interactions has been studied extensively [116]. Obviously, the role of steric complementarity in the interaction of secondary structure elements deserves similar attention. Earlier studies of this subject were primarily focused on helix-helix packing (e.g., refs. [117-122]). One reason was the limited number of high-quality crystal structures, mostly containing helices. A traditional biochemical view on interactions of secondary structure elements largely neglected geometric complementarity as an important factor (with the exception of helix-helix interactions). A docking algorithm
Fig. (3). Results of the low-resolution docking of trypsin and BPTI. The experimental structures are on the left and the low-resolution models (RMSD = 6 Å, both trypsin and BPTI) are on the right. The dark gray spheres are the BPTI center of mass in 100 lowest energy positions. The light gray sphere (indicated by an arrow) is the BPTI center of mass in the co-crystallized complex. For comparison, the experimental structure of trypsin (dark gray backbone) is overlapped with the model. The docking of the models clearly preserves the cluster of correct predictions in the area of the binding site.
Vakser and Kundrotas
64 Current Pharmaceutical Biotechnology, 2008, Vol. 9, No. 2
based on geometric complementarity was applied to a comprehensive database of secondary structure elements derived from the PDB [30]. The results show that the steric fit plays an important role in the interaction of all secondary structure elements. Docking procedures have started to be utilized in protein-structure prediction [113, 115, 123]. In such cases, the secondary structure elements are docked by rigid-body procedures followed by structural refinement. Docking approaches are popular in modeling the structure of transmembrane (TM) helix bundles in G-protein coupled receptors (GPCR) and other integral membrane proteins. The few existing crystal structures of integral membrane proteins provide useful information on the TM bundle configurations. The TM helices are roughly parallel to each other; they are of similar length (determined by the thickness of the membrane), and are well packed. Thus, it is reasonable to assume that the structure of the bundle is determined primarily by the helix-helix interactions, rather than by the interhelical loops (which, of course, still determine the general topology of the bundle). Most helix-helix interfaces in TM bundles are predominantly binary – if two interfaces overlap, one of them is usually dominant. In that regard TM bundles are ideal objects for docking predictions. At the same time helices are simple enough to provide validation ground for new docking concepts (e.g., see [98]). It has been noted that the side chains at the helix-helix interfaces, on average, are shorter than those at the non-interface helix areas [124, 125]. This structural characteristic creates a low-resolution recognition factor that allows one to model the TM bundle at low resolution [114]. However, a high-resolution model of the TM bundle requires explicit conformational search of the helix internal coordinates (primarily side chains). Thus, from the practical point, the high-resolution modeling of TM bundles is currently useful only if accompanied by a set of experimentally derived structural constraints. COMMUNITY-WIDE EVALUATIONS OF DOCKING TECHNIQUES Following dramatic progress in genomics, accompanied by advances in structural and computational biology, the importance of modeling protein/protein interactions has grown significantly. Accordingly, the visibility of protein/protein docking field increased and the protein-docking community began to organize and actively develop community-wide activities. At the First Conference on Modeling of Protein Interactions (MPI) at Charleston, SC, 2001 [126], a number of such activities were discussed and decided upon, including CAPRI and other benchmarking community-wide experiments. These activities were further developed at the subsequent MPI (http://www.bioinformatics.ku.edu/conferences) and CAPRI (http://capri.ebi.ac.uk) conferences and other meetings. Protein/Protein Decoy Sets Protein-structure decoy sets are extremely popular in protein-modeling community. Decoys are protein structures artificially put in a “wrong” (non-native) conformation, which otherwise look like native structures, at least in terms of structural packing. They are used by the developers of modeling techniques to test potentials and scoring functions
in their ability to discriminate false-positive predictions. Several groups put together decoy sets of protein/protein complexes. These sets contain false-positive matches of proteins and are useful in development of better potentials and scoring functions for the discrimination of false-positive predictions. The first set of protein/protein docking decoys was compiled by Vakser’s group in late 90’s. The current version is based in the DOCKGROUND system (http://dockground.bioinformatics.ku.edu/UNBOUND/decoy / decoy.php). Currently, other decoy sets are available from Sternberg’s group (http://www.sbg.bio.ic.ac.uk/docking/all_decoys.html), Baker’s group (http:// depts.washington.edu/bakerpg), Weng’s group (http://zlab.bu.edu/~rong/dock/ software.shtml) and others. Benchmarking Non-redundant benchmark sets of protein-protein complexes are developed in Weng’s group [127] and Vakser’s group [128]. Both contain ~100 complexes of co-crystallized proteins and their components crystallized separately (unbound structures). The idea behind the benchmark is to provide pairs of unbound structures with the correct match known (co-crystallized complex) for the development of algorithms for the prediction of complexes of unbound proteins. The sets are used in the docking community for the development of docking methodology. CASP/CAPRI An important activity in the field of protein-protein docking is the community-wide experiments on Critical Assessment of Structure Prediction (CASP; http://predictioncenter.llnl.gov) and Critical Assessment of Predicted Interactions (CAPRI; http://capri.ebi.ac.uk). These experiments allow a comparison of different computational methods on a set of prediction targets (experimentally determined structures unknown to the predictors). The protein/protein docking category was introduced at CASP2 [69, 129] and has been successfully continued in CAPRI [71, 130, 131]. The CAPRI paradigm solicits yet unpublished structures of cocrystallized protein-protein complexes from experimentalists (primarily, X-ray crystallographers) and distributes the separately crystallized structures of the components to the community of predictors. The CAPRI experiment is conducted on a continued basis, upon the availability of new prediction targets. Currently, almost seven years from its inception, twelve rounds of CAPRI have been completed. CAPRI generated great interest in the protein-docking and proteinmodeling communities and has already led to visible progress in docking methodologies. FUTURE OF DOCKING Greater docking accuracy, suitable for computer-aided drug design, obviously cannot be achieved without adequate accounting for structural flexibility upon complex formation. Thus flexible docking methodologies need to be significantly improved. Experimental and computational techniques yield genome-wide maps of protein-protein interactions with increasingly greater precision, paving the way for future genomewide structural modeling of protein-protein interactions.
Predicting 3D Structures of Protein-Protein Complexes
Such modeling will reveal deep insights into the complexity of life at the molecular level. It will also lead to structural characterization of drug targets and facilitate structure-based drug design. Expert human intervention often plays a crucial role in the quality of the predictions. This limits the usefulness of docking software for the broad community of biomedical scientist, who do not necessarily have extensive expertise in protein recognition. Thus increased utility of automated docking web servers is of great importance. Progress in the development of docking servers will be facilitated by advances in docking methodology and computer hardware. Reliable prediction of the structure of a protein complex without human intervention is an important goal in the rapidly developing protein-docking field.
Current Pharmaceutical Biotechnology, 2008, Vol. 9, No. 2 [28]
[29] [30] [31] [32] [33] [34] [35] [36] [37] [38] [39]
ACKNOWLEDGMENTS [40]
The authors are grateful to many colleagues who helped them understand principles of protein recognition and docking methodologies. The authors wish to acknowledge NIH support through R01 GM074255 grant.
[41] [42] [43]
REFERENCES [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27]
Russell, R. B.; Alber, F.; Aloy, P.; Davis, F. P.; Korkin, D.; Pichaud, M.; Topf, M. and Sali, A. (2004) Curr. Opin. Struct. Biol., 14, 313-324. Szilagyi, A.; Grimm, V.; Arakaki, A. K. and Skolnick, J. (2005) Phys. Biol., 2, S1-S16. Platzer, K. E. B.; Momany, F. A. and Scheraga, H. A. (1972) Int. J. Pept. Prot. Res., 4, 187-200. Wodak, S. J. and Janin, J. (1978) J. Mol. Biol., 124, 323-342. Greer, J. and Bush, B. L. (1978) Proc. Natl. Acad. Sci. USA, 75, 303-307. Tsai, C.-J.; Lin, S. L.; Wolfson, H. J. and Nussinov, R. (1996) Crit. Rev. Biochem. Mol. Biol., 31, 127-152. Vajda, S.; Sippl, M. and Novotny, J. (1997) Curr. Opin. Struct. Biol., 7, 222-228. Keskin, O.; Bahar, I.; Badretdinov, A. Y.; Ptitsyn, O. B. and Jernigan, R. L. (1998) Protein Sci., 7, 2578-2586. Glaser, F.; Steinberg, D.; Vakser, I. A. and Ben-Tal, N. (2001) Proteins, 43, 89-102. Zhou, H. and Zhou, Y. (2002) Protein Sci., 11, 2714--2726. Liang, S.; Liu, S.; Zhang, C. and Zhou, Y. (2007) Proteins, 69, 244–253. Richards, F. M. (1977) Ann. Rev. Biophys. Bioeng., 6, 151-176. Dill, K. A. (1990) Biochemistry, 29, 7133-7155. Korn, A. P. and Burnett, R. M. (1991) Proteins, 9, 37-55. Vakser, I. A. and Aflalo, C. (1994) Proteins, 20, 320-329. Young, L.; Jernigan, R. L. and Covell, D. G. (1994) Protein Sci., 3, 717-729. Keskin, O.; Tsai, C. J.; Wolfson, H. and Nussinov, R. (2004) Protein Sci., 13, 1043-1055. Bryngelson, J. D. and Onuchic, J. N.; Socci, N. D.; Wolynes, P. G. (1995) Proteins, 21, 167-195. Dill, K. A. (1999) Protein Sci., 8, 1166-1180. Tsai, C.-J.; Kumar, S.; Ma, B. and Nussinov, R. (1999) Protein Sci., 8, 1181-1190. Shoemaker, B. A.; Portman, J. J. and Wolynes, P. G. (2000) Proc. Natl. Acad. Sci., USA, 97, 8868-8873. Tovchigrechko, A.and Vakser, I. A. (2001) Protein Sci., 10, 15721583. Baker, D. and Lim, W. A. (2002) Curr. Opin. Struct. Biol., 12, 1113. Ruvinsky, A. M. and Vakser, I. A. (2007) Proteins, Early View online. O'Toole, N. and Vakser, I. A. (2007) Proteins, Early View on-line. Ruvinsky, A. M. and Vakser, I. A., (2007) submitted. Hunjan, J.; Tovchigrechko, A.; Gao, Y. and Vakser, I. A. (2007) Proteins, accepted.
[44] [45] [46] [47] [48]
[49] [50] [51] [52] [53] [54] [55] [56] [57] [58] [59] [60] [61]
[62] [63] [64] [65] [66] [67] [68] [69] [70]
65
Ponder, J. W. and Richards, F. M. (1987) In Cold Spring Harbor Symposia on Quantitative Biology; Cold Spring Harbor Laboratory, Vol. LII, pp. 421-428. Hubbard, S. J. and Argos, P. (1994) Protein Sci., 3, 2194-2206. Jiang, S.; Tovchigrechko, A. and Vakser, I. A. (2003) Protein Sci., 12, 1646-1651. Douguet, D.; Chen, H. C.; Tovchigrechko, A. and Vakser, I. A. (2006) Bioinformatics, 22, 2612–2618. Lawrence, M. C. and Colman, P. M. (1993) J. Mol. Biol., 234, 946950. Janin, J. (1995) Biochimie, 77, 497-505. Mariuzza, R. A. and Poljak, R. J. (1993) Curr. Opin. Immunol., 5, 50-55. Lo Conte, L.; Chothia, C. and Janin, J. (1999) J. Mol. Biol., 285, 2177-2198. Tsai, C.-J. and Nussinov, R. (1997) Protein Sci., 6, 1426-1437. Jones, S. and Thornton, J. M. (1996) Proc. Natl. Acad. Sci. USA, 93, 13-20. Lijnzaad, P. and Argos, P. (1997) Proteins, 28, 333-343. McCoy, A. J.; Epa, V. C. and Colman, P. M. (1997) J. Mol. Biol., 268, 570-584. Tsai, C.-J.; Lin, S. L.; Wolfson, H. and Nussinov, R. (1997) Protein Sci., 6, 53-64. Larsen, T. A.; Olson, A. J. and Goodsell, D. S. (1998) Structure, 6, 421-427. Ho, C. M. W. and Marshall, G. R. (1990) J. Comput. Aided Mol. Des., 4, 337-354. Peters, K. P.; Fauck, J. and Frommel, C. (1996) J. Mol. Biol., 256, 201-213. Binkowski, T. A.; Naghibzadeh, S. and Liang, J. (2003) Nucleic Acids Res., 31, 3352-3355. Nicola, G. and Vakser, I. A. (2007) Bioinformatics, 23, 789-792. Zhang, B.; Rychlewski, L.; Pawlowski, K.; Fetrow, J. S.; Skolnick, J. and Godzik, A. (1999) Protein Sci., 8, 1104-1115. Elcock, A. H. and McCammon, J. A. (2001) Proc. Natl. Acad. Sci. USA, 98, 2990-2994. Cammer, S. A.; Hoffman, B. T.; Speir, J. A.; Canady, M. A.; Nelson, M. R.; Knutson, S.; Gallina, M.; Baxter, S. M.; and Fetrow, J. S. (2003) J. Mol. Biol., 334, 387-401. Armon, A.; Graur, D. and Ben-Tal, N. (2001) J. Mol. Biol., 307, 447-463. Yao, H.; Kristensen, D. M.; Mihalek, I.; Sowa, M. E.; Shaw, C.; Kimmel, M.; Kavraki, L. and Lichtarge, O. (2003) J. Mol. Biol., 326, 255-261. Elcock, A. H. (2001) J. Mol. Biol., 312, 885-896. Rajamani, D.; Thiel, S.; Vajda, S. and Camacho, C. J. (2004) Proc. Natl. Acad. Sci. USA, 101, 11287-11292. Halperin, I.; Wolfson, H. J. and Nussinov, R. (2004) Structure, 12, 1027-1038. Vakser, I. A. (2004) Structure, 12, 910-912. Tovchigrechko, A. and Vakser, I. A. (2005) Proteins, 60, 296-301. Keskin, O.; Ma, B. and Nussinov, R. (2005) J. Mol. Biol., 345, 1281-1294 Novotny, J.; Handschumacher, M.; Haber, E.; Bruccoleri, R. E.; Carlson, W. B.; Fanning, D. W.; Smith, J. A. and Rose, G. D. (1986) Proc. Natl. Acad. Sci. USA, 83, 226-230. Laskowski, R. A.; Luscombe, N. M.; Swindells, M. B. and Thornton, J. M. (1996) Protein Sci., 5, 2438-2452. Duncan, B. S. and Olson, A. J. (1993) Biopolymers, 33, 219-229. Vakser, I. A.( 1995) Protein Eng., 8, 371-377. Vakser, I. A. and Nikiforovich, G. V. (1995) In Methods in Protein Structure Analysis, M.Z. Atassi; E. Appella, eds.; Plenum Press: New York, pp. 505-514. Vakser, I. A. (1996)Biopolymers, 39, 455-464. Vakser, I. A.; Matar, O. G. and Lam, C. F. (1999) Proc. Natl. Acad. Sci. USA, 96, 8477-8482. Vakser, I. A. (1996) Protein Eng., 9, 741-744. Berg, O. G. and von Hippel, P. H. (1985) Ann. Rev. Biophys. Biophys. Chem., 14, 131-160. McCammon, J. A. (1998) Curr. Opin. Struct. Biol., 8, 245-249. Camacho, C. J.; Gatchell, D. W.; Kimura, S. R. and Vajda, S. (2000) Proteins, 40, 525-537. Vakser, I. A. (1996) Protein Eng., 9, 37-41. Vakser, I. A. (1997) Proteins Suppl., 1, 226-230. Venclovas, C.; Zemla, A.; Fidelis, K. and Moult, J. (2003) Proteins, 53, 585-595.
Vakser and Kundrotas
66 Current Pharmaceutical Biotechnology, 2008, Vol. 9, No. 2 [71] [72] [73] [74] [75] [76] [77] [78] [79] [80] [81] [82] [83] [84] [85]
[86] [87] [88] [89] [90] [91] [92] [93] [94] [95] [96] [97] [98] [99] [100] [101]
Lensink, M. F.; Mendez, R. and Wodak, S. J. (2007) Proteins, 69, 704–718. Gunther, S.; May, P.; Hoppe, A.; Frommel, C. and Preissner, R. (2008) Proteins, Early View on-line. Lu, L.; Lu, H. and Skolnick, J. (2002) Proteins, 49, 350-364. Gray, J. J.; Moughon, S.; Wang, C.; Schueler-Furman, O.; Kuhlman, B.; Rohl, C. A. and Baker, D. (2003) J. Mol. Biol., 331, 281-299. Kozakov, D.; Brenke, R.; Comeau, S. R. and Vajda, S. (2006) Proteins, 65, 392-406. Mintseris, J.; Pierce, B.; Wiehe, K.; Anderson, R.; Chen, R. and Weng, Z. (2007) Proteins, 69, 511–520. Moont, G.; Gabb, H. A. and Sternberg, M. J. E. (1999) Proteins, 35, 364-373. Lu, H.; Lu, L. and Skolnick, J. (2003) Biophys. J., 84, 1895-1901. Katchalski-Katzir, E.; Shariv, I.; Eisenstein, M.; Friesem, A. A.; Aflalo, C. and Vakser, I. A. (1992) Proc. Natl. Acad. Sci. USA, 89, 2195-2199. Jiang, F. and Kim, S.-H. (1991) J. Mol. Biol., 219, 79-102. Palma, P. N.; Krippahl, L.; Wampler, J. E. and Moura, J. J. G. (2000) Proteins, 39, 372 - 384. Jiang, F.; Lin, W. and Rao, Z. (2002) Protein Eng., 15, 257-263. Nussinov, R. and Wolfson, H. J. (1991) Proc. Natl. Acad. Sci. USA, 88, 10495-10499. Fischer, D.; Lin, S. L.; Wolfson, H. J. and Nussinov, R. (1995) J. Mol. Biol., 248, 459-477. Schneidman-Duhovny, D.; Inbar, Y.; Polak, V.; Shatsky, M.; Halperin, I.; Benyamini, H.; Barzilai, A.; Dror, O.; Haspel, N.; Nussinov, R. and Wolfson, H. J. (2003) Proteins, 52, 107-112. Gardiner, E. J.; Willett, P. and Artymiuk, P. J. (2001) Proteins, 44, 44-56. Totrov, M. and Abagyan, R. (1994) Nat. Struct. Biol., 1, 259-263. Fernandez-Recio, J.; Totrov, M. and Abagyan, R. (2002) Protein Sci., 11, 280-291. Trosset, J. Y. and Scheraga, H. A. (1999) J. Comput. Chem., 20, 412-427. Fitzjohn, P. W. and Bates, P. A. (2003) Proteins, 52, 28-32. Miller, D. W. and Dill, K. A. (1997) Protein Sci., 6, 2166-2179. Bouzida, D.; Rejto, P. A. and Verkhivker, G. M. (1999) Int. J. Quantum. Chem., 73, 113-121. Camacho, C. J.; Weng, Z.; Vajda, S. and DeLisi, C. (1999) Biophys. J., 76, 1166-1178. Zhang, C.; Chen, J. and DeLisi, C. (1999) Proteins, 34, 255-267. Piela, L.; Kostrowicki, J. and Scheraga, H. A. (1989) J. Phys. Chem., 93, 3339-3346. Ma, J. and Straub, J. E. (1994) J. Chem. Phys., 101, 533-541. Trosset, J.-Y. and Scheraga, H. A. (1998) Proc. Natl. Acad. Sci. USA, 95, 8011-8015. Pappu, R. V.; Marshall, G. R. and Ponder, J. W. (1999) Nat. Struct. Biol., 6, 50-55. Hart, R. K.; Pappu, R. V. and Ponder, J. W. (2000) J. Comput. Chem., 21, 531-552. Abagyan, R. A. and Batalov, S. (1997) J. Mol. Biol., 273, 355-368. Kundrotas, P. J. and Alexov, E. (2006) Bioch. Biophys. Acta, 1764, 1498-1511.
Received: January 24, 2008
Accepted: January 30, 2008
[102] [103] [104]
[105] [106] [107] [108] [109] [110] [111] [112] [113] [114] [115] [116] [117] [118] [119] [120] [121] [122] [123] [124] [125] [126] [127] [128] [129] [130] [131]
Grimm, V.; Zhang, Y. and Skolnick, J. (2006) Proteins, 63, 457465. Cockell, S. J.; Oliva, B. and Jackson, R. M. (2007) Bioinformatics, 23, 573–581. Espadaler, J.; Aragues, R.; Eswar, N.; Marti-Renom, M. A.; Querol, E.; Aviles, F. X.; Sali, A. and Oliva, B. (2005) Proc. Natl. Acad. Sci. USA, 102, 7151-7156. Korkin, D.; Davis, F. P.; Alber, F.; Luong, T.; Shen, M.; Lucic, V.; Kennedy, M. B. and Sali, A. (2006) PLoS Comp. Biol., 2, 13651376. Davis, F. P.; Barkan, D. T.; Eswar, N.; McKerrow, J. H. and Sali, A. (2007) Protein Sci., 16, 2585-2596. Burley, S. K. (2000) Nat. Struct. Biol., 7, 932-934. Sali, A.; Glaeser, R.; Earnest, T. and Baumeister, W. (2003) Nature, 422, 216 - 225. Moult, J.; Fidelis, K.; Zemla, A. and Hubbard, T. (2003) Proteins, 53, 334-339. Tovchigrechko, A.; Wells, C. A. and Vakser, I. A. (2002) Protein Sci., 11, 1888-1896. Aloy, P. and Russell, R. B. (2004) Nat. Biotech., 22, 1317 - 1321. Ausiello, G.; Cesareni, G. and Helmer-Citterich, M. (1997) Proteins, 28, 556-567. Yue, K. and Dill, K. A. (2000) Protein Sci., 9, 1935-1946. Vakser, I. A.; Jiang, S. (2002) Methods Enzym., 343, 313-328. Inbar, Y.; Benyamini, H.; Nussinov, R. and Wolfson, H. J. (2003) Bioinformatics, , 19, i158-i168. Halperin, I.; Ma, B.; Wolfson, H.; Nussinov, R. (2002) Proteins, 47, 409-443. Richmond, T. J. and Richards, F. M. (1978) J. Mol. Biol., 119, 537555. Cohen, F. E.; Richmond, T. J. and Richards, F. M. (1979) J. Mol. Biol., 132, 275-288. Chothia, C.; Levitt, M. and Richardson, D. (1981) J. Mol. Biol., 145, 215-250. Murzin, A. G. and Finkelstein, A. V. (1988) J. Mol. Biol., 204, 749-769. Reddy, B. V. B. and Blundell, T. L. (1993) J. Mol. Biol., 233, 464479. Walther, D.; Eisenhaber, F. and Argos, P. (1996) J. Mol. Biol., 255, 536-553. Haspel, N.; Tsai, C. J.; Wolfson, H. and Nussinov, R. (2002) Protein Sci., 12, 1177-1187. Jiang, S. and Vakser, I. A. (2000) Proteins, 40, 429-435. Jiang, S. and Vakser, I. A. (2004) Protein Sci., 13, 1426-1429. Vajda, S.; Vakser, I. A.; Sternberg, M. J. E. and Janin, J. (2002) Proteins, 47, 444-446. Mintseris, J.; Wiehe, K.; Pierce, B.; Anderson, R.; Chen, R.; Janin, J. and Weng, Z. (2005) Proteins, 60, 214-216. Gao, Y.; Douguet, D.; Tovchigrechko, A. and Vakser, I. A. (2007) ,Proteins, 69, 845-851. Dixon, J. S. (1997) Proteins, Suppl. 1, 198-204. Wodak, S. J. (2007) Proteins, 69, 697-698. Janin, J. (2007) Proteins, 69, 699–703.