
Genome Informatics 17(2): 270-278 (2006)

Robotic Path Planning and Protein Complex Modeling Considering Low Frequency Intra-Molecular Loop and Domain Motions

Carlos A. Del Carpio, Hideyuki Tsuboi, Akira Endou, Pei Qiang, Eiichiro Ichiishi, Michihisa Koyama, Hiromitsu Takaba, Akira Miyamoto, Nozomu Hatakeyama, Momoji Kubo

1 Department of Applied Chemistry, Graduate School of Engineering, Tohoku University, 6-6-11-1302 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan
2 Biomedical Engineering Research Organization, Tohoku University, Aoba-ku, Sendai 980-8579, Japan
3 PRESTO, Japan Science and Technology Agency, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan
4 New Industry Creation Hatchery Center (NICHe), Tohoku University, 6-6-10 Aoba, Aramaki, Aoba-ku, Sendai 980-8579, Japan

Abstract

A novel algorithm is introduced to deal with the intra-molecular motions of loops and domains that proteins undergo upon interaction with other proteins. The methodology is based on sampling of the complex energy landscape and on robotic motion planning. Mapping regions of high flexibility on the protein underlies the proposed algorithm. This is the first time this type of research has been reported. Application of the methodology to several protein complexes in which remarkable backbone rearrangement is observed shows that the new algorithm is able to deal with the change of backbone conformation at protein interaction. We have implemented the module within the system MIAX (Macromolecular interaction assessment computer system). Together with our previously reported soft and flexible docking algorithms, it constitutes a powerful tool for protein function analysis as part of genome-wide function evaluation.

Keywords: protein-protein interaction, robotic path planning, molecular flexibility, protein complex

1 Introduction

Genome-wide functional analysis has taken center stage in the post-genome era and is experiencing exponential growth, fueled by advances in information technologies and hardware sophistication. Central to these efforts has been the outstanding work directed at solving the problem of protein-protein interaction and complex formation, which is ubiquitous across the entire spectrum of intracellular biochemical processes in living organisms. Difficulties intrinsic to the problem have, however, prevented major breakthroughs in the field, and unrelenting effort is being devoted to overcoming them, as epitomized by recent international contests such as CAPRI [7], which assess the problem-solving ability of the automatic methodologies developed for this purpose.

A first attempt to deal with the problem was the development of algorithms to dock proteins whose structures are treated as rigid bodies, neglecting any intra-molecular rearrangement undergone by the interacting molecules. These efforts have resulted in several algorithms for rigid protein docking [4, 8, 9, 13, 16], one widely referenced example being the grid scoring algorithm proposed by Katchalski-Katzir et al. [10]. Since the main difficulty of the problem stems precisely from the fact that proteins are intrinsically flexible molecules, several improvements to the seminal technique have been implemented in order to rank close-to-native decoys higher and to decrease the number of false positives output by the original algorithm. One of these attempts is our recently developed system MIAX, which endows the receptor and ligand molecules with a degree of flexibility embedded within the grid scoring algorithm. This soft docking algorithm, as we call it, performs considerably better than rigid-body docking alone, especially when independently crystallized structures known to interact are docked (the unbound docking problem) [5].

Notwithstanding this limited flexibility and the better performance of the algorithm, particularly in recognizing the interaction sites on both molecules, the central problem stemming from collective motions within each protein molecule has to be addressed from a completely different angle. Conventional wisdom points to molecular dynamics simulations as the most suitable technique at hand for determining such vibrations and partial motions. Nevertheless, the intrinsically long simulation times and the computational infrastructure they require prevent the production of enough data from which any plausible and meaningful motion of this nature may be singled out.
This makes molecular dynamics, at best, a last resort, especially when a thorough study of a large number of molecular systems is targeted.

In the present article we present a novel algorithm to deal with the conformational change that proteins undergo upon association, especially the changes generated by collective motions of atoms in the large loops and/or flexible domains of the interacting proteins. The algorithm takes advantage of methodologies for sampling protein energy landscapes and of motion planning, adopting well-known paradigms from the field of robotics, where robot path planning is a well-solved problem. Applying such techniques to the analysis of complex formation enables building the most plausible roadmap leading to the final complex configuration, involving conformational changes that are difficult to predict by other methodologies.

Underlying the proposed algorithm for molecular motion planning at interaction is the analysis of regions or domains (including loops) of high flexibility in the interacting units. The flexibility of a protein molecule is computed in terms of the number of degrees of freedom it possesses, which allows regions of high and low flexibility to be mapped on the isolated proteins as well as on the complex. This algorithm, based on graph-theoretical concepts, has been reported previously [6]. Hitherto, protein docking work considering the flexibility of a protein molecule has been limited to the extrapolation of the conformations of the side chains of surface amino acids thought to be directly involved in the interaction. Soft docking is the concept proposed to deal with this type of flexibility in proteins, by ourselves and by other authors [5, 13].
The algorithm proposed here goes beyond this type of flexibility and assesses the changes in the protein backbone (loop and domain motion) that lead the molecule to interact with high specificity with another protein, another macromolecule, or even a small organic compound. We have applied our methodology to the study of the conformational changes undergone by the subunits of six protein complexes in which the backbone change is remarkable. The results obtained with the proposed technique are quite encouraging, and a broad spectrum of applications can be envisaged both for the algorithm itself and for the results derived from it.

2 Methodology

Modeling backbone conformational change at protein interaction amounts to assuming a motion trajectory from one equilibrium state to another, downstream in the configuration energy landscape of the complex and its components. Motion modeling for proteins has mainly been performed using normal mode dynamics (NM) and molecular dynamics (MD). The former renders a set of independent harmonic oscillations about the equilibrium atomic positions, useful for building informative visual models in terms of frequency and amplitude of movements. The latter, which involves solving Newton's equations of motion, represents a rather realistic approximation to molecular motion, even including solvent effects explicitly, but is limited by its computational cost, which restricts the attainable time scales to the order of nanoseconds. Furthermore, the resulting trajectories are complex and have no trivial interpretation, making the study of long-range collective motions extremely difficult [3].

Figure 1: Building a roadmap to complex formation by connecting points in the landscape of complex configurations. E = energy; φ, ψ = backbone torsional angles (10 1 ).

Here we propose a hybrid methodology to approach the problem. The central aspect of this hybrid methodology is the incorporation of motion planning to map the configuration landscape of the protein complex. The probabilistic roadmap method (PRM) is used to generate a large set of different configuration pathways that provide information about the energy landscape of the protein complex. This methodology has been applied by Thomas et al. [14] to the problem of protein folding. Here we apply it for the first time to generate the configuration changes in proteins along the interaction path that leads to a close-to-native complex structure. Although the protein folding problem and the protein-protein interaction problem are of the same nature, the approaches to handling them differ because of their implications: interaction also implies the expression of function, beyond structural characteristics alone.

The hybrid character of the methodology stems from the fact that, before the configuration landscape of the complex is generated, the regions on the interacting subunits prone to the largest flexibility change are mapped in advance. The most relevant effect of this flexibility mapping is the reduction of the configuration space to be searched by the PRM. The authors have previously reported a methodology to account for the reduction of protein flexibility, or increase in rigidity, at complex formation [6]. That methodology, based on graph-theoretical concepts, is able to map these regions on protein complexes of known 3D structure. The objective of the present hybrid methodology is the evaluation of the decoys generated by the soft docking module embedded in MIAX, our general system for protein docking [5], taking as a further evaluation criterion the stability of the decoy, accounting for the intra-molecular motion of the loops and domains involved in the interaction.

2.1 Probabilistic Roadmap Method for Protein Complex Formation

The process underlying the PRM introduced here is the sampling of points in the conformational space of the flexible domains previously mapped on the interface of the interacting units. The roadmap is built by connecting these points so as to form a graph, the roadmap to the final complex configuration (Figure 1). Since the edges between the nodes of the graph (configurations) can be weighted according to the energetic feasibility of the transitions they reflect, optimal interaction pathways can be extracted using standard graph search techniques. The most expensive step of this approach in terms of computing time is the generation of conformations for domains involving loops or closed chains. Several algorithms exist in the literature for the loop closure problem in robotic motion planning; however, their intrinsic complexity makes them too computationally costly for the problem dealt with here. In addition to the general algorithm for inferring the roadmap, we have therefore developed an algorithm for fast loop generation, which is described in the following section.
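The roadmap construction and pathway extraction described above can be illustrated with a minimal Python sketch. It is not the MIAX implementation: the k-nearest-neighbour connection rule, the edge cost (a small constant per step plus any uphill energy change), the toy energy function, and the names `build_roadmap` and `cheapest_path` are all assumptions made for illustration. Dijkstra's algorithm stands in for the "standard graph search techniques" mentioned in the text.

```python
import heapq
import math

def build_roadmap(nodes, energy, k=3):
    """Link each node to its k nearest neighbours with energy-based edge costs."""
    graph = {i: [] for i in range(len(nodes))}
    for i, q_i in enumerate(nodes):
        others = [j for j in range(len(nodes)) if j != i]
        nearest = sorted(others, key=lambda j: math.dist(q_i, nodes[j]))[:k]
        for j in nearest:
            d_e = energy(nodes[j]) - energy(q_i)
            # Assumed cost: a small constant per step plus any uphill energy change.
            graph[i].append((j, 1e-3 + max(0.0, d_e)))
            graph[j].append((i, 1e-3 + max(0.0, -d_e)))
    return graph

def cheapest_path(graph, start, goal):
    """Dijkstra search; returns the node indices from start to goal, or None."""
    best = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == goal:
            break
        if cost > best[u]:
            continue  # stale heap entry
        for v, w in graph[u]:
            new_cost = cost + w
            if new_cost < best.get(v, math.inf):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    if goal not in best:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

# Toy 4x4 grid of torsion-like coordinates with a single energy minimum at (3, 3).
nodes = [(float(x), float(y)) for x in range(4) for y in range(4)]
toy_energy = lambda q: (q[0] - 3.0) ** 2 + (q[1] - 3.0) ** 2
roadmap = build_roadmap(nodes, toy_energy)
path = cheapest_path(roadmap, start=0, goal=15)
print([nodes[i] for i in path])
```

In a real application the nodes would be sampled conformations of the flexible domains and the energy would come from a force field; the asymmetric edge costs simply make energetically uphill transitions more expensive than downhill ones.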

Figure 2: The inverse kinematics problem and the loop closure constraints in the generation of loop conformations.

2.2 Inverse Kinematics and the Problem of Protein Loop Closure

Given the positions of the end points of a chain (base and end effector), characterizing the geometry of an open kinematic chain composed of rigid links is typically referred to as the inverse kinematics problem, which arises very frequently in the field of robotics (Figure 2). Many algorithms have been proposed, on a problem-specific basis, to solve inverse kinematics problems on closed-chain systems [11]. In chemistry the problem is solved implicitly whenever the structures of cyclic molecules are modeled. To sample the conformations of biological macromolecules, an algorithm must satisfy the loop closure constraints for all the loops in the molecule. This condition leaves only a small number of valid conformations when conformations are generated by randomly perturbing the torsional angles of the links. Very often a two-step solution is adopted: the first step generates a conformation by rotating some of the bonds, and the second step tries to re-establish the constraints violated by the rotations of the first step.

The algorithm we propose here is also a two-stage process, but one that allows the entire conformational space of molecular structures containing loops to be mapped by random perturbation of the torsional angles. Concretely, the algorithm consists in obtaining key sets of torques $(\tau_1, \tau_2, \ldots, \tau_i, \ldots, \tau_{n-1}, \tau_n)$ for the n torsional angles of a particular loop of the molecule, from randomly generated force vectors (Figure 3), such that the torques, when applied to the joints of the loop, do not disrupt the loop closure constraints. Our algorithm, although sharing some similarities with the algorithms of Lee et al. [12] and Thorpe et al. [15], is substantially different, since theirs belong to the class of algorithms in which a second step re-attains the loop closure constraints. Ours, on the other hand, can be characterized as a learning algorithm in its first stage, followed in the second stage by the application of the force vectors to map the conformational space of the loops. Although the first stage, the learning algorithm that generates adequate sets of torques, is the rate-controlling step in the sampling of loop conformations, the algorithm performs well overall, since the second stage is straightforward. There is therefore a trade-off between the learning process of the first stage of the proposed algorithm and the interpolation of new angles to re-attain the loop closure constraints in other algorithms.
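The idea of collecting perturbation sets that leave the loop closed can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions that are not taken from the paper: the loop is a planar chain of unit-length links, each "torque set" is represented directly by a set of small joint-angle increments, and the learning stage is replaced by plain rejection sampling against a closure tolerance. The names `end_point`, `closure_error` and `sample_increment_sets` are hypothetical.

```python
import math
import random

def end_point(angles):
    """End position of a planar chain of unit links given relative joint angles."""
    x = y = heading = 0.0
    for a in angles:
        heading += a
        x += math.cos(heading)
        y += math.sin(heading)
    return (x, y)

def closure_error(angles, anchor):
    """Distance between the chain end and the fixed end point of the loop."""
    return math.dist(end_point(angles), anchor)

def sample_increment_sets(angles, n_sets, tol=0.05, step=0.2,
                          seed=0, max_tries=100_000):
    """Collect random angle-increment sets that keep the loop closed within tol."""
    rng = random.Random(seed)
    anchor = end_point(angles)  # closure constraint: the end point must stay here
    kept = []
    for _ in range(max_tries):
        if len(kept) == n_sets:
            break
        deltas = [rng.uniform(-step, step) for _ in angles]
        trial = [a + d for a, d in zip(angles, deltas)]
        if closure_error(trial, anchor) <= tol:
            kept.append(deltas)
    return kept, anchor

# Six-joint toy loop; each accepted set perturbs every joint at once.
loop = [0.0, 0.6, 0.6, 0.6, 0.6, 0.6]
sets, anchor = sample_increment_sets(loop, n_sets=5)
for deltas in sets:
    perturbed = [a + d for a, d in zip(loop, deltas)]
    print(round(closure_error(perturbed, anchor), 4))
```

Most random perturbations are rejected, which is consistent with the remark in the text that finding adequate sets is the costly step, while applying a stored set afterwards is cheap.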


Figure 3: Computation of torque sets from forces applied at random to the loop joints. The sets of torques must represent angular variations that do not disrupt the loop closure constraints.

From elementary physics, the torque on a particle of unit mass is

$$\vec{\tau} = \vec{r} \times \vec{F} \tag{1}$$

where $\vec{F}$ is the force vector and $\vec{r}$ is the vector from the axis of rotation to the point on which the force acts. The magnitude of the torque is given by

$$\tau = r F \sin\theta \tag{2}$$

where $r$ and $F$ are the magnitudes of the distance and force vectors and $\theta$ is the angle between them. The computation of the rotation angle due to a given applied force is therefore straightforward. The learning algorithm generates force vectors that, when applied to the joints, must not disrupt the loop closure constraint (the distance between the last joint and the end point of the loop; Figure 3). The application of the algorithm to the generation of conformations for a typical loop in a protein structure is shown in Figure 4.

2.3 Node Generation and Road Map Building for Complex Formation including Intra-molecular Collective Motions

As mentioned earlier, the algorithm proposed in this paper takes as a priori information the rough structure of the complex and the change in flexibility of the domains of the interacting subunits. The algorithm then proceeds to map the interaction landscape by generating nodes corresponding to conformations in the neighborhood of the initial conformation.

Figure 4: Loop generation algorithm. Sets of loops generated according to distinct torque sets.

The computation of optimal torque sets for the loops, if any exist in the subunits, is performed as a preprocessing stage. A complex conformation is accepted and added to the roadmap based on its potential energy, which we compute with an all-atom force field. A conformation q of the complex is accepted with probability P(q) given by:

$$P(q) = \begin{cases} 1 & \text{if } E_q < E_{min} \\ \dfrac{E_{max} - E_q}{E_{max} - E_{min}} & \text{if } E_{min} \le E_q \le E_{max} \\ 0 & \text{if } E_q > E_{max} \end{cases} \tag{3}$$

where $E_q$ is the potential energy of conformation q, and $E_{min}$ and $E_{max}$ are the lower and upper energy thresholds.
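The acceptance rule of Eq. (3) translates directly into code. The following Python sketch is illustrative only: the function names and the threshold values in the example are assumptions, and no particular force field is implied.

```python
import random

def acceptance_probability(e_q, e_min, e_max):
    """Probability of adding a conformation with energy e_q to the roadmap (Eq. 3)."""
    if e_max <= e_min:
        raise ValueError("e_max must be greater than e_min")
    if e_q < e_min:
        return 1.0
    if e_q > e_max:
        return 0.0
    # Linear decrease from 1 at e_min to 0 at e_max.
    return (e_max - e_q) / (e_max - e_min)

def accept(e_q, e_min, e_max, rng=random):
    """Draw a uniform random number and accept with probability P(q)."""
    return rng.random() < acceptance_probability(e_q, e_min, e_max)

# Hypothetical thresholds in arbitrary energy units.
print(acceptance_probability(25.0, 0.0, 100.0))
```

Low-energy conformations are always kept, high-energy ones are always discarded, and those in between are kept with a probability that decreases linearly with energy, so the roadmap is biased toward the low-energy regions of the landscape.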

Applied to the problem of protein complex formation, the algorithm gives insight not only into the final conformation of the complex but also into the dynamics of the protein subunits as the interaction proceeds along the interaction axis. We have applied the algorithm to several complexes in which remarkable changes in protein structure upon interaction have been observed. These are discussed in the following section.

3 Results and Discussion

To validate the proposed algorithm, we focused on complexes in which a perceptible change in backbone conformation, especially at the interaction interface, distorts the initial conformations of the interacting units. As stated before, the initial step of the algorithm is the detection of regions of high flexibility on the proteins. The following step is the random generation of interaction trajectories, the starting configuration being that of the decoy output by our system for protein soft docking. We studied six protein complexes that exhibit a noticeable change in backbone conformation: PDB:1A0O, PDB:1CGI, PDB:1FIN, PDB:1TGS, PDB:1GOT, and PDB:3HHR. The models for these complexes are decoys obtained by docking the unbound proteins found in the PDB using our soft docking system embedded in MIAX [5].

Figure 5: Fractional trajectories to complex formation obtained by the proposed PRM. Left: conformations in the neighborhood of the final conformation. Right: final configuration of the subunit in the complex. (a) 1A0O, (b) 1CGI, (c) 1FIN, (d) 1TGS, (e) 1GOT, and (f) 3HHR.

Figure 5 shows the results as ribbon models for the units in which the change in backbone conformation was observed. The model on the left of each panel shows the superposition of a number of structures constituting the complex formation trajectory deduced from the roadmap built by mapping the conformation landscape, as explained in the Methodology section. The variation among the superposed structures is therefore most evident in the regions of high flexibility. The model on the right of each panel is the conformation most similar to that of the molecule observed in the complex extracted from the PDB. The similarities in terms of RMS for each model are summarized in Table 1. Inspection of this table shows that the RMS improves with respect to that of soft docking alone, and that the difference stems precisely from the change undergone by the interacting units in their backbone conformation. Figure 5 illustrates the generally excellent behavior of the loop generation algorithm, which satisfies the loop closure constraints. The figure also shows a plausible roadmap for the adoption of the final backbone configuration at interaction. For the receptors of complexes PDB:1FIN and PDB:3HHR, the generated loops are evidently still remarkably different from the final conformation. This is due in part to the reduced number of torque sets for the joints found in a single run of our learning algorithm. It can be regarded as a problem of processing time rather than an intrinsic flaw in the methodology we propose.
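For readers unfamiliar with the RMS measure used in this comparison, the following Python sketch shows how a root-mean-square deviation between two conformations is commonly computed. It assumes, as a simplification not stated in the paper, that the two coordinate sets list the same atoms in the same order and have already been superposed; the function name `rmsd` and the example coordinates are hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered sets of 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Two toy "backbones" of two atoms each, displaced rigidly by (3, 4, 0).
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
print(rmsd(model, reference))  # 5.0
```

Restricting the coordinate lists to backbone atoms, carbon atoms, or all atoms would give the three kinds of values reported per complex, assuming the same definition is used throughout.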


4 Conclusion

We propose a methodology to predict the major backbone conformational changes undergone by protein units at complex formation. The methodology is based on the probabilistic roadmap methods broadly used in the field of robotics. We have also developed an algorithm for the generation of loop conformations that respects the loop closure constraints. The results are very encouraging, since the subtle problem of loop conformation change at interaction can be handled adequately by the proposed method. Although our validation set is still small, improvements in processing time will translate into better torque sets for closed-chain conformation prediction and for the generation of conformers in general.

To our knowledge, there is no other algorithm that, like ours, approaches this difficult biological problem, whose implications reach other life science fields, on the basis of flexibility mapping and PRM building. There are, however, other attempts. In the ROCK program [17], for example, random-walk sampling of rotatable bonds is performed to explore correlated motions by sampling dihedral angles. The multi-copy representation of ensembles of probable loop conformations within a reduced protein model, proposed by Bastard et al. [2], constitutes another approach to the problem. Multiscale modeling of macromolecular conformational changes, combining concepts from rigidity and elastic network theory to consider mobile protein regions in addition to flexible ones when modeling correlated motions [1], also points to a way of solving this problem. In all these cases, however, the approaches are substantially different from the algorithm proposed here.

Table 1: Comparison of RMS for soft docking and PRM docking.

Complex   RMS after soft docking   RMS after PRM
          Backbone                 All atoms   Carbon   Backbone
1A0O      7.2                      3.50        3.26     3.24
1CGI      4.9                      4.72        4.24     4.20
1FIN      12.0                     7.49        7.19     7.16
1TGS      11.0                     5.30        5.0      5.0
1GOT      13.6                     2.60        1.83     1.75
3HHR      4.8                      2.38        2.0      2.0

References

[1] Ahmed, A. and Gohlke, H., Multiscale modeling of macromolecular conformational changes combining concepts from rigidity and elastic network theory, Proteins, 63:1038-1051, 2006.
[2] Bastard, K., Prevost, C., and Zacharias, M., Accounting for loop flexibility during protein-protein docking, Proteins, 62:956-969, 2006.
[3] Dauber-Osguthorpe, P., Osguthorpe, D.J., Stern, P.S., and Moult, J., Low frequency motion in proteins, J. Comput. Phys., 151:169-189, 1999.
[4] Del Carpio, C.A., Ichiishi, E., Yoshimori, A., and Yoshikawa, T., MIAX: A new paradigm for modeling biomacromolecular interactions and complex formation in condensed phases, Proteins, 48:696-732, 2002.
[5] Del Carpio, C.A., Peissker, T., Yoshimori, A., and Ichiishi, E., Docking unbound proteins with MIAX: A novel algorithm for protein-protein soft docking, Genome Inform., 14:238-249, 2003.
[6] Del Carpio, C.A., Shaikh, A.R., Ichiishi, E., Koyama, M., Kubo, M., Nishijima, K., and Miyamoto, A., A graph theoretical approach for analysis of protein flexibility change at protein complex formation, Genome Inform., 15:148-160, 2005.
[7] Fernandez-Recio, J., Abagyan, R., and Totrov, M., Improving CAPRI predictions: Optimized desolvation for rigid-body docking, Proteins, 60:308-313, 2005.
[8] Fischer, D., Lin, S.L., Wolfson, L., and Nussinov, R., A geometry-based suite of molecular docking processes, J. Mol. Biol., 248:459-477, 1995.
[9] Janin, J., Quantifying biological specificity: The statistical mechanics of molecular recognition, Proteins, 28:153-161, 1997.
[10] Katchalski-Katzir, E., Shariv, I., Eisenstein, M., Friesem, A.A., Aflalo, C., and Vakser, I.A., Molecular surface recognition: Determination of geometric fit between proteins and their ligands by correlation techniques, Proc. Natl. Acad. Sci. USA, 89:2195-2199, 1992.
[11] Kolodny, R., Guibas, L., Levitt, M., and Koehl, P., Inverse kinematics in biology: The protein loop closure problem, The International Journal of Robotics Research, 24:151-163, 2005.
[12] Lee, A., Streinu, I., and Brock, O., A methodology for efficiently sampling the conformation space of molecular structures, Phys. Biol., 2:108-115, 2005.
[13] Palma, P.N., Krippahl, L., Wampler, J.E., and Moura, J.J.G., BIGGER: A new (soft) docking algorithm for predicting protein interactions, Proteins, 39:372-384, 2000.
[14] Thomas, S., Song, G., and Amato, N.M., Protein folding by motion planning, Phys. Biol., 2:148-155, 2005.
[15] Thorpe, M.F. and Ming, L., Macromolecular flexibility, Phil. Mag., 84:1323-1331, 2005.
[16] Weng, Z., Vajda, S., and Delisi, C., Prediction of protein complexes using empirical free energy functions, Protein Sci., 5:614-626, 1996.
[17] Zavodszky, M.K., Lei, M., Thorpe, M.F., Day, A.R., and Kuhn, L.A., Modeling correlated mainchain motions in proteins for flexible molecular recognition, Proteins, 57:243-261, 2004.