Current Protein and Peptide Science, 2015, Vol. 16, No. 8, 701-717. © 2015 Bentham Science Publishers
Computational Biology Tools for Identifying Specific Ligand Binding Residues for Novel Agrochemical and Drug Design

Izabella Agostinho Pena Neshich1, Leticia Nishimura2, Fabio Rogério de Moraes3, Jose Augusto Salim1, Fabian Villalta-Romero, Luiz Borro1, Inacio Henrique Yano5, Ivan Mazoni5, Ljubica Tasic4, Jose Gilberto Jardine5 and Goran Neshich5,*

1Unicamp, Campinas, Brazil; 2IQSC-USP, São Carlos, Brazil; 3UNESP, Sao Jose do Rio Preto, Brazil; 4Chemical Biology Laboratory, Organic Chemistry Department, Institute of Chemistry, UNICAMP, Campinas, SP, Brazil; 5Embrapa Agricultural Informatics, Campinas, Brazil

Abstract: The term “agrochemicals” is used in its generic form to represent a spectrum of pesticides, such as insecticides, fungicides or bactericides. They contain active components designed for optimized pest management and control, thereby allowing economically sound and labor-efficient agricultural production. A “drug”, on the other hand, is a term used for compounds designed to control human diseases. Although drugs are subjected to much more severe testing and regulation procedures before reaching the market, they might contain exactly the same active ingredient as certain agrochemicals. This is the case described in the present work, which shows how a small chemical compound might be used to control the pathogenicity of the Gram-negative bacterium Xylella fastidiosa, which devastates citrus plantations, as well as to control, for example, meningitis in humans. It is also clear that, so far, the production of new agrochemicals has not benefited as much from in silico identification/discovery of new chemical compounds as pharmaceutical production has. Rational drug design depends crucially on detailed structural information about the receptor (target protein) and the ligand (drug/agrochemical). The interaction between the two molecules is the subject of analysis that aims to understand the relationship between structure and function, mainly by deciphering some fundamental elements of the nanoenvironment where the interaction occurs. In this work we emphasize the role of understanding the nanoenvironmental factors that guide recognition and interaction between a target protein and its function modifier, an agrochemical or a drug. The repertoire of nanoenvironment descriptors is used in two selected and specific cases that we approached in order to offer a technological solution to some very important problems that need special attention in agriculture: elimination of the pathogenicity of a bacterium that attacks citrus plants, and formulation of a new fungicide. Finally, we also briefly describe a workflow that might be useful when research requires that model structures of target proteins first be generated (starting from genome sequences), followed by identification of ligand-target sites at the surface of those modeled structures, then application of procedures that adequately prepare both protein and ligand structures (the latter also involving filtration that satisfies acceptable absorption/distribution/metabolism/excretion/toxicity [ADMET] parameters) for virtual high-throughput screening (involving docking of ligands to the indicated sites), and ending with the ranking of the best pairs of target protein and selected ligand.
Keywords: Protein-ligand interactions, interaction nanoenvironment, STING structure-function descriptors, agrochemicals, ligand docking.

*Address correspondence to this author at the Embrapa Agricultural Informatics, Campinas, Brazil; Tel: [55] 19 3211 5774; Fax: [55] 19 3211 5754; E-mail: [email protected]

INTRODUCTION

The demand for increased agricultural production is an important challenge today, both from the environmental point of view and from the point of view of compiling novel technological solutions to help meet the ever increasing demand for plant production (which in general terms includes food, feed and fiber). This scenario, in which current agricultural production must strike a fine balance among strict environmental, safety and efficiency regulations, imposes a clear demand for a new inventory of agrochemicals satisfying the
requested characteristics. In addition, having such an inventory is a crucial factor in gaining a competitive edge in the international market, a scenario that is also easily observable in the pharmaceutical industry.

Although most of the agrochemicals listed in the Pesticide Manual were established from screening programs that relied on trial-and-error testing, it has been suggested that the knowledge acquired over the past decades in structural biology, computer science, chemistry and molecular biology could significantly impact compound development through rational approaches [1]. Today, rational drug discovery relies on knowledge-based technologies that, in turn, are founded on databases containing as much information as possible about a drug (agrochemical) and the reactive nanoenvironments of its target protein. Rational drug design is based on the principle that modulating a specific target linked to a disease/phenotype, and thereby modifying its activity, can have therapeutic value. Additionally, the target should be “druggable”, that is, its three-dimensional structure should contain sites that can bind a small molecule and be modulated by it. Druggability is thus the ability of a target protein to bind small, drug-like molecules with high affinity, and it is related to several factors, mainly the size of the target, the presence of pockets, and the overall charge and hydrophobicity of the surface [2]. Consequently, for a small molecule to bind to a given target, its characteristics should fall within a range considered optimal for interaction with the nanoenvironment of the protein target. Both the interaction and the optimal ligand characteristics may be predictable by computational methods. In these aspects, computational biology and computational chemistry can play a relevant role in aiding drug discovery through Ligand-Based Drug Design (LBDD) and Structure-Based Drug Design (SBDD) [3]. Both are extensively used in medicinal chemistry and were successful in the development of a number of drugs, such as Zanamivir [4], Nelfinavir [5] and other examples [6, 7].

In contrast to what has been observed in the drug industry, structure-based design is relatively new to agriculture, and there are currently no products on the market that are a direct result of this approach, but it is a growing discipline within crop protection research [8]. One example of an agrochemical being developed with the help of structural biology, computational biochemistry and iterative rounds of experimental validation is the design and improvement of scytalone dehydratase inhibitors as rice blast fungicides [1].
Extracting meaningful information from extensive databases with the repertoire of molecular descriptors [9-11] is a crucial ability that successful research teams are acquiring in order first to identify new protein targets and then to develop new biologically active chemicals that might modulate target function. The word “new” is used here to indicate that the two molecules, the targeted protein and the ligand, are identified for the first time as an interacting pair. Predicting the properties of those new chemical compounds remains a formidable challenge, but in general researchers are getting better at it, particularly when experimental verification is coupled to iterative in silico predictions. Alongside creating extensive, curated databases containing protein nanoenvironment descriptors, it is also necessary to include the knowledge of biologists who already have a complete roadmap of the biological interaction that must be controlled/managed/eliminated or enhanced. The former is necessary for formulating complete procedures for extracting meaningful information from voluminous resources, while the latter is crucial because experimentalists can check the in silico predicted scenarios for interactions between targets and proposed modifiers of target activity.

Herein, two molecular approaches developed for in silico analysis of protein-protein and protein-ligand interactions are described. The objective of the work described is to render such interactions either difficult (too slow) or even impossible (total inhibition). The first approach is based
on exploring protein-protein interactions as targets for new ligands that might interfere with the pathogenicity of the bacterium Xylella fastidiosa. In this case, the computational drug design is based solely on knowledge of the three-dimensional structure of the protein target (SBDD), and ligands can be designed based on their interaction within the target nanoenvironment. In silico docking and scoring functions can help in deciding which ligands could perform better in vitro and in vivo. The second example uses the substrate of an enzyme target as a starting point, and the goal is to interfere with protein-substrate interactions (by replacing them with more potent protein-ligand interactions). This procedure actually combines LBDD and SBDD. In this case, the designed compounds should perform better than the native substrate. This approach is explored in the design of potential novel fungicides. We will present full details for the former case, as it is generally much less explored than the latter.

STRUCTURE BASED DRUG DESIGN FOCUSED ON INTERFACES: EXAMPLE USING XYLELLA FASTIDIOSA MOTILITY PROTEIN

Xylella fastidiosa (Xf) is a Gram-negative, non-flagellated bacterium that causes several diseases in plants, such as citrus variegated chlorosis (CVC) and Pierce's Disease, which affect citrus and wine production, respectively. Xf persists only in the xylem vessels (the water conduits of plants) and in the foregut of some insects, such as sharpshooters, which serve as vectors when they feed on xylem fluids [12]. This bacterium is able to form aggregates within the xylem, reducing or blocking the flux of water and nutrients [13, 14]. Hopkins and co-workers suggested that the colonization and pathogenicity of Xf are strictly related to its capability of moving within the elements of the xylem vessel, which enables spreading from the inoculation point [12]. It is also known that avirulent and attenuated strains rarely move from the inoculation point [13].
The Xf genome project revealed the presence of genes encoding proteins involved in the biogenesis and function of type IV pili (T4P) [15], which cause “twitching motility” [16], a form of surface-associated movement whereby bacteria pull themselves rapidly along surfaces by cycles of pilus assembly and disassembly, driven by ATP hydrolysis carried out by the PilT and PilB proteins [17]. The loss of PilT function results in a lack of twitching motility, associated with the loss of pilus extension or retraction [18]. Satyshur et al. [19] solved by X-ray diffraction four structures of PilT proteins (PDB codes: 2GSZ, 2EWV, 2EWW and 2EYU) from a hyperthermophilic organism, Aquifex aeolicus (Aa). PilT is a hexameric ATPase from a subgroup of the bacterial type II/type IV secretion systems, and it has two major structural domains: the N-terminal Domain (NTD) and the C-terminal Domain (CTD), which contains the ATPase core (Satyshur et al., 2007). They showed the noteworthy importance of polar and charged CTDn:NTDn+1 interface interactions for the retraction motion (1157 Å2 of the 1782 Å2 of total interface area are provided by polar and charged residues). Through random and site-directed mutagenesis of Pseudomonas aeruginosa (Pa) PilT, they also demonstrated that six residues are
particularly crucial to protein function, three of which (D29, R95 and R207) are located in the interface region [19], which further highlights the importance of the hexameric structure for the proper functioning of the protein. More recently, the three-dimensional structures of the PilT protein from Pseudomonas aeruginosa were also solved by the same research group that published the Aa PilT structures. P. aeruginosa is phylogenetically closer to X. fastidiosa than A. aeolicus is. The structures reported there are of partial complexes: one bound to an ATP analog, AMP-PCP (PDB code: 3jvv), and the other in the apo form (PDB code: 3jvu). That work also suggested a mechanism for the motor protein [20]. Together, the Aa and Pa structures provide evidence for the highly dynamic nature of PilT and the importance of interface contacts.

Anderson (2003) described how drug design based on protein structure should be carried out, from the criteria for choosing the therapeutic target through to the development procedures, “docking” and “virtual screening” [21]. According to that work, the design of antimicrobial drugs should be based on protein targets that are essential, are found predominantly in pathogens (rather than in nonpathogenic organisms), have a single function in the pathogen and are likely to be inhibited by small molecules (have “druggable” sites). Other reviews agree that the target should be an essential protein, present as a single copy and found in a broad range of pathogenic organisms [22]. PilT and, more precisely, specific regions of that protein could be used as target sites, following these recommendations.
It is a protein essential to Xylella fastidiosa pathogenicity, since movement via the type IV pilus is known to be responsible for the spreading of the pathogen; it is also important in other pathogens in terms of movement, formation of aggregates, adhesion and evasion of the host immune system [23]; and the residues described hereinafter, chosen as the preferred ligand-target sites, occur strictly in the PilTs of pathogenic organisms (that is, they are absent in free-living organisms such as cyanobacteria, which also use PilT). As already mentioned, it is known that upon mutation of some key amino acids located at its interface, PilT ceased to exert its function and the mutated organism was not capable of moving (“twitching”).

In order to understand more about the physical-chemical features of Xf PilT interfaces, the hexameric structures of A. aeolicus and P. aeruginosa PilT were analyzed using the Blue Star STING platform [24] and, in particular, the JavaProtein Dossier (JPD) [25], to select and examine the Interface Forming Residues (IFR) for each chain. As Xf PilT has high primary sequence similarity to both, but mainly to Pa, homology modeling was employed to create three models of the X. fastidiosa PilT hexamer using Modeller [26], based on the Aa PilT template (ADP-bound form) and two Pa templates (apo and ATP-bound forms). Using the Swiss-PDBviewer program [27], coordinate files were created for each Xf PilT hexamer, and the similarities and differences among the Xf, Pa and Aa PilT interfaces that maintain the protein complexes were highlighted. The druggable sites within the interface could then be searched using STING JPD, based on physical-chemical characteristics such as polarity, electrostatic potential, solvent-exposed area, and location within pockets. The residues of interest were also analyzed regarding their occurrence in other pathogenic bacteria at the positions corresponding to those identified in Xf PilT. The residues found only in pathogenic organisms (rather than in free-living organisms) were considered the preferential targets. With the targets selected, structure-based drug design could take place.

INHIBITION OF PROTEIN-SUBSTRATE INTERACTIONS: CASE OF POLYGALACTURONASE ENZYME

This section describes the rational computational design of agrochemicals that inhibit endo-polygalacturonase, an enzyme produced by pathogenic fungi and involved in processes that allow fungi to penetrate plant cell walls. The main focus is on the soil fungal genus Fusarium, which is involved in a number of phytopathogenic processes in a variety of plants at all stages of their development (seeds, grown plants, flowers, fruits, etc.) [28]. Nowadays, the use of plants resistant to fungal attack is a viable option for avoiding the damage usually caused by this pest; however, this option is not always welcomed by the market/society, nor is it available as a resource in many cases.

The phytopathogenicity of Fusarium is achieved by penetration of hyphae through the plant cell wall, owing to the secretion of cell wall-degrading enzymes (CWDE) that are capable of hydrolyzing the structural molecules constituting the plant cell wall. Among those enzymes are cutinases, proteases, cellulases, quitine, amino acid permeases and polygalacturonases [29]. The polygalacturonases have strategic importance for fungi and their penetration through the cell wall: they catalyze the cleavage of the galacturonic acid polymers that form the smooth pectin regions. Pectin is an important component of the plant cell wall and, when hydrolyzed by polygalacturonases, its loss causes serious structural damage to the integrity of the cell [30].
There are three different types of polygalacturonases: endopolygalacturonases (PG), exopolygalacturonases (EPG) and exopoly-alpha-galacturonase (EPGD), each with its own mode of action during hydrolysis. PG catalyzes the hydrolysis of galacturonic acid polymers at randomly selected residues, generating oligogalacturonate fragments [31]. The major challenge in designing a chemical compound that inhibits PG is that the same type of enzyme found in phytopathogenic micro-organisms also exists in plants. Plants produce PGs in order to control their own growth processes, using the very same mechanism for breaking the polymers of galacturonic acid. The expression of those native plant PGs is finely controlled in order to suit given demands. It has been described in the literature [32] that PGs from phytopathogenic fungi generally have small hydrophobic residues at position 270, while plant PGs have large hydrophobic ones. This structural feature might well interfere with the binding of plant-produced polygalacturonase-inhibiting protein (PGIP) to plant PGs. The work cited above confirmed experimentally that the F. moniliforme PG mutant S270W did not bind to PGIP. PGIP molecules are glycoproteins localized at the cell wall; they were found to reduce PG activity and are also known to subsequently induce plant defense mechanisms.
While PG inhibition by PGIP might be competitive or non-competitive [33], a given PG from the pool of existing pathogenic PGs is not necessarily going to be inhibited by a specific plant PGIP, because the specificity of the interaction more often than not prevents other candidate partners from forming a viable protein complex. Nevertheless, the existence of PGIPs is of fundamental importance for initiating ligand-based drug design, precisely through careful study of the interactions established across the interface in the protein complexes formed between those two molecules. By doing so, researchers may gather the information needed to design an inhibitor with a much broader spectrum of action (meaning that the newly designed inhibitor would act against a sizable portion of all known PGs from pathogenic microorganisms). In addition, considering that PGIP is a molecule of relatively large molecular mass and therefore has a large interface area for interacting with a variety of PGs, it is intuitively clear that by decreasing this area (reducing the inhibitor to a smaller but still functional form) one could avoid the steric clashes and poor spatial fit that necessarily occur with larger interface areas. The advantage of using a small ligand is therefore that it would interact only with the conserved structural parts of different PGs (coming from different pathogenic micro-organisms). In this work, we explored those two crucial elements in order to establish target residues inside PG pockets and to design an effective inhibitor of PGs from a variety of pathogenic fungi. This was based mostly on identifying conserved structural features in the PG binding site (the one interfacing with PGIP) and then searching through the conformational space of small chemical compounds (relatively well represented in public databases such as PubChem) similar to the binding portion of PGIPs.
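The search for small compounds "similar to the binding portion of PGIPs" can be illustrated with a minimal similarity-ranking sketch. The text does not state which similarity measure was used; the Tanimoto coefficient over sets of structural features shown here is an assumption chosen for illustration, and the compound names and feature identifiers are hypothetical.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) coefficient between two sets of feature identifiers."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        return 0.0  # two empty feature sets: treat as no similarity
    return len(a & b) / len(union)


def rank_by_similarity(query, library, threshold=0.5):
    """Return (name, score) pairs scoring at or above threshold, best first."""
    scored = ((name, tanimoto(query, feats)) for name, feats in library.items())
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical feature sets. In practice these would be fingerprints computed
# by a cheminformatics toolkit for the PGIP binding fragment and for compounds
# retrieved from a public database; none of these values come from the paper.
query_fragment = {1, 2, 3, 4}
library = {
    "compound_a": {1, 2, 3, 4, 5},
    "compound_b": {1, 2, 7, 8},
    "compound_c": {2, 3, 4},
}
hits = rank_by_similarity(query_fragment, library)
print(hits)
```

The threshold of 0.5 is arbitrary; a real screen would calibrate it against known binders and non-binders before docking the retained compounds.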
MATERIALS AND METHODS

Method for Identification of Potential Druggable Sites and Ligand Binding Residues at the Interfaces of a Protein-Target Biological Unit

The proposed method consists of the following steps (which may be adapted for other enzyme-inhibitor systems, although the general logistics should remain the same):

1-The target protein structure modelling and validation: The first step is the proper selection of a protein target. This protein should be carefully selected as suggested before: the target should be essential, be found mostly in pathogenic organisms and exert a key function in the pathogen’s biology, and preferably there should be evidence linking the inhibition/absence of its function to reduced/blocked pathogenicity. If the three-dimensional structure of the protein of interest has not been determined by experimental techniques (such as X-ray crystallography and NMR), one should generate a model from a template protein with high similarity in its primary sequence, by using homology modeling. For that purpose, we selected the programs Swiss-Model [27] and Modeller [26]. The modelled structure is then validated through analysis of the Ramachandran plots [34] and use of the ProSA web service [35]. If the protein structure has previously been determined experimentally, one can obtain its coordinate files from the Protein Data Bank (PDB files) and move to item “2” if the protein is in its monomeric form and structures of homologous proteins in their oligomeric form are available. If the protein of interest already has its oligomeric structure solved, one can move to item “3”.

2-Modeling structure of the oligomer and optimization by energy minimization and molecular dynamics: Use a PDB file viewer/editor such as the program Swiss-PDBviewer to generate the coordinate file (.PDB) of the complex of the protein of interest, by structural overlapping with the complex used as template during the modeling procedure, and then minimize the energy by using the program Gromacs [36]. Molecular dynamics simulation can be applied to obtain the optimal models (using programs such as Gromacs).

3-Generating molecular structure descriptors: If the structure to be used is deposited in the PDB, all the physical-chemical and structural descriptors are already pre-calculated and stored in STING_DB. If not, one should use the STING server [24] to generate the “TGZ” file containing all these parameters for the generated protein model. For local files containing modelled structures, the descriptors are calculated separately and delivered to the user by e-mail in the TGZ file format.

4-Analyzing the target protein structure by observing macromolecular descriptors: All the parameters generated by the STING server and delivered to the user by e-mail are imported into BlueStar STING (BSS) [37] and into the JavaProtein Dossier [25], a platform for detailing and integrating the analysis of structure and function.

5-Identifying interface forming residues (IFRs) for target protein biological unit and calculating respective areas by using the program SurfV [38]: The BSS-embedded program SurfV is used to calculate the solvent-accessible area of residues in two scenarios: for the isolated chain and for the chain in complex with the other chains within the biological unit (for the Xf case, that is the hexameric structure of the PilT ATPase).

6-Studying the occurrence of categories of amino acids in ensemble of Interface Forming Residues (IFR) and of Free Surface (the latter is obtained by subtracting the IFR ensemble from the total molecular surface), with regard to the type of amino acids that form it: The amino acid categories are defined here as Polar: Cys, Ser, Thr, Tyr, Asn, Gln, His and Trp; Charged: Asp, Glu, Arg, Lys; Hydrophobic: Ala, Ile, Leu, Val, Met, Phe and Pro; and Glycine, which by itself may be considered the fourth independent category [39]. This phase is important for indicating to what extent the interface in question is polar and charged. The analysis can be done by using the MySQL database platform (http://www.mysql.com/) and the
respective tables containing the area values obtained with the SurfV program.

7-Calculating residue contacts and respective energies: STING [24] and the JavaProtein Dossier [25] are used to calculate the number, type and distance, and therefore the total energy, of the contacts established between the interface amino acids of the selected chain and the adjacent chains within the biological unit.

8-Calculating the ICD and ICED: Once in possession of the data describing the area occupied by specific residues on a given surface, two new indexes should be calculated: the Interface Contacts Energy Density (ICED), which is the sum of the energies of all the contacts that IFRs establish across the interface, divided by the sum of the area occupied at the interface by all the IFRs; and the Interface Contacts Density (ICD), which is the total number of contacts established divided by the corresponding total area. Additionally, the program JavaProtein Dossier [25] is used for general structural analysis and the program PyMol [40] is used for generating the illustrative molecular images.

9-Carrying out the selection of target site residues (TSR): On the basis of the analyzed physicochemical and structural characteristics of amino acid residues (such as high contact energy values, polarity, solvent-exposed area, and presence in pockets and cavities, among others of interest), and by using the “Select” module of the STING JavaProtein Dossier [25], one may quickly compile the amino acid residues that satisfy the constraints usually applied to preferred target residues.
10-Distinguishing among TSR residues present in pathogens and in nonpathogenic organisms: The primary structures of proteins homologous to the protein of interest are aligned with the program ClustalW [41], revealing the similarities and differences between proteins from these two sets of organisms (pathogens and non-pathogens), and the alignment is inspected for the positions corresponding to the residues selected in step “9” above.

11-Proceeding from this point on with usual methods and protocols for structure-based drug design: From this point, SBDD can be carried out regardless of whether the method practices de novo agrochemical/drug design or is based on already known structures, previously identified by virtual screening and then optimized for best affinity. When possible, true positive and true negative ligands should be included to validate the docking method and scoring function. For de novo drug design based solely on the structure, one should describe the properties of the TSRs and the nanoenvironment of the target site (receptor-based pharmacophore). With this knowledge it is possible to identify the binding interactions that are possible within the site, as well as the size and shape of a ligand that will fit well into the pocket [42]. This can be done manually by using chemical drawing software such as ChemDraw (www.cambridgesoft.com) and ACD/ChemSketch (www.acdlabs.com). Manual approaches
have the advantage of giving the researcher full control of the work and allowing the use of their own ideas and expectations, but the disadvantage of being limited by the operator's current level of expertise [42]. To work faster and explore more chemical options, one could employ programs that try to build ligand molecules automatically according to a given receptor pharmacophore, such as LigBuilder, SPROUT, LEGEND and LUDI [42]. The suggested ligands can then be used in molecular docking, a method for predicting the most probable binding mode of a ligand to a given target structure, focusing on the chosen set of TSRs. Evaluations of the purpose and characteristics of several docking methodologies and software packages can be found in recent reviews [43]. Another common approach for identifying potential ligands is virtual screening, in which the researcher uses databases of ligand structures (such as PubChem, ChEMBL or ZINC), docks a previously filtered ensemble of those ligands into the selected TSRs and then scores the best ligand-protein interaction pairs [21].

Inhibition of the Active and Binding Site of Enzymes

For this part of the work, a path very similar to the one described above for Xf was used. Here, of course, the target is not an interface (among the monomers constituting the biological unit) of the target protein complex but the ligand binding site of the protein. The second major difference between this approach and the one applied to Xf is that only the target PG site of a pathogen, identified by comparative analysis among all PG proteins, was used. The third difference is that, in this case, the PG inhibitor design was based on true ligands (substrates) and chemically similar compounds. Inspecting which interactions and properties of the TSRs and the substrate are important for binding is an advantage for the development of a good docking model. The best ligands would probably have these or very similar properties.
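Steps 5, 6 and 8 of the interface method described earlier in this section reduce to simple bookkeeping once per-residue areas and contact energies are available. The sketch below is illustrative only: it assumes that an interface forming residue is one whose solvent-accessible area decreases when the chain is placed in the complex, it uses plain dictionaries rather than SurfV or STING output files, and all function names and numerical values are hypothetical.

```python
from collections import defaultdict

# Residue categories as listed in step 6 (three-letter codes).
CATEGORIES = {
    "polar": ["CYS", "SER", "THR", "TYR", "ASN", "GLN", "HIS", "TRP"],
    "charged": ["ASP", "GLU", "ARG", "LYS"],
    "hydrophobic": ["ALA", "ILE", "LEU", "VAL", "MET", "PHE", "PRO"],
    "glycine": ["GLY"],
}
CATEGORY_OF = {res: cat for cat, members in CATEGORIES.items() for res in members}


def interface_residues(area_isolated, area_in_complex):
    """Step 5: map residue id -> area buried when the chain joins the complex.

    Both inputs map residue id -> solvent-accessible area (in square angstroms).
    Residues with no loss of accessible area are not part of the interface.
    """
    buried = {}
    for res_id, isolated in area_isolated.items():
        loss = isolated - area_in_complex[res_id]
        if loss > 0.0:
            buried[res_id] = loss
    return buried


def category_share(buried, residue_names):
    """Step 6: fraction of the interface area contributed by each category."""
    totals = defaultdict(float)
    for res_id, area in buried.items():
        totals[CATEGORY_OF[residue_names[res_id]]] += area
    interface_area = sum(buried.values())
    return {cat: area / interface_area for cat, area in totals.items()}


def contact_densities(contact_energies, buried):
    """Step 8: ICD (contacts per unit area) and ICED (energy per unit area)."""
    interface_area = sum(buried.values())
    icd = len(contact_energies) / interface_area
    iced = sum(contact_energies) / interface_area
    return icd, iced


# Hypothetical toy values for a four-residue chain (not data from the paper).
area_isolated = {1: 120.0, 2: 150.0, 3: 80.0, 4: 60.0}
area_in_complex = {1: 40.0, 2: 50.0, 3: 80.0, 4: 40.0}
residue_names = {1: "ASP", 2: "ARG", 3: "LEU", 4: "SER"}
contact_energies = [10.0, 6.0, 2.5, 1.5]  # one entry per interface contact

buried = interface_residues(area_isolated, area_in_complex)
shares = category_share(buried, residue_names)
icd, iced = contact_densities(contact_energies, buried)
print(buried, shares, icd, iced)
```

In this toy case residue 3 loses no accessible area and is excluded, so the interface is 90% charged and 10% polar by area. A real analysis would probably require a minimum area loss rather than any positive difference, to avoid counting numerical noise as interface.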
RESULTS

Finding Potential “Druggable” Binding Sites at Interface Regions of Xf PilT

As already mentioned, the 3-D structures of the PilT hexamer from Aquifex aeolicus (Aa) (PDB code: 2gsz, in a conformation bound to ADP) and of the Pseudomonas aeruginosa (Pa) PilT hexameric complexes (PDB codes: 3jvu and 3jvv, in the unbound conformation and the conformation bound to ATP, respectively) were used as templates for homology modeling of each of the 6 chains of X. fastidiosa PilT. The search for structure templates for modeling was performed using blastP with Xf PilT (GI 15838234) as the query sequence against the sequences found in the PDB, which revealed that Xf and Pa PilT are 87% similar and 74% identical over 344 aligned residues and that Xf and Aa PilT are 68% similar and 50% identical over 340 aligned residues. The modelling was done for each corresponding chain in the templates, and in the three mentioned conformations, based on 2gsz, 3jvu and 3jvv, using the program Modeller [26], generating in total 18 chains in 3 complexes. Model validation procedures indicated that the models of the Xf PilT hexamer can be considered acceptable, since more than 98% of the residues are located in the allowed regions of the Ramachandran plot and the z-score
values calculated using ProSA were in agreement with the characteristic range of values for native proteins. The complexes were assembled by structural overlapping of the Xf PilT chains with their templates. In this way, three PDB files were created for the Xf PilT hexamers: one based on Aa PilT (2gsz, X-ray resolution of 4.2 Å) and two based on Pa PilT (3jvu and 3jvv, X-ray resolution of 3.1 and 2.6 Å, respectively). The use of these three template complexes is of great importance, since it permits simulating the various states of the PilT protein, which is in fact dynamic and probably cycles from one conformation to another upon ligand binding. The existence of these different conformations in the templates enables us to suggest, as interesting targets for preventing the correct function of the protein, residues that are exposed in a given chain. The modeled structures of Xf PilT were named XfAa1, XfPa1 and XfPa2, respectively, and are shown in Fig. (1). Structural alignment of the Xf complexes to their respective templates yielded root mean square deviations of 0.388, 0.297 and 0.231, respectively (Fig. 1a-c).

The Xf PilT interface forming residues are predominantly charged and polar, which is similar to the results obtained by Satyshur [19] for the Aa PilT interfaces. The sum of all interface areas from all monomers of hexameric Aa PilT is 18,890.20 Å2, of which 13,455.55 Å2 (71.23%) is formed by polar and charged residues. For the Xf PilT hexamer, of the total 16,832.85 Å2 of interface area, 12,659.97 Å2 (75.21%) is formed by polar and charged groups, a higher percentage than that found for the Aa PilT hexamer. The X. fastidiosa PilT hexamers have 18,173.88 Å2, 18,446.96 Å2 and 18,882.558 Å2 of total interface area and 13,592.32 Å2 (72.5%), 14,020.77 Å2 (70.88%) and 14,208.91 Å2 (70.88%) of area occupied by polar and charged residues, respectively, for XfAa1, XfPa1 and XfPa2.
This clearly demonstrates the importance that such residues have in the constitution of the IFRs. The interfaces of the Pa hexamers have fractions of area occupied by polar and charged amino acids similar to those found for Xf: 71.42% and 70.93% for 3jvu and 3jvv, respectively. Statistical analysis has also shown that the interfaces of the pathogenic proteobacteria are more polar and charged than those of more distant bacteria such as Aa. This characteristic is of the utmost importance, since the presence of
various residues capable of establishing electrostatic contacts as well as other polar contacts, such as hydrogen bonds, indicates that several of these sites may be used to identify a target for structure-based drug design. Additionally, by using STING JPD one can identify which IFRs establish contacts across the interface. Moreover, one can observe that a large percentage of the IFRs are hydrophilic and only a few residues are hydrophobic. Furthermore, it is remarkable that more hydrophilic amino acids are found establishing contacts across the interfaces of Xf PilT than across the Aa PilT interfaces. Using STING and JPD we collected data on the number of IFR contacts and their energies. 2gsz has a sum of total interface areas over all chains of 18,890.2 Å2, higher than the corresponding sum for Xf, 16,832.85 Å2. In spite of the smaller interface area, the sum of all contact energies from the Xf PilT IFRs, 6,895.4 kcal/mol, is higher than that of Aa PilT, 5,223.6 kcal/mol. As for the number of IFR contacts, the Aa PilT interfaces make a total of 886 contacts, whereas the Xf PilT interfaces make a surprisingly higher number: 1,468 contacts. For the non-hydrophobic IFR contacts (the energetically stronger ones), such as charged-attractive, charged-repulsive and hydrogen-bond contacts, Xf exhibits a higher number and a higher corresponding total energy than Aa PilT. The X. fastidiosa and P. aeruginosa interfaces have more contacts, and contacts of higher energy, than those established in the A. aeolicus PilT interface. While Aa PilT (2gsz) has, on average, 88 of these contacts per chain and an average of 824.87 kcal/mol of contact energy per chain, the Xf PilTs (XfAa1, XfPa1 and XfPa2) have, respectively, an average of 115, 133 and 134 non-hydrophobic contacts per chain, corresponding to interface interaction energies of 1,044.8, 1,143.4 and 1,157.5 kcal/mol.
The corresponding values for the Pa PilTs are similar to those of the Xf PilTs: the numbers of non-hydrophobic contacts are 126 and 119 and the corresponding energies are 1,074.13 and 1,018.93 kcal/mol, on average, per chain for the 3jvu and 3jvv hexamers. This indicates that the interfaces of the PilT hexamers of the pathogenic proteobacteria Xf and Pa, regardless of the conformation, establish a larger number of high-energy contacts than the interfaces of 2gsz.
Fig. (1). The Xf PilT (yellow) hexamer is readily superimposable upon the six chains of hexameric Aa PilT and Pa PilT (black) from three different template structures.
Subsequently, these data were used to calculate the interface contact energy density per chain (ICED). The X. fastidiosa and P. aeruginosa PilT structures have higher ICEDs than A. aeolicus PilT, which suggests that the PilT interfaces of these organisms establish more contacts, and with higher energy per area, than is observed for Aa PilT. The values for the Xf PilTs are higher still than those of Pa PilT. The average ICED for Aa PilT (2gsz) is 0.28 kcal/mol/Å2, while for the Xf and Pa PilTs the averages are 0.38 and 0.34 kcal/mol/Å2, respectively. To address the question of whether the difference observed in ICEDs and ICDs between Xf and Pa on the one side and Aa on the other is statistically significant, a Student's t-test was performed to test the hypothesis that the Xf and Pa PilT models have both more contacts per area and more energetically rich ones than the Aa PilT structure. It was observed that the ICED data are compatible with a normal distribution (P-value of the D’Agostino normality test: 0.061) and have similar standard deviations. The test comparing the average ICED values of Aa PilT (2gsz) with those of the Pa and Xf PilTs gave a P-value of 3.48 × 10⁻⁷. This value indicates that there is a high probability that these are statistically different samples, which suggests that the interfaces of the PilT hexamers of the pathogenic Proteobacteria have higher contact energies per area than the Aa PilT hexamer. Finally, the program BlueStar STING and its JPD module were used to select (with the “SELECT” feature of the JPD module) and view the amino acids that establish the energetically richer contacts (here, we used a cutoff of at least 10 kcal/mol of contact energy at the interface for the selection set).
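The ICED comparison described above can be illustrated with a short sketch. It assumes that the ICED of a chain is its interface contact energy divided by its interface area, and it computes a pooled-variance Student's t statistic by hand using only the Python standard library. The per-chain numbers are hypothetical placeholders, not the values measured in this work, and the function names are illustrative.

```python
from math import sqrt
from statistics import mean, variance


def iced(contact_energy, interface_area):
    """Interface contact energy density: kcal/mol per square angstrom."""
    return contact_energy / interface_area


def student_t(sample_a, sample_b):
    """Two-sample Student's t statistic with pooled variance.

    Assumes both samples have similar variances, as stated in the text.
    Returns the t statistic and the degrees of freedom.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / dof
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, dof


# Hypothetical (energy, area) pairs per chain; NOT data from the paper.
aa_chains = [(820.0, 3100.0), (850.0, 3200.0), (800.0, 3150.0)]
xf_chains = [(1040.0, 2800.0), (1100.0, 2900.0), (1150.0, 3000.0)]

aa_iced = [iced(e, a) for e, a in aa_chains]
xf_iced = [iced(e, a) for e, a in xf_chains]

t_stat, dof = student_t(xf_iced, aa_iced)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

Converting the t statistic into a P-value requires the cumulative t distribution, which a statistics package such as SciPy (`scipy.stats.ttest_ind`) provides.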
In addition, the cascade of constraints imposed on the residues included the condition that the selected residues must also have as low as possible an area of exposure to the solvent when in complex (in these cases, in the hexameric form), meaning a low but non-zero area, and must be capable of establishing hydrogen bonds and/or contacts of electrostatic nature. The search with the above constraints resulted in 54 different residues (42 of which are part of the pockets found in complex or in isolation). This ensemble of selected amino acid residues (the product of a simple procedure applied with the JPD SELECT feature) is probably the most suitable for use as ligand binding residues (target site residues, TSR) for structure-based drug design. The list of these residues, together with some of their characteristics, such as the number of Xf PilT chains containing them, the maximum and average values of the contact energy that the residues establish across the interface and their presence within pockets, is shown in Table 1. Another important characteristic that should be taken into account before designing drugs is the frequency with which the residues considered as targets occur in the 18 possible chains of the three modeled complexes. A given residue may occur with the described characteristics in only one conformation of Xf PilT (for example, residue E74 in the XfAa1 model), or it may occur with them in all three conformations (for example, residue E89, which has such characteristics in the three hexamers XfAa1, XfPa1 and XfPa2).
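The cascade of constraints and the subsequent grouping by complex can be sketched as a simple filter followed by a set partition. The sketch below is not the JPD SELECT implementation: the record fields, the residue entries and their values are hypothetical, and only the 10 kcal/mol energy cutoff and the requirement of more than 1 Å2 of exposed area come from the text and figure legends.

```python
from collections import defaultdict

# Hypothetical per-chain records; field names and values are illustrative only.
records = [
    {"complex": "XfAa1", "residue": "E74", "energy": 14.2, "sasa": 6.0, "polar_contact": True},
    {"complex": "XfAa1", "residue": "E89", "energy": 18.5, "sasa": 3.1, "polar_contact": True},
    {"complex": "XfPa1", "residue": "E89", "energy": 16.0, "sasa": 2.4, "polar_contact": True},
    {"complex": "XfPa2", "residue": "E89", "energy": 17.3, "sasa": 4.8, "polar_contact": True},
    {"complex": "XfPa2", "residue": "L101", "energy": 12.0, "sasa": 5.0, "polar_contact": False},
    {"complex": "XfPa1", "residue": "D70", "energy": 7.5, "sasa": 2.0, "polar_contact": True},
    {"complex": "XfPa1", "residue": "R80", "energy": 11.0, "sasa": 0.0, "polar_contact": True},
]

MIN_ENERGY = 10.0  # kcal/mol, interface contact energy cutoff quoted in the text
MIN_SASA = 1.0     # square angstroms exposed to solvent when in complex


def passes(rec):
    """True if a record satisfies all constraints of the selection cascade."""
    return (
        rec["energy"] >= MIN_ENERGY
        and rec["sasa"] > MIN_SASA
        and rec["polar_contact"]  # hydrogen bond and/or electrostatic contact
    )


# Which complexes does each selected residue appear in?
complexes_by_residue = defaultdict(set)
for rec in records:
    if passes(rec):
        complexes_by_residue[rec["residue"]].add(rec["complex"])

# Group residues by the exact combination of complexes (the Venn regions).
regions = defaultdict(set)
for residue, complexes in complexes_by_residue.items():
    regions[frozenset(complexes)].add(residue)

for combo, residues in regions.items():
    print(sorted(combo), "->", sorted(residues))
```

With these placeholder records, a residue selected in a single model lands in a one-complex region, while a residue selected in all three models lands in the central region of the diagram.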
The importance of this description lies in the fact that a given drug may target a residue that is available only at a given moment of the PilT cycle (e.g., when the protein is bound to ATP), or it may target a site that is always available, meaning that it has the same characteristics in any hexamer conformation. The future docking of chemical compounds will have, as a target, one or more chains of one or more hexamers, each one representing a specific conformation (bound or not bound to ATP). Therefore, a Venn diagram was generated (shown in Fig. 2) containing the distribution of the 54 selected residues among the three modeled hexamers. With regard to the definition of the therapeutic targets, as already mentioned, anti-microbial drug design should be based on targets that are found mainly in the pathogens (and not in non-pathogenic organisms). This concept can be extended to the ligand-binding site residues (TSR). In order to check whether any of the 54 selected residues occurs only in pathogenic organisms such as Xf, Pa and the others described above, and is absent both from non-pathogenic organisms (such as Aa) and from organisms essential to the balance of ecosystems such as cyanobacteria (bacteria that photosynthesize and fix atmospheric nitrogen), an alignment of primary structures was carried out. It included the PilT of various organisms known in the literature for having PilT, a type IV pilus and twitching motility. This alignment, represented in Fig. (3a-c), indicates that the IFRs described above as potential targets are highly conserved. This represents additional evidence of the importance of the IFRs for pathogenic organisms that make use of the type IV pilus and PilT.
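Checking whether a target residue is shared across organisms amounts to mapping a residue number of the reference sequence onto an alignment column and reading that column in every other sequence. The sketch below illustrates this with the first 45 alignment columns of four of the sequences in Fig. (3a); the helper names `column_of` and `residues_at` are illustrative and are not part of ClustalW or STING.

```python
# First 45 alignment columns of four PilT sequences, copied from Fig. (3a).
alignment = {
    "XfPilT_Pt": "----MDIAELLAFSVKNNASDLHLSAGLPPMIRVDGDVRRINIPA",
    "VcPilT_Pt": "----MDIAELLEFSVKHNASDLHLSAGVPPMVRIDGEVRKLGVPA",
    "NsPilT_Cy": "--MEMMIEDLMEQMIEMGGSDMHLSAGLPPYFRISGKLTPIGEEV",
    "AaPilT": "---ELKILEIIKEAIELGASDIHLTAGAPPAVRIDGYIKFLKDFP",
}


def column_of(aligned_seq, residue_number):
    """0-based alignment column of the given 1-based residue, skipping gaps."""
    seen = 0
    for col, char in enumerate(aligned_seq):
        if char != "-":
            seen += 1
            if seen == residue_number:
                return col
    raise ValueError("residue number exceeds the sequence length")


def residues_at(aln, reference, residue_number):
    """Amino acid found in every sequence at the column of a reference residue."""
    col = column_of(aln[reference], residue_number)
    return {name: seq[col] for name, seq in aln.items()}


d17 = residues_at(alignment, "XfPilT_Pt", 17)
d33 = residues_at(alignment, "XfPilT_Pt", 33)
print("Position 17:", d17)
print("Position 33:", d33)
```

In this excerpt, position 17 is an aspartate in all four sequences, whereas position 33 differs between the sequences, which is the kind of contrast used to separate unspecific targets from group-specific ones.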
One may speculate that the PilT sequences of Proteobacteria, being extremely conserved [44], would have equally similar structures and, if so, the similarities and differences observed in this alignment could also reflect real characteristics of the structure. In the alignment shown in Fig. (3), the 54 residues were analyzed, and 25 of them occurred in all the organisms, coinciding with the position of the residue in Xf PilT (D160, D17, D196, D198, D207, D242, D70, E159, E163, E204, E209, E219, H154, H19, H222, H229, R176, R194, R206, R239, R29, R294, R80, R82 and R97). That particular ensemble of residues would constitute unspecific TSR of the pathogens. Nine residues occurred in all the pathogenic Proteobacteria (E258, E74, K235, K249, R35, R90, D33, E248 and R36), constituting good TSR for the development of drugs that reach this range of pathogens. Three residues are exclusive to the Xylella fastidiosa PilT (D184, E89 and K187) and may constitute TSR for the development of drugs more specific to Xf PilT. Six other residues occur among pathogenic bacteria, but not in all of them (H152, E336, K58, R212, R335 and E65). Finally, seventeen residues occur in Xf and other proteobacteria, and always also in Aa or some of the cyanobacteria; these constitute further unspecific targets (H152, E336, K58, R212, R335, E65, E177, K170, E68, E64, R290, R180, D181, H179, D62, H183 and E120). Thus, the most important amino acids among the IFRs were defined as the interface residues that establish more energetic contacts and are also conserved within a group of
Table 1. Listing of the 54 residues identified as possible target site residues for the process of structure-based drug design and important information about them, such as the number of chains of the hexameric Xf PilT models in which the residue appears as a target, the maximum and average value of interface contact energy found for it, and its presence within a pocket in isolation (when the chain is isolated from the remaining complex, that is, in its monomeric state) and in complex (that is, in the hexameric formation). The last column refers to the alignment of primary sequences shown in Fig. (3), with respect to presence/absence in three groups of organisms: proteobacteria pathogenic to animals and plants (Pt), Aquificae and Cyanobacteria (Cy). The sequences used in making this alignment were extracted from the Protein database of the NCBI (http://www.ncbi.nlm.nih.gov/protein/). The pathogenic proteobacteria sequences are the following: Xylella fastidiosa 9a5c PilT with GI: 15838234 (Xf), Xanthomonas axonopodis pv. citri str. 306 with GI: 21243651 (Xa), Pseudomonas syringae pv. tabaci ATCC 11528 with GI: 257483590 (Ps), Pseudomonas aeruginosa with GI: 301015820 (Pa), Ralstonia solanacearum UW551 with GI: 83747263 (Rs), Vibrio cholerae V52 with GI: 121728590 (Vc), Dichelobacter nodosus VCS1703A with GI: 146329541 (Dn), Neisseria gonorrhoeae DGI18 with GI: 240013043 (Ng) and Neisseria meningitidis Z2491 with GI: 218767246 (Nm). The cyanobacterial PilT sequences are: Nostoc sp. PCC 7120 with GI: 17229935 (Ns) and Synechocystis sp. PCC 6803 with GI: 16331158 (Ss). The sequence of Aquifex aeolicus VF5 (Aa) used was GI: 119389376.
XfAa1 only, 13 residues: D184 D207 D62 E163 E64 E74 H179 H222 K249 K58 R176 R212 R335
XfPa1 only: 0 residues
XfPa2 only, 3 residues: E120 E240 E209
All three complexes, 28 residues: D160 D70 H152 R239 D17 E159 H183 R29 D181 E177 H19 R294 D196 E219 K187 R35 D198 E258 K235 R80 D242 E68 R180 R90 D33 E89 R194 R97
Shared by two complexes: 4 residues: E248 H154 K170 R36; 4 residues: E336 E65 H229 R82; 2 residues: R206 R290
Fig. (2). The Venn diagram showing the occurrence of the residues suggested as targets (see Table 1) in the three different complexes. In this case, a residue is counted if it is present in at least one chain of the hexameric Xf PilT models with at least 10 kcal/mol of interface contact energy and more than 1 Å2 of area exposed to the solvent when in complex, is capable of establishing hydrogen bonds and/or contacts of electrostatic nature and, preferably, is located in pockets in complex and/or in isolation. The residues highlighted in white are located in pockets in complex (that is, in the hexameric conformation they are a constitutive part of a pocket) on at least one chain. The residue E89 is underlined because it will be used subsequently as an example of the structural representation of a target.
pathogenic proteobacteria that make use of PilT-dependent twitching motility. These residues would be considered the primary TSR for developing specific chemicals that would bind to their nanoenvironment. From supplementary Fig. (1) it can be seen that most of these primary ligand binding residues roughly form six patches, comprising both residues at the surface of a given PilT chain and the adjacent interface residues, which together can be considered TSR for the design of new ligands. Some of these primary target regions have very distinct physical-chemical characteristics compared to the corresponding regions found in free-living bacteria. One example is the patch formed by D33, R35 and R36, one negatively charged residue followed by two positively charged
amino acids, while in the cyanobacterial and Aa PilT sequences there is one positively charged or polar residue followed by a variable region. The patch formed by E74, E89 and R90 is also distinct: in the free-living organisms position 89 is occupied by an arginine and positions 74 and 90 by glycine. The main objective from this point on will be the computational design of a ligand that is predicted to bind at the interface region of a certain monomer (based on low, negative docking scores). Such a ligand might prevent PilT oligomerization in vitro and in vivo and, by doing so, prevent the correct functioning of the interface and the ATPase activity. One specific case is taken further for the final analysis and the design of a specific drug for binding to the nanoenvironment of the E89
Highlighted positions: 17, 19, 29, 33, 35, 36
XfPilT_Pt   ----MDIAELLAFSVKNNASDLHLSAGLPPMIRVDGDVRRINIPA-FDHKQVQSLIYDIM  55
XaPilT_Pt   ----MDIAELLAFSVKNKASDLHLSAGLPPMIRVDGDVRRINIPA-LDHKQVHALVYDIM  55
PsPilT_Pt   ----MDITELLAFSAKQGASDLHLSAGLPPMIRVDGDVRRINLPP-LDAKEVKALIYDIM  55
PaPilT_Pt   ----MDITELLAFSAKQGASDLHLSAGLPPMIRVDGDVRRINLPP-LEHKQVHALIYDIM  55
DnPilT_Pt   -KKELKLDDLLRFAQKQGASDLHLSCGVPPMIRIDGDVRRLNLPP-LQNQQMRDMIFGIM  58
NgPilT_Pt   ----MQITDLLAFGAKNKASDLHLSSGISPMIRVHGDMRRINLPE-MSAEEVGNMVTSVM  55
NmPilT_Pt   ----MQITDLLAFGAKNKASDLHLSSGISPMIRVHGDMRRINLPE-MSAEEVGNMVTSVM  55
RsPilT_Pt   ----MDIAQLLAFAAKNKASDLHLSAGLPPMIRIHGDMRRINVPP-LTHQDVHAMVYDIM  55
VcPilT_Pt   ----MDIAELLEFSVKHNASDLHLSAGVPPMVRIDGEVRKLGVPA-FTHSDVHRLIFEIM  55
NsPilT_Cy   --MEMMIEDLMEQMIEMGGSDMHLSAGLPPYFRISGKLTPIGEEV-LTADQCQRLIFSML  57
SsPilT_Cy   MALEYMIEDLMEQLVEMGGSDMHIQAGAPVYFRVSGKLEPINEEV-LTPQESQKLIFSML  59
AaPilT      ---ELKILEIIKEAIELGASDIHLTAGAPPAVRIDGYIKFLKDFPRLTPEDTQKLAYSVM  57

Highlighted positions: 70, 74, 80, 82, 89, 90, 97
XfPilT_Pt   SDKQRRDYEEALETDFSFEIPSLSRFRVNAFNQERGAGAVFRTIPNKVMTLDELGCLPVF  115
XaPilT_Pt   SDKQRRDYEEFLEVDFSFEIPSLARFRVNAFNQNRGAGAVFRTIPSEVLTLEDLGCPPIF  115
PsPilT_Pt   NDKQRQDFEERLETDFSFEVPGVARFRVNAFNQNRGAGAVFRTIPSKILSMEDLGMGSVF  115
PaPilT_Pt   NDKQRKDFEEFLETDFSFEVPGVARFRVNAFNQNRGAGAVFRTIPSKVLTMEELGMGEVF  115
DnPilT_Pt   TDAQMKSFEEKWEADFSTEIRGVSRFRVNVFQQNRGMGIVFRTIPSKVLSLEDLKAPAKF  118
NgPilT_Pt   NDRQRKIYQQNLEVDFSFELPNVARFRVNAFNTGRGPAAVFRTIPSTVLSLEELKAPSIF  115
NmPilT_Pt   NDHQRKIYQQNLEVDFSFELPNVARFRVNAFNTGRGPAAVFRTIPSTVLSLEELKAPSIF  115
RsPilT_Pt   SDVQRKHYEENLEADFSFEIPGLSRFRVNAFNQNRGAAAVFRTIPSKVLTLEDLKAPAVF  115
VcPilT_Pt   NDAQRSEYEEKLEVDFSFELPNVGRFRVNAFHQARGCSAVFRTIPTVIPTLEQLDAPEIF  115
NsPilT_Cy   NNTQRKTLEQTWELDCSYGVKGLARFRVNVYKERGAYAACLRALSSKIPNFEKLGLPDVV  117
SsPilT_Cy   NNSQRKELEQNWELDCSYGVKGLARFRINVYKERGCYAACLRALSSKIPNFEQLGLPNIV  119
AaPilT      SEKHRQKLEENGQVDFSFGVRGVGRFRANVFYQRGSVAAALRSLPAEIPEFKKLGLPDKV  117
(3a)

Highlighted positions: 154, 159, 160, 163
XfPilT_Pt   RKLIEEPQGLILVTGPTGSGKSTTLAAMIDHINKNAHGHILTIEDPIEFVHTSQKCLINQ  175
XaPilT_Pt   RQLIDQPQGLILVTGPTGSGKSTTLAGMIDYINKNEYGHILTVEDPIEFVHTSQKCLINQ  175
PsPilT_Pt   RKITDVARGLILVTGPTGSGKSTTLAAMIDYLNCNKHHHILTIEDPIEFVHESKKCLVNQ  175
PaPilT_Pt   KRVSDVPRGLVLVTGPTGSGKSTTLAAMLDYLNNTKYHHILTIEDPIEFVHESKKCLVNQ  175
DnPilT_Pt   VDIIDVPRGLVLVTGPTGSGKSTTLAAMIDHINNNRHEHILTVEDPIEFVHESKKCLVNQ  178
NgPilT_Pt   QKIAESPRGMVLVTGPTGSGKSTTLAAMINYINETQPAHILTIEDPIEFVHQSKKSLINQ  175
NmPilT_Pt   QKIAESPRGMVLVTGPTGSGKSTTLAAMINYINETQPAHILTIEDPIEFVHQSKKSLINQ  175
RsPilT_Pt   SDLAMKPRGLVLVTGPTGSGKSTTLAAMVNHRNESDLGHILTVEDPIEFVHESKKSLINQ  175
VcPilT_Pt   SKIANYEKGLVLVTGPTGSGKSTTLAAMVNYVNAHHNKHILTIEDPIEFVHSNNKCLINQ  175
NsPilT_Cy   REMCDKPRGLILVTGPTGSGKTTTLAAMIDLINRTKAEHILTVEDPIEFVYEPIKSLVHQ  177
SsPilT_Cy   REMAERPRGLILVTGQTGSGKTTTLAAILDLINRTRAEHILTIEDPIEYVFPNVRSLFHQ  179
AaPilT      LELCHRKMGLILVTGPTGSGKSTTIASMIDYINQTKSYHIITIEDPIEYVFKHKKSIVNQ  177

Highlighted positions: 176, 184, 187, 194, 196, 198, 204, 206, 207, 209, 219, 222, 229, 235
XfPilT_Pt   REVHRDTHDFNKALSSALREDPDIILIGELRDLETIRLALTAAETGHLVFGTLHTNSAAK  235
XaPilT_Pt   REVHRDTHGFNEALRSALREDPDIILVGELRDLETIRLALTAAETGHLVFGTLHTSSAAK  235
PsPilT_Pt   REVHRDTLGFSEALRSALREDPDVILVGEMRDLETIRLALTAAETGHLVFGTLHTTSAAK  235
PaPilT_Pt   REVHRDTLGFSEALRSALREDPDIILVGEMRDLETIRLALTAAETGHLVFGTLHTTSAAK  235
DnPilT_Pt   REVHRDTQSFSNALRAALREDPDIILVGELRDLETIRLALTAAETGHLVFGTLHTSSAAK  238
NgPilT_Pt   RELHQHTLSFANALSSALREDPDVILVGEMRDPETIGLALTAAETGHLVFGTLHTTGAAK  235
NmPilT_Pt   RELHQHTLSFANALSSALREDPDVILVGEMRDPETIGLALTAAETGHLVFGTLHTTGAAK  235
RsPilT_Pt   RELGPHTHSFANALKSALREDPDVVLVGELRDLETIRLALTAAETGHLVFGTLHTSSAAK  235
VcPilT_Pt   REVHRDTHSFKNALRSALREDPDVILVGELRDQETISLALTAAETGHLVFGTLHTSSAAK  235
NsPilT_Cy   RQLGEDTKSFANALKAALREDPDIVLVGEMRDLETISLAISAAETGHLVFGTLHTSSASQ  237
SsPilT_Cy   RQRGEDTKSFSNALRAALREDPDIVLVGELRDLETIALAITAAETGHLVFGTLHTNSAAG  239
AaPilT      REVGEDTKSFADALRAALREDPDVIFVGEMRDLETVETALRAAETGHLVFGTLHTNTAID  237
(3b)
(Fig. 3) contd.

Highlighted positions: 239, 242, 248, 249, 258
XfPilT_Pt   SINRIIDVFPAAEKPMVRSMLSESLCGIISQMLLKKVGG-----GRTAAWEIMVGTPAIR  290
XaPilT_Pt   TIDRIIDVFPAGEKPMVRSMLSESLRAVISQALLKKVGG-----GRTAAWEIMVGTPAIR  290
PsPilT_Pt   TIDRIVDVFPAQEKSMIRSMLSESLHAVVSQALLKKVGG-----GRVAAHEIMMGTPAIR  290
PaPilT_Pt   TIDRVVDVFPAEEKAMVRSMLSESLQSVISQTLIKKIGG-----GRVAAHEIMIGTPAIR  290
DnPilT_Pt   TIDRIIDVFPGEEKQLVRSMLSESLRAVIAQTLLKKIGG-----GRVAAHEVLVGTSAVK  293
NgPilT_Pt   TVDRIVDVFPAGEKEMVRSMLSESLTAVISQNLLKTHDGN----GRVASHEILIANPAVR  291
NmPilT_Pt   TVDRIVDVFPAGEKEMVRSMLSESLTAVISQNLLKTHDGN----GRVASHEILIANPAVR  291
RsPilT_Pt   TIDRVVDVFPSDEKDMVRTMLSESLEAVISQTLLKTRDGS----GRVAAHEIMICTPAIR  291
VcPilT_Pt   TIDRIIDVFPGSDKDMVRSMLSESLRAVIAQKLLKRVGG-----GRVACHEIMLATPAIR  290
NsPilT_Cy   TVDRIIDVFPHEKQTQVRVQLSNSLVAVFSQTLVPKKNPKPGEYGRVMAQEIMIITPAIS  297
SsPilT_Cy   TIDRMLDVFPANQQAQIRAMLSNSLLAVFAQNLVKKKSPKPGEFGRALVQEIMVITPAIA  299
AaPilT      TIHRIVDIFPLNQQEQVRIVLSFILQGIISQRLLPKIGG-----GRVLAYELLIPNTAIR  292

Highlighted positions: 294
XfPilT_Pt   NLIREDKVAQMYSSIQTG-QQYGMQTLDQHLQDLIKRNLITRQQAREYAK-DKANF----  344
XaPilT_Pt   NLIREDKVAQMYSSIQTG-QQYGMQTLDQHLQDLVKRSLITRNQAREYAK-DKRIFE---  345
PsPilT_Pt   NLIREDKVAQMYSSIQTG-GSLGMQTLDMCLADLVKKGLITRESARERAK-VPDNF----  344
PaPilT_Pt   NLIREDKVAQMYSAIQTG-GSLGMQTLDMCLKGLVAKGLISRENAREKAK-IPENFGAAA  348
DnPilT_Pt   NLIREDKVAQIYSTIQTG-SQYGMQTLDQALSALVKEGKVDRMLAASKAH-DKDNFM---  348
NgPilT_Pt   NLIRENKITQINSVLQTG-QASGMQTMDQSLQSLVRQGLIAPEAARRRAQ-NSESMSF--  347
NmPilT_Pt   NLIRENKITQINSVLQTG-QASGMQTMDQSLQSLVRQGLIAPEVARRRAQ-NSESMSF--  347
RsPilT_Pt   HLIRENKISQMYSMMQTS-SGLGMQTLDQCLAELIKRSAINYADARAIAK-NPDAFAN--  347
VcPilT_Pt   NLIREDKVAQMYSIIQTG-AAHGMQTMEQNAKQLIARGVVDAQEVQSKIELDLKAF----  345
NsPilT_Cy   NLIREGKTSQIYSAIQTG-GKLGMQTLEKVLADYYKSGTISFEAAMSKTS-KPDEIQRLI  355
SsPilT_Cy   NLIREGKAAQIYSAIQTG-AKLGMQTMEQGLATLVVSGVISLEEGLAKSG-KPDELQRLI  357
AaPilT      NLIRENKLQQVYSLMQSGQAETGMQTMNQTLYKLYKQGLITLEDAMEASP-DPKELERMI  351
(3c)
Fig. (3). Alignment of the primary sequences (obtained with ClustalW) of the PilTs of pathogenic Proteobacteria that make use of movement driven by the type IV pilus: Xylella fastidiosa (XfPilT_Pt), Xanthomonas axonopodis pv. citri (XaPilT_Pt), Pseudomonas syringae pv. tabaci (PsPilT_Pt), Pseudomonas aeruginosa (PaPilT_Pt), Ralstonia solanacearum (RsPilT_Pt), Vibrio cholerae (VcPilT_Pt), Dichelobacter nodosus (DnPilT_Pt), Neisseria gonorrhoeae (NgPilT_Pt) and Neisseria meningitidis (NmPilT_Pt); of free-living bacteria essential for keeping the balance of countless ecosystems, the Cyanobacteria Nostoc sp. (NsPilT_Cy) and Synechocystis sp. (SsPilT_Cy); and of the bacterium of the phylum Aquificae, Aquifex aeolicus (AaPilT). The GI identifiers of the sequences are also indicated. Codes of the pathogenic Proteobacteria PilTs are marked with a black background and those of Cyanobacteria in gray; A. aeolicus is marked in white. The most important residues in the interface, suggested as targets (those present in at least one chain of the hexameric Xf PilT models, with at least 10 kcal/mol of interface contact energy and more than 1 Å2 of area exposed to the solvent when in complex, that are capable of establishing hydrogen bonds and/or contacts of electrostatic nature and, finally, that are preferably present in pockets), are highlighted as follows: the residues highlighted by long gray bars are possible targets that occur in the PilTs of all the organisms used in the primary sequence alignment; the residues highlighted by dark gray bars with a length equal to that of the bar of identifiers of the pathogenic proteobacteria are possible specific targets for this category of microorganisms and may be interpreted as preferred targets for the design of drugs to combat these pathogens collectively; the small dark gray bars represent possible targets that occur only in the Xylella fastidiosa PilT, these being interpreted as targets unique to Xf; the transparent small bars show targets that vary in amino acid (e.g., D33 to E33) but whose substitute amino acid has similar properties, a replacement considered “positive” because it scores positively in the BLOSUM-62 matrix according to Altschul and co-workers [49]; the black circles represent residues with a varied distribution among the organisms.
residue, which is specific to Xf (Fig. 4). In this case, several ligands were manually designed de novo on the basis of shape, charge and chemical complementarity. These compound structures were used as queries in database searches for chemically similar ligands using the PubChem structure search tool. All the designed ligands and the similar ligands from PubChem were then used in rounds of docking and ranking using Molegro Virtual Docker and manual optimization. Although the design of an optimized chemical compound is beyond the scope of this work, one of the resulting ligands, which was among the top-scored poses (by MolDock Score), can be seen in Fig. (5), where its shape and chemical
complementarity is displayed. The ligand also stood out for being able to establish favorable contacts with three TSRs: E89, R90 and E74.
Blocking Substrate Access to Enzyme Active and Binding Site by Appropriate Ligands
A ligand-based drug design approach was used to identify, and later modify, ligands so that a competitive inhibitor of PG could be proposed. Searches in the PubChem database of compound structures were performed in order to find ligands that are more than 90% chemically similar to the naturally occurring substrate, galacturonate
Fig. (4). Structural representation of an example of a TSR patch comprising the three residues exclusively found in pathogenic proteobacteria and specifically in Xf PilT: the residues E89 and R90 from chain A and E74 from chain F of the XfAa1 complex. Residues in gray at the right side of this figure are from chain A, and residues in black (left side of the figure) are from chain F. The mesh represents one pocket that could accommodate a ligand as predicted by the MVD software. The residues E89 and R90 were considered TSRs in all three models (XfAa1, XfPa1 and XfPa2), showing a constitutive role in pocket formation in all PilT conformations.
Fig. (5). Representation of the surfaces that delimit the pocket surrounding the TSRs E89, R90 and E74, showing the shape and charge complementarity of a potential ligand designed by using SBDD (5a). The ligand is shown in ball-and-stick representation; light gray indicates rotatable bonds and dark gray indicates rigid bonds. The surface is colored dark where the net electrostatic potential is positive and light gray where it is negative. The residues interacting with the potential ligand are shown as sticks in 5b, with the residues that establish the highest contact energies labeled. Hydrogen bonds are shown as dashed lines.
(based on the Tanimoto coefficient). The PubChem similarity search uses its own dictionary-based binary fingerprint, consisting of a series of chemical substructure “keys”. Each substructure key corresponds to the occurrence of a particular substructure in a molecule (PubChem fingerprints are available from ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt). Structure-based drug design was also employed in order to identify (using virtual high-throughput screening, vHTS) those compounds that might bind in the central region delimited as the binding site. New ligands were then designed in order to optimize the spatial fit and obtain higher-affinity inhibitors. In addition to the fungal PG sequences, the plant ones (Fig. 6, white background) were also aligned in order to identify the most important conserved residues involved in catalysis and
also to spot PG residues exclusive to pathogenic organisms that are spatially close to the catalytic and/or ligand sites. Such identification might point to additional targets that could be preferentially used for binding a specific ligand which, and this is of great importance to the whole endeavor, would not bind to plant PGs. The plant PG sequences used were: Arabidopsis thaliana (15230328, AtPG_Plant); Brassica rubra (21530799, BrPG_Plant) and Solanum lycopersicum (7381227, SlPG_Plant). From Fig. (6), one can clearly see that residues N189, D191, D212, D213, H234, G235, R267, K269 and T302 are well conserved. The PG sequences from different pathogenic fungi used for the multiple sequence alignment (Fig. 6, gray background), with their respective NCBI identifiers and GenPept codes, are listed in Table 2.
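The similarity criterion used in the PubChem searches mentioned above, the Tanimoto coefficient on binary fingerprints, can be illustrated with a minimal sketch. Each fingerprint is represented here as the set of indices of its "on" substructure keys; the bit indices and ligand names are hypothetical and do not correspond to real PubChem keys or compounds.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


# Hypothetical fingerprints; the bit indices are made up for illustration.
query = {1, 4, 9, 15, 22, 31, 40, 57, 63, 70}
candidates = {
    "ligand_A": {1, 4, 9, 15, 22, 31, 40, 57, 63, 70},
    "ligand_B": {1, 4, 9, 15, 22, 31, 40, 57},
    "ligand_C": {1, 4, 9, 88, 90},
    "ligand_D": {1, 4, 9, 15, 22, 31, 40, 57, 63, 70, 71},
}

THRESHOLD = 0.90  # "more than 90% chemically similar", as stated in the text

hits = sorted(
    name for name, fp in candidates.items() if tanimoto(query, fp) > THRESHOLD
)
print("Candidates above the similarity threshold:", hits)
```

The coefficient is the number of keys shared by both molecules divided by the number of keys present in either, so it ranges from 0 (no shared keys) to 1 (identical fingerprints).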
Highlighted positions: 188, 189, 191, 194, 212, 213
AnPGA1      TIDNSDGDD--------NGGHNTDGFDISESTGVYISGATVKNQDDCIAINSGE-SISFT  189
AfPG        TIDNSAGTA--------EG-HNTDAFDVGSSTYINIDGATVYNQDDCLAINSGS-HITFT  216
AnPGA2      TINNADGDT--------QGGHNTDAFDVGNSVGVNIIKPWVHNQDDCLAVNSGE-NIWFT  215
FgPG        HMDNSLGDT--------QGGHNTDAFDVGSSTGVYISGAVVKNQDDCLAINSGT-NITFT  214
FoPG        HMDNSLGDS--------LGGHNTDAFDVGSSTGVYISGAVVKNQDDCLAINSGT-NITFT  215
ClPGA       IIDNSAGDS--------AGGHNTDAFDVGSSTGVYISGANVKNQDDCLAINSGT-NITFT  190
CpPGA       TMDNSAGAS--------KG-HNTDAFDVGSSENIYISGAVINNQDDCLAINSGT-NITFT  222
CcPG        TIDNSAGDS--------AGAHNTDAFDIGSSSGITISNANIKNQDDCVAINSGS-DIHVT  217
SsPG        TIDNSAGNS---------LGHNTDAFDVGSSTDITISGANVQNQDDCLAINSGT-GITFT  236
BfPG1       HIDNSAGDAG-------KLGHNTDAFDVGSSSDITISGANVQNQDDCLAINSGT-GITFT  221
AaPGA       TIDNSDGDD--------NGGHNTDAFDIGTSTYVTISGATVYNQDDCVAVNSGE-NIYFS  194
SpPG1       TVDDFAGDTK-------NLGHNTDGFDVSAN-NVTIQNCIVKNQDDCIAINDGN-NIRFE  187
FmPGA       ILDNRAGDKPNAKSGSLPAAHNTDGFDISSSDHVTLDNNHVYNQDDCVAVTSGT-NIVVS  202
AtPG_Plant  EITAPGDSP------------NTDGIHITNTQNIRVSNSDIGTGDDCISIEDGTQNLQIF  279
BrPG_Plant  KITAPGDSP------------NTDGIHIVATKNIRISNSDIGTGDDCISIEDGSQNVQIN  281
SlPG_Plant  VVKAPGDSP------------NTDAIHISSSTQVNVKDCIIGTGDDCISIVGNSSRIKVK  274

Highlighted positions: 234, 239, 240, 267, 269
AnPGA1      GGTCSGGHGLSIGSVGGRDD-NTVKNVTISDSTVSNSANGVRIKTIYKET-GDVSEITYS  247
AfPG        NGYCDGGHGLSIGSVGGRSD-NTVEDVTISNSKVVNSQNGVRIKTVYDAT-GTVSNVKFE  274
AnPGA2      GGTCIGGHGLSIGSVGDRSN-NVVKNVTIEHSTVSNSENAVRIKTISGAT-GSVSEITYS  273
FgPG        GGNCSGGHGLSIGSVGGRSN-NDVKTVRILNSSISNSDNGVRIKTVSGAT-GSVSDVKYD  272
FoPG        GGNCSGGHGLSIGSVGGRSD-NTVKTVRILNSSISNSDNGVRIKTVSGAT-GSVSDVKYD  273
ClPGA       GGTCSGGHGLSIGSVGGRSD-NTVKTVTISNSKIVNSDNGVRIKTVSGAT-GSVSGVTYS  248
CpPGA       SGSCTGGHGLSIGSVGGRSD-NTVKTVSITNSKIINSQNGVRIKTVYDAT-GSVSDVTYS  280
CcPG        NCQCSGGHGVSIGSVGGRKD-NTVKGVVVSGTTIANSDNGVRIKTISGAT-GSVSDITYE  275
SsPG        GGTCSGGHGLSIGSVGGRSD-NVVSDVIIESSTIKNSANGVRIKTVSGAT-GSVSGVTYK  294
BfPG1       GGTCSGGHGLSIGSVGGRSD-NTVSDIIIESSTVKNSANGVRIKTVSGAT-GSVSGVTYK  279
AaPGA       GGYCSGGHGLSIGSVGGRSD-NTVKNVTFVDSTIINSDNGVRIKTNIDTT-GSVSDVTYK  252
SpPG1       NNQCSGGHGISIGSIATG---KHVSNVVIKGNTVTRSMYGVRIKAQRTATSASVSGVTYD  244
FmPGA       NMYCSGGHGLSIGSVGGKSD-NVVDGVQFLSSQVVNSQNGCRIKSNSGAT-GTINNVTYQ  260
AtPG_Plant  DLTCGPGHGISIGSLGDDNSKAYVSGINVDGAKFSESDNGVRIKTYQGGS-GTAKNIKFQ  338
BrPG_Plant  DLTCGPGHGISIGSLGDDNSKAYVSGINVDGATLSETDNGVRIKTYQGGS-GTAKNIKFQ  340
SlPG_Plant  DIVCGPGHGISIGSLGKSNSFSQVYNVHVNGASISNTENGVRIKTWQGGS-GFVKKVSFE  333

Highlighted positions: 302, 305
AnPGA1      NIQLSGITDYGIVIEQDYENGSPTGTPSTGIPITDVTVDGVTGTLED--DATQVYILCGD  305
AfPG        DITLSGITKYGLIVEQDYENGSPTGTPTNGIKVSDITFDKVTGTVES--DATDIYILCGS  332
AnPGA2      NIVMSGISDYGVVIQQDYEDGKPTGKPTNGVTIQDVKLESVTGSVDS--GATEIYLLCGS  331
FgPG        SITLSNIAKYGIVIEQDYENGSPTGTPTAGVPITDVTINKVTGSVKS--SGTDIYILCAS  330
FoPG        TITLSNIAKYGIVIEQDYENGSPTGTPTAGVPMTDVTINKVTGTVSSP-AGTEVYILCAN  332
ClPGA       GITLSNIAKYGIVIEQDYENGSPTGTPTNGVPITGLTLSKITGSVAS--SGTNVYILCAS  306
CpPGA       GITLSGITNYGIVIEQDYENGSPTGTPTTGVPITGLTVSKVTGSVAS--SATDVYILCGK  338
CcPG        NITLKNIAKYGIVIEQDYLNGGPTGKPTTGVPITGVTLKNVAGSVTG--SGTEIYVLCGK  333
SsPG        DITLSGITKYGVVIEQDYENGSPTGKPTSGVPITGVTLSNVHGTVSS--SATNVYVLCAK  352
BfPG1       DITLSGITSYGVVVQQDYKNGSPTGKPTSGVPITDVTFSNVKGTVSS--SATNVYVLCAK  337
AaPGA       DITLTSIAKYGIVVQQNYGDTSST--PTTGVPITDFVLDNVHGSVVS--SGTNILISCGS  308
SpPG1       ANTISGIAKYGVLISQSYP--DDVGNPGTGAPFSDVNFTGGATTIKVNNAATRVTVECGN  302
FmPGA       NIALTNISTYGVDVQQDYLNGGPTGKPTNGVKISNIKFIKVTGTVAS--SAQDWFILCGD  318
AtPG_Plant  NIRMENVKNPIIIDQDYCDKDK-CEDQESAVQVKNVVYKNISGTSAT--DVAITLNCSEK  395
BrPG_Plant  NIRMDNVKNPIIIDQNYCDKDK-CEQQESAVQVNNVVYRNIQGTSAT--DVAIMFNCSVK  397
SlPG_Plant  NVWMENVSNPIIIDQYYCDSRKPCSNKTSNIHIDNISFMGIKGTSAT--ERAITLACSDS  391
Fig. (6). The alignment of PG sequences from different pathogenic fungi (gray background), with their respective NCBI identifiers and GenPept codes: PG from Fusarium moniliforme, 17942538, FmPGA; hypothetical protein from Fusarium graminearum, 46138993, FgPG; from Aspergillus niger, 39654258, AnPGA1 and 6435555, AnPGA2; from Fusarium oxysporum f. sp. lycopersici, 3348099, FoPG; from Cryphonectria parasitica, 1208810, CpPGA; from Cochliobolus carbonum, 167221, CcPG; from Sclerotinia sclerotiorum, 156044128, SsPG; from Botrytis cinerea (Botryotinia fuckeliana), 125629516, BfPG1; from Aspergillus flavus, 238490452, AfPG; from Stereum purpureum, 21465803, SpPG1; from Colletotrichum lupini, 159794838, ClPGA; and from Aspergillus aculeatus, 15988279, AaPGA. The plant PGs (white background) are: Arabidopsis thaliana, 15230328, AtPG_Plant; Brassica rubra, 21530799, BrPG_Plant and Solanum lycopersicum, 7381227, SlPG_Plant. The residues N189, D191, D212, D213, H234, G235, R267, K269 and T302 are well conserved. The most important residues for substrate binding, highly conserved among a variety of phytopathogenic fungi and plants, are highlighted within long light and dark gray bars, where dark gray indicates the catalytic triad of three aspartates (D191, D212 and D213). On the other hand, the completely transparent bars with black delimiting lines highlight residues H188, D194 and G305, which are considered “preferred targets” because they are spatially close to the catalytic residues and the substrate binding site residues but occur exclusively in PGs of phytopathogenic fungi (not in plants).
714 Current Protein and Peptide Science, 2015, Vol. 16, No. 8
Table 2. PG sequences from different pathogenic fungi, with their NCBI identifiers and GenPept codes, used in the multiple sequence alignment shown in Fig. (6).

Species                                    NCBI ID (GI)    GenPept code
Fusarium moniliforme                       17942538        FmPGA
Fusarium graminearum*                      46138993        FgPG
Aspergillus niger                          39654258        AnPGA1
Aspergillus niger                          6435555         AnPGA2
Fusarium oxysporum f. sp. lycopersici      3348099         FoPG
Cryphonectria parasitica                   1208810         CpPGA
Cochliobolus carbonum                      167221          CcPG
Sclerotinia sclerotiorum                   156044128       SsPG
Botrytis cinerea (Botryotinia fuckeliana)  125629516       BfPG1
Aspergillus flavus                         238490452       AfPG
Stereum purpureum                          21465803        SpPG1
Colletotrichum lupini                      159794838       ClPGA
Aspergillus aculeatus                      15988279        AaPGA

*hypothetical protein
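The distinction drawn in Fig. (6) between residues shared by all PGs and residues found only in the fungal PGs can be illustrated with a short sketch. It is not the procedure used in this work: the function name `selective_columns`, the 80% conservation threshold and the five-column toy rows are assumptions made only for illustration. The returned indices are alignment columns, not residue numbers.

```python
from collections import Counter


def selective_columns(pathogen_rows, host_rows, min_share=0.8):
    """Return (column, residue) pairs where a single residue dominates the
    pathogen rows and does not occur in any host row at that column.

    min_share is an illustrative conservation threshold (an assumption).
    """
    length = len(pathogen_rows[0])
    if any(len(row) != length for row in pathogen_rows + host_rows):
        raise ValueError("all aligned rows must have the same length")

    hits = []
    for col in range(length):
        # Most frequent residue among the pathogen sequences at this column.
        residue, count = Counter(row[col] for row in pathogen_rows).most_common(1)[0]
        if residue == "-":
            continue  # a conserved gap is not a target
        share = count / len(pathogen_rows)
        host_residues = {row[col] for row in host_rows}
        if share >= min_share and residue not in host_residues:
            hits.append((col, residue))
    return hits


# Toy aligned rows (made up, not taken from Fig. 6).
fungal = ["HNDGA", "HNDGS", "HNDGT"]
plant = ["QNDAA", "ENDSA"]
result = selective_columns(fungal, plant)
print(result)  # [(0, 'H'), (3, 'G')]
```

In the toy input, columns 1 and 2 are conserved in both groups, so they are not selective; columns 0 and 3 are conserved in the fungal rows only and are reported. Spatial proximity to the catalytic site, the second criterion named in the text, would still have to be checked on the three-dimensional structure.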
The most important residues for substrate binding, highly conserved among a variety of phytopathogenic fungi and plants, are highlighted within long light and dark gray bars, where dark gray indicates the catalytic triad of three aspartate residues (D191, D212 and D213). In contrast, the transparent bars with black outlines highlight residues H188, D194 and G305, which are considered “preferred targets” because they are spatially close to the catalytic and substrate binding site residues yet occur exclusively in PGs of phytopathogenic fungi (not in plants). The work then followed the steps already described for the Xf case, yielding a design of specific ligands for the preferred targets in PGs from phytopathogenic fungi, as shown in Fig. (7).

DISCUSSION

The two approaches presented above aimed new ligand design at distinct inhibitory processes: in the first, the ligand prevents the interfacing of two protein monomers within the hexameric biological unit of the functional ensemble; in the second, the ligand impedes access of the native substrate to the enzyme catalytic site. A comparison makes it clear that several steps are common to both. Namely, distinguishing the desired (preferred) targets from those that should be left untouched (targets from free-living organisms and targets from plant PGs in the two cases described) is a key consideration, needed to avoid complications from collateral ligand effects as well as environmental issues. A key step is the careful selection of the three-dimensional nanoenvironments of the preferred targets against which a new chemical compound is to be designed. The TSR selection is also of great importance, since drug design and docking programs can then be limited to searching a small patch instead of trying all possibilities over the whole protein surface. The TSR should also be connected to biological significance, such as the dynamics of the protein in the first example and the enzymatic activity itself in the second. Molecular docking is common to both examples and is considered an essential technique in SBDD.
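The point that a TSR selection lets a docking program search a small patch rather than the whole surface can be made concrete with a minimal sketch. It assumes the docking region is described as a box given by a center and edge lengths; the function name `search_box`, the 4 Å padding and the toy coordinates are illustrative assumptions and are not taken from this work.

```python
def search_box(coords, padding=4.0):
    """Return (center, size) of an axis-aligned box that encloses the given
    atom coordinates, enlarged by `padding` on every side.

    coords: iterable of (x, y, z) tuples for the atoms of the selected residues.
    padding: margin added around the atoms (illustrative value, same units
    as the coordinates).
    """
    coords = list(coords)
    if not coords:
        raise ValueError("at least one coordinate is required")

    xs, ys, zs = zip(*coords)
    lows = tuple(min(axis) - padding for axis in (xs, ys, zs))
    highs = tuple(max(axis) + padding for axis in (xs, ys, zs))

    center = tuple((lo + hi) / 2 for lo, hi in zip(lows, highs))
    size = tuple(hi - lo for lo, hi in zip(lows, highs))
    return center, size


# Toy coordinates standing in for atoms of three selected residues (made up).
tsr_atoms = [(10.0, 0.0, 5.0), (14.0, 2.0, 5.0), (12.0, -2.0, 9.0)]
center, size = search_box(tsr_atoms)
print(center)  # (12.0, 0.0, 7.0)
print(size)    # (12.0, 12.0, 12.0)
```

Only ligand poses inside this box would then be sampled, which is what keeps the search local to the selected residues.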
Fig. (7). One of the designed ligands, predicted to interact with residues H234, R267, K269 and Y302, all belonging to the binding site ensemble in PG from F. moniliforme. The same ligand showed very similar interaction energies with the corresponding residues in PG from Colletotrichum lupini. Ligand 602 was top ranked in ADMET tests, showing good values for cLogP, solubility, drug-likeness and drug score and, very importantly, no predicted toxicity risks.
Other steps are clearly similar between the two approaches and could even be encapsulated in a routine procedure, generating a workflow applicable to a number of similar undertakings. In our case we used Pipeline Pilot from Accelrys (now BIOVIA); Fig. (8) shows the resulting workflow. The user proceeds from the entry point, where genome sequences are presented and their corresponding entries are identified in the protein sequence bank and the protein structure databank. Missing structures (not yet determined experimentally) are modelled on identified templates (Fig. 8a). The modelled protein structures are then “prepared” (a term for the procedure in which missing atoms are added, interatomic bonds are verified and corrected if wrong, partial charges are properly set and the N and C termini are capped, among other steps) for processing in the next steps, where molecular docking, molecular dynamics and energy minimization are applied (Fig. 8b). In parallel, ligands are selected from large public databases such as PubChem and then “prepared” for further processing in much the same way as the proteins. Ligands are also subjected to ADMET filtration, that is, they must pass selected, already established threshold values for Absorption, Distribution, Metabolism, Excretion and Toxicity (Fig. 8c). Then, target sites on the protein surfaces are identified before molecular docking between targets and chemical compounds (ligands) is performed (Fig. 8d). Finally, the ranking of the best pairs of preferred targets and selected ligands is offered to the user for inspection before the final candidate ligands are taken forward for further consideration, most likely in experimental in vitro tests.

[Fig. (8), workflow panels]
(a) Text Reader; @Identifier; Swiss-Prot Sequence Fetcher; Model builder; Reporter
(b) DS Molecule Reader; Protein Preparation; Write PDB File
(c) DS Ligand Reader; ADMET All Models; Ligand Preparation; Conformation Generator Component; DS Mol2 Writer
(d) PDB Reader; Set Prefix; Site Detection; Get site Properties; Write PDB File; Ligand Reader; Generate Empty Data; Site Counter; Reached max binding site?; Set Site File; Docking and Scoring; Reporter

Fig. (8). The Pipeline Pilot (Accelrys, now BIOVIA) procedure, designed to first identify genome sequences and their corresponding entries in the protein structure databank. Missing structures (not yet determined experimentally) are modelled on identified templates (Fig. 8a). The modelled protein structures are then “prepared” (a term for the procedure in which missing atoms are added, interatomic bonds are verified and corrected if wrong, partial charges are properly set, the N and C termini are capped, etc.) for processing in the next steps, where molecular docking, molecular dynamics and energy minimization are applied (Fig. 8b). In parallel, ligands are selected from large public databases such as PubChem and then “prepared” for further processing in much the same way as the proteins. Ligands are also subjected to ADMET filtration, that is, they must satisfy selected, already established threshold values for Absorption, Distribution, Metabolism, Excretion and Toxicity (Fig. 8c). Then, target sites on the protein surfaces are identified before molecular docking between targets and chemical compounds is performed (Fig. 8d). Finally, the ranking of the best pairs of preferred targets and selected ligands is offered to the user for inspection before the final candidate ligands are selected for further consideration, most likely in experimental tests.

CONCLUSION

As follows from the above, designing new inhibitors against diseases or pest attacks is a complex but realistically feasible procedure which can be (at least) partially automated. Particular care must be given to the factors highlighted above. The described procedures are expected to gain ground as they become more widely known, particularly in the agrochemical industry, benefiting an agribusiness that is currently hungry for knowledge and technology.
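To illustrate what partial automation of such a procedure may look like, the sketch below chains two of the steps named in the workflow: a property filter on the ligands, followed by ranking of target site and ligand pairs by a docking score. It is not the Pipeline Pilot protocol itself. The `Ligand` fields, the cLogP cutoff of 5, the site names and all scores are hypothetical, and the scoring function is a lookup table standing in for a real docking program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ligand:
    name: str
    clogp: float         # predicted lipophilicity (hypothetical values below)
    toxicity_flags: int  # number of predicted toxicity alerts


def passes_filter(ligand, max_clogp=5.0):
    """Illustrative ADMET-style filter: assumed cLogP cutoff, no toxicity alerts."""
    return ligand.clogp <= max_clogp and ligand.toxicity_flags == 0


def rank_pairs(sites, ligands, score_fn, top_n=3):
    """Score every (site, filtered ligand) pair and return the best top_n.

    score_fn(site, ligand) is supplied by the caller; lower means better,
    as with interaction energies.
    """
    kept = [lig for lig in ligands if passes_filter(lig)]
    scored = [(score_fn(site, lig), site, lig.name) for site in sites for lig in kept]
    return sorted(scored)[:top_n]


# Toy data (made up for illustration only).
ligands = [
    Ligand("lig_a", 2.1, 0),
    Ligand("lig_b", 6.3, 0),  # rejected: cLogP above the cutoff
    Ligand("lig_c", 3.0, 1),  # rejected: toxicity alert
    Ligand("lig_d", 1.5, 0),
]
sites = ["site_1", "site_2"]
toy_scores = {
    ("site_1", "lig_a"): -7.2,
    ("site_1", "lig_d"): -5.1,
    ("site_2", "lig_a"): -6.0,
    ("site_2", "lig_d"): -8.4,
}

ranked = rank_pairs(sites, ligands, lambda s, lig: toy_scores[(s, lig.name)], top_n=2)
print(ranked)  # [(-8.4, 'site_2', 'lig_d'), (-7.2, 'site_1', 'lig_a')]
```

Filtering before docking keeps the expensive step restricted to ligands that could plausibly be taken forward, and the ranked list is what would be offered to the user for inspection.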
We discussed in detail the search for druggable targets at protein-protein interfaces implicated in the pathogenesis of specific diseases (such as the plant-damaging CVC and Pierce's disease), and indicated the same approach as a viable solution for controlling some pathogens of great importance for human health (e.g. those causing cholera, meningitis and gonorrhea). The identification of protein interacting surfaces is a key initial step and can be undertaken with the help of computational biology tools [10]. Alternative algorithms for finding binding sites at interfaces are also reported in the literature, such as the work of Hubert Li and co-workers [45] and of Fuller and co-workers [46]. The basic intention of this work is to illustrate how important interacting protein interfaces are for a large number of disease-implicated pathways and, in particular, how they may be analyzed using free platforms such as BlueStar STING in order to suggest therapeutic interventions projected to have a beneficial effect against diseases caused by plant and animal pathogenic bacteria.

The second approach, more traditional and in use since the early 1990s, is based on deriving small chemicals (ligands) through structure- and ligand-based drug design so as to satisfy the spatial and physicochemical complementarity of an existing druggable receptor, usually a catalytic and/or substrate binding site [11] on the enzyme. In the approach described here we emphasized the importance of the conserved binding site portions of the pockets identified as druggable loci. Such methodology has been described in many scientific papers [47, 48] and reviews, but here it was of crucial importance to us to lay out a step-by-step procedure that can be bound into a type of protocol easily adaptable to other biologically relevant problems.

Computational biology, in its structural molecular branch, is a viable resource for consulting, practicing and analyzing biological problems related to agrochemical and drug design. Both agrochemicals and drugs, as chemically active ingredients, obey similar nanoenvironment selective guidance before reaching either an interface binding site or a catalytic binding site, where they dock and thereby impair protein functionality. Research in this particular area of agribusiness-related diseases will benefit enormously from both investing in and using basic science research, and from seeking adequate solutions in applications whose objective is not only to eliminate problems such as the rising resistance of pathogens to existing pesticides but also to find new active ingredients capable of dealing appropriately with the challenges imposed by a variety of diseases.
Undoubtedly, the compilation of a new chemical compound inventory will have a decisive impact on both agriculture and human health during the 21st century.

CONFLICT OF INTEREST

The authors confirm that this article content has no conflict of interest.
ACKNOWLEDGEMENTS

Declared none.

SUPPLEMENTARY MATERIAL

Supplementary material is available on the publisher's web site along with the published article.

REFERENCES

[1] Walter, M. Structure-based design of agrochemicals. Nat. Prod. Rep., 2002, 19(3), 278-291.
[2] Liu, T.; Altman, R.B. Identifying druggable targets by protein microenvironments matching: application to transcription factors. CPT Pharmacometrics Syst. Pharmacol., 2014, 3, e93.
[3] Merz, K.M.; Ringe, D.; Reynolds, C.H. Drug Design: Structure- and Ligand-Based Approaches; Cambridge University Press, 2010; 287 p.
[4] Varghese, J.N. Development of neuraminidase inhibitors as anti-influenza virus drugs. Drug Dev. Res., 1999, 46, 176-196.
[5] Kaldor, S.W.; Kalish, V.J.; Davies, J.F., II; Shetty, B.V.; Fritz, J.E.; Appelt, K.; Burgess, J.A.; Campanale, K.M.; Chirgadze, N.Y.; Clawson, D.K.; Dressman, B.A.; Hatch, S.D.; Khalil, D.A.; Kosa, M.B.; Lubbehusen, P.P.; Muesing, M.A.; Patick, A.K.; Reich, S.H.; Su, K.S.; Tatlock, J.H. Viracept (Nelfinavir Mesylate, AG1343): A Potent, Orally Bioavailable Inhibitor of HIV-1 Protease. J. Med. Chem., 1997, 40(24), 3979-3985.
[6] Singh, J.; Chuaqui, C.E.; Boriack-Sjodin, P.A.; Lee, W.C.; Pontz, T.; Corbley, M.J.; Cheung, H.K.; Arduini, R.M.; Mead, J.N.; Newman, M.N.; Papadatos, J.L.; Bowes, S.; Josiah, S.; Ling, L.E. Successful shape-based virtual screening: the discovery of a potent inhibitor of the type I TGFbeta receptor kinase (TbetaRI). Bioorg. Med. Chem. Lett., 2003, 13(24), 4355-4359.
[7] Becker, O.; Dhanoa, D.; Marantz, Y.; Chen, D.; Shacham, S.; Cheruku, S.; Heifetz, A.; Mohanty, P.; Fichman, M.; Sharadendu, A.; Nudelman, R.; Kauffman, M.; Noiman, S. An integrated in silico 3D model-driven discovery of a novel, potent, and selective amidosulfonamide 5-HT1A agonist (PRX-00023) for the treatment of anxiety and depression. J. Med. Chem., 2006, 49(11), 3116-3135.
[8] Lamberth, C.; Jeanmart, S.; Luksch, T.; Plant, A. Current Challenges and Trends in the Discovery of Agrochemicals. Science, 2013, 341(6147), 742-746.
[9] Neshich, G.; Neshich, I.A.P.; Moraes, F.; Salim, J.A.; Borro, L.; Yano, I.H.; Mazoni, I.; Jardine, J.G.; Rocchia, W. Using Structural and Physical-Chemical Parameters to Identify, Classify, and Predict Functional Districts in Proteins: The Role of Electrostatic Potential. In Computational Electrostatics for Biological Applications; Springer, 2015; pp. 227-254.
[10] de Moraes, F.R.; Neshich, I.A.P.; Mazoni, I.; Yano, I.H.; Pereira, J.G.C.; Salim, J.A.; Jardine, J.G.; Neshich, G. Improving Predictions of Protein-Protein Interfaces by Combining Amino Acid-Specific Classifiers Based on Structural and Physicochemical Descriptors with Their Weighted Neighbor Averages. PLOS One, 2014, 9(1), e87107.
[11] Ribeiro, C.; Togawa, R.C.; Neshich, I.A.P.; Mazoni, I.; Mancini, A.L.; Minardi, R.C.d.M.; da Silveira, C.H.; Jardine, J.G.; Santoro, M.M.; Neshich, G. Analysis of binding properties and specificity through identification of the interface forming residues (IFR) for serine proteases in silico docked to different inhibitors. BMC Struct. Biol., 2010, 10, 36.
[12] Hopkins, D.L.; Purcell, A.H. Xylella fastidiosa: cause of Pierce's disease of grapevine and other emergent diseases. Plant Dis., 2002, 86, 1056-1066.
[13] Hopkins, D.L. Xylella fastidiosa: Xylem-Limited Bacterial Pathogen of Plants. Annu. Rev. Phytopathol., 1989, 27, 271-290.
[14] Newman, K.L.; Almeida, R.P.; Purcell, A.H.; Lindow, S.E. Use of a green fluorescent strain for analysis of Xylella fastidiosa colonization of Vitis vinifera. Appl. Environ. Microbiol., 2003, 69(12), 7319-7327.
[15] Simpson, A.J.; Reinach, F.C.; Arruda, P.; Abreu, F.A.; Acencio, M.; Alvarenga, R.; Alves, L.M.; Araya, J.E.; Baia, G.S.; Baptista, C.S.; Barros, M.H.; Bonaccorsi, E.D.; Bordin, S.; Bové, J.M.; Briones, M.R.; Bueno, M.R.; Camargo, A.A.; Camargo, L.E.; Carraro, D.M.; et al. The genome sequence of the plant pathogen Xylella fastidiosa. Nature, 2000, 406(6792), 151-159.
[16] Meng, Y.; Li, Y.; Galvani, C.D.; Hao, G.; Turner, J.N.; Burr, T.J.; Hoch, H.C. Upstream migration of Xylella fastidiosa via pilus-driven twitching motility. J. Bacteriol., 2005, 187(16), 5560-5567.
[17] Skerker, J.M.; Berg, H.C. Direct observation of extension and retraction of type IV pili. Proc. Natl. Acad. Sci. U. S. A., 2001, 98(12), 6901-6904.
[18] Whitchurch, C.B.; Hobbs, M.; Livingston, S.P.; Krishnapillai, V.; Mattick, J.S. Characterisation of a Pseudomonas aeruginosa twitching motility gene and evidence for a specialised protein export system widespread in eubacteria. Gene, 1991, 101(1), 33-44.
[19] Satyshur, K.A.; Worzalla, G.A.; Meyer, L.S.; Heiniger, E.K.; Aukema, K.G.; Misic, A.M.; Forest, K.T. Crystal structures of the pilus retraction motor PilT suggest large domain movements and subunit cooperation drive motility. Structure, 2007, 15(3), 363-376.
[20] Misic, A.M.; Satyshur, K.A.; Forest, K.T. P. aeruginosa PilT structures with and without nucleotide reveal a dynamic type IV pilus retraction motor. J. Mol. Biol., 2010, 400(5), 1011-1021.
[21] Anderson, A.C. The process of structure-based drug design. Chem. Biol., 2003, 10(9), 787-797.
[22] Dunbar, S.J.; Corran, A.J. Target-based research: A critical review of its impact on agrochemical invention, focusing on examples drawn from fungicides. In Pesticide Chemistry. Crop Protection, Public Health, Environmental Safety; Wiley-VCH: Weinheim, Germany, 2007.
[23] Mattick, J.S. Type IV pili and twitching motility. Annu. Rev. Microbiol., 2002, 56, 289-314.
[24] Neshich, G.; Togawa, R.; Mancini, A.L.; Kuser, P.R.; Yamagishi, M.E.B.; Pappas Jr., G.; Torres, W.V.; Campos, T.F.; Ferreira, L.L.; Luna, F.M.; Oliveira, A.G.; Miura, R.T.; Inoue, M.K.; Horita, L.G.; de Souza, D.F.; Dominiquini, F.; et al. STING Millennium: a Web based suite of programs for comprehensive and simultaneous analysis of protein structure and sequence. Nucleic Acids Res., 2003, 31(13), 3386-3392.
[25] Neshich, G.; Rocchia, W.; Mancini, A.; Yamagishi, M.; Kuser, P.; Fileto, R.; Baudet, C.; Pinto, I.; Montagner, A.; Palandrani, J.; Krauchenco, J.; Torres, R.; Souza, S.; Togawa, R.; Higa, R. JavaProtein Dossier: a novel web-based data visualization tool for comprehensive analysis of protein structure. Nucleic Acids Res., 2004, 32(Web Server issue), W595-W601.
[26] Eswar, N.; Marti-Renom, M.A.; Webb, B.; Madhusudhan, M.S.; Eramian, D.; Shen, M.; Pieper, U.; Sali, A. Comparative protein structure modeling with MODELLER. Curr. Protoc. Bioinformatics, 2006, Chapter 5, Unit 5.6.
[27] Guex, N.; Peitsch, M.C. SWISS-MODEL and the Swiss-PdbViewer: An environment for comparative protein modelling. Electrophoresis, 1997, 18(15), 2714-2723.
[28] Leslie, J.F.; Summerell, B.A.; Bullock, S. The Fusarium Laboratory Manual; Wiley-Blackwell, 2006.
[29] Mendgen, K.; Hahn, M.; Deising, H. Morphogenesis and mechanisms of penetration by plant pathogenic fungi. Annu. Rev. Phytopathol., 1996, 34, 367-386.
[30] Carpita, N.C.; Gibeaut, D.M. Structural models of primary cell walls in flowering plants: consistency of molecular structure with the physical properties of the walls during growth. Plant J., 1993, 3(1), 1-31.
[31] Markovic, O.; Janecek, S. Pectin degrading glycoside hydrolases of family 28: sequence-structural features, specificities and evolution. Protein Eng., 2001, 14(9), 615-631.
[32] Federici, L.; Caprari, C.; Mattei, B.; Savino, C.; Di Matteo, A.; De Lorenzo, G.; Cervone, F.; Tsernoglou, D. Structural requirements of endopolygalacturonase for the interaction with PGIP. Proc. Natl. Acad. Sci. U. S. A., 2001, 98(23), 13425-13430.
[33] Di, C.; Zhang, M.; Xu, S.; Cheng, T.; An, L. Role of polygalacturonase-inhibiting protein in plant defense. Crit. Rev. Microbiol., 2006, 32(2), 91-100.
[34] Ramachandran, G.N.; Ramakrishnan, C.; Sasisekharan, V. Stereochemistry of polypeptide chain conformations. J. Mol. Biol., 1963, 7, 95-99.
[35] Wiederstein, M.; Sippl, M.J. ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res., 2007, 35(Web Server issue), W407-W410.
[36] Van Der Spoel, D.; Lindahl, E.; Hess, B.; Groenhof, G.; Mark, A.E.; Berendsen, H.J. GROMACS: fast, flexible, and free. J. Comput. Chem., 2005, 26(16), 1701-1718.
[37] Neshich, G.; Mazoni, I.; Oliveira, S.; Yamagishi, M.; Kuser-Falcão, P.; Borro, L.; Morita, D.; Souza, K.; Almeida, G.; Rodrigues, D.; Jardine, J.; Togawa, R.; Mancini, A.; Higa, R.; Cruz, S.; Vieira, F.; Santos, E.; Melo, R.; Santoro, M. The Star STING server: a multiplatform environment for protein structure analysis. Genet. Mol. Res., 2006, 5(4), 717-722.
[38] Sridharan, S.; Nicholls, A.; Honig, B. A new vertex algorithm to calculate solvent accessible surface areas. Biophys. J., 1992, 61, A174.
[39] Branden, C.; Tooze, J. Introduction to protein structure; Garland Publishing: New York, 1991; ISBN 0-815-30270-3.
[40] Delano, W.L. The PyMOL Molecular Graphics System; Delano Scientific. http://www.pymol.org.
[41] Larkin, M.A.; Blackshields, G.; Brown, N.P.; Chenna, R.; McGettigan, P.A.; McWilliam, H.; Valentin, F.; Wallace, I.M.; Wilm, A.; Lopez, R.; Thompson, J.D.; Gibson, T.J.; Higgins, D.G. Clustal W and Clustal X version 2.0. Bioinformatics, 2007.
[42] Patrick, G.L. An introduction to Medicinal Chemistry; Oxford University Press: Oxford, 2013; Vol. 5.
[43] Cross, J.B.; Thompson, D.C.; Rai, B.K.; Baber, J.C.; Fan, K.Y.; Hu, Y.; Humblet, C. Comparison of several molecular docking programs: Pose prediction and virtual screening accuracy. J. Chem. Inf. Model., 2009, 49(6), 1455-1474.
[44] Wall, D.; Kaiser, D. Type IV pili and cell motility. Mol. Microbiol., 1999, 32(1), 1-10.
[45] Li, H.; Kasam, V.; Tautermann, C.S.; Seeliger, D.; Vaidehi, N. Computational Method To Identify Druggable Binding Sites That Target Protein-Protein Interactions. J. Chem. Inf. Model., 2014, 54(5), 1391-1400.
[46] Fuller, J.C.; Burgoyne, N.J.; Jackson, R.M. Predicting druggable binding sites at the protein-protein interface. Drug Discov. Today, 2009, 14(3-4), 155-161.
[47] Pérot, S.; Sperandio, O.; Miteva, M.A.; Camproux, A.-C.; Villoutreix, B.O. Druggable pockets and binding site centric chemical space: a paradigm shift in drug discovery. Drug Discov. Today, 2010, 15(15-16), 656-667.
[48] Weisel, M.; Kriegl, J.M.; Schneider, G. Architectural Repertoire of Ligand-Binding Pockets on Protein Surfaces. Eur. J. Chem. Biol., 2010, 11(4), 556-563.
[49] Altschul, S.F.; Madden, T.L.; Schäffer, A.A.; Zhang, Z.; Miller, W.; Lipman, D.J. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 1997, 25(17), 3389-3402.
[50] Xu, D.; Lin, S.L.; Nussinov, R. Protein binding versus protein folding: the role of hydrophilic bridges in protein associations. J. Mol. Biol., 1997, 265(1), 68-84.
[51] Larsen, T.A.; Olson, A.J.; Goodsell, D.S. Morphology of protein-protein interactions. Structure, 1998, 6, 421-427.
[52] Han, X.; Kennan, R.M.; Davies, J.K.; Reddacliff, L.A.; Dhungyel, O.P.; Whittington, R.J.; Turnbull, L.; Whitchurch, C.B.; Rood, J.I. Twitching motility is essential for virulence in Dichelobacter nodosus. J. Bacteriol., 2008, 190(9), 3323-3335.
[53] Zolfaghar, I.; Evans, D.J.; Fleiszig, S.M. Twitching motility contributes to the role of pili in corneal infection caused by Pseudomonas aeruginosa. Infect. Immun., 2003, 71(9), 5389-5393.
[54] Wolfgang, M.; Lauer, P.; Park, H.S.; Brossay, L.; Hébert, J.; Koomey, M. PilT mutations lead to simultaneous defects in competence for natural transformation and twitching motility in piliated Neisseria gonorrhoeae. Mol. Microbiol., 1998, 29(1), 321-330.

Received: December 10, 2014
Revised: January 07, 2015
Accepted: January 20, 2015