Current Bioinformatics, 2013, 8, 452-464
3D-QSAR Methodologies and Molecular Modeling in Bioinformatics for the Search of Novel Anti-HIV Therapies: Rational Design of Entry Inhibitors

Alejandro Speck-Planche*,1,2, Valeria V. Kleandrova2,3, Marcus T. Scotti4 and M.N.D.S. Cordeiro*,2

1 Department of Chemistry, Faculty of Natural Sciences, University of Oriente, 90500 Santiago de Cuba, Cuba
2 REQUIMTE/Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal
3 Faculty of Technology and Production Management, Moscow State University of Food Production, Volokolamskoe shosse 11, Moscow, Russia
4 Department of Engineering and Environment, Center for Applied Sciences and Education, Federal University of Paraíba, Brazil

Abstract: Human immunodeficiency virus (HIV) is the causal agent of acquired immunodeficiency syndrome (AIDS), a condition in humans in which the immune system begins to fail, permitting the entry of diverse opportunistic infections. There is currently no available vaccine or cure for HIV or AIDS. Thus, the search for new anti-HIV therapies is a very active area. Viral infection takes place through a process called entry, and the proteins gp120, CCR5 and CXCR4 are essential targets for preventing HIV entry. Bioinformatics has emerged as a powerful science that provides a better understanding of biochemical or biological processes and phenomena, with 3D-QSAR methodologies and molecular modeling techniques serving as strong support. The present review focuses on 3D-QSAR methodologies and molecular modeling techniques as parts of Bioinformatics for the rational design of entry inhibitors. We also propose a chemo-bioinformatic approach based on a model that uses substructural descriptors and allows the prediction of multi-target (mt) inhibitors against five proteins related to the HIV entry process. By employing the model, we calculated the quantitative contributions of some fragments to the inhibitory activity against all the proteins. This allowed us to automatically extract the desirable fragments for the design of new, potent and versatile entry inhibitors.
Keywords: 3D-QSAR, Anti-HIV, CCR5, CXCR4, fragments, gp120, homology modeling, linear discriminant analysis, molecular docking, QSAR, quantitative contributions.

*Address correspondence to these authors at the Department of Chemistry, Faculty of Natural Sciences, University of Oriente, 90500 Santiago de Cuba, Cuba; Fax: +351 220402659; E-mail: [email protected] and REQUIMTE/Department of Chemistry and Biochemistry, University of Porto, 4169-007 Porto, Portugal; Fax: +351 220402659; E-mail: [email protected]

© 2013 Bentham Science Publishers

1. INTRODUCTION
Human immunodeficiency virus (HIV) is a lentivirus (a member of the retrovirus family) that causes acquired immunodeficiency syndrome (AIDS), a condition in humans in which the immune system begins to fail, leading to life-threatening opportunistic infections [1]. HIV infection in humans is considered pandemic by the World Health Organization (WHO). Nevertheless, complacency about HIV may play a key role in HIV risk [2]. Although recent reports show that new HIV infections have been reduced by 17% over the last nine years because of HIV prevention programmes, in some countries HIV incidence is rising again [3]. Antiretroviral treatment reduces both the mortality and the morbidity of HIV infection, but routine access to antiretroviral medications is not available in all countries [4]. There is currently no publicly available vaccine or cure for HIV or AIDS.

The vital step in HIV's replication cycle is a process known as entry [5]. HIV first interacts through its surface protein gp120 with the CD4 receptor. As a consequence, conformational changes take place in gp120 that increase both its affinity for a co-receptor and the exposure of gp41, another HIV protein closely related to gp120 that penetrates human cell membranes. The next step in the entry process is the binding of gp120 to a co-receptor, either C-C chemokine receptor type 5 (CCR5) or C-X-C chemokine receptor type 4 (CXCR4). Finally, gp41 penetrates the cell membrane. This brings the membranes of HIV and the T cell together, promoting fusion and permitting the entrance of the viral "core" into the human cell. Important information on the proteins usually involved in the HIV entry process is summarized in Table 1 [6].

Nowadays, it is almost impossible to have clear ideas about biochemical or biological processes and phenomena without the use of Bioinformatics [7], which is concerned with the application of statistics and computer science to the field of molecular biology. It has been decisive for a better understanding of processes related to Medicinal Chemistry [8-17], Proteomics [18-24], Drug Metabolism [25-33] and Pharmaceutical Design [34-43], where quantitative structure-activity relationship (QSAR) methodologies [44], more specifically 3D-QSAR methodologies and molecular modeling techniques [45], have been essential in drug design.

This review focuses on the role of 3D-QSAR methodologies and molecular modeling techniques (MMT) as support of Bioinformatics in the design of compounds with anti-HIV activity. Specifically, we discuss the role of these methodologies and techniques in the design of inhibitors of the proteins gp120, CCR5 and CXCR4. These proteins are key both to preventing the interaction of HIV with the CD4 receptor and to limiting the availability of gp41 to penetrate the human cell membrane. In the last section of the present work we also propose a chemo-bioinformatic approach toward the design of versatile anti-HIV agents. There, a fragment-based QSAR model was created, permitting the prediction of multi-target (mt) inhibitors of proteins related to the HIV entry process.

Table 1.
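Proteins Involved in the HIV Entry Process

Protein | Seq. Length | Entry | Method | Resolution (Å) | Positions
CD4 | 458 aa | 2NY1 | X-ray | 1.99 | 26-208
CD4 | 458 aa | 3B71 | X-ray | 2.82 | 428-450
CD4 | 458 aa | 1WIO | X-ray | 3.9 | 26-388
CD4 | 458 aa | 2KLU | NMR | - | 397-458
CD4 | 458 aa | 1Q68 | NMR | - | 421-458
CD4 | 458 aa | 1WBR | NMR | - | 428-444
gp120 | 588 aa | 2NY1 | X-ray | 1.99 | 26-435
CCR5 | 352 aa | 1ND8 | model | - | 1-352
CCR5 | 352 aa | 2K03 | NMR | - | 1-38
CXCR4 | 352 aa | 3ODU | X-ray | 2.5 | 2-319
CXCR4 | 352 aa | 3OE0 | X-ray | 2.9 | 2-319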

2. QSAR METHODOLOGIES AND MOLECULAR MODELING TECHNIQUES (MMT)
QSAR methodologies represent predictive models derived from the application of statistical tools that correlate the biological activity (including desirable therapeutic effects and undesirable side effects) of chemicals (drugs, toxicants, environmental pollutants) with descriptors that encode the molecular structure at different levels of complexity and diversity. For this reason, QSARs are strongly related to Bioinformatics, because this science entails the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data [44, 46-48]. On the other hand, molecular modeling techniques are essential parts of the structure-based drug design (SBDD) methodology, which relies on knowledge of the three-dimensional structure of the biological receptor obtained through experimental methods such as X-ray crystallography or NMR spectroscopy [45]. An alternative in SBDD, used when the experimental structure of a target is not available, is the creation of a homology model of the biological receptor based on the experimental structure of a related protein. The objective of SBDD is to study the affinity and selectivity of any compound that binds to the receptor. With the development of X-ray crystallography and NMR spectroscopy, the amount of information concerning
3D structures of biomolecular targets has increased dramatically. In this sense, MMT, as part of SBDD, encompass all theoretical methods and computational techniques used to model or mimic the behavior of molecules. These techniques are used in computational chemistry, computational biology and materials science to study molecular systems ranging from small molecules to biomacromolecules and material assemblies. In drug design it is very common to use QSAR methodologies in combination with MMT. Principally, 3D-QSAR methodologies such as comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA) have been used for this purpose, since they apply force field calculations that require three-dimensional structures, such as those obtained from protein crystallography or molecule superimposition [49]. They use computed potentials, e.g. the Lennard-Jones potential, rather than experimental constants, and are concerned with the overall molecule rather than a single substituent. They examine the steric fields (shape of the molecule) and the electrostatic fields based on the applied energy function. This makes it possible to carry out deeper studies at different levels of chemical diversity and complexity. For this reason, in the following sections we discuss the use of 3D-QSAR methodologies and/or MMT for the discovery of inhibitors of three of the proteins involved in the HIV entry process.

2.1. Envelope Polyprotein gp120
This is a glycoprotein exposed on the surface of the HIV envelope. It forms a heterodimer with another HIV protein, gp41 [6]. Since CD4 receptor binding is the most obvious step in HIV infection, gp120 was among the first targets of HIV vaccine research.
Efforts to develop HIV vaccines targeting gp120, however, have been hampered by the chemical and structural properties of gp120, which make it difficult for antibodies to bind to it. gp120 can also easily be shed from the surface of the virus and captured by T cells because of its loose binding to gp41. A conserved region in the gp120 glycoprotein that is involved in the metastable attachment of gp120 to CD4 has now been identified, and targeting of this invariant region has been achieved with a broadly neutralizing antibody called b12 [50]. Thus, gp120 remains a desirable target for the discovery of new and efficient anti-HIV agents. Several works applying Bioinformatics and related methodologies to the study of gp120 as a promising target for the rational design of anti-HIV agents have been reported (Table 2). Essentially, they have reoriented the synthesis and evaluation of new small-molecule inhibitors [51-54] and the study of the binding modes of several molecular entities to gp120 [50, 55-59], including the study of conformational changes [60-62].

Table 2. Promising Works for the Design of gp120 Inhibitors

Methodology | Chemical Family | Statistical Method(a) | N | Statistical Indices | Bioinformatic Tool | Author | Ref.
MDock | Oxalamide derivatives | - | 40(b) | - | MOE | Lalonde et al. | [51]
MDock | Betulinic acid derivatives | - | 34(b) | - | SYBYL 8.1 and MOLCAD | Lan et al. | [52]
3D-QSAR (CoMFA) | Betulinic acid derivatives | PLS | 28/6(c) | r2=0.994, q2=0.599, ONC=5, SEE=0.123, F=675.28, r2*=0.913, r2m*=0.843 | SYBYL 8.1 | Lan et al. | [52]
3D-QSAR (CoMSIA) | Betulinic acid derivatives | PLS | 28/6(c) | r2=0.958, q2=0.630, ONC=3, SEE=0.300, F=181.29, r2*=0.938, r2m*=0.862 | SYBYL 8.1 | Lan et al. | [52]
MDock | BMS-806 analogs | - | 36(b) | - | Autodock 3.0, SYBYL 8.1, AMBER 9.0 and VMD | Teixeira et al. | [53]
3D-QSAR (CoMFA) | BMS-806 analogs | PLS | 30/6(c) | r2=0.921, q2=0.534, ONC=4, SEE=0.151, F=73.02, r2*=0.651 | SYBYL 8.1 | Teixeira et al. | [53]
3D-QSAR (CoMSIA) | BMS-806 analogs | PLS | 30/6(c) | r2=0.884, q2=0.583, ONC=4, SEE=0.363, F=45.51, r2*=0.532 | SYBYL 8.1 | Teixeira et al. | [53]
MDock and HModel | NeoR6 and Neo-r9 | - | 2(b) | - | Autodock 3.0, MolFit, InsightII Discover3 and AMBER 9.0 | Berchanski et al. | [54]

(a) Refers only to the model obtained by 3D-QSAR methodologies; (b) Compounds which were screened using molecular docking; (c) Compounds which were used in the training and test sets, respectively; NeoR6 – hexa-arginine-neomycin conjugate; Neo-r9 – nona-D-arginine-neomycin conjugate; MDock – Molecular Docking; HModel – Homology Modeling; MOE – Molecular Operating Environment; MOLCAD – molecular computer aided design; PLS – Partial Least Squares; VMD – Visual Molecular Dynamics software; ONC – Optimal number of components; r2 – Coefficient of determination; q2 – Coefficient of cross validation; SEE – Standard error of the estimation; F – F-ratio.

An important study in this field was the structure-activity relationship (SAR) analysis of betulinic acid derivatives as anti-HIV-1 agents [52]. Here, novel anti-HIV-1 compounds derived from betulinic acid were studied in detail. For this purpose, 3D-QSAR and molecular docking studies were applied to rationalize the structural requirements responsible for the anti-HIV activity of these compounds. The CoMFA and CoMSIA models, built from 28 molecules with 6 other molecules used as the test set, had good statistical quality and predictive capacity. Based on the contour maps generated from both CoMFA and CoMSIA, some key features of the betulinic acid derivatives responsible for the anti-HIV activity were identified. Molecular docking was used to explore the binding mode of these derivatives with HIV gp120. A series of novel betulinic acid derivatives was then designed using the SAR analysis, and the models predicted excellent potencies for the new compounds. The results provided a valuable method for designing new betulinic acid derivatives as anti-HIV-1 agents. A great advantage of this work was the possibility of using betulinic acid for lead optimization (Fig. 1), considering that this compound is a naturally occurring pentacyclic triterpenoid with antiretroviral, anti-malarial and anti-inflammatory properties, as well as a more recently discovered potential as an anticancer agent through inhibition of topoisomerase [63]. For this reason, betulinic acid constitutes one of the most promising natural products in drug design.
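The statistical indices reported in Table 2 (r2, q2, SEE and F) can be illustrated with a short numerical sketch. The example below is not the CoMFA/CoMSIA procedure of the cited studies: it swaps PLS for ordinary least squares, uses synthetic data, and takes q2 as the leave-one-out cross-validated coefficient, which is one common convention. The function names (`fit_ols`, `predict`, `regression_indices`) and the data are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X (a stand-in for PLS)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Predicted activities for the descriptor rows in X."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def regression_indices(X, y):
    """Return r2, leave-one-out q2, SEE and the F-ratio for an OLS model."""
    n, p = X.shape
    residuals = y - predict(fit_ols(X, y), X)
    ss_res = float(residuals @ residuals)          # residual sum of squares
    ss_tot = float(np.sum((y - y.mean()) ** 2))    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    see = float(np.sqrt(ss_res / (n - p - 1)))     # standard error of estimate
    f_ratio = (r2 / p) / ((1.0 - r2) / (n - p - 1))
    # Leave-one-out: refit without compound i, then predict compound i.
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_ols(X[keep], y[keep])
        press += float((y[i] - predict(coef, X[i:i + 1])[0]) ** 2)
    q2 = 1.0 - press / ss_tot
    return {"r2": r2, "q2": q2, "SEE": see, "F": f_ratio}

# Synthetic example: 28 "compounds", 3 descriptors, noisy linear activity.
rng = np.random.default_rng(0)
X = rng.normal(size=(28, 3))
y = 6.0 + X @ np.array([0.8, -0.5, 0.3]) + rng.normal(scale=0.2, size=28)
indices = regression_indices(X, y)
print({name: round(value, 3) for name, value in indices.items()})
```

For a least-squares model q2 can never exceed r2, which is why a large gap between the two values is usually read as a sign of overfitting.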
Fig. (1). Structure of betulinic acid.
Other investigations have focused on BMS-378806 (BMS-806), a small molecule that blocks the binding of host-cell CD4 to the viral gp120 protein and therefore inhibits the first steps of HIV-1 infection (Fig. 2). Recently, 36 analogs of BMS-806 were synthesized and their biological activities were evaluated [53]. BMS-806 was first docked into the gp120 cavity in order to obtain a representative ligand conformation for the 3D-QSAR process. CoMFA and CoMSIA studies were then conducted for these 36 compounds. CoMFA and CoMSIA models gave reliable
correlative and predictive abilities, but the CoMFA model performed slightly better than the CoMSIA model. The CoMFA contours were analyzed and correlated with the gp120 viral protein. The relevance of this study lay in the possibility of determining several key fragment positions on the ligands and their implications for binding to the gp120 protein. The computational approach used in this paper provided reliable clues for the further design of small-molecule gp120/CD4 inhibitors based on BMS-806.
Fig. (2). Structure of BMS-806.

2.2. C-C Chemokine Receptor Type 5 (CCR5)
The role of CCR5 in HIV infection was characterized by Progenics and its collaborators in 1996 [64]. A previous report on the state of the design of anti-HIV agents acting through CCR5 inhibition showed that a new generation of antiviral drugs intended to counter HIV-1 entry into susceptible cells was emerging swiftly, and that antiviral agents inhibiting HIV entry into target cells (denoted HIV entry inhibitors) were already in different phases of clinical trials [65]. That work pointed out that entry inhibitors had toxicity and resistance profiles different from those of reverse transcriptase and protease inhibitors. Some of these compounds demonstrated in vitro synergism with other classes of antiviral agents, thus providing a rationale for their combination in therapies for HIV-infected individuals. Recent advances in the understanding of the cellular and molecular mechanisms of HIV-1 entry have provided the basis for novel therapeutic strategies that prevent viral penetration of the target cell membrane, while reducing detrimental virus and treatment effects on cells and prolonging virion exposure to immune defenses. A number of new experimental HIV drugs, so-called "entry inhibitors", have been designed to interfere with the interaction between CCR5 and HIV, and they are already in phase 2 clinical trials. One of the most promising is PRO 140, a humanized monoclonal antibody targeted against the CCR5 receptor found on T lymphocytes of the human immune system. It is being investigated as a potential therapy for HIV infection [66]. Small molecules have also been designed as potential CCR5 inhibitors. The experimental drug aplaviroc was a potent CCR5 inhibitor, but all studies of this drug were discontinued because of liver toxicity concerns [67]. The experimental drug vicriviroc showed very high inhibitory activity against CCR5. It was discovered through high-throughput screening and structure-activity relationship (SAR) analysis [68]; however, its clinical trials failed. Maraviroc (UK-427857) was developed by the pharmaceutical company Pfizer at its UK laboratories in Sandwich. On September 24, 2007, Pfizer announced that the European Commission had approved maraviroc (Fig. 3). Two randomized, placebo-controlled clinical trials (known as MOTIVATE 1 & 2) showed no clinically relevant differences in safety between the maraviroc and placebo groups. However, researchers question the long-term safety of blocking CCR5, a receptor whose function in the healthy individual is not fully understood [69].

Fig. (3). Structure of maraviroc: commercially available CCR5 antagonist.
In the last 5-10 years, several studies using 3D-QSAR methodologies and/or MMT have been carried out for the design of CCR5 inhibitors (Table 3), permitting the optimization of the synthesis and evaluation of diverse families of compounds [70-77]. An important work in this sense was a study on the biological profiling of anti-HIV agents and insight into CCR5 antagonist binding using in silico techniques [70]. In this research, the steps were: 1) performing chemometric analyses of the biological activity profile using a large and heterogeneous dataset of compounds by developing 3D-QSAR models that were able to support fresh pharmacophore hypotheses for the rational selection and prediction of potential new leads; 2) upgrading a previously published 3D theoretical model of CCR5, and docking selected highly active antagonists to construct a detailed binding site map to obtain further insight into the 3D determinants of CCR5 antagonist binding; 3) building a CCR5 pharmacophore model using known high-activity ligands (Fig. 4); and 4) performing shape-based virtual screening (VS) using a database of commercially available compounds to identify plausible novel CCR5 antagonists with a view to selecting synthetically accessible low-molecular-weight candidates for further assay-based analysis. These results demonstrated that a plausible model for the binding mode of known high-affinity CCR5 antagonists could be successfully achieved using chemometric information together with receptor-based and ligand-based computational tools. The integration of the several methodologies described here could be valuable for predicting new HIV entry-blocking leads. Another study applied molecular docking and 3D-QSAR techniques to a family of 1-amino-2-phenyl-4-(piperidin-1-yl)-butanes based on the structural modeling of human CCR5 [71].
Thus, an approach combining protein structure modeling, molecular dynamics (MDyn) simulation, automated docking, and 3D-QSAR analyses was carried out to investigate the detailed interactions of CCR5 with its antagonists. Homology modeling (HModel) and MDyn simulation were used to build the 3D model of CCR5 based on the high-resolution X-ray structure of bovine rhodopsin. A series of 64 CCR5 antagonists, 1-amino-2-phenyl-4-(piperidin-1-yl)-butanes, were docked into the putative binding site of the 3D model of CCR5, and the probable interaction model between CCR5 and the antagonists was obtained. The predicted binding affinities of the antagonists to CCR5 correlated well with the antagonist activities (Fig. 5), and the interaction model could be used to explain many mutagenesis results. All of this indicated that the 3D model of the antagonist-CCR5 interaction is reliable. Based on the binding conformations and their alignments inside the binding pocket of CCR5, 3D-QSAR analyses were performed on these antagonists using the CoMFA and CoMSIA methods. In both cases good results were obtained, and the predictive abilities of these models were validated with six compounds that were not included in the training set. From this work, we can say that mapping these models back to the topology of the active site of CCR5 leads to a better understanding of the antagonist-CCR5 interaction. For this reason, these results suggest that the 3D model of CCR5 can be used in structure-based drug design and that the 3D-QSAR models provide clear guidelines and accurate activity predictions for novel antagonist design.

Fig. (4). Structure of the most potent ligand as CCR5 antagonist.

Table 3. Important Research for the Discovery of CCR5 Inhibitors

Methodology(a) | Chemical Family | Statistical Method(c) | N | Statistical Indices | Bioinformatic Tool | Author | Ref.
MDock and LBPM | HDC | - | 3650(d) | Sens=87%, GH=0.83 | MOE, MODELLER 9.2, | Carrieri et al. | [70]
MDyn and MDock | APPB | - | 64(d) | - | SYBYL 6.8, InsightII, AMBER 7.0, Autodock 3.0.3 | Xu et al. | [71]
3D-QSAR (CoMFA)(b) | APPB | PLS | 58/6(e) | r2=0.947, q2=0.568, ONC=4, s=0.191, F=237.48, r2*=0.920 | SYBYL 6.8, AMBER 7.0 | Xu et al. | [71]
3D-QSAR (MSA) | BNPPA | GFA | 67/22(e) | r2=0.724, q2=0.646, SEP=0.581, F=22.10, r2*=0.770 | Cerius 4.8 | Roy et al. | [76]
3D-QSAR (CoMFA)(b) | Piperidine derivatives | PLS | 72/19(e) | r2=0.918, q2=0.756, ONC=4, SEE=0.250, F=186.34, r2*=0.837 | SYBYL 6.8 | Song et al. | [77]
3D-QSAR (CoMSIA)(b) | Piperidine derivatives | PLS | 72/19(e) | r2=0.899, q2=0.721, ONC=5, SEE=0.278, F=118.07, r2*=0.887 | SYBYL 6.8 | Song et al. | [77]

(a) These studies first employed homology modeling techniques because the 3D structure of CCR5 is not available; (b) Only the best models or studies are shown; (c) Refers only to the model obtained by 3D-QSAR methodologies; (d) Compounds which were screened using molecular docking; (e) Compounds which were used in the training and test sets, respectively; MDyn – Molecular Dynamics; MDock – Molecular Docking; LBPM – Ligand-based pharmacophore modeling; HDC – Heterogeneous database of compounds; MOE – Molecular Operating Environment; PLS – Partial Least Squares; MSA – Molecular shape analysis; GFA – Genetic function approximation; ONC – Optimal number of components; Sens – Sensitivity, i.e., the percentage of correctly classified compounds; GH – Goodness of hit list; r2 – Coefficient of determination; q2 – Coefficient of cross validation; SEE – Standard error of the estimation; SEP – Standard error of the prediction; s – Standard deviation; F – F-ratio. Chemical families: APPB – 1-amino-2-phenyl-4-(piperidin-1-yl)-butanes; BNPPA – 3-(4-benzylpiperidin-1-yl)-N-phenylpropylamine.
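The sensitivity and goodness-of-hit (GH) values listed in Table 3 summarize how well a screening protocol retrieves known actives. As a minimal sketch, assuming the widely used Güner-Henry definition of GH (the cited study [70] may use a variant) and taking sensitivity as the fraction of actives recovered, both can be computed from four counts. The function name and the example counts are hypothetical and are not taken from the cited work.

```python
def hit_list_metrics(n_database, n_actives, n_hits, n_active_hits):
    """Sensitivity, yield of actives and Guner-Henry GH score of a hit list.

    n_database    -- D, total number of compounds screened
    n_actives     -- A, number of known actives in the database
    n_hits        -- Ht, number of compounds in the hit list
    n_active_hits -- Ha, number of known actives in the hit list
    """
    D, A, Ht, Ha = n_database, n_actives, n_hits, n_active_hits
    if not (0 < A < D and 0 < Ht <= D and 0 <= Ha <= min(A, Ht)):
        raise ValueError("inconsistent counts")
    sensitivity = Ha / A            # fraction of actives that were retrieved
    yield_of_actives = Ha / Ht      # fraction of the hit list that is active
    # GH = [Ha(3A + Ht) / (4 Ht A)] * [1 - (Ht - Ha) / (D - A)]
    gh = (Ha * (3 * A + Ht)) / (4 * Ht * A) * (1 - (Ht - Ha) / (D - A))
    return sensitivity, yield_of_actives, gh

# Hypothetical screen: 1000 compounds, 50 actives, 100 hits, 40 of them active.
sens, ya, gh = hit_list_metrics(1000, 50, 100, 40)
print(f"sensitivity={sens:.2f}, yield={ya:.2f}, GH={gh:.3f}")
```

Under this definition GH ranges from 0 to 1 and reaches 1 only when the hit list contains all actives and nothing else.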
Fig. (5). Structure of the most potent 1-amino-2-phenyl-4-(piperidin-1-yl)-butane derivative.

2.3. C-X-C Chemokine Receptor Type 4 (CXCR4)
This protein, also known as fusin or CD184 (cluster of differentiation 184), acts as a receptor for extracellular ubiquitin, leading to enhanced intracellular calcium ion levels and reduced cellular cyclic adenosine monophosphate (cAMP) levels. CXCR4 is involved in haematopoiesis and in cardiac ventricular septum formation. It also plays an essential role in vascularization of the gastrointestinal tract, probably by regulating vascular branching and/or remodeling processes in endothelial cells [6, 78-80]. CXCR4 acts as a co-receptor (CD4 being the primary receptor) for HIV-1 X4 isolates and as a primary receptor for some HIV-2 isolates, and it promotes envelope-mediated fusion of HIV. For this reason, like CCR5 discussed above, CXCR4 is one of the essential and most interesting targets for the discovery of new entry inhibitors (Table 4).

Table 4. Relevant Studies for the Design of CXCR4 Inhibitors

Methodology | Chemical Family | N(a) | Bioinformatic Tool | Author | Ref.
HModel and MDock | Cyclam derivatives | 8 | MembStruk 4.30, Glide XP, SCWRL 3.0, HBPLUS 3.0, | Lam et al. | [81]
HModel, MDock and LBSMS | HDC | 5302 | MODELLER 6.0, MOE, CONGEN, PROCHECK, Autodock 3.0, GOLD, FRED 2.2.1, | Pérez-Nueno et al. | [85]
HModel and MDock | Cyclopentapeptides | 11 | CLUSTALW, AutoDock 3.0 | Vabeno et al. | [86]

(a) Compounds which were screened using molecular docking; HModel – Homology modeling; MDock – Molecular Docking; LBSMS – Ligand-based shape-matching; MOE – Molecular Operating Environment; HDC – Heterogeneous database of compounds.

Important works have been carried out toward the design of potent CXCR4 inhibitors [81-86]. Among them is a comparison of ligand-based and receptor-based virtual screening for the CXCR4 and CCR5 receptors using 3D ligand shape matching and ligand-receptor docking [85]. The purpose was to perform a virtual screening in search of HIV entry inhibitors. This research described a detailed comparison of the performance of receptor-based and ligand-based virtual screening approaches in finding CXCR4 and CCR5 antagonists that could potentially serve as HIV entry inhibitors. Because no crystal structures for these proteins were available, homology models of CXCR4 and CCR5 were built using bovine rhodopsin as the template. A strength of this work is that, for ligand-based virtual screening, several shape-based and property-based molecular comparison approaches were compared, using high-affinity ligands as query molecules (Fig. 6). These methods were compared by virtually screening a library assembled by the authors, consisting of 602 known CXCR4 and CCR5 inhibitors and some 4700 similar, presumed inactive molecules. For each receptor, the library was queried using known binders, and the enrichment factors and diversity of the resulting virtual hit lists were analyzed. Overall, ligand-based shape-matching searches yielded higher enrichments than receptor-based docking, especially for CXCR4.

On the other hand, a relevant work provided insights into the binding mode of cyclopentapeptide (CPP) antagonists with CXCR4 [86]. The objective was to design new, highly active cyclopentapeptides using as a model the compound FC131 (Fig. 7), a potent CPP developed by several investigators [87-89]. To facilitate the design of such ligands, the work focused on the possible binding modes of CPP CXCR4 antagonists by docking 11 high/medium-affinity CPPs to a three-dimensional model of the transmembrane region of the CXCR4 G-protein-coupled receptor. These ligands, expected to bind to the receptor in the same mode, were docked in a previously deduced receptor-bound conformation [90]. Ligand-receptor complexes were generated using an automated docking procedure that allowed ligand flexibility. By comparing the resulting ligand poses, only two binding modes common to all 11 compounds were identified. Inspection of these two ligand-receptor complexes identified several CXCR4 contact residues as interaction sites for ligands that are also important for HIV gp120 binding. Thus, the results provide further insights into the mechanism by which these CPPs block HIV entry, as well as a basis for the rational design of CXCR4 mutants to map potential contacts with small peptide ligands.
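The enrichment factors used above to compare ligand-based and receptor-based screening [85] measure how much more concentrated the actives are at the top of a ranked list than in the whole library. The following is a minimal sketch under a common definition, EF = (actives in top fraction / size of top fraction) / (total actives / library size). The function name and the toy ranking are hypothetical, and the cited study may define the top fraction differently.

```python
def enrichment_factor(ranked_labels, fraction):
    """Enrichment factor for the top `fraction` of a ranked screening list.

    ranked_labels -- list of 1 (active) / 0 (inactive), best-scored first
    fraction      -- share of the library inspected, e.g. 0.01 for the top 1%
    """
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty list with at least one active")
    n_top = max(1, round(fraction * n_total))   # at least one compound
    hits = sum(ranked_labels[:n_top])
    return (hits / n_top) / (n_actives / n_total)

# Toy ranking of 10 compounds with 2 actives, both ranked first.
ranking = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(enrichment_factor(ranking, 0.2))   # top 20% = 2 compounds
```

An enrichment factor of 1 corresponds to random selection. For a library such as the one described above (602 actives among about 5302 compounds), the largest attainable value is roughly 5302/602, i.e. close to 8.8.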
Fig. (6). Structures of the high-affinity ligands a) AMD3100 and b) TAK779.

3. CHEMO-BIOINFORMATIC APPROACH FOR THE DESIGN OF POTENT AND VERSATILE ENTRY INHIBITORS
The design of anti-HIV agents is still an insufficiently explored area. Most studies have focused on a specific target: gp120, CCR5 or CXCR4. In addition, the use of homology modeling in the case of CCR5 and CXCR4 has been very useful, but it is not enough to explore chemical diversity and complexity in greater depth. Another important aspect is that, although CCR5 and CXCR4 are the principal co-receptors that have been studied for the design of entry inhibitors, other proteins belonging to the same family, such as C-C chemokine receptors 2 and 3 (CCR2 and CCR3, respectively), also constitute potential targets for the interaction with HIV proteins [91-93]. Until now, no methodology has been available to predict multi-target (mt) inhibitors of all the proteins associated with the HIV entry process. In order to overcome this problem, a chemo-bioinformatic approach for the virtual screening and design of mt-inhibitors of all the proteins associated with the HIV entry process was developed. Here, an mt-QSAR model was constructed using a large and heterogeneous database of compounds.

3.1. Methods

3.1.1. Atom-Centered Fragments (ACF)
These constitute a class of very useful molecular descriptors, which have been employed in several QSAR studies [94-96]. They provide important information about the hydrophobic and dispersive interactions involved in biological processes such as the transport and distribution of drugs through membranes. They also give information about drug-receptor interactions [97]. ACF are simple molecular descriptors defined as the number of specific atom types in a molecule. They are calculated from the molecular composition and atom connectivities. Each type of atom in the molecule is described in terms of its neighboring atoms. Hydrogen and halogen atoms are classified by the hybridization and oxidation states of the carbon atom to which they are attached. For hydrogen atoms, heteroatoms attached to a carbon in the α-position are also considered. Carbon atoms are classified by their hybridization state and according to whether their neighbors are carbon atoms or heteroatoms.
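The idea of typing each atom by its own state and by its neighbors can be sketched in a few lines. The example below is a deliberately simplified illustration, not the ACF scheme implemented in DRAGON: it handles carbon atoms only, infers hybridization from the highest bond order (so aromatic and cumulated systems are not treated correctly), and uses a hypothetical hand-written molecular graph without hydrogens.

```python
from collections import Counter

def carbon_fragment_counts(atoms, bonds):
    """Count simplified atom-centered carbon types in a hydrogen-free graph.

    atoms -- list of element symbols, one per heavy atom
    bonds -- list of (i, j, order) tuples with atom indices and bond order
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        neighbors[i].append((j, order))
        neighbors[j].append((i, order))

    counts = Counter()
    for i, element in enumerate(atoms):
        if element != "C":
            continue
        # Simplification: the highest bond order decides the hybridization.
        max_order = max((order for _, order in neighbors[i]), default=1)
        hybridization = {1: "sp3", 2: "sp2", 3: "sp"}[max_order]
        # Neighbors are split into carbon and heteroatom classes.
        n_hetero = sum(1 for j, _ in neighbors[i] if atoms[j] != "C")
        counts[f"C({hybridization}, {n_hetero} heteroatom neighbors)"] += 1
    return counts

# Acetic acid, heavy atoms only: CH3-C(=O)-OH
atoms = ["C", "C", "O", "O"]
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
print(carbon_fragment_counts(atoms, bonds))
```

Each key of the resulting counter plays the role of one descriptor column, and its value is the count for the molecule at hand.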
Fig. (7). Structure of FC131.
3.1.2. Functional Group Counts (FGC)
These are another type of descriptor that expresses certain fragmental features. Functional group counts are simple molecular descriptors defined as the number of specific functional groups in a molecule and, like the previous descriptors, they are calculated from the molecular composition and atom connectivities. FGC represent many of the functional groups traditionally used in Organic Chemistry [98].
3.1.3. Spectral Moments of Bond Adjacency Matrix
The approach based on the calculation of the spectral moments of the bond adjacency matrix is known as the TOPS-MODE (TOPological Substructural MOlecular DEsign) approach, and it has been applied to the description of some physicochemical properties of organic compounds [99-101]. Spectral moments of the bond adjacency matrix have also been reported for the modeling of pharmacological activities [95-96, 102-104] and for the analysis of toxicological profiles [105-110]. For the calculation of spectral moments, the molecular structure is codified by means of the edge adjacency matrix E (commonly called the bond adjacency matrix B) [111]. The E or B matrix is a square table of order m (the number of chemical bonds in the molecule). The elements of this matrix (eij) are equal to 1 if bonds i and j are adjacent (which means that i and j are incident on the same vertex or atom) and 0 otherwise. In order to codify information about heteroatoms, the TOPS-MODE approach uses weighted matrices E(wij) instead of E. The weights (wij) are chemically meaningful numbers such as bond distances, bond dipoles, bond polarizabilities or mathematical expressions involving atomic weights [102]. These weights are introduced in the main diagonal of the matrix E(wij). The spectral moments of this matrix can then be used as molecular fingerprints in QSAR studies for the codification of molecular structures [112-114]. By mathematical definition, spectral moments are sums of elements of the natural powers of E(wij): the spectral moment of order k (μk) is the sum of the main diagonal elements (eii) of the kth power of E(wij). The total spectral moments of the bond matrix are defined as:

$$\mu_k = \mathrm{Tr}\left(E^k\right) = \sum_{i=1}^{s} (e_{ii})^k \qquad (1)$$
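Eq. (1) can be checked numerically on small molecules. The sketch below builds the bond adjacency matrix from a list of bonds (two bonds are adjacent when they share an atom), optionally places weights on the main diagonal, and returns the traces of successive matrix powers. It is an illustration under stated assumptions rather than the TOPS-MODE software itself: the function names and the hydrogen-free carbon skeletons are hypothetical, and no particular weighting scheme from the literature is reproduced.

```python
import numpy as np

def bond_adjacency_matrix(bonds, weights=None):
    """Bond (edge) adjacency matrix E for a list of (atom_i, atom_j) bonds.

    Off-diagonal entries are 1 when two bonds share an atom, 0 otherwise.
    Optional per-bond weights are written on the main diagonal.
    """
    m = len(bonds)
    E = np.zeros((m, m))
    for a in range(m):
        for b in range(a + 1, m):
            if set(bonds[a]) & set(bonds[b]):   # bonds share an atom
                E[a, b] = E[b, a] = 1.0
    if weights is not None:
        E[np.diag_indices(m)] = weights
    return E

def spectral_moments(E, max_order):
    """Return [mu_0, ..., mu_max_order], where mu_k = Tr(E^k)."""
    moments = []
    power = np.eye(E.shape[0])                  # E^0
    for _ in range(max_order + 1):
        moments.append(float(np.trace(power)))
        power = power @ E
    return moments

# Carbon skeletons without hydrogens (unweighted matrices).
n_butane = [(0, 1), (1, 2), (2, 3)]             # C-C-C-C
isobutane = [(0, 1), (0, 2), (0, 3)]            # three bonds on one carbon
print(spectral_moments(bond_adjacency_matrix(n_butane), 4))
# -> [3.0, 0.0, 4.0, 0.0, 8.0]
print(spectral_moments(bond_adjacency_matrix(isobutane), 4))
# -> [3.0, 0.0, 6.0, 6.0, 18.0]
```

The two isomers have the same number of bonds (μ0 = 3) but differ from μ2 onward, which illustrates how the moments encode branching.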
In Eq. (1), Tr denotes the trace of the matrix, that is, the sum of its diagonal entries; the sum runs over all bonds (s equals the order m of the matrix), and the elements (eii)k are the diagonal entries of the kth power of the bond matrix.

3.1.4. Physicochemical Descriptors
Physicochemical properties contain important information related to the structures of the molecules and for
this reason they can be useful as descriptors in drug design. Some of the most important physicochemical properties are the n-octanol/water partition coefficient (logP), polar surface (PS), polarizability and van der Waals area (VWA). These descriptors are calculated using fragment additivity according to procedures developed and tested in the QSAR/QSPR literature [115-117].

3.2. Selection of the Data Set: Calculation of the Descriptors and Development of the Mathematical Model
The data set was formed by 1501 cases (compound/protein pairs). The codes and/or names are summarized in the Supplementary Information 1 file. Of these, 446 cases are compounds with inhibitory activity against five different proteins related to the HIV entry process [118]. These proteins are CCR2, CCR3, CCR5, CXCR4 and gp120. As a measure of biological activity, the enzymatic potency was selected and expressed as IC50, i.e., the concentration of the compound that resulted in a 50% reduction of the enzymatic activity. Only compounds meeting the following IC50 requirements were selected as active: ≤7 nM for CCR2, ≤2.5 nM for CCR3, ≤1.5 nM for CCR5, ≤5 µM for CXCR4 and ≤62 µM for gp120. As inactive cases, we also included 211 drugs reported in the Merck Index. These drugs have other profiles that do not include inhibitory activity against any of these targets and were therefore used as inactive [119]. The training series was formed by 1126 compounds: 334 active and 792 inactive. For the validation of the model we used a prediction series formed by 375 compounds: 112 active and 263 inactive. For the compounds, 87 ACF and 94 FGC were calculated using DRAGON v5.3 [98]. Also, 30