Robust Modeling and Scaffold Hopping

1 downloads 0 Views 1009KB Size Report
QSARINS program is the leave-one-out (LOO) cross- validation correlation coefficient (Q2) and was used for the selection of descriptors and the best model ...
Send Orders for Reprints to [email protected] Journal Name, Year, Volume

1

Robust Modeling and Scaffold Hopping: Case Study based on HIV Reverse Transcriptase Inhibitors Type-1 Data Girinath G. Pillai,*a,b Laznier Mederos,b Chandramukhi S. Panda,b Peeter Burk,a C. Dennis Hall,b Alan R. Katritzky,†b Kaido Tämm,a Mati Karelson*a Dedicated to the Late Professor Alan Roy Katritzky† a

Department of Chemistry, University of Tartu, Ravila 14a, Tartu 50411. Estonia.

b

Department of Chemistry, University of Florida, PO 117200, Gainesville, FL. 32611. USA.

Please provide corresponding author(s) photograph

Abstract: Chemically diverse scaffold set of HIV-1 Reverse Transcriptase Inhibitors (HIV-1 RTI) was subjected to optimum oriented QSAR with large descriptor space. We generated four-parameter QSAR model based on 111 data points, which provides an optimum prediction of HIV-1 RTI for overall 367 experimentally measured compounds. The robustness of the model is demonstrated by its statistical validation and by the prediction of HIV-1 inhibition activity. A statistically significant QSAR model describing the anti-viral properties of NNRTIs (Ntraining = 111, R2 = 0.85, F=145.55, Q2lmo = 0.84) was obtained. Finally, 3 new hit compounds that met the pharmacophore constraints, similarity and had good predicted pIC50 values from in-silico virtual screening process were employed to analyze the binding ability by molecular modelling studies on HIV-1 RT protein targets. Keywords: Global QSAR; Applicability Domain; Diverse Scaffolds; Field Points; Scaffold Hopping.

1. INTRODUCTION The aim of this study was to collect a large amount of experimental data and obtain a minimum optimum set for QSAR, which can be applied adequately with statistical relevance for larger number of chemicals. To implement the comprehensive use of large descriptor space for modelling QSAR, we used HIV-type 1 Reverse Transcriptase inhibitor bioactivity data for this study. AIDS (Acquired Immunodeficiency Syndrome) categorized by infections is caused by the retrovirus human immunodeficiency virus type 1 (HIV-1), and is one of the leading causes of death around the world. Based on the WHO 2014 report, as of 2013, there are an estimated 35.0 million people living globally with HIV [1]. HIV type 1 behaves as a selective measure in response to stimulus for human helper T lymphocytes, which express the Cluster of Differentiation 4 (CD4) which is a surface glycoprotein [2]. In viral replication, three enzymes are involved, such as reverse transcriptase (RT), protease (PR), and integrase (IN). The approved drugs for treating HIV-1 infected patients usually target RT or PR [3]. However, the efficiency of the approved drugs is declining mainly because of drug resistance and immune response [4]. Targeting the HIV-RT [5-7] and the design of HIV protease inhibitors [8-10] are the most commonly applied strategies to find new drugs against AIDS. Non-nucleoside reverse transcriptase inhibitors (NNRTIs) reduce the ability of enzymes to perform RNA dependent DNA polymerase and are important anti-retroviral drugs used for the treatment of

XXX-XXX/14 $58.00+.00

HIV [11]. The efficiency of HIV drugs within the same class was studied in clinical trials [12]. Research efforts focus on the identification of NNRTIs with suitable dosage regimes, better genetic barriers to avoid resistance, and enhanced safety profiles [13]. In parallel with experimental efforts, computational studies such as molecular modeling analysis [14-18] and other in-silico designs are powerful methods to obtain novel potential NNRTIs. In particular, QSARs are one of the most significant chemometric tools, for the development of predictive models, which can be used in several areas of chemical science including agriculture, therapeutic, ecological or material science including medicinal chemistry [19]. QSAR’s main task and advantage is to find novel potential drug candidates for experimental tests and therefore provide some relief to the overall time and budget required for synthesizing new chemical entity and bioassays. QSAR correlates molecular descriptors (physicochemical, topological, thermodynamic, constitutional and quantum chemical) with chemical or biological properties (e.g. EC50, IC50, ED50, Ki, etc.).

*Department of Chemistry, University of Tartu, Tartu, Estonia. 50014; Tel/Fax: +372-737-5261; E-mails: [email protected], [email protected]

© 2014 Bentham Science Publishers

2 Journal Name, 2014, Vol. 0, No. 0

Principal Author Last Name et al.

Therefore QSAR models have the capability to predict the biological activity of a novel chemical compound and to estimate the possible activity and design potential novel drugs [20-22]. Since the discovery of anti-retroviral drugs, many experimental and computational efforts have been dedicated to the discovery of superior inhibitors. Classical QSAR/QSPR approaches have limited prediction capability, partly due to their applicability domain. In the current approach, we implemented a new protocol on combining different descriptor spaces to develop robust and generalized QSAR models.

The current approach focuses on modeling of global QSAR with diverse set of scaffolds to predict activity for novel scaffolds in HIV-1 RT. Combinations of global and local QSAR approaches are implemented for dataset trimming and for validation [23]. The respective mechanism of action was studied earlier using molecular docking and molecular dynamics [24]. NNRTIs were targeted against hydrophobic active site of HIV RT and the binding of NNRTIs alters the conformation of active site residues thus hampering normal enzymatic activity [25]. The technique to obtain a generalized structure-activity relationships for diverse sets of compounds was implemented by using rationalized and statistical methods [26,27]. The iterative sampling of inhibitors was to optimize certain property in QSAR including interactions with biological systems which provides noisy data. This limitation has been overcome by classifying the data and by implementing techniques like superimposition, clustering, similarity analysis, and conformational analysis of the inhibitors within the binding site of target protein [28,29]. This approach can be used for prediction of the activity of novel compounds by the scaffold hopping process [30].

Figure 1. Commercially available antiretroviral drugs Efavirenz (approved in 1998), Delavirdine (approved in 1997), Stavudine (approved in 1994), Etravirine-TMC125 (approved in 2008), Rilpivirine-TMC278 (approved in 2011) Abacavir (approved in 1994) and Nevirapine (approved in 1996)

2. DATA AND METHODOLOGY 2.1 Data Set Selection for QSAR Model Building Commonly used inhibitors are Efavirenz, Delavirdine, Etravirine, Rilpivirine and Nevirapine (Figure 1) [31] which were approved by the FDA, and have been used for more than 10 years. Most of these studies focused on specific classes of compounds (local QSAR) and may not provide a generalized model for a diverse scaffold set of molecular structures. In this study, we developed QSAR models for HIV-1 inhibitors using diverse classes of structural motifs, and combined different sets of descriptors and chemometric tools such as best multiple linear regression (BMLR), and genetic algorithm (GA). Experimental IC50 data for 419 HIV-RT inhibitors with diverse scaffolds were obtained from the literature and are given in Figure 2 [31 References are added to Supporting Information - SI]. The experimental bioassay data from literature was reported without any evidence on the bio-safety and bioavailability. Therefore, multi-target approaches have not been examined [32].

Figure 2. Structural diverse scaffold set derivatives of compounds considered to model QSAR. Bioavailability and lead-likeness was predicted computationally using Instant JChem [33-35, 36]. Structural data, calculated properties and inhibition activity is given in Table T1 of SI. The selection of dataset was based on molecular similarity analysis with known active drugs like Nevirapine, Efavirenz, Rilpivirine, etc, including similarity ensemble approach. In this study we used tanimoto method and similarity ranging from 60% to 99%. The diverse set in Figure 2 was used for QSAR modeling and was categorized into the training set, test set and external prediction set to

Short Running Title of the Article

show the practical validation of the QSAR model. The data in this study was analyzed by a non-linear regression logistic dose-response model, for IC50 values [37]. The logarithmic conversion of biological activity data is one of the requirements for QSAR analysis when free energy-related entities are involved, so we transform from the equilibrium constant scale to the energy scale [38,39].

2.2 Data Set Trimming The distribution normalization of the training set was confirmed by the p value using Shapiro-Wilk [40] and Kolmororov-Smirnov analysis [41]. The main reprimand for the QSARs is the over-parameterization and narrow applicability domain, therefore our aim is to come up with a robust model with better applicability domain and reproducibility. The statistical details of the trimmed dataset are given in Table S2 (SI). To estimate the homogenicity or similarity of the diverse set, superimposition matching and grid based techniques were used to trim the datasets for QSAR modeling [29,42]. Cluster analysis and principal component analysis was performed with respect to molecular structure fingerprints and molecular descriptors to understand the classification groups. Based on the progressive 3D structure alignment analysis implemented in the Instant JChem 6.0 software [36] for datasets, structures were categorized and filtered with respect to the five or more different classes of core fragments represented in Figure 2. To confirm, we performed clustering and classification of structures with respect to the biological activity using the Scaffold Hunter tool [43]. The filtered molecules were then considered for 3D QSAR studies. From the literature data, there are 52 compounds out of 419 with indefinite numerical values (e.g., 1000 etc) and therefore they were removed from the set. Similar activity is probably caused by the complex nature of the non-nucleoside reverse transcriptase inhibitors, which were chosen for QSAR models based on chemical similarity and mechanism of action. Based on the trimming techniques (clustering, PCA, similarity analysis), the final model was developed using 111 diverse compounds including 5 scaffolds as training set, approximately ~20% of training set i.e. 25 compounds were considered as test set and 231 compounds as external prediction set (including 5 scaffolds). The training set was trimmed to minimum size in order to cover the predictions for maximum size of the test set with acceptable statistics. The schematic representation of the dataset selection is presented in Scheme 1.

Journal Name, 2014, Vol. 0, No. 0 3

Scheme 1. Protocol scheme for data point/dataset selection & trimming

Table 1. Data points for QSAR modeling based on clustering, 2D, 3D similarity analysis. SMILES and 2D Structures for each set is given in SI. No. 1. 2. 3. 4. 5.

Description Total Compounds Valid for QSAR Training Set Test Set External Set

No. of Compounds 419 367 111 25 231

2.3 Structure Optimization and Descriptor Calculation A pool of 2D, 3D and 4D molecular descriptors including constitutional, quantum chemical (charge, dipole moment), physicochemical, geometrical, topological, electrostatic, thermodynamic, iso-density, and other quantum chemical molecular descriptors were calculated using Codessa-Pro [44], CDK [45], CODESSA III [46], Padel [47], and EDRAGON [48]. Chemical structures were drawn in Marvin Sketch [49], then the structures were cleaned to net neutral, standardized and optimized to 3D using Chem3D [50]. In order to generate 3D & 4D descriptors, conformers for each and every molecule were calculated implementing a systematic method in MacroModel of Schrödinger 2013 Suite [51]. The lack of information on the conformer of highest bioactivity in the original literature made us to perform conformational screening analysis with the target protein in order to understand the most stable conformer within the binding pocket and also in vacuum. The information of a receptor’s binding pocket for its ligand is crucial for drug design, particularly for conducting mutagenesis studies [52]. The receptor’s binding pocket for a ligand is usually defined by residues that have at least one heavy atom within a distance of 5Å from a heavy atom of the ligand. Such a criterion was originally used to define the binding pocket of ATP in the Cdk5-Nck5a complex [53] that has later proved to be quite useful in identifying functional domains and stimulating the relevant truncation experiments [54]. Similar approaches have also been used to define the binding pockets for many other receptor-ligand interactions.[55-57]. Electrostatic interaction constraints were considered from the binding site of the target proteins a) wild strain of HIV-1 RT enzyme co-crystallized with TNK-651 (PDB ID: 1RT2), b) K103N and Y181C mutant strain of HIV-1 RT co-crystallized with TMC278 (Rilpivirine) (PDB ID: 3BGR), c) catalytic complex of HIV1 RT protein (PDB ID:1RTD) and d) wild-type and K103N HIV-1 RT with TMC125 (Etravirine) and TMC278 (Rilpivirine) (PDB ID: 3MEE/3MEC) to replicate the mechanism of action for in-silico virtual screening studies [25,58-60]. The binding site including co-crystal ligand representation with receptor spatial, hydrogen donor and acceptor constraints are given in Figure F1.A, B, C & D of

4 Journal Name, 2014, Vol. 0, No. 0

SI. To maintain reasonable agreement between the quality of the QSAR models, computational parameterization semiempirical methods were used to derive all possible molecular features i.e., molecular descriptors. High level of accuracy can be attained through ab-initio calculations (Hatree-Fock, Density Function Theory), but for large datasets the calculations are computationally demanding and most of the descriptors to be derived manually. The best preorganized structures of lowest energy were chosen for further quantum chemical and geometry optimization using AM1 [61] parameterization in AMPAC 10 software [62] (for Codessa III), MOPAC 2006 [63,64] {for Codessa-Pro, CDK (geometry is from AM1), PaDEL (geometry is from AM1) and EDRAGON} to generate data for descriptor calculation. For the generation of geometry based descriptors, the major steps were (1) generation of conformers and their energy optimization; (2) prioritizing the lowest energy conformer; (3) selection of a reference molecule (from lowest energy conformation); (4) analyzing molecular superimposition using the overlap or cluster analysis method; (5) calculating molecular area similarity; (6) determination of other factors by generating spatial, electronic, and conformational parameters; (7) consideration of interaction energy of inhibitor binding site complex with the contribution of van der Waals and electrostatic interactions which had influence to the comparative molecular field analysis [65-67].

2.4 Multi-linear Modeling Multiple linear regression (MLR) is one of the common statistical methods used for determining relationships between variables. It is used to find a linear model that predicts the best dependent variable from independent variables. We used this method to trim the large descriptor pool with more consistently correlating descriptors to the anti-HIV activity. However, these model diagnostics alone are insufficient to determine the best model. The quality of the regression is also reflected in the numerical values of several statistical parameters including the coefficient of determination (R2) which is a measure of the quality of fit between model-calculated and experimental values, and does not reflect the predictive power [38,39]. Small values of fitness functions q2LOO (Leave One Out) or q2LMO (Leave Many Out) test indicate low prediction ability, but the converse is not necessarily true. It indicates robustness, but not the prediction ability of model. Therefore, the model should be used to predict the same activity for an external dataset. The MLR method includes finding all orthogonal pairs of descriptors in the dataset and building two-descriptor models. With orthogonal pairs of descriptors found, additional descriptors are added to the previous twodescriptor equations until the additional descriptors cease to bring any improvement to F. Finally, the stepwise procedure ends according to the R2, t and F values until the best equations are obtained. The Molecular Descriptor Analysis (MDA) tool developed by Ruslan O. Petrukhin implemented in Codessa-Pro was used to generate the (Best) MLR QSAR models [44]. MDA tool was used to establish the correlation between all descriptors and activity Log(IC50). Statistical parameters are reported in Table 2, and respective correlation plots are given in SI from Figure F2 to Figure F7. BMLR provided a list of descriptors correlated with the activity data, but in order to overcome the limitation of unreliability

Principal Author Last Name et al.

and poor predictions (reverse QSAR) of raw property we used Genetic Algorithm (GA) for further generation of QSAR models [68,69]. The schematic representation of the dataset selection is presented in Scheme 2.

Table 2. Statistical parameters from BMLR models for 367 compounds generated from software specific descriptors. Tool[a]

D[b]

R2

R2cv

F

S

CodessaPro

394/ 347/6

0.42

0.40

66.14

0.958

163/ 0.52 0.50 96.50 0.802 105/6 Codessa 923/ 0.61 0.60 144.78 0.638 III 381/6 411/ PADEL 0.64 0.62 158.69 0.602 261/6 E539/ 0.58 0.57 125.96 0.693 DRAGON 375/6 2430/ Merged 1469/ 0.65 0.64 166.09 0.585 6 [a] AM1 optimized structures are used to generate descriptors by different software package/tool. [b] No. of Total Descriptors Calculated/ Trimmed Descriptors/ Descriptors used in the Model CDK

Scheme 2. Approach for combining molecular descriptor and the detailed workflow is given in SI Figure F4

2.5 Genetic Algorithm The Genetic Algorithm (GA) method was used as a variable selection tool and MLR were employed to model the relationships between molecular descriptors and the biological activity of molecules. With the known MLR limitations, QSAR model renders unreliable and poor predictions for the raw property (IC50) which will affect the performance and reliability of the derived models. Therefore we used MLR as a tool for variable selection based on the orthogonal pairing algorithm to analyze all possible combinations of descriptor sets. The descriptor variable

Short Running Title of the Article

Journal Name, 2014, Vol. 0, No. 0 5

search ranges from 1 to n (n is assigned by the user); in our case n = 12. Thus, the algorithm will search the combination of each descriptor and correlates with the property value. Genetic algorithm is a stochastic optimization machine learning technique that simulates natural selection and its advantages have been proven in several QSAR studies [70]. The genetic algorithm used in this study was presented by Leardi et al. for the first time [71]. The fitness function in the QSARINS program is the leave-one-out (LOO) crossvalidation correlation coefficient (Q2) and was used for the selection of descriptors and the best model based on the applicability domain - William’s Plot, internal validation, external validation, and relevance of descriptor’s physical meaning to the inhibitors [72]. QSAR models ranging from 2 to 6 descriptors for 111 training (Table T1-TR in SI) and 25 test sets (Table T1-TS in SI) using the merged descriptor set are given in Table 3.

test set for external validation and 231 compounds for external test set. 2.7 Statistical Validation A robust QSAR modeling workflow was followed to generate models, validate and predict activities for new data sets. The fitting ability of the model was internally validated based on the leave-one-out (LOO) cross-validation and leave-many-out (LMO) cross-validation techniques. In the LMO cross-validation technique, ~20% of training set compounds were deleted in different cycles based on outliers and heterogeneity of the compounds in the dataset. For all iterations, the biological activities of the excluded compounds were predicted using the model developed with the corresponding dataset of compounds. Training sets were further divided into multiple sets of descriptive training and test sets of different size, i.e., based on descriptor similarity

Table 3. Statistical parameters for combined/merged descriptors (after feature selection) ranging from 2 to 6 for Training Set (111 compounds) N R2

R2ext

Descriptors #

F

2

0.73

0.59

D1-S1

D3-S3

3

0.79

0.58

X1-S3

X2-S1

D3-S3

4

0.85

0.76

D1-S1

D2-S4

D3-S3

D4-S3

5

0.85

0.73

D1-S1

X1-S3

D3-S3

X3-S5

X4-S5

6

0.85

0.73

D1-S1

X5-S1

D3-S3

X6-S3

X7-S4

X8-S3

S

146.45

0.695

135.72

0.793

145.55

0.530

115.75

0.532

91.23

0.545

Alphabet S# in descriptors represents the unique descriptors from different tools. D# represents the descriptors correlated with prediction capability. X# represents other descriptors which had significant influence on the activity. The breaking line rule from Karelson et. al. [67] was applied for defining the number of descriptors in the final robust model and the graph with average R2, Q2 & number of descriptors is given in Plot G1 of SI.

2.6 Selection Criteria for Descriptors Feature selection was used to remove redundant or irrelevant molecular descriptors for improving model quality and reducing computational cost [73]. To generate a pool of scaled descriptors from different software we performed the following steps: (i) the descriptors with zero variance or near zero variance were removed from the initial set of descriptor screening for model consistency; This is to exclude semiconstant descriptors, i.e. those chemical structures with a constant value of more than a certain percentage (suggested value: 80%), (ii) the descriptors that are too inter-correlated (suggested value: 95%) [24,60]; above 90% were removed; (iii) the descriptors that are exact duplicates including physical meaning and values were removed.

Thereafter 3D-QSAR BMLR models were developed for the valid dataset of 367 compounds using individual descriptors from Codessa-Pro-S1, CDK-S2, Codessa-III-S3, PADEL-S4, and EDRAGON-S5. These models were internally validated using cross-validation technique. In addition, a merged and trimmed pool of 1469 descriptors from all softwares was used to generate comprehensive model. To conclude the selection of descriptors, final model was developed using 111 structures with a total number of 1469 descriptors and 25 compounds (~20% of training set) which were used as the

and structure similarity using overlap analysis in DS Visualizer 4.0 [74]. The external predictive ability of the model was assessed based on the predictions of the test set compounds followed by the calculation of the Q2LOO & LMO parameter.

2.8 Theoretical Background of Molecular Descriptors Three descriptors in four parameter (D1, D2, D4) models are directly related to electrostatic properties of the molecule. Together with D3 which can be considered as a molecular stability parameter, all the descriptors influence HIV-1 inhibition. D1 signifies the surface area of inhibitors and the corresponding charge distribution which is important to the binding affinity; because of the presence of dipole–dipole interactions within the active site, hydrophobic substituents are not preferred [75]. The requirement of polar surface area is evident with electrostatic field. D2, van der Waals molecular surfaces of each atom in the inhibitor uses empirically derived hydrophobic atom constants calculated from the solvent partition constants of small molecules [76]. Therefore, the increase in the compound’s hydrophobic surface area would be favorable for better activity and increased negatively charged surface area is better for inhibitory activity [77]. In the following, we also presented the theoretical background of the descriptors.

6 Journal Name, 2014, Vol. 0, No. 0

Principal Author Last Name et al.

2.8.1 D1 - HACA-2/Sqrt(TMSA) – Zefirov PC HACA-2 belongs to the class of Charged Polar Surface Area (CPSA) descriptors, which define the atomic electronegativities of hydrogen atoms [78,79]. HACA-2 is the areaweighted surface charge of hydrogen bonding acceptor atoms and Sqrt(TMSA) is the mathematical square root of total molecular surface area. 𝑯𝑨𝑪𝑨𝟐 = ∑𝑨 𝑫𝟏 =

𝒒𝑨 √𝑺𝑨 √𝑺𝒕𝒐𝒕

𝑨 ∈ 𝑿𝑯−𝒂𝒄𝒄𝒆𝒑𝒕𝒐𝒓

𝑯𝑨𝑪𝑨𝟐

Eq. 1 Eq. 2

√𝑻𝑴𝑺𝑨

where SA = solvent-accessible surface area of H-bonding acceptor atoms, selected by threshold charge, qA = partial charge on H-bonding acceptor atoms, selected by threshold charge, Stot = toal solvent-accessible molecular surface area [80].

2.8.2 D2 - Fractional atomic charge weighted partial negative van der Waal’s surface area This descriptor assumes the partial atomic charges from molecular mechanics force fields to compute the electrostatic interaction energy for understanding of the structure and reactivity of molecules [81]. The descriptor is derived from FNSA3 of Codessa-Pro with van der Waals surface area instead of total molecular surface area [66,88]. 𝑭𝑵𝒗𝑺𝑨 =

𝑷𝑵𝑺𝑨𝟑

Eq. 3

𝒗𝑺𝑨

where PNSA3 = total charge weighted partial negatively charged molecular surface area, vSA = van der Waals surface area [82]. 2.8.3 D3 – Average Bond Order C-C Bond order and bond length indicate the type and strength of covalent bonds between atoms. The bond order is defined as follows: 𝑩. 𝑶. =

𝒃𝒐𝒏𝒅𝒊𝒏𝒈 𝒆𝒍𝒆𝒄𝒕𝒓𝒐𝒏𝒔−𝒂𝒏𝒕𝒊𝒃𝒐𝒏𝒅𝒊𝒏𝒈 𝒆𝒍𝒆𝒄𝒕𝒓𝒐𝒏𝒔 𝟐

Eq. 4

A higher bond order a means that the atoms are held together tighter and therefore the whole molecule has a greater stability. In other words, a large positive value of bond order signifies strong bonding between the atoms of the molecular entity, whereas negative value imply that electrons are displaced away from the inter-atomic region and point to an anti-bonding interaction [83,84]. 2.8.4 D4 - IsoDensity Surface – varElectrostatic Potential The electron probability density is referred as electron density and it’s a function of r displayed as contour maps or isodensity surface. Isodensity surface is defined as the surface mapped by points where the density has the same

value [85]. Total variance of electrostatic static potential is related to conservative force and derived as: 𝝈𝟐𝒕𝒐𝒕 = 𝝈𝟐+ − 𝝈𝟐− = 𝟏 𝒎 𝟐 𝟏 𝒏 − ̅+ ̅− 𝟐 ∑𝒊=𝟏[𝑽+ (𝒓𝒊 ) − 𝑽 𝑺 ] + ∑𝒊=𝟏[𝑽 (𝒓𝒊 ) − 𝑽𝑺 ] 𝒎

𝒏

Eq. 5

+ ̅+ where 𝑽 𝑺 Vs = average value of the positive electrostatic ̅+ potential in the molecule, 𝑽 𝑺 Vs = average value of the negative electrostatic potential in the molecule, V+(ri) = positive electrostatic potential in the molecule, V-(ri) = negative electrostatic potential in the molecule, m,n = number of integration points [86,87].

2.9 Bioisosteric Group Replacement – Scaffold Hopping Bioisosteres are chemical groups with identical physicochemical properties which produce similar biochemical properties to other compounds. The process is known as scaffold hopping and this is performed to reduce toxicity, improve bioavailability and mode of action. To perform scaffold hopping pharmacophore constraints and attachment or replacing chemical groups are assigned to prefragment databases [88-91]. For computational efficiency, we used Spark v10 from Cresset UK, the Cresset’s Field technology condenses the molecular Fields of a molecule into a set of points around the molecule, termed ‘Field points’ [92]. Field points are defined as the local extrema of the electrostatic, van der Waals and hydrophobic potentials of the molecule. The field points are calculated from the physical properties, and interaction point size/strength associated with molecules (not all H-bond donors are treated the same: some make stronger bonds than others) [93]. The four Field types are used in agreement to describe all the potential interaction points that a ligand can make to the target protein in desired conformation. A chemical group or vector in a molecule at its bioactive conformation (3D bound structure to target protein) is selected and chosen for replacement. Spark searches a database of >600,000 fragments for bioisosteres that exhibit similar shape and electronic properties when placed in the context of the final molecule. Spark’s native fragment databases consist of fragments generated from commercially available screening compounds, ChEMBL, and VeHICLE databases [94,95]. Each one of these first-pass fragments is fit into the original molecule; fields and field points are calculated for the resulting whole molecules. The resulting molecules are minimized using the XED forcefield, and then the full field and shape of the molecule is scored against the full field and shape of starting molecule including the reference molecules [92]. By default, 50% shape and 50% fields are used in score calculation. Using Rilvipirine, Etravine, Abacavir and Stavudine as a starting point, Spark was parameterized to search for bioisosteric replacements that would introduce novel compounds into the assigned vector area. Spark uses molecular interaction fields to represent the key binding interactions of a molecule by giving a close approximation to the HIV-1 RT protein (PDB ID: 3MEE) view of the potential ligand [96,97]. The workflow scheme of scaffold replacement is given in Scheme S3A of SI and interpretation of field point technology is given in Scheme S3B of SI.

Short Running Title of the Article

Journal Name, 2014, Vol. 0, No. 0 7

4. RESULTS AND DISCUSSIONS 4.1 QSAR Model In this study, conformers derived based on protein binding site interactions were taken into account on calculation of molecular descriptors. A global structural activity relationship models were developed to cover wide range of chemo-types with a large applicability domain. Implementation of 1D, 2D, 3D and 4D descriptors delivered a meaningful correlation between the mechanism of inhibitors to HIV-1 target protein and mode of action. The functional groups like hydroxyl –OH, methoxy – OCH3 are important for the conformational change in the RT which distorts the position of protein residues. However, different RT inhibitors bind to the target protein in different modes depending on the conformer, which influences shape, size and chemical composition of the compound. All statistical parameters obtained for the models demonstrate the robustness and reliability of the models. The schematic workflow of the whole approach in comprehensive modeling is given in Scheme S2 of SI. The descriptors involved in regressions have definite physicochemical background and can be interpreted in terms of HIV-1 inhibition. The potential drawback of this methodology can be the limitation of specific sets of constitutional or atombased descriptors in order to consider diverse sets of structures for the training set. Literature data was statistically and theoretically analyzed to generate a reproducible robust QSAR model and equation (Eq. 6) with all statistical parameters using comprehensive set of descriptor space is given in Table 4

Figure 3. Scatter plot: Correlation between experimental and predicted values for training set

-18.924

-14.57

0.66

D2

Fractional atomic charge weighted partial Negative van der Waal’s Surface Area

4.987

4.79

0.17

D3

Average Bond Order C-C

12.621

19.67

0.78

D4

IsoDensity Surface – Var Electrostatic Potential

-2445.014

0.22

Desc vs Log(IC5 0) - R2

HACA-2/Sqrt(TMSA) – Zefirov PC

-

D1

-

-11.812

-5.584

t-test

Std. Error

Intercept

1.93

1.93

I0

0.99

Coefficient

1.44

Descriptor

616.22

Desc No.

Table 4. QSAR model for the anti-HIV-1 activity

Ntraining = 111, R2 = 0.85, F=145.554, s=0.53, CCCtraining = 91.73, RMSEtraining = 51.85 Internal Validation : Ntest = 26, Q2loo = 0.83, CCCcv = 90.94 Y-Scramble Test : Q2lmo = 0.84, R2Yscr = 3.61, R2Yrnd = 3.65 Log(IC50) = -11.812 + [(D1 x -18.924) + (D2 x 4.987) + (D3 x 12.621) + (D4 x -2445.014) -- Eq. 6

Figure 4. Residual Plot for Training set

For statistical validation the Y-scrambling method was used to estimate the performance of a model. Furthermore, the activity prediction for a test set of 25 compounds was confirmed by using the QSAR model given in Table 4. For external set validation, the diverse set of 231 compounds (Table T1-EX in SI) was used to predict the HIV-1 inhibition activity for experimentally measured compounds. Respective values are reported in supporting information (Table S5) Statistical parameters for external predictions are given in Table 5. These data suggest that the model we propose has robust predictive power.

8 Journal Name, 2014, Vol. 0, No. 0

Principal Author Last Name et al.

Table 5.Details on the model robustness ID

Set

R2

R2*

TR

Training Set

0.78

0.85

TS

Test Set

0.62

0.76

EX External Set 0.41 0.61 External Validation : RMSEtest = 65.25, R2test= 0.76, Q2F1 = 0.73, Q2F2 = 0.72, Q3F3 = 0.76, CCCext = 86.10 Prediction by Model Eq. 6 for External Dataset : Nexternal = 231, R2external = 0.61 *R2 = if outliers with error greater than 2.5 deviations excluded from Williams plot. In addition to the internal and external validation data, three sets of graphical data are given as follows: (i) Correlation plot for external dataset with experimental and predicted values (using model from Table 4.) [Figure 5], (ii) Williams plot – The plot of standardized residuals versus the Hat leverage to visualize the reliability of the QSAR model and identify the outliers [Figure 6]; (iii) applicability domain which defines the training set space and prediction capability of the model, this is a systematic approach to represent the reliability to make predictions for new compounds from QSAR model generated from training set. QSAR models are based on the leverage calculated from the diagonal values of the hat matrix of the modeling molecular descriptors [Figure 7]. The inter-correlation matrix of the descriptors in the model is given in Table 6 in order to understand the significance and uniqueness of molecular features correlating with experimental activity.

Figure 6. Williams Plot of diagonal Hat elements Vs. Standardized Residual for external dataset.

Figure 7. Applicability domain of the QSAR model Table 6. Descriptor’s inter-correlation matrix

Figure 5. Scatter Plot : Correlation between experimental and predicted values for external dataset.

Desc

D1

D2

D3

D1

1.00

D2

-0.0046

1.00

D3

-0.528

0.255

1.00

D4

0.254

-0.030

-0.417

D4

1.00

Mutual relation of two descriptors and the degree of correlation between two descriptors in a QSAR model progressively. Details of all list of descriptor are given in Table T2 of SI. Descriptor values of D1, D2, D3, D4 for training and test set is given in Table T3-TR and Table T3-TS respectively in SI.

Short Running Title of the Article

ZINC 77110510

0.465

ZINC 70763337

1.707

ZINC 86122474

1.822

ZINC 95346852

Predicted

0.947

ZINC 39118098

Database

Log (IC50)

ABAC095

Structure

RILP016

ID

RILP098 Figure 8A. Field-based template containing a single docked conformation of Rilpivirine based on their three-dimensional field point patterns. Negatively charged field points are shown in blue; positively charged field points are red; van der Waals/shape field points are displayed in yellow; centers of hydrophobicity are shown in orange.

Table 7. In-silico hits generated by scaffold hopping. Best 5 hits selected based on activity prediction using QSAR model given in Eq. 6.

ETRA027

The use of field points represents a major advance in the computational methods available for finding novel bioisosteres. It analyzes and scores the replaced bioisosteres in the context of the final molecule rather than as an isolated fragment. The results are significantly more diverse with both known bioisosteres and novel replacements suggested in most cases. With reference to the requirement of polar surface area is with electrostatic field, van der Waals molecular surfaces, compound’s hydrophobic surface area and increased negatively charged surface area is reflected in the field points of the generated compounds from scaffold hopping process. Field point representation for Rilpivirine is given in Figure 8A and the generated compounds after bioisostere group replacement is shown in Figure 8B highlighted with a black sphere.

predicted activity is given in Table 7. The complete list of structures and predicted activity is given in Table T5-NS of SI. The synthetic feasibility of these compounds was ensured with organic chemist (Author: CDH).

ETRA016

4.2 Scaffold Hopping

Journal Name, 2014, Vol. 0, No. 0 9

1.323

IC50 Predicted

2.923

50.98

66.40

8.863

21.03

ABAC=Compounds from Abacavir as starting molecule, RILP= Compounds from Rilpivirine as starting molecule, ERTA= Compounds from Etravirine as starting molecule. Databases= Compound ID from where the chemical group/fragment replacement was performed. Structural data for 385 in-silico compounds are given in excel format (SI2) and activity prediction using QSAR Eq 6. including molecular descriptor values are given in Table T5-NS in SI CONCLUSION Analysis by QSAR revealed molecular determinants of antiviral action against HIV-1 RT, and this knowledge was translated into search queries for a scaffold hopping step.

Figure 8B. In silico bioisostere replacement (scaffoldhopping). 3 best new substituent conformations with its associated field points in the binding site of protein PDB : 3MEE was identified and highlighted with a black sphere. Antiviral activity prediction was performed for the new compounds generated from scaffold hopping using QSAR model in Table 4. Selected list of compounds and its

We have clearly demonstrated that meaningfully reduced experimental data set can be successfully used for prediction of antiviral properties for large diverse set of chemicals. The results obtained from the model can be considered for selection of novel chemicals for further experimental testing. The important role of the hydrophobic regions and vdW’s in the Rilpivirine and Etravirine scaffold has been verified by both QSAR and molecular binding site analysis of target protein

10 Journal Name, 2014, Vol. 0, No. 0

Principal Author Last Name et al.

PDB ID: 3MEE/3MEC. The past decade has seen an increased understanding of the crucial role of QSAR and scaffold hopping in lead optimization to develop novel inhibitors. QSAR studies on diverse classes of inhibitors have been thoroughly explored. The QSAR method in combination with a range of descriptors and other in-silico tools has increased the applicability of former in drug discovery and design. Based on theoretically obtained compounds from scaffold hopping we propose a number of novel and potentially active scaffolds for further assessment.

potent inhibitor of HIV-1 reverse transcriptase. Journal of Biological Chemistry, 1993, 268, 14875-14880. [6]

Gonzales, A.J.; Diebel, M.R.; Romero, D.L.; Aristoff, P.A.; Tarpley, W.G.; Reusser, F. Kinetic studies with the nonnucleoside HIV-1 reverse transcriptase inhibitor U-88204E. Biochemistry, 1993, 32, 6548-6554.

[7]

Franks, K.M.; Diebel, M.R.; Kezdy, F.J.; Romero, D.L.; Thomas, R.C.; Aristoff, P.A.; Tarpley, W.G.; Reusser, F. The benzylthiopyrididine U-31,355 is a potent inhibitor of HIV-1 reverse transcriptase. Biochemical Pharmacology,1996, 51, 743-750.

[8]

Chou, K.C. Review: Prediction of human immunodeficiency virus protease cleavage sites in proteins. Analytical Biochemistry, 1996, 233, 1-14.

[9]

Shen, H.B. HIVcleave: a web-server for predicting HIV protease cleavage sites in proteins. Analytical Biochemistry, 2008, 375, 388-390.

AUTHOR CONTRIBUTION

[10]

Conceived and designed the experiments: GGP, KT, MK. Performed the experiments: GGP, LM, CSP. Analyzed the data: PB, CDH, ARK, KT, MK. Contributed analysis tools: GGP, ARK, MK. Wrote the manuscript: GGP, LM, KT

Chou, J.J. A formulation for correlating properties of peptides and its application to predicting human immunodeficiency virus protease-cleavable sites in proteins. Biopolymers, 1993, 33, 14051414.

[11]

Katz, R. A.; Skalka, A. M. The Retroviral Enzymes. Annu. Rev. Biochem. 1994, 63, 133–173.

[12]

Debnath, A. Application of 3D-QSAR Techniques in Anti-HIV-1 Drug Design - An Overview. Curr. Pharm. Des. 2005, 11 (24), 3091–3110.

[13]

Walmsley, S. Protease Inhibitor-Based Regimens for HIV Therapy: Safety and Efficacy. JAIDS J. Acquir. Immune Defic. Syndr. 2007, 45.

[14]

Speck-Planche, A.; Kleandrova, V. V; Luan, F.; Cordeiro, M. N. D. S. A Ligand-Based Approach for the in Silico Discovery of Multi-Target Inhibitors for Proteins Associated with HIV Infection. Mol. Biosyst. 2012, 8 (8), 2188–2196.

[15]

Mao, Y.; Li, Y.; Hao, M.; Zhang, S.; Ai, C. Docking, Molecular Dynamics and Quantitative Structure-Activity Relationship Studies for HEPTs and DABOs as HIV-1 Reverse Transcriptase Inhibitors. J. Mol. Model. 2012, 18 (5), 2185–2198.

[16]

De Brito, M. A.; Rodrigues, C. R.; Cirino, J. J. V.; Araújo, J. Q.; Honório, T.; Cabral, L. M.; de Alencastro, R. B.; Castro, H. C.; Albuquerque, M. G. Residue-Ligand Interaction Energy (ReLIE) on a Receptor-Dependent 3D-QSAR Analysis of S- and NHDABOs as Non-Nucleoside Reverse Transcriptase Inhibitors. Molecules 2012, 17 (7), 7666–7694.

[17]

Ashok, P.; Sharma, H.; Lathiya, H.; Chander, S.; Murugesan, S. In-Silico Design and Study of Novel Piperazinyl Β-Carbolines as Inhibitor of HIV-1 Reverse Transcriptase. Med. Chem. Res. 2014, 24 (2), 513–522.

[18]

Jain (Pancholi), N.; Gupta, S.; Sapre, N.; Sapre, N. S. In Silico de Novo Design of Novel NNRTIs: A Bio-Molecular Modelling Approach. RSC Adv. 2015, 5 (19), 14814–14827.

[19]

Berhanu, W. M.; Pillai, G. G.; Oliferenko, A. A.; Katritzky, A. R. Quantitative Structure-Activity/Property Relationships: The Ubiquitous Links between Cause and Effect. Chempluschem 2012, 77 (7), 507–517.

[20]

He, Z.; Zhang, J.; Shi, X.-H.; Hu, L.-L.; Kong, X.; Cai, Y.-D.; Chou, K.-C. Predicting Drug-Target Interaction Networks Based on Functional Groups and Biological Features. PLoS One 2010, 5 (3), e9603.

[21]

Chou, K. C. A Vectorized Sequence-Coupling Model for Predicting HIV Protease Cleavage Sites in Proteins. J. Biol. Chem. 1993, 268 (23), 16938–16948.

[22]

Wang, P.; Hu, L.; Liu, G.; Jiang, N.; Chen, X.; Xu, J.; Zheng, W.; Li, L.; Tan, M.; Chen, Z.; Song, H.; Cai, Y.-D.; Chou, K.-C. Prediction of Antimicrobial Peptides Based on Sequence Alignment and Feature Selection Methods. PLoS One 2011, 6 (4), e18476.

[23]

Feher, M.; Ewing, T. Global or Local QSAR: Is There a Way Out? QSAR Comb. Sci. 2009, 28 (8), 850–855.

CONFLICT OF INTEREST No Authors have conflict of interest.

ACKNOWLEDGEMENTS G.G. Pillai is grateful to the graduate school “Functional materials and technologies,” University of Tartu for funding from the European Social Fund under project 1.2.0401.090079, the EU European Regional Development Fund through the Center of Excellence in Chemical Biology, Estonia, by targeted financing from the Estonian Ministry of Education and Research (SF0140031As09) and Florida Center for Heterocyclic Compounds, Kenan Foundation at University of Florida to carry out my research. Authors are grateful to Prof. Paola Gramatica for providing free licenses and scientific advices for QSARINS software, Mr. Khalid and Dr. Alexander Oliferenko for scientific discussions. The authors are grateful to Dr. Rachel A. Jones for english checking SUPPLEMENTARY MATERIAL Supplementary material contains the structure, SMILES and activity data of the whole dataset, including predicted activity and others statistical data.

REFERENCES [1]

HIV/AIDS – WHO Fact http://www.who.int/mediacentre/factsheets/fs360/en/ during December 2014.

[2]

Klatzmann, D.; Champagne, E.; Chamaret, S.; Gruest, J.; Guetard, D.; Hercend, T.; Gluckman, J.-C.; Montagnier, L. T-Lymphocyte T4 Molecule Behaves as the Receptor for Human Retrovirus LAV. Nature 1984, 312 (5996), 767–768.

[3]

Le, G.; Vandegraaff, N.; Rhodes, D. I.; Jones, E. D.; Coates, J. A. V; Lu, L.; Li, X.; Yu, C.; Feng, X.; Deadman, J. J. Discovery of Potent HIV Integrase Inhibitors Active against Raltegravir Resistant Viruses. Bioorg. Med. Chem. Lett. 2010, 20 (17), 5013– 5018.

[4]

Richman, D. D. HIV Chemotherapy. Nature 2001, 410 (6831), 995–1001.

[5]

Althaus, I.W.; Gonzales, A.J.; Kezdy, F.J.; Romero, D.L.; Aristoff, P.A.; Tarpley, W.G.; Reusser, F. The quinoline U-78036 is a

sheets: Accessed

Short Running Title of the Article [24]

[25]

[26]

[27]

Roessler, F.; Korb, O.; Bender, A.; Maentele, W.; Bond, P. Molecular Dynamics Simulations and Docking of Non-Nucleoside Reverse Transcriptase Inhibitors (NNRTIs): A Possible Approach to Personalized HIV Treatment. J. Cheminform. 2012, 4, P32. Rizzo, R. C.; Udier-Blagović, M.; Wang, D.-P.; Watkins, E. K.; Kroeger Smith, M. B.; Smith, R. H.; Tirado-Rives, J.; Jorgensen, W. L. Prediction of Activity for Nonnucleoside Inhibitors with HIV-1 Reverse Transcriptase Based on Monte Carlo Simulations. J. Med. Chem. 2002, 45 (14), 2970–2987. Hansch, C.; Leo, A.; Hoekman, D. H. Exploring QSAR.: . Fundamentals and Applications in Chemistry and Biology; American Chemical Society, Washington D.C., USA, 513-541 (1995). Roy, K.; Leonard, J. T. QSAR Modeling of HIV-1 Reverse Transcriptase Inhibitor 2-Amino-6-Arylsulfonylbenzonitriles and Congeners Using Molecular Connectivity and E-State Parameters. Bioorg. Med. Chem. 2004, 12 (4), 745–754.

Journal Name, 2014, Vol. 0, No. 0 11 [41]

Massey, F. J. The Kolmogorov-Smirnov Test for Goodness of Fit. J. Am. Stat. Assoc. 1951, 46 (253), 68–78.

[42]

Lemmen, C.; Lengauer, T. Computational Methods for the Structural Alignment of Molecules. J. Comput. Aided. Mol. Des. 2000, 14 (3), 215–232.

[43]

Wetzel, S.; Klein, K.; Renner, S.; Rauh, D.; Oprea, T. I.; Mutzel, P.; Waldmann, H. Interactive Exploration of Chemical Space with Scaffold Hunter. Nat. Chem. Biol. 2009, 5 (8), 581–583.

[44]

Katritzky, A. R.; Karelson, M.; Petrukhin, R. CODESSA-Pro, University of Florida, USA, 2001.

[45]

Steinbeck, C.; Hoppe, C.; Willighagen, E. L. Recent Development Kit (CDK) Chemo- and Bioinformatics. 2111–2120.

[46]

Dennington, R.; Keith, T. Codessa III, 2013.

Kuhn, S.; Floris, M.; Guha, R.; Developments of the Chemistry an Open-Source Java Library for Curr. Pharm. Des. 2006, 12 (17),

[28]

Polanski, J.; Bak, A.; Gieleciak, R.; Magdziarz, T. Modeling Robust QSAR. J. Chem. Inf. Model. 2005, 46 (6), 2310–2318.

[47]

[29]

Polanski, J. Receptor Dependent Multidimensional QSAR for Modeling Drug - Receptor Interactions. Curr. Med. Chem. 2015, 16 (25).

Yap, C. W. PaDEL-Descriptor: An Open Source Software to Calculate Molecular Descriptors and Fingerprints. J. Comput. Chem. 2011, 32 (7), 1466–1474.

[48]

Cherkasov, A.; Muratov, E. N.; Fourches, D.; Varnek, A.; Baskin, I. I.; Cronin, M.; Dearden, J.; Gramatica, P.; Martin, Y. C.; Todeschini, R.; Consonni, V.; Kuz’min, V. E.; Cramer, R.; Benigni, R.; Yang, C.; Rathman, J.; Terfloth, L.; Gasteiger, J.; Richard, A.; Tropsha, A. QSAR Modeling: Where Have You Been? Where Are You Going To? J. Med. Chem. 2014, 57 (12), 4977–5010.

Tetko, I. V; Gasteiger, J.; Todeschini, R.; Mauri, A.; Livingstone, D.; Ertl, P.; Palyulin, V. A.; Radchenko, E. V; Zefirov, N. S.; Makarenko, A. S.; Tanchuk, V. Y.; Prokopenko, V. V. Virtual Computational Chemistry Laboratory--Design and Description. J. Comput. Aided. Mol. Des. 2005, 19 (6), 453–463.

[49]

Marvin Sketch 15.0, ChemAxon Kft, Hungary. Available from: www.chemaxon.com, 2014.

[50]

ChemBio3D Ultra 13.0, PerklinElmer Inc., USA, 2013.

[51]

MacroModel 10.0 : Schrödinger 2013-1, Schrödinger LLC, USA. 2013.

[52]

Chou, K.C. Review: Structural bioinformatics and its impact to biomedical science. Current Medicinal Chemistry,2004, 11, 2105-2134.

[53]

Cruz-Monteagudo, M.; PhamThe, H.; Cordeiro, M. N. D. S.; Borges, F. Prioritizing Hits with Appropriate Trade-Offs Between HIV-1 Reverse Transcriptase Inhibitory Efficacy and MT4 Blood Cells Toxicity Through Desirability-Based Multiobjective Optimization and Ranking. Mol. Inform. 2010, 29 (4), 303–321.

Watenpaugh, K.D.; Heinrikson, R.L. A Model of the complex between cyclin-dependent kinase 5 (Cdk5) and the activation domain of neuronal Cdk5 activator. Biochemical & Biophysical Research Communications (BBRC), 1999,259, 420-428.

[54]

Cruz-Monteagudo, M.; Cordeiro, M. N. D. S.; Tejerad, E.; Dominguez, E. R.; Borgesa, F. Desirability-Based MultiObjective QSAR in Drug Discovery. Mini Rev. Med. Chem. 2012.

Zhang, J.; Luan, C.H.; Johnson, G.V.W. Identification of the Nterminal functional domains of Cdk5 by molecular truncation and computer modeling. Proteins: Structure, Function, and Genetics, 2002, 48, 447-453.

[55]

Wang, Y.; Yu, L.; Zhang, L.; Qu, H.; Cheng, Y. A Novel Methodology for Multicomponent Drug Design and Its Application in Optimizing the Combination of Active Components from Chinese Medicinal Formula Shenmai. Chem. Biol. Drug Des. 2010, 75 (3), 318–324.

Wei, D.Q.; Zhong, W.Z. Binding mechanism of coronavirus main proteinase with ligands and its implication to drug design against SARS. (Erratum: ibid., 2003, Vol.310, 675). Biochem Biophys Res Comm (BBRC), 2003, 308, 148-151.

[56]

Li, X.B.; Wang, S.Q.; Xu, W.R. Novel Inhibitor Design for Hemagglutinin against H1N1 Influenza Virus by Core Hopping Method. PLoS ONE, 2011, 6, e28111.

[36]

Instant JChem 15.0, ChemAxon Kft, Hungary. Available from: www.chemaxon.com, 2014.

[57]

[37]

Desjardins, R. E.; Canfield, C. J.; Haynes, J. D.; Chulay, J. D. Quantitative Assessment of Antimalarial Activity in Vitro by a Semiautomated Microdilution Technique. Antimicrob. Agents Chemother. 1979, 16 (6), 710–718.

Pielak, R.M.; Jason R. Schnell, J.R. Mechanism of drug inhibition and drug resistance of influenza A M2 channel. Proceedings of National Academy of Science, USA, 2009, 106, 7379-7384.

[58]

Hopkins, A. L.; Ren, J.; Esnouf, R. M.; Willcox, B. E.; Jones, E. Y.; Ross, C.; Miyasaka, T.; Walker, R. T.; Tanaka, H.; Stammers, D. K.; Stuart, D. I. Complexes of HIV-1 Reverse Transcriptase with Inhibitors of the HEPT Series Reveal Conformational Changes Relevant to the Design of Potent Non-Nucleoside Inhibitors. J. Med. Chem. 1996, 39 (8), 1589–1600.

[59]

Das, K.; Bauman, J. D.; Clark, A. D.; Frenkel, Y. V; Lewi, P. J.; Shatkin, A. J.; Hughes, S. H.; Arnold, E. High-Resolution Structures of HIV-1 Reverse transcriptase/TMC278 Complexes: Strategic Flexibility Explains Potency against Resistance Mutations. Proc. Natl. Acad. Sci. U. S. A. 2008, 105 (5), 1466– 1471.

[60]

Lansdon, E. B.; Brendza, K. M.; Hung, M.; Wang, R.; Mukund, S.; Jin, D.; Birkus, G.; Kutty, N.; Liu, X. Crystal Structures of HIV-1 Reverse Transcriptase with Etravirine (TMC125) and

[30]

[31]

Vrouenraets, S. M. E.; Wit, F. W. N. M.; van Tongeren, J.; Lange, J. M. A. Efavirenz: A Review. Expert Opin. Pharmacother. 2007, 8 (6), 851–871.

[32]

Liu, X.; Zhu, F.; Ma, X. H.; Shi, Z.; Yang, S. Y.; Wei, Y. Q.; Chen, Y. Z. Predicting Targeted Polypharmacology for Drug Repositioning and Multi- Target Drug Discovery. Curr. Med. Chem. 2013, 20 (13), 1646–1661.

[33]

[34]

[35]

[38]

Kerwin, J. F. Burger’s Medicinal Chemistry and Drug Discovery. 5th Edition. Volume 1: Principles and Practice. Manfred E. Wolff (ed.), Wiley-Interscience, New York, Drug Dev. Res. 1995, 35 (2), 116–117.

[39]

C. D. Selassie, E. D. J. A. I. J. W. Burger’s Medicinal Chemistry and Drug Discovery. Drug Discovery and Drug Development. 6th Edition. Volume 2: Principles and Practice. Donald, J. Abraham (ed.), Wiley-Interscience, New York, Drug Dev. Res. 2003, (2), 243-274.

[40]

Shapiro, S.; Wilk, M. An Analysis of Variance Test for Normality (complete Samples). Biometrika 1965, 3 (52).

12 Journal Name, 2014, Vol. 0, No. 0

Principal Author Last Name et al.

Rilpivirine (TMC278): Implications for Drug Design. J. Med. Chem. 2010, 53 (10), 4295–4299. [61]

Dewar, M. J. S.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. P. Development and Use of Quantum Mechanical Molecular Models. 76. AM1: A New General Purpose Quantum Mechanical Molecular Model. J. Am. Chem. Soc. 1985, 107 (13), 3902–3909.

[62]

Dewar, M. J. S.; Dennington, R.; Keith, T. AMPAC 10. Available from: www.semichem.com

[63]

Stewart, J. J. P. MOPAC 7.1, 2006. www.openmopac.net.

Molecular Graphs within the framework of Sanderson Principle. Dokl. Akad. Nauk SSSR 1987, 296 (4), 883–887. [80]

Stanton, D. T.; Egolf, L. M.; Jurs, P. C.; Hicks, M. G. ComputerAssisted Prediction of Normal Boiling Points of Pyrans and Pyrroles. J. Chem. Inf. Model. 1992, 32 (4), 306–316.

[81]

Hirshfeld, F. L. Bonded-Atom Fragments for Describing Molecular Charge Densities. Theor. Chim. Acta 1977, 44 (2), 129–138.

Available from:

[82]

[64]

Stewart, J. J. P. MOPAC: A Semiempirical Molecular Orbital Program. J. Comput. Aided. Mol. Des. 1990, 4 (1), 1–103.

Petitjean, M. On the Analytical Calculation of van Der Waals Surfaces and Volumes: Some Numerical Aspects. J. Comput. Chem. 1994, 15 (5), 507–523.

[83]

[65]

Roy, K.; Pratim Roy, P. Comparative Chemometric Modeling of Cytochrome 3A4 Inhibitory Activity of Structurally Diverse Compounds Using Stepwise MLR, FA-MLR, PLS, GFA, G/PLS and ANN Techniques. Eur. J. Med. Chem. 2009, 44 (7), 2913– 2922.

Averill, B.; Eldredge, P. Chemistry: Principles, Patterns, and Applications; Pearson/Prentice Hall, New Jersey, USA. 1131p. 2007.

[84]

Pyykkö, P.; Atsumi, M. Molecular Double-Bond Covalent Radii for Elements Li-E112. Chemistry 2009, 15 (46), 12770–12779.

[85]

Richard F. W. Bader; Atoms in Molecules: A Quantum Theory (International Series of Monographs on Chemistry); Oxford University Press, UK, 458pp. 1994

[86]

Politzer, P.; Lane, P.; Murray, J. S.; Brinck, T. Investigation of Relationships between Solute Molecule Surface Electrostatic Potentials and Solubilities in Supercritical Fluids. J. Phys. Chem. 1992, 96 (20), 7938–7943.

[87]

Murray, J. S.; Brinck, T.; Edward Grice, M.; Politzer, P. Correlations between Molecular Electrostatic Potentials and Some Experimentally-Based Indices of Reactivity. J. Mol. Struct. THEOCHEM 1992, 256, 29–45.

[88]

Brown N.; Bioisosteres in Medicinal Chemistry, Volume 54; Wiley-VCH; 1 Edition, 2012.

[89]

Ertl, P.; Lewis, R. IADE: A System for Intelligent Automatic Design of Bioisosteric Analogs. J. Comput. Aided. Mol. Des. 2012, 26 (11), 1207–1215.

[90]

Li, X.B.; Wang, S.Q.; Xu, W.R. Novel Inhibitor Design for Hemagglutinin against H1N1 Influenza Virus by Core Hopping Method. PLoS ONE, 2011, 6, e28111.

[91]

Ma, Y.; Wang, S.Q.; Xu, W.R.; Wang, R.L. Design novel dual agonists for treating type-2 diabetes by targeting peroxisome proliferator-activated receptors with core hopping approach. PLoS ONE, 2012, 7, e38546.

[92]

Low, C. M. R.; Buck, I. M.; Cooke, T.; Cushnir, J. R.; Kalindjian, S. B.; Kotecha, A.; Pether, M. J.; Shankley, N. P.; Vinter, J. G.; Wright, L. Scaffold Hopping with Molecular Field Points: Identification of a Cholecystokinin-2 (CCK2) Receptor Pharmacophore and Its Use in the Design of a Prototypical Series of Pyrrole- and Imidazole-Based CCK2 Antagonists. J. Med. Chem. 2005, 48 (22), 6790–6802.

[66]

Rodríguez-Barrios, F.; Gago, F. Chemometrical Identification of Mutations in HIV-1 Reverse Transcriptase Conferring Resistance or Enhanced Sensitivity to Arylsulfonylbenzonitriles. J. Am. Chem. Soc. 2004, 126 (9), 2718–2719.

[67]

Karelson, M.; Lobanov, V. S.; Katritzky, A. R. QuantumChemical Descriptors in QSAR/QSPR Studies. Chem. Rev. 1996, 96 (3), 1027–1044.

[68]

González-Díaz, H.; Arrasate, S.; Gómez-SanJuan, A.; Sotomayor, N.; Lete, E.; Besada-Porto, L.; Ruso, J. M. General Theory for Multiple Input-Output Perturbations in Complex Molecular Systems. 1. Linear QSPR Electronegativity Models in Physical, Organic, and Medicinal Chemistry. Curr. Top. Med. Chem. 2013, 13 (14), 1713–1741.

[69]

[70]

Luan, F.; Kleandrova, V. V; González-Díaz, H.; Ruso, J. M.; Melo, A.; Speck-Planche, A.; Cordeiro, M. N. D. S. ComputerAided Nanotoxicology: Assessing Cytotoxicity of Nanoparticles under Diverse Experimental Conditions by Using a Novel QSTRPerturbation Approach. Nanoscale 2014, 6 (18), 10623–10630. Cartwright, H. M. Applications of Artificial Intelligence in Chemistry (Oxford Chemistry Primers); Oxford University Press, USA, 1994.

[71]

Leardi, R.; Boggia, R.; Terrile, M. Genetic Algorithms as a Strategy for Feature Selection. J. Chemom. 1992, 6 (5), 267–281.

[72]

Gramatica, P.; Chirico, N.; Papa, E.; Cassani, S.; Kovarich, S. QSARINS: A New Software for the Development, Analysis, and Validation of QSAR MLR Models. J. Comput. Chem. 2013, 34 (24), 2121–2132.

[73]

Guyon, I.; Elisseeff, A. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res. 2003, 3, 1157–1182.

[74]

Discovery Studio Visualizer 4.0, Accelrys Inc., USA, 2014.

[93]

[75]

Leonard, J. T.; Roy, K. Exploring Molecular Shape Analysis of Styrylquinoline Derivatives as HIV-1 Integrase Inhibitors. Eur. J. Med. Chem. 2008, 43 (1), 81–92.

Schuffenhauer, A. Computational Methods for Scaffold Hopping. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2012, 2 (6), 842–867.

[94]

Gaulton, A.; Bellis, L. J.; Bento, A. P.; Chambers, J.; Davies, M.; Hersey, A.; Light, Y.; McGlinchey, S.; Michalovich, D.; AlLazikani, B.; Overington, J. P. ChEMBL: A Large-Scale Bioactivity Database for Drug Discovery. Nucleic Acids Res. 2012, 40, D1100–D1107.

[95]

Pitt, W. R.; Parry, D. M.; Perry, B. G.; Groom, C. R. Heteroaromatic Rings of the Future. J. Med. Chem. 2009, 52 (9), 2952–2963.

[96]

Sirois, S.; Sing, T. Review: HIV-1 gp120 V3 loop for structurebased drug design. Curr. Protein Pept. Sci., 2005, 6, 413-422.

[97]

Touaibia, M.; Roy, R. Review: Glycosylation of HIV-1 gp120 V3 loop: towards the rational design of a synthetic carbohydrate vaccine. Curr. Med. Chem., 2007, 14, 3232-3242.

[76]

Pattabiraman, N. Occluded Molecular Surface Analysis of Ligand-Macromolecule Contacts: Application to HIV-1 ProteaseInhibitor Complexes. J. Med. Chem. 1999, 42 (19), 3821–3834.

[77]

Rawal, R. K.; Prabhakar, Y. S.; Katti, S. B. Molecular Surface Features in Modeling the HIV-1 RT Inhibitory Activity of 2-(2,6Disubstituted Phenyl)-3-(substituted Pyrimidin-2-Yl)-Thiazolidin4-Ones. QSAR Comb. Sci. 2007, 26 (3), 398–406.

[78]

Stanton, D. T.; Jurs, P. C. Development and Use of Charged Partial Surface Area Structural Descriptors in Computer-Assisted Quantitative Structure-Property Relationship Studies. Anal. Chem. 1990, 62 (21), 2323–2329.

[79]

Zefirov, N. S.; Kirpichenok, M. A.; Ismailov, F. F.; Trofimov, M. I. Calculation Schemes for Atomic Electronegativities in

Short Running Title of the Article

Received: March 20, 2014

Journal Name, 2014, Vol. 0, No. 0 13

Revised: April 16, 2014

Accepted: April 20, 2014

Suggest Documents