Letters in Drug Design & Discovery, 2011, 8, ???-???


Exploring the Structure Activity Relationships of Imidazole Containing Tetrahydrobenzodiazepines as Farnesyltransferase Inhibitors: A QSAR Study

Anand Gaurav*, Vertika Gautam and Ranjit Singh

School of Pharmaceutical Sciences, Shobhit University, Meerut, 250110, India

Received December 09, 2010: Revised March 16, 2010: Accepted April 01, 2010

Abstract: A quantitative structure activity relationship (QSAR) approach, using the enhanced replacement method for variable selection, was applied to a series of imidazole containing tetrahydrobenzodiazepines as inhibitors of farnesyltransferase. For this purpose the dataset was divided into a training set of 31 compounds and a test set of 5 compounds using k-means clustering. Statistically significant equations were obtained with a high correlation coefficient (R=0.9637) and a low standard deviation (S=0.3715). The robustness of the model was confirmed by its R2cv value of 0.8886, by Y-scrambling and by predicting the activities of the test compounds. A good correlation of the KierA3 shape index and polarizability with the farnesyltransferase inhibitory activity was achieved.

Keywords: QSAR, Farnesyltransferase Inhibitors, Tetrahydrobenzodiazepines, Y scrambling, Polarizability.

INTRODUCTION

Cancer is caused by the stepwise accumulation of mutations that affect the control of growth, differentiation and survival of cells [1]. Ras proteins are key elements in signal transduction from cell-surface receptor tyrosine kinases to the nucleus, and in the control of growth, differentiation, and survival of cells [2]. Mutated Ras proteins are found in 30-50% of human tumors, including lung, colon and pancreatic cancers [3]. Farnesyltransferase is a metalloenzyme containing zinc as a cofactor, and the zinc is believed to play an important role in the catalytic mechanism [4]. Farnesyltransferase inhibitors, by blocking the prenylation of Ras proteins and other oncoprotein targets, have been shown to be effective for the treatment of several cancers in recent clinical trials [5-6].

The signaling functions of Ras depend on its association with the membrane, which in turn is accomplished through farnesylation, C-terminal tripeptide hydrolysis and carboxymethylation of cytosolic Ras. The key step in this sequence of reactions is the farnesylation of Ras by the enzyme protein farnesyltransferase (PFT); thus, in the past decade several research groups have focused on inhibiting this enzyme as a potential anticancer therapeutic modality [7-9]. The clinically relevant red blood cell stages of P. falciparum also contain PFT activity [10-11]. Thus farnesyltransferase inhibitors have also attracted attention for their anti-malarial activity [12]. Recently, Plasmodium-selective farnesyltransferase inhibitors have been developed [13]. PFT inhibitors have been extensively developed for anticancer therapy, and structurally diverse compounds with drug-like properties are also available [14-15]. Although most of them possess potent inhibitory activity against PFT,

*Address correspondence to this author at the School of Pharmaceutical Sciences, Shobhit University, Meerut, 250110, India; Tel: +91-9458272353; Fax: +91-121-2575724; E-mail: [email protected] 1570-1808/11 $58.00+.00

there is scope for improvement of the potency of these compounds. Quantitative Structure Activity Relationship (QSAR) analysis is a widely used tool for the identification of the structural requirements of various enzyme inhibitors. Previously we have reported 2D and 3D QSAR studies on 4-quinolones as selective GABA-A receptor modulators and identified the key structural requirements for GABA-A receptor affinity [16-17]. A series of imidazole containing tetrahydrobenzodiazepines, which exhibit anti-tumor activity [18] by inhibition of farnesyltransferase, has been used for this study. The aim of this study was to establish quantitative relationship(s) between the structure of imidazole containing tetrahydrobenzodiazepines and their farnesyltransferase inhibitory activity. QSAR models were developed to identify the structural requirements for farnesyltransferase inhibition, and validated by prediction of the farnesyltransferase inhibitory activity of the test set. The study is important for guiding the synthetic chemist to synthesise new candidate compounds possessing enhanced inhibitory activity on farnesyltransferase.

RESULTS AND DISCUSSION

A series of 36 imidazole containing tetrahydrobenzodiazepine derivatives was selected on the basis of earlier reports (Table 1). The series was divided into training and test sets using k-means clustering, as described in the experimental section (Tables 4 and 5). Five compounds made up the test set while the rest of the molecules made up the training set. Training set molecules were used to develop QSAR models by applying the enhanced replacement method (ERM), taking physicochemical properties of the compounds (Tables 2 and 3) as the independent variables and their pIC50 values (Table 1) as the dependent variable. Sufficient care was taken to avoid the use of collinear variables in the same equation, as it leads to false correlations. A large number of QSARs were generated; however, only those were considered where

© 2011 Bentham Science Publishers Ltd.


Letters in Drug Design & Discovery, 2011, Vol. 8, No. 5

Table 1. Compounds with R1, R2, R3, R4 Substituents Used for the Development of QSAR Models

[Structure diagram: core scaffold with substituent positions R1, R2, R3 and R4]

| Compound no. | R1 | R2 | R3 | R4 |
|---|---|---|---|---|
| 1 | Imidazol-4-ylethyl | Naphthalene-1-carbonyl | H | H |
| 2 | Imidazol-2-ylpropyl | Naphthalene-1-carbonyl | H | H |
| 3 | Imidazol-(2-aminomethyl)-4-ylmethyl | Naphthalene-1-carbonyl | H | H |
| 4 | 1-phenylmethylimidazol-5-ylmethyl | Naphthalene-1-carbonyl | H | H |
| 5 | Imidazol-(2-aminoethyl)-4-ylmethyl | Naphthalene-1-carbonyl | H | H |
| 6 | Imidazol-4-ylmethyl | Naphthalene-1-carbonyl | H | H |
| 7 | Imidazol-2-ylmethyl | Naphthalene-1-carbonyl | H | H |
| 8 | Imidazol-4-ylmethyl | 2-biphenylcarbonyl | H | H |
| 9 | Imidazol-4-ylmethyl | Naphthalene-1-sulphonyl | H | H |
| 10 | Imidazol-4-ylmethyl | 2-(phenylmethylamino)phenylcarbonyl | H | H |
| 11 | Imidazol-4-ylmethyl | 2-phenoxyphenylmethylcarbonyl | H | H |
| 12 | Imidazol-4-ylmethyl | 2-(methoxycarbonyl)phenylsulphonyl | H | H |
| 13 | Naphthalene-1-carbonyl | Imidazol-4-ylmethyl | H | H |
| 14 | Imidazol-4-ylmethyl | Diphenylaminocarbonyl | H | H |
| 15 | Imidazol-4-ylmethyl | Naphthalene-1-carbonyl | H | NHCO-cyclohexyl |
| 16 | Imidazol-4-ylmethyl | Naphthalene-1-carbonyl | H | NHCO-phenyl |
| 17 | Imidazol-4-ylmethyl | Naphthalene-1-carbonyl | H | NHCONH-cyclohexyl |
| 18 | Imidazol-4-ylmethyl | Naphthalene-1-carbonyl | Pyrid-3-yl | H |
| 19 | Imidazol-4-ylmethyl | Naphthalene-1-carbonyl | H | NHCOCH3 |
| 20 | Imidazol-4-ylmethyl | Naphthalene-1-carbonyl | Pyrid-2-yl | H |
| 21 | Imidazol-4-ylmethyl | Naphthalene-1-carbonyl | Cyclohexyl | H |
| 22 | Imidazol-4-ylmethyl | Naphthalene-1-carbonyl | Pyrid-4-yl | H |
| 23 | Imidazol-4-ylmethyl | Naphthalene-1-carbonyl | Br | H |
| 24 | Imidazol-4-ylmethyl | Naphthalene-1-carbonyl | H | NHSO2-phenyl |
| 25 | Imidazol-4-ylmethyl | Naphthalene-1-carbonyl | H | H |
| 26 | Imidazol-4-ylmethyl | Naphthalene-1-carbonyl | H | Cl |
| 27 | Imidazol-4-ylmethyl | 2-trifluoromethoxyphenylcarbonyl | Pyrid-4-yl | H |
| 28 | Imidazol-4-ylmethyl | Naphthalene-1-carbonyl | Phenyl | H |
| 29 | Imidazol-4-ylmethyl | 2-methoxynaphth-1-carbonyl | Phenyl | H |
| 30 | Imidazol-4-ylmethyl | 4-(methoxyphenyl)ethane-1,2-dione | Phenyl | H |
| 31 | Imidazol-4-ylmethyl | 2-(hydroxyethylthio)phenylcarbonyl | Phenyl | H |
| 32 | Imidazol-4-ylmethyl | Naphth-1-sulphonyl | Phenyl | H |
| 33 | Imidazol-4-ylmethyl | 1-phenyltetrazole-5-yl | Phenyl | H |
| 34 | Imidazol-4-ylmethyl | 4-chlorophenylaminomethylcyanamide | Phenyl | H |
| 35 | Imidazol-4-ylmethyl | Tetrahydroquinol-1-carbonyl | Phenyl | H |
| 36 | Imidazol-4-ylmethyl | Naphthyl-1-methyl | Phenyl | H |

Table 2. Parameters Used in the Study with their Significance and Mathematical Expressions

1. Third alpha modified shape index (KierA3) [23]

Mathematical expression:

$$
\mathrm{KierA3} =
\begin{cases}
\dfrac{(A+\alpha-1)\,(A+\alpha-3)^{2}}{({}^{3}P+\alpha)^{2}} & \text{if } A \text{ is odd} \\[2ex]
\dfrac{(A+\alpha-3)\,(A+\alpha-2)^{2}}{({}^{3}P+\alpha)^{2}} & \text{if } A \text{ is even}
\end{cases}
$$

A: number of non-hydrogen atoms; $\alpha = r_x / r_x(sp^3) - 1$; $r_x$: covalent radius of the atom being evaluated; $r_x(sp^3)$: covalent radius of a carbon sp3 atom (0.77 Å); ${}^{3}P$: number of paths of bond length 3.

Significance: The third alpha modified shape index encodes the degree of centrality of branching; higher and lower values of KierA3 indicate branching at the ends and at the centre of the long chain fragment of the molecule, respectively.

2. Polarizability (apol) [24]

Mathematical expression:

$$
\alpha = \frac{3}{4\pi}\cdot\frac{\varepsilon-1}{\varepsilon+2}\cdot\frac{1}{n}
$$

$\varepsilon$: dielectric constant; n: refractive index.

Significance: Polarizability of a molecule is directly related to its bulk; thus bulkier molecules have higher polarizabilities.

3. Specific Polarizability (SPol) [24]

Mathematical expression:

$$
\mathrm{SPol} = \frac{\alpha}{V}
$$

$\alpha$: molecular polarizability; V: molecular volume.

Significance: The London dispersion interactions between two molecules are proportional to their specific polarizabilities.

Table 3. Descriptors of the Compounds Involved in the Models Developed

| Compound no. | KierA3 | Apol | I(R2) | SPol(R4) |
|---|---|---|---|---|
| 1 | 3.8604 | 0.3870 | 1 | 0.0535 |
| 2 | 4.2082 | 0.3870 | 1 | 0.0535 |
| 3 | 3.7250 | 0.3870 | 1 | 0.0535 |
| 4 | 4.5525 | 0.3870 | 1 | 0.0535 |
| 5 | 4.2941 | 0.3870 | 1 | 0.0535 |
| 6 | 3.5140 | 0.3870 | 1 | 0.0535 |
| 7 | 3.0810 | 0.3870 | 1 | 0.0535 |
| 8 | 4.2082 | 0.3870 | 1 | 0.0535 |
| 9 | 3.6021 | 0.3870 | 1 | 0.0535 |
| 10 | 4.6503 | 0.3870 | 1 | 0.0535 |
| 11 | 5.1558 | 0.3870 | 1 | 0.0535 |
| 12 | 4.2376 | 0.3870 | 1 | 0.0535 |
| 13 | 3.5140 | 0.3870 | 1 | 0.0535 |
| 14 | 4.4706 | 0.3870 | 1 | 0.0535 |
| 15 | 5.6891 | 0.3870 | 1 | 0.1073 |
| 16 | 5.3946 | 0.3870 | 1 | 0.1206 |
| 17 | 5.9019 | 0.3870 | 1 | 0.1083 |
| 18 | 4.3051 | 9.7250 | 1 | 0.0535 |
| 19 | 4.6356 | 0.3870 | 1 | 0.0966 |
| 20 | 4.3051 | 9.7250 | 1 | 0.0535 |
| 21 | 4.5934 | 11.0000 | 1 | 0.0535 |
| 22 | 4.3051 | 9.7250 | 1 | 0.0535 |
| 23 | 3.9025 | 3.4000 | 1 | 0.0535 |
| 24 | 4.8016 | 0.3870 | 1 | 0.1397 |
| 25 | 3.6224 | 0.3870 | 1 | 0.0535 |
| 26 | 3.8372 | 0.3870 | 1 | 0.1121 |
| 27 | 5.6853 | 9.7250 | 1 | 0.0535 |
| 28 | 4.3285 | 10.4340 | 1 | 0.0535 |
| 29 | 4.8108 | 10.4340 | 1 | 0.0535 |
| 30 | 5.0341 | 10.4340 | 1 | 0.0535 |
| 31 | 5.4482 | 10.4340 | 1 | 0.0535 |
| 32 | 4.5860 | 10.4340 | 1 | 0.0535 |
| 33 | 4.7302 | 10.4340 | 0 | 0.0535 |
| 34 | 5.6347 | 10.4340 | 0 | 0.0535 |
| 35 | 4.4907 | 10.4340 | 1 | 0.0535 |
| 36 | 4.4695 | 10.4340 | 0 | 0.0535 |

the number of parameters is up to four. Further selection of models was based on a higher multiple correlation coefficient (R) or explained variance (R2), minimum inter-correlation between the descriptors found in the same model, high Fisher ratio values (F) and low standard deviation values (S). Another statistical parameter used in the study was the Kubinyi function (FIT). FIT is closely related to the Fisher ratio (F) but avoids the main disadvantage of the latter, namely that it is too sensitive to changes in small d values and poorly sensitive to changes in large d values, where d is the number of descriptors in the model. The FIT criterion has a low sensitivity to changes in small d values and a substantially increasing sensitivity for large d values. The greater the FIT value, the better the linear equation.

Table 4. K Means Clustering of the Dataset

| Cluster no | Compound no |
|---|---|
| 1 | 1, 2, 3, 4, 5, 6, 7, 19, 21, 34 |
| 2 | 8, 9, 10, 12, 13, 17, 33 |
| 3 | 22, 26, 27, 31, 32, 36 |
| 4 | 11, 14, 15, 16, 18, 30, 35 |
| 5 | 20, 23, 24, 25, 28, 29 |
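The cluster-based split can be checked directly from the published numbers. The sketch below is only an illustration, not the authors' procedure: it takes the cluster membership from Table 4 and the test compounds from Table 5 as given, and confirms that exactly one test compound comes from each cluster. The function name `split_by_cluster` is hypothetical.

```python
# Cluster membership as listed in Table 4 (cluster number -> compound numbers).
clusters = {
    1: [1, 2, 3, 4, 5, 6, 7, 19, 21, 34],
    2: [8, 9, 10, 12, 13, 17, 33],
    3: [22, 26, 27, 31, 32, 36],
    4: [11, 14, 15, 16, 18, 30, 35],
    5: [20, 23, 24, 25, 28, 29],
}

# Test compounds as listed in Table 5.
test_set = {6, 13, 22, 28, 35}


def split_by_cluster(clusters, test_set):
    """Return (training, test, picks) where picks maps each cluster
    to the test compounds drawn from it."""
    picks = {c: sorted(set(members) & test_set) for c, members in clusters.items()}
    all_compounds = sorted(x for members in clusters.values() for x in members)
    training = [x for x in all_compounds if x not in test_set]
    return training, sorted(test_set), picks


training, test, picks = split_by_cluster(clusters, test_set)
print(len(training), len(test))  # 31 5
for cluster_no, chosen in picks.items():
    print(cluster_no, chosen)
```

Drawing the test compounds cluster by cluster keeps every region of descriptor space represented in both sets, which is the rationale given for the K-means split in the experimental section.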

The application of ERM to the training set of 31 compounds resulted in the following model:

pIC50 = 0.6296(±0.2052) KierA3 - 0.2337(±0.0322) apol + 1.0626(±0.4327) I(R2) - 35.6873(±5.7665) SPol(R4) + 4.0770(±0.9217)    (1)

n=31, R=0.8814, R2=0.7769, S=0.6361, R2adj=0.7426, F=22.63, FIT=1.9264, PRESS=13.807, R2cv=0.7072

Here KierA3 is the third alpha modified shape index of the molecule, apol is the polarizability of the substituent at R3, I(R2) is a dummy parameter that is assigned a value of unity if the substituent at R2 contains a hydrogen bond acceptor and a value of zero otherwise, and SPol(R4) is the specific polarizability of the substituent at R4. This equation explains only 77.69% of the variation in the anti-tumor activity of the farnesyltransferase inhibitors. The standard deviation associated with the equation is quite high, and the other statistical parameters also indicate the poor quality of prediction achieved with this equation. The high standard deviation may be due to the presence of outliers. Close inspection of the residuals suggested that compounds 7, 23 and 25 were outliers. Surprisingly, the structures of these molecules do not differ significantly from those of the other molecules in the training set. Since predictions from any QSAR model cannot be intrinsically better than the experimental data employed to develop the model, and the quality of the input data greatly influences the performance of the QSAR model, molecules 7, 23 and 25 were taken out of the training set. The resulting 28-molecule training set was then used to calculate the best QSAR models with up to four descriptors (d ≤ 4). Among the numerous models developed, the best one, selected on the basis of the parameters already discussed, turns out to be

pIC50 = 0.3058(±0.1334) KierA3 - 0.2398(±0.0189) apol + 1.0775(±0.2531) I(R2) - 36.4316(±3.3749) SPol(R4) + 5.7817(±0.6128)    (2)

n=28, R=0.9637, R2=0.9287, S=0.3715, R2adj=0.9163, F=74.95, FIT=6.8087, PRESS=4.9644, R2cv=0.8886
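Several of the statistics quoted with Equations 1 and 2 are linked by simple closed-form relations, which makes them easy to cross-check. The sketch below is a hedged illustration rather than the authors' code: it assumes the usual textbook expressions for the adjusted R2 and the Fisher ratio, and Kubinyi's FIT in the form R2(n-k-1)/((n+k^2)(1-R2)), none of which are written out in this paper. All function names are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 for n compounds and k descriptors (assumed textbook form)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def fisher_f(r2, n, k):
    """Fisher ratio F for a k-descriptor linear model (assumed textbook form)."""
    return r2 * (n - k - 1) / (k * (1.0 - r2))


def kubinyi_fit(r2, n, k):
    """Kubinyi FIT criterion (assumed form; not written out in the paper)."""
    return r2 * (n - k - 1) / ((n + k ** 2) * (1.0 - r2))


def total_ss_from_press(press, r2cv):
    """Total sum of squares implied by R2cv = (S - PRESS) / S."""
    return press / (1.0 - r2cv)


# Reported values: (label, R2, n, k, PRESS, R2cv)
models = [
    ("Equation 1", 0.7769, 31, 4, 13.807, 0.7072),
    ("Equation 2", 0.9287, 28, 4, 4.9644, 0.8886),
]

for label, r2, n, k, press, r2cv in models:
    print(label)
    print("  R2adj =", round(adjusted_r2(r2, n, k), 4))
    print("  F     =", round(fisher_f(r2, n, k), 2))
    print("  FIT   =", round(kubinyi_fit(r2, n, k), 4))
    print("  S_tot =", round(total_ss_from_press(press, r2cv), 2))
```

Under these assumed formulas the recomputed R2adj and FIT agree with the reported values to the quoted precision, and F agrees to within the rounding of R2.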

The robustness of this equation is supported by its R2cv, which is as high as 0.8886, and by the predicted activities of the test set compounds (Table 5). According to the Tropsha group [19], a QSAR model is considered predictive if the following conditions are satisfied:

R2cv > 0.5
R2 > 0.6
(R2 - R02)/R2 or (R2 - R'02)/R2 is less than 0.1
k or k' is close to 1

where R2 is the coefficient of determination between experimental values and model predictions on the training set. The mathematical definitions of R02, R'02, k, and k' are based on regression of the observed activities against the predicted activities and vice versa (regression of the predicted activities against the observed activities). The definitions are presented clearly in the literature [19] and are not repeated here. The proposed model passes all these tests related to predictive ability:

R2cv = 0.8886 > 0.5
R2 = 0.9287 > 0.6
(R2 - R02)/R2 = 0.007 < 0.1
k = 0.99

Table 5. Composition of Training and Test Sets

| Set | Compound no |
|---|---|
| Training set | 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 26, 27, 29, 30, 31, 32, 33, 34, 36 |
| Test set | 6, 13, 22, 28, 35 |

To demonstrate that equation 2 does not result from happenstance, we resorted to a widely used approach to further establish model robustness: the so-called Y-scrambling. It consists of scrambling the experimental property in such a way that the activities no longer correspond to the respective compounds, and is discussed in the experimental section. After analyzing 100 cases of Y-scrambling, the highest value of R2 obtained from this process (0.3630) turned out to be considerably lower than the one corresponding to the true calibration, R2 = 0.9287 (Table 6). This result suggests that the model is robust, that the calibration is not a fortuitous correlation and that a reliable structure-activity relationship has been derived.

Table 6. Y-Scrambling Data for Equation 2

| Number of Scrambling | R2 | R2cv |
|---|---|---|
| 1 | 0.3630 | -0.2873 |
| 2 | 0.2245 | -0.3536 |
| 3 | 0.2687 | -0.1673 |
| 4 | 0.2984 | -0.4535 |
| 5 | 0.2435 | -0.4231 |
| 6 | 0.2098 | 0.2122 |
| 7 | 0.2675 | 0.1625 |
| 8 | 0.3425 | -0.2523 |
| 9 | 0.2836 | -0.3230 |
| 10 | 0.2983 | -0.3728 |

A graphical comparison between experimental and calculated activities using equation 2 for the training and test sets is shown in Figs. (1) and (2). The correlation matrix (Table 7) between the independent variables used in this analysis demonstrates very poor interrelationship among the four parameters appearing in equation 2, thus supporting the selection of these parameters in the model. Table 8 shows the predicted pIC50 given by equation 2 for the training and test sets. The behavior of the residuals in terms of the predictions, illustrated in Fig. (3), shows a normal distribution.

Fig. (1). Graphical representation of comparison between experimental and calculated activities using equation 2 for training set.

Fig. (2). Graphical representation of comparison between experimental and calculated activities using equation 2 for test set.

Table 7. Correlation Matrix for Various Descriptors Present in the Equation 2 with Biological Activity

| | pIC50 | KierA3 | Apol | I(R2) | SPol(R4) |
|---|---|---|---|---|---|
| pIC50 | 1.0000 | | | | |
| KierA3 | 0.3210 | 1.0000 | | | |
| Apol | -0.1835 | -0.4479 | 1.0000 | | |
| I(R2) | 0.3281 | -0.3653 | 0.1557 | 1.0000 | |
| SPol(R4) | -0.2188 | -0.6329 | 0.4789 | -0.2342 | 1.0000 |
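A correlation matrix of this kind is built from pairwise Pearson coefficients between descriptor columns. The following is a minimal sketch of that calculation, not the authors' code. It uses only six compounds from Table 3 (1, 15, 18, 21, 24 and 33) as an illustrative subset, so its numbers are not expected to reproduce Table 7, which was computed on the full training set. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return a nested dict of pairwise Pearson coefficients."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Illustrative subset of Table 3: compounds 1, 15, 18, 21, 24 and 33.
columns = {
    "KierA3":   [3.8604, 5.6891, 4.3051, 4.5934, 4.8016, 4.7302],
    "Apol":     [0.3870, 0.3870, 9.7250, 11.0000, 0.3870, 10.4340],
    "I(R2)":    [1, 1, 1, 1, 1, 0],
    "SPol(R4)": [0.0535, 0.1073, 0.0535, 0.0535, 0.1397, 0.0535],
}

matrix = correlation_matrix(columns)
for row_name, row in matrix.items():
    print(row_name, [round(value, 4) for value in row.values()])
```

Descriptor pairs with a coefficient close to +1 or -1 carry largely the same information and would normally not be combined in one regression equation.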

Equation 2 clearly demonstrates the positive contribution of KierA3 [20]. The third alpha modified shape index encodes the degree of centrality of branching; higher and lower values of KierA3 indicate branching at the ends and at the centre of the long chain fragment of the molecule, respectively. Therefore equation 2 indicates that introduction of branching at the ends of the tetrahydrobenzodiazepine molecules will contribute favorably to farnesyltransferase inhibitory activity.

Table 8. Observed, Predicted pIC50 and Residuals for Equation 2

| Compound no. | Observed pIC50 | Predicted pIC50 | Residual |
|---|---|---|---|
| 1 | 6.4685 | 5.9977 | 0.4708 |
| 2 | 6.2291 | 6.1041 | 0.1250 |
| 3 | 6.0757 | 5.9563 | 0.1194 |
| 4 | 5.9586 | 6.2093 | -0.2507 |
| 5 | 5.8861 | 6.1303 | -0.2442 |
| 6 | 6.3372 | 5.8919 | 0.4453 |
| 7 | 4.6778 | 5.7500 | -1.0722 |
| 8 | 6.4949 | 6.1041 | 0.3908 |
| 9 | 6.3188 | 5.9187 | 0.4001 |
| 10 | 6.2218 | 6.2392 | -0.0174 |
| 11 | 6.1549 | 6.3938 | -0.2389 |
| 12 | 6.1427 | 6.1131 | 0.0296 |
| 13 | 5.5528 | 5.8919 | -0.3391 |
| 14 | 5.9208 | 6.1843 | -0.2635 |
| 15 | 4.7423 | 4.5977 | 0.1446 |
| 16 | 4.5452 | 4.0207 | 0.5245 |
| 17 | 4.4498 | 4.6257 | -0.1759 |
| 18 | 3.8386 | 3.8946 | -0.0560 |
| 19 | 3.7747 | 4.6637 | -0.8890 |
| 20 | 3.7520 | 3.8946 | -0.1426 |
| 21 | 3.7305 | 3.6746 | 0.0559 |
| 22 | 4.8125 | 3.8946 | 0.9179 |
| 23 | 3.6421 | 5.2880 | -1.6459 |
| 24 | 3.6038 | 3.1468 | 0.4570 |
| 25 | 3.4559 | 5.9250 | -2.4691 |
| 26 | 3.2676 | 3.8541 | -0.5865 |
| 27 | 4.6198 | 4.3166 | 0.3032 |
| 28 | 4.0223 | 3.7317 | 0.2906 |
| 29 | 4.2366 | 3.8792 | 0.3574 |
| 30 | 4.0410 | 3.9475 | 0.0935 |
| 31 | 3.9066 | 4.0741 | -0.1675 |
| 32 | 3.3706 | 3.8105 | -0.4399 |
| 33 | 3.1537 | 2.7771 | 0.3766 |
| 34 | 2.8468 | 3.0536 | -0.2068 |
| 35 | 3.6882 | 3.7813 | -0.0931 |
| 36 | 2.5275 | 2.6973 | -0.1698 |

Fig. (3). Dispersion plot of residuals for equation 2.
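The predicted values in Table 8 follow directly from Equation 2 and the descriptor values in Table 3. As a minimal sketch (the function and variable names are hypothetical, and only the central coefficient values are used, not the ± uncertainties), the five test set compounds can be recomputed as follows:

```python
# Coefficients of Equation 2 (central values only).
COEF_KIER_A3 = 0.3058
COEF_APOL = -0.2398
COEF_I_R2 = 1.0775
COEF_SPOL_R4 = -36.4316
INTERCEPT = 5.7817


def predict_pic50(kier_a3, apol, i_r2, spol_r4):
    """Evaluate Equation 2 for one compound."""
    return (COEF_KIER_A3 * kier_a3 + COEF_APOL * apol
            + COEF_I_R2 * i_r2 + COEF_SPOL_R4 * spol_r4 + INTERCEPT)


# compound no -> (descriptors from Table 3, observed pIC50, predicted pIC50 from Table 8)
TEST_SET = {
    6:  ((3.5140, 0.3870, 1, 0.0535), 6.3372, 5.8919),
    13: ((3.5140, 0.3870, 1, 0.0535), 5.5528, 5.8919),
    22: ((4.3051, 9.7250, 1, 0.0535), 4.8125, 3.8946),
    28: ((4.3285, 10.4340, 1, 0.0535), 4.0223, 3.7317),
    35: ((4.4907, 10.4340, 1, 0.0535), 3.6882, 3.7813),
}

results = {}
for compound_no, (descriptors, observed, tabulated) in TEST_SET.items():
    predicted = predict_pic50(*descriptors)
    results[compound_no] = predicted
    # Residual is defined as observed minus predicted, as in Table 8.
    print(compound_no, round(predicted, 4), round(observed - predicted, 4))
```

The recomputed predictions agree with the tabulated ones to within about 0.001, the small differences coming from rounding of the published coefficients.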

Polarizability of a substituent is directly related to its bulk, bulkier substituents having higher polarizability [21]. The negative coefficient of apol in equation 2 demonstrates that the polarizability of the R3 substituent contributes negatively to biological activity, indicating that bulkier substituents at the R3 position may cause steric hindrance with the active site and are not favorable for biological activity. It is therefore desirable to have a less bulky substituent at the R3 position to obtain increased biological activity.

The parameter I(R2) in equation 2 is preceded by a positive coefficient. This indicates that a molecule having a hydrogen bond acceptor group in R2 may have comparatively higher biological activity. The low activity of compounds 33, 34, and 36, which lack any hydrogen bond acceptor group in the substituent R2, supports this finding. The hydrogen bond acceptor present in the substituent at R2 may form a hydrogen bond with the active site residues; thus it is favorable to have a hydrogen bond acceptor in the substituent at R2.

Specific polarizability of a molecule, the molecular polarizability divided by the molecular volume, indicates the tendency of a molecule or a substituent to undergo temporary polarization under the influence of an external electric field. It plays an important role in drug receptor interaction through van der Waals interactions. The parameter SPol(R4) in equation 2 is preceded by a negative coefficient, indicating that the specific polarizability of R4 contributes negatively to the biological activity. Thus placing a substituent with lower specific polarizability at R4 may lead to more active molecules.

The standardization of the regression coefficients of equation 2 allows assigning greater importance to the molecular descriptors that exhibit larger absolute standardized coefficients. This result is given as follows, with standardized coefficients shown in parentheses:

SPol(R4) (36.4316) > KierA3 (0.3058) > apol (0.2398) > I(R2) (1.0775)    (3)

From this inequality it is deduced that SPol(R4) is the most relevant variable for the present set of compounds. As already discussed, a compound would tend to exhibit a relatively higher binding affinity for lower numerical values of SPol(R4) and apol, and higher values of KierA3 and I(R2). Of course, mixing effects among the four variables would also lead to a high estimated potency for the compounds. The general tendency of descriptor importance given by equation 3 can be checked against the numerical values taken by the variables in Table 3 and the predictions given in Table 8.

EXPERIMENTAL SECTION

Data Series

A series of imidazole containing tetrahydrobenzodiazepine derivatives, with reported activities, was used for the present study [22]. The experimental farnesyltransferase inhibitory activity of the data set reported in the literature was measured on purified recombinant human farnesyltransferase (hFT). Table 1 summarizes the molecular structures and -log IC50 (pIC50) of the above mentioned imidazole containing tetrahydrobenzodiazepines.

Molecular Descriptors

Molecular modeling was performed with MOE 2008.10 for the purpose of calculating the various parameters to be used in QSAR model development [23]. The 2D builder module of MOE 2008.10 was used to draw the structures of the compounds. In order to identify the biologically active conformation, the structures were subjected to conformational analysis followed by energy minimization to an RMS gradient of 0.001. The molecular descriptors were computed using the QuaSAR module of MOE 2008.10, and parameters of all types were calculated, such as topological indices, structural keys, E-state indices, physical properties (such as LogP, molecular weight and molar refractivity), topological polar surface area (TPSA), VSA descriptors, BCUT descriptors, aromaticity indices, Randic indices etc. In addition, some descriptors not provided by MOE 2008.10 were added to the descriptor pool using the program Bioclipse [24].

Selection of Training and Test Set

The main intention of any QSAR modeling is that the developed model should be robust enough to make accurate and reliable predictions of the biological activities of new compounds. Thus developed QSAR models should be validated using new compounds to check their predictive capacity. In most cases an appropriate external data set is not available for prediction purposes, which is why the original data set is divided into training and test sets. A model's predictive accuracy and confidence for different unknown compounds vary according to how well the training set represents the unknown compounds and how robust the model is in extrapolating beyond the chemistry space defined by the training set. The selection of the training set is therefore very important in QSAR analysis. The predictive potential of a model on a new data set is influenced by the similarity of chemical nature between the training set and the test set [25-27]. The test set molecules will be predicted well when these molecules are very similar to the training set compounds, because the model has represented all features common to the training set molecules.


There are different techniques available for division of the data set into training and test sets, such as statistical molecular design, self-organizing maps, clustering, Kennard-Stone selection, sphere exclusion, etc. [28]. In the present case we have used clustering as the method for training set selection. Cluster analysis [29] is a technique for arranging objects into groups. This method divides different objects into groups in such a way that the degree of association between two objects is maximum if they belong to the same group and minimum otherwise. There are two types of clustering: i) hierarchical clustering and ii) nonhierarchical clustering. One of the important nonhierarchical techniques is K-means clustering [30], which has been used in the present study. In this method clusters are started randomly and then their means are calculated in descriptor space. Molecules are reassigned to the clusters whose means are closest to the position of the molecules. After clustering, the test set compounds are selected from each cluster so that both the test set and the training set represent all clusters and the characteristics of the whole dataset. In our study the whole data set was divided into training and test sets based on K-means clustering, and the models developed on the training set were externally validated using the test set. All molecules with standardized descriptors were classified into five clusters based on K-means clustering. Serial numbers of the compounds in the different clusters are shown in Table 4. From each of these five clusters one compound was selected for the test set and the remaining compounds for the training set.

QSAR Model Development

At present, there are thousands of descriptors available in the literature, and one has to select those that characterize the property or activity under consideration in the most reliable and efficient way. One thus faces the mathematical problem of selecting a subset of descriptors (d) from a much larger set (D). The search for the optimal set of descriptors may be guided by the minimization or maximization of a chosen function; for example, we may be interested in a model that makes the standard deviation (S) as small as possible. A full search (FS) of the optimal variables is impractical because it requires a very large number of linear regressions. We consider that linear algorithms are most convenient for analyzing QSAR data sets for two main reasons: (i) they exhibit a higher predictive capability and perform more efficiently on external test sets not considered during the model calibration; and (ii) when few experimental observations are available, it is necessary to employ the lowest number of optimized parameters during the model development, a condition that linear models fulfill.

Some time ago the replacement method (RM) [31-33] and later the enhanced replacement method (ERM) [34] were proposed; they produce linear regression QSAR models quite close to the FS ones with much less computational work. Both techniques approach the minimum of S by judiciously taking into account the relative errors of the coefficients of the least-squares model given by a subset of descriptors. The RM produced models with better statistical parameters than the forward stepwise regression procedure [35] and variants of the more elaborate genetic algorithms [36]. The ERM leads to similar or even better statistical parameters, although with slightly more computational work [37]. The RM is a rapidly convergent iterative algorithm that produces linear regression models with small S in remarkably short computer times [38-41]. However, in some difficult cases the RM can get trapped in a local minimum of S that it is not able to leave without some kind of additional constraint. The ERM follows the same philosophy but is less likely to be caught in local minima and less dependent on the initial solution. It closely resembles simulated annealing (SA), which is an adaptation of the Metropolis-Hastings algorithm, a Monte Carlo method [42], and generates sample states of a thermodynamic system.

QSAR Model Validation

QSAR model validation is essential for establishing statistically robust models capable of making accurate and reliable predictions of the biological activities of new compounds not present in the data set. The following approaches have been used to validate the QSAR equations.

Internal Validation

Gaurav et al.

compared with the model obtained for the actual activity values. More precisely, Y-scrambling is done as follows: a) for the training set, on which the given model was developed, the descriptor data is left as it is while the activity data is randomly shuffled to change its true order. Thus, though the values (and the statistical distribution) stay the same, their position against the appropriate compound and its descriptor(s) is now altered thus destroying any meaningful relation that may have existed between independent variables and the dependent variable. b) next, a new QSAR model is obtained for such permuted data and metrics like R2 and R2cv are noted for the fitted model. c) steps 1 and 2 are done for sufficient number of iterations, a good number being 50 to 100. d) values obtained in the above fashion are compared with the true values obtained for the model that was fitted on the real data. True values should lie much outside such a background reference distribution for one to confidently say that there exists a real model on the given data (the model that was originally learned) and that it is not the same as models that were learned by chance (the models learned through Y-scrambling) [45-46].

1. Fraction of the Variance (R2)

6. Lack of Over-Fitting

It is believed that the closer the value of R2 to unity, the better the QSAR model. QSAR models were selected according to R2 values. According to the literature, the predictive QSAR model must have R2>0.6. [43-44]

A model over-fits if it includes more descriptors than required. The lack of over-fitting for all the QSAR models was confirmed by using the following conditions: a)

Number of data points/Number of descriptors  4

2. Cross-Validation Test

b)

All the QSAR models were checked for their correlation with fewer descriptors than that of the original. None of them was found to be statistically significant.

The robustness, stability and predictive power of QSAR equations were cross validated from the PRedictive Error Sum of Squares (PRESS) and r2cv obtained by using leave one out (LOO) process. Leave-one-out (LOO) technique is simplest cross validation procedure, where each compound is removed, one at a time. For given n compounds, n reduced models are calculated, each of these models is developed with the remaining n-1 compounds and used to predict the response of the deleted compound. The model predictive power is then determined from PRESS and R2cv calculated using following equation.

External Validation The QSAR models were validated by prediction on the test set compounds. The predictive capacity of these models is judged from their predictive R2 (R2pred) values, which were calculated by the following equation: R2pred=1- (Ypred(test)-Ytest)2/(Ytest-Ytraining)2

R2cv = (S-PRESS/S) Where S is the sum of squared deviation for each activity from the mean. PRESS is the sum of the squared difference between the actual and the predicted values when the compound is omitted from the fitting process. The model with high R2cv value is said to have high predictivity. This technique is particularly important as this deletion scheme is unique and the predictive ability of the different models can be compared accurately. 3. Standard Deviation (S) The smaller the value of S, the better the QSAR model. 4. Fischer Statistics (F) The larger the value of F, the greater the probability that the QSAR equation is significant.

5. Y-Scrambling

To guard against the possibility of having chance models, the method of Y-scrambling was used. In this method, models are fitted for randomly reordered activity values and

CONCLUSIONS

The QSAR analysis of the series of imidazole containing tetrahydrobenzodiazepine derivatives reveals that the anti-tumor activity of this class of compounds is greatly influenced by the functional groups attached to the different positions of the tetrahydrobenzodiazepine skeleton and by their topological and physicochemical properties, including the third alpha modified shape index and polarizability. The study provides predictive models for farnesyltransferase inhibition as well as structural insights into the development of newer farnesyltransferase inhibitors as potent anti-tumor agents. These efforts will guide synthetic medicinal chemists in designing new compounds with improved biological activity.

ACKNOWLEDGEMENT

We thank Chemical Computing Group for providing MOE 2008.10 evaluation license.


REFERENCES

[1] McCornick, K. G. Signaling networks that cause cancer. T.I.B.S., 1999, 24, 56.
[2] Barbacid, M. Ras gene mutations and expression of ras signal transduction mediators in gastric adenocarcinomas Ras Genes. Annu. Rev. Biochem., 1987, 56, 779.
[3] Bos, J. L. Ras Oncogenes in Human Cancer. Cancer Res., 1989, 49, 4682-4689.
[4] Huang, C.C.; Casey, P. J.; Fierke, C. A. Evidence for a Catalytic Role of Zinc in Protein Farnesyltransferase. J. Biol. Chem., 1997, 272, 20-23.
[5] Sebti, S.M.; Adjei, A.A. Farnesyltransferase inhibitors. Semin Oncol., 2004, 31, 28-39.
[6] Cutsem, E.V.; Velde, H.V.D.; Karasek, P.; Oettle, H.; Vervenne, W.L.; Szawlowski, A.; Schoffski, P.; Post, S.; Verslype, C.; Neumann, H.; Safran, H.; Humblet, Y.; RuixoMa, Y.; Hoff, D.V. Lately, It Occurs to Me What a Long, Strange Trip It's Been for the Farnesyltransferase Inhibitors. J. Clin. Oncol., 2004, 22, 1430.
[7] Kiyoko, K.; Cox, A. D.; Hisaka, M. M.; Graham, S. M.; Buss, J. E.; Der, C. J. Isoprenoid Addition to Ras Protein is the Critical Modification for its Membrane Association and Transforming Activity. Proc. Natl. Acad. Sci., 1992, 89, 6403-6407.
[8] Leonard, D.M. Ras farnesyltransferase a new therapeutic target. J. Med. Chem., 1997, 40, 2971.
[9] Cox, A.D.; Der, C.J. Ras family signaling therapeutic targeting. Cancer Biol Ther., 2002, 1, 599.
[10] Chakrabarti, D.; Da Silva, T.; Barger, J.; Paquette, S.; Patel, H.; Patterson, S.; Allen, C.M. Protein Farnesyltransferase and protein prenylation in Plasmodium falciparum. J. Biol. Chem., 2002, 277, 42066.
[11] Chakrabarti, D.; Azam, T.; Delvecchio, C.; Qui, L.; Park, Y.I.; Allen, C.M. Protein prenyltransferase activities of Plasmodium falciparum. Mol. Biochem. Parasitol., 1998, 94, 175.
[12] Nallan, L.; Kevin, D.; Bendale, P.; Rivas, K.; Yokoyama, K.; Horney, C. P.; Pendyala, P.R.; Floyd, D.; Lombardo, L.J.; Williams, D.K.; Hamilton, A.; Sebti, S.; Windsor, W.T.; Weber, P.C.; Buckner, F.S.; Chakrabarti, D.; Gelb, M.H.; Voorhis, W.C.V. Protein Farnesyltransferase Inhibitors Exhibit Potent Antimalarial Activity. J. Med. Chem., 2005, 48, 3704-3713.
[13] Glenn, M.P.; Chang, S.Y.; Horney, C.; Rivas, K.; Yokoyama, K.; Pusateri, E.E.; Fletcher, S.; Cummings, C.G.; Buckner, F.S.; Pendyala, P.R.; Chakrabarti, D.; Sebti, S.M.; Gelb, M.; Voorhis, W.C.V.; Hamilton, A.D. Structurally Simple, Potent, Plasmodium Selective Farnesyltransferase Inhibitors That Arrest the Growth of Malaria Parasites. J. Med. Chem., 2006, 49, 5710-5727.
[14] Sebti, S.M.; Hamilton, A.D.; Moffitt, H.L. Farnesyltransferase inhibitors in cancer therapy. Humana Press, Totowa N.J., 2001, 8799.
[15] Cox, A. D.; Der, C. J. Farnesyltransferase inhibitors and cancer treatment targeting simply Ras. Biochim. Biophys. Acta., 1997, 1333, 51-71.
[16] Gaurav, A.; Yadav, M.R.; Giridhar, R.; Gautam, V.; Singh, R. QSAR studies on 4-quinolones as high affinity ligands at the benzodiazepine site of brain GABAA receptors. Med. Chem., 2009, 5, 353-358.
[17] Gaurav, A.; Yadav, M.R.; Giridhar, R.; Gautam, V.; Singh, R. 3DQSAR studies on 4-quinolones as high affinity ligands at the benzodiazepine site of brain GABAA receptors. Med. Chem., 2010, 10, 9306-5.
[18] Ding, C.Z.; Batorsky, R.; Bhide, R.; Chao, H.J.; Cho, Y.; Chong, S.; Gullo-Brown, J.; Guo, P.; Kim, S.H.; Lee, F.; Lefttheris, K.; Miller, A.; Mitt, T.; Patel, M.; Penhallow, B.A.; Ricca, C.; Rose, W.C.; Schmidt, R.; Slusarchyk, W.A.; Vite, G.; Yan, N.; Manne, V.; Hunt, J.T. Discovery and Structure-Activity Relationships of Imidazole-Containing Tetrahydrobenzodiazepine Inhibitors of Farnesyltransferase. J. Med. Chem., 1999, 42, 5241.
[19] Tropsha, A.; Gramatica, P.; Gombar, V.K. The importance of being earnest Validation is the absolute essential for successful application and interpretation of QSPR models. QSAR Comb. Sci., 2003, 22, 1-9.
[20] Kier, L.B. Shape indexes of orders one and three from molecular graphs. Quant. Struct.-Act. Relat., 1986, 4, 109-116.
[21] Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors. Wiley-VCH, Weinheim, Germany, 2000, 138-142.
[22] MOE is a molecular modeling package developed by Chemical Computing Group Inc., Canada.
[23] S.; Jonathan, A.; Arvid, B.; Martin, E.; Stefan, K.; Carl, M.; Gilleain, T.; Johannes, W.; Egon, L. W.; Christoph, S.; Jarl, E. S. W. A scriptable integration platform for the life sciences. BMC Bioinformatics, 2009, 10.1186/1471-2105-10-397.
[24] Eriksson, L.; Jaworska, J.; Worth, A. P.; Cronin, M. T. D.; McDowell, R. M.; Gramatica, P. Method for reliability, uncertainty assessment and applicability evaluation of classification and regression based QSARs. Environ. Health Perspect., 2003, 111, 1361-1375.
[25] Guha, R.; Jurs, P. C. Determining the Validity of a QSAR Model A Classification Approach. J. Chem. Inf. Model., 2005, 45, 65-73.
[26] Leonard, J.T.; Roy, K. On selection of training and test sets for the development of predictive. QSAR Comb. Sci., 2006, 25(3), 235-251.
[27] Roy, K. On some aspects of validation of predictive quantitative structure-activity relationship models. Expert Opin. Drug Discov., 2007, 2, 1567-1577.
[28] Everitt, B.; Landau, S.; Leese, M. Cluster analysis. Arnold, London, 2001.
[29] Dougherty, E. R.; Barrera, J.; Brun, M.; Kim, S.; Cesar, R. M.; Chen, Y.; Bittner, M.; Trent, J. M. Inference from clustering with application to gene-expression microarrays. J. Comput. Biol., 2002, 9, 105-126.
[30] Duchowicz, P.R.; Castro, E.A.; Fernández, F.M. QSPR Studies on Aqueous Solubilities of Drug-Like Compounds. Commun. Math. Comput. Chem., 2006, 55, 179-192.
[31] Duchowicz, P.R.; Fernández, M.; Caballero, J.; Castro, E.A.; Fernández, F.M. Replacement Method and Enhanced Replacement Method versus the Genetic Algorithm Approach for the Selection of Molecular Descriptors in QSPR/QSAR Theories. Bioorg. Med. Chem., 2006, 14, 5876-5889.
[32] Helguera, A. M.; Duchowicz, P. R.; Pérez, M. A. C.; Castro, E. A.; Cordeiro, M. N. D. S.; González, M.P. Application of the Replacement Method as Novel Variable Selection Strategy in QSAR. 1. Carcinogenic Potential. Chemom. Intell. Lab. Syst., 2006, 81, 180-187.
[33] Mercader, A. G.; Duchowicz, P. R.; Fernández, F. M.; Castro, E. A. QSAR prediction of inhibition of aldose reductase for flavonoids. Chemom. Intell. Lab. Syst., 2008, 92, 138.
[34] Draper, N. R.; Smith, H. Applied Regression Analysis. John Wiley & Sons, New York, 1981.
[35] So, S. S.; Karplus, M. Evolutionary Optimization in Quantitative Structure-Activity Relationship an Application of Genetic Neural Networks. J. Med. Chem., 1996, 39, 1521-1530.
[36] Mercader, A. G.; Duchowicz, P. R.; Fernández, F. M.; Castro, E. A.; Bennardi, D. O.; Autino, J. C.; Romanelli, G. P. QSAR study of carboxylic acid derivatives as HIV-1 Integrase inhibitors. Bioorg. Med. Chem., 2008, 16, 7470-7476.
[37] Duchowicz, P. R.; Vitale, M. G.; Castro, E. A.; Fernández, M.; Caballero, J. QSAR Analysis for heterocyclic antifungals. Bioorg. Med. Chem., 2007, 15, 2680-2689.
[38] Duchowicz, P. R.; Talevi, A.; Bruno-Blanch, L. E.; Castro, E. A. New QSPR Study for the Prediction of Aqueous Solubility of Drug-Like Compounds. Bioorg. Med. Chem., 2008, 16, 7944-7955.
[39] Duchowicz, P. R.; Goodarzi, M.; Ocsachoque, M. A.; Romanelli, G. P.; Ortiz, E. V.; Autino, J. C.; Bennardi, D. O.; Ruiz, D.; Castro, E. A. QSAR Analysis on Spodoptera litura Antifeedant Activities for Flavone Derivatives. Sci. Total Environ., 2009, 408, 277-285.
[40] Goodarzi, M.; Duchowicz, P. R.; Wu, C. H.; Fernández, F. M.; Castro, E. A. New Hybrid Genetic Based Support Vector Regression as QSAR Approach for Analyzing Flavonoids-GABA Complexes. J. Chem. Inf. Model., 2009, 49, 1475-1485.
[41] Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller, E. Equation of State Calculations by Fast Computing Machines. J. Chem. Phys., 1953, 21, 1087-1092.
[42] Golbraikh, A.; Tropsha, A. Beware of q2. J. Mol. Graphics Modell., 2002, 20, 269-276.
[43] Tropsha, J.; Gramatica, P.; Gombar, V. K. Environment Health Perspectives. QSAR Comb. Sci., 2003, 22, 69-77.
[44] Selassie, D.; Kapur, S.; Verma, R. P.; Rosario, M. Development of QSAR models using C-QSAR program: a regression program that has dual databases of over 21,000 QSAR models. J. Med. Chem., 2005, 48, 7234-7242.
[45] Wold, S.; Eriksson, L. In Chemometrics Methods in Molecular Design; van de Waterbeemd, H., Ed. Wiley-VCH, Weinheim, 1995, 309-318.
[46] Melzig, M. F.; Tran, G. D.; Henke, K.; Selassie, C. D.; Verma, R. P. Cytotoxic of organic compound against origin cancer cell a quantitative structure activity relationship study. Pharmazie, 2005, 60, 869-873.
