Revue Roumaine de Chimie, 2006, 51(7-8), 703–717
Dedicated to the memory of Professor Mircea D. Banciu (1941–2005)
QSAR MODELING OF CARBONIC ANHYDRASE-I, -II AND -IV INHIBITORY ACTIVITIES: RELATIVE CORRELATION POTENTIAL OF SIX TOPOLOGICAL INDICES
Padmakar V. KHADIKAR,a* Brian W. CLARE,b Alexandru T. BALABAN,c* Claudiu T. SUPURAN,d* Vijay K. AGARWAL,e Jyoti SINGH,e Ashok K. JOSHIf and Meenakshi LAKHWANIg
a Research Division, Laxmi Fumigation and Pest Control, Pvt. Ltd., 3, Khatipura, Indore 452007, India, e-mail: [email protected]
b School of Biomedical and Chemical Sciences, University of Western Australia, 35, Stirling Highway, Crawley WA 6009, Australia, e-mail: [email protected]
c Texas A&M University at Galveston, 5007 Avenue U, Galveston, TX, 77551, USA, e-mail: [email protected]
d Laboratorio di Chimica Bioinorganica, Dipartimento di Chimica, University of Florence, Via della Lastruccia, 3, Polo Scientifico, 50019 Sesto Fiorentino, Firenze, Italy.
e Department of Chemistry, A.P.S. University, Rewa, 486 003, India, e-mail: vijay-[email protected]
f Upzon Drugs Pvt. Ld, MPLUN Plot No,32, Sector ‘F’ Sanwer Road, Indore 452015, India.
g Department of Chemistry, Holkar Model & Autonomous College, Indore, India.
Received December 9, 2005
QSAR studies on modeling the biological activities of 26 benzenesulfonamide derivatives as inhibitors of carbonic anhydrases CA-I, CA-II and CA-IV were performed using six distance-based topological indices, including Balaban-type indices JhetV and JhetE. Satisfactory multiparametric correlations were obtained.
INTRODUCTION
Carbonic anhydrases (CAs) are important enzymes found in red blood cells, gastric mucosa, pancreatic cells, and renal tubules. The physiological and physiopathological processes in which carbonic anhydrases are involved have been thoroughly investigated because of the pharmacological applications of their inhibitors, chiefly sulfonamides.1-3 Such inhibitors have been shown to be important clinical agents.4 As a result, a large number of aromatic and heterocyclic sulfonamides have been synthesized and tested for their CA inhibitory potential.5-20 Among these carbonic anhydrase inhibitors, benzenesulfonamides have attracted much attention.5-20 One of the authors (CTS) has published extensive work on such inhibitors,2-4,14-25 and has reported CA inhibition data Ki (nmol) for hCA-I, hCA-II and bCA-IV. However, until now, few QSAR (quantitative structure-activity relationship) studies employing distance-based topological indices have been reported. Filling this gap is one of the objectives of the present study.
It is worth mentioning that QSAR methodology is very useful in screening a large library of possible drug candidates for selectivity and potency, arriving at models that correlate molecular structure with bioactivity.4-20 A current interest in predictive QSAR is the estimation of the biological activities of organic compounds acting as drugs from their calculated structural parameters. Molecular structure is encoded through numerical descriptors, which correspond to topological, geometrical, chemical, or electronic structural features. During the last decades, QSAR modeling based on topological (graph-theoretical) indices has undergone explosive growth owing to rapid progress in chemical graph theory and to advances in computer technology. In view of the potential of QSAR methodology, we have recently investigated the relative activity of carbonic anhydrase inhibitors.21-36 The parameters used in these studies were chiefly distance-based topological indices. In a few cases, information-theoretic indices were also used.21-41
Having noted that in the literature there are only a few QSAR studies based on the Balaban index (J),42 and on its extensions that take into account the presence of heteroatoms and/or multiple bonds,43-45 we decided to undertake the present study to investigate the relative potential of Balaban-type indices for modeling the CA inhibitory activities of the set of compounds shown in Table 1.
*Authors for correspondence: Phone +91-731-531906 (PVK); +1-409-741-4313 (ATB); +39-055-4573005 (CTS)
Table 1 Structural details of the carbonic anhydrase inhibitors used in the present investigation
[Structural formulas of compounds 1–25, aromatic and heterocyclic sulfonamides bearing the SO2-NH2 group; structure drawings not reproduced.]
STRUCTURES AND MOLECULAR DESCRIPTORS
Structural details of the 25 carbonic anhydrase sulfonamide inhibitors used in the present study are presented in Table 1. The CA inhibitory data (Ki, nmol) reported by one of us (CTS) against isozymes I, II and IV were used after conversion to logarithmic units.13 The distance-based molecular descriptors selected for the present study are the topostructural indices, namely Balaban (J),42 Wiener (W),46 Szeged (Sz),47-49 and first-order Randić connectivity (1χ),50,51 together with topochemical indices. Several such distance-based Balaban-type topochemical indices have been described; they account for heteroatoms via their atomic number (JhetZ),52 atomic weight (JhetM), atomic radius (JhetV), electronegativity (JhetE), and polarizability (JhetP). Table 2 contains the values of the molecular descriptors for the 25 sulfonamides from Table 1.
Table 2 Various topological descriptors used in the present study and their values*
Comp.  W     J      JhetZ  JhetM  JhetV  JhetE  JhetP  1χ     Sz
1      144   2.545  4.788  4.789  3.092  3.614  3.531  5.016  220
2      148   2.461  4.585  4.586  3.015  3.504  3.430  4.999  228
3      152   2.394  4.425  4.426  2.953  3.415  3.348  4.999  236
4      201   2.359  4.121  4.122  2.695  3.331  2.884  5.537  306
5      201   2.359  4.008  4.009  2.895  3.257  3.216  5.537  306
6      262   2.305  3.632  3.633  2.791  3.072  3.046  6.037  388
7      189   2.512  4.526  4.540  2.878  3.592  3.083  5.410  291
9      189   2.512  4.645  4.652  3.101  3.568  3.509  5.410  291
10     189   2.512  4.720  4.730  3.150  3.553  3.562  5.410  291
11     189   2.512  4.745  4.754  3.178  3.523  3.623  5.410  291
12     458   2.991  5.883  5.894  3.629  4.203  4.260  7.459  668
13     399   2.853  5.633  5.640  3.456  4.014  4.073  7.032  582
14     113   2.449  6.035  6.039  2.416  3.271  2.913  4.499  132
15     146   2.538  5.769  5.772  2.360  3.437  2.654  4.910  171
16     403   2.304  3.826  3.827  2.064  2.805  2.189  6.931  452
17     853   1.861  3.779  3.781  1.848  2.401  2.159  9.182  1124
18     948   1.937  3.741  3.743  1.836  2.452  2.093  9.593  1237
19     1004  1.816  3.225  3.226  1.984  2.473  2.205  9.682  1502
20     960   1.900  3.409  3.410  2.049  2.575  2.287  9.682  1414
21     669   1.731  2.651  2.652  1.818  2.419  1.788  8.449  1057
22     287   1.987  3.927  3.929  2.304  2.736  2.648  6.465  430
23     287   1.987  3.953  3.955  2.261  2.749  2.583  6.465  430
24     543   1.856  3.228  3.228  1.816  2.500  1.909  8.003  776
25     201   2.359  4.036  4.036  2.827  3.276  3.120  5.537  306
26     262   2.305  3.651  3.652  2.737  3.086  2.973  6.037  388
* W – Wiener index; J – Balaban distance connectivity index; JhetZ – Balaban-type index from Z-weighted distance matrix (Barysz et al. matrix); JhetM – Balaban-type index from mass-weighted distance matrix; JhetV – Balaban-type index from van der Waals-weighted distance matrix; JhetE – Balaban-type index from electronegativity-weighted distance matrix; JhetP – Balaban-type index from polarizability-weighted distance matrix; 1χ – first-order Randić connectivity index; Sz – Szeged index.
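To make the three simplest descriptors of Table 2 concrete, the following minimal sketch computes W, J and 1χ for a hydrogen-suppressed molecular graph using their standard graph-theoretical definitions. It assumes a connected, unweighted graph, so the heteroatom-weighted variants (JhetZ, JhetM, JhetV, JhetE, JhetP) and Sz are not covered. The function names and the n-butane test graph are illustrative choices and are not taken from the paper.

```python
from collections import deque
from math import sqrt


def distance_matrix(n, edges):
    """Topological distances between all vertex pairs, by breadth-first search."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = []
    for start in range(n):
        d = [-1] * n
        d[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if d[v] < 0:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def wiener(n, edges):
    """Wiener index W: sum of distances over all unordered vertex pairs."""
    dist = distance_matrix(n, edges)
    return sum(dist[i][j] for i in range(n) for j in range(i + 1, n))


def balaban_j(n, edges):
    """Balaban index J = q/(mu+1) * sum over edges of (s_i * s_j)^(-1/2)."""
    dist = distance_matrix(n, edges)
    s = [sum(row) for row in dist]   # distance sum of each vertex
    q = len(edges)                   # number of edges
    mu = q - n + 1                   # cyclomatic number (number of rings)
    return q / (mu + 1) * sum(1 / sqrt(s[a] * s[b]) for a, b in edges)


def randic(n, edges):
    """First-order Randic connectivity: sum over edges of (d_i * d_j)^(-1/2)."""
    deg = [0] * n
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return sum(1 / sqrt(deg[a] * deg[b]) for a, b in edges)


# Illustrative graph (not from the paper): the carbon skeleton of n-butane.
butane = [(0, 1), (1, 2), (2, 3)]
w_butane = wiener(4, butane)
j_butane = balaban_j(4, butane)
chi_butane = randic(4, butane)
```

Because J divides by the number of rings plus one and uses reciprocal distance sums, it grows far less with molecular size than W or 1χ do, which is the behaviour discussed in the next section.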
We carried out regression analysis53-55 to model the inhibitory activity of the sulfonamides against CA-I, CA-II, and CA-IV using the maximum R2 method;53 the results are discussed below.
RELATIVE POWER OF BALABAN-TYPE INDICES FOR MODELING LOG KI(HCA-I), LOG KI(HCA-II), AND LOG KI(BCA-IV) ACTIVITIES
The first step is to investigate uniparametric modeling separately for each descriptor. It should be kept in mind, however, that the Balaban indices were designed to account for “topological shape” (degree of branching, centricity of branches, and cyclicity), and not for molecular size. Consequently, it has been emphasized that for physicochemical properties or biological activities that depend on molecular size, J and J-type indices should always be used in multiparametric correlations.42-45 As expected, Table 3 shows that in uniparametric correlations the size-independent J and J-type indices lead to poorer correlations (lower absolute values of the correlation coefficient R) than the indices W, 1χ and Sz, which increase with graph size. Among the Jhet indices, the best results (comparable to J) are observed for JhetV and JhetE.
Table 3 Regression parameters and quality of correlations for uniparametric modeling
Enzyme  Statistic  W       J      JhetZ  JhetM  JhetV  JhetE  JhetP  1χ      Sz
CA-I    R          -0.812  0.757  0.493  0.493  0.775  0.788  0.712  -0.830  -0.780
CA-I    St.err.    0.763   0.853  1.136  1.137  0.826  0.805  0.917  0.729   0.818
CA-I    F          44.36   30.91  7.38   7.37   34.53  37.60  23.63  50.81   35.68
CA-II   R          -0.721  0.615  0.359  0.359  0.748  0.689  0.690  -0.750  -0.678
CA-II   St.err.    0.482   0.548  0.648  0.649  0.461  0.505  0.499  0.460   0.511
CA-II   F          24.83   13.99  3.40   3.39   29.18  20.63  21.60  29.61   19.58
CA-IV   R          -0.653  0.526  0.280  0.279  0.567  0.561  0.502  -0.694  -0.620
CA-IV   St.err.    0.641   0.720  0.818  0.819  0.702  0.702  0.737  0.613   0.669
CA-IV   F          17.14   9.25   1.96   1.94   10.91  7.55   7.78   21.25   14.39
The next step was to proceed to biparametric correlations associating one of the W, 1χ and Sz indices with each of the J and J-type indices. In particular, we concentrate on six topological indices, namely the three foremost Balaban indices (J, JhetV and JhetE) and three distance-based indices (W, 1χ and Sz). Tables 4 and 5 present the results for the biparametric correlations involving these pairwise associations. It is evident that models 5, 14, and 23, involving JhetV and 1χ, have the highest R values and the lowest standard errors.
Table 4 Regression parameters for biparametric modeling of log Ki(hCA-I), log Ki(hCA-II), and log Ki(bCA-IV)
Model No.  Parameters used  Regression coefficients               Constant  St.error  R2A     R       F-ratio
(i) For modeling of log Ki(hCA-I)
1          W, J             -0.0025(±0.0006), 1.5451(±0.5497)     0.6902    0.6692    0.7259  0.8653  32.785
2          1χ, J            -0.4518(±0.1033), 1.4770(±0.5207)     2.8764    0.6378    0.7511  0.8785  37.207
3          Sz, J            -0.0016(±0.0004), 1.6762(±0.5924)     0.3072    0.7158    0.6865  0.8442  27.277
4          W, JhetV         -0.0024(±0.0005), 1.0114(±0.3141)     1.5676    0.6432    0.7468  0.8763  36.396
5          1χ, JhetV        -0.4383(±0.0954), 0.9852(±0.2892)     3.6084    0.6031    0.7774  0.8922  42.914
6          Sz, JhetV        -0.0015(±0.0004), 1.1144(±0.3167)     1.2345    0.6687    0.7264  0.8656  32.858
7          W, JhetE         -0.0023(±0.0006), 1.0967(±0.3476)     0.7117    0.6473    0.7436  0.8746  35.796
8          1χ, JhetE        -0.4237(±0.1023), 1.0474(±0.3290)     2.7775    0.6167    0.7673  0.8869  40.566
9          Sz, JhetE        -0.0015(±0.0004), 1.2032(±0.3585)     0.2981    0.6798    0.7172  0.8607  31.430
(ii) For modeling of log Ki(hCA-II)
10         W, J             -0.0013(±0.0004), 0.5229(±0.3891)     0.8625    0.4737    0.5152  0.7454  13.753
11         1χ, J            -0.2494(±0.0735), 0.4619(±0.3708)     2.1383    0.4541    0.5545  0.7691  15.934
12         Sz, J            -0.0008(±0.0003), 0.6159(±0.4118)     0.5898    0.4975    0.4652  0.7140  11.437
13         W, JhetV         -0.0009(±0.0003), 0.6054(±0.2031)     0.3527    0.4159    0.6263  0.8108  21.111
14         1χ, JhetV        -0.1909(±0.0625), 0.5711(±0.1896)     1.3267    0.3954    0.6623  0.8309  24.532
15         Sz, JhetV        -0.0005(±0.0002), 0.6638(±0.2029)     0.1545    0.4283    0.6037  0.7979  19.277
16         W, JhetE         -0.0011(±0.0004), 0.4898(±0.2431)     0.4417    0.4527    0.5571  0.7707  16.096
17         1χ, JhetE        -0.2161(±0.0722), 0.4452(±0.2320)     1.5762    0.4349    0.5914  0.7909  18.371
18         Sz, JhetE        -0.0006(±0.0003), 0.5629(±0.2479)     0.1511    0.4700    0.5227  0.7500  14.139
(iii) For modeling of log Ki(bCA-IV)
19         W, J             -0.0015(±0.0006), 0.4832(±0.5320)     1.7555    0.6476    0.3975  0.6691  8.916
20         1χ, J            -0.3016(±0.1004), 0.3783(±0.5060)     3.3901    0.6197    0.4483  0.7030  10.750
21         Sz, J            -0.0009(±0.0004), 0.5694(±0.5525)     1.5034    0.6676    0.3598  0.6428  7.744
22         W, JhetV         -0.0014(±0.0005), 0.3842(±0.3116)     1.8217    0.6380    0.4153  0.6812  9.523
23         1χ, JhetV        -0.2810(±0.0964), 0.3352(±0.2924)     3.2498    0.6096    0.4662  0.7146  11.479
24         Sz, JhetV        -0.0008(±0.0004), 0.4584(±0.3086)     1.5748    0.6516    0.3900  0.6640  8.674
25         W, JhetE         -0.0015(±0.0006), 0.3582(±0.3459)     1.7074    0.6442    0.4039  0.6735  9.132
26         1χ, JhetE        -0.2910(±0.1024), 0.2846(±0.3292)     3.2911    0.6172    0.4529  0.7060  10.932
27         Sz, JhetE        -0.0009(±0.0004), 0.4360(±0.3483)     1.4019    0.6604    0.3735  0.6525  8.155
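The summary statistics in Table 4 can be cross-checked against one another. The sketch below uses the standard textbook expressions for the adjusted R2 and the F-ratio of a least-squares model with k predictors fitted to n compounds. The paper does not state which formulas its software used, and n = 25 is an assumption based on the number of compounds listed in Table 5. The function names are illustrative.

```python
def adjusted_r2(r, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    r2 = r * r
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def f_ratio(r, n, k):
    """F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    r2 = r * r
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


N_COMPOUNDS = 25  # assumption: the 25 compounds listed in Table 5

# Model 5 of Table 4 (1chi and JhetV, k = 2) reports R = 0.8922.
r2a_model5 = adjusted_r2(0.8922, N_COMPOUNDS, 2)
f_model5 = f_ratio(0.8922, N_COMPOUNDS, 2)

# Uniparametric case (k = 1): W against log Ki(hCA-I) in Table 3, R = -0.812.
f_w_uniparametric = f_ratio(-0.812, N_COMPOUNDS, 1)
```

Under these assumptions the computed values agree with the tabulated R2A = 0.7774 and F = 42.914 for model 5, and with F = 44.36 for W in Table 3, to within the rounding of R.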
The next step consists in looking at triparametric correlations involving JhetV and two of the distance-based indices (W, 1χ and Sz). The results are displayed in Table 6 as models 28–33. A slight increase in R and a slight decrease in the standard error may be seen by comparing Tables 4 and 6.
A final possible refinement is obtained by introducing an indicator variable I1 to take into account the large residuals observed in Table 5 between the calculated and observed inhibitory activities against CA-I for compounds 21, 22, and 23: the indicator parameter I1 signifies the presence (=1) or absence (=0) of an electron-donating group (NH2 or OR) attached to the benzene ring of a thiazole group. The resulting tetraparametric correlations are presented in the lower part of Table 6 as models 34–39. Tables 7 and 8 show the observed and calculated data, with the corresponding residuals, for the tri- and tetraparametric regressions, respectively.
PREDICTIVE POWER OF THE PROPOSED MODELS
We now discuss the predictive power of the best models for log Ki(hCA-I), log Ki(hCA-II), and log Ki(bCA-IV). The correlation coefficients R2pred are 0.8942 (model 35), 0.7709 (model 36), and 0.6663 (model 38) for log Ki(hCA-I), log Ki(hCA-II), and log Ki(bCA-IV), respectively.
In order to investigate the intercorrelations between the descriptors in the proposed models, Table 9 presents the correlation matrix obtained from the data in Table 2. This matrix is useful for determining whether certain variables are redundant and therefore not needed in the model. Because JhetZ and JhetM are so highly intercorrelated with one another, and because JhetP correlates with JhetV, we decided to keep only the topostructural index J and the topochemical indices JhetV and JhetE, along with the indices W, 1χ, and Sz.
Table 5 Observed and calculated values and their residuals
A. Estimated log Ki (hCA-I) using biparametric models
Comp.: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25.
Obs.: 4.657 4.398 4.447 4.895 4.398 4.322 3.919 3.991 3.813 3.778 3.785 3.924 3.934 3.968 2.658 0.778 0.954 1.623 1.643 2.839 1.845 1.740 1.699 4.380 4.255
Model 1 Estim.: 4.267 4.127 4.014 3.839 3.839 3.604 4.105 4.105 4.105 4.105 4.180 4.113 4.195 4.251 3.255 1.458 1.341 1.016 1.254 1.712 3.051 3.051 2.217 3.839 3.604
Model 1 Resid.: 0.390 0.271 0.433 1.056 0.559 0.718 -0.186 -0.114 -0.292 -0.327 -0.395 -0.189 -0.261 -0.283 -0.597 -0.680 -0.387 0.607 0.389 1.127 -1.206 -1.311 -0.518 0.541 0.651
Model 2 Estim.: 4.369 4.253 4.154 3.859 3.859 3.553 4.142 4.142 4.142 4.142 3.924 3.913 4.461 4.406 3.148 1.476 1.403 1.184 1.308 1.615 2.890 2.890 2.002 3.859 3.553
Model 2 Resid.: 0.288 0.145 0.293 1.036 0.539 0.769 -0.223 -0.151 -0.329 -0.364 -0.139 0.011 -0.527 -0.438 -0.490 -0.698 -0.449 0.439 0.335 1.224 -1.045 -1.150 -0.303 0.521 0.702
Model 3 Estim.: 4.229 4.076 3.951 3.782 3.782 3.564 4.062 4.062 4.062 4.062 4.275 4.179 4.206 4.294 3.462 1.667 1.618 1.000 1.279 1.554 2.965 2.965 2.204 3.782 3.564
Model 3 Resid.: 0.428 0.322 0.496 1.113 0.616 0.758 -0.143 -0.071 -0.249 -0.284 -0.490 -0.255 -0.272 -0.326 -0.804 -0.889 -0.664 0.623 0.364 1.285 -1.120 -1.225 -0.505 0.598 0.691
Model 4 Estim.: 4.351 4.264 4.191 3.813 4.016 3.765 4.027 4.253 4.302 4.331 4.144 4.110 3.741 3.606 2.693 1.400 1.161 1.177 1.347 1.809 3.213 3.169 2.108 3.947 3.710
Model 4 Resid.: 0.306 0.134 0.256 1.082 0.382 0.557 -0.108 -0.262 -0.489 -0.553 -0.359 -0.186 0.193 0.362 -0.035 -0.622 -0.207 0.446 0.296 1.030 -1.368 -1.429 -0.409 0.433 0.545
Model 5 Estim.: 4.456 4.388 4.326 3.836 4.033 3.712 4.072 4.292 4.340 4.368 3.914 3.931 4.017 3.781 2.604 1.404 1.212 1.319 1.383 1.696 3.044 3.002 1.889 3.966 3.659
Model 5 Resid.: 0.201 0.010 0.121 1.059 0.365 0.610 -0.153 -0.301 -0.527 -0.590 -0.129 -0.007 -0.083 0.187 0.054 -0.626 -0.258 0.304 0.260 1.143 -1.199 -1.262 -0.190 0.414 0.596
Model 6 Estim.: 4.341 4.243 4.162 3.766 3.989 3.747 3.994 4.242 4.297 4.328 4.250 4.189 3.724 3.601 2.838 1.562 1.375 1.132 1.340 1.632 3.140 3.092 2.063 3.914 3.687
Model 6 Resid.: 0.316 0.155 0.285 1.129 0.409 0.575 -0.075 -0.251 -0.484 -0.550 -0.465 -0.265 0.210 0.367 -0.180 -0.784 -0.421 0.491 0.303 1.207 -1.295 -1.352 -0.364 0.466 0.568
Model 7 Estim.: 4.344 4.214 4.108 3.903 3.822 3.479 4.217 4.190 4.174 4.141 4.269 4.197 4.039 4.146 2.862 1.385 1.222 1.116 1.329 1.827 3.053 3.067 2.206 3.843 3.494
Model 7 Resid.: 0.313 0.184 0.339 0.992 0.576 0.843 -0.298 -0.199 -0.361 -0.363 -0.484 -0.273 -0.105 -0.178 -0.204 -0.607 -0.268 0.507 0.314 1.012 -1.208 -1.327 -0.507 0.537 0.761
Model 8 Estim.: 4.438 4.330 4.236 3.920 3.843 3.437 4.248 4.223 4.207 4.175 4.019 4.002 4.297 4.297 2.779 1.402 1.281 1.265 1.372 1.731 2.904 2.918 2.005 3.863 3.452
Model 8 Resid.: 0.219 0.068 0.211 0.975 0.555 0.885 -0.329 -0.232 -0.394 -0.397 -0.234 -0.078 -0.363 -0.329 -0.121 -0.624 -0.327 0.358 0.271 1.108 -1.059 -1.178 -0.306 0.517 0.803
Model 9 Estim.: 4.327 4.183 4.064 3.861 3.772 3.430 4.197 4.168 4.150 4.114 4.384 4.282 4.042 4.185 3.016 1.553 1.450 1.090 1.341 1.672 2.965 2.981 2.178 3.795 3.447
Model 9 Resid.: 0.330 0.215 0.383 1.034 0.626 0.892 -0.278 -0.177 -0.337 -0.336 -0.599 -0.358 -0.108 -0.217 -0.358 -0.775 -0.496 0.533 0.302 1.167 -1.120 -1.241 -0.479 0.585 0.808
B. Estimated log Ki (hCA-II) using biparametric models
Comp.: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25.
Obs.: 2.470 2.380 2.477 2.505 2.230 2.204 1.778 2.041 1.602 1.845 1.447 1.875 1.778 1.279 0.477 0.301 0.778 0.778 0.954 1.079 0.954 0.903 0.845 2.097 2.041
Model 10 Estim.: 2.003 1.954 1.913 1.830 1.830 1.721 1.926 1.926 1.926 1.926 1.821 1.827 1.994 1.996 1.534 0.708 0.622 0.484 0.586 0.883 1.522 1.522 1.115 1.830 1.721
Model 10 Resid.: 0.467 0.426 0.564 0.675 0.400 0.483 -0.148 0.115 -0.324 -0.081 -0.374 0.048 -0.216 -0.717 -1.057 -0.407 0.156 0.294 0.368 0.196 -0.568 -0.619 -0.270 0.267 0.320
Model 11 Estim.: 2.063 2.028 1.997 1.847 1.847 1.697 1.949 1.949 1.949 1.949 1.660 1.702 2.147 2.086 1.474 0.708 0.641 0.562 0.601 0.831 1.444 1.444 1.000 1.847 1.697
Model 11 Resid.: 0.407 0.352 0.480 0.658 0.383 0.507 -0.171 0.092 -0.347 -0.104 -0.213 0.173 -0.369 -0.807 -0.997 -0.407 0.137 0.216 0.353 0.248 -0.490 -0.541 -0.155 0.250 0.344
Model 12 Estim.: 1.979 1.921 1.873 1.795 1.795 1.695 1.901 1.901 1.901 1.901 1.891 1.876 1.991 2.015 1.643 0.826 0.781 0.492 0.615 0.800 1.465 1.465 1.105 1.795 1.695
Model 12 Resid.: 0.491 0.459 0.604 0.710 0.435 0.509 -0.123 0.140 -0.299 -0.056 -0.444 -0.001 -0.213 -0.736 -1.166 -0.525 -0.003 0.286 0.339 0.279 -0.511 -0.562 -0.260 0.302 0.346
Model 13 Estim.: 2.084 2.034 1.993 1.789 1.910 1.787 1.911 2.046 2.076 2.093 2.104 2.057 1.705 1.639 1.210 0.642 0.542 0.577 0.660 0.803 1.468 1.442 0.924 1.869 1.755
Model 13 Resid.: 0.386 0.346 0.484 0.716 0.320 0.417 -0.133 -0.005 -0.474 -0.248 -0.657 -0.182 0.073 -0.360 -0.733 -0.341 0.236 0.201 0.294 0.276 -0.514 -0.539 -0.079 0.228 0.286
Model 14 Estim.: 2.135 2.094 2.059 1.809 1.923 1.768 1.937 2.065 2.093 2.109 1.975 1.958 1.847 1.737 1.182 0.629 0.544 0.611 0.648 0.752 1.408 1.384 0.836 1.884 1.737
Model 14 Resid.: 0.335 0.286 0.418 0.696 0.307 0.436 -0.159 -0.024 -0.491 -0.264 -0.528 -0.083 -0.069 -0.458 -0.705 -0.328 0.234 0.167 0.306 0.327 -0.454 -0.481 0.009 0.213 0.304
Model 15 Estim.: 2.077 2.021 1.975 1.763 1.895 1.778 1.893 2.041 2.073 2.092 2.169 2.105 1.680 1.620 1.257 0.717 0.642 0.584 0.679 0.736 1.430 1.401 0.901 1.850 1.742
Model 15 Resid.: 0.393 0.359 0.502 0.742 0.335 0.426 -0.115 0.000 -0.471 -0.247 -0.722 -0.230 0.098 -0.341 -0.780 -0.416 0.136 0.194 0.275 0.343 -0.476 -0.498 -0.056 0.247 0.299
Model 16 Estim.: 2.050 1.992 1.944 1.848 1.812 1.653 1.989 1.977 1.970 1.955 1.987 1.961 1.917 1.961 1.364 0.662 0.580 0.528 0.627 0.877 1.460 1.467 1.058 1.821 1.660
Model 16 Resid.: 0.420 0.388 0.533 0.657 0.418 0.551 -0.211 0.064 -0.368 -0.110 -0.540 -0.086 -0.139 -0.682 -0.887 -0.361 0.198 0.250 0.327 0.202 -0.506 -0.564 -0.213 0.276 0.381
Model 17 Estim.: 2.101 2.056 2.016 1.862 1.830 1.639 2.006 1.995 1.989 1.975 1.835 1.843 2.060 2.045 1.327 0.661 0.595 0.585 0.630 0.827 1.397 1.403 0.960 1.838 1.645
Model 17 Resid.: 0.369 0.324 0.461 0.643 0.400 0.565 -0.228 0.046 -0.387 -0.130 -0.388 0.032 -0.282 -0.766 -0.850 -0.360 0.183 0.193 0.324 0.252 -0.443 -0.500 -0.115 0.259 0.396
Model 18 Estim.: 2.038 1.971 1.916 1.822 1.780 1.621 1.979 1.965 1.957 1.940 2.071 2.022 1.904 1.972 1.428 0.752 0.705 0.540 0.656 0.807 1.404 1.411 1.040 1.791 1.629
Model 18 Resid.: 0.432 0.409 0.561 0.683 0.450 0.583 -0.201 0.076 -0.355 -0.095 -0.624 -0.147 -0.126 -0.693 -0.951 -0.451 0.073 0.238 0.298 0.272 -0.450 -0.508 -0.195 0.306 0.412
C. Estimated log Ki (bCA-IV) using biparametric models
Comp.: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25.
Obs.: 3.117 3.342 3.477 3.507 3.447 3.389 2.255 2.505 1.820 2.097 2.243 2.204 2.732 2.550 2.097 0.699 0.903 1.699 1.724 2.188 1.279 1.230 1.176 2.748 2.653
Model 19 Estim.: 2.763 2.716 2.677 2.585 2.585 2.465 2.677 2.677 2.677 2.677 2.493 2.518 2.764 2.756 2.246 1.337 1.227 1.082 1.191 1.559 2.272 2.272 1.814 2.585 2.465
Model 19 Resid.: 0.354 0.626 0.800 0.922 0.862 0.924 -0.422 -0.172 -0.857 -0.580 -0.250 -0.314 -0.032 -0.206 -0.149 -0.638 -0.324 0.617 0.533 0.629 -0.993 -1.042 -0.638 0.163 0.188
Model 20 Estim.: 2.840 2.814 2.788 2.613 2.613 2.442 2.709 2.709 2.709 2.709 2.272 2.349 2.960 2.870 2.172 1.325 1.230 1.157 1.189 1.497 2.192 2.192 1.679 2.613 2.442
Model 20 Resid.: 0.277 0.528 0.689 0.894 0.834 0.947 -0.454 -0.204 -0.889 -0.612 -0.029 -0.145 -0.228 -0.320 -0.075 -0.626 -0.327 0.542 0.535 0.691 -0.913 -0.962 -0.503 0.135 0.211
Model 21 Estim.: 2.738 2.683 2.637 2.549 2.549 2.438 2.651 2.651 2.651 2.651 2.556 2.562 2.769 2.782 2.375 1.469 1.402 1.076 1.209 1.460 2.216 2.216 1.805 2.549 2.438
Model 21 Resid.: 0.379 0.659 0.840 0.958 0.898 0.951 -0.396 -0.146 -0.831 -0.554 -0.313 -0.358 -0.037 -0.232 -0.278 -0.770 -0.499 0.623 0.515 0.728 -0.937 -0.986 -0.629 0.199 0.215
Model 22 Estim.: 2.803 2.768 2.738 2.569 2.645 2.518 2.656 2.742 2.761 2.771 2.558 2.577 2.588 2.519 2.036 1.307 1.166 1.143 1.231 1.560 2.295 2.278 1.740 2.619 2.497
Model 22 Resid.: 0.314 0.574 0.739 0.938 0.802 0.871 -0.401 -0.237 -0.941 -0.674 -0.315 -0.373 0.144 0.031 0.061 -0.608 -0.263 0.556 0.493 0.628 -1.016 -1.048 -0.564 0.129 0.156
Model 23 Estim.: 2.877 2.856 2.835 2.597 2.664 2.489 2.694 2.769 2.785 2.795 2.370 2.432 2.795 2.661 1.994 1.289 1.169 1.194 1.216 1.485 2.205 2.191 1.609 2.641 2.471
Model 23 Resid.: 0.240 0.486 0.642 0.910 0.783 0.900 -0.439 -0.264 -0.965 -0.698 -0.127 -0.228 -0.063 -0.111 0.103 -0.590 -0.266 0.505 0.508 0.703 -0.926 -0.961 -0.433 0.107 0.182
Model 24 Estim.: 2.794 2.752 2.716 2.535 2.627 2.505 2.632 2.735 2.757 2.770 2.638 2.636 2.564 2.503 2.115 1.411 1.304 1.134 1.243 1.458 2.244 2.225 1.710 2.596 2.481
Model 24 Resid.: 0.323 0.590 0.761 0.972 0.820 0.884 -0.377 -0.230 -0.937 -0.673 -0.395 -0.432 0.168 0.047 -0.018 -0.712 -0.401 0.565 0.481 0.730 -0.965 -0.995 -0.534 0.152 0.172
Model 25 Estim.: 2.790 2.744 2.707 2.605 2.578 2.422 2.716 2.707 2.702 2.691 2.538 2.558 2.713 2.723 2.119 1.312 1.190 1.115 1.216 1.589 2.265 2.269 1.803 2.585 2.427
Model 25 Resid.: 0.327 0.598 0.770 0.902 0.869 0.967 -0.461 -0.202 -0.882 -0.594 -0.295 -0.354 0.019 -0.173 -0.022 -0.613 -0.287 0.584 0.508 0.599 -0.986 -1.039 -0.627 0.163 0.226
Model 26 Estim.: 2.860 2.834 2.808 2.628 2.607 2.409 2.739 2.732 2.728 2.720 2.317 2.387 2.913 2.841 2.073 1.303 1.198 1.178 1.207 1.521 2.189 2.192 1.674 2.612 2.413
Model 26 Resid.: 0.257 0.508 0.669 0.879 0.840 0.980 -0.484 -0.227 -0.908 -0.623 -0.074 -0.183 -0.181 -0.291 0.024 -0.604 -0.295 0.521 0.517 0.667 -0.910 -0.962 -0.498 0.136 0.240
Model 27 Estim.: 2.777 2.722 2.676 2.575 2.543 2.387 2.703 2.692 2.686 2.672 2.625 2.621 2.708 2.744 2.212 1.423 1.342 1.110 1.234 1.492 2.202 2.208 1.784 2.551 2.393
Model 27 Resid.: 0.340 0.620 0.801 0.932 0.904 1.002 -0.448 -0.187 -0.866 -0.575 -0.382 -0.417 0.024 -0.194 -0.115 -0.724 -0.439 0.589 0.490 0.696 -0.923 -0.978 -0.608 0.197 0.260
Table 6 Regression parameters and quality of correlations for modeling log Ki(hCA-I), log Ki(hCA-II), and log Ki(bCA-IV) using tri- and tetra-parametric regressions
(A) Tri-parametric regressions
Model No.  Parameters used: regression coefficients                                                    Constant  Se      R2A     R       F-ratio
(i) For modeling of log Ki(hCA-I)
28         JhetV 0.9873(±0.2766); 1χ -1.0371(±0.3540); Sz 0.0025(±0.0014)                             6.1738    0.5766  0.7965  0.9066  32.315
29         JhetV 1.0445(±0.2947); 1χ -0.8661(±0.4288); W 0.0026(±0.0025)                              5.2838    0.6025  0.7779  0.8976  29.019
(ii) For modeling of log Ki(hCA-II)
30         JhetV 0.5730(±0.1681); 1χ -0.7409(±0.2152); Sz 0.0023(±0.0008)                             3.6827    0.3505  0.7346  0.8762  23.144
31         JhetV 0.6285(±0.1874); 1χ -0.6049(±0.2727); W 0.0025(±0.0016)                              2.9484    0.3832  0.6828  0.8500  18.221
(iii) For modeling of log Ki(bCA-IV)
32         JhetV 0.3376(±0.2723); 1χ -0.9850(±0.3485); Sz 0.0029(±0.0014)                             6.2659    0.5677  0.5371  0.7713  10.281
33         JhetV 0.4176(±0.2912); 1χ -0.8756(±0.4237); W 0.0036(±0.0025)                              5.5788    0.5953  0.4909  0.7447  8.715
(B) Tetra-parametric regressions
Model No.  Parameters used: regression coefficients                                                    Constant  Se      R2A     R       F-ratio
(i) For modeling of log Ki(hCA-I)
34         JhetV 0.6932(±0.2582); 1χ -0.6037(±0.3387); Sz 0.0005(±0.0014); I1 -1.0854(±0.3734)        5.2972    0.4954  0.8498  0.9353  34.949
35         JhetV 0.3921(±0.2739); 1χ 0.3795(±0.4446); W -0.0053(±0.0027); I1 -1.7484(±0.4272)         2.0072    0.4554  0.8731  0.9456  42.281
(ii) For modeling of log Ki(hCA-II)
36         JhetV 0.5353(±0.1860); 1χ -0.6853(±0.2439); Sz 0.0020(±0.0010); I1 -0.1391(±0.2689)        3.5704    0.3568  0.7250  0.8780  16.820
37         JhetV 0.5359(±0.2335); 1χ -0.4282(±0.3790); W 0.0014(±0.0023); I1 -0.2481(±0.3641)         2.4835    0.3881  0.6745  0.8537  13.433
(iii) For modeling of log Ki(bCA-IV)
38         JhetV 0.1146(±0.2752); 1χ -0.6562(±0.3610); Sz 0.0014(±0.0015); I1 -0.8234(±0.3979)        5.6009    0.5279  0.5997  0.8163  9.987
39         JhetV -0.0409(±0.3218); 1χ -0.0001(±0.5224); W -0.0020(±0.0032); I1 -1.2289(±0.5019)       3.2758    0.5350  0.5888  0.8107  9.590
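As an illustration of how a tetraparametric model with an indicator variable is applied, the sketch below evaluates model 35 of Table 6 for the descriptor values listed for compound 1 in Table 2. The dictionary keys, the function name, and the choice I1 = 0 for compound 1 are assumptions made for this example; the paper assigns I1 = 1 only to compounds bearing the electron-donating group described above.

```python
# Model 35 of Table 6: log Ki(hCA-I) from JhetV, 1chi, W and the indicator I1.
MODEL_35_COEFFS = {"JhetV": 0.3921, "chi1": 0.3795, "W": -0.0053, "I1": -1.7484}
MODEL_35_CONSTANT = 2.0072


def predict_log_ki(descriptors):
    """Linear model: constant plus the sum of coefficient * descriptor value."""
    return MODEL_35_CONSTANT + sum(
        coeff * descriptors[name] for name, coeff in MODEL_35_COEFFS.items()
    )


# Descriptor values of compound 1 from Table 2; I1 = 0 is assumed here.
compound_1 = {"JhetV": 3.092, "chi1": 5.016, "W": 144, "I1": 0}
estimate = predict_log_ki(compound_1)

# Switching the indicator on shifts the estimate by the I1 coefficient alone.
with_indicator = predict_log_ki({**compound_1, "I1": 1})
indicator_shift = with_indicator - estimate
```

With these inputs the estimate is close to the value 4.358 listed for compound 1 under model 35 in Table 8, the small difference being due to rounding of the coefficients. The indicator acts as a constant offset of -1.7484 log units for the compounds to which it applies.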
Table 7 Observed and calculated log Ki(hCA-I), log Ki(hCA-II), and log Ki(bCA-IV) values using tri-parametric models 28-33.
Comp.: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25.
CA-I, log Ki
Obs.: 4.657 4.398 4.447 4.895 4.398 4.322 3.919 3.991 3.813 3.778 3.785 3.924 3.934 3.968 2.658 0.778 0.954 1.623 1.643 2.839 1.845 1.740 1.699 4.380 4.255
Model 28 Calc.: 4.573 4.535 4.494 3.856 4.053 3.637 4.131 4.351 4.399 4.427 3.688 3.746 4.223 3.838 2.152 1.282 1.126 1.841 1.685 1.845 2.817 2.774 1.604 3.986 3.583
Model 28 Residual: 0.084 -0.137 -0.047 1.039 0.345 0.685 -0.212 -0.360 -0.586 -0.649 0.097 0.178 -0.289 0.130 0.506 -0.504 -0.172 -0.218 -0.042 0.994 -0.972 -1.034 0.095 0.394 0.672
Model 29 Calc.: 4.541 4.486 4.431 3.822 4.031 3.647 4.093 4.325 4.377 4.406 3.797 3.834 4.203 3.873 2.478 1.465 1.341 1.563 1.518 1.593 2.832 2.787 1.652 3.960 3.591
Model 29 Residual: 0.116 -0.088 0.016 1.073 0.367 0.675 -0.174 -0.334 -0.564 -0.628 -0.012 0.090 -0.269 0.095 0.180 -0.687 -0.387 0.060 0.125 1.246 -0.987 -1.047 0.047 0.420 0.664
CA-II, log Ki
Obs.: 2.470 2.380 2.477 2.505 2.230 2.204 1.778 2.041 1.602 1.845 1.447 1.875 1.778 1.279 0.477 0.301 0.778 0.778 0.954 1.079 0.954 0.903 0.845 2.097 2.041
Model 30 Calc.: 2.243 2.229 2.212 1.826 1.941 1.699 1.991 2.119 2.147 2.163 1.768 1.788 2.037 1.789 0.767 0.516 0.464 1.090 0.926 0.888 1.199 1.175 0.753 1.902 1.668
Model 30 Residual: 0.227 0.151 0.265 0.679 0.289 0.505 -0.213 -0.078 -0.545 -0.318 -0.321 0.087 -0.259 -0.510 -0.290 -0.215 0.314 -0.312 0.028 0.191 -0.245 -0.272 0.092 0.195 0.373
Model 31 Calc.: 2.217 2.189 2.160 1.795 1.921 1.705 1.957 2.097 2.128 2.145 1.862 1.864 2.028 1.826 1.060 0.687 0.669 0.848 0.779 0.652 1.203 1.176 0.606 1.878 1.671
Model 31 Residual: 0.253 0.191 0.317 0.710 0.309 0.499 -0.179 -0.056 -0.526 -0.300 -0.415 0.011 -0.250 -0.547 -0.583 -0.386 0.109 -0.070 0.175 0.427 -0.249 -0.273 0.239 0.219 0.370
CA-IV, log Ki
Obs.: 3.117 3.342 3.477 3.507 3.447 3.389 2.255 2.505 1.820 2.097 2.243 2.204 2.732 2.550 2.097 0.699 0.903 1.699 1.724 2.188 1.279 1.230 1.176 2.748 2.653
Model 32 Calc.: 3.015 3.029 3.032 2.620 2.687 2.401 2.763 2.838 2.855 2.864 2.105 2.214 3.037 2.728 1.462 1.145 1.067 1.808 1.571 1.660 1.938 1.923 1.274 2.665 2.382
Model 32 Residual: 0.102 0.313 0.445 0.887 0.760 0.988 -0.508 -0.333 -1.035 -0.767 0.138 -0.010 -0.305 -0.178 0.635 -0.446 -0.164 -0.109 0.153 0.528 -0.659 -0.693 -0.098 0.083 0.271
Model 33 Calc.: 2.995 2.992 2.981 2.578 2.661 2.399 2.722 2.815 2.836 2.847 2.207 2.297 3.054 2.789 1.819 1.373 1.349 1.534 1.403 1.342 1.910 1.892 1.279 2.633 2.376
Model 33 Residual: 0.122 0.350 0.496 0.929 0.786 0.990 -0.467 -0.310 -1.016 -0.750 0.036 -0.093 -0.322 -0.239 0.278 -0.674 -0.446 0.165 0.321 0.846 -0.631 -0.662 -0.103 0.115 0.277
Table 8 Observed and calculated log Ki(hCA-I), log Ki(hCA-II), and log Ki(bCA-IV) values using tetra-parametric models 34-39.
Comp.: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25.
CA-I, log Ki
Obs.: 4.657 4.398 4.447 4.895 4.398 4.322 3.919 3.991 3.813 3.778 3.785 3.924 3.934 3.968 2.658 0.778 0.954 1.623 1.643 2.839 1.845 1.740 1.699 4.380 4.255
Model 34 Calc.: 4.529 4.490 4.451 3.984 4.123 3.792 4.180 4.334 4.368 4.388 3.662 3.755 4.326 4.059 2.782 1.628 1.431 1.619 1.618 2.014 2.133 2.103 1.048 4.076 3.755
Model 34 Residual: 0.128 -0.092 -0.004 0.911 0.275 0.530 -0.261 -0.343 -0.555 -0.610 0.123 0.169 -0.392 -0.091 -0.124 -0.850 -0.477 0.004 0.025 0.825 -0.288 -0.363 0.651 0.304 0.500
Model 35 Calc.: 4.358 4.300 4.254 4.097 4.175 4.000 4.184 4.271 4.291 4.302 3.826 3.910 4.061 4.020 3.304 1.682 1.328 1.122 1.381 2.370 2.090 2.073 1.121 4.148 3.979
Model 35 Residual: 0.299 0.098 0.193 0.798 0.223 0.322 -0.265 -0.280 -0.478 -0.524 -0.041 0.014 -0.127 -0.052 -0.646 -0.904 -0.374 0.501 0.262 0.469 -0.245 -0.333 0.578 0.232 0.276
CA-II, log Ki
Obs.: 2.470 2.380 2.477 2.505 2.230 2.204 1.778 2.041 1.602 1.845 1.447 1.875 1.778 1.279 0.477 0.301 0.778 0.778 0.954 1.079 0.954 0.903 0.845 2.097 2.041
Model 36 Calc.: 2.237 2.224 2.207 1.843 1.950 1.719 1.997 2.117 2.143 2.158 1.764 1.789 2.050 1.818 0.848 0.561 0.503 1.062 0.917 0.910 1.111 1.088 0.502 1.913 1.690
Model 36 Residual: 0.233 0.156 0.270 0.662 0.280 0.485 -0.219 -0.076 -0.541 -0.313 -0.317 0.086 -0.272 -0.539 -0.371 -0.260 0.275 -0.284 0.037 0.169 -0.157 -0.185 0.343 0.184 0.351
Model 37 Calc.: 2.191 2.163 2.135 1.834 1.941 1.755 1.970 2.089 2.116 2.131 1.866 1.874 2.007 1.847 1.177 0.718 0.667 0.785 0.759 0.762 1.097 1.074 0.530 1.905 1.726
Model 37 Residual: 0.279 0.217 0.342 0.671 0.289 0.449 -0.192 -0.048 -0.514 -0.286 -0.419 0.001 -0.229 -0.568 -0.700 -0.417 0.111 -0.007 0.195 0.317 -0.143 -0.171 0.315 0.192 0.315
CA-IV, log Ki
Obs.: 3.117 3.342 3.477 3.507 3.447 3.389 2.255 2.505 1.820 2.097 2.243 2.204 2.732 2.550 2.097 0.699 0.903 1.699 1.724 2.188 1.279 1.230 1.176 2.748 2.653
Model 38 Calc.: 2.981 2.995 2.999 2.717 2.740 2.518 2.800 2.826 2.831 2.834 2.085 2.221 3.116 2.896 1.941 1.407 1.299 1.639 1.520 1.788 1.419 1.414 0.852 2.732 2.512
Model 38 Residual: 0.136 0.347 0.478 0.790 0.707 0.871 -0.545 -0.321 -1.011 -0.737 0.158 -0.017 -0.384 -0.346 0.156 -0.708 -0.396 0.060 0.204 0.400 -0.140 -0.184 0.324 0.016 0.141
Model 39 Calc.: 2.866 2.861 2.856 2.770 2.762 2.647 2.786 2.777 2.775 2.774 2.228 2.351 2.955 2.892 2.400 1.525 1.339 1.223 1.307 1.888 1.389 1.390 0.906 2.765 2.649
Model 39 Residual: 0.251 0.481 0.621 0.737 0.685 0.742 -0.531 -0.272 -0.955 -0.677 0.015 -0.147 -0.223 -0.342 -0.303 -0.826 -0.436 0.476 0.417 0.300 -0.110 -0.160 0.270 -0.017 0.004
Table 9 Intercorrelation matrix for topological indices according to data in Table 2.
        W  J        JhetZ    JhetM    JhetV    JhetE    JhetP    1χ      Sz
W       1  -0.6527  -0.5352  -0.5348  -0.6423  -0.6736  -0.6057  0.4284  0.9848
J          1        0.8467   0.8472   0.9037   0.9771   0.9079   0.3537  -0.6473
JhetZ               1        0.9999   0.6511   0.8143   0.7198   0.3419  -0.5642
JhetM                        1        0.652    0.815    0.7205   0.343   -0.5638
JhetV                                 1        0.9475   0.9883   0.2477  -0.6252
JhetE                                          1        0.9445   0.2955  -0.6683
JhetP                                                   1        0.3055  -0.5943
1χ                                                               1       0.4642
Sz                                                                       1
Eqs. (35) and (36) are the best models used for modeling log Ki(hCA-I). Eq. (35) contains JhetV, ¹χ, I1 and W as parameters, whose correlation matrix is shown in Table 10. These results show that the topological indices ¹χ and W are highly correlated. We have proposed eq. (36) for modeling log Ki(hCA-II) and eq. (38) for modeling log Ki(bCA-IV). The correlation matrices (Table 10) show that the indices ¹χ and Sz appearing in eqs. (36) and (38) suffer from the same collinearity defect.

Table 10
Correlation matrices for the parameters of eqs. (35), (36), and (38).

Equation (35), log Ki(hCA-I)
                  log Ki(hCA-I)        W       ¹χ    JhetV       I1
log Ki(hCA-I)             1.000
W                        -0.812    1.000
¹χ                       -0.830    0.985    1.000
JhetV                     0.775   -0.642   -0.625    1.000
I1                       -0.455   -0.005    0.098   -0.331    1.000

Equation (36), log Ki(hCA-II)
                 log Ki(hCA-II)       Sz       ¹χ    JhetV       I1
log Ki(hCA-II)            1.000
Sz                       -0.678    1.000
¹χ                       -0.750    0.980    1.000
JhetV                     0.748   -0.613   -0.625    1.000
I1                       -0.368    0.004    0.098   -0.331    1.000

Equation (38), log Ki(bCA-IV)
                 log Ki(bCA-IV)       Sz       ¹χ    JhetV       I1
log Ki(bCA-IV)            1.000
Sz                       -0.620    1.000
¹χ                       -0.694    0.979    1.000
JhetV                     0.567   -0.613   -0.625    1.000
I1                       -0.477    0.004    0.098   -0.331    1.000
A thorough investigation of collinearity involves examining the R² values obtained when each predictor variable is regressed against all the others. The relationship between predictor variables can be judged from the variance inflation factor (VIF), defined as VIF = 1/(1 − Ri²), where Ri is the multiple correlation coefficient of the i-th independent variable versus all other independent variables. A VIF is defined for each variable in the equation, not for the equation (model) as a whole. There are therefore as many VIFs as there are independent variables in the model, and all should be less than 10; any independent variable with VIF > 10 indicates collinearity. We observed that ¹χ and W, as well as ¹χ and Sz, involved in eqs. (35), (36), and (38) have VIF values higher than 10, and therefore there is a collinearity problem in these models.

The above results show that collinearity exists in all three proposed models, and thus statistically they are disputable. However, Randić has investigated this problem,56,57 and he recommended that under certain conditions even highly correlated descriptors could be retained in a model. We therefore follow Randić's recommendations in the present case.

Randić stated that if a descriptor strongly correlates with another descriptor already used in a regression, such a descriptor is discarded in most studies. For example, ¹χ and ²χ often strongly correlate, and in many structure-property-activity studies ²χ has been discarded. This is not theoretically justified and, despite being widespread, the practice should be stopped. Although two highly correlated descriptors depict overall the same features of molecular structure, it is important to recognize that even highly interrelated descriptors differ in some other structural traits. The difference between them may be relatively small but nevertheless very important for structure-property regression. The criteria for inclusion or exclusion of descriptors should not be based on parallelism between descriptors, even if overwhelming, but on whether the part in which two descriptors disagree is or is not relevant for the characterization of the property considered. If the part in which the second descriptor differs from the first, regardless of how small it is, is relevant for the property under consideration, then the descriptor should be included.

Randić further stated that the selection of descriptors to be used in structure-property-activity studies should not be delegated solely to computers,56 although statistical criteria will continue to be useful for preliminary screening of descriptors taken from a large pool. Often in an automated selection of descriptors, a descriptor will be discarded because it is highly correlated with another descriptor already selected. What is important, however, is not whether two descriptors parallel one another, i.e. duplicate much of the same structural information, but whether they are complementary in those parts that are important for structure-property-activity correlations. Hence, the residual of the correlation between two descriptors should be examined, and the descriptor kept or discarded depending on how well this residual improves the correlation based on the already selected descriptors.
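The VIF criterion discussed above can be illustrated with a short sketch. It relies on the standard identity that the VIF of the i-th predictor equals the i-th diagonal element of the inverse of the predictor correlation matrix, and it uses the rounded predictor correlations listed for eq. (35) in Table 10. The function names `invert` and `vif_from_correlation` are illustrative choices, not part of any software used in this work, and because the inputs are rounded to three decimals the printed VIFs are only approximate. Note that the pairwise correlation of 0.985 between W and ¹χ alone already implies VIF ≥ 1/(1 − 0.985²) ≈ 33.6 for both indices.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Choose the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    # The right-hand half of the augmented matrix is now the inverse.
    return [row[n:] for row in aug]


def vif_from_correlation(corr):
    """Return VIF_i = 1/(1 - R_i^2) as the diagonal of the inverse correlation matrix."""
    inverse = invert(corr)
    return [inverse[i][i] for i in range(len(corr))]


# Predictor correlations for eq. (35), copied from Table 10 (rounded values).
names = ["W", "1chi", "JhetV", "I1"]
corr_eq35 = [
    [1.000, 0.985, -0.642, -0.005],
    [0.985, 1.000, -0.625, 0.098],
    [-0.642, -0.625, 1.000, -0.331],
    [-0.005, 0.098, -0.331, 1.000],
]

for name, vif in zip(names, vif_from_correlation(corr_eq35)):
    flag = "collinear (VIF > 10)" if vif > 10 else "acceptable"
    print(f"{name:6s} VIF = {vif:8.2f}  {flag}")
```

The same function could be applied to the predictor blocks of eqs. (36) and (38) by substituting the Sz row and column from Table 10.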
In view of Randić's recommendations and the fact that the ¹χ, W, and Sz indices have different information contents, these highly correlated descriptors can be retained in the proposed models.

At this stage, it is worth mentioning that the problems caused by collinearity, and how to deal with them, continue to be of prime concern to theoretical statisticians. From a decision maker's viewpoint, one should be aware that collinearity can (and usually does) exist and should recognize the basic problems it can cause. Some of the most obvious problems and indications of severe multi-collinearity are: (i) incorrect signs on the coefficients; (ii) a change in the values of the previous coefficients when a new variable is added to the model; (iii) a previously significant variable becoming insignificant when a new variable is added to the model; (iv) an increase in the standard error of the estimate when a variable is added to the model. In the present case, most of the correlating variables have coefficients smaller than their respective standard errors.

We now comment on the adjusted R² (R²A). This quantity adjusts R² for the number of variables in the model; therefore, if a variable is added that does not contribute its fair share, R²A will actually decline. It also takes into account the relationship between sample size and number of variables. R² may appear artificially high if the number of variables is high compared with the sample size. That is, R² always increases when a new independent variable is added, but R²A decreases if the added variable does not reduce the unexplained variation enough to offset the loss of degrees of freedom. From Tables 4 and 6 we observe that in each case R²A increases with the number of variables (an exception is provided by the data on CA-II in going from tri- to tetra-parametric regressions). Thus, the added variable makes a significant contribution to the developed model.
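The behaviour of R²A described above can be made concrete with a small sketch based on the usual textbook definition R²A = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of compounds and p the number of descriptors. The function name `adjusted_r2` and the R² values below are hypothetical and chosen only for illustration; they are not the statistics of the models reported in this paper.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors plus one")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Hypothetical example: 26 compounds, tri- versus tetra-parametric model.
n = 26
r2_three = 0.800   # assumed R^2 with three descriptors
r2_four = 0.805    # assumed R^2 after adding a weak fourth descriptor

adj_three = adjusted_r2(r2_three, n, 3)
adj_four = adjusted_r2(r2_four, n, 4)

print(f"3 descriptors: R2 = {r2_three:.3f}, adjusted R2 = {adj_three:.4f}")
print(f"4 descriptors: R2 = {r2_four:.3f}, adjusted R2 = {adj_four:.4f}")
# R^2 rises, but adjusted R^2 falls when the gain does not offset
# the lost degree of freedom.
print("fourth descriptor justified:", adj_four > adj_three)
```

In this assumed case R² increases slightly while R²A decreases, which is the situation in which the added descriptor would not be considered to contribute its fair share.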
All these points indicate that multi-collinearity is not a serious problem in the proposed models.

EXPERIMENTAL

1. Inhibitory activities. All three inhibitory activities, log Ki(hCA-I), log Ki(hCA-II), and log Ki(bCA-IV), were taken from our earlier publication13 after conversion to logarithmic units.

2. Topological indices. All topological indices used in this paper were calculated from the hydrogen-suppressed molecular graphs of the benzenesulfonamides presented in Table 1. Their calculation is well documented in the literature.58-63 We used the Luko-1 program of Lukovits, Hungarian Academy of Sciences, Budapest, for the calculation of the Szeged index (Sz), while the other indices were calculated using Todeschini's Dragon software.64

3. Regression analysis. The maximum-R² method53-55 was adopted for the regression analysis. The Regress-1 program of Lukovits as well as the Origin-6 and NCSS programs were used.

ACKNOWLEDGEMENTS. The authors thank Professor Istvan Lukovits, Hungarian Academy of Sciences, Budapest, Hungary, for providing software to carry out the regression analysis. The authors also thank CSIR, New Delhi, India, for financial support through project No. 01(1785)/02/EMR-II.
REFERENCES

1. A. T. Balaban, S. C. Basak, A. Beteringhe, D. Mills and C. T. Supuran, Molecular Diversity, 2004, 8, 401.
2. C. T. Supuran, A. Scozzafava and J. Conway (Eds.), "Carbonic Anhydrase Inhibitors and Activators", CRC Press, Boca Raton, 2004.
3. C. T. Supuran and A. T. Balaban, Rev. Roum. Chem., 1994, 39, 107.
4. C. T. Supuran, A. Casini, A. Mastrolorenzo and A. Scozzafava, Mini-Rev. Med. Chem., 2004, 4, 625.
5. P. M. Bell and R. O. Roblin, J. Am. Chem. Soc., 1942, 64, 2905.
6. C. Silipo and A. Vittoria, Farmaco, Ediz. Sci., 1979, 34, 858.
7. G. H. Miller, P. M. Doukas and J. K. Seydel, J. Med. Chem., 1972, 15, 700.
8. J. K. Seydel, J. Med. Chem., 1971, 14, 714.
9. W. Walter and R. F. Becker, Liebigs Ann. Chem., 1969, 727, 71.
10. N. Kakey, M. Aoki, A. Kamada and N. Yata, Chem. Pharm. Bull., 1969, 17, 1010.
11. G. Dauphin and A. Kergomard, Bull. Soc. Chim. Fr., 1961, 486.
12. M. Yoshika, K. Hamamoto and T. Kubota, Bull. Chem. Soc. Jpn., 1962, 35, 1723.
13. V. K. Agrawal, J. Singh, M. Gupta, P. V. Khadikar and C. T. Supuran, Eur. J. Med. Chem., 2005, 40, 1002.
14. C. T. Supuran, A. Scozzafava and A. Casini, "Carbonic Anhydrase, Its Inhibitors and Activators", C. T. Supuran, A. Scozzafava and J. Conway (Eds.), CRC Press, Boca Raton, 2004, p. 67.
15. F. Mincione, L. Menabuoni and C. T. Supuran, Ibid., p. 243-254.
16. C. T. Supuran, F. Briganti, L. Menabuoni, G. Mincione, F. Mincione and A. Scozzafava, Eur. J. Med. Chem., 2000, 35, 309.
17. C. T. Supuran and B. W. Clare, Eur. J. Med. Chem., 1995, 30, 687.
18. C. T. Supuran and B. W. Clare, Eur. J. Med. Chem., 1999, 34, 41.
19. B. W. Clare and C. T. Supuran, Eur. J. Med. Chem., 1997, 32, 311.
20. C. T. Supuran and B. W. Clare, Eur. J. Med. Chem., 1998, 33, 489.
21. C. T. Supuran and A. Scozzafava, J. Enz. Inhib., 2000, 15, 597.
22. A. Casini, J. Antel, F. Abbate, A. Scozzafava, S. David, H. Waldeck, S. Schafer and C. T. Supuran, Bioorg. Med. Chem. Lett., 2003, 13, 841.
23. B. W. Clare and C. T. Supuran, J. Pharm. Sci., 1994, 83, 768.
24. C. T. Supuran and A. Scozzafava, SAR QSAR Environ. Res., 2001, 12, 17.
25. C. T. Supuran and A. Scozzafava, Eur. J. Med. Chem., 2000, 35, 867.
26. A. Thakur, M. Thakur, P. V. Khadikar and C. T. Supuran, Bioorg. Med. Chem. Lett., 2005, 15, 203.
27. D. Mandoli, S. Joshi, P. V. Khadikar and N. Khosla, Bioorg. Med. Chem. Lett., 2005, 15, 405.
28. P. V. Khadikar, V. Sharma, S. Karmarkar and C. T. Supuran, Bioorg. Med. Chem. Lett., 2005, 15, 931.
29. P. V. Khadikar, V. Sharma, S. Karmarkar and C. T. Supuran, Bioorg. Med. Chem. Lett., 2005, 15, 923.
30. A. T. Balaban, P. V. Khadikar, C. T. Supuran, A. Thakur and M. Thakur, Bioorg. Med. Chem. Lett., 2005, 15, 3966.
31. V. K. Agrawal, M. Banerji, M. Gupta, J. Singh, P. V. Khadikar and C. T. Supuran, Eur. J. Med. Chem., 2005, 40, 1002.
32. M. Jaiswal, P. V. Khadikar and C. T. Supuran, Bioorg. Med. Chem. Lett., 2004, 14, 5661.
33. M. Jaiswal, P. V. Khadikar, A. Scozzafava and C. T. Supuran, Bioorg. Med. Chem. Lett., 2004, 14, 3283.
34. V. K. Agrawal, S. Bano, C. T. Supuran and P. V. Khadikar, Eur. J. Med. Chem., 2004, 39, 593.
35. M. Jaiswal, P. V. Khadikar and C. T. Supuran, Bioorg. Med. Chem., 2004, 12, 2477.
36. A. Thakur, M. Thakur, P. V. Khadikar, C. T. Supuran and P. Sudele, Bioorg. Med. Chem., 2004, 12, 789.
37. A. Saxena, V. K. Agrawal and P. V. Khadikar, Oxid. Commun., 2003, 26, 9.
38. V. K. Agrawal, S. Shrivastava, P. V. Khadikar and C. T. Supuran, Bioorg. Med. Chem., 2003, 11, 5353.
39. V. K. Agrawal and P. V. Khadikar, Bioorg. Med. Chem. Lett., 2003, 13, 447.
40. V. K. Agrawal, R. Sharma and P. V. Khadikar, Bioorg. Med. Chem., 2002, 10, 2993.
41. A. Saxena and P. V. Khadikar, Acta Pharm., 1999, 49, 171.
42. A. T. Balaban, Chem. Phys. Lett., 1982, 89, 399.
43. A. T. Balaban, MATCH, Commun. Math. Computer Chem., 1986, 21, 115.
44. A. T. Balaban and O. Ivanciuc, "MATH/CHEM/COMP 1988, Studies in Physical and Theoretical Chemistry Series", A. Graovac (Ed.), No. 63, Elsevier, Amsterdam, 1989, p. 193-211.
45. O. Ivanciuc, T. Ivanciuc and A. T. Balaban, J. Chem. Inf. Comput. Sci., 1998, 38, 395-401.
46. H. Wiener, J. Am. Chem. Soc., 1947, 69, 17.
47. I. Gutman, Graph Theory Notes New York, 1994, 27, 9.
48. P. V. Khadikar, N. V. Deshpande, P. P. Kale, A. Dobrynin, I. Gutman and G. Domotor, J. Chem. Inf. Comput. Sci., 1995, 35, 547.
49. P. V. Khadikar, P. P. Kale, N. V. Deshpande, S. Karmarkar and V. K. Agrawal, MATCH, Commun. Math. Comput. Chem., 2001, 43, 7.
50. M. Randić, J. Am. Chem. Soc., 1975, 97, 6609.
51. J. Devillers and A. T. Balaban (Eds.), "Topological indices and related descriptors in QSAR and QSPR", Gordon and Breach, Williston VT, 2000.
52. M. Barysz, G. Jashari, R. S. Lall, V. K. Srivastava and N. Trinajstić, "Chemical applications of topology and graph theory", R. B. King (Ed.), Elsevier, Amsterdam, 1983, p. 222.
53. S. Chatterjee, A. S. Hadi and B. Price, "Regression analysis by examples", 3rd ed., Wiley, New York, 2000.
54. H. Van de Waterbeemd, "Chemometric methods in molecular design", VCH, Weinheim, 1995.
55. J. Devillers and W. Karcher (Eds.), "Applied multiparametric analysis in SAR and environmental studies", Kluwer Academic, Dordrecht, 1991.
56. M. Randić, Acta Chem. Slov., 1998, 45, 239.
57. M. Randić, J. Chem. Inf. Comput. Sci., 1997, 37, 672.
58. M. V. Diudea and P. V. Khadikar, "Molecular topology and its applications", Galgotia Publ., New Delhi, India (in press).
59. N. Trinajstić, "Chemical Graph Theory", 2nd ed., CRC Press, Boca Raton, Florida, 1992, chapter 10, p. 225.
60. M. V. Diudea (Ed.), "QSPR/QSAR studies by molecular descriptors", Nova Science, Huntington, New York, 2000.
61. R. Todeschini and V. Consonni, "Handbook of molecular descriptors", Wiley-VCH, Weinheim, 2000.
62. A. T. Balaban, A. Chiriac, I. Motoc and Z. Simon, "Steric fit in quantitative structure activity relations", Lecture Notes in Chemistry No. 15, Springer Verlag, Berlin, 1980.
63. A. T. Balaban, I. Motoc, D. Bonchev and O. Mekenyan, "Topological indices for structure-activity correlations", in "Steric Effects in Drug Design", M. Charton and I. Motoc (Eds.), Topics Curr. Chem., 1983, 114, 21-55, Springer, Berlin.
64. Dragon software for calculation of Balaban-type and other indices, www.disat.unimib.it