Interdiscip Sci Comput Life Sci (2013) 5: 45–52 DOI: 10.1007/s12539-013-0156-y

Optimization of Gaussian Kernel Function in Support Vector Machine Aided QSAR Studies of C-Aryl Glucoside SGLT2 Inhibitors

Rebekah K. PRASOONA1,2, A. JYOTI1,2, Yadav MUKESH3∗, Sharma NISHANT3, Nayarisseri S. ANURAJ1,4, Joshi SHOBHA5

1 (Department of Genetics, Osmania University, Hyderabad, India)
2 (Institute of Genetics & Hospital for Genetic Diseases, Osmania University, Begumpet 500016, Hyderabad, India)
3 (Department of Pharmaceutical Chemistry, Softvision College, Indore 452010, Madhya Pradesh, India)
4 (Bioinformatics Research Laboratory, Eminent Biosciences, Indore 452010, Madhya Pradesh, India)
5 (Government Degree College, Depalpur, Indore 453001, Madhya Pradesh, India)

Received 8 March 2012 / Revised 17 April 2012 / Accepted 4 June 2012

Abstract: The present investigation applies the Support Vector Machine (SVM), a recent statistical algorithm, to identify the non-linear structure-activity relationship between IC50 values and structures of C-aryl glucoside SGLT2 inhibitors. The training dataset consisted of forty molecules, and the remaining six molecules were chosen for test set validation. SVM with the Gaussian Kernel Function yielded non-linear QSAR models. A forward selection algorithm was applied after pruning and a redundancy check on the molecular descriptors. Internal validation of the QSAR models was achieved using R2CV (LOO), PRESS, SDEP and Y-scrambling. The SVM aided non-linear models became more efficient when optimization of the Gaussian Kernel Function was introduced. The non-linear QSAR studies further identified atomic van der Waals volumes, atomic masses, sum of geometrical distances between O..S and degree of unsaturation as molecular descriptors and crucial structural requirements to model the IC50 of C-aryl glucoside derivatives.

Key words: QSAR, Support Vector Machine (SVM), Gaussian Kernel Function, non-linear QSAR, C-aryl glucoside, SGLT2 inhibitors.

1 Introduction

Diabetes, a disease of high concern to human health, affected 285 million people in 2010. Estimates state that its prevalence may rise from the present 6.4% of the world population to 7.8% by 2030 (International Diabetes Federation, 2009). Diabetes leads to early-onset disease and premature death. Of the two forms of diabetes, type I (5-10%) is autoimmune and is treated by external insulin supply, while type II (90-95%) is a state of glucose accumulation leading to pathophysiology (Dwarakanathan, 2006). In type II diabetes, glucose transporter proteins supply glucose to fuel tissues, membranes, muscles and organs (Brown, 2000; Wright et al., 2007; Kloeckener-Gruissem, 2008). This facilitative glucose carriage across the cellular domain is conducted by the GLUT family of transporter proteins. Sodium-glucose transport proteins (SGLTs), encoded by the SLC5 gene, are responsible for glucose transport in the intestine and kidney.

∗ Corresponding author. E-mail: [email protected]

The S1 segment of the proximal convoluted tubule is the primary site for reabsorption (90%) of glucose in the mammalian kidney (Berry, 1991). Among the SGLTs, SGLT2 transporters play the key role at the S1 segment (Rector, 1983; Moe, 2000). SGLT2 translocates bound sodium, and glucose along with it, across the apical cell membrane. This secondary active transport works on an electrochemical gradient and thus makes the SGLT2 transporter protein an attractive therapeutic target in type II diabetes (Bakris, 2009; Gerich, 2001). Phlorizin, a natural product obtained from the root bark of the apple tree in the 19th century, causes glucosuria when administered. This evidence led the medicinal community to conclude that removal of excess glucose from a diabetic patient through the kidney in urine is achievable (Rossetti et al., 1987; Ehrenkranz et al., 2005). Among C-aryl glucosides, a molecule named dapagliflozin was identified as a potent SGLT2 inhibitor in the treatment of type II diabetes (Lee et al., 2010). Derivatives of this class have shown enhanced chemical stability and biological responses resulting from the glucosidic bond (Fig. 1).

Fig. 1 Dapagliflozin

Quantitative Structure Activity Relationship (QSAR) is a tool that applies mathematics and statistics to identify and predict the biological response of compounds in a class as a function of their structure. Generally, structural properties are expressed numerically as molecular descriptors derived from chemical structures. The last three decades have seen QSAR evolve through the development of methodologies and statistical approaches to regression such as Support Vector Machine (SVM), Neural Network (NN), Partial Least Square (PLS), regression trees and ensembles (Soto et al., 2008). Multiple linear regression (MLR) is the simplest and most significant approach used in QSAR so far. This technique helps to identify linear relationships between molecular structures and their biological responses. Often, however, a structure-activity relationship is non-linear and cannot be identified using MLR analysis (Aksyonova et al., 2003). Recent developments and reported work establish SVM as an accurate, robust and fast statistical tool (Pavlidis, 2004; Norinder, 2003). The efficiency of SVM in identifying non-linear QSAR has been appreciable and acceptable. Furthermore, the development of kernel functions such as Gaussian and polynomial kernels made SVM even more applicable as an alternative tool in QSAR studies. SVM, developed by Vapnik for classification, was further optimized and applied to regression for exploring non-linear QSAR models (Cortes and Vapnik, 1995). The present work includes optimization of the Gaussian Kernel Function through regulation of the permissible values of epsilon, cost and sigma. Optimization of the Gaussian Kernel Function was investigated in order to reveal the impact of varying the default parameters on the statistical fitness of QSAR models. The computational tool used extensively in the present investigation is Sarchitect Designer 2.5.0 (Strand Life Science). Sarchitect, a sophisticated tool with a simple user interface, provides machine learning techniques such as MLR, SVM, ANN and PLS for regression and classification. Its heuristic accuracy, robustness and simple interface make it a tool of choice for building QSAR models.

2 Methodology

We carried out SVM optimization for the present dataset (Table 1) because the results obtained from MLR could not be improved in terms of regression parameters. The following subsections explain the optimization of the SVM algorithm in the forward selection wrapper.

Table 1 Structural details of C-aryl glucoside SGLT2 inhibitors, original IC50 and log IC50 used in present investigations. Structures with (*) are included in test set validation. The R2 substituents were given as structure drawings and are not recoverable here.

Molecule   R1    IC50 (nM)   log IC50
1          H     6.25        0.7959
(2*)       H     6910        3.8395
3          H     1420        3.1523
4          H     1230        3.0899
5          Cl    542         2.7340
6          Cl    103         2.0128
7          Cl    24.9        1.3962
(8*)       Cl    45.3        1.6561
9          Cl    53.0        1.7243
10         Cl    2390        3.3784
11         Cl    214         2.3304
12         Cl    379         2.5786
13         Cl    353         2.5478
(14*)      Cl    122         2.0864
15         Cl    146         2.1644
16         Cl    10.9        1.0374
17         Cl    155         2.1903
18         Cl    759         2.8802
19         Cl    445         2.6484
20         Cl    16.6        1.2201
21         Cl    126         2.1004
22         Cl    30.3        1.4814
(23*)      Cl    638         2.8048
24         Cl    97.4        1.9886
25         Cl    36.4        1.5611
26         Cl    81.6        1.9117
27         Cl    562         2.7497
(28*)      Cl    4.06        0.6085
29         Cl    190         2.2788
30         Cl    7.03        0.8470
31         Cl    17.7        1.2480
32         Cl    106         2.0253
33         Cl    58.3        1.7657
34         Cl    10.6        1.0253
35         Cl    249         2.3962
(36*)      Cl    257         2.4099
(37*)      Cl    119         2.0755
38         Cl    3280        3.5159
(39*)      Cl    14.4        1.1584
40         Cl    3.51        0.5453
(41*)      Cl    762         2.8820
42         Cl    712         2.8525
(43*)      Cl    18.2        1.2601
44         Cl    44.5        1.6484
45         Cl    21.8        1.3385
46         Cl    72.6        1.8609
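The log IC50 column of Table 1 is consistent with a base-10 logarithm of the IC50 expressed in nM. The short sketch below recomputes a few entries as a sanity check; the dictionary and function names are illustrative and not part of the original study.

```python
import math

# A few (molecule, IC50 in nM) pairs copied from Table 1.
ic50_nm = {"1": 6.25, "(2*)": 6910.0, "16": 10.9, "40": 3.51}


def log_ic50(value_nm: float) -> float:
    """Base-10 logarithm of an IC50 given in nM, rounded to 4 decimals."""
    return round(math.log10(value_nm), 4)


# Recomputed log IC50 values, keyed by molecule label.
log_values = {molecule: log_ic50(value) for molecule, value in ic50_nm.items()}

for molecule, value in log_values.items():
    print(molecule, value)
```

The recomputed values can be compared row by row with the log IC50 column of Table 1.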

2.1 Optimization of Gaussian Kernel Function

The main idea of SVM (Smola and Schölkopf, 2004; Schölkopf and Smola, 2002) is to find the “flattest” (i.e. least complex) linear function that approximates the given data with ε precision in a kernel-induced feature space. This is achieved using the ε-insensitive loss function, which penalizes errors greater than ε. The trade-off between the flatness of the estimate and the amount up to which deviations greater than ε are tolerated is determined by the regularization constant C ≥ 0. This setting is transformed into a constrained optimization problem, in which the Wolfe dual is computed, resulting in a convex programming problem. The solution of this problem is sparse: a subset of the resulting Lagrange multipliers will be nonzero (Kuhn and Tucker, 1951; Pavlidis, 2004) and the associated samples will be support vectors (SV). Only these vectors Lagrange multipliers from the SVR and LS-SVM models were explained for the first time in terms of molecular structures, descriptors, biological activity and principal components.

2.2 Default parameters

The Gaussian Kernel Function was applied in the forward selection wrapper with an epsilon value of 0.1 (the prediction error acceptable to the user, i.e. the acceptable difference between the target value for regression and the predicted value), a cost value of 100 (increasing this parameter reduces the training error, but at the cost of the generalization achieved by the regression model) and a sigma value of 1.0 (the default). Typically, there is an optimum value of sigma such that going below this value decreases both misclassification (training error) and generalization, and going above this value increases misclassification. Models obtained using the Gaussian Kernel Function identified a non-linear relationship between activity and descriptors.

2.3 Optimization of parameters

Optimization of the Gaussian Kernel Function was achieved by introducing the following variations in the deciding parameters epsilon, cost and sigma.

Sigma value - By default, sigma is held constant at 1.0. Lowering this parameter may lead to misclassification and to the selection of non-significant descriptors, which can produce statistically good models whose structure-activity relationships are not relevant in biochemical terms. Despite this risk, we lowered sigma below 1.0, to 0.9, 0.8 and 0.7. The risk was accepted in view of the fact that the MLR results were close to 0.8 in terms of R2. The results discussed below were obtained after changing sigma from 1.0 to 0.9.

Epsilon value - The default epsilon, the prediction error accepted by the user, is 0.1 and can be increased (> 0.1). In the present investigation we varied epsilon above 0.1, up to 0.9. This decision rests on the assumption that a larger standard error limit can be permitted in order to obtain more significant QSAR models. Setting epsilon to 0.5 produced the SVM non-linear models included in this study.

Cost value - The cost or penalty associated with sample error. If the error is less than the epsilon value, the cost incurred is zero; if the error is more than epsilon, the cost associated with the error is Cost × (error − epsilon). The default value is 100. Increasing this parameter reduces the training error, but at the cost of the generalization achieved by the regression model. We took the risk to generalization by increasing the cost value above 100 (> 100), i.e. 110, 120, ..., 200. This optimization approach recorded a fall in statistical outcomes, so we decided to hold the cost constant at 100.
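The three parameters discussed above can be made concrete with a small sketch. It assumes the common parametrization of the Gaussian kernel, exp(−‖x − y‖² / (2σ²)); how Sarchitect defines sigma internally is not stated in the text, so that form is an assumption. The per-sample cost follows the rule given above, Cost × (error − epsilon) when the error exceeds epsilon and zero otherwise. The full Cartesian grid is only an illustration of the candidate values listed; the text does not say whether the parameters were varied jointly or one at a time, and all function names are hypothetical.

```python
import math
from itertools import product


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2)) (assumed parametrization)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sample_cost(observed, predicted, epsilon=0.1, cost=100.0):
    """Zero inside the epsilon tube, cost * (error - epsilon) outside it."""
    error = abs(observed - predicted)
    return cost * max(0.0, error - epsilon)


# Candidate values named in the text (defaults: sigma 1.0, epsilon 0.1, cost 100).
sigmas = [1.0, 0.9, 0.8, 0.7]
epsilons = [round(0.1 * k, 1) for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9
costs = list(range(100, 201, 10))                     # 100, 110, ..., 200

# Illustrative full grid of (sigma, epsilon, cost) combinations.
grid = list(product(sigmas, epsilons, costs))

# The setting reported for the final models: sigma 0.9, epsilon 0.5, cost 100.
reported_setting = (0.9, 0.5, 100)
print(len(grid), reported_setting in grid)
```

Each grid point would be passed to an SVM regression routine and scored by cross-validation; that fitting step is omitted here because the software's interface is not described in the text.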

3 Results and discussions

Stepwise multiple regression analysis, facilitated by the forward selection algorithm of SARCHITECT Designer 2.5.0 (developed by Strands Life Science Pvt. Ltd.), produced QSAR models with acceptable statistical parameters including R2, R2A, S.E. and F-stat values (Table 2).

3.1 SVM aided non-linear QSAR

SVM, a statistically fast, robust and accurate method (Vapnik, 1999; Furey et al., 2000) generally used for establishing non-linear relationships of variables with activity, was also employed for supervised learning on the dataset. Descriptors selected for forward selection were collected and checked for redundancy. The statistical parameters obtained in forward selection with the optimized SVM, which contribute to model fitness, are shown in Table 2. Five models including different descriptors from various classes were employed. The descriptors (svm) used to find the relationship with the IC50 of the dataset compounds were: DISPm, a geometrical descriptor (d COMMA2 value / weighted by atomic masses); c5AB; R4v+; G (O..S), a geometrical descriptor denoting the sum of geometrical distances between O..S; and Du, which belongs to the class of WHIM descriptors (D total accessibility index / unweighted). SVM aided non-linear models are found to be more efficient in statistical fitness and prediction.

Table 2 Support Vector Machine analysis and their statistical parameters including R2, R2A and standard errors.

Model                     R2       Max Absolute Error   Mean Absolute Error   R2CV
                                   (Train Metric)       (Train Metric)        (N-Fold)
One-descriptor model      0.1877   1.5211               0.5361                0.3337
Two-descriptor model      0.5787   1.2938               0.3534                0.4899
Three-descriptor model    0.6840   0.9673               0.2940                0.5871
Four-descriptor model     0.7718   0.9354               0.2412                0.6413
Five-descriptor model     0.8480   0.8985               0.1914                0.6682

3.2 Model validation

The non-linear QSAR models (tetravariable and pentavariable) were used to obtain predicted binding affinities, which were then correlated with the observed binding affinities. The observed and predicted binding affinities are reported in Table 3, with the graphical correlation shown in Fig. 2. N-fold R2cv was introduced as an internal cross-validation parameter for the univariable to hexavariable models. The calculations suggest that R2cv is stable, with no sharp fall in R2cv values relative to the original values of R2. This assessment is shown in Table 2.

Fig. 2 Graphical Correlation of Observed (Log IC50) Vs. Predicted (Log IC50) for training set (36 molecules) using SVM aided non-linear MODEL-4 (a) and MODEL-5 (b). Axes: Observed LogIC50 (x), Predicted LogIC50 (y). Fitted lines shown in the panels: y = 0.767x + 0.498, R2 = 0.788; y = 0.872x + 0.246, R2 = 0.877.

Table 3 Correlation of observed (Log IC50) values and predicted (Log IC50) values using SVM aided non-linear QSAR MODEL-4 and MODEL-5. All values are Log IC50 (nM).

                        Model 4                  Model 5
Molecule   Observed   Predicted   Residual    Predicted   Residual
1          0.7960     0.7699      0.0261      0.7958      0.0002
(2*)       3.8390     3.1973      0.6417      3.8526      −0.0136
3          3.1520     3.1554      −0.0034     3.1557      −0.0037
4          3.0900     3.0868      0.0032      3.0867      0.0033
5          2.7340     2.3800      0.3540      2.4117      0.3223
6          2.0130     2.1447      −0.1317     1.8024      0.2106
7          1.3960     2.0586      −0.6626     1.5943      −0.1983
(8*)       1.6560     2.0554      −0.3994     1.9538      −0.2978
9          1.7240     2.0666      −0.3426     1.7239      0.0001
10         3.3780     3.3813      −0.0033     3.3814      −0.0034
11         2.3300     2.4277      −0.0977     2.4762      −0.1462
12         2.5790     2.4722      0.1068      2.5494      0.0296
13         2.5480     2.4045      0.1435      2.5514      −0.0034
(14*)      2.0860     2.8865      −0.8005     3.3005      −1.2145
15         2.1640     1.5840      0.5800      2.1642      −0.0002
16         1.0370     1.0336      0.0034      1.0360      0.0010
17         2.1900     2.1895      0.0005      2.1920      −0.0020
18         2.8800     2.6888      0.1912      2.8771      0.0029
19         2.6480     2.6466      0.0014      2.5931      0.0549
20         1.2200     1.7924      −0.5724     1.7012      −0.4812
21         2.1000     1.8622      0.2378      1.9843      0.1157
22         1.4810     1.7342      −0.2532     1.7393      −0.2583
(23*)      2.8050     2.0583      0.7467      1.8041      1.0009
24         1.9890     1.9752      0.0138      1.9863      0.0027
25         1.5610     2.2015      −0.6405     2.1337      −0.5727
26         1.9120     1.9152      −0.0032     1.7998      0.1122
27         2.7500     1.9264      0.8236      1.8709      0.8791
(28*)      0.6090     1.7052      −1.0962     1.7983      −1.1893
29         2.2790     1.7013      0.5777      1.6782      0.6008
30         0.8470     1.1050      −0.2580     1.1911      −0.3441
31         1.2480     1.2098      0.0382      1.1820      0.0660
32         2.0250     2.0279      −0.0029     2.0255      −0.0005
33         1.7660     1.3825      0.3835      1.3234      0.4426
34         1.0250     1.8643      −0.8393     1.4336      −0.4086
35         2.3960     2.3933      0.0027      2.3929      0.0031
(36*)      2.4100     1.0894      1.3206      1.1967      1.2133
(37*)      2.0760     2.1253      −0.0493     1.9741      0.1019
38         3.5160     3.5155      0.0005      3.5156      0.0004
(39*)      1.1580     0.5541      0.6039      −0.4462     1.6042
40         0.5450     1.1965      −0.6515     0.5477      −0.0027
(41*)      2.8820     1.6503      1.2317      2.0009      0.8811
42         2.8520     2.8548      −0.0028     2.8547      −0.0027
(43*)      1.2600     0.7254      0.5346      0.9510      0.3090
44         1.6480     1.6310      0.0170      1.6446      0.0034
45         1.3380     1.3342      0.0038      1.3347      0.0033
46         1.8610     1.8644      −0.0034     1.8576      0.0034

(*) Compounds are included in external test set.

Y-scrambling guards against the possibility of having learned chance models, in which descriptors happen to be correlated with the endpoint for the particular dataset by statistical chance. Models are fitted to randomly reordered activity values and compared with the model obtained for the actual activity. The absence of chance modeling was confirmed by 100 iterations and the correlation of R2 values, as given in Fig. 3.

Fig. 3 Graphical representation of Y-scrambling as internal validation parameter using SVM aided non-linear MODEL-4 (a) and MODEL-5 (b). Axes: R-Square (x), Correlation (y).

The set of 10 molecules kept aside for test set validation was then submitted to activity prediction using the SVM aided non-linear models (tetravariable and pentavariable). The results are given in Table 3 and the graphical correlation in Fig. 4. Overall, the SVM models prove to be efficient in predictive power.

Fig. 4 Graphical Correlation of Observed (Log IC50) Vs. Predicted (Log IC50) for test set (10 molecules) using SVM aided non-linear MODEL-4 (a) and MODEL-5 (b). Axes: Observed LogIC50 (x), Predicted LogIC50 (y).
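The internal validation statistics named in this study (leave-one-out cross-validated R2, PRESS, SDEP and Y-scrambling) can be sketched in a few lines. This is a minimal illustration under stated assumptions: a one-descriptor least-squares line stands in for the SVM models, the descriptor and activity values are hypothetical, and SDEP is taken as sqrt(PRESS / n), which is a common definition rather than one given in the text.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single descriptor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, preds):
    """Coefficient of determination of predictions against observations."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def loo_statistics(xs, ys):
    """Leave-one-out PRESS, SDEP = sqrt(PRESS / n) and cross-validated R2."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without sample i, then predict the held-out sample.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return press, (press / len(ys)) ** 0.5, 1.0 - press / ss_tot


def y_scrambling(xs, ys, iterations=100, seed=0):
    """R2 of models refitted to randomly reordered activity values."""
    rng = random.Random(seed)
    scores = []
    for _ in range(iterations):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        slope, intercept = fit_line(xs, shuffled)
        scores.append(r_squared(shuffled, [slope * x + intercept for x in xs]))
    return scores


# Hypothetical descriptor values and activities, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [0.9, 2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 8.1]

slope, intercept = fit_line(xs, ys)
r2_true = r_squared(ys, [slope * x + intercept for x in xs])
press, sdep, q2 = loo_statistics(xs, ys)
scrambled = y_scrambling(xs, ys)
print(r2_true, q2, press, sdep, max(scrambled))
```

A model that is not a chance correlation should show an R2 for the true activities that lies well above the bulk of the scrambled R2 values, and a cross-validated R2 that does not fall sharply below the fitted R2.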

4 Conclusion

The results and discussion show the utility of SVM in identifying non-linear relationships, a capability the QSAR community has long lacked. The superiority of SVM is readily apparent in the present investigation. SVM suggests that a non-linear structure-activity relationship may exist where linear approaches fail to identify it. The Gaussian Kernel Function was optimized through the parameters of the insensitive loss function. Autocorrelated atomic van der Waals volumes, autocorrelated atomic polarizability, aromatic index and rotatable bond fractions were identified as structural properties that are linearly (MLR) related to the IC50 values of SGLT2 inhibitors. The non-linear QSAR studies identified R maximal autocorrelation of lag 4 / weighted by atomic van der Waals volumes, d COMMA2 value / weighted by atomic masses, number of fragments Cyc5[AB] with label C on atom 1, sum of geometrical distances between O..S and degree of unsaturation as molecular descriptors.

References

[1] Aksyonova, T.I., Volkovich, V.V., Tetko, I.V. 2003. Robust polynomial neural networks in quantitative structure-activity relationship studies. Syst Anal Model Simul 43, 1331-1339.
[2] Bakris, G.L., Fonseca, V., Sharma, K., Wright, E. 2009. Renal sodium-glucose transport: Role in diabetes mellitus and potential clinical implications. Kidney Int 75, 1272-1277.
[3] Berry, C.A., Rector, F.C. Jr. 1991. Renal transport of glucose, amino acids, sodium, chloride, and water. In: Brenner, B.M., Rector, F.C. Jr. (Eds.) The Kidney, 4th Edition, W.B. Saunders, Philadelphia, 245-282.
[4] Brown, G.K. 2000. Glucose transporters: Structure, function and consequences of glucose deficiency. J Inherit Metab Dis 23, 237-246.
[5] Cortes, C., Vapnik, V. 1995. Support-vector networks. Mach Learn 20, 273-297.
[6] Dwarakanathan, A. 2006. Diabetes update. J Insur Med 38, 20-30.
[7] Ehrenkranz, J.R., Lewis, N.G., Kahn, C.R., Roth, J. 2005. Phlorizin: A review. Diabetes Metab Res Rev 21, 31-38.
[8] Furey, T., Cristianini, N., Duffy, N., Bednarski, D., Schummer, M., Haussler, D. 2000. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16, 906-914.
[9] Gerich, J.E., Woerle, H.J., Meyer, C., Stumvoll, M. 2001. Renal gluconeogenesis. Diabetes Care 24, 382-391.
[10] International Diabetes Federation. 2009. Diabetes Atlas, 4th Edition, Montreal, Canada.
[11] Kloeckener-Gruissem, B., Vandekerckhove, K., Nurnberg, G., Neidhardt, J., Zeitz, C., Nurnberg, P., Schipper, I., Berger, W. 2008. Mutation of solute carrier SLC16A12 associates with a syndrome combining juvenile cataract with microcornea and renal glucosuria. Am J Hum Genet 82, 772-779.
[12] Kuhn, H.W., Tucker, A.W. 1951. Nonlinear programming. In: Neyman, J. (Ed.) Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, University of California Press, Los Angeles, 481-492.
[13] Lee, J., Lee, S.H., Seo, H.J., Son, E.J., Lee, S.H., Jung, M.E., Lee, M., Han, H.K., Kim, J., Kang, J., Lee, J. 2010. Novel C-aryl glucoside SGLT2 inhibitors as potential antidiabetic agents. 1,3,4-Thiadiazolylmethylphenyl glucoside congeners. Bioorg Med Chem 18, 2178-2194.
[14] Norinder, U. 2003. Support vector machine models in drug design: Application to drug transport processes and QSAR using simplex optimizations and variable selection. Neurocomputing 55, 337-346.
[15] Pavlidis, P., Wapinski, I., Noble, W.S. 2004. Support vector machine classification on the web. Bioinformatics 20, 586-587.
[16] Rector, F.C. Jr. 1983. Sodium, bicarbonate, and chloride absorption by the proximal tubule. Am J Physiol 244, F461-F471.
[17] Rossetti, L., Shulman, G.I., Zawalich, W., DeFronzo, R.A. 1987. Effect of chronic hyperglycemia on in vivo insulin secretion in partially pancreatectomized rats. J Clin Invest 80, 1037-1044.
[18] Schölkopf, B., Smola, A.J. 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond, MIT Press, Cambridge, MA, 185-208.
[19] Smola, A.J., Schölkopf, B. 2004. A tutorial on support vector regression. Stat Comput 14, 199-222.
[20] Soto, A.J., Cecchini, R.L., Vazquez, G.E., Ponzoni, I. 2008. An evolutionary approach for feature selection applied to ADMET prediction. Amer J Artif Intell 37, 55-63.
[21] Vapnik, V. 1999. The Nature of Statistical Learning Theory, Springer-Verlag, New York.
[22] Wright, E.M., Hirayama, B.A., Loo, D.F. 2007. Active sugar transport in health and disease. J Intern Med 261, 32-43.
