Evolutionary Neural Network for the Classification

European Journal of Scientific Research ISSN 1450-216X Vol. 86 No 3 September, 2012, pp.379-388 © EuroJournals Publishing, Inc. 2012 http://www.europeanjournalofscientificresearch.com

Evolutionary Neural Network for the Classification of Chemoinformatics Data Sets

V. Arulmozhi
Department of Computer Applications, Tirupur Kumaran College for Women, India
E-mail: [email protected]

Reghunadhan Rajesh
Department of Computer Applications, Bharathiar University, Coimbatore - 641046, India
E-mail: [email protected]

Abstract
Chemoinformatics has evolved from the marriage of two branches of science, namely chemistry and information technology. In this paper, a neural network trained by an evolutionary algorithm is used as a classifier for Chemoinformatics data sets. The results of the evolutionary neural network classifier are promising.

Keywords: Neural Networks, Classifier, Chemoinformatics, Evolutionary Algorithms

1. Introduction
Chemoinformatics has evolved into a new branch of science from two parent branches, namely Chemistry and Information Technology (Arulmozhi et al., 2011; Brown, 1998; Brown, 2005). Brown (1998) coined the word Chemoinformatics and is hence known as the father of Chemoinformatics. The major functionalities of Chemoinformatics include, but are not limited to, chemical structure/property prediction, molecular similarity/diversity analysis, virtual screening, qualitative/quantitative structure-activity/property relationships, design of combinatorial libraries, statistical models, descriptors, drug discovery, representation of chemical compounds/reactions, classification/search/storage methods, management of compound databases, high-throughput docking, and data analysis methods (Johann, 2006).
This paper presents an evolutionary neural network for the classification of Chemoinformatics data sets. It is organized as follows. Section 2 gives a quick review of Chemoinformatics toolboxes and software and of applications of machine learning techniques in Chemoinformatics. Section 3 presents the design and results of the evolutionary neural network. Section 4 concludes the paper.

2. Chemoinformatics – A Quick Review
There are a number of scientific software packages and toolboxes for Chemoinformatics, and several machine learning applications in the field. This section gives a quick review of some of the scientific toolboxes, software and databases used for Chemoinformatics and of applications of machine learning techniques in Chemoinformatics. Table 1 shows some of the scientific software and toolboxes used for Chemoinformatics. Table 2 shows some of the Chemoinformatics databases.

Table 1: Some of the scientific software and toolboxes for Chemoinformatics

Software/Toolbox | Description
Tomocomd (M.-Ponce, 2002) | Docking and 3D-chiral indices
Statistica [80] | For statistical analysis
PubChemBrowse (Choi, 2010) | 3D data point visualization tool
GTM (Bishop, 1997) | An alternative to SOM
MDS (Kruskal, 1978) | Multidimensional scaling
ORAC (Johnson, 1986) | Keyword generation for reaction searching
ADRIANA [77] | Calculation of molecular descriptors
Cerius [78] | Calculation of molecular descriptors
MassLib (Henneberg, 1992) | Mass spectra evaluation systems
ToSIM (Scsibrany, 1994) | Mass spectra evaluation systems
TRISTAN (Gelernter, 1990) | Modeling reaction mechanism
Horace (Rose, 1994) | Classification of chemical reactions
Synchem2 (Gelernter, 1984) | Synthesis planning system
Isolde (Rose, 1989) | Generalization of reactions
Constrast (Bawden, 1986) | Reaction retrieval system
SMILES (Weininger, 1989) | Generation of unique SMILES notation
CORINA [81] | 3D structure generator
PETRA [83] | Methods for physicochemical effects
WODCA [84] | Workbench for the Organization of Data
Plants (Korb, 2007) | Protein-Ligand ANT system
WEBLAB [86] | Molecular Simulation
Spotfire [87] | Visualization and analysis
PARTEK [88] | Visualization and analysis
DIVA [89] | Visualization and analysis
edragon [90] | Descriptor generation
VCCLAB [91] | Virtual Computational Chemistry Laboratory
Osiris [94] | Descriptor generator
Pre-ADME [95] | Molecular descriptor
ADAPT (Stuper, 1979) | Data analysis and pattern recognition

Table 2: Some of the Chemoinformatics databases other than DrugBank, PubChem, NCI and CTD

Database | Description
Chem2bio2rdf (Chen, 2010) | Data portal for chemical biology
UCSC (Kuhn, 2009) | Genome browser database
MDL [79] | Chemical Directory
Chemical Registry (Fisanick, 1998) | Chemical information retrieval systems
Crossfire (Meehan, 2001) | Chemical searching
Scifinder (Ridley, 2000) | Chemical reaction searching
MedMiner [85] | Searching and organizing literature
iResearch [92] | Database of chemical compounds
ZINC (Irwin, 2005) | Compounds for virtual screening
Ensembl [93] | Human genome data
Physprop [96] | Physical properties database

2.1. Application of Support Vector Machine in Chemoinformatics
Support vector machines (SVM), which are supervised learning models, are widely used as classifiers and regressors in Chemoinformatics (Cortes, 1995). Various toolboxes are available, such as LIBSVM and Kernal-SVM, each with its own advantages and disadvantages. Some applications of SVM in Chemoinformatics are given in Table 3.

Table 3: Some of the applications of SVM in Chemoinformatics

No | Application
1 | Prediction of gas to water solvation enthalpy of organic compounds (Dashtbozorgi, 2012)
2 | Active learning applied to gene expression data for cancer classification (Liu, 2004)
3 | Effect of molecular descriptor feature selection on the classification of pharmacokinetic and toxicological properties of chemical agents (Xue, 2004)
4 | The Pharmacophore kernel for virtual screening (Mahe, 2006)

2.2. Application of Neural Networks in Chemoinformatics
Neural networks (Novic, 2003) provide learning capability and are one of the important components of soft computing. A neural network consists of one input layer, one or more hidden layers and an output layer. The number of neurons in the input layer equals the number of features passed to the network. For classification, the number of neurons in the output layer equals the number of classes. The number of hidden neurons is usually fixed by experts depending on the problem. Various types of neural network are available, such as feedforward networks, feedback networks, recurrent networks, self-organizing maps and ANFIS. One important application of neural networks in Chemoinformatics is the prediction of air-to-blood partition coefficients of volatile organic compounds (Konoz, 2008).
2.3. Application of PCA in Chemoinformatics
Principal component analysis (PCA), invented in 1901 by Karl Pearson (Pearson, 1901), uses an orthogonal transformation to convert the values of correlated variables into values of linearly uncorrelated variables called principal components. The applications of PCA in Chemoinformatics include, but are not limited to, (a) modeling of DNA, peptide sequences and chemical processes (Wold, 1993) and (b) identification of a preferred set of molecular descriptors for compound classification (Xue, 1999).
2.4. Application of SVD in Chemoinformatics
Singular value decomposition (SVD) factors an m × n matrix into three component matrices, namely a real or complex unitary matrix of size m × m, a rectangular diagonal matrix of size m × n whose nonnegative real diagonal entries are known as the singular values, and a real or complex unitary matrix of size n × n. Simultaneous spectrophotometric determination of cobalt, nickel and copper is one application of SVD (Khayamian, 1999).
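The decomposition described in Section 2.4 can be written compactly as follows. The symbols A, U, Σ, V and σ_i are notation introduced here for illustration and do not appear in the original text.

```latex
A = U \Sigma V^{*}, \qquad
A \in \mathbb{C}^{m \times n},\;
U \in \mathbb{C}^{m \times m},\;
\Sigma \in \mathbb{R}^{m \times n},\;
V \in \mathbb{C}^{n \times n},
\qquad
U^{*}U = I_m,\quad V^{*}V = I_n,\quad
\Sigma_{ii} = \sigma_i \ge 0,\quad \Sigma_{ij} = 0 \ \text{for } i \ne j .
```

For a real matrix A, the factors U and V are real orthogonal matrices and V^{*} reduces to the transpose of V.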
2.5. Other Machine Learning Applications in Chemoinformatics
There are plenty of machine learning algorithms. Some applications of machine learning techniques are shown in Table 4.

Table 4: Some of the applications of machine learning in Chemoinformatics

No | Application | Machine learning technique
1 | Classification of multidrug resistance reversal agents (Bakken, 2000) | LDA
2 | Quantitative structure-property relationship study of n-octanol-water partition coefficients of some diverse drugs (Ghasemi, 2007) | Multiple linear regressions
3 | Flexible protein-ligand docking (Korb, 2007) | Ant colony optimization
4 | Characterization of plastics (Lloyd, 2007) | Vector quantization
5 | Molecular structure activity relationship (Mahe, 2005) | Graph kernels
6 | Classification (Merkwirth, 2004) | Ensemble method
7 | QSPR model of Henry's law constant (Modarresi, 2007) | Radial basis function network
8 | Alignment free classification of G-Protein-Coupled Receptors (Otaki, 2006) | Self-organizing maps
9 | Variable selection (Roy, 2008) | Partial least squares regression
10 | Predicting protein subnuclear location (Shen, 2005) | K-nearest classifier
11 | Automated descriptor selection (Sutter, 1995) | Simulated annealing
12 | Discriminating acidic and alkaline enzymes (Zhang, 2009) | Random forest model

3. Evolutionary Neural Network for Classification
3.1. Evolutionary Algorithm – A Review
Evolutionary algorithms (EAs) are simple, powerful, general-purpose, derivative-free optimization algorithms inspired by the Darwinian concept of evolution, which is subject to reproduction, crossover and mutation in a selective environment where only the fittest survive (Rajesh, 2009; Rajesh, 2008; Rajesh, 2007). Biological evolution is the inspiration for genetic algorithms, and hence most of the principles associated with biological evolution also apply to genetic algorithms. EAs consider many points in parallel and hence are less likely to get trapped in local minima.
The first step is to initialize the population. A population is a set of chromosomes. Each chromosome contains a set of genes, and each gene corresponds to a particular parameter to be optimized. The values of genes can be real or binary, and they can be initialized with random values within the domain of each parameter/gene. The second step is to design a fitness function for the optimization problem. The fitness (objective) function differs from problem to problem. For example, the objective function of a DNA design depends on the count of to-be-avoided sub-sequences, e.g. long AT runs. The objective function for protein structure prediction is directly proportional to the force field of two units. The third step is to evaluate every chromosome in the population using the fitness function to get a fitness value. The fourth step is to verify whether the objective criteria have been met. If the objective is met, the algorithm stops and the current best chromosome is taken as the solution. The fifth step is to select a set of good chromosomes for the next generation. There are various types of selection functions, namely roulette wheel selection, stochastic universal sampling, rank based selection, etc. Any one of these selection mechanisms can be used.
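Roulette wheel selection, the first mechanism named above, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `roulette_wheel_select`, its parameters, and the uniform fallback for all-zero fitness are assumptions made here, and fitness values are assumed to be non-negative.

```python
import random


def roulette_wheel_select(fitnesses, n_select, rng=random):
    """Return n_select indices, each drawn with probability proportional to fitness.

    Hypothetical helper for illustration; assumes all fitness values are >= 0.
    """
    total = sum(fitnesses)
    if total <= 0:
        # Assumed fallback: with no fitness information, choose uniformly.
        return [rng.randrange(len(fitnesses)) for _ in range(n_select)]
    chosen = []
    for _ in range(n_select):
        spin = rng.random() * total      # where the wheel stops, in [0, total)
        running = 0.0
        for index, fit in enumerate(fitnesses):
            running += fit               # each chromosome owns a slice of size `fit`
            if spin < running:
                chosen.append(index)
                break
        else:
            # Guard against floating-point round-off at the end of the wheel.
            chosen.append(len(fitnesses) - 1)
    return chosen
```

A chromosome with twice the fitness of another occupies twice as much of the wheel, so it is picked about twice as often, while a chromosome with zero fitness is not picked by the main loop.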
The sixth step is crossover, which exchanges genes between chromosomes. Different types of crossover operator are available, namely single point crossover, double point crossover, shuffle crossover, arithmetic crossover, etc. The seventh step is mutation, which randomly alters individual genes. The eighth step is reinsertion (elitism), which retains good chromosomes from the previous generation. The algorithm then returns to the third step and repeats until the desired result is obtained.
3.2. Applications of Evolutionary Algorithm in Chemoinformatics
Evolutionary algorithms were used early on for feature selection (Leardi, 1992). Luke (1994) used evolutionary programming for the development of quantitative structure-activity relationships and quantitative structure-property relationships. Similar work was carried out by Rogers et al. (1994), and for organic compounds by Golmohammadi et al. (2010). EAs for the automated generation of molecules have also been reported (Brown, 2004; Glen, 1995). They have also been used to suggest/screen combinatorial libraries (Jones, 1999; Sheridan, 1995) and their mixtures (Brown, 1997). The most interesting applications of EAs are their use in the discovery of anticancer drugs (Hartwell, 1997), flexible docking (Jones, 1997; Morris, 1998), identification of biological activity profiles (Gillet, 1998), feature selection in PLS/QSAR (Leardi, 1998; Venkatraman, 2004), protein structure prediction (Krasnogor, 1999; Tantar, 2007), and molecular diversity design (Weber, 2000). EAs have also been used for effective classification of biologically active compounds (Xue, 2000), a drug discovery game (Almstetter, 2001), finding relationships among G protein-coupled receptors (Graul, 2001), multi-component reactions (Weber, 2002), conformational sampling (Parent, 2006), QSPR modeling (Modarresi, 2007), prediction of air-to-blood partition coefficients of volatile organic compounds (Konoz, 2008), and prediction of human intestinal absorption (Yan, 2008).
3.3. Design, Simulation and Results of Evolutionary Neural Network for Classification
A neural network with 100 hidden neurons is used for training. Simulation results for two data sets, namely E.coli and the QSAR features of acute toxicity of phthalate esters to fish, are shown here.
3.3.1. Classification of E.coli Protein Dataset
The E.coli data set consists of 336 instances with 7 attributes. The number of localization sites (classes) is 8. Since the number of attributes is seven, the number of input neurons is seven. The number of output neurons is equal to the number of classes. To optimize this neural network, a total of 1608 values (7 attributes × 100 hidden neurons + 100 bias values + 100 hidden neurons × 8 classes + 8 bias values) have to be tuned using the evolutionary algorithm. We used an evolutionary algorithm with a population of 100 chromosomes, mutation rate = 0.1 and crossover rate = 0.8. Each chromosome consists of 1608 real values in the domain [-5, 5]. The selection mechanism used is roulette wheel. A 50% reinsertion (elitism) is also applied to retain the best chromosomes from the previous population. The fitness function is the classification rate, on the training data, of the neural network whose weights are given by the chromosome.
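The mapping from a chromosome to a network and then to a fitness value can be sketched as below. The paper does not state the gene ordering or the activation function, so the layout (hidden weights, hidden biases, output weights, output biases) and the tanh hidden units are assumptions made here for illustration, as are all function names.

```python
import math


def count_parameters(n_inputs, n_hidden, n_classes):
    """Weights and biases of a network with a single hidden layer."""
    return n_inputs * n_hidden + n_hidden + n_hidden * n_classes + n_classes


def decode(chromosome, n_inputs, n_hidden, n_classes):
    """Split a flat gene list into (w1, b1, w2, b2); the ordering is an assumption."""
    pos = 0
    w1 = [chromosome[pos + h * n_inputs: pos + (h + 1) * n_inputs]
          for h in range(n_hidden)]
    pos += n_inputs * n_hidden
    b1 = chromosome[pos: pos + n_hidden]
    pos += n_hidden
    w2 = [chromosome[pos + c * n_hidden: pos + (c + 1) * n_hidden]
          for c in range(n_classes)]
    pos += n_hidden * n_classes
    b2 = chromosome[pos: pos + n_classes]
    return w1, b1, w2, b2


def predict(chromosome, x, n_inputs, n_hidden, n_classes):
    """Forward pass; returns the index of the output neuron with the largest score."""
    w1, b1, w2, b2 = decode(chromosome, n_inputs, n_hidden, n_classes)
    # tanh hidden units are an assumed choice, not taken from the paper.
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    return max(range(n_classes), key=scores.__getitem__)


def classification_rate(chromosome, samples, labels, n_inputs, n_hidden, n_classes):
    """Fitness: fraction of training samples that the decoded network classifies correctly."""
    correct = sum(predict(chromosome, x, n_inputs, n_hidden, n_classes) == y
                  for x, y in zip(samples, labels))
    return correct / len(samples)
```

With these definitions, `count_parameters(7, 100, 8)` reproduces the 1608 values stated for the E.coli network, and `classification_rate` plays the role of the fitness function described above.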
The algorithm was run for 500 generations. The best chromosome in the 500th generation is used for evaluating the classification performance.

Table 5: Classification rate of E.coli data set

Method | Classification rate
Probabilistic classification (Horton, 1996) | 81.1%
Decision Tree (Horton, 1997) | 80.36%
Naïve Bayes (Horton, 1997) | 80.95%
Evolutionary neural network | 83.92%

3.3.2. Prediction of Acute Toxicity of Phthalate Esters to Fish
The QSAR features of acute toxicity of phthalate esters to fish (Fathead Minnow) consist of 324 chemicals (Netzeva, 2007) with four features, namely MW (molecular weight), WSol (water solubility), Kow (octanol-water partition coefficient) and LC (lethal concentration). There are four categories of toxicity values, namely no concern, harmful, toxic, and very toxic. Since the number of attributes is four, the number of input neurons is four. The number of output neurons is equal to the number of classes. To optimize this neural network, a total of 804 values (4 attributes × 100 hidden neurons + 100 bias values + 100 hidden neurons × 4 classes + 4 bias values) have to be tuned using the evolutionary algorithm. We used an evolutionary algorithm with a population of 100 chromosomes, mutation rate = 0.1 and crossover rate = 0.8. Each chromosome consists of 804 real values in the domain [-25, 25]. The selection mechanism used is roulette wheel. A 50% reinsertion (elitism) is also applied to retain the best chromosomes from the previous population. The fitness function is the classification rate, on the training data, of the neural network whose weights are given by the chromosome. The algorithm was run for 500 generations. The best chromosome in the 500th generation is used for evaluating the classification performance.
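The evolutionary loop with the settings reported for both data sets (population of 100, roulette wheel selection, crossover rate 0.8, mutation rate 0.1, 50% reinsertion, 500 generations) can be sketched as follows. This is only a sketch under stated assumptions, not the authors' code: single point crossover, per-gene mutation by uniform resampling within the domain, and the function name `evolve` are all choices made here, and the fitness function is assumed to return non-negative values such as a classification rate.

```python
import random


def evolve(fitness, n_genes, low, high, pop_size=100, generations=500,
           crossover_rate=0.8, mutation_rate=0.1, elite_fraction=0.5, seed=0):
    """Maximize `fitness` over real-valued chromosomes with genes in [low, high]."""
    rng = random.Random(seed)
    population = [[rng.uniform(low, high) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    n_elite = int(elite_fraction * pop_size)
    n_children = pop_size - n_elite
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        ranked = [c for _, c in sorted(zip(scores, population),
                                       key=lambda pair: pair[0], reverse=True)]
        # Roulette wheel selection: draw parents in proportion to fitness.
        # The small constant keeps the weights valid if every score is zero.
        parents = rng.choices(population,
                              weights=[s + 1e-9 for s in scores], k=n_children)
        children = []
        for i in range(0, len(parents), 2):
            a = parents[i][:]
            b = parents[(i + 1) % len(parents)][:]
            # Single point crossover (assumed operator).
            if n_genes > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_genes)
                a[cut:], b[cut:] = b[cut:], a[cut:]
            children.extend([a, b])
        children = children[:n_children]
        # Mutation: resample a gene within its domain (assumed per-gene rate).
        for child in children:
            for g in range(n_genes):
                if rng.random() < mutation_rate:
                    child[g] = rng.uniform(low, high)
        # Reinsertion (elitism): keep the best chromosomes of the old population.
        population = ranked[:n_elite] + children
    return max(population, key=fitness)
```

Because the elite half of each population is copied unchanged into the next one, the best fitness in the population never decreases from one generation to the next.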

Table 6: Prediction of Acute Toxicity of Phthalate Esters to Fish

Method | Classification rate
Neural network with scaled conjugate gradient algorithm trained for 2000 epochs | 91.36%
Evolutionary neural network | 92.59%

4. Conclusion
This paper presents a quick survey of machine learning techniques for Chemoinformatics. Moreover, an evolutionary neural network is applied to two data sets, namely the E.coli data set and the QSAR features of acute toxicity of phthalate esters to fish. The evolutionary neural network classifier achieved classification rates of 83.92% and 92.59% on these data sets respectively, higher than those of the methods it was compared with, so the results are promising.

Acknowledgement
The first author would like to thank Tirupur Kumaran College and Mother Teresa University for their valuable support. Dr. Rajesh Reghunadhan would like to thank Bharathiar University for its valuable support.

References
[1] Almstetter, M., et al., 2001. “Molmind - An evolutionary drug discovery game”, Daylight user meeting, MUG 2001, Santa Fe, NM, USA
[2] Arulmozhi, V., Rajesh, R., 2011. “Chemoinformatics – A Quick Review”, International Conference on Network and Computer Science (ICNCS), Vol. 6, pp. 416-419
[3] Bakken, G.A., Jurs, P.C., 2000. “Classification of Multidrug resistance reversal agents using structure based descriptors and linear discriminant analysis”, J. Med. Chem., 43, 4534-4541
[4] Bawden, D., Wood, S.I., 1986. “Design, implementation and evaluation of the CONSTRAST reaction retrieval system”, in Modern Approaches to Chemical Reaction Searching, Willett, P., Ed., Gower, Aldershot
[5] Bishop, C., Svensen, M., Williams, C., 1997. “GTM: A principled alternative to the self-organizing map”, Advances in neural information processing systems, pp. 354-360
[6] Brown, F.K., 1998. “Chemoinformatics – What is it and How does it Impact Drug Discovery”, Ann. Rep. Med. Chem., 33, pp. 375-384
[7] Brown, F.K., 2005. “Editorial opinion: Chemoinformatics - a ten year update”, Current Opinion in Drug Discovery and Development, 8(3), pp. 296-302
[8] Brown, R.D., Martin, Y.C., 1997. “Designing combinatorial library mixtures using genetic algorithm”, J. Med. Chem., 40, 2304-2313
[9] Brown, N., Mckay, B., Gilardoni, F., Gasteiger, J., 2004. “A graph based genetic algorithm and its application to the multiobjective evolution of median molecules”, J. Chem. Inf. Comput. Sci., 44, 1079-1087
[10] Chen, B., Wild, D., Zhu, Q., Ding, Y., Dong, X., Sanakaranarayanan, M., Wang, H., Sun, Y., 2012. “Chem2bio2rdf: A linked open data portal for chemical biology”, in Future of the Web in Collaborative Science (FWCS)
[11] Choi, J.Y., Bae, S.-H., Qiu, J., Fox, G., Chen, B., Wild, D., 2010. “Browsing large scale cheminformatics data with dimension reduction”, HPDC 2010, pp. 503-506
[12] Cortes, C., Vapnik, V.N., 1995. “Support-Vector Networks”, Machine Learning, 20
[13] Dashtbozorgi, Z., Golmohammadi, H., Acree, W.E., 2012. “Prediction of gas to water solvation enthalpy of organic compounds using SVM”, Thermochimica
[14] Fisanick, W., et al., 1998. “Chemical Abstracts Service Information System”, Encycloped. Comput. Chem., 1, 277-315
[15] Gelernter, H., Miller, G.A., Larsen, D.L., Berndt, D.J., 1984. “Realization of a large expert problem-solving system: SYNCHEM2, a case study”, IEEE 1984 Proc. of the first conference AI applications
[16] Gelernter, H., Rose, J.R., Chen, C., 1990. “Building and refining a knowledge base for synthetic organic chemistry via the methodology of inductive and deductive machine learning”, J. Chem. Inf. Comput. Sci., 30, 492-504
[17] Ghasemi, G., Saaidpour, S., 2007. “Quantitative structure-property relationship study of n-octanol-water partition coefficients of some diverse drugs using multiple linear regressions”, Anal. Chim. Acta, 604, 99
[18] Gillet, V.J., Willett, P., Bradshaw, J., 1998. “Identification of biological activity profiles using substructural analysis and genetic algorithms”, J. Chem. Inf. Comput. Sci., 38, 165-179
[19] Glen, R.C., Payne, A.W.R., 1995. “A genetic algorithm for the automated generation of molecules within constraints”, J. Comput. Aided Mol. Design, 9, 181-202
[20] Golmohammadi, Z.D., 2010. “Quantitative structure-property relationship studies of gas-to-wet butyl acetate partition coefficient of some organic compounds using genetic algorithm and ANN”, Struct. Chem., 21, 1241-1252
[21] Graul, R.C., Sadee, W., 2001. “Evolutionary relationships among G protein-coupled receptors using a clustered database approach”, AAPS Pharm. Sci., 3, E12
[22] Hartwell, L.H., et al., 1997. “Integrating genetic approaches into the discovery of anticancer drugs”, Science, 278, 1064-1068
[23] Henneberg, D., Weimann, B., 1992. “MassLib, Evaluation of low resolution mass spectra series”, Max-Planck-Institut for Kohlenforschung, Mulheim/Ruhr, Version 7.2
[24] Horton, P., Nakai, K., 1996. “A Probablistic Classification System for Predicting the Cellular Localization Sites of Proteins”, Intelligent Systems in Molecular Biology, 109-115
[25] Horton, P., Nakai, K., 1997. “Better Prediction of protein cellular localization sites with the k nearest neighbours classifier”, Proceedings of ISMB, pp. 147-152
[26] Irwin, J.J., Shoichet, B.K., 2005. “ZINC-a free database of commercially available compounds for virtual screening”, J. Chem. Inf. Model., 45, 177-182
[27] Johann, G., 2006. “The Central Role of Chemoinformatics”, Chemometrics and Intelligent Laboratory Systems, 82, pp. 200-209
[28] Johnson, A.P., Cook, A.P., 1986. “Automated keyword generation for reaction searching”, in Modern Approaches to Chemical Reaction Searching, Willett, P., Ed., Gower, Aldershot
[29] Jones, G., Willett, P., Glen, R.C., Leach, A.R., Taylor, R., 1997. “Development and validation of a genetic algorithm for flexible docking”, Journal of Molecular Biology, 267, 727-748
[30] Jones, G., Willett, P., Glen, R.C., Leach, A.R., Taylor, R., 1999. “Further development of a genetic algorithm for ligand docking and its application to screening combinatorial libraries”, Am. Chem. Soc. Sympos. Ser., 719, 271-291
[31] Korb, O., Stuzle, T., Exner, T.E., 2007. “An ant colony optimization approach to flexible protein-ligand docking”, Swarm Intell., 1, 115-134
[32] Kruskal, J.B., Wish, M., 1978. Multidimensional scaling, Beverly Hills, CA, USA: Sage
[33] Kuhn, R., Karolchik, D., Zweig, A., Wang, T., Pohl, K., Pheasant, M., 2009. “The UCSC genome browser database: update 2009”, Nucleic Acids Research, vol. 37, pp. D755
[34] Khayamian, T., Ensafi, A.A., Hemmateenejad, B., 1999. “Simultaneous spectrophotometric determinations of cobalt, nickel and copper using partial least squares based on SVD”, Talanta, 49, 587
[35] Konoz, E., Golmohammadi, H., 2008. “Prediction of air-to-blood partition coefficients of volatile organic compounds using genetic algorithm and artificial neural network”, Anal. Chim. Acta, Vol. 619, pp. 157
[36] Krasnogor, N., Hart, W., Smith, J., Pelta, D., 1999. “Protein structure prediction problem with evolutionary algorithms”, Proc. of the Genetic and Evolutionary Computation Conference
[37] Leardi, R., Boggia, R., Terrile, M., 1992. “Genetic algorithm as a strategy for feature selection”, J. Chemom., Vol. 6, pp. 267-281
[38] Leardi, R., Gonzalez, A.L., 1998. “Genetic algorithm applied to feature selection in PLS”, Chemometr. Intell. Lab. Syst., Vol. 41, pp. 195-207
[39] Liu, Y., 2004. “Active learning with SVM applied to gene expression data for cancer classification”, J. Chem. Inf. Comput. Sci., 44, 1936-1941
[40] Lloyd, G.R., Brereton, R.G., Faria, R., Duncan, J.C., 2007. “Learning vector quantization for multiclass classification: Application to characterization of plastics”, J. Chem. Inf. Model., 47, 1553-1563
[41] Luke, B.T., 1994. “Evolutionary Programming Applied to the Development of Quantitative Structure - Activity relationships and Quantitative structure - property relationships”, J. Chem. Inf. Comput. Sci., 34, 1279-1287
[42] Mahé, P., Ueda, N., Akutsu, T., Perret, J.-L., Vert, J.-P., 2005. “Graph kernels for molecular structure-activity relationship analysis with support vector machines”, Journal of Chemical Information and Modeling, vol. 45(4), pp. 939-951
[43] Mahé, P., Ralaivola, L., Stoven, V., Vert, J.-P., 2006. “The Pharmacophore Kernel for Virtual Screening with Support Vector Machines”, Journal of Chemical Information and Modeling, vol. 46(5), pp. 2003-2014
[44] Marrero-Ponce, Y., Romero, V., 2002. TOMOCOMD software, Central University of Las Villas
[45] Meehan, P., Schofield, H., 2001. “CrossFire: a structural revolution for chemists”, Online Inf. Rev., vol. 25, pp. 241-249
[46] Merkwirth, C., Mauser, H.A., Schulz-Gasch, T., Roche, O., Stahl, M., Lengauer, T., 2004. “Ensemble methods for classification in chemoinformatics”, J. Chemical Information and Computer Sciences, vol. 44, pp. 1971-1978
[47] Modarresi, H., Modarress, H., Dearden, J.C., 2007. “QSPR model of Henry's law constant for a diverse set of organic chemicals based on genetic algorithm-radial basis function network approach”, Chemosphere, 66, 2067-2076
[48] Morris, G.M., Goodsell, D.S., Halliday, R.S., Huey, R., Hart, W.E., Belew, R.K., 1998. “Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function”, Journal of Computational Chemistry, 19, 1639-1662
[49] Netzeva, T., Worth, A., 2007. “Classification of Phthalates According to Their (Q)SAR Predicted Acute Toxicity to Fish: A Case Study”, Technical report, EUR 22623 EN
[50] Novic, M., Vracko, M., 2003. “Nature-inspired methods in chemometrics: genetic algorithms and ANN”, in R. Leardi (Ed.), Data Handling in Science and Technology, Vol. 23, Elsevier
[51] Otaki, J.M., Mori, A., Itoh, Y., Nakayama, T., Yamamoto, H., 2006. “Alignment Free Classification of G-Protein-Coupled Receptors using self-organizing maps”, J. Chem. Inf. Model., 46, 1479-1490
[52] Parent, B., Kokosy, A., Horvath, D., 2006. “Optimized evolutionary strategies in conformational sampling”, Journal of Softcomputing
[53] Pearson, K., 1901. “On Lines and Planes of Closest Fit to Systems of Points in Space”, Philosophical Magazine, 2 (6), 559–572
[54] Rajesh, R., Kaimal, M.R., 2009. “GAVLCRG: Genetic algorithm with variable length chromosome-based rule generation scheme for fuzzy controllers”, Advances in Fuzzy Sets and Systems, vol. 4(1), pp. 33-66
[55] Rajesh, R., Sureshkumar, T., 2008. “Simultaneous design of T-S fuzzy controllers and observers using GA”, International Journal of Soft Computing, vol. 3(4), pp. 315-320
[56] Rajesh, R., Kaimal, M.R., 2007. “A Novel Method for the Design of Takagi-Sugeno Fuzzy Controllers with Stability Analysis using Genetic Algorithm”, Engineering Letters, 14 (2)
[57] Ridley, D.D., 2000. “Strategies for chemical reaction searching in SciFinder”, J. Chem. Inf. Comput. Sci., 40, 1077-1084
[58] Rogers, D., Hopfinger, A.J., 1994. “Application of genetic function approximation of quantitative structure-activity relationships and quantitative structure - property relationships”, J. Chem. Inf. Comput. Sci., 34, 854-866
[59] Roy, P.P., Roy, K., 2008. “On some aspects of variable selection for partial least squares regression models”, QSAR Comb. Sci., 27, 302-313
[60] Rose, J.R., Gasteiger, J., 1989. “ISOLDE: A system for learning organic chemistry through Induction”, EKAW 1989
[61] Rose, J.R., Gasteiger, J., 1994. “HORACE: An automatic system for the Hierarchical classification of chemical reactions”, J. Chem. Inf. Comput. Sci., 34, 74-90
[62] Scsibrany, H., 1994. Handbook of TOSIM, Technische University, Wien
[63] Shen, H.B., Chou, K.C., 2005. “Predicting protein subnuclear location with optimized evidence-theoretic K-nearest classifier and pseudo amino acid composition”, Biochem. Biophys. Res. Commun., 337, 752-756
[64] Sheridan, R.P., Kearsley, S.K., 1995. “Using a genetic algorithm to suggest combinatorial libraries”, J. Chem. Inf. Comput. Sci., 35, 310-320
[65] Sutter, J.M., Dixon, S.L., Jurs, P.C., 1995. “Automated descriptor selection for quantitative structure - activity relationship using generalized simulated annealing”, J. Chem. Inf. Comput. Sci., 35, 77-84
[66] Tantar, A.-A., Melab, N., Talbi, E.-G., Parent, B., Horvath, D., 2007. “A parallel hybrid genetic algorithm for protein structure prediction on the computational grid”, Future Generation Computer Systems, 23, 398-409
[67] Venkatraman, V., Dalby, A.R., Yang, Z.R., 2004. “Evaluation of mutual information and genetic programming for feature selection in QSAR”, J. Chemical Information and Computer Sciences, vol. 44, pp. 1686-1692
[68] Weber, L., 2000. “Molecular diversity analysis and combinatorial library design”, Evolutionary Algorithms in Molecular Design (Clark, D., ed.), Wiley-VCH, pp. 137-158
[69] Weber, L., 2002. “Multi-component reactions and evolutionary chemistry”, Drug Discovery Today, Vol. 7, No. 2
[70] Weininger, D., Weininger, A., Weininger, J.L., 1989. “Smiles - Algorithm for generation of unique smiles notation”, J. Chem. Inf. Comput. Sci., 29, 97-101
[71] Wold, S., Jonsson, J., Sjostrom, M., Sandberg, M., Rannar, S., 1993. “DNA and peptide sequences and chemical processes, multivariately modeled by PCA and partial least squares projections to latent structures”, Anal. Chim. Acta, 277, 239-253
[72] Xue, L., Bajorath, J., 2000. “Molecular descriptors for effective classification of biologically active compounds based on PCA identified by a GA”, J. Chem. Inf. Comput. Sci., 40, 801-809
[73] Xue, Y., Li, Z.R., Yap, C.W., Sun, L.Z., Chen, X., Chen, Y.Z., 2004. “Effect of molecular descriptor feature selection in SVM classification of Pharmacokinetic and Toxicological properties of chemical agents”, J. Chem. Inf. Comput. Sci., 44, 1630-1638
[74] Xue, L., Godden, J., Gao, H., Bajorath, J., 1999. “Identification of a preferred set of molecular descriptors for compound classification based on PCA”, J. Chem. Inf. Comput. Sci., 39, 699-704
[75] Yan, A., Wang, Z., Cai, Z., 2008. “Prediction of Human Intestinal Absorption by GA feature selection and SVM regression”, Int. J. Mol. Sci., 9, 1961-1976
[76] Zhang, G., Li, H., Fang, B., 2009. “Discriminating acidic & alkaline enzymes using RFM with secondary structure amino acid composition”, Process Biochemistry, 44, 654-660
[77] ADRIANA.Code. http://www.molecular-networks.com
[78] Cerius. http://www.accelrys.com
[79] MDL Information Systems Inc., http://www.mdli.com
[80] Statistica, Statsoft Inc.
[81] http://www2.chemie.uni-erlangen.de/software/corina/free-struct.html
[82] http://www2.chemie.uni-erlangen.de/services/ncidb2/index.html
[83] http://www2.chemie.uni-erlangen.de/software/petra/index.html
[84] http://www2.chemie.uni-erlangen.de/software/wodca/index5.html
[85] http://www.discover.nci.nih.com
[86] Molecular Simulations Inc. on the World Wide Web, URL. http://www.msi.com
[87] Spotfire Inc. on the WWW. http://www.spotfire.com
[88] Partek Inc. on the WWW. http://www.partek.com
[89] Oxford Molecular Group on the WWW. http://www.oxmol.co.uk
[90] Virtual Computational Chemistry Laboratory, http://146.107.217.178/lab/edragon/index.html
[91] www.vcclab.org
[92] www.chemnavigator.com
[93] www.ensembl.org
[94] http://www.organic-chemistry.org/prog/peo
[95] http://preadme.bmdrc.org/preadme
[96] www.syrres.com
