Graph representation of molecular datasets: applications to dataset ...

1 downloads 0 Views 307KB Size Report
Jun 5, 2009 - mensional chemistry spaces is a great challenge in chem- informatics. To this end ... W. Jeffery Hurst, The Hershey Company. available free of ...
Chemistry Central Journal Open Access

Oral presentation

Graph representation of molecular datasets: applications to dataset visualization and comparison using graph indices A Tropsha* and D Fourches Address: Laboratory for Molecular Modeling, School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA * Corresponding author

from 4th German Conference on Chemoinformatics Goslar, Germany. 9–11 November 2008 Published: 5 June 2009 Chemistry Central Journal 2009, 3(Suppl 1):O9

doi:10.1186/1752-153X-3-S1-O9

4th German Conference on Chemoinformatics: 22. CIC-Workshop

Frank Oellien Meeting abstracts – A single PDF containing all abstracts in this Supplement is available here. http://www.biomedcentral.com/content/pdf/1752-153X-3-S1-info.pdf

This abstract is available from: http://www.journal.chemistrycentral.com/content/3/S1/O9 © 2009 Tropsha and Fourches; licensee BioMed Central Ltd.

The visualization of large chemical datasets in multidimensional chemistry spaces is a great challenge in cheminformatics. To this end, we have developed a novel approach using graph representations for ensembles of molecules-points defined by their coordinates in high dimensional descriptor space. The Advanced Dataset Graph (ADG) consists of an ensemble of vertices (corresponding to molecules) connected by edges; two vertices are connected by an edge if the Euclidean distance between the vertex-molecules in the original descriptor space is within a user-defined cut-off. The ADDAGRA program has been implemented to build and visualize ADGs in three-dimensional graphic user interface. The uniqueness of this data representation is in that the points are projected onto 3D space using conventional Principal Component Analysis; however the projected vertices are connected by edges only if they are defined as neighbours in the original high dimensional space. Thus, unlike all other data projection approaches the ADG visualizes compound clusters in the original descriptor space exactly. In addition to the visualization, we have also implemented several simple graph indices for quantitative description and comparisons of ADGs. Three case studies involving (i) 101 AmpC beta-lactamase inhibitors, (ii) 2029 organic compounds with their measured intrinsic aqueous solubility, and (iii) 1093 chemical toxicants (see Figure 1) tested against Tetrahymena Pyriformis, have been analyzed with our ADG approach. Results suggest that some graph indices such as the average vertex degree or Randic connectivity index have the ability to discriminate similar vs. dissimilar pairs of datasets and address several other common issues in cheminformatics such as detection of

Figure Advanced sets, and compounds two including 1external Dataset respectively thesets training Graphs TS, involving set builtTR for(red 339 the –(black) "aquatic 644 compounds), and toxicity 110 (blue) dataAdvanced Dataset Graphs built for the "aquatic toxicity datasets, including the training set TR (red – 644 compounds), and two external sets TS, involving 339 (black) and 110 (blue) compounds respectively.

Page 1 of 2 (page number not for citation purposes)

Chemistry Central Journal 2009, 3(Suppl 1):O9

http://www.journal.chemistrycentral.com/content/3/S1/O9

outliers, shared regions in chemical and property space, etc. We suggest that ADGs could find a broad use for many cheminformatics applications.

Publish with ChemistryCentral and every scientist can read your work free of charge Open access provides opportunities to our colleagues in other parts of the globe, by allowing anyone to view the content free of charge. W. Jeffery Hurst, The Hershey Company. available free of charge to the entire scientific community peer reviewed and published immediately upon acceptance cited in PubMed and archived on PubMed Central yours you keep the copyright Submit your manuscript here: http://www.chemistrycentral.com/manuscript/

Page 2 of 2 (page number not for citation purposes)