This is an open access article published under a Creative Commons Attribution (CC-BY) License, which permits unrestricted use, distribution and reproduction in any medium, provided the author and source are cited.
Perspective pubs.acs.org/crt
ToxCast Chemical Landscape: Paving the Road to 21st Century Toxicology
Ann M. Richard,*,† Richard S. Judson,† Keith A. Houck,† Christopher M. Grulke,† Patra Volarath,‡ Inthirany Thillainadarajah,§ Chihae Yang,∥,⊥ James Rathman,⊥,# Matthew T. Martin,† John F. Wambaugh,† Thomas B. Knudsen,† Jayaram Kancherla,∇ Kamel Mansouri,∇ Grace Patlewicz,† Antony J. Williams,† Stephen B. Little,† Kevin M. Crofton,† and Russell S. Thomas†
† National Center for Computational Toxicology, Office of Research & Development, U.S. Environmental Protection Agency, Mail Code B205-01, Research Triangle Park, Durham, North Carolina 27711, United States
‡ Center for Food Safety and Nutrition, U.S. Food and Drug Administration, 5100 Paint Branch Parkway, College Park, Maryland 20740, United States
§ Senior Environmental Employment Program, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States
∥ Molecular Networks GmbH, Henkestraße 91, 91052 Erlangen, Germany
⊥ Altamira, LLC, 1455 Candlewood Drive, Columbus, Ohio 43235, United States
# Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W. Woodruff Avenue, Columbus, Ohio 43210, United States
∇ ORISE Fellow, U.S. Environmental Protection Agency, Research Triangle Park, Durham, North Carolina 27711, United States
ABSTRACT: The U.S. Environmental Protection Agency’s (EPA) ToxCast program is testing a large library of Agency-relevant chemicals using in vitro high-throughput screening (HTS) approaches to support the development of improved toxicity prediction models. Launched in 2007, Phase I of the program screened 310 chemicals, mostly pesticides, across hundreds of ToxCast assay end points. In Phase II, the ToxCast library was expanded to 1878 chemicals, culminating in the public release of screening data at the end of 2013. Subsequent expansion in Phase III has resulted in more than 3800 chemicals actively undergoing ToxCast screening, 96% of which are also being screened in the multi-Agency Tox21 project. The chemical library underpinning these efforts plays a central role in defining the scope and potential application of ToxCast HTS results. The history of the phased construction of EPA’s ToxCast library is reviewed, followed by a survey of the library contents from several different vantage points.
CAS Registry Numbers are used to assess ToxCast library coverage of important toxicity, regulatory, and exposure inventories. Structure-based representations of ToxCast chemicals are then used to compute physicochemical properties, substructural features, and structural alerts for toxicity and biotransformation. Cheminformatics approaches using these varied representations are applied to defining the boundaries of HTS testability, evaluating chemical diversity, and comparing the ToxCast library to potential target application inventories, such as used in EPA’s Endocrine Disruption Screening Program (EDSP). Through several examples, the ToxCast chemical library is demonstrated to provide comprehensive coverage of the knowledge domains and target inventories of potential interest to EPA. Furthermore, the varied representations and approaches presented here define local chemistry domains potentially worthy of further investigation (e.g., not currently covered in the testing library or defined by toxicity “alerts”) to strategically support data mining and predictive toxicology modeling moving forward.
■ CONTENTS
1. Introduction
2. Building the ToxCast Chemical Library
2.1. Chemical Selection
2.1.1. ToxCast Phase I
2.1.2. ToxCast Phase II
2.1.3. ToxCast Phase III
2.1.4. Chemical Inventory and Assay Coverage
2.2. Transitioning from a Chemical List to Chemical Samples, Structures, and Features
2.2.1. Physical Sample Library
2.2.2. Quality Considerations
3. ToxCast Chemical Library Contents
© XXXX American Chemical Society
Received: April 22, 2016
DOI: 10.1021/acs.chemrestox.6b00135 Chem. Res. Toxicol. XXXX, XXX, XXX−XXX
3.1. Methods
3.1.1. ACToR and DSSTox CASRN Lists
3.1.2. DSSTox ToxCast Structure Inventory (TOXCST)
3.1.3. Calculated Physicochemical Properties and Chemical Descriptors
3.1.4. Structure-Handling Software and Graphics
3.1.5. ToxPrint Chemotypes
3.1.6. Collaborative Estrogen Receptor Activity Prediction Project Structure Inventory (CERAPP)
3.1.7. FDA Marketed (and Discontinued) Drugs Structure Inventory (FDA_Drugs)
3.1.8. Benchmark Dose Calculations for Human Health Assessment (BMDHHA) Structure Inventory
3.1.9. OECD Toolbox
3.1.10. Derek Nexus and Meteor (Lhasa Ltd., Leeds, UK)
3.2. DSSTox TOXCST and CASRN List Overlaps
3.3. Structure-Based Inventory Profiling
3.3.1. DMSO Insolubles and Volatiles
3.3.2. Global and Local Inventory Profiling Based on Chemical Features
3.3.3. Global Physicochemical Property Inventory Comparisons
3.3.4. “Nearest Neighbor” Similarity Inventory Profiling
3.4. SAR and Knowledge-Based Expert Systems
3.4.1. OECD Toolbox Profiling: HESS and DART
3.4.2. Derek Nexus and Meteor (Lhasa Ltd., Leeds, UK)
4. Conclusions
Author Information
Corresponding Author
Funding
Notes
Biographies
Acknowledgments
Dedication
Abbreviations
References
1. INTRODUCTION
The ToxCast program within the U.S. Environmental Protection Agency (EPA) employs high-throughput in vitro assays to efficiently screen large numbers of chemicals to support the development of improved toxicity prediction models, particularly to be applied to environmental chemicals for which limited or no in vivo animal toxicity data are available. ToxCast began as an exploratory pilot program within the EPA’s newly formed National Center for Computational Toxicology (NCCT) in 2007, the same year that the National Research Council (NRC) published the paradigm-changing report entitled “Toxicity Testing in the 21st Century”.1,2 Since that time, EPA’s ToxCast program has significantly expanded coverage of both chemicals and assays in a series of testing phases extending to the present. In ToxCast Phase I, a set of 310 mostly pesticidal chemicals was tested across a panel of several hundred commercially available, medium- and high-throughput screening (HTS) assays spanning nine diverse technologies.3,4 At the time, 310 compounds was considered a large library relative to traditional toxicity testing data sets, and the Phase I results, with their associated in vivo toxicity guideline data, were to serve as proof-of-concept for the development of new approaches to modeling toxicity. Phase I testing officially concluded with the January 2010 ToxCast data release. Phase II of EPA’s ToxCast program expanded the size and diversity of the chemical library undergoing ToxCast screening to 1878 chemicals and officially concluded with the December 2013 ToxCast data release. (Current and past ToxCast data releases can be downloaded from https://www.epa.gov/chemical-research/toxicity-forecasting). At the time of this writing, the ToxCast program is well into Phase III, and EPA’s chemical library actively undergoing ToxCast screening (old chemicals in new assays, new chemicals in old and new assays) has expanded to more than 3800 unique chemicals. Accompanying the three phases of library expansion (I, II, and III), both the numbers and types of ToxCast assay end points (per chemical), as well as the data analysis pipeline used to process ToxCast assay results, have undergone regular updates to the present. These changes are the result of the evolution of assay technologies and changing EPA program priorities, and are reflected in periodically updated ToxCast data releases.

The EPA also has contributed a large chemical library, currently exceeding 4000 chemicals, to the multi-Agency Tox21 testing program.5,6 EPA’s initial Tox21 library was built in parallel to the expansion of the ToxCast chemical library undertaken in Phase II, and EPA’s Tox21 library has evolved in tandem with Phase III to maintain substantial (96%) inclusion of the ToxCast chemical library moving forward. The full Tox21 library, in turn, comprises approximately equal sized contributions from the EPA, the National Toxicology Program (NTP), and the National Center for Advancing Translational Sciences (NCATS), and currently exceeds 9000 unique substances. Given the close alignment of EPA’s ToxCast and Tox21 testing programs and the substantial overlap of the two chemical libraries undergoing testing, the term “ToxCast chemical library” will henceforth refer to EPA’s complete past and present testing inventory and will include EPA’s full contribution to the Tox21 library.

The ToxCast program has produced several published reports and is widely viewed as successful in demonstrating the feasibility of an HTS program applied to screening of environmental chemicals and in promoting the development of biologically based approaches to toxicity screening along the lines articulated in the original NRC Report.4,7−15 However, application of HTS approaches to toxicity screening continues to face significant research challenges, not least of which is developing the means to manage, process, and extract insights from the large, complex data sets being produced.4,16−19 In addition, there is growing recognition that the challenge of reliably predicting downstream biological responses of toxicological relevance will require integration of many different types of data, knowledge, and informational resources.4,14 From a biological perspective, a better understanding of the implications of chemical−target interactions and cellular effects in the context of biological processes and adverse outcome pathways is needed.20,21 Equally important is gaining a better understanding of the technological limitations and sources of noise in the various HTS assay platforms, while identifying or inferring missing elements such as metabolism, pharmacokinetics, and species specificity.14,19,22 Each of these considerations plays an important role in validating the application of HTS results in toxicity prediction models.23,24

Of central importance to the interpretation and application of HTS data is the nature of the increasingly diverse chemical library fueling these efforts (e.g., detailed composition, physicochemical properties, and analytical data quality) and how to best apply this foundation of chemical information to meeting the overall ToxCast program objectives. The ToxCast chemical library effectively functions as a probe set of biological activity space, with each chemical generating a bioactivity profile across the battery of ToxCast assays in which it is screened. As a result, the chemical library determines not only the scope of chemical space tested but also the coverage of chemical interaction mechanisms, toxicity pathways, adaptive responses, and mode(s) of action potentially leading to one or more adverse outcomes. Hence, success in meeting the broader objectives of the ToxCast (and affiliated Tox21) program will depend, in part, on the ability of the ToxCast chemical library to adequately capture existing chemical-toxicity knowledge, as well as to span potential toxicity mechanisms across the landscape of the thousands of environmental chemicals for which little to no data currently exist.25,26

Given its central role in determining the scope of ToxCast bioactivity profile coverage and information pertaining to toxicity mechanisms, the remainder of this perspective focuses on the historical construction and detailed composition of the current ToxCast chemical library. In section 2, an overview and historical account of the inputs and selection criteria that have exerted major influence on the content of the ToxCast chemical library as it has expanded across testing phases are presented. In addition, since EPA’s ToxCast inventory is a physical HTS library comprising thousands of mostly commercially procured chemicals, practical constraints that have influenced library content as well as quality control measures that have been instituted to ensure data integrity are described. In section 3, the cumulative results of these efforts, i.e., the chemical contents of the current ToxCast library, are considered. As one metric of success in achieving the goals set out for its construction, Chemical Abstracts Service Registry Number (CASRN) overlaps of the ToxCast library with various other chemical inventories of regulatory interest are considered. This enables direct assessment of coverage, as well as enrichment of toxicity data-rich chemicals, chemicals spanning major use categories, and chemicals with exposure potential across testing phases. Subsequently, cheminformatics and structure-based approaches are employed to compute generalized chemical features and properties that allow for assessment of the relative diversity and coverage of the ToxCast chemical structure library in comparison to larger or more specific chemical domains of regulatory interest. Finally, cheminformatics approaches are used to assess ToxCast chemical library coverage of putative reactivity and toxicity mechanism space as captured by structural alerting features for toxicity and biotransformation, such as are available within commercially and publicly available knowledge-based expert systems.

Details of the ToxCast assay technologies and HTS results to date are presented elsewhere and will be the subject of future, more in-depth cheminformatics studies.4,9,12,13 In addition, a survey of the full Tox21 chemical library and details of its construction and analytical chemistry evaluation in the context of the Tox21 cross-Federal agency project will be described in a future publication. The main objectives herein are to provide a historical account of EPA’s efforts to construct the ToxCast chemical library, to characterize and provide representations of the library that can aid in future investigations, and to offer metrics to assess the degree to which the current library is fit for the purpose and vision of the ToxCast project.
2. BUILDING THE TOXCAST CHEMICAL LIBRARY EPA’s ToxCast chemical library has been constructed in several phases, mirroring the testing phases and temporal evolution of EPA’s ToxCast and Tox21 programs. Each iterative build of the library, in turn, has been circumscribed by practical constraints, including chemical commercial availability, dimethyl sulfoxide (DMSO) solubility, and suitability for testing in automated or semiautomated systems. Operating within these constraints, there were three major, interrelated drivers for chemical selection: (1) availability of animal toxicity data or mechanistic knowledge; (2) exposure potential; and (3) EPA regulatory interest. The first driver would provide the necessary in vivo and mechanistic data to anchor and validate subsequent prediction modeling efforts, whereas the latter two were intended to provide coverage of the chemical landscape (or chemical space) to which humans and ecosystems are potentially exposed and for which toxicity data are mostly lacking. Hence, from the start, the ToxCast library was designed to serve dual purposes, i.e., to provide sufficient coverage of known toxicity data space in the early testing phases to support development and validation of models for predicting toxicity, as well as to incorporate a broad range of chemicals of heightened regulatory concern, for which data are lacking, and the application of such models to fill data gaps is most urgently needed. 2.1. Chemical Selection. 2.1.1. ToxCast Phase I. Phase I of EPA’s ToxCast program screened a total of 310 unique structures, the majority of which were pesticides accompanied by guideline in vivo toxicity studies conducted as part of EPA’s pesticide registration process.4 These compounds were accompanied by a rich complement of previously unpublished data for subchronic, chronic, cancer, multigenerational reproductive, and developmental end points across several species (rat, mouse, and rabbit). 
Data were extracted from EPA’s Data Evaluation Reports and systematically captured into a newly developed toxicity reference database, ToxRefDB.27−29 The pesticides included in Phase I testing spanned diverse chemical structures, as well as a broad range of known pesticidal mechanisms. Hence, the chemicals not only offered significant toxicity data coverage (approximately 275 compounds had near complete guideline data coverage in ToxRefDB) but captured a variety of chemical reactivity features and mechanistic diversity. Additionally, the Phase I testing inventory included 30 nonpesticidal environmental chemicals of research or regulatory interest: perfluorinated compounds such as perfluorooctanoic acid (PFOA), a surfactant used in fluoropolymers such as Teflon, and perfluorooctanesulfonic acid (PFOS), used in the semiconductor industry, ScotchGuard formulations, and flame retardant foams; bisphenol A (BPA) and a set of phthalate alternatives used as plasticizers; and 10 toxicologically active metabolites of included phthalates and pesticides, such as monobutyl phthalate (metabolite of dibutyl phthalate) and diazoxon (metabolite of diazinon).9

At the completion of Phase I testing, chemical stocks were largely depleted. Hence, the majority of the chemicals were reprocured from various commercial sources to undergo screening in newly acquired assays for ToxCast Phase II and in the then recently launched Tox21 program. Excluded from the reprocured test set were 14 sulfurons determined to undergo rapid hydrolysis in DMSO, three reprocured compounds deemed insufficiently soluble in DMSO, and a chemical originally procured in parent form that was reprocured in a different salt form. Henceforth, the original chemical inventory tested in ToxCast Phase I is referred to as “ph1_v1”, whereas the slightly smaller, reprocured inventory of 293 unique chemicals that moved to expanded Phase II testing is referred to as “ph1_v2”. A listing of the ToxCast ph1_v1 and ph1_v2 chemical inventories, with modified or discontinued compounds labeled and problems annotated, is provided in the ToxCast chemical inventory file (DSSTox_TOXCST, date-stamped 20160129), available from the EPA DSSTox Data download page (see ftp://ftp.epa.gov/dsstoxftp). (Note: The temporal, programmatic testing phases of ToxCast are, henceforth, denoted with Roman numerals, as in Phases I, II, III, whereas the nonoverlapping ToxCast chemical subinventories are denoted with lower case alphanumeric indices, as in ph1_v1, ph1_v2, etc. In addition, the terms “chemical library” and “chemical inventory” will be used interchangeably to refer to a list of unique chemical substances.)

2.1.2. ToxCast Phase II. Despite its large size in comparison to traditional toxicity studies, it was recognized early on that the initial chemical library (ph1_v1) screened in ToxCast Phase I provided limited coverage of the broad range of possible bioactivity and toxicity mechanisms of potential concern. Thus, a larger, more diverse chemical library would be needed. Moving into Phase II, the ToxCast program objectives were broadened to include validation and expansion of initial toxicity models and predictive signatures, as well as generation of HTS profiles for data-poor chemicals of research and regulatory interest to EPA. A primary means by which these broader goals were to be achieved was by expanding the chemical landscape undergoing testing.4 The various stages of chemical nomination, selection, and procurement prior to commencing Phase II testing would significantly draw from EPA’s Distributed Structure-Searchable Toxicity (DSSTox) database and the newly constructed Aggregated Computational Toxicology Resource (ACToR) database.26,30,31 In addition, the effort would engage multiple EPA Program Offices and researchers, as well as a wide range of environmental, academic, industry (both chemical and pharmaceutical), and nonprofit organizations. ToxCast library expansion under Phase II would span a 2-year period and take place in parallel to the construction of EPA’s initial Tox21 chemical inventory (denoted “tox21_epa_v1”).

A newly procured set of 768 unique chemicals, which comprised the highest priority subset of the larger EPA Tox21 library under construction at the time, was the first set of new chemicals to be entered into Phase II testing. This subinventory is nonoverlapping with the reprocured ph1_v2 inventory and is denoted as the “ph2” chemical inventory. The majority of chemicals added to EPA’s library during Phase II were procured by EPA Contract services utilizing a broad range of commercial suppliers. In addition, physical samples of approximately 150 compounds were donated by outside ToxCast collaborators. A small number of chemical samples were donated by the chemical industry and the U.S. Food and Drug Administration (FDA)’s National Center for Toxicological Research (“green” plasticizer alternatives and reference liver toxicants, respectively). The remaining set of 136 “failed pharma” compounds (i.e., drug candidates discontinued due to toxicity in preclinical or clinical trials) were donated by five major pharmaceutical companies (Merck, Kenilworth, NJ; Pfizer, New York City, NY; Roche, Indianapolis, IN; Astellas, Northbrook, IL; GlaxoSmithKline, Philadelphia, PA). The donation of failed drug compounds (denoted “donated_pharma”), along with preclinical and, in some cases, clinical data, represented a breakthrough in open collaboration between a government research laboratory and the pharmaceutical industry. It also brought to the ToxCast program a set of chemicals designed to be bioactive at selected human (or veterinary use) targets and accompanied by data indicating why these drugs were deemed potentially or actually toxic to humans. A complete listing of the donated pharmaceuticals is provided within the current ToxCast chemical inventory file (DSSTox_TOXCST, date-stamped 20160129), available from the EPA DSSTox Data download page (see ftp://ftp.epa.gov/dsstoxftp). All donated data on these failed pharmaceuticals that were deemed compatible with the ToxRefDB schema have since been incorporated into ToxRefDB. A current version of ToxRefDB is available for download at https://www.epa.gov/chemical-research/toxicity-forecaster-toxcasttm-data.

Shortly after ToxCast Phase II testing commenced, the chemical inventory was further expanded to include a newly constructed “e1k” chemical inventory. The latter consisted of 799 unique compounds, containing many known estrogen receptor (ER) and androgen receptor (AR) active reference chemicals, that were to be screened in a limited subset of Phase II endocrine-related assays. These e1k chemicals were also incorporated into EPA’s Tox21 library (tox21_epa_v1) prior to the initiation of Tox21 testing. (Note: The term e1k originally referred to the set of 1000 total samples that included 201 additional compounds that were either replicates or rescreened ph2 chemicals.)

Figure 1 summarizes the numbers of unique chemicals and assay end points associated with the various EPA inventories through to the conclusion of Phase II testing. ToxCast Phase II encompassed screening of the newly added ph2 inventory in the majority of the original Phase I ToxCast assays, as well as testing of the ph2 and reprocured ph1_v2 inventory in newly acquired ToxCast and Tox21 assays. Phase II testing formally concluded with the public release of ToxCast assay results for the complete set of 1863 unique ToxCast chemicals screened to that point (1878 if discontinued ph1_v1 chemicals are included in the total), including data for the ph1_v1, ph1_v2, ph2, and e1k chemical sets (https://www.epa.gov/chemical-research/toxicity-forecaster-toxcasttm-data). (Note: EPA-processed results generated for the endocrine-related subset of Tox21 assays were included in EPA’s ToxCast Phase II data release for the full Tox21 inventory, tox21_epa_v1.)

2.1.3. ToxCast Phase III. The ToxCast program entered into Phase III testing in late 2014, with new assay technologies and end points added to the current ToxCast battery of assays for testing some or all of the original ToxCast chemical inventories. A set of chemicals in EPA’s original Tox21 and e1k inventories are being run in a broader range of ToxCast assays. In addition, Phase III extended ToxCast and Tox21 testing to approximately 500 newly procured chemicals and a small set of donated mixtures and EPA water samples, which have been incorporated into the ToxCast library. The newly procured chemicals expand coverage of chemicals of current regulatory concern, such as flame retardants and chemicals of interest to EPA’s Endocrine Disruption Screening Program (EDSP) for the 21st Century (EDSP21) (https://www.epa.gov/endocrinedisruption/endocrine-disruptor-screening-program-21stcentury-edsp21-workplan-summary). The majority of the newly
procured chemicals have also been incorporated into EPA’s expanded Tox21 testing library to maintain near-complete coverage of Tox21 assay screening across EPA’s ToxCast chemical library moving forward. The total combined set of 1921 unique chemicals comprising this latest expansion of the ToxCast chemical library in Phase III is referred to as the “ph3” chemical inventory and is nonoverlapping with the earlier chemical inventories having full or partial testing coverage in Phases I and II (ph1_v1, ph1_v2, ph2, and e1k). Each of these nonoverlapping ToxCast subinventories, as well as the testing phase in which the chemical was screened (I, II, and/or III) is labeled within the ToxCast chemical inventory file (DSSTox_TOXCST, date-stamped 20160129), available from the EPA DSSTox Data download page (see ftp://ftp.epa.gov/dsstoxftp).

Figure 1. Counts of unique chemicals and associated assay end points for chemical sets through to completion of ToxCast Phase II testing (as of December 2013 and using the assay end point definitions at that time). The total number of ToxCast and Tox21 assay end points is derived from a mapping of distinct assay technologies, assay components, and end point IDs; each end point is associated with hit calls and IC50 values for the majority of chemicals in publicly released data files. The reprocured ph1_v2 library was run in newly acquired ToxCast and Tox21 assays not included in Phase I, generating over 200 new assay end points, approximately 80 of which were Tox21 end points. The ph2 library was run in the original 700 Phase I assay end points and the 200 newly added assay end points in Phase II. Of the 50 e1k endocrine-related assay end points, approximately half were from ToxCast vendors and the other half were from Tox21. Approximately 80 Tox21 assay end points were generated at the intramural Tox21 HTS robotics facility at NCATS; results for endocrine-related assays were included in the ToxCast Phase II public data release (December 2013).

2.1.4. Chemical Inventory and Assay Coverage. Figure 2 provides a high-level overview of the ToxCast testing program from a chemical perspective through Phase III and is illustrative of some important features and trends. The figure plots chemical-assay coverage from Tox21 and the five ToxCast assay providers who have generated the largest number of assay end points to date. Total counts of assay end points per chemical are shown, with chemicals sorted by ToxCast testing phase and chemical inventory. Higher density assay coverage is evident in the initial phases of ToxCast testing, across ph1 and ph2 inventories, dropping off due to more limited Phase II testing of e1k chemicals, with expanded Phase III testing of ph3 and a subset of e1k and Tox21 chemicals currently underway. Gaps in assay coverage are apparent for the small set of Phase I (ph1_v1) chemicals that were dropped prior to Tox21 testing and for a relatively small portion of Tox21 chemicals that were deemed of lower priority and not included in ToxCast testing inventories. The largest gap in the Tox21 assay coverage seen for ph3 chemicals is due to the recent addition of a plate of 352 ph3 chemicals to the tox21_epa_v2 library, for which assay data are not yet available. Finally, it is apparent from Figure 2 that the total number of assay end points varies on a per chemical basis within and across testing phases, in part due to the design of the experiments and changing programmatic testing priorities within EPA.

Figures 1 and 2 are intended to orient the reader to the distinctions and relationships among the various ToxCast testing phases and chemical inventories, conveying nomenclature that will be used throughout the remainder of this perspective and in subsequent publications. Additionally, these figures are intended to underscore the primary purpose of the ToxCast chemical library, which is to generate HTS chemical-activity data profiles for use in toxicity screening and building predictive models. Consistent with the open, transparent philosophy of the ToxCast program, the current ToxCast chemical-assay results files (including EPA’s processed Tox21 results for the full Tox21 library tested to the present) are publicly available from the ToxCast December 2014 Data Release Download Page (see “ToxCast Chemicals DSSTox”, chemicalfiles.zip at https://www3.epa.gov/research/COMPTOX/previously_published.html). (Note: A small set of nanomaterials was run in a limited number of ToxCast assays but is not considered part of the official ToxCast chemical library; the description of these materials and results will be the subject of a future publication.)

2.2. Transitioning from a Chemical List to Chemical Samples, Structures, and Features. Before shifting focus to the chemical contents of EPA’s ToxCast library, the process of transitioning from a large list of CASRN nominations to the ToxCast HTS library of actual physical samples will be reviewed. In addition, the quality review steps instituted for ensuring accurate chemical identification, characterization, and annotations of tested samples, and, hence, the process by which the physical sample library is assigned to accurate structure representations suitable for cheminformatics investigations will be summarized.

2.2.1. Physical Sample Library. Figure 3 illustrates the main steps involved in the construction of EPA’s expanded Phase II (including ph2 and e1k) and initial Tox21 (tox21_epa_v1) inventories, which represent the largest contributions to EPA’s physical sample library, to date. As previously mentioned, original nominations primarily consisted of chemical lists aggregated from an early version of EPA’s ACToR database (which had incorporated published DSSTox inventories). Scores of EPA and non-EPA chemical inventories pertaining to commercial use, having environmental occurrence data, and/or of regulatory or toxicological concern were collected from public sources and cross-indexed by CASRN.26,31 The final nomination list consisted of approximately 19,000 unique CASRNs. This list was cross-referenced against public sources, primarily DSSTox and PubChem (https://pubchem.ncbi.nlm.nih.gov), to retrieve chemical structures for the majority of substances. Where structures were available, molecular weight (MW) and a set of physicochemical properties, computed using EPA’s EPI Suite property prediction software (https://www.
Figure 2. Plot of EPA ToxCast chemicals by the total number of assay end points per chemical for the top five ToxCast assay end point providers and Tox21 (as of January 2016), sorted by ToxCast Testing Phases (I, II, and III), and ToxCast chemical subinventories (ph1_v1, ph1_v2, ph2, e1k, and ph3). Assay end points and corresponding totals were computed using the most recent ToxCast data analysis pipeline as of January, 2016; since not all ToxCast assay providers are represented in this figure, the actual assay end point totals per chemical are larger than those shown here. Five ToxCast assay providers delivering the largest numbers of assay end points evaluated per chemical are shown: CLD = CellzDirect (ThermoFisher Scientific, Staley Road Grand Island, NY), BSK = BioSeek (South San Francisco, CA), APR = Apredica (Cyprotex, Watertown, MA), NVS = NovaScreen Biosciences (PerkinElmer, Waltham MA), and ATG = Attagene (Morrisville, NC). A full listing of ToxCast assay providers can be found on the ToxCast Web site: https://www.epa.gov/chemical-research/toxicity-forecaster-toxcasttm-data. The largest gap in the TOX21 assay coverage corresponds to a set of 352 newly procured ph3 chemicals added to the tox21_epa_v2 inventory and currently undergoing testing; additional gaps correspond to chemicals that were either dropped from testing after Phase I or are lower priority ph3 chemicals not moved to TOX21 testing; a total of 148 (4226−4078) chemicals included in tox21_epa_v2 were not tested in ToxCast Phases I, II, or III. ToxCast Phase I corresponds to the ph1_v1 chemical inventory and is the only testing phase that includes the CLD assay provider end points; Phase II testing spanned the ph1_v2, ph2, and e1k chemical inventories, with partial testing of e1k; Phase III is extending testing to the ph3 inventory and expanding testing for a portion of the e1k inventory (136 chemicals). 
Chemical inventories consist of unique, nonoverlapping sets of chemicals (with the exception of the ph1_v1 and ph1_v2 inventories, which are largely overlapping), with total chemical counts indicated in parentheses; vertical dashed lines and counts indicate total number of chemicals tested prior to the addition of a new chemical inventory. The total number of chemicals (4226) includes a small set of reference and discontinued chemicals (6) only tested in ATG and not otherwise included in the main testing inventories.
concentration of 20 mM (with later exceptions made for a small number of high priority chemicals that were soluble at lower concentrations, 5−10 mM). Another 3% of the chemicals were excluded due to volatility and reactivity concerns (based on calculated properties, observations, and documentation). Following the initial procurements, donated chemicals, which included the set of 136 failed pharmaceuticals, were added to the ph2 inventory, and a set of endocrine reference chemicals was procured for inclusion in the e1k inventory. Hence, both practical and design considerations influenced the final composition of EPA’s Tox21 and Phase II testing inventories. 2.2.2. Quality Considerations. In contrast to the typical application of HTS in a drug discovery pipeline, which seeks to identify a small number of novel and potent leads for follow-up testing, the application of HTS in the ToxCast and Tox21 programs has multiple overlapping objectives.34 These range from developing new computational approaches to predict toxicity for large numbers of chemicals to gaining insights into toxicity mechanisms for diverse chemicals and to filling data
epa.gov/tsca-screening-tools/epi-suitetm-estimation-program-interface), were applied to filter and prioritize this list.32 Initially excluded were compounds predicted to be volatile (MW < 140 g/mol), unable to transport through lipid bilayers or be sufficiently water-soluble in assay buffers/media (log octanol/water partition coefficients, i.e., logP values, less than −1 or greater than 7), or deemed less suitable for testing and structure-based modeling (e.g., mixtures, inorganics, and explosives).33 However, given the incomplete coverage and approximate nature of these assumptions and calculated properties, some high priority chemicals having less than optimal properties were included in the testing library. After application of the initial filters to the original 19,000 ACToR nomination list, a list of approximately 7,000 CASRNs and chemical names was submitted to EPA Contract services for procurement subject to contract specifications pertaining to cost, availability, and acceptable handling restrictions. Of the final set of 4400 chemicals successfully procured, approximately 10% were determined to be insoluble in DMSO at a target
Figure 3. Schematic illustrating the main steps in the construction of the EPA’s Tox21 and Phase II testing inventories, starting with a nomination list of approximately 19,000 compounds and ending with a total of 3726 Tox21 compounds (tox21_epa_v1), 1860 of which underwent ToxCast Phase II testing (i.e., the unique combined total of ph1_v2, ph2, and e1k inventories).
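The initial property-based filtering step summarized in Figure 3 can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used by EPA: only the thresholds (MW < 140 g/mol, logP outside −1 to 7) and the excluded classes (mixtures, inorganics, explosives) come from the text. The `Candidate` record, its field names, the `high_priority` override flag, the decision to let missing property values pass, and the example values are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds quoted in the text; everything else here is illustrative.
MW_MIN = 140.0                   # g/mol; lighter compounds were predicted volatile
LOGP_MIN, LOGP_MAX = -1.0, 7.0   # acceptable calculated logP window
EXCLUDED_CLASSES = {"mixture", "inorganic", "explosive"}


@dataclass
class Candidate:
    identifier: str                 # hypothetical record ID (e.g., a CASRN)
    mw: Optional[float]             # molecular weight; None if no structure
    logp: Optional[float]           # calculated logP; None if unavailable
    substance_class: str = "organic"
    high_priority: bool = False     # assumed flag for manual exceptions


def exclusion_reasons(c: Candidate) -> List[str]:
    """Return the filters a candidate trips (empty list if none)."""
    reasons = []
    if c.substance_class in EXCLUDED_CLASSES:
        reasons.append("excluded class: " + c.substance_class)
    # Missing properties do not exclude a candidate (an assumption).
    if c.mw is not None and c.mw < MW_MIN:
        reasons.append("predicted volatile")
    if c.logp is not None and not (LOGP_MIN <= c.logp <= LOGP_MAX):
        reasons.append("logP out of range")
    return reasons


def passes_initial_filter(c: Candidate) -> bool:
    """High-priority candidates are kept even if a filter is tripped."""
    return c.high_priority or not exclusion_reasons(c)


# Hypothetical candidates with made-up property values.
candidates = [
    Candidate("EX-1", mw=278.3, logp=4.5),
    Candidate("EX-2", mw=94.1, logp=1.5),
    Candidate("EX-3", mw=94.1, logp=1.5, high_priority=True),
    Candidate("EX-4", mw=None, logp=None, substance_class="mixture"),
    Candidate("EX-5", mw=650.0, logp=9.2),
]
kept = [c.identifier for c in candidates if passes_initial_filter(c)]
print(kept)  # ['EX-1', 'EX-3']
```

Returning the list of tripped filters, rather than a bare pass/fail, keeps a record of why a chemical was set aside, which is useful when exceptions are later made for high priority chemicals.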
Figure 4. Quality control steps associated with each stage of EPA’s ToxCast library construction and processing, from samples to generic substances to chemical structures and their modeling representations (approximate totals as of October, 2015).
gaps for individual or small groups of chemicals. Hence, each individual chemical-assay data point takes on greater importance in informing applications of the ToxCast data. As a result, a correspondingly higher level of individual chemical quality review than is typical for HTS was instituted, including quality control (QC) steps governing initial procurements and supplier documentation review, analytical chemistry QC of plated and neat samples, and DSSTox chemical curation and structure registration. The main chemical QC steps are reviewed below, with a more detailed discussion contained in “ToxCast Chemicals: Data Management and Quality Considerations Overview” (available at https://www.epa.gov/chemical-research/toxcast-chemicals-data-management-and-quality-considerations-overview).
Additionally, since most of the analytical chemistry QC results generated on EPA’s chemical library, to date, have been performed in the context of the larger Tox21 program (tox21_epa_v1), details of the analytical approaches and results associated with those analyses will be presented in a future Tox21 report. Figure 4 illustrates the various types of chemical and sample representations included in EPA’s ToxCast chemical library, the relative dimensions of the data sets being managed at each level, and the types of QC applied at each step of library construction and processing. EPA specifies that contractor-procured samples be of high purity (>98%) and accompanied by a supplier-provided Certificate of Analysis (COA) and
or CASRN, and 1% have neither) or when the CASRN and name provided are in conflict (i.e., the CASRN and name refer to different substances). In general, the CASRN and name within the COA have been found to be more reliable (specific to the correct hydrate, salt, and stereo level), consistent, and accurate than the information provided in a spreadsheet from the supplier. In cases where COAs (or MSDSs) are unavailable, DSSTox registration must rely on the limited information provided by the supplier. The electronic chemical structure representations (i.e., either SMILES or Structure-Data SD “mol” files) provided by ToxCast chemical suppliers were found to be the least reliable piece of information for sample identification, most often due to incomplete or inaccurate representations of hydrate, salt, and stereo forms. When supplier-provided structures were later compared to DSSTox registered chemical structures for all of EPA’s ToxCast sample library, it was found that 22% of the normalized, canonicalized structure pairs (i.e., conforming to a single standard representation) were different. Approximately half of these mismatches were eliminated when both sets of structures were “desalted” prior to comparison, indicating that a large portion of mismatches resulted from a lack of explicit salt/hydrate representations (or incorrect salt/hydrate representations) in supplier-provided structures. Finally, when desalted formulas were compared, which further eliminated discrepancies that could be attributed to salt-parent, stereo, and geometric isomer differences, 3% of the mismatches remained, indicating a significant remaining level of gross structural errors. These results are consistent with a recently published report quantifying the ambiguity of chemical identifiers in a selection of public databases,35 but to our knowledge, the high degree of inconsistency of chemical supplier-provided information relative to manual curation results had not previously been reported.
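The tiered comparison described above (full structure, then desalted structure, then desalted formula) can be sketched as follows. This is an illustrative outline only. It assumes the three keys for each structure have already been generated by a cheminformatics toolkit, so the `StructureKeys` record, the tier labels, and the example strings are hypothetical placeholders rather than the authors' actual workflow or data.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class StructureKeys(NamedTuple):
    smiles: str             # canonical SMILES of the full structure
    desalted_smiles: str    # canonical SMILES after salt/hydrate removal
    desalted_formula: str   # molecular formula of the desalted structure


def mismatch_tier(supplier: StructureKeys, registered: StructureKeys) -> str:
    """Classify a supplier/registered pair by the first level at which it agrees."""
    if supplier.smiles == registered.smiles:
        return "match"
    if supplier.desalted_smiles == registered.desalted_smiles:
        return "salt/hydrate difference"
    if supplier.desalted_formula == registered.desalted_formula:
        return "isomer-level difference"
    return "gross structural difference"


def tier_counts(pairs: Iterable[Tuple[StructureKeys, StructureKeys]]) -> Counter:
    """Tally how many pairs fall into each tier."""
    return Counter(mismatch_tier(s, r) for s, r in pairs)


# Placeholder pairs; strings are assumed to be already canonicalized.
pairs = [
    (StructureKeys("c1ccccc1", "c1ccccc1", "C6H6"),
     StructureKeys("c1ccccc1", "c1ccccc1", "C6H6")),
    (StructureKeys("CCN", "CCN", "C2H7N"),
     StructureKeys("CCN.Cl", "CCN", "C2H7N")),
    (StructureKeys("CC(N)C(=O)O", "CC(N)C(=O)O", "C3H7NO2"),
     StructureKeys("C[C@H](N)C(=O)O", "C[C@H](N)C(=O)O", "C3H7NO2")),
    (StructureKeys("CCO", "CCO", "C2H6O"),
     StructureKeys("CCCO", "CCCO", "C3H8O")),
]
counts = tier_counts(pairs)
for tier, n in counts.items():
    print(f"{tier}: {n}")
```

Note that a formula-level match cannot distinguish stereo or geometric isomers from constitutional isomers, so the third tier is labeled generically here.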
(Note: These results were based on an analysis of 90% of the current ToxCast sample inventory, as of March, 2014, using canonicalized SMILES strings for both sets of parent and desalted structures generated from the original SD files in ACD/ChemFolder, v2012, Toronto, Canada. For further discussion of the quality of public chemistry databases, see, e.g., Williams and Ekins, and Williams et al.36,37) Once the most reliable CASRN and/or chemical name is established for the procured sample (see “Generic Substances” review step in Figure 4), the DSSTox chemical substance registration process either matches the information to an existing DSSTox substance or registers a new chemical substance through the DSSTox manual curation interface. At this stage, additional quality checks are made according to established DSSTox Chemical Information Quality Review Procedures (ftp://ftp.epa.gov/dsstoxftp/DSSTox_Archive_20150930/) to ensure internal consistency and unique 1:1:1 mappings among the three main substance identifiers: CASRN, chemical name, and structure. [Note: DSSTox quality review procedures have been updated to employ automated structure checks using InChIKeys and to enforce unique 1:1:1 CAS-name-structure mappings, eliminating representative structures that had been previously stored for mixtures.] After DSSTox chemical substance registration is complete, chemical structure files for a registered list (e.g., the ToxCast chemical inventory) are generated. Additionally, processes based on publicly available tools have been implemented to perform additional levels of structure standardization and validation, i.e., structure normalization, as well as desalting, to produce what are termed “QSAR-ready” structure files (see
Material Safety Data Sheet (MSDS), whenever possible. Of the nearly 8800 commercially procured samples/bottles registered at the unique supplier/lot level to date (as of October 2015), nearly 70% were accompanied by a COA that indicated a sample purity. Of these COAs, 76% reported purities of 98% or greater (i.e., analytical grade) and 99% reported purities of 90% or greater. The 1% of compounds with reported COA purities less than 90% were mostly specified as “technical grade” or were dyes or mixtures. Hence, to date, 99% of EPA’s commercially procured samples that are accompanied by a supplier-provided COA are certified at or above 90% purity prior to solubilization and plating. The second level of Figure 4, “Stock Solutions”, provides a rough indication of the number of uniquely barcoded solutions currently available for ToxCast plating. Analytical chemistry QC of EPA’s ToxCast library has been primarily performed at the plated or stock solution level and has included confirmation of parent mass identity (ID) and assessment of purity, stability, and concentration for the majority of tested samples. The bulk of these efforts has been carried out in association with the Tox21 project and includes both liquid chromatography (LC) and follow-up gas chromatography (GC) mass spectrometry (MS) on original Tox21 stock plates (including tox21_epa_v1). Initial, high-level summary QC calls (completed as of May, 2014) for approximately half of EPA’s Tox21 library are available for download at the ToxCast December 2014 Data Release Download Page (see “ToxCast Chemicals DSSTox”, chemicalfiles.zip at https://www3.epa.gov/research/COMPTOX/previously_published.html). Of the chemicals amenable to analysis, more than 90% were judged as having purity greater than 90% (Grade A), with less than 2% of samples having purity less than 50% or failing confirmatory parent mass ID.
A relatively high percentage of ToxCast chemicals (>35%) were not amenable to LC-MS, with more than half of these being relatively low MW compounds (90% purity) had MW values 720 K substances as of 4/1/2016) can be publicly accessed from EPA’s recently released Chemistry Dashboard, available at https://comptox.epa.gov/dashboard. 3.1.3. Calculated Physicochemical Properties and Chemical Descriptors. The publicly available CORINA Symphony Descriptors Community Edition Web service (Molecular Networks GmbH, Erlangen, Germany) (http://www.mn-am.com/services/corinasymphonydescriptors) was used to compute logP (i.e., XlogP) and two additional properties, total polar
(Lhasa Ltd., Leeds, UK) were available by commercial license to EPA through the end of 2012. At that time, the programs were used to process a 2012 version of the DSSTox TOXCST SD file that provided coverage of the majority of EPA’s ToxCast library to the end of Phase II testing, i.e., including inventories ph1_v1, ph2, and a portion of e1k and EPA’s initial Tox21 library. The DSSTox SD file was batch processed in Derek using the 2011i version of the knowledge-base. Predictions were performed for Species:Rat across available toxicity end points, including, e.g., Carcinogenicity, Mutagenicity, Irritation (eye, skin, respiratory, gastrointestinal), Teratogenicity, Hepatotoxicity, Cardiotoxicity, and Neurotoxicity.46 Default settings were used in Meteor processing of the SD file, with Species:Rat, and with all enzymes manually selected and processed in separate runs. 3.2. DSSTox TOXCST and CASRN List Overlaps. The DSSTox TOXCST SD file (Figure 5a) contains chemical identifiers and structures for the ToxCast inventory of unique “generic” substances, each of which have been subjected to some level of ToxCast or Tox21 testing. A generic substance is considered to be the supplier/lot-independent version of the chemical sample that undergoes testing and most often corresponds to the level of annotation provided by CASRN and associated chemical names. The current version of DSSTox annotates to the level of reported stereo (relative, absolute, partial, none), salt, or hydrate form and assigns a unique structure to the substance wherever possible (Note: When a single structure assignment is not possible, e.g., for mixtures, a unique substance ID is assigned, and linkages to one or more unique structure components are created within the database). Structures are uniquely rendered in 2D molfile format, as well as associated with an InChI and InChIKey (http://www.inchitrust.org/about-the-inchi-standard/). 
A summary of the chemical contents of the current (January 2016) TOXCST file is provided in Table 1.
aid in prioritizing chemicals for further testing and regulatory purposes. This CERAPP inventory includes a large fraction of all man-made chemicals to which humans are potentially exposed, spanning chemicals from a variety of use classes, including consumer products, food additives, and human and veterinary drugs. Chemicals were aggregated by CASRN from the following sources: (1) chemicals with documented use from the ACToR CPCat database; (2) the legacy DSSTox Master database structures, consisting of ∼15,000 curated chemical structures from multiple inventories of environmental and toxicological interest, including ToxCast and Tox21 (ftp://ftp.epa.gov/dsstoxftp/DSSTox_Archive_20150930/); (3) the Canadian Domestic Substances list (DSL) consisting of ∼24,000 CASRN substances presumed to be in commerce in Canada (https://ec.gc.ca/lcpe-cepa/); (4) the EDSP Universe, consisting of ∼10,000 chemicals of potential concern for significant exposure and, thus, subject to testing prioritization for endocrine disruption, in particular through ER-mediated pathways (https://www.epa.gov/endocrine-disruption/endocrine-disruptor-screening-program-edsp-universe-chemicals); and (5) a list of ∼15,000 chemicals used as EPI Suite training and test sets.32,40 A structure validation and standardization workflow developed for use in the CERAPP project was also used to process DSSTox structures into “QSAR-ready” form for the present analyses.38 3.1.7. FDA Marketed (and Discontinued) Drugs Structure Inventory (FDA_Drugs). The FDA Marketed (and Discontinued) Drugs structure file (FDA_Drugs) is offered for public download from the Leadscope (Leadscope Inc., Columbus, OH) Web site http://www.leadscope.com/ls-manuals/. The FDA_Drugs SD file includes structures for 7815 marketed drugs, of which 1444 were at some point discontinued, compiled from the following sources: Drugs@FDA (www.accessdata.fda.gov/scripts/cder/drugsatfda); FDA Orange Book (www.fda.gov/cder/ob); National Drug Code Directory (www.fda.gov/cder/ndc); Medical Subject Headings from the National Library of Medicine (www.nlm.nih.gov/mesh).
3.1.8. Benchmark Dose Calculations for Human Health Assessment (BMDHHA) Structure Inventory. The DSSTox BMDHHA structure file includes 901 chemicals whose data were collected for a recent study aimed at standardizing benchmark dose calculations for use in human health risk assessments [personal communication].45 The list consists of environmental chemicals with existing human health assessments mainly extracted from EPA’s Integrated Risk Information System (IRIS, https://www.epa.gov/iris/) and the Agency for Toxic Substances and Disease Registry (ATSDR, www.atsdr.cdc.gov). The DSSTox BMDHHA file used in the present study is available from the DSSTox Data download page (see “ToxCastLandscapeDataFiles.zip” at ftp://ftp.epa.gov/dsstoxftp/). 3.1.9. OECD Toolbox. The Organisation for Economic Co-operation and Development (OECD) Toolbox was developed by the OECD with funding from the European Union Chemicals Agency and is available for free download as a Windows stand-alone application from http://www.oecd.org/chemicalsafety/risk-assessment/theoecdqsartoolbox.htm. The current TOXCST SD file was imported into a downloaded version of the OECD Toolbox (v3.3.5.17), and Toolbox profilers available to the user from within the application were applied. All results were exported as text linked to DSSTox IDs. 3.1.10. Derek Nexus and Meteor (Lhasa Ltd., Leeds, UK). Derek Nexus v.2.0 and Meteor (LPS13) v.13.0.0 for Windows
Table 1. Summary of the Contents of the DSSTox TOXCST File (Date-Stamped 20160129)^a

| content category | count |
| --- | --- |
| generic substances | 4226 |
| structures | 4056 |
| CASRN | 4134 |
| salt or complex | 459 |
| inorganic | 48 |
| organometallic | 110 |
| mixture of stereoisomers | 113 |
| mixture/formulation | 157 |
| polymer | 11 |
| single chemical | 3945 |
| duplicates on desalting | 202 |

^a Available for download (see “ToxCastLandscapeDataFiles.zip” at ftp://ftp.epa.gov/dsstoxftp/).
Recall that a large collection of CASRN listings was cross-referenced to create the nomination list for the Phase II and Tox21 expansion phase of the ToxCast library (Figure 3). A post hoc evaluation of the current ToxCast library thus considers the degree of CASRN coverage, or overlap, achieved across a representative selection of CASRN inventories, many of which contributed to the original CASRN nomination process. Table 2 lists 19 DSSTox and ACToR CASRN inventories of varying size and content, spanning a variety of regulatory, data, and use-categories of potential interest.
Table 2. Representative Set of ACToR, DSSTox, and CPCat Data, Regulatory, and Use-Category CASRN Lists That Contributed Nominations to the Current ToxCast Library^a

| list name | list source | title | original source Web site | TOXCST overlap/total (%) |
| --- | --- | --- | --- | --- |
| DSSTox_CPDBAS | DSSTox (ftp://ftp.epa.gov/dsstoxftp/DSSTox_Archive_20150930) | Carcinogenic Potency Database - All Species (hosted on National Library of Medicine’s ToxNet Web site) | http://toxnet.nlm.nih.gov/cpdb/ | 584/1547 (38%) |
| DSSTox_FDAMDD* | see above | FDA’s Maximum Daily Dose inventory, ToxCast Failed Drugs | ftp://ftp.epa.gov/dsstoxftp/DSSTox_Archive_20150930/ | 193/1216 (16%) |
| DSSTox_HPVCSI | see above | EPA’s High Production Volume Chemicals inventory | ftp://ftp.epa.gov/dsstoxftp/DSSTox_Archive_20150930/ | 874/3548 (25%) |
| DSSTox_IRISTR | see above | EPA’s Integrated Risk Information System inventory | https://www.epa.gov/iris | 312/544 (57%) |
| DSSTox_NTPBSI | see above | NTP’s Bioassay Database inventory (moved to Chemical Effects in Biological Systems - CEBS Database) | http://www.niehs.nih.gov/research/resources/databases/cebs/index.cfm | 953/2326 (41%) |
| DSSTox_TOXREF | see above | EPA’s Toxicity Reference Database (ToxRefDB) | https://www.epa.gov/chemical-research/toxicity-forecaster-toxcasttm-data | 898/1172 (77%) |
| ACToR:EPA_IUR_2002, 2006 | ACToR (https://actor.epa.gov/) | EPA Inventory Update Rule Lists (2002, 2006) | https://www.epa.gov/chemical-data-reporting | 2202/6200 (36%) |
| ACToR:FDA EAFUS | see above | FDA Everything Added to Food in the United States | http://www.fda.gov/Food/IngredientsPackagingLabeling/ucm115326.htm | 695/3969 (18%) |
| ACToR:FDA GRAS | see above | FDA Generally Recognized as Safe Chemicals | http://www.fda.gov/Food/IngredientsPackagingLabeling/GRAS/ | 61/355 (17%) |
| ACToR:NHANES 2001-2, IV | see above | National Health and Nutrition Examination Survey | http://www.cdc.gov/nchs/nhanes.htm | 83/274 (30%) |
| UseDB_16:Industrial | ACToR:CPCat (https://actor.epa.gov/cpcat/) | ACToR Chemical Product Category (CPCat) database | https://actor.epa.gov/cpcat/faces/download.xhtml | 1621 |
| UseDB_17:Consumer Use | see above | | | 780 |
| UseDB_18:Antimicrobial | see above | | | 148 |
| UseDB_20:Colorant | see above | | | 195 |
| UseDB_23:Fragrance | see above | | | 199 |
| UseDB_25:Personal Care | see above | | | 406 |
| UseDB_26:Pesticide | see above | | | 997 |
| UseDB_28:Pharmaceutical | see above | | | 701 |
| UseDB_29:Inert | see above | | | 671 |

^a For further details, see the CASRN overlap table available for download from the DSSTox Data download page (see “ToxCastLandscapeDataFiles.zip” at ftp://ftp.epa.gov/dsstoxftp/).
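The "overlap/total (%)" entries in Table 2 follow from exact matching of CASRN strings between the ToxCast library and each inventory. A minimal sketch of that calculation is shown below. The function names are hypothetical and the small example sets are made up for illustration; only the final formatting check uses a value reported in the table (898/1172 for DSSTox_TOXREF).

```python
from typing import Iterable, Tuple


def list_coverage(library: Iterable[str], inventory: Iterable[str]) -> Tuple[int, int, int]:
    """Return (overlap, inventory size, rounded percent of the inventory covered)."""
    inventory_set = set(inventory)          # de-duplicate the inventory CASRNs
    overlap = len(inventory_set & set(library))
    total = len(inventory_set)
    percent = round(100 * overlap / total) if total else 0
    return overlap, total, percent


def format_coverage(overlap: int, total: int, percent: int) -> str:
    """Format a coverage triple in the style used in Table 2."""
    return f"{overlap}/{total} ({percent}%)"


# Made-up example: three library CASRNs against a four-member inventory.
library = {"108-95-2", "58-08-2", "65-85-0"}
inventory = ["58-08-2", "65-85-0", "84-74-2", "57-55-6"]
result = list_coverage(library, inventory)
print(format_coverage(*result))  # 2/4 (50%)

# Reported TOXREF counts reproduce the tabulated percentage.
print(format_coverage(898, 1172, round(100 * 898 / 1172)))  # 898/1172 (77%)
```

Because the comparison is a plain set intersection on identifier strings, it inherits any CASRN errors present in the source inventories and does not recognize related forms (salts, hydrates, stereoisomers) registered under different CASRNs.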
number of lists (out of the 19) containing each chemical, with 146 chemicals occurring on 10 or more lists. The latter set includes phenol (108-95-2), propylene glycol (57-55-6), caffeine (58-08-2), benzoic acid (65-85-0), benzyl alcohol (100-51-6), 2-hydroxybiphenyl (90-43-7), and dibutyl phthalate (84-74-2). Additional trends apparent in both Figures 6 and 7 include (1) a high concentration of TOXREF chemicals and pesticides tested in Phase I; (2) a more dense coverage of lists in the ph2 inventory (versus ph1), as was the objective of Phase II, with this trend continuing through e1k and tapering off into the ph3 inventory; and (3) a relatively dense coverage of high production volume (IUR) and industrial chemicals throughout the ToxCast inventories but particularly in Phases II and III. Interestingly, even chemicals classified as drugs are reasonably well represented in the ph2 and e1k inventories. Some drugs were deliberately included, such as those serving as reference compounds and the donated failed drugs. In addition, a number of chemicals (e.g., fluconazole, caffeine) are “multipurpose”, spanning a variety of use cases, including use as drugs. Thus, the total number of CASRN-list combinations in Figure 7 far exceeds the total number of CASRN in TOXCST, indicating that an appreciable number of TOXCST chemicals span multiple use and data categories. Limiting counts to the 19 lists in Table 2 and Figures 6 and 7, each TOXCST chemical appears on an average of three lists, whereas chemicals added in Phase II, where multilist occurrence was a key factor in prioritization, appear on an average of 4.5 lists. Representation of chemicals on multiple CASRN inventories was a key means of prioritizing chemicals for inclusion in the ToxCast library and is a distinguishing feature of this library. 
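The per-chemical list frequency plotted at the top of Figure 7, and the average number of lists per chemical quoted above, amount to counting list memberships for each library CASRN. The sketch below illustrates that bookkeeping under stated assumptions: the function names are hypothetical, and the list names and memberships are invented for the example rather than taken from the actual inventories.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Set


def list_counts(library: Set[str], inventories: Mapping[str, Set[str]]) -> Counter:
    """Count, for each library CASRN, how many inventories contain it."""
    counts = Counter({casrn: 0 for casrn in library})   # keep zero-count members
    for members in inventories.values():
        for casrn in members & library:
            counts[casrn] += 1
    return counts


def mean_lists_per_chemical(counts: Counter) -> float:
    """Average number of lists on which a library chemical appears."""
    return sum(counts.values()) / len(counts)


def on_at_least(counts: Counter, n: int) -> List[str]:
    """Library CASRNs that appear on n or more lists."""
    return sorted(casrn for casrn, k in counts.items() if k >= n)


# Invented memberships for three library chemicals across three lists.
library = {"108-95-2", "58-08-2", "84-74-2"}
inventories = {
    "list_A": {"108-95-2", "58-08-2"},
    "list_B": {"108-95-2", "57-55-6"},
    "list_C": {"108-95-2", "58-08-2"},
}
counts = list_counts(library, inventories)
print(counts["108-95-2"], counts["58-08-2"], counts["84-74-2"])  # 3 2 0
print(on_at_least(counts, 2))  # ['108-95-2', '58-08-2']
```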
This not only services a broad range of EPA program offices and non-EPA stakeholders but also includes chemicals with potentially broader data coverage, spanning different information domains. For instance, Figures 6 and 7 indicate broad coverage of regulated pesticides (TOXREF, UseDB_26:Pesticide), drugs (DSSTox_FDAMDD, UseDB_28:Pharmaceutical), and food additives and cosmetics (FDA_EAFUS), which together constitute a more diverse landscape of chemicals accompanied by guideline animal toxicity studies than each would separately. Similarly, the availability of chemical data sets in which both exposure-use data and toxicity data are present is a key component in building more rapid exposure-hazard assessments using HTS data.47 The above inventory comparisons and frequency values were compiled at the generic substance level by exact matching of CASRN across DSSTox and ACToR inventories. Such comparisons rely upon the veracity of CASRN-substance assignments provided in public inventory listings, do not require chemical structure annotations, and therefore do not consider closely related chemicals (i.e., compounds that are structurally similar or differ only by hydrate, salt, or stereo form but are assigned different CASRNs). Relying upon a robust substance-structure (1:1:1 CASRN-name-structure) curation workflow, such as has been implemented in the DSSTox registration process, reduces common errors associated with CASRN-substance-structure assignments in the public domain (e.g., deleted and invalid CASRN, and incorrect assignments of CASRN to substance creating duplicate mappings within a database). Once accurate CASRN linkages to structure are established, one can more confidently move to the chemistry realm and begin to assess and compare CASRN inventories based on chemical structures and their generalized representations.
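Two of the error types mentioned above, invalid CASRNs and duplicate mappings, lend themselves to simple automated checks. The sketch below is illustrative and is not the DSSTox curation code. The first function applies the standard CAS Registry Number check-digit rule (digits preceding the check digit are weighted 1, 2, 3, ... from right to left, and the sum modulo 10 must equal the check digit). The second flags any CASRN, name, or structure key that is linked to more than one distinct record, which would violate a 1:1:1 mapping. The function names, the record layout, and the structure keys ("KEY-A", etc.) are hypothetical.

```python
import re
from collections import defaultdict
from typing import Iterable, List, Tuple

_CASRN_PATTERN = re.compile(r"^([0-9]{2,7})-([0-9]{2})-([0-9])$")


def is_valid_casrn(casrn: str) -> bool:
    """Check CASRN format and check digit."""
    match = _CASRN_PATTERN.match(casrn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost body digit.
    weighted = sum(i * int(d) for i, d in enumerate(reversed(body), start=1))
    return weighted % 10 == check_digit


def one_to_one_violations(
    records: Iterable[Tuple[str, str, str]]
) -> List[Tuple[str, str]]:
    """Return (field, value) pairs linked to more than one distinct record.

    Each record is a (casrn, name, structure_key) tuple.
    """
    fields = ("casrn", "name", "structure")
    index = {f: defaultdict(set) for f in fields}
    for record in records:
        for f, value in zip(fields, record):
            index[f][value].add(tuple(record))
    return sorted(
        (f, value)
        for f in fields
        for value, linked in index[f].items()
        if len(linked) > 1
    )


print(is_valid_casrn("58-08-2"))   # True  (caffeine, from the text)
print(is_valid_casrn("58-08-3"))   # False (wrong check digit)

# Hypothetical records: the second CASRN is assigned to two substances.
records = [
    ("58-08-2", "caffeine", "KEY-A"),
    ("108-95-2", "phenol", "KEY-B"),
    ("108-95-2", "conflicting name (hypothetical)", "KEY-C"),
]
print(one_to_one_violations(records))  # [('casrn', '108-95-2')]
```

A check-digit test catches transcription errors but not deleted or reassigned CASRNs, which still require lookup against a curated registry.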
Figure 6 plots the total number of CASRN overlaps from each of the 19 lists in Table 2 with CASRNs contained in the
Figure 6. Total number of ToxCast library chemicals (CASRNs) included in a representative selection of DSSTox, ACToR, and CPCat inventories (see Table 2) indicating overlaps for inventories added in ToxCast testing Phases I (gray), II (dark green), and III (light green), with the corresponding total percentage coverage of a list by ToxCast indicated in red text. ACToR:FDA_EAFUS inventory represents FDA “safety approved” chemicals in food and cosmetics and was considered high priority for inclusion in ToxCast based on the public availability of in vivo data. DSSTox_FDAMDD CASRN inventory was combined with the set of 136 donated_pharma chemicals in TOXCST for the purposes of this figure, but the 16% total list coverage refers only to the TOXCST coverage of DSSTox_FDAMDD.
TOXCST file, indicating totals for each successive testing phase expansion of the ToxCast library (from Phase I to Phase II to Phase III). Lists are approximately grouped in this figure according to the type of chemicals and data associated with the list. Figure 6 shows that the library expansion in Phases II and III includes larger numbers and types of chemicals than in Phase I and spans a broader array of industrial, use, and exposure domains. The high priority nomination inventories containing chemicals with in vivo data (e.g., TOXREF) or accompanied by risk assessments (e.g., IRISTR) have the largest total percentage representation within TOXCST, i.e., 77% of TOXREF and 57% of IRISTR CASRNs are included in TOXCST. Additionally, categories of chemicals that are most highly represented in TOXCST include pesticides/antimicrobials (nearly 1000 CASRNs), exposure (more than 2100), and industrial/high production volume (HPV) (more than 1500). Of note is that TOXCST includes only 30% of NHANES chemicals (http://www.cdc.gov/biomonitoring/pdf/FourthReport_UpdatedTables_Feb2015.pdf). Although these were high priority for inclusion since they are associated with human exposure data, NHANES includes a significant number of metals, urinary metabolites, PCBs, bromodiphenyl ethers (BDEs), and haloalkanes that were either unsuitable for HTS testing (volatile, insoluble) or difficult to procure. Figure 7 provides a finer grained representation of the incidence of each ToxCast chemical across the same 19 CASRN inventories listed in Table 2 and Figure 6. The frequency plot at the top of the figure indicates the total
Figure 7. Heat map of ToxCast library CASRN chemicals across various DSSTox, ACToR, and CPCat inventories. Total number of lists containing each chemical as a member is plotted as frequency directly above the heat map plot. Chemicals sorted by the same list, inventory, and testing phase progression (Phases I, II, and III) as in Figure 2. CASRN inventories grouped as in Figure 6 are differentiated by colored ribbons (red, in vivo data; brown, industrial or high-production; light blue, exposure data; dark blue, drugs; purple, pesticides; light brown, risk assessment; green, FDA, presumed safe; medium blue and orange, miscellaneous use categories). The DSSTox_FDAMDD CASRN inventory was combined with the set of 136 donated_pharma chemicals in TOXCST for the purposes of this figure and are clustered at the end of the ph2 set (with a corresponding enriched representation seen within TOXREF).
3.3. Structure-Based Inventory Profiling. With the TOXCST structure inventory established, it becomes possible to use chemical structure-based representations to profile and assess chemical coverage and diversity of TOXCST with respect to other chemical inventories of interest. The term “coverage” refers to the ability of a chemical library to span the chemical features and properties of the targeted chemical space of interest, as well as to provide local coverage, or density of similar chemicals sharing features of chemicals or chemical classes of interest. The term “chemical diversity”, in contrast, usually refers to the extent to which a library samples a broad range of chemical features and properties, thus approximating the diversity of a theoretical chemical universe or, more practically, a smaller, targeted chemical space of interest. It follows that the concepts of coverage and diversity mean little without a point of reference or appropriate context. In this section, various structure-based comparisons of TOXCST with potential target inventories are presented in an attempt to provide relative metrics with which to assess both the scope and suitability of the TOXCST inventory to meeting program objectives. In section 3.3.1, TOXCST inventory chemicals, which have each undergone some level of HTS testing, are compared to chemicals in EPA’s procured inventory that were either insoluble in DMSO or volatile to help define the physicochemical boundaries of ToxCast “HTS testability” (Figure 5c). Section 3.3.2 provides examples that illustrate the use of ToxPrints and chemical features for both global and local inventory profiling of TOXCST relative to potential target inventories (CERAPP, FDA_Drugs) (Figure 5d). In section
3.3.3, physicochemical properties are used for global inventory comparisons of TOXCST to three potential target inventories (CERAPP, FDA_Drugs, and BMDHHA) (Figure 5e). Finally, in section 3.3.4, global inventory comparisons of the same three potential target inventories are made based on chemical similarity, i.e., nearest-neighbor calculations (Figure 5f). 3.3.1. DMSO Insolubles and Volatiles. As previously mentioned, a small set of physicochemical property filters, including logP and MW, was applied to prune the initial ACToR 19,000 nomination list to produce the smaller 7,000 chemical list submitted for procurement to supply EPA’s Tox21 and Phase II programs (see Figure 3). The goal was to eliminate chemicals likely to be unsuitable or problematic for HTS testing, such as mixtures, inorganics, volatiles, and DMSO insolubles. Property filters were approximate, structures were not available for all chemicals, and exceptions were made for some high priority chemicals. Hence, a set of chemicals was procured and processed for inclusion in the ToxCast testing library that was later determined to be volatile or either insoluble in DMSO or of limited solubility in DMSO based on achieving a maximum solubility of 10 mM or less. (Note: An experimental label of “volatile” was assigned when procured neat samples, typically 100 mg or less stored at −20 °C in standard screw-top sample vials, at a later time point were labeled as “empty on reweigh”. Despite this label, solubilization in DMSO can sometimes serve to inhibit volatilization.) Although contained in EPA’s physical sample inventory, the majority of these problematic chemicals were not submitted for testing and, therefore, are not included in the TOXCST chemical inventory file. However, closer examination of these
DOI: 10.1021/acs.chemrestox.6b00135 Chem. Res. Toxicol. XXXX, XXX, XXX−XXX
Chemical Research in Toxicology
Perspective
Figure 8. Distribution of DMSO insoluble chemicals (a) versus TOXCST tested chemicals (b) as a function of logP (log octanol−water partition coefficient), and chemicals labeled volatile (c) versus TOXCST tested chemicals (d) as a function of molecular weight (g/mol). DMSO insolubles are defined as insoluble in DMSO at 5 mM or less, gray region of (a), or are of limited solubility in DMSO at 5−10 mM and were submitted for ToxCast testing, blue region of (a). LogP (XlogP) was calculated for 505 out of 558 DMSO insolubles and 4031 out of 4056 TOXCST structures using the CORINA Symphony Descriptors Community Edition Web service on 1/15/2016 (Molecular Networks GmbH, Erlangen, Germany; see section 3.1.3 for details). Volatiles consist of chemicals with evidence of volatility when stored frozen in neat form, some of which were solubilized in DMSO immediately on procurement and submitted for testing, blue region of (c), and another portion which was not submitted for ToxCast testing, gray region of (c). Bracketed region in (a) and (b) contrasts the percentages of chemicals within the logP range [1 to 7], whereas the bracketed region in (c) and (d) contrasts the percentages of chemicals in the low molecular weight range of 0−140 g/mol.
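The bracketed property ranges in Figure 8 can be read as a simple screening heuristic. The short Python sketch below is illustrative only: the function name, the record layout, and the example values are hypothetical and are not part of the ToxCast workflow; only the logP range [1 to 7] and the 140 g/mol cutoff are taken from the figure.

```python
from typing import List, Optional

LOGP_RANGE = (1.0, 7.0)  # middle logP range bracketed in Figure 8a,b
MW_CUTOFF = 140.0        # g/mol; upper bound of the low-MW range in Figure 8c,d


def hts_flags(logp: Optional[float], mw: Optional[float]) -> List[str]:
    """Return heuristic flags for possible DMSO solubility or volatility problems.

    None means the property could not be calculated (e.g., a mixture
    with no defined structure); this case is flagged separately.
    """
    flags = []
    if logp is None or mw is None:
        flags.append("no_calculated_property")
    if logp is not None and not (LOGP_RANGE[0] <= logp <= LOGP_RANGE[1]):
        flags.append("logP_outside_1_to_7")
    if mw is not None and mw < MW_CUTOFF:
        flags.append("MW_below_140")
    return flags


# Hypothetical records: (label, calculated logP, MW in g/mol)
examples = [
    ("chemical_A", 3.2, 310.4),
    ("chemical_B", -1.5, 92.1),
    ("mixture_C", None, None),
]
for label, logp, mw in examples:
    print(label, hts_flags(logp, mw))
```

A flag here indicates only a greater likelihood of an HTS suitability problem; as the distributions in Figure 8 show, many chemicals outside these ranges were nonetheless tested.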
without excluding the overlapping “limited solubility” chemicals (indicated in blue in Figure 8a). (Note: DSSTox listings of substances labeled as DMSO insolubles and volatiles that were procured for inclusion into ToxCast, including those with structures represented in Figures 8a and c, are available from the DSSTox Data download page (see “ToxCastLandscapeDataFiles.zip” at ftp://ftp.epa.gov/dsstoxftp/); chemicals within TOXCST, having limited solubility or that were later determined to be volatile, are also annotated as such within the corresponding DSSTox TOXCST file.) Similarly, Figure 8c and d compares the MW distribution of chemicals labeled as “volatiles” versus the TOXCST inventory as a whole. In this case, 130 unique chemicals were labeled as “volatile”, and 44 of these were included in TOXCST and underwent some degree of testing (indicated in blue in Figure 8c). Figure 8c and d shows the majority of volatiles (72%) having MW less than 140 g/mol as compared to TOXCST (18%). Hence, it can be reasonably concluded that a chemical that has one of the following properties is more likely to have DMSO solubility or volatility problems: (1) logP outside range of [1 to 7]; (2) organics with MW < 140 g/mol; (3) complex mixtures (e.g., oils); (4) metal-containing compounds; and (5) metals and inorganics. In addition to analytical chemistry QC results indicative of sample purity, ID, and concentration problems, these are the sorts of chemicals that should warrant further scrutiny of associated ToxCast assay results. Additionally, chemicals falling outside of these property boundaries are less likely to be suitable for or undergo HTS testing. Therefore, predictive models based on ToxCast data are less likely to apply to such chemicals. (Note: At the time of this writing, a subset of
chemicals can help to delineate the HTS boundaries of the TOXCST inventory. In particular, based on experimental confirmations of volatility or DMSO insolubility, it is possible to evaluate how useful logP and MW property filters are in anticipating which chemicals are less suitable for HTS. There are a total of 558 unique chemicals marked as “insoluble” within EPA’s current physical sample library relative to a target stock concentration of 20 mM. This set includes 101 “limited solubility” chemicals, 84 of which were either included in ToxCast testing at lower concentrations of 5−10 mM or had a solubility status that changed over time. Of the 558 DMSO insolubles, 47 (8%) are mixtures and assigned “no structure” within the DSSTox database, compared to a total of 135 (3%) mixtures (excluding EPA environmental samples) in the full TOXCST inventory. These insolubles include polymers (e.g., sodium alginate, CASRN 9005-38-3), oils (e.g., cedarwood oil, CASRN 8000-27-9), and ill-defined substances (e.g., xylenes, CASRN 1330-20-7). Additionally, a significantly higher proportion of metal-containing compounds (22% vs 8%) and inorganics (14% vs 2%) are in the DMSO insolubles set versus the full TOXCST inventory. Figure 8 plots the distribution of chemicals as a function of calculated logP (Figure 8a,b) and MW (Figure 8c,d) for the full TOXCST inventory versus each of the two HTS problem inventories: DMSO insolubles (Figure 8a) or volatiles (Figure 8c). The distribution of calculated logP values for the set of DMSO insolubles (Figure 8a) appears much broader than that for the TOXCST chemicals (Figure 8b), with a larger percentage of TOXCST chemicals (89% versus 50% for the insolubles) falling in the middle logP range of [1 to 7], even
TOXCST chemicals (93%) contain 5 or more ToxPrint features, which is another indicator of the suitability of ToxPrints for profiling the TOXCST inventory chemicals (i.e., most chemicals contain multiple ToxPrints). Figure 10 provides another profiling view, listing a representative subset of ToxPrint chemical features and the total incidence of each feature in chemicals tested in ToxCast Phase I or in chemicals added in testing Phases II and III. This view illustrates the diverse nature of the named (and drawn) ToxPrint features, spanning bond types, chains, and larger scaffolds. The plot shows that new ToxPrints were introduced as testing expanded into Phases II and III (e.g., steroids, Si-containing organics, and azo bonds). In addition, the incidence of most of these ToxPrints increased significantly as the ToxCast library was expanded into testing Phases II and III. In the next example, TOXCST is compared to two larger inventories: CERAPP and FDA_Drugs. Each represents a possible target for application of prediction models derived from ToxCast HTS data. CERAPP, spanning more than 32,000 chemical structures, was created by consolidating multiple ACToR and DSSTox inventories (see section 3.1.7 for details) and was intended to serve as a surrogate for the EDSP testing universe or, more generally, for the “environmental-exposure” universe of interest to EPA. The FDA_Drugs structure inventory (Leadscope, Columbus, OH; http://www.leadscope.com/ls-manuals/), in contrast, is an approximate representation of “drug space”, consisting of 6703 unique structures (see section 3.1.8 for details). CERAPP includes the majority of TOXCST chemical structures (3808 or 12% of the CERAPP total), by design, whereas a smaller proportion (8%) of FDA_Drugs structures are also present in TOXCST. 
Figure 11 compares the three inventories based on the same representative sampling of 22 ToxPrint features as shown in Figure 10, with FDA_Drugs and CERAPP inventory counts proportionately scaled to the size of TOXCST to facilitate the comparison. Overall, the three profiles appear similar. On closer examination, however, FDA_Drugs is seen to include significantly higher representation of some chemotypes, such as the pyridine (ring:hetero_[6]_N_pyridine_generic) and aromatic ether (bond:COC_ether_aliphatic_aromatic) chemotypes shown, and little to no representation of others, such as organosilanes (bond:metal_metalloid_Si_organo) and azo bonds (bond:NN_azo_generic). TOXCST more closely represents the feature profile of the full CERAPP inventory, as is intended and expected. Even so, a few chemotypes in Figure 11 appear to be proportionately under-represented in TOXCST as compared to CERAPP, e.g., the pyridine and azo chemotypes. An analysis of total ToxPrint coverage, similar to that shown in Figure 9 for TOXCST, can be carried out for the larger CERAPP inventory. If the 126 ToxPrints associated with metals, metallic bonds, and inorganics, which were largely excluded from CERAPP by design, are removed from the inventory comparison, 94% of the remaining 603 ToxPrints are present in CERAPP chemicals, as compared to 83% present in TOXCST chemicals. The 11% missing ToxPrints are of interest as they can highlight regions of potentially important local chemistry in the environmental-exposure landscape that are under-represented in TOXCST. Figure 12 lists examples of ToxPrints not present in TOXCST, along with a sample of the CERAPP chemicals containing those chemotypes. Note that the first set of haloketones are likely to be volatile, whereas many dioxins and PAHs are difficult to commercially procure.
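The coverage comparison described above reduces to set operations on the features present in each inventory. The following Python sketch is a minimal illustration under stated assumptions: the function name, the feature labels, and the toy inventories are hypothetical placeholders, not actual ToxPrint assignments, and the sketch assumes that a chemical-to-feature mapping has already been generated by a separate tool.

```python
from typing import Dict, Iterable, Set, Tuple


def feature_coverage(
    inventory: Dict[str, Set[str]],
    all_features: Iterable[str],
    excluded: Iterable[str] = (),
) -> Tuple[float, Set[str]]:
    """Return (fraction of considered features present, set of present features).

    inventory maps a chemical identifier to the set of features it contains.
    Features listed in `excluded` (e.g., metal and inorganic features) are
    dropped from both the numerator and the denominator.
    """
    considered = set(all_features) - set(excluded)
    present = set().union(*inventory.values()) & considered
    return len(present) / len(considered), present


# Hypothetical feature set and toy inventories (placeholders only)
ALL_FEATURES = {"feat_1", "feat_2", "feat_3", "feat_4", "feat_5", "feat_6"}
EXCLUDED = {"feat_6"}  # stands in for the metal/inorganic feature class

small_inventory = {
    "chem_1": {"feat_1", "feat_2"},
    "chem_2": {"feat_2", "feat_3"},
}
large_inventory = {
    "chem_3": {"feat_1", "feat_2", "feat_3"},
    "chem_4": {"feat_4"},
    "chem_5": {"feat_6"},
}

cov_small, present_small = feature_coverage(small_inventory, ALL_FEATURES, EXCLUDED)
cov_large, present_large = feature_coverage(large_inventory, ALL_FEATURES, EXCLUDED)

# Features found in the larger inventory but absent from the smaller one
missing = present_large - present_small
print(cov_small, cov_large, sorted(missing))
```

In this toy case, the features in `missing` play the role of the ToxPrints present in CERAPP but not in TOXCST, i.e., the candidates for identifying under-represented local chemistry.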
approximately 110 ToxCast DMSO insolubles, mostly consisting of defined organics, are being evaluated for water solubility and possible inclusion on a new water plate to be screened in ToxCast assays.) 3.3.2. Global and Local Inventory Profiling Based on Chemical Features. Various global and local substructural feature-based approaches can be used to profile and compare TOXCST to chemical structure inventories containing different numbers and types of structures (Figure 5d). ToxPrint chemotypes enable both global inventory comparisons, such as when the complete set of chemotypes (or a representative subset) are used, as well as local chemistry comparisons across inventories, such as when a chemical class or category is defined by one or a small number of chemotypes. Several examples are presented below to illustrate the use of feature profiling. As mentioned previously, ToxPrints were built to provide good coverage of environmental, industrial, and commercial chemicals, as well as to capture known toxicity alerts and features associated with mammalian biotransformation.44 ToxPrints include organic chemistry building blocks (e.g., alkyl and aromatic groups, functional groups, elemental, and generic metals), toxicologically important chemical scaffolds (e.g., steroidal backbone, bicyclics, polycyclic aromatic hydrocarbons, polychlorinated biphenyls, PCBs, etc.), and additionally include sets of known toxicity structural “alerting” features (Ashby carcinogenicity alerts48 and “threshold of toxicological concern” TTC alerts49). Despite the inclusion of toxicity structural alerts, it should be emphasized that ToxPrints were designed to describe aspects of both toxic and nontoxic chemicals of the sort that are broadly represented in diverse EPA and FDA chemical inventories. As such, the full set of 729 ToxPrints offers a condensed representation of the diverse environmental-exposure chemical landscape consisting of tens of thousands of chemicals. 
Hence, the degree to which the 729 ToxPrints are represented within the ToxCast inventory can provide a rough indication of chemical diversity and coverage of that inventory. Figure 9 illustrates these points, with a large proportion of the diverse ToxPrint features (77%) represented within TOXCST. (Note: This percentage increases to 83% if the 126 ToxPrints associated with metal atoms, metallic bonds, and inorganics are excluded.) In addition, a large fraction of
Figure 9. Projection of ToxPrints (blue) onto the TOXCST chemical structure inventory (green), indicating coverage in both directions (% of ToxPrints used and % of TOXCST chemicals containing ToxPrints), as well as a frequency plot (orange) of the number of distinct ToxPrints present in each chemical in TOXCST.
Figure 10. Comparison of the number of TOXCST chemicals containing a representative set of ToxPrints, including bond, chain, and ring types, across ToxCast testing phase expansions (Phase I to Phase II to Phase III). ToxPrint chemotypes are uniquely defined and adhere to a hierarchical naming convention, with bond, chain, and ring serving as top-level categories, and subsequent levels indicated sequentially, as in “bond:COC_ether_aromatic”, bond (level 1), COC (level 2), ether (level 3), and aromatic (level 4). ToxPrint ring chemotype images use clouded gray bonds to represent aromatic bonds and dashed bonds to represent potential aromaticity. Phase I (gray bars) represents chemicals tested in Phase I, whereas Phase II (dark green) and Phase III (light green) bars represent chemicals added to the overall testing inventory in each of those Phases, i.e., ph2 and e1k in Phase II and ph3 in Phase III.
Figure 11. Comparison of proportional ToxPrint coverage (scaled to size of TOXCST inventory) in TOXCST (green) vs CERAPP (light purple) vs FDA_Drugs (blue) for a subset of 22 ToxPrint bond, chain, and ring types. FDA_Drugs and CERAPP total inventory counts are proportionately scaled to the size of TOXCST (i.e., 7815 FDA_Drugs/4226 TOXCST/32468 CERAPP scales to TOXCST as 0.54:1:0.13), such that equal height bars would mean equal proportions of chemotypes across two inventories. ToxPrint chemotype images use fuzzy gray bonds to represent aromatic bonds and dashed bonds to represent partial aromaticity.
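The proportional scaling used in Figure 11 is a simple ratio of inventory sizes. The Python sketch below reproduces the scaling factors quoted in the caption from the inventory totals given there; the function name and the example raw count are hypothetical and serve only to illustrate the arithmetic.

```python
# Inventory totals as quoted in the Figure 11 caption
INVENTORY_SIZES = {"TOXCST": 4226, "FDA_Drugs": 7815, "CERAPP": 32468}


def scale_to_reference(count, inventory, reference="TOXCST", sizes=INVENTORY_SIZES):
    """Scale a raw chemotype count in `inventory` to the size of `reference`."""
    return count * sizes[reference] / sizes[inventory]


# Scaling factor for each inventory relative to TOXCST
factors = {
    name: round(INVENTORY_SIZES["TOXCST"] / size, 2)
    for name, size in INVENTORY_SIZES.items()
}
print(factors)  # {'TOXCST': 1.0, 'FDA_Drugs': 0.54, 'CERAPP': 0.13}

# Hypothetical example: a chemotype found in 1000 CERAPP chemicals
print(round(scale_to_reference(1000, "CERAPP"), 1))  # 130.2
```

After scaling, bars of equal height correspond to equal proportions of a chemotype in the two inventories, regardless of their absolute sizes.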
Additionally, referring to Figure 8, it is noteworthy that 14% of CERAPP chemicals have low MW (0.75) to its nearest neighbor TOXCST analogue. In contrast, only 58% of BMDHHA chemicals (not in TOXCST) exceed 0.75 similarity to the nearest TOXCST analogue, which is consistent with the diverse property spread of this inventory observed in Figure 14a. The bottom plot, Figure 15c, indicates a lower overall similarity of FDA_Drugs chemicals to TOXCST chemicals than is observed for CERAPP, consistent with Figures 11 and 14b. Even so, a relatively large percentage, 61% of the drugs, exceeds 0.75 similarity to the nearest TOXCST chemical. If the similarity threshold is increased to 0.85, however, the difference
global descriptors selected for this example include logP (related to lipophilicity and membrane permeability), log of the Total Polar Surface Area (TPSA) (related to electronic, dipole characteristics), and log of Complexity (related to molecule size, bulk, and degree of branching). The plots convey the relative sizes of the various inventories as well as some important trends and distinctions in property space coverage across the inventories. First, a high degree of TOXCST overlap occurs in the central region of both plots but with less TOXCST chemical representation in the outer regions. The TOXCST inventory also appears to well represent the overall shape and density of the much larger CERAPP chemical distribution in Figure 14a. However, TOXCST appears more distinct from that of the FDA_Drugs inventory in Figure 14b, with the latter extending more prominently toward higher TPSA and Complexity values, and the TOXCST inventory more concentrated in the lower TPSA and logP regions. These trends are a reflection of the generally smaller MW of environmental chemicals versus drugs and are consistent with the known bias of drug structures toward ready bioavailability, as captured in the well-known “Rule of 5” constraints.50 Interestingly, although the major portion of the relatively small BMDHHA inventory in Figure 14a is embedded in the center of the plot (and thus obscured from view), this inventory appears unusually diverse in property space, extending to the outer regions in each dimension of the plot. Conversely, the Donated_pharma subset of TOXCST appears less diverse and representative of drug space, as a whole, in part due to the small size of this inventory. 3.3.4. “Nearest Neighbor” Similarity Inventory Profiling. 
In the final example in this section, the general concept of “structure-similarity” is employed to compute a similarity score for each chemical in the CERAPP, BMDHHA, and FDA_Drugs inventories to the most similar (i.e., nearest neighbor (NN)) chemical identified in the TOXCST inventory. In this case, structure similarity was computed using a publicly available fingerprinting method (i.e., a Tanimoto similarity calculation based on PubChem fingerprints (https://pubchem.
and mechanisms of activity for selected adverse outcomes.4,9,12−15 However, for general toxicity screening and testing prioritization applications, any type of toxicity that might trigger a regulatory concern is of potential interest. If the ToxCast HTS data set is to lend itself most productively to such a broad-based screening objective, then both the chemical and bioassay coverage must attempt to span the full range of potential toxicities and toxicity mechanisms of interest for regulatory screening of environmental chemicals and should include coverage of both toxic and nontoxic chemicals. In addition, the question of xenobiotic biotransformation (or lack thereof in most of the current ToxCast in vitro bioassays) and its relation to toxicity continues to be one of the most pressing challenges facing the ToxCast program moving forward. In the final few examples presented below, the topics of chemical-toxicity coverage and metabolic potential are considered by turning to available expert systems that have compiled structural alerting features for biotransformation and toxicity end point prediction and screening (Figure 5g). 3.4.1. OECD Toolbox Profiling: HESS and DART. The publicly available OECD Toolbox is designed to assist users in the development, evaluation, justification, and documentation of chemical categories (http://www.oecd.org/chemicalsafety/risk-assessment/theoecdqsartoolbox.htm) and to provide support for “read-across” chemical safety assessments within those categories following recent OECD grouping guidance.51 The OECD Toolbox contains a number of chemical profiling schemes for comparing substances on the basis of their mechanistic similarity in relation to various types of toxicity. 
Some of the profilers consist only of broad categories, whereas others contain end point-specific rule-bases that comprise the same types of structural alerts as are contained in knowledge expert systems, such as Derek.46 As such, these Toolbox profilers provide a useful and freely available means of probing a data set using end point alerts and, in so doing, provide a perspective of the chemical landscape in terms of mature chemical-toxicity relationships. The TOXCST structure library was processed by a selection of the OECD Toolbox mechanistic profilers. Here, the outcomes for two of these are presented: the repeated dose “HESS” (Hazard Evaluation Support System) toxicity profiler; and the “DART” (Developmental and Reproductive Toxicity) profiler.52−54 The HESS profiler was originally developed by the National Institute of Technology and Evaluation (NITE) in Japan (http://www.nite.go.jp/en/chem/qsar/hess-e.html) and is based on repeated dose toxicity test data contained in the database of the Hazard Evaluation Support System (HESS). This profiler includes 61 categories of alerting features (referred to here as “Alerts”). A total of 1213 of the TOXCST chemicals (30%) triggered one or more Alerts in the HESS profiler, with frequencies of a subset of the most highly represented alerts shown in Figure 16. Of the 61 possible alerts in the HESS model, TOXCST chemicals triggered 44 (or 72%) at least once, and 27 of these were triggered five or more times, indicating 5 or more chemicals in each of the 27 categories. In a similar fashion, the DART scheme 1.0 profiler within the Toolbox was used to identify chemicals in TOXCST containing structural alerts associated with the potential to act as a developmental or reproductive toxicant. 
The DART profiler contains 68 nodes and is based on the decision framework published by Wu et al.54 In this case, a similar percentage of TOXCST chemicals as in the previous example (26%) triggered DART alerting features, 69% were not flagged, and
Figure 15. Radial distance and histogram incidence plots of the similarity scores of chemicals contained in each of three inventories: (a) CERAPP, (b) BMDHHA, and (c) FDA_Drugs, in relation to their most similar “nearest neighbor” (NN) chemical in the TOXCST inventory. Colors of spheres in radial plots on the left portion of the figure vary according to relative distances from the center of the plot: center red = similarity close to 1; yellow > green > blue > purple > orange indicate decreasing similarity from 0.8 to 0 in increments of 0.2, whereas the radial angle of spheres relative to the top of the plot is random. Histogram plots on the right half of the figure indicate numbers of chemicals having a similarity score within the 0.1 increment ranges in relation to a NN TOXCST chemical; the pink shaded region highlights the proportion (listed as percentage) of chemicals with a similarity score >0.75 in relation to a NN TOXCST chemical. Overlapping structures between each of the three inventories and TOXCST (similarity = 1) have been removed in each of the plots (3808 in CERAPP, 573 in BMDHHA, 543 in FDA_Drugs) such that only the nonoverlapping portions of the three inventories with TOXCST are being compared.
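The nearest-neighbor profiling summarized in Figure 15 can be sketched in a few lines of Python. This is an illustration of the general technique rather than the calculation actually performed: fingerprints are represented here as sets of "on" bit positions, the toy fingerprints and identifiers are hypothetical, and the generation of PubChem fingerprints is assumed to have been done elsewhere. As in the figure, the sketch assumes that chemicals also present in the reference inventory (similarity = 1) have already been removed from the query inventory.

```python
from typing import Dict, FrozenSet


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by total distinct on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbor_scores(
    query: Dict[str, FrozenSet[int]],
    reference: Dict[str, FrozenSet[int]],
) -> Dict[str, float]:
    """For each query chemical, the highest similarity to any reference chemical."""
    return {
        qid: max(tanimoto(qfp, rfp) for rfp in reference.values())
        for qid, qfp in query.items()
    }


def fraction_above(scores: Dict[str, float], threshold: float) -> float:
    """Fraction of query chemicals whose nearest-neighbor score exceeds threshold."""
    return sum(s > threshold for s in scores.values()) / len(scores)


# Hypothetical toy fingerprints (sets of on-bit positions)
reference = {"ref_1": frozenset({1, 2, 3, 4}), "ref_2": frozenset({5, 6, 7})}
query = {
    "query_1": frozenset({1, 2, 3, 4, 8}),  # close analogue of ref_1
    "query_2": frozenset({5, 6, 9, 10}),    # weak analogue of ref_2
}

scores = nearest_neighbor_scores(query, reference)
print(scores)                        # {'query_1': 0.8, 'query_2': 0.4}
print(fraction_above(scores, 0.75))  # 0.5
```

The exhaustive pairwise loop is adequate for a sketch; for inventories of tens of thousands of structures, a cheminformatics toolkit with bulk similarity routines would be the more practical choice.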
is more pronounced, with 26% of FDA_Drugs chemicals versus 46% of CERAPP and 38% of BMDHHA chemicals exceeding this similarity threshold with a TOXCST chemical. These observations are consistent with the distinct view of FDA_Drugs relative to TOXCST in descriptor space that was presented in Figures 11 and 14b. 3.4. SAR and Knowledge-Based Expert Systems. The previous examples explored TOXCST chemical coverage and diversity in relation to other types of chemical inventories to which ToxCast data and inferences might be applied. The goal in this section is not to predict toxicity or biotransformation of TOXCST chemicals, per se, but rather to assess the extent to which the TOXCST inventory spans the known toxicity and metabolic biotransformation structure-alerts and structure-based categories, as represented in commercial and publicly available knowledge-based expert systems. ToxCast HTS data has been used to model particular types of toxicity pathways
Figure 16. Plot of the number of TOXCST chemicals that triggered a selection of the most highly represented repeat dose toxicity categories within the HESS Profiler accessed from within the OECD Toolbox (v3.3.5.17). Rank A and B categories correspond to “Well known toxicity mechanism”, but Rank A is validated by repeat dose toxicity test data, whereas Rank B is not validated but is estimated by other tests; Rank C corresponds to “Toxicity mechanism not well known” but is experimentally defined using repeat dose toxicity test data. Profiler includes 13 categories with “No rank” that are defined by structural boundaries supported by training set chemicals but with lack of repeat dose toxicity test data to support a specific toxicity mechanism. Profiler includes 22 categories assigned as “Alert”, which are based on a single target chemical and its analogues as determined by a similarity threshold.
Figure 17. Plot of the number of TOXCST chemicals triggering a selection of the most highly represented structural alerting classes in the DART Profiler accessed from within the OECD Toolbox (v3.3.5.17). Numbers in parentheses correspond to the DART flowchart decision element in Wu et al.54
approximately a third of the ToxCast chemicals containing a DART alert (361) also contain a HESS alert, indicating possible broad-based toxicity for this subset. Another conclusion to be drawn from this figure is that despite the fact that 55% of ToxCast chemicals did not trigger any HESS or DART alerts (light green portion of TOXCST bar), the remaining chemicals hit on significant proportions of the respective knowledge-bases by way of the alert categories represented in the HESS and DART profilers. Finally, although TOXCST contains good coverage of the alerting features represented in both models, it would be of future interest to examine the alerting features NOT present in TOXCST to determine (1) what types of chemicals led to the identification of these toxicity alerts and (2) possible reasons for the absence of these chemicals within TOXCST.
the remaining 5% were not processed by the profiler (mainly inorganics and metal-containing compounds). A sampling of DART alerts most highly represented in TOXCST chemicals is shown in Figure 17. TOXCST chemicals triggered 124 alerts one or more times, and 52 (42%) of those alerts were triggered five or more times. Although this profiler did not report instances of alerts not triggered (i.e., 0 incidence), a lower bound estimate of 136 total alerts could be inferred from the DART node numbering scheme. Hence, it is estimated that TOXCST covers up to 91% of the DART knowledge space with at least one chemical per alert. Figure 18 summarizes the results of the two examples above, highlighting respective coverage of alerting features within the two profilers, i.e., 72% for HESS and 91% for DART, as well as indicating the proportion of TOXCST chemicals containing one or more alerts in each case. In addition, it shows that
Figure 18. Projection of HESS (yellow) and DART (dark orange) alerts onto the TOXCST chemical structure inventory (light green), indicating coverage in both directions, i.e., % of alerts triggered and % of TOXCST chemical structures containing alerts, as well as chemicals containing both HESS and DART alerts (patterned orange). Calculations performed using HESS and DART profilers available within the OECD Toolbox v3.3.5.17.
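The two-directional coverage shown in Figure 18 can be computed from a table of profiler hits. The Python sketch below is a minimal illustration: the function name, the alert labels, and the toy hit tables are hypothetical, and the sketch assumes that profiler output has already been exported as a mapping from each chemical to the set of alerts it triggered. The final two lines simply re-derive the reported HESS and DART alert coverage from the counts given in the text (44 of 61 and 124 of an estimated 136).

```python
from collections import Counter
from typing import Dict, Set, Tuple


def alert_coverage(
    hits: Dict[str, Set[str]],
    n_alerts_total: int,
    min_chemicals: int = 1,
) -> Tuple[float, float]:
    """Return (fraction of alerts triggered by >= min_chemicals chemicals,
    fraction of chemicals triggering at least one alert)."""
    incidence = Counter(alert for alerts in hits.values() for alert in alerts)
    alerts_covered = sum(1 for n in incidence.values() if n >= min_chemicals)
    chemicals_flagged = sum(1 for alerts in hits.values() if alerts)
    return alerts_covered / n_alerts_total, chemicals_flagged / len(hits)


# Hypothetical hit tables for two profilers over the same four chemicals
profiler_a = {"c1": {"A1"}, "c2": {"A1", "A2"}, "c3": set(), "c4": set()}
profiler_b = {"c1": {"B1"}, "c2": set(), "c3": {"B2"}, "c4": set()}

print(alert_coverage(profiler_a, n_alerts_total=4))                   # (0.5, 0.5)
print(alert_coverage(profiler_a, n_alerts_total=4, min_chemicals=2))  # (0.25, 0.5)

# Chemicals triggering alerts in both profilers
both = {c for c in profiler_a if profiler_a[c] and profiler_b.get(c)}
print(sorted(both))  # ['c1']

# Reported alert coverage, recomputed from the counts in the text
print(round(100 * 44 / 61))    # 72 (HESS)
print(round(100 * 124 / 136))  # 91 (DART, using the estimated total of 136)
```

The `min_chemicals` argument corresponds to the "five or more times" threshold used in the text to distinguish alerts represented by several chemicals from those triggered only once.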
Figure 19. Projection of Derek (pink) and Meteor (blue) structural alerts onto the portion of the TOXCST inventory (1935 structures) analyzed in 2012 (light green), also indicating the corresponding coverage of Derek end points and Meteor enzymes for Species:Rat in each case. Calculations were performed using Derek Nexus v.2.0 and Meteor (LPS13) v.13.0.0 for Windows (Lhasa Ltd., Leeds, UK).
is largely a reflection of the greater availability of public data and structure-based knowledge in these well-studied topic areas relative to other lesser studied areas of bioactivity.) For this TOXCST subset, 1127/1935 (58%) of the chemicals triggered one or more Derek Rat end points, and 508 (26%) triggered three or more end points. Furthermore, 41/43 (95%) of the Derek end points are triggered by at least one chemical, and 26 (60%) of the end points are triggered by 10 or more chemicals. Thus, even though only half of the current TOXCST inventory chemicals were considered in this analysis, the results indicate significant representation of predicted toxicants, as well as significant coverage of the available Derek (Rat) structure-toxicity alerting knowledge base. Figure 20 lists the top 16 Derek (Rat) end points (Figure 20a) and alerts (Figure 20b) and their incidence within the 1935 TOXCST inventory subset. In addition to using the TOXCST prediction totals to assess the coverage of Derek knowledge space, each Derek [end point:structure alert] pairing provides a potentially useful local mechanism-based category for exploring in vitro to in vivo toxicity associations. It will be of interest to expand this work in the future to the full TOXCST inventory to see if greater coverage of Derek alerts and end points, which approximately define the scope of the toxicity alert “Knowledge-Prediction Landscape”, can be achieved. A similar analysis to that for Derek was carried out in 2012 using Meteor on the same set of 1935 TOXCST chemicals, also restricted to Species:Rat, with the results shown on the right side of Figure 19. Of the available 33 Rat enzymes (and associated alerts), 31 (94%) were triggered by at least one chemical in the 1935 TOXCST subset, and 19 (58%) were triggered by 10 or more chemicals. Correspondingly, 1634 (84%) of the chemicals triggered at least one enzyme alert. 
Relative to the total 157 possible alerts across all enzymes in Meteor, the 1935 TOXCST chemicals triggered 125 (80%) of these, with almost half corresponding to the CYP enzyme. It can be concluded that the 1935 TOXCST inventory chemicals provide significant coverage of the Meteor (Rat) structure-biotransformation alerting knowledge base, indicating significant chemical functional diversity. The implication of this result for toxicity, however, is less clear since the linkage between
3.4.2. Derek Nexus and Meteor (Lhasa Ltd., Leeds, UK). Derek Nexus and Meteor are widely used expert rule-based toxicity and metabolism prediction software programs, respectively, that are commercially licensed through the not-for-profit Lhasa Cooperative (http://www.lhasalimited.org/membership/). These programs consolidate literature-derived and domain expert knowledge of structure-alerting features to predict toxicity and biotransformation of chemicals. Expert rules (i.e., structural alerting features) are the result of more than two decades of public literature review, as well as contributions by cooperative members from industry, academia, and regulatory institutions, and are supported by public literature references.46 Hence, both programs strive to achieve broad coverage of existing knowledge relative to structure-alerting features spanning the diverse chemical space. Results were generated in 2012 using EPA-licensed versions of Derek Nexus v.2.0 and Meteor (LPS13) v.13.0.0 for Windows (Lhasa Ltd., Leeds, UK) applied to an early version of the ToxCast inventory consisting of 1935 unique chemicals (see section 3.1.10 for details).55 Derek toxicity alerts are specific to species-end point, whereas Meteor biotransformation alerts are specific to species-enzyme. For both Derek and Meteor, across end points and enzymes, respectively, the largest number of alerts was available for Species:Rat. Of the 58 possible toxicity end points available in Derek at the time of this study, 43 applied to Rat, and these were associated with 280 structural alerts (some alerts spanning multiple end points). The largest number of alerts were available for Rat carcinogenicity, teratogenicity, eye and skin irritation, and hepatotoxicity end points, in decreasing order. In Meteor, there were 157 structural alerts assigned to the 33 Rat enzymes (some assigned to more than one enzyme), with the majority corresponding to the Rat:CYP450 enzyme module, representing all cytochrome P450s. 
Figure 19 summarizes the results of the Derek and Meteor analyses, focusing on the coverage of the respective Species:Rat knowledge-bases in relation to the ToxCast inventory, as in previous examples. (Note: The large number of alerts in Derek and Meteor associated with Species:Rat, End point:Carcinogenicity, and Enzyme:CYP450
biotransformation and toxicity has not yet been established for the TOXCST inventory. Computational efforts are underway to explore this linkage by connecting Meteor metabolically transformed structures to predicted Derek end points for the TOXCST chemicals. In this way, subportions of the TOXCST chemical inventory whose predicted toxicity (by Derek or other SAR models) is more likely to be impacted by metabolic transformation can be highlighted. This segregation of chemicals, in turn, should inform the utilization and interpretation of ToxCast in vitro results as they relate to in vivo animal outcomes.
Figure 20. Top occurring Derek (rat) predicted end points (a) and structure alerts (b) for a subset of 1935 TOXCST chemicals. Calculations performed using Derek Nexus v.2.0 v.13.0.0 for Windows (Lhasa Ltd., Leeds, UK).
4. CONCLUSIONS
Our objectives for this perspective were to describe EPA’s efforts to construct the current ToxCast HTS chemical library, to characterize and provide representations of the library that can aid in future investigations, and to provide metrics to assess whether the resulting inventory is fit for the purpose and goals of the ToxCast project. Current toxicological knowledge is limited to a relatively small landscape of a few thousand chemicals having in vivo toxicity guideline studies, and the full range of potential toxicity mechanisms that could apply to the much larger universe of untested chemicals is largely unknown. In the preceding sections, we presented a variety of lenses through which to view the current ToxCast HTS chemical inventory, including the following: (1) a historical account of the main drivers in its formation and QC steps implemented to ensure its accurate description; (2) a series of CASRN inventory overlap comparisons across a wide variety of data, regulatory, exposure, and use categories; (3) an assessment of the physicochemical properties of chemicals deemed unsuitable for HTS testing, which help to define the boundaries of application of ToxCast testing to the larger chemical universe; (4) various means for utilizing chemical structure representations to assess local and global chemical coverage of the ToxCast structure inventory with respect to potential target application inventories; and (5) an assessment of the degree to which the ToxCast structure inventory encompasses the knowledge domains captured within expert rule-bases consisting of structural alerting features for predicting toxicity and metabolism.
Although cheminformatics and SAR considerations have had little direct influence to date on the composition of the ToxCast chemical library, we have demonstrated herein that practical and varied programmatic considerations have yielded considerable chemical structure diversity as well as broad chemical feature and property coverage across the ToxCast inventory. In addition, we have presented several examples illustrating that the ToxCast inventory provides significant representation of chemicals within a broad range of local chemical feature domains as defined by ToxPrint chemotypes, OECD HESS and DART profiling categories, Derek toxicity alerts, and Meteor biotransformation alerts. These knowledge-informed, structurally articulated chemistry domains, in turn, can serve to focus investigations and amplify chemical-bioactivity patterns within and across ToxCast assay end points. Similarly, chemical properties, such as DMSO insolubility, volatility, logP, etc., and their greater or lesser variation and importance within different chemical domains can inform both interpretation of assay results and testing strategies moving forward. Lastly, underrepresented regions of chemical or property space within the ToxCast inventory that are articulated by missing chemical features or toxicity alerts highlight limitations of the current ToxCast library and should inform future library expansion. It is hoped that, by means of these examples, others will be encouraged to carry out similar comparisons to target inventories of interest.
In summary, the ToxCast chemical library has been constructed in a manner to specifically serve the broad aims of EPA’s ToxCast program for (1) improving models to predict chemical toxicity, (2) prioritizing chemicals for further testing, and (3) informing risk assessment. The size and diverse composition of the ToxCast chemical library, together with the enormous breadth of associated bioassay data and biological context, make this a unique and valuable public resource. It is hoped that by relating the history of construction of the ToxCast chemical inventory and by presenting a variety of ways in which computational structure-based approaches can be applied to exploring issues of coverage, diversity, and suitability for application to toxicity screening and assessment, the road will be paved for others to employ chemistry and cheminformatics approaches on the path to implementing the vision of 21st century toxicology.
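As an illustration of the CASRN inventory overlap comparisons summarized in the conclusions, the following is a minimal Python sketch of how a reader might compare a screening library against a target inventory of interest. It is not the authors' workflow: the function name `overlap_summary` and the two toy inventories are hypothetical, and the toy sets use only a handful of CASRNs (Bisphenol A, PFOA, PFOS, atrazine) whose membership in either set is assumed purely for demonstration.

```python
from typing import Dict, Set


def overlap_summary(library: Set[str], target: Set[str]) -> Dict[str, float]:
    """Summarize how much of a target inventory is covered by a library.

    Both arguments are sets of CASRN strings. This helper is hypothetical
    and is not part of any ToxCast or DSSTox software.
    """
    shared = library & target  # CASRNs present in both inventories
    return {
        "n_library": len(library),
        "n_target": len(target),
        "n_shared": len(shared),
        # Guard against an empty target inventory to avoid division by zero.
        "fraction_of_target_covered": len(shared) / len(target) if target else 0.0,
    }


# Toy inventories (assumed membership, for illustration only).
toy_library = {"80-05-7", "335-67-1", "1763-23-1"}  # BPA, PFOA, PFOS
toy_target = {"80-05-7", "1912-24-9"}               # BPA, atrazine

summary = overlap_summary(toy_library, toy_target)
print(summary)
```

In practice, the two sets would be read from curated inventory files, and the same comparison could be repeated for each target inventory to build an overlap table.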
■
AUTHOR INFORMATION
Corresponding Author
*Mail Drop D143-02, U.S. EPA Research Triangle Park, NC 27711. Phone: 919-541-3934. E-mail: [email protected].
Funding
The work presented in this manuscript was supported solely by U.S. Environmental Protection Agency appropriated funds.
Notes
The views expressed in this paper are those of the authors and do not necessarily reflect the views or policies of the U.S. Environmental Protection Agency or the U.S. Food and Drug Administration. Mention of trade names or commercial products does not constitute endorsement or recommendation for use. The authors declare no competing financial interest.
Biographies
Dr. Ann Richard received a Ph.D. in Physical Chemistry from the University of North Carolina at Chapel Hill and has been a Principal Investigator within the U.S. Environmental Protection Agency since 1987. She joined the National Center for Computational Toxicology in 2005 and has led the DSSTox project and chemical management efforts in support of EPA’s ToxCast and Tox21 programs. Her research interests lie in creating a knowledge-informed, quality cheminformatics interface between the chemical landscape and the in vitro and in vivo data landscapes that can be used to guide modeling into productive areas of mechanistic inquiry.
Dr. Richard Judson is with the U.S. Environmental Protection Agency where he develops computer models and databases to help predict toxicological effects of environmental chemicals. One current major focus is on the development of models of chemicals interacting with the endocrine system. He has published in areas including computational biology and chemistry, bioinformatics, genomics, human genetics, toxicology, and applied mathematics. Prior to joining the EPA, he held positions in biotechnology and Department of Energy laboratories. Dr. Judson has a B.A. in Chemistry and Chemical Physics from Rice University and an M.A. and Ph.D. in Chemistry from Princeton University.
Dr. Keith Houck received an M.S. (Chemistry) from the University of North Carolina at Chapel Hill in 1985 and a Ph.D. (Toxicology and Pathology) from Duke University in 1989. Following a Postdoctoral Fellowship at Genentech, Inc., he worked in the biotechnology and pharmaceutical fields for 13 years at Sphinx Pharmaceuticals and Eli Lilly & Co. He joined the U.S. Environmental Protection Agency as a Principal Investigator in the National Center for Computational Toxicology in 2005 supporting the ToxCast project and Tox21 program. His research interests focus on understanding the complex interactions of chemical and biological targets underlying mechanisms of toxicity.
Dr. Chris Grulke received a B.S.E. from the University of Michigan in Chemical Engineering (2003) and a Ph.D. in Pharmaceutical Science, Medicinal Chemistry, and Biophysics from the University of North Carolina at Chapel Hill (2011). He is currently employed as a Cheminformatician Scientist at the U.S. Environmental Protection Agency’s National Center for Computational Toxicology. Dr. Grulke is applying advanced database and software development skills to building a cheminformatics infrastructure for integrating chemical and biological data to support the development of predictive models pertaining to exposure, pharmacokinetics, and toxicity.
Dr. Patra Volarath is a cheminformatics scientist in the Office of Food Additive Safety (OFAS) at the Center for Food Safety and Applied Nutrition (CFSAN) at the U.S. Food and Drug Administration. She received her Ph.D. in Biophysical Chemistry (2008) and M.S. in Molecular Genetics and Biochemistry (2002) from Georgia State University. She joined the National Center for Computational Toxicology at the U.S. Environmental Protection Agency as a Postdoctoral Fellow in 2010, before joining OFAS in 2013 to work on the FDA’s Chemical Evaluation and Risk Estimation System (CERES) project.
Inthirany Thillainadarajah received her Bachelor’s degree in Chemistry and Chemical Technology from the University of Manchester, Manchester, UK and her Master’s degree in Chemical Education from the University of Arizona, Tucson, Arizona, USA. She has worked in R&D departments in the pharmaceutical industry and in preclinical research in medical facilities. She has many years of research experience in molecular/cell biology, biochemistry, and pharmaceutical sciences. Presently, she is a lead curator/quality control analyst for the DSSTox/ACToR database at the National Center for Computational Toxicology, U.S. Environmental Protection Agency.
Dr. Chihae Yang is Managing Director and CEO of Molecular Networks GmbH, Chief Science Officer of Altamira LLC, and an adjunct professor at The Ohio State University. She was a visiting fellow in the computational toxicology program at the U.S. Food and Drug Administration from 2008 to 2011. She was previously Chief Scientific Officer at Leadscope, during which time she developed the ToxML database standard, chemoinformatics-based data mining techniques, and a structural feature-based computational modeling system. Dr. Yang was also formerly a tenured professor in the Department of Chemistry at Otterbein College and a Senior Scientist at The Clorox Company.
Dr. James F. Rathman is a Professor of Chemical and Biomolecular Engineering at The Ohio State University and Managing Director of Altamira, LLC. Before coming to OSU in 1991, he spent seven years in industrial R&D with Conoco Inc. and The Clorox Company. His research interests include molecular informatics, interfacial and colloidal phenomena, computational modeling and simulation, machine learning, and statistical analysis and experimental design. His current research efforts focus on computational risk assessment of complex chemical systems, with emphasis on pharmaceuticals and cosmetics.
Dr. Matthew Martin received his B.S. from James Madison University in Integrated Science and Technology in 2003 and his M.S. (2008) and Ph.D. (2011) from the University of North Carolina at Chapel Hill in Environmental Sciences and Engineering. He is currently a Research Computational Biologist within the U.S. Environmental Protection Agency’s National Center for Computational Toxicology working on developing computational tools to improve chemical testing and safety decisions. Dr. Martin is the principal investigator on a number of projects, including the Toxicity Reference Database (ToxRefDB), ToxCast Data Analysis Pipeline (tcpl), high-throughput transcriptomics, and the development of predictive models of toxicity.
Dr. John Wambaugh is a Physical Scientist with the U.S. Environmental Protection Agency where he co-leads the EPA Rapid Exposure and Dosimetry (RED) project. Much of the RED project supports “exposure forecasting” or “ExpoCast” research to provide real world context for ToxCast data. He received his B.S. (physics) from the University of Michigan, Ann Arbor; an M.S. (physics) from Georgia Institute of Technology; and an M.S. (computer science) and then Ph.D. (physics) in 2006 from Duke University. His areas of active research include development of high throughput methods for exposure, toxicokinetics, and toxicology.
Dr. Thomas Knudsen is a Developmental Systems Biologist at the U.S. Environmental Protection Agency’s National Center for Computational Toxicology. He received a B.S. in Biology from Albright College and a Ph.D. in Anatomy and Developmental Biology from Thomas Jefferson University in Philadelphia. His current research involves computational modeling and simulation of developmental processes and toxicities, focusing on the application of cell agent based models to unravel spatial dynamics in complex embryological systems. Dr. Knudsen is Editor-in-Chief of Reproductive Toxicology (since 2003).
Jayaram Kancherla received a B.Tech. in Computer Science and Engineering from Acharya Nagarjuna University in 2009 and an M.S. in Computer Science from North Carolina State University in 2011. He subsequently worked as a Student Services Contractor and then as an ORISE Research Fellow within the U.S. Environmental Protection Agency’s National Center for Computational Toxicology until 2013. He is currently employed as a Scientific Developer with the Center for Bioinformatics at the University of Maryland, College Park.
Dr. Kamel Mansouri is a computational chemist who obtained an engineering degree in analytical chemistry from the University of Tunis, Tunisia, an M.S. degree in cheminformatics from the University of Strasbourg, France, and a Ph.D. in computational chemistry from the University of Milano Bicocca, Italy. He joined the National Center for Computational Toxicology at the U.S. Environmental Protection Agency as an ORISE Post-Doctoral Fellow in 2013. He is working on several projects involving QSAR modeling, cheminformatics, and data mining, and has collaborated on and led projects in the QSAR field with renowned international scientists.
Dr. Grace Patlewicz is currently a research chemist at the National Center for Computational Toxicology within the U.S. Environmental Protection Agency. She started her career at Unilever UK, before moving to the E.C. Joint Research Centre in Italy and then to DuPont in the U.S. A chemist and toxicologist by training, her research interests have focused on the development and application of QSARs and read-across for regulatory purposes. She has authored over 90 journal publications and book chapters, chaired various industry groups, and has contributed to the development of technical guidance for QSARs and chemical categories under various OECD programs.
Dr. Antony Williams received a Ph.D. in analytical chemistry (NMR) from the University of London, UK in 1988. He ran NMR facilities in both academia and US-based Fortune 500 companies. He joined ACD/Laboratories as their Chief Science Officer with a focus on structure representation, nomenclature, and analytical data management. He was a founder of the ChemSpider chemistry database, later acquired by the Royal Society of Chemistry. In 2015, he joined the National Center for Computational Toxicology within the U.S. Environmental Protection Agency as a computational chemist and is presently focused on the development of Web-based applications to access chemistry data.
Stephen Little is a chemist and Quality Assurance Manager within the U.S. Environmental Protection Agency’s National Center for Computational Toxicology (NCCT), which he joined in 2005. He received a B.Sc. in Mathematics from Gardner-Webb University in 1977, a B.Sc. in Biochemistry in 1981, an M.S. in Toxicology in 2001 from North Carolina State University, and a graduate certificate in Chemical Informatics from Indiana University in 2009. Prior to becoming the NCCT QA Manager in 2010, his research interests were focused on chemical property calculations and molecular docking of ToxCast chemicals with nuclear receptors.
Dr. Kevin Crofton is the Deputy Director of the National Center for Computational Toxicology at the U.S. Environmental Protection Agency. He received an M.S. in zoology from Miami University, and a Ph.D. in toxicology from the University of North Carolina at Chapel Hill. His research interests include developmental neurotoxicity, with an emphasis on the development of adverse outcome pathways and in vitro alternative testing methods, particularly as they relate to the impact of endocrine disruptors and the cumulative risk of thyroid disruptors and pesticides.
Dr. Russell Thomas is the Director of the National Center for Computational Toxicology at the U.S. Environmental Protection Agency. The Center is researching new, more efficient ways to evaluate the safety of chemicals, particularly in assessing chemicals for human health effects. Dr. Thomas’ academic training includes a B.A. in chemistry from Tabor College, an M.S. in radiation ecology and health physics from Colorado State University, and a Ph.D. in toxicology also from Colorado State. Following his doctoral studies, Dr. Thomas performed postdoctoral research in molecular biology and genomics at the McArdle Cancer Research Laboratory at the University of Wisconsin.
■
ACKNOWLEDGMENTS
We acknowledge the following persons: past and present ToxCast team members, including Drs. David Reif, Nicole Kleinstreuer, Nisha Sipes, Imran Shah, Woody Setzer, and James Rabinowitz; Evotec Chemical Contractor staff Mike Stock, Kim Tran, Kim Matus, Naik Forum, and Tarley Cameron; Hao Truong, who built ToxCast’s initial chemical laboratory information management system; ACToR curators, Doris Smith and Jamie Vail; and, last, Drs. David Dix and Robert Kavlock (EPA), who provided vision and invaluable scientific leadership in directing the launch of the ToxCast program through Phase II.
■
DEDICATION
A.R. wishes to dedicate this work to Dr. Maritja (a.k.a. Marty) Wolf (1946−2014), lead DSSTox curator for ToxCast Phases I and II and Tox21, who exerted a major influence in promoting and ensuring the highest quality chemistry in support of EPA’s ToxCast and Tox21 programs.
■
ABBREVIATIONS
ACToR, EPA’s Aggregated Computational Toxicology Resource; APR, Cyprotex (formerly Apredica), Watertown, MA; AR, androgen receptor; ATG, Attagene, Morrisville, NC; ATSDR, Agency for Toxic Substances and Disease Registry; BDEs, brominated diphenyl ethers; BMDHHA, DSSTox SD file for Benchmark Dose Human Health Assessment inventory; BPA, Bisphenol A; BSK, BioSeek, South San Francisco, CA; CASRN (or CAS), Chemical Abstracts Service Registry Number; CERAPP, Collaborative Estrogen Receptor Activity Prediction Project; COA, certificate of analysis; CPCat, EPA ACToR Chemical/Product Categories Database; CPDBAS, DSSTox SD file for Berkeley’s Carcinogenic Potency Database (all species); CSRML, Chemical Subgraphs and Reactions Mark-up Language; DMSO, dimethyl sulfoxide; DSSTox, EPA’s Distributed Structure-Searchable Toxicity database project; e1k, ToxCast Phase II inventory consisting of 799 unique substances evaluated in endocrine-related ToxCast and Tox21 assays; EAFUS, FDA’s Everything Added to Food in the United States chemical inventory; EDSP/EDSP21, Endocrine Disruption Screening Program for the 21st Century; EPA, U.S. Environmental Protection Agency; ER, estrogen receptor; EU, European Union; FDA, U.S. Food and Drug Administration; FDAMDD, DSSTox SD file for FDA’s Maximum Daily Dose chemical inventory; GRAS, FDA’s Generally Recognized as Safe chemical inventory; HPV, high production volume; HPVCSI, DSSTox SD file for EPA’s High Production Volume Chemical Substance Inventory; HTS, high throughput screening; ID, parent mass identification; InChI, IUPAC International Chemical Identifier; IRIS, EPA’s Integrated Risk Information System; IRISTR, DSSTox SD file for EPA’s Integrated Risk Information System; IUR, EPA Inventory Update Rule; logP, log (octanol/water partition coefficient); MSDS, Material Safety Data Sheet; MW, molecular weight; NCATS, National Center for Advancing Translational Sciences; NHANES, National Health and Nutrition Examination Survey chemical
DOI: 10.1021/acs.chemrestox.6b00135 Chem. Res. Toxicol. XXXX, XXX, XXX−XXX
Chemical Research in Toxicology
Perspective
from drinking-water contaminants. Annu. Rev. Public Health 33, 209− 224. (11) Stephens, M. L., Anderson, M., Becker, R. A., Betts, K., Boekelheide, K., Carney, E., Chapin, R., Devlin, D., Fitzpatrick, S., Fowle, J. R., Harlow, P., Hartung, T., Hoffmann, S., Holsapple, M., Jacobs, A., Judson, R., Naidenko, O., Pastoor, T., Patlewitz, Z. G., Rowan, A., Scherer, R., Shaikh, R., Simon, T., Wolf, D., and Zurlo, J. (2013) Evidence-based toxicology for the 21st century: Opportunities and challenges. Altex 30, 74−103. (12) Sipes, N. S., Martin, M. T., Kothiya, P., Reif, D. M., Judson, R. S., Richard, A. M., Houck, K. A., Dix, D. J., Kavlock, R. J., and Knudsen, T. B. (2013) Profiling 976 ToxCast chemicals across 331 enzymatic and receptor signaling assays. Chem. Res. Toxicol. 26, 878− 895. (13) Kleinstreuer, N., Yang, J., Berg, E., Knudsen, T., Richard, A., Martin, M., Reif, D., Judson, R., Polokoff, M., Dix, D., Kavlock, R., and Houck, K. (2014) Phenotypic screening of the ToxCast chemical library to classify toxic and therapeutic mechanisms. Nat. Biotechnol. 32, 583−591. (14) Judson, R., Magpantay, F. M., Chickarmane, V., Haskell, C., Tania, N., Taylor, J., Xia, M., Huang, R., Rotroff, D., Filer, O. M., Houck, K. A., Martin, M. T., Sipes, N., Richard, A. M., Mansouri, K., Setzer, R. W., Knudsen, T., Crofton, K. M., and Thomas, R. S. (2015) Integrated Model of Chemical Perturbations of a Biological Pathway Using 18 In Vitro High Throughput Screening Assays for the Estrogen Receptor. Toxicol. Sci. 148, 137−154. (15) Leung, M. C. K., Phuong, J., Baker, N. C., Sipes, N. S., Klinefelter, G. R., Martin, M. T., McLaurin, K., Setzer, R. W., Perreault-Darney, S. P., Judson, R. S., and Knudsen, T. B. (2016) Systems toxicology of male reproductive development: Profiling 774 chemicals for molecular targets and adverse outcomes. Environ. Health Perspect. 124, 1050−1061. (16) Benigni, R. 
(2013) Evaluation of the toxicity forecasting capability of EPA’s ToxCast Phase I data: Can ToxCast in vitro assays predict carcinogenicity? J. Environ. Sci. Health 31, 201−212. (17) Shah, F., and Greene, N. (2014) Analysis of Pfizer compounds in EPA’s ToxCast chemicals-assay space. Chem. Res. Toxicol. 27, 86− 98. (18) Rovida, C., Asakura, S., Daneshian, M., Hofman-Huether, H., Leist, M., Meunier, L., Reif, D., Rossi, A., Schmutz, M., Valentin, J. P., Zurlo, J., and Hartung, T. (2015) Toxicity testing in the 21st century beyond environmental chemicals. Altex 32, 171−181. (19) Judson, R., Houck, K., Martin, M., Richard, A. M., Knudsen, T. B., Shah, I., Little, S., Wambaugh, J., Setzer, R. W., Kothiya, P., Phuong, J., Filer, D., Smith, D., Reif, D., Rotroff, D., Kleinstreuer, N., Sipes, N., Xia, M., Huang, R., Crofton, K., and Thomas, R. S. (2016) Analysis of the effects of cell stress and cytotoxicity on in vitro assay activity in the ToxCast dataset. Toxicol. Sci., in press. DOI: 10.1093/toxsci/kfw092. (20) Villeneuve, D. L., Crump, D., Garcia-Reyero, N., Hecker, M., Hutchinson, T. H., LaLone, C. A., Landesmann, B., Lettieri, T., Munn, S., Nepelska, M., Ottinger, M. A., Vergauwen, L., and Whelan, M. (2014) Adverse outcome pathway (AOP) development I: strategies and principles. Toxicol. Sci. 142, 312−320. (21) Tollefsen, K. E., Scholz, S., Cronin, M. T., Edwards, S. W., de Knecht, J., Crofton, K., Garcia-Reyero, N., Hartung, T., Worth, A., and Patlewicz, G. (2014) Applying adverse outcome pathways (AOPs) to support integrated approaches to testing and assessment (IATA). Regul. Toxicol. Pharmacol. 70, 629−640. (22) Thorne, N., Auld, D. S., and Inglese, J. (2010) Apparent activity in high-throughput screening: origins of compound-dependent assay interference. Curr. Opin. Chem. Biol. 14, 315−324. (23) Cox, L. A. T., Popken, D., Marty, M. S., Rowlands, J. C., Patlewicz, G., Goyak, K. O., and Becker, R. A. 
(2014) Developing scientific confidence in HTS-derived prediction models: Lessons learned from an endocrine case study. Regul. Toxicol. Pharmacol. 69, 443−450. (24) Patlewicz, G., Simon, T., Goyak, K., Phillips, R. D., Rowlands, J. C., Seidel, S. D., and Becker, R. A. (2013) Use and validation of HT/
inventory; NRC, National Research Council; NN, nearest neighbor chemical deemed most structurally similar to another; NTP, National Toxicology Program; NTPBSI, DSSTox SD file for the NTP Bioassay Database; NVS, NovaScreen Biosciences, acquired by PerkinElmer, Waltham, MA; OECD, Organisation for Economic Co-operation and Development; CLD, CellzDirect, ThermoFisher Scientific, Staley Road Grand Island, NY; PCBs, polychlorinated biphenyls; PFOA, perfluorooctanoic acid; PFOS, perfluorooctanesulfonic acid; ph1_v1, ToxCast Phase I inventory of 310 unique chemicals; ph1_v2, reprocured ToxCast Phase I inventory of 293 unique chemicals evaluated in ToxCast Testing Phases II and III; ph2, inventory of 768 unique substances entered into ToxCast testing in Phase II; ph3, inventory of 1921 unique substances (nonoverlapping with ph1_v2, ph2, e1k inventories) entered into ToxCast testing in Phase III; QC, quality control; QSAR, quantitative structure−activity relationship; SAR, structure−activity relationship; SD file, structure-data molfile format; SMILES, Simplified Molecular Input Line Entry System; TOXCST, DSSTox SD file for the EPA ToxCast program chemical inventory; TPSA, total polar surface area; Tox21, Cross-Federal Agency HTS Screening Program for Toxicology in the 21st Century; tox21_epa_v1 (and v2), EPA’s chemical inventory contributions to the Tox21 testing program; TOXREF, DSSTox SD file for EPA’s ToxRefDB database; ToxRefDB, EPA’s Toxicity Reference Database of in vivo guideline study data; TTC, threshold of toxicological concern; XML, Extensible Markup Language
■
REFERENCES
(1) Dix, D. J., Houck, K. A., Martin, M. T., Richard, A. M., Setzer, R. W., and Kavlock, R. J. (2007) The ToxCast program for prioritizing toxicity testing of environmental chemicals. Toxicol. Sci. 95, 5−12. (2) NRC (2007) National Research Council Toxicity Testing in the 21st Century: A Vision and a Strategy, National Academies Press, Washington, D.C.. (3) Judson, R. S., Houck, K. A., Kavlock, R. J., Knudsen, T. B., Martin, M. T., Mortensen, H. M., Reif, D. M., Rotroff, D. M., Shah, I., Richard, A. M., and Dix, D. J. (2010) In vitro screening of environmental chemicals for targeted testing prioritization: The ToxCast project. Environ. Health Perspect. 118, 485−492. (4) Kavlock, R. J., Chandler, K., Houck, K. A., Hunter, S., Judson, R. S., Kleinstreuer, N., Knudsen, T. B., Martin, M. T., Padilla, S., Reif, D. M., Richard, A. M., Rotroff, D., Sipes, N., and Dix, D. J. (2012) Update on EPA’s ToxCast Program: Providing high throughput decision support tools for chemical risk management. Chem. Res. Toxicol. 25, 1287−1302. (5) Collins, F. S., Gray, G. M., and Bucher, J. R. (2008) Transforming environmental health protection. Science 319, 906−907. (6) Tice, R. R., Austin, C. P., Kavlock, R. J., and Bucher, J. R. (2013) Improving the human hazard characterization of chemicals: a Tox21 update. Environ. Health Perspect. 121, 756−765. (7) Haynes, R. C. (2010) ToxCast on target: In vitro assays and computer modeling show promise for screening chemicals. Environ. Health Perspect. 118, A172. (8) Berg, N., De Wever, B., Fuchs, H. W., Gaca, M., Krul, C., and Roggen, E. L. (2011) Toxicology in the 21st century − Working our way towards a visionary reality. Toxicol. In Vitro 25, 874−881. (9) Knudsen, T. B., Houck, K. A., Sipes, N. S., Singh, A. V., Judson, R. S., Martin, M. T., Weissman, A., Kleinstreuer, N. C., Mortensen, H. M., Reif, D. M., Rabinowitz, J. R., Setzer, R. W., Richard, A. M., Dix, D. J., and Kavlock, R. J. 
(2011) Activity profiles of 309 ToxCast chemicals evaluated across 292 biochemical targets. Toxicology 282, 1−15. (10) Murphy, E. A., Post, G. B., Buckley, B. T., Lippincott, R. L., and Robson, M. G. (2012) Future challenges to protecting public health Z
DOI: 10.1021/acs.chemrestox.6b00135 Chem. Res. Toxicol. XXXX, XXX, XXX−XXX
Chemical Research in Toxicology
Perspective
HC assays to support 21st century toxicity evaluations. Regul. Toxicol. Pharmacol. 65, 259−268. (25) Richard, A. M. (2006) Future of Predictive Toxicology: An Expanded View of “Chemical Toxicity” - Future of Toxicology Perspective. Chem. Res. Toxicol. 19, 1257−1262. (26) Judson, R., Richard, A., Dix, D. J., Houck, K., Martin, M., Kavlock, R., Dellarco, V., Henry, T., Holderman, T., Sayre, P., Tan, S., Carpenter, T., and Smith, E. (2009) The toxicity data landscape for environmental chemicals. Environ. Health Perspect. 117, 685−695. (27) Knudsen, T. B., Martin, N. T., Kavlock, R. J., Judson, R. S., Dix, D. J., and Singh, A. V. (2009) Profiling the Activity of Environmental Chemicals in Prenatal Developmental Toxicity Studies using the U.S. EPA’s ToxRefDB. Reprod. Toxicol. 28, 209−219. (28) Martin, M. T., Judson, R. S., Reif, D. M., Kavlock, R. J., and Dix, D. J. (2009) Profiling chemicals based on chronic toxicity results from the U.S. EPA ToxRef Database. Environ. Health. Perspect. 117, 392− 399. (29) Martin, M. T., Mendez, E., Corum, D. G., Judson, R. S., Kavlock, R. J., Rotroff, D. M., and Dix, D. J. (2009) Profiling the reproductive toxicity of chemicals from multigenerational studies in the toxicity reference database. Toxicol. Sci. 110, 181−190. (30) Richard, A. M. (2004) DSSTox Website launch: Improving public access to databases for building structure-toxicity prediction models. Preclinica 2, 103−108. (31) Judson, R., Richard, A., Dix, D., Houck, K., Elloumil, F., Martin, M., Cathey, T., Transue, T., Spencer, R., and Wolf, M. (2008) ACToR − Aggregated Computational Toxicology Resource. Toxicol. Appl. Pharmacol. 233, 7−13. (32) US EPA (2015) EPI Suite. Estimation Programs Interface Suite for Microsoft® Windows, v EPIWEB4.10, United States Environmental Protection Agency, Washington, D.C.. (33) Ghose, A. K., Viswanadhan, V. N., and Wendoloski, J. J. 
(1999) A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases. J. Comb. Chem. 1, 55−68. (34) Macarron, R. (2006) Critical review of the role of HTS in drug discovery. Drug Discovery Today 11, 277−279. (35) Akhondi, S. A., Muresan, S., Williams, A. J., and Kors, J. A. (2015) Ambiguity of non-systematic chemical identifiers within and between small-molecule databases. J. Cheminf. 7, 1−10. (36) Williams, A. J., and Ekins, S. (2011) A quality alert and call for improved curation of public chemistry databases. Drug Discovery Today 16, 747−750. (37) Williams, A. J., Ekins, S., and Tkachenko, V. (2012) Towards a gold standard: regarding quality in public domain chemistry databases and approaches to improving the situation. Drug Discovery Today 17, 685−701. (38) Mansouri, K., Abdelaziz, A., Rybacka, A., Roncaglioni, A., Tropsha, A., Varnek, A., Zakharov, A., Worth, A., Richard, A. M., Grulke, C., Trisciuzzi, D., Fourches, D., Horvath, D., Benfenati, E., Muratov, E., Wedebye, E. B., Grisoni, F., Mangiatordi, G. F., Incisivo, G. M., Hong, H., Ng, H. W., Tetko, I. V., Balabin, I., Kancherla, J., Shen, J., Burton, J., Nicklaus, M., Cassotti, M., Nikolov, N. G., Nicolotti, O., Andersson, P. L., Zang, Q., Politi, R., Beger, R. D., Todeschini, R., Huang, R., Farag, S., Rosenberg, S. A., Slavov, S., Hu, X., and Judson, R. J (2016) CERAPP: Collaborative Estrogen Receptor Activity Prediction Project. Environ. Health Perspect. 124, 1023−1033. (39) Judson, R. S., Martin, M. T., Egeghy, P. P., Gangwal, S., Reif, D. M., Kothiya, P., Wolf, M. A., Cathey, T., Transue, T. R., Smith, D., Vail, J., Frame, A., Mosher, S., Cohen-Hubal, E. A., and Richard, A. M. (2012) Aggregating Data for Computational Toxicology Applications: The U.S. Environmental Protection Agency (EPA) Aggregated Computational Toxicology Resource (ACToR) System. Int. J. Mol. Sci. 13, 1805−1831. 
(40) Dionisio, K. L., Frame, A. M., Goldsmith, M. R., Wambaugh, J. F., Liddell, A., Cathey, T., Smith, D., Vail, J., Ernstoff, A. S., Fantke, P., Jolliet, O., and Judson, R. S. (2015) Exploring consumer exposure pathways and patterns of use for chemicals in the environment. Toxicol. Rep. 2, 228−237.
(41) Egeghy, P. P., Judson, R., Gangwal, S., Mosher, S., Smith, D., Vail, J., and Cohen-Hubal, E. A. (2012) The exposure data landscape for manufactured chemicals. Sci. Total Environ. 414, 159−166.
(42) Ertl, P., and Schuffenhauer, A. (2009) Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminf. 1, 8.
(43) Steinbeck, C., Han, Y., Kuhn, S., Horlacher, O., Luttmann, E., and Willighagen, E. (2003) The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics. J. Chem. Inf. Model. 43, 493−500.
(44) Yang, C., Tarkhov, A., Marusczyk, J., Bienfait, B., Gasteiger, J., Kleinoeder, T., Magdziarz, T., Sacher, O., Schwab, C. H., Schwoebel, J., Terfloth, L., Arvidson, K., Richard, A., Worth, A., and Rathman, J. (2015) New publicly available chemical query language, CSRML, to support chemotype representations for application to data mining and modeling. J. Chem. Inf. Model. 55, 510−28.
(45) Wignall, J. A., Shapiro, A. J., Wright, F. A., Woodruff, T. J., Chiu, W. A., Guyton, K. Z., and Rusyn, I. (2014) Standardizing benchmark dose calculations to improve science-based decisions in human health assessments. Environ. Health Perspect. 122, 499−505.
(46) Marchant, C. A., Briggs, K. A., and Long, A. (2008) In silico tools for sharing data and knowledge on toxicity and metabolism: Derek for Windows, Meteor, and Vitic. Toxicol. Mech. Methods 18, 177−187.
(47) Wambaugh, J. F., Setzer, R. W., Reif, D. M., Gangwal, S., Mitchell-Blackwood, J., Arnot, J. A., Joliet, O., Frame, A., Rabinowitz, J., Knudsen, T. B., Judson, R. S., Egeghy, P., Vallero, D., and Cohen Hubal, E. A. (2013) High-throughput models for exposure-based chemical prioritization in the ExpoCast project. Environ. Sci. Technol. 47, 8479−8488.
(48) Ashby, J. (1985) Fundamental structural alerts to potential carcinogenicity and non-carcinogenicity. Environ. Mutagen. 7, 919−921.
(49) Kroes, R., Renwick, A., Cheesman, M., Kleiner, J., Mangelsdorf, I., Piersma, A., Schilter, B., Schlatter, J., van Schothorst, F., Vos, J. G., and Wurtzen, G. (2004) Structure-based thresholds of toxicological concern (TTC): Guidance for application to substances present at low levels in the diet. Food Chem. Toxicol. 42, 65−83.
(50) Lipinski, C. A. (2004) Lead- and drug-like compounds: the rule-of-five revolution. Drug Discovery Today: Technol. 1, 337−341.
(51) OECD (2014) Guidance on Grouping of Chemicals, OECD Series on Testing and Assessment No. 194, Organisation for Economic Cooperation and Development, Paris, France.
(52) Sakuratani, Y., Zhang, H. Q., Nishikawa, S., Yamazaki, K., Yamada, T., Yamada, J., Gerova, K., Chankov, G., Mekenyan, O., and Hayashi, M. (2013) Hazard Evaluation Support System (HESS) for predicting repeated dose toxicity using toxicological categories. SAR QSAR Environ. Res. 24, 351−363.
(53) Gerova, K., Chankova, G., Sakuratani, Y., Yamada, J., Yamada, T., Hayashi, M., and Mekenyan, O. (2012) Predicting Repeated Dose Toxicity in OECD (Q)SAR Toolbox. QSAR 2012, Tallinn, Estonia, June 18−22, 2012, available at http://oasis-lmc.org/media/25105/QSAR2012_PREDICTING_REPEATED_DOSE_TOXICITY_IN_OECD_QSAR_TOOLBOX.pdf.
(54) Wu, S., Fisher, J., Naciff, J., Laufersweiler, M., Lester, C., Daston, G., and Blackburn, K. (2013) Framework for identifying chemicals with structural features associated with the potential to act as developmental or reproductive toxicants. Chem. Res. Toxicol. 26, 1840−1861.
(55) Volarath, P., Martin, M., and Richard, A. (2012) Extending the Derek-Meteor Workflow to Predict Chemical-Toxicity Space Impacted by Metabolism: Application to ToxCast and Tox21 Chemical Inventories. Society of Toxicology, Poster abstract #286: http://www.toxicology.org/application/ToxicologistDB/.
DOI: 10.1021/acs.chemrestox.6b00135 Chem. Res. Toxicol. XXXX, XXX, XXX−XXX