Using Bioinformatic Approaches to Identify Pathways Targeted ... - MDPI

OPEN ACCESS

International Journal of Environmental Research and Public Health ISSN 1660-4601 www.mdpi.com/journal/ijerph Supplementary Information

Using Bioinformatic Approaches to Identify Pathways Targeted by Human Leukemogens. Int. J. Environ. Res. Public Health 2012, 9, 2479-2503

Reuben Thomas, Jimmy Phuong, Cliona M. McHale and Luoping Zhang *

Genes and Environment Laboratory, School of Public Health, University of California, Berkeley, CA 94720, USA; E-Mails: [email protected] (R.T.); [email protected] (J.P.); [email protected] (C.M.M.)

* Author to whom correspondence should be addressed; E-Mail: [email protected]; Tel.: +1-510-643-5189; Fax: +1-510-642-0427.

Received: 18 May 2012; in revised form: 25 June 2012 / Accepted: 26 June 2012 / Published: 12 July 2012

Int. J. Environ. Res. Public Health 2012, 9


Figure S1. 2D Tanimoto coefficient distance matrix of the leukemogens. The figure is a visual representation of the distance matrix between the 29 leukemogens, based on structural similarity measured by the 2D Tanimoto coefficient, as obtained from the PubChem database (http://pubchem.ncbi.nlm.nih.gov//score_matrix/score_matrix.cgi). The order of the leukemogens in the distance matrix is the same as that determined from the unsupervised clustering analysis in Figure 3. The color of the (i,j)th position of the distance matrix reflects the structural similarity between leukemogen i and leukemogen j. The color ranges from white to red, with red indicating greater structural similarity. Dashed black lines correspond to boundaries of clusters of leukemogens as determined from the unsupervised clustering analysis in Figure 3.

[Figure S1 axis labels. Cluster no.: 0 to 17. Chemical names, in plotted order: ccnu, 1,3‐butadiene, ethylene_oxide, oxymetholone, teniposide, vinyl_chloride, cyclophosphamide, thiotepa, benzene, trichloroethylene, melphalan, ddt, styrene, beta‐hexachlorocyclohexane, busulfan, lindane, chloramphenicol, bcnu, adriamycin, cisplatin, etoposide, chlorambucil, nitrogen_mustard, me_ccnu, formaldehyde, 3,3'‐dichlorobenzidine, alpha‐hexachlorocyclohexane, delta‐hexachlorocyclohexane, propylthiouracil.]

Figure S2. Empirical distribution of 2D Tanimoto coefficients. Empirical probability distribution of the 2D Tanimoto coefficients obtained from the pairwise comparison of all 29 leukemogens.
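The quantity behind Figures S1 and S2 can be illustrated with a minimal sketch. It assumes each chemical is represented by the set of "on" bits of a binary 2D fingerprint; the three fingerprints below are hypothetical toy values, not PubChem fingerprints, and the helper names and the bin count are illustrative choices rather than anything taken from the paper.

```python
from itertools import combinations

def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0  # convention assumed here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

# Hypothetical toy fingerprints (sets of on-bit indices), for illustration only.
fingerprints = {
    "chem_A": {1, 2, 3, 4},
    "chem_B": {3, 4, 5, 6},
    "chem_C": {1, 2, 3, 4, 7},
}

def pairwise_scores(fps):
    """Tanimoto coefficient for every unordered pair of chemicals."""
    return {
        (x, y): tanimoto(fps[x], fps[y])
        for x, y in combinations(sorted(fps), 2)
    }

def empirical_distribution(values, n_bins=5):
    """Fraction of values falling in each equal-width bin on [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v * n_bins), n_bins - 1)  # a value of 1.0 goes in the last bin
        counts[idx] += 1
    return [c / len(values) for c in counts]

scores = pairwise_scores(fingerprints)
distribution = empirical_distribution(list(scores.values()))
```

With 29 chemicals the same loop would yield 406 pairwise coefficients, which is the kind of sample an empirical distribution like Figure S2 summarizes.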


Figure S3. Correlation between benzene pathway enrichment results using gene interaction data from the Comparative Toxicogenomics Database and global gene expression profiles from a human study. Pathway enrichment analysis was computed for all 250 KEGG human pathways using the SEPEA algorithm [59] on human toxicogenomic data related to benzene from CTD and from a global gene expression study of 125 factory workers occupationally exposed to a range of benzene levels [47]. Pathways are ranked from the most significant (smallest p-values) to the least significant. The figure is a scatter plot of the ranks of the pathways based on the two data sources. The Spearman correlation of the pathway enrichment values from the two data sources is 0.45, with a statistical significance of 10^-13.
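As a rough illustration of the rank comparison described in the Figure S3 caption, the sketch below computes a Spearman correlation as the Pearson correlation of ranks, with tied values sharing their average rank. The p-values are hypothetical toy numbers for five pathways, not results from the study, and the function names are illustrative.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from smallest (rank 1) upward; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)  # undefined (division by zero) if either input is constant

# Hypothetical enrichment p-values for five pathways from two data sources.
p_ctd = [0.001, 0.20, 0.03, 0.50, 0.04]
p_expr = [0.010, 0.30, 0.02, 0.90, 0.60]
rho = spearman(p_ctd, p_expr)
```

Because only ranks enter the calculation, the result is the same whether pathways are ranked by p-value or by any monotone transformation of it.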



Table S1a. Chemicals found to cause leukemias in humans, according to both the IARC Monographs and the NTP 12th Report on Carcinogens. CTD chemical name (Compound CID)

IARC monograph chemical summaries Exposure Mode of Treatment

Chemical CASRN

name Reported to cause predominantly leukemias melphalan (461602) 148‐82‐3 melphalan

Group Volume 1 9,

M 100A

type action

NTP RoC chemical summaries Exposure Mode of Treatment

Chemical

use

Resultant diseases

a bifunctional alkylating MM, STS, childhood NB, AML, MDS Sup 7, agent that forms DNA advanced ovarian crosslinks; causes clonal adenocarcinoma, breast

name

Class

type action

Known

M

1‐(2‐chloroethyl)‐3‐ (4‐methylcyclohexyl) 1‐nitrosourea (methyl‐CCNU)

Known

M

thiotepa

Known

M



Wilm's tumor, bronchiogenic secondary AML carcinoma, lymphoma, and adenocarcinoma of the breast, ovary, and bladder

chloramphenicol

Anticipated

M



typhoid fever, meningitis, secondary leukemia superficial ocular infections

procarbazine and procarbazine hydrochloride

Anticipated

M



HD

benzene

Known

E/I



1,3‐butadiene

Known

E/I

chlorambucil

Known

M

cyclophosphamide

Known

M

ethylene oxide

Known

E/I

formaldehyde

Known

E/I

loss of chromosome 5 or cancer, melanoma, 7

thiotepa (5453)

52‐24‐4

chloramphenicol (5959)

56‐75‐7

procarbazine

671‐16‐9 366‐70‐1

1 Sup 7, 100A

M

polycythemia vera

thiotepa

1 50,

M

cyclophosphamide 50‐18‐0 (2907)

M

2a 26,

M

generates an alkylating HD

100A

chloramphenicol

salmonellosis

procarbazine and procarbazine hydrochloride

Reported to cause leukemias and other cancers

benzene (241) 71‐43‐2 benzene

1,3‐butadiene (7845) 106‐99‐0

chlorambucil (2708) 305‐03‐3

2a 50

a bifunctional alkylating HD, melanoma, AML agent that induces hematopoietic macromolecule malignancies, and cancer carbamoylation; of the lung, methylates p15 promoter gastrointestinal tract, and brain a trifunctional, alkylating malignant lymphomas, non‐lymphocytic agent breast cancer, ovarian leukemia cancer, meningeal carcinomatosis, bladder malignancies ‐ typhoid fever, ALL, AML, AA, and

1,3‐butadiene

chlorambucil

Sup 7

agent

1 29, E/I Sup 7, 100F 1 97, 100F

induces chromosome aberrations in lymphocytes. E/I ‐

1 26, M

a bifunctional alkylating CLL, GN, HD, NHL, WM, AML, MDS, SCC, Sup 7, agent; hallmarks of breast cancer, ovarian pancytopenia, leukemia are seen as cancer, polycythemia melanoma deletion or clonal loss of vera, juvenile arthritis chromosome 5 and 7

100A

ethylene oxide (6354) 75‐21‐8 cyclophosphamide

1 26,

M Sup 7, 100A

formaldehyde (712) 50‐00‐0 ethylene oxide

1 97,

E/I 100F

formaldehyde

bone‐marrow depression AML

1 88,

E/I 100F



AML, ALL, CLL, MM, NHL



CLL, CML, HD, NHL

the phosphoramide CLL, NHL, HD, OS, MM, AML, MDS, mustard metabolites are STS, CTCL, NB, ovarian bladder cancer active alkylating agents cancer, breast cancer, burkitt's lymphoma, lymphoblastic forms hydroxyethyl ‐ AML, CML, hemoglobin adducts with lymphocytic N‐terminal valine, leukemia, WM, and histidine, and S‐cysteine cancer of the esophagus, stomach, brain, and breast forms DNA adducts, sister ‐ leukemia, NPC, HD, chromid exchange, MM, and cancer of micronucleii, the buccal cavity, chromosomal larynx, pharynx, lung, aberrations, and DNA kidney, skin, bladder, cross‐links in lymphocytes prostate, brain, and colon

Resultant diseases

melphalan

methyl‐ccnu (5198) 13909‐09‐6 1‐(2‐chloroethyl)‐3‐ (4‐methylcyclohexyl) 1‐nitrosourea (methyl‐CCNU)

use

alkylating agent CML, MM, OS, melanoma, leukemia polycythemia vera, scleromyxedema, amyloidosis, and cancer of the ovaries, breast, testes, and prostate a bifunctional melanoma, and cancer of the AML alkylating agent gastrointestinal tract, brain and lung

leukemia, and cancer of the soft tissue, cartilage, and bone ‐

AML, CLL

epoxide ‐ leukemia, NHL, metabolites reticulosarcoma adduct DNA and form hprt mutations in lymphocytes alkylating agent CLL, HD, GN, KS, NHL, RA, AML WG, chronic hepatitis, systemic lupus, psoriasis, primary macroglobulinemia, cold agglutinin disease, nephrotic syndrome, and

cancer of the breast, lung, cervix, ovaries, and testes leukemia, MM, NB, RA, WG, leukemia, bladder lymphoma, mycosis cancer fungoides, ovarian cancer, breast cancer, nephrotic syndrome in children an alkylating ‐ lymphocytic agent that forms leukemia, NHL, DNA adduct reticulosarcoma, stomach cancer, breast cancer ‐

forms reactive methanediol, which directly damages bone marrow and hematopoietic progenitors



CML, NPC, sinonasal cancer

Chemicals are ordered according to the cancers and diseases they are reported to cause, then by IARC group numbers {1, 2a, 2b, 3, then 4}, then alphabetically by CTD chemical name. Exposure types are expressed as E/I (Environmental or Industrial) or M (Medical/Therapeutic). Chemicals in gray‐italics did not have associated human CTD data and were excluded from further data analysis.


Table S1b. Chemicals found to cause leukemia in humans, according to IARC Monographs. IARC monograph chemical summaries CTD chemical name Chemical name (Compound CID, if used) CASRN Reported to cause predominantly leukemias busulfan (2478) 55‐98‐1 busulfan

Group Volume

NTP RoC chemical summaries

Treatment use

Resultant diseases

Chemical name

Class

1

4, Sup7, 100A

M

Mode of action: a bifunctional alkylating agent. Treatment use: CML, bronchial carcinoma, thrombocytosis, polycythemia vera. Resultant diseases: AML, MDS, pancytopenia.

76, 100A

M

Mode of action: inhibits DNA synthesis by interfering with the activity of DNA topoisomerase II. Treatment use: AML, ALL, HD, NHL, lung cancer, testicular neoplasms.

AML, MDS





M

inhibits DNA topoisomerase II

AML specifically in childhood cancers





M

Mode of action: a bifunctional alkylating agent that forms DNA crosslinks. Treatment use: ovarian cancer, breast cancer, melanoma.

AML





etoposide (36462)

33419‐42‐0

etoposide

1

teniposide (452548)

29767‐20‐2

teniposide

2a 76

treosulfan

299‐75‐2

treosulfan

1

Reported to cause leukemias and other cancers trichloroethylene trichloroethylene (6575) 79‐01‐6

Exposure Mode of type action

2a 63

26, Sup 7, 100A

E/I



ALL, AML, Childhood NB, lymphoma, adult brain tumors, bladder cancer, lung cancer



leukemia, NHL, liver cancer, biliary tract cancers, cervical cancer

1,4‐butanediol dimethanesulfonate

trichloroethylene

Known

Anticipated

Chemicals are ordered according to the cancers and diseases they are reported to cause, then by IARC group numbers {1, 2a, 2b, 3, then 4}, then alphabetically by CTD chemical name. Exposure types are expressed as E/I (Environmental or Industrial) or M (Medical/Therapeutic). Chemicals in gray‐italics did not have associated human CTD data and were excluded from further data analysis. The NTP column is shortened to show only the source and class information, and grayed due to the absence of associations with human leukemias.


Table S1c. Chemicals found to cause leukemia in humans, according to NTP 12th Report on Carcinogens CTD chemical name (Compound CID, if used) CASRN Reported to cause predominantly leukemias bcnu (2578) 154‐93‐8

cisplatin (84691) ccnu (3950)

15663‐27‐1 13010‐47‐4

propylthiouracil (657298)

51‐52‐5

Reported to cause leukemias and other cancers vinyl chloride (6338) 75‐01‐4 adriamycin (31703) 23214‐92‐8 3,3'‐dichlorobenzidine (7070)

91‐94‐1

DDT (3036)

50‐29‐3

Chemical name

thorium dioxide chloroprene 1,3‐dichloropropene

1314‐20‐1 126‐99‐8 542‐75‐6

phenoxybenzamine hydrochloride 59‐96‐1; 63‐92‐3

Exposure Mode of type action

NTP RoC chemical summaries Treatment use

Resultant diseases

Bis(chloroethyl) Nitrosourea

Anticipated

M

a bifunctionally alkylating HD, MM, NHL, primary/metastatic brain AML agent tumors, burkitt's lymphoma, melanoma, Ewing's sarcoma, mycosis fungoides, breast cancer, gastrointestinal cancers

cisplatin

Anticipated

M

forms cisplatin‐induced ovarian cancer, testicular cancer platinum‐DNA‐adducts

secondary leukemia

1‐(2‐chloroethyl)‐ 3‐cyclohexyl‐ cancer 1‐nitrosourea

Anticipated

M

a bifunctional alkylating brain tumors, HD, NHL, psoriasis, agent melanoma, mycosis fungoides,

leukemia

propylthiouracil

Anticipated

M

an antithyroid agent

Known

E/I

the metabolite forms DNA adducts

Anticipated

M

vinyl chloride lindane (727); 58‐89‐9; alpha‐hexachlorocyclohexane (727); 319‐84‐6; beta‐hexachlorocyclohexane (727); 319‐85‐7; delta‐hexachlorocyclohexane (727) 319‐86‐8 nitrogen mustard (4033) 55‐86‐7; 51‐75‐2 oxymetholone (5281034) 434‐07‐1 styrene (7501) 100‐42‐5

Class

adriamycin

3,3'‐dichlorobenzidine and its dihydrochloride dichlorodiphenyl‐ trichloroethane

and

of the gastrointestinal tract, kidney, and breast hyperthyroidism

AML

leukemia, lymphoma, lung cancer, hepatic angiosarcoma, brain cancer, hepatocellular carcinoma a cytotoxic anthracycline‐ acute leukemia, HD, KS, MM, NB, NHL, OS, AML, OS class antibiotic used as an STS, Ewing’s sarcoma, Wilms’ tumor, and antimitotic carcinoma of the head and neck, breast, thyroid gland, genitourinary tract, and lung

E/I





leukemia, kidney cancer

Anticipated

E/I





leukemia, MM, breast cancer

lindane, hexachlorocyclohexane Anticipated (technical grade), and other hexachlorocyclohexane isomers

E/I





paramyeloblastic leukemia, myelomonocytic leukemia, lung cancer

Nitrogen mustard hydrochloride Anticipated

M



oxymetholone

M

thorium dioxide

Bischloroethyl nitrosourea (BCNU)

2A 26, Sup 7

cisplatin

2A 26, Sup 7

1‐(2‐Chloroethyl)‐ 3‐cyclohexyl‐ 1‐nitrosourea (CCNU) propylthiouracil vinyl chloride

2A 26, Sup 7

adriamycin

2A 10, Sup 7

3,3'‐dichlorobenzidine

2B 29, Sup 7 2B 53



Anticipated

styrene

IARC monograph chemical summaries Chemical name Group Volume

DDT (4,4'‐Dichlorodiphenyl trichloroethane) hexachlorocyclohexanes

nitrogen mustard

Anticipated

Anticipated

a synthetic androgen

E/I



leukemia, HD, NHL, polycythemia vera, leukemia, SCC mycosis fungoides, bronchogenic carcinoma, melanoma, and cancer of the kidney, gastrointestinal tract and breast AA, fanconi's anemia, paroxysmal nocturnal AML, SCC of the head and neck, and hemoglobinuria and cancer of the esophagus, liver, anogenital region, and bile‐duct ‐

leukemia, lymphoma, hematopoietic cancer Used in medical radiology as an leukemia, hemangiosarcoma, bone intravascular contrast agent for cerebral cancers, primarily cholangiocellular and limb angiography tumors

Known

M



chloroprene

Anticipated

E/I





1,3‐dichloropropene (technical grade)

Anticipated

E/I





phenoxybenzamine hydrochloride

Anticipated

M

an α‐adrenergic receptor Raynaud's disease, phlebothrombosis, blocking agent used to causalgia, diabetic gangrene, phlebitis, treat peripheral vascular chronic skin ulcers, urinary‐bladder disorders problems, and sweating caused by pheochromocytoma

leukemia and other cancers of the liver, lung, digestive, hematopoietic system leukemia, pancreatic cancer, histiocytic lymphoma

2B 79 1 97, 100F

2B 20, Sup 7 2A 9, Sup 7

‐ styrene





2B 60, 82 ‐



chloroprene

2B 71

1,3‐dichloropropene (technical‐grade)

2B 41,

phenoxybenzamine hydrochloride

2B 24,



Sup 7, 71 Sup 7

CLL, SCC, small‐cell carcinoma, bladder cancer, esophageal cancer

Chemicals are ordered according to the cancers and diseases they are reported to cause, then by their "Known" or "Anticipated" NTP Report on Carcinogens (RoC) class, then alphabetically by CTD chemical name. Exposure types are expressed as E/I (Environmental or Industrial exposure) or M (Medical/Therapeutic exposure). Chemicals in gray italics did not have associated human CTD data and were excluded from further data analysis. The IARC column is shortened to show only the source and class information, if available, and grayed due to the absence of associations with human leukemias.


Table S2a. Chemicals identified as non-leukemogenic carcinogens, according to both the IARC monographs and the NTP 12th Report on Carcinogens. IARC monograph chemical summaries CTD chemical name (Compound CID, if used) CASRN Reported to cause lymphomas azathioprine (2265)

446‐86‐6

Chemical name Azathioprine

Group Volume 1 26,

Exposure Mode of type action M

Sup 7, 100A

cyclosporine (2907)

59865‐13‐3 Cyclosporine

1 50,

M

1746‐1‐6

2,3,7,8‐Tetrachloro dibenzo‐ para‐dioxin

1 69, E/I 100F

tetrachloroethylene (31373)

127‐18‐4

Tetrachloro‐ethylene 2a 63

Resultant diseases

induces DNA damage RA, renal‐transplant allograft NHL, SCC, mesenchymal by 6‐thioguanine rejection preventative, tumors, hepatobiliary accumulation in the systemic lupus, carcinoma DNA of patients inflammatory bowel disease



100A

TCDD (15625)

NTP RoC chemical summaries

Treatment use

allograft rejection KS, lymphoma, skin cancer preventative, systemic lupus, dermatomyositis, Grave's disease, Crohn's disease, myasthenia gravis, chronic hepatitis, biliary cirrhosis, uveitis, sarcoidosis, psoriasis

Chemical name

Class

Exposure Mode of type action

Treatment use

Resultant diseases

Azathioprine

Known

M



kidney‐transplant rejection NHL, SCC, mesenchymal preventative, systemic lupus, tumor, hepatobiliary Crohn's disease, autoimmune carcinoma hemolytic anemia, myasthenia gravis, chronic hepatitis, ulcerative colitis

Cyclosporin A

Known

M



RA, psoriasis

KS, skin cancer, lymphoma

sustained AHR binding and anti‐ apoptosis gene expression E/I ‐



NHL, STS, lung cancer

2,3,7,8‐Tetrachloro dibenzo‐ para‐dioxin

Known

E/I





NHL, lung cancer



NHL, esophageal cancer, cervical cancer, and a potential risk of leukemia

Tetrachloro‐ethylene Anticipated

E/I





NHL, esophageal cancer, cervical cancer

E/I



lung cancer, larynx cancer

Mustard gas

Known

E/I

causes genetic damage chromosome aberrations



cancer of the lung, and larynx, pharynx, and upper‐respiratory tract



lung cancer

Bis(chloromethyl) ether

Known

E/I

an alkylating agent that induces unscheduled DNA synthesis and forms chromosome aberrations in white blood cells



lung cancer

an alkylating agent



lung cancer

Technical‐Grade Chloromethyl Methyl Ether

Known

E/I

an alkylating agent forms chromosome aberrations in white blood cells



lung cancer



bladder cancer, liver cancer



bladder cancer



bladder cancer

Reported to cause primarily lung and respiratory cancers sulfur mustard (10461)

505‐60‐2

Sulfur mustard

1 9,



Sup 7, 100F bis(chloromethyl) ether 542‐88‐1

chloromethyl methyl ether

107‐30‐2

Bis(chloromethyl) Ether

1 4,

E/I

Chloromethyl methyl ether

1 4, E/I Sup7, 100F

an alkylating agent

Sup7, 100F

Reported to cause primarily bladder, liver, kidney, and gastrointestinal cancers 4‐aminobiphenyl (7102) 92‐67‐1

4‐aminobiphenyl

1 1, Sup7, 99, 100F

E/I





bladder cancer

4‐aminobiphenyl

Known

E/I

benzidine (7111)

Benzidine

1 29, Sup7, 99, 100F

E/I



bladder cancer

Benzidine

Known

E/I

2‐naphthylamine (7057) 91‐59‐8

2‐Naphthylamine

1 4, Sup 7, 99, 100F

E/I

the metabolite forms DNA adducts to induce clastogenic effects the metabolite forms DNA adduct



bladder cancer

2‐Naphthylamine

Known

E/I

phenacetin (4754)

Phenacetin

1 24,

E/I



Anticipated

M



inflammation, edema

92‐87‐5

62‐44‐2

aristolochic acid

313‐67‐7

Plants containing Aristolochic acid

Sup 7, 100A 1 82, M 100A

2‐toluidine

95‐53‐4

ortho‐Toluidine

1 77,

E/I 99, 100F

the metabolite forms DNA adduct

fever reducer, pain reliever kidney cancer, ureter cancer



Phenacetin

human nephropathy, urothelial cancer

Aristolochic acid

Known

M

bladder cancer

ortho‐Toluidine

Anticipated

E/I

‐ the metabolite binds DNA and forms chromosome aberrations in the white blood cells the metabolite likely forms adducts with blood‐ serum proteins ‐ pain, fever

a relatively selective inhibitor of phospholipase A2



TCC of the renal pelvis

antibacterial, antiviral, antifungal, antitumor, inflammatory diseases, venomous infections ‐

urothelial cancer, aristolochic acid nephropathy, progressive interstitial renal fibrosis bladder cancer

Table S2a. Cont.

Columns: CTD chemical name (Compound CID, if used); CASRN; IARC monograph chemical summaries (Chemical name, Group, Volume, Exposure type, Mode of action, Treatment use, Resultant diseases).

Reported to cause primarily reproductive organ cancers

diethylstilbestrol (448537) 56‐53‐1

Diethylstilbestrol 1 21, adenocarcinoma of Sup 7, testicular 100A

tamoxifen (2733526) 10540‐29‐1 Tamoxifen

1 66, 100A M

E/I



Chemical

NTP RoC chemical summaries Exposure Mode of Treatment Resultant name Class type actionuse diseases

Diethylstilbestrol

Known

use

menopausal symptoms, clear‐cell senile vaginitis, vulvar the vagina/cervix,

dystrophy cancer, breast cancer, endometrial cancer - metastatic breast cancer endometrium cancer

Tamoxifen

Known M

M

a synthetic estrogen prostate cancer, breast clear‐cell cancer in women adenocarcinoma of the (postmenopausal) vagina/cervix, breast cancer, testicular cancer ‐ contralateral breast cancer endometrial uterine cancer

Chemicals are ordered according to the cancers and diseases they are reported to cause, then by their IARC Group numbers {1, 2a, 2b, 3, 4}, then alphabetically by CTD chemical name. Exposure types are indicated as E/I (Environmental or Industrial) or M (Medical/Therapeutic). Chemicals in gray italics did not have associated human CTD data and were excluded from further data analysis.

Table S2b. Chemicals identified as non-leukemogenic carcinogens, according to the IARC monographs.

Columns: CTD chemical name (Compound CID, if used); CASRN; IARC monograph chemical summaries (Chemical name, Group, Volume, Exposure type, Mode of action, Treatment use, Resultant diseases); NTP RoC chemical summaries (Chemical name, Class).

Reported to cause bladder, liver, kidney, and gastrointestinal cancers

chlornaphazine; CASRN 494‐03‐1. IARC: Chlornaphazine; Group 1; Volume 4, Sup7, 100A; Exposure type: M; Mode of action: a bifunctional alkylating agent; Treatment use: HD, polycythemia vera; Resultant diseases: invasive carcinoma of the bladder, papillary carcinoma grade II of the bladder. NTP RoC: none listed.

Chemical names in gray italics did not have associated human CTD data and were excluded from further data analysis. The NTP data is truncated to show only the source and class information. Exposure type is indicated as M for Medical/Therapeutic.

Table S3. Definitions of disease abbreviations used in Tables S1 and S2.

Abbreviation: Full name of abbreviated disease
AA: Aplastic Anemia
AML: Acute Myeloid Leukemia (also Acute Non-lymphocytic Leukemia)
ALL: Acute Lymphoblastic Leukemia (also Acute Lymphocytic Leukemia)
CLL: Chronic Lymphocytic Leukemia
CML: Chronic Myeloid Leukemia (also Chronic Non-lymphocytic Leukemia)
CTCL: Cutaneous T-cell Lymphoma
GN: Glomerulonephritis (also Glomerular nephritis)
HD: Hodgkin's Disease (also Hodgkin's Lymphoma)
KS: Kaposi's Sarcoma (also Kaposi's Disease)
MM: Multiple Myeloma
MDS: Myelodysplastic Syndrome
NB: Neuroblastoma
NHL: Non-Hodgkin's Lymphoma (also Lymphosarcoma)
NPC: Nasopharyngeal Cancer
OS: Osteogenic Sarcoma (also Osteosarcoma)
RA: Rheumatoid Arthritis
SCC: Squamous Cell Carcinoma
STS: Soft-Tissue Sarcoma (also rhabdomyosarcoma)
TCC: Transitional Cell Carcinoma
WM: Waldenström macroglobulinemia (also Lymphoplasmacytic Lymphoma)
WG: Wegener's granulomatosis


Table S4. Number of leukemogens and non-leukemogenic carcinogens targeting each of the 250 KEGG human pathways; membership probabilities of each pathway in the two clusters in Figure 1; and the mean decrease in Gini index scores from the random forest classification method.

KEGG ID | All Pathways | Disease / Biological Pathway* | Leukemogens (n = 29) No. | Leukemogens (n = 29) % | Non-Leukemogenic Carcinogens (n = 11) No. | Non-Leukemogenic Carcinogens (n = 11) % | Cluster 0 Probability | Cluster 1 Probability | Mean Decrease Gini
path:hsa00010 | Glycolysis / Gluconeogenesis | 0 | 8 | 28 | 4 | 36 | 0.68 | 0.33 | 0.02
path:hsa00020 | Citrate cycle (TCA cycle) | 0 | 1 | 3 | 1 | 9 | 0.99 | 0.01 | 0.03
path:hsa00030 | Pentose phosphate pathway | 0 | 3 | 10 | 2 | 18 | 0.98 | 0.02 | 0.02
path:hsa00040 | Pentose and glucuronate interconversions | 0 | 7 | 24 | 5 | 45 | 0.89 | 0.11 | 0.14
path:hsa00051 | Fructose and mannose metabolism | 0 | 0 | 0 | 2 | 18 | 1.00 | 0.00 | 0.01
path:hsa00052 | Galactose metabolism | 0 | 3 | 10 | 3 | 27 | 0.98 | 0.02 | 0.07
path:hsa00053 | Ascorbate and aldarate metabolism | 0 | 6 | 21 | 4 | 36 | 0.87 | 0.13 | 0.09
path:hsa00061 | Fatty acid biosynthesis | 0 | 2 | 7 | 1 | 9 | 0.95 | 0.05 | 0.00
path:hsa00062 | Fatty acid elongation | 0 | 5 | 17 | 2 | 18 | 0.82 | 0.18 | 0.01
path:hsa00071 | Fatty acid metabolism | 0 | 6 | 21 | 2 | 18 | 0.93 | 0.07 | 0.02
path:hsa00072 | Synthesis and degradation of ketone bodies | 0 | 1 | 3 | 2 | 18 | 0.85 | 0.15 | 0.02
path:hsa00100 | Steroid biosynthesis | 0 | 3 | 10 | 2 | 18 | 0.88 | 0.12 | 0.03
path:hsa00120 | Primary bile acid biosynthesis | 0 | 0 | 0 | 3 | 27 | 1.00 | 0.00 | 0.11
path:hsa00130 | Ubiquinone and other terpenoid-quinone biosynthesis | 0 | 0 | 0 | 2 | 18 | 1.00 | 0.00 | 0.03
path:hsa00140 | Steroid hormone biosynthesis | 0 | 14 | 48 | 5 | 45 | 0.03 | 0.97 | 0.09
path:hsa00190 | Oxidative phosphorylation | 0 | 0 | 0 | 1 | 9 | 0.64 | 0.36 | 0.04
path:hsa00230 | Purine metabolism | 0 | 7 | 24 | 3 | 27 | 0.01 | 0.99 | 0.04
path:hsa00232 | Caffeine metabolism | 0 | 3 | 10 | 8 | 73 | 0.30 | 0.70 | 0.36
path:hsa00240 | Pyrimidine metabolism | 0 | 7 | 24 | 0 | 0 | 0.66 | 0.35 | 0.05
path:hsa00250 | Alanine, aspartate and glutamate metabolism | 0 | 3 | 10 | 3 | 27 | 0.95 | 0.05 | 0.03
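The percentage and Gini columns of Table S4 can be made concrete with a small sketch. The percentages are simply counts divided by the group sizes (29 leukemogens, 11 non-leukemogenic carcinogens). The Gini calculation below is only a single-split illustration: it measures how much class impurity drops when the 40 chemicals are split on whether they target one pathway. It is not the random forest's mean decrease in Gini, which averages such decreases over many trees and splits, so the numbers are not expected to match the table. The function names are illustrative, and the counts used are those read from the first page of Table S4 for Glycolysis / Gluconeogenesis (8 and 4) and Caffeine metabolism (3 and 8).

```python
def percent(count, n):
    """Share of a group targeting a pathway, rounded to a whole percent."""
    return round(100 * count / n)

def gini(counts):
    """Gini impurity of a node given its class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_decrease(hit_leuk, hit_non, n_leuk=29, n_non=11):
    """Impurity decrease from one split on 'chemical targets this pathway'."""
    n = n_leuk + n_non
    targeted = (hit_leuk, hit_non)
    untargeted = (n_leuk - hit_leuk, n_non - hit_non)
    parent = gini((n_leuk, n_non))
    # Children are weighted by the fraction of chemicals they contain.
    children = (sum(targeted) / n) * gini(targeted) \
        + (sum(untargeted) / n) * gini(untargeted)
    return parent - children

# Counts as read from Table S4 (leukemogens, non-leukemogenic carcinogens).
glycolysis = gini_decrease(8, 4)
caffeine = gini_decrease(3, 8)
```

In this toy calculation the caffeine metabolism split separates the two groups far better than the glycolysis split, which is at least directionally consistent with the larger mean decrease in Gini reported for that pathway.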

Table S4. Cont.

KEGG ID | All Pathways | Disease / Biological Pathway* | Leukemogens (n = 29) No. | Leukemogens (n = 29) % | Non-Leukemogenic Carcinogens (n = 11) No. | Non-Leukemogenic Carcinogens (n = 11) % | Cluster 0 Probability | Cluster 1 Probability | Mean Decrease Gini
path:hsa00260 | Glycine, serine and threonine metabolism | 0 | 6 | 21 | 3 | 27 | 0.86 | 0.14 | 0.04
path:hsa00270 | Cysteine and methionine metabolism | 0 | 4 | 14 | 1 | 9 | 0.81 | 0.19 | 0.03
path:hsa00280 | Valine, leucine and isoleucine degradation | 0 | 4 | 14 | 2 | 18 | 0.96 | 0.04 | 0.02
path:hsa00290 | Valine, leucine and isoleucine biosynthesis | 0 | 0 | 0 | 2 | 18 | 1.00 | 0.00 | 0.02
path:hsa00300 | Lysine biosynthesis | 0 | 1 | 3 | 1 | 9 | 1.00 | 0.00 | 0.00
path:hsa00310 | Lysine degradation | 0 | 7 | 24 | 2 | 18 | 0.82 | 0.18 | 0.02
path:hsa00330 | Arginine and proline metabolism | 0 | 6 | 21 | 5 | 45 | 0.37 | 0.63 | 0.07
path:hsa00340 | Histidine metabolism | 0 | 5 | 17 | 2 | 18 | 0.61 | 0.39 | 0.02
path:hsa00350 | Tyrosine metabolism | 0 | 7 | 24 | 3 | 27 | 0.24 | 0.77 | 0.03
path:hsa00360 | Phenylalanine metabolism | 0 | 4 | 14 | 3 | 27 | 0.58 | 0.42 | 0.06
path:hsa00380 | Tryptophan metabolism | 0 | 10 | 34 | 7 | 64 | 0.02 | 0.98 | 0.12
path:hsa00400 | Phenylalanine, tyrosine and tryptophan biosynthesis | 0 | 0 | 0 | 0 | 0 | 1.00 | 0.00 | 0.02
path:hsa00410 | beta-Alanine metabolism | 0 | 8 | 28 | 3 | 27 | 0.79 | 0.21 | 0.02
path:hsa00430 | Taurine and hypotaurine metabolism | 0 | 0 | 0 | 2 | 18 | 1.00 | 0.00 | 0.04
path:hsa00450 | Selenocompound metabolism | 0 | 6 | 21 | 2 | 18 | 0.91 | 0.09 | 0.01
path:hsa00460 | Cyanoamino acid metabolism | 0 | 0 | 0 | 2 | 18 | 0.91 | 0.09 | 0.01
path:hsa00471 | D-Glutamine and D-glutamate metabolism | 0 | 3 | 10 | 2 | 18 | 0.93 | 0.07 | 0.03
path:hsa00472 | D-Arginine and D-ornithine metabolism | 0 | 0 | 0 | 2 | 18 | 1.00 | 0.00 | 0.03
path:hsa00480 | Glutathione metabolism | 0 | 18 | 62 | 3 | 27 | 0.02 | 0.98 | 0.10
path:hsa00500 | Starch and sucrose metabolism | 0 | 1 | 3 | 4 | 36 | 0.84 | 0.16 | 0.10

Table S4. Cont.

KEGG ID | All Pathways | Disease / Biological Pathway* | Leukemogens (n = 29) No. | Leukemogens (n = 29) % | Non-Leukemogenic Carcinogens (n = 11) No. | Non-Leukemogenic Carcinogens (n = 11) % | Cluster 0 Probability | Cluster 1 Probability | Mean Decrease Gini
path:hsa00510 | N-Glycan biosynthesis | 0 | 3 | 10 | 3 | 27 | 0.87 | 0.14 | 0.02
path:hsa00511 | Other glycan degradation | 0 | 1 | 3 | 1 | 9 | 0.96 | 0.04 | 0.03
path:hsa00512 | Mucin type O-Glycan biosynthesis | 0 | 3 | 10 | 3 | 27 | 0.93 | 0.07 | 0.05
path:hsa00514 | Other types of O-glycan biosynthesis | 0 | 4 | 14 | 3 | 27 | 0.77 | 0.23 | 0.13
path:hsa00520 | Amino sugar and nucleotide sugar metabolism | 0 | 2 | 7 | 3 | 27 | 0.84 | 0.16 | 0.09
path:hsa00524 | Butirosin and neomycin biosynthesis | 0 | 1 | 3 | 2 | 18 | 1.00 | 0.00 | 0.04
path:hsa00531 | Glycosaminoglycan degradation | 0 | 2 | 7 | 2 | 18 | 0.98 | 0.03 | 0.01
path:hsa00532 | Glycosaminoglycan biosynthesis - chondroitin sulfate | 0 | 4 | 14 | 3 | 27 | 0.62 | 0.38 | 0.01
path:hsa00533 | Glycosaminoglycan biosynthesis - keratan sulfate | 0 | 2 | 7 | 2 | 18 | 1.00 | 0.00 | 0.01
path:hsa00534 | Glycosaminoglycan biosynthesis - heparan sulfate | 0 | 3 | 10 | 2 | 18 | 0.97 | 0.03 | 0.02
path:hsa00561 | Glycerolipid metabolism | 0 | 5 | 17 | 2 | 18 | 0.9 | 0.11 | 0.01
path:hsa00562 | Inositol phosphate metabolism | 0 | 2 | 7 | 3 | 27 | 0.9 | 0.10 | 0.05
path:hsa00563 | Glycosylphosphatidylinositol (GPI)-anchor biosynthesis | 0 | 2 | 7 | 1 | 9 | 0.77 | 0.23 | 0.02
path:hsa00564 | Glycerophospholipid metabolism | 0 | 2 | 7 | 4 | 36 | 0.67 | 0.33 | 0.07
path:hsa00565 | Ether lipid metabolism | 0 | 1 | 3 | 3 | 27 | 0.91 | 0.09 | 0.08
path:hsa00590 | Arachidonic acid metabolism | 0 | 13 | 45 | 6 | 55 | 0.03 | 0.97 | 0.20
path:hsa00591 | Linoleic acid metabolism | 0 | 10 | 34 | 4 | 36 | 0.01 | 0.99 | 0.10
path:hsa00592 | Alpha-Linolenic acid metabolism | 0 | 0 | 0 | 2 | 18 | 1.00 | 0.00 | 0.04
path:hsa00600 | Sphingolipid metabolism | 0 | 6 | 21 | 3 | 27 | 0.96 | 0.04 | 0.03
path:hsa00601 | Glycosphingolipid biosynthesis - lacto and neolacto series | 0 | 2 | 7 | 3 | 27 | 0.98 | 0.02 | 0.03

Table S4. Cont.

KEGG ID | All Pathways | Disease / Biological Pathway* | Leukemogens (n = 29) No. | Leukemogens (n = 29) % | Non-Leukemogenic Carcinogens (n = 11) No. | Non-Leukemogenic Carcinogens (n = 11) % | Cluster 0 Probability | Cluster 1 Probability | Mean Decrease Gini
path:hsa00603 | Glycosphingolipid biosynthesis - globo series | 0 | 1 | 3 | 1 | 9 | 0.96 | 0.04 | 0.04
path:hsa00604 | Glycosphingolipid biosynthesis - ganglio series | 0 | 3 | 10 | 2 | 18 | 0.99 | 0.01 | 0.03
path:hsa00620 | Pyruvate metabolism | 0 | 5 | 17 | 2 | 18 | 0.86 | 0.14 | 0.01
path:hsa00630 | Glyoxylate and dicarboxylate metabolism | 0 | 1 | 3 | 2 | 18 | 0.99 | 0.01 | 0.02
path:hsa00640 | Propanoate metabolism | 0 | 5 | 17 | 2 | 18 | 0.78 | 0.23 | 0.01
path:hsa00650 | Butanoate metabolism | 0 | 3 | 10 | 2 | 18 | 0.87 | 0.13 | 0.02
path:hsa00670 | One carbon pool by folate | 0 | 4 | 14 | 2 | 18 | 0.95 | 0.05 | 0.02
path:hsa00730 | Thiamine metabolism | 0 | 0 | 0 | 2 | 18 | 1.00 | 0.00 | 0.02
path:hsa00740 | Riboflavin metabolism | 0 | 2 | 7 | 2 | 18 | 1.00 | 0.00 | 0.04
path:hsa00750 | Vitamin B6 metabolism | 0 | 2 | 7 | 2 | 18 | 1.00 | 0.00 | 0.01
path:hsa00760 | Nicotinate and nicotinamide metabolism | 0 | 3 | 10 | 3 | 27 | 0.89 | 0.11 | 0.02
path:hsa00770 | Pantothenate and CoA biosynthesis | 0 | 2 | 7 | 2 | 18 | 0.96 | 0.04 | 0.01
path:hsa00780 | Biotin metabolism | 0 | 0 | 0 | 1 | 9 | 1.00 | 0.00 | 0.00
path:hsa00785 | Lipoic acid metabolism | 0 | 1 | 3 | 0 | 0 | 0.36 | 0.64 | 0.00
path:hsa00790 | Folate biosynthesis | 0 | 3 | 10 | 4 | 36 | 0.82 | 0.18 | 0.04
path:hsa00830 | Retinol metabolism | 0 | 15 | 52 | 8 | 73 | 0.02 | 0.98 | 0.15
path:hsa00860 | Porphyrin and chlorophyll metabolism | 0 | 3 | 10 | 2 | 18 | 0.47 | 0.53 | 0.11
path:hsa00900 | Terpenoid backbone biosynthesis | 0 | 1 | 3 | 2 | 18 | 0.97 | 0.04 | 0.02
path:hsa00910 | Nitrogen metabolism | 0 | 8 | 28 | 3 | 27 | 0.49 | 0.51 | 0.02
path:hsa00920 | Sulfur metabolism | 0 | 1 | 3 | 4 | 36 | 0.47 | 0.53 | 0.11
path:hsa00970 | Aminoacyl-tRNA biosynthesis | 0 | 2 | 7 | 2 | 18 | 0.78 | 0.22 | 0.05
path:hsa00980 | Metabolism of xenobiotics by cytochrome P450 | 0 | 20 | 69 | 7 | 64 | 0.00 | 1.00 | 0.15

Table S4. Cont.

KEGG ID | All Pathways | Disease / Biological Pathway* | Leukemogens (n = 29) No. | Leukemogens (n = 29) % | Non-Leukemogenic Carcinogens (n = 11) No. | Non-Leukemogenic Carcinogens (n = 11) % | Cluster 0 Probability | Cluster 1 Probability | Mean Decrease Gini
path:hsa00982 | Drug metabolism - cytochrome P450 | 0 | 13 | 45 | 5 | 45 | 0.01 | 0.99 | 0.11
path:hsa00983 | Drug metabolism - other enzymes | 0 | 9 | 31 | 5 | 45 | 0.00 | 1.00 | 0.16
path:hsa01040 | Biosynthesis of unsaturated fatty acids | 0 | 4 | 14 | 2 | 18 | 0.64 | 0.36 | 0.02
path:hsa01100 | Metabolic pathways | 0 | 12 | 41 | 5 | 45 | 0.00 | 1.00 | 0.09
path:hsa02010 | ABC transporters | 0 | 7 | 24 | 1 | 9 | 0.27 | 0.73 | 0.04
path:hsa03008 | Ribosome biogenesis in eukaryotes | 0 | 3 | 10 | 2 | 18 | 0.64 | 0.36 | 0.03
path:hsa03010 | Ribosome | 0 | 2 | 7 | 5 | 45 | 0.62 | 0.38 | 0.15
path:hsa03013 | RNA transport | 0 | 5 | 17 | 2 | 18 | 0.01 | 0.99 | 0.04
path:hsa03015 | mRNA surveillance pathway | 0 | 3 | 10 | 4 | 36 | 0.67 | 0.33 | 0.03
path:hsa03018 | RNA degradation | 0 | 5 | 17 | 3 | 27 | 0.82 | 0.18 | 0.03
path:hsa03020 | RNA polymerase | 0 | 2 | 7 | 0 | 0 | 0.79 | 0.21 | 0.01
path:hsa03022 | Basal transcription factors | 0 | 5 | 17 | 3 | 27 | 0.76 | 0.24 | 0.02
path:hsa03030 | DNA replication | 0 | 1 | 3 | 3 | 27 | 1.00 | 0.00 | 0.06
path:hsa03040 | Spliceosome | 0 | 9 | 31 | 0 | 0 | 0.63 | 0.38 | 0.02
path:hsa03050 | Proteasome | 0 | 4 | 14 | 6 | 55 | 0.57 | 0.43 | 0.07
path:hsa03060 | Protein export | 0 | 3 | 10 | 1 | 9 | 0.36 | 0.64 | 0.02
path:hsa03320 | PPAR signaling pathway | 0 | 7 | 24 | 4 | 36 | 0.1 | 0.90 | 0.02
path:hsa03410 | Base excision repair | 0 | 11 | 38 | 2 | 18 | 0.02 | 0.98 | 0.03
path:hsa03420 | Nucleotide excision repair | 0 | 4 | 14 | 4 | 36 | 0.59 | 0.41 | 0.02
path:hsa03430 | Mismatch repair | 0 | 3 | 10 | 3 | 27 | 0.82 | 0.18 | 0.04
path:hsa03440 | Homologous recombination | 0 | 8 | 28 | 4 | 36 | 0.19 | 0.81 | 0.03
path:hsa03450 | Non-homologous end-joining | 0 | 3 | 10 | 1 | 9 | 0.81 | 0.19 | 0.02
path:hsa03460 | Fanconi anemia pathway | 0 | 7 | 24 | 4 | 36 | 0.18 | 0.82 | 0.03
path:hsa04010 | MAPK signaling pathway | 0 | 17 | 59 | 7 | 64 | 0.00 | 1.00 | 0.11
path:hsa04012 | ErbB signaling pathway | 0 | 15 | 52 | 4 | 36 | 0.00 | 1.00 | 0.07
path:hsa04020 | Calcium signaling pathway | 0 | 4 | 14 | 4 | 36 | 0.00 | 1.00 | 0.06


Table S4. Cont.

KEGG ID | Pathway | Disease / Biological Pathway* | Leukemogens (n = 29) No. | Leukemogens (n = 29) % | Non-Leukemogenic Carcinogens (n = 11) No. | Non-Leukemogenic Carcinogens (n = 11) % | Cluster 0 Probability | Cluster 1 Probability | Mean Decrease Gini
path:hsa04060 | Cytokine-cytokine receptor interaction | 0 | 12 | 41 | 6 | 55 | 0.00 | 1.00 | 0.07
path:hsa04062 | Chemokine signaling pathway | 0 | 10 | 34 | 4 | 36 | 0.00 | 1.00 | 0.04
path:hsa04070 | Phosphatidylinositol signaling system | 0 | 3 | 10 | 2 | 18 | 0.63 | 0.37 | 0.08
path:hsa04080 | Neuroactive ligand-receptor interaction | 0 | 8 | 28 | 4 | 36 | 0.00 | 1.00 | 0.03
path:hsa04110 | Cell cycle | 0 | 13 | 45 | 7 | 64 | 0.21 | 0.80 | 0.09
path:hsa04114 | Oocyte meiosis | 0 | 12 | 41 | 3 | 27 | 0.01 | 0.99 | 0.04
path:hsa04115 | p53 signaling pathway | 0 | 16 | 55 | 8 | 73 | 0.11 | 0.89 | 0.06
path:hsa04120 | Ubiquitin mediated proteolysis | 0 | 3 | 10 | 2 | 18 | 0.62 | 0.38 | 0.02
path:hsa04122 | Sulfur relay system | 0 | 1 | 3 | 0 | 0 | 0.75 | 0.25 | 0.00
path:hsa04130 | SNARE interactions in vesicular transport | 0 | 5 | 17 | 3 | 27 | 0.75 | 0.25 | 0.02
path:hsa04140 | Regulation of autophagy | 0 | 6 | 21 | 3 | 27 | 0.37 | 0.63 | 0.04
path:hsa04141 | Protein processing in endoplasmic reticulum | 0 | 11 | 38 | 3 | 27 | 0.00 | 1.00 | 0.05
path:hsa04142 | Lysosome | 0 | 2 | 7 | 3 | 27 | 0.63 | 0.38 | 0.09
path:hsa04144 | Endocytosis | 0 | 10 | 34 | 3 | 27 | 0.00 | 1.00 | 0.03
path:hsa04145 | Phagosome | 0 | 6 | 21 | 6 | 55 | 0.01 | 1.00 | 0.07
path:hsa04146 | Peroxisome | 0 | 7 | 24 | 2 | 18 | 0.54 | 0.46 | 0.04
path:hsa04150 | mTOR signaling pathway | 0 | 11 | 38 | 2 | 18 | 0.00 | 1.00 | 0.04
path:hsa04210 | Apoptosis | 0 | 18 | 62 | 9 | 82 | 0.01 | 0.99 | 0.08
path:hsa04260 | Cardiac muscle contraction | 0 | 2 | 7 | 0 | 0 | 0.61 | 0.39 | 0.02
path:hsa04270 | Vascular smooth muscle contraction | 0 | 12 | 41 | 6 | 55 | 0.05 | 0.95 | 0.04
path:hsa04310 | Wnt signaling pathway | 0 | 10 | 34 | 5 | 45 | 0.00 | 1.00 | 0.10
path:hsa04320 | Dorso-ventral axis formation | 0 | 8 | 28 | 3 | 27 | 0.01 | 0.99 | 0.03


Table S4. Cont.

KEGG ID | Pathway | Disease / Biological Pathway* | Leukemogens (n = 29) No. | Leukemogens (n = 29) % | Non-Leukemogenic Carcinogens (n = 11) No. | Non-Leukemogenic Carcinogens (n = 11) % | Cluster 0 Probability | Cluster 1 Probability | Mean Decrease Gini
path:hsa04330 | Notch signaling pathway | 0 | 4 | 14 | 2 | 18 | 0.92 | 0.08 | 0.02
path:hsa04340 | Hedgehog signaling pathway | 0 | 3 | 10 | 3 | 27 | 0.95 | 0.05 | 0.06
path:hsa04350 | TGF-beta signaling pathway | 0 | 13 | 45 | 7 | 64 | 0.05 | 0.95 | 0.08
path:hsa04360 | Axon guidance | 0 | 9 | 31 | 3 | 27 | 0.01 | 0.99 | 0.04
path:hsa04370 | VEGF signaling pathway | 0 | 12 | 41 | 7 | 64 | 0.00 | 1.00 | 0.08
path:hsa04380 | Osteoclast differentiation | 0 | 12 | 41 | 4 | 36 | 0.00 | 1.00 | 0.05
path:hsa04510 | Focal adhesion | 0 | 12 | 41 | 5 | 45 | 0.00 | 1.00 | 0.08
path:hsa04512 | ECM-receptor interaction | 0 | 7 | 24 | 4 | 36 | 0.34 | 0.66 | 0.10
path:hsa04514 | Cell adhesion molecules (CAMs) | 0 | 4 | 14 | 4 | 36 | 0.65 | 0.35 | 0.05
path:hsa04520 | Adherens junction | 0 | 10 | 34 | 2 | 18 | 0.16 | 0.84 | 0.05
path:hsa04530 | Tight junction | 0 | 4 | 14 | 1 | 9 | 0.67 | 0.33 | 0.02
path:hsa04540 | Gap junction | 0 | 14 | 48 | 4 | 36 | 0.00 | 1.00 | 0.05
path:hsa04610 | Complement and coagulation cascades | 0 | 2 | 7 | 2 | 18 | 0.55 | 0.46 | 0.04
path:hsa04612 | Antigen processing and presentation | 0 | 8 | 28 | 7 | 64 | 0.02 | 0.98 | 0.09
path:hsa04614 | Renin-angiotensin system | 0 | 2 | 7 | 3 | 27 | 0.37 | 0.63 | 0.13
path:hsa04620 | Toll-like receptor signaling pathway | 0 | 17 | 59 | 7 | 64 | 0.00 | 1.00 | 0.06
path:hsa04621 | NOD-like receptor signaling pathway | 0 | 14 | 48 | 5 | 45 | 0.00 | 1.00 | 0.10
path:hsa04622 | RIG-I-like receptor signaling pathway | 0 | 9 | 31 | 5 | 45 | 0.01 | 0.99 | 0.10
path:hsa04623 | Cytosolic DNA-sensing pathway | 0 | 9 | 31 | 4 | 36 | 0.02 | 0.98 | 0.03
path:hsa04630 | Jak-STAT signaling pathway | 0 | 11 | 38 | 7 | 64 | 0.00 | 1.00 | 0.07
path:hsa04640 | Hematopoietic cell lineage | 0 | 10 | 34 | 4 | 36 | 0.00 | 1.00 | 0.05
path:hsa04650 | Natural killer cell mediated cytotoxicity | 0 | 13 | 45 | 7 | 64 | 0.00 | 1.00 | 0.05
path:hsa04660 | T cell receptor signaling pathway | 0 | 12 | 41 | 4 | 36 | 0.00 | 1.00 | 0.06
path:hsa04662 | B cell receptor signaling pathway | 0 | 13 | 45 | 4 | 36 | 0.01 | 1.00 | 0.05
path:hsa04664 | Fc epsilon RI signaling pathway | 0 | 9 | 31 | 6 | 55 | 0.00 | 1.00 | 0.05


Table S4. Cont.

KEGG ID | Pathway | Disease / Biological Pathway* | Leukemogens (n = 29) No. | Leukemogens (n = 29) % | Non-Leukemogenic Carcinogens (n = 11) No. | Non-Leukemogenic Carcinogens (n = 11) % | Cluster 0 Probability | Cluster 1 Probability | Mean Decrease Gini
path:hsa04666 | Fc gamma R-mediated phagocytosis | 0 | 10 | 34 | 3 | 27 | 0.03 | 0.97 | 0.04
path:hsa04670 | Leukocyte transendothelial migration | 0 | 7 | 24 | 4 | 36 | 0.51 | 0.49 | 0.04
path:hsa04672 | Intestinal immune network for IgA production | 0 | 9 | 31 | 5 | 45 | 0.00 | 1.00 | 0.06
path:hsa04710 | Circadian rhythm - mammal | 0 | 2 | 7 | 2 | 18 | 0.99 | 0.01 | 0.01
path:hsa04720 | Long-term potentiation | 0 | 12 | 41 | 4 | 36 | 0.20 | 0.80 | 0.06
path:hsa04721 | Synaptic vesicle cycle | 0 | 1 | 3 | 1 | 9 | 0.70 | 0.30 | 0.01
path:hsa04722 | Neurotrophin signaling pathway | 0 | 19 | 66 | 10 | 91 | 0.00 | 1.00 | 0.10
path:hsa04723 | Retrograde endocannabinoid signaling | 0 | 7 | 24 | 2 | 18 | 0.01 | 0.99 | 0.07
path:hsa04724 | Glutamatergic synapse | 0 | 8 | 28 | 3 | 27 | 0.08 | 0.92 | 0.03
path:hsa04725 | Cholinergic synapse | 0 | 15 | 52 | 5 | 45 | 0.00 | 1.00 | 0.06
path:hsa04726 | Serotonergic synapse | 0 | 12 | 41 | 5 | 45 | 0.00 | 1.00 | 0.10
path:hsa04727 | GABAergic synapse | 0 | 6 | 21 | 1 | 9 | 0.25 | 0.75 | 0.02
path:hsa04728 | Dopaminergic synapse | 0 | 7 | 24 | 5 | 45 | 0.21 | 0.79 | 0.08
path:hsa04730 | Long-term depression | 0 | 8 | 28 | 2 | 18 | 0.09 | 0.91 | 0.04
path:hsa04740 | Olfactory transduction | 0 | 0 | 0 | 0 | 0 | 0.15 | 0.86 | 0.02
path:hsa04742 | Taste transduction | 0 | 3 | 10 | 3 | 27 | 1.00 | 0.00 | 0.02
path:hsa04744 | Phototransduction | 0 | 2 | 7 | 2 | 18 | 0.63 | 0.37 | 0.06
path:hsa04810 | Regulation of actin cytoskeleton | 0 | 12 | 41 | 4 | 36 | 0.00 | 1.00 | 0.04
path:hsa04910 | Insulin signaling pathway | 0 | 11 | 38 | 5 | 45 | 0.01 | 0.99 | 0.05
path:hsa04912 | GnRH signaling pathway | 0 | 12 | 41 | 4 | 36 | 0.00 | 1.00 | 0.05
path:hsa04914 | Progesterone-mediated oocyte maturation | 0 | 13 | 45 | 3 | 27 | 0.00 | 1.00 | 0.02
path:hsa04916 | Melanogenesis | 0 | 11 | 38 | 3 | 27 | 0.01 | 0.99 | 0.05
path:hsa04920 | Adipocytokine signaling pathway | 0 | 7 | 24 | 7 | 64 | 0.00 | 1.00 | 0.08
path:hsa04930 | Type II diabetes mellitus | 1 | 10 | 34 | 1 | 9 | 0.00 | 1.00 | 0.03


Table S4. Cont.

KEGG ID | Pathway | Disease / Biological Pathway* | Leukemogens (n = 29) No. | Leukemogens (n = 29) % | Non-Leukemogenic Carcinogens (n = 11) No. | Non-Leukemogenic Carcinogens (n = 11) % | Cluster 0 Probability | Cluster 1 Probability | Mean Decrease Gini
path:hsa04940 | Type I diabetes mellitus | 1 | 9 | 31 | 7 | 64 | 0.00 | 1.00 | 0.09
path:hsa04950 | Maturity onset diabetes of the young | 0 | 2 | 7 | 1 | 9 | 0.98 | 0.02 | 0.01
path:hsa04960 | Aldosterone-regulated sodium reabsorption | 0 | 13 | 45 | 5 | 45 | 0.01 | 1.00 | 0.04
path:hsa04961 | Endocrine and other factor-regulated calcium reabsorption | 0 | 5 | 17 | 3 | 27 | 0.16 | 0.84 | 0.03
path:hsa04962 | Vasopressin-regulated water reabsorption | 0 | 3 | 10 | 3 | 27 | 0.67 | 0.33 | 0.09
path:hsa04964 | Proximal tubule bicarbonate reclamation | 0 | 4 | 14 | 2 | 18 | 0.72 | 0.28 | 0.04
path:hsa04966 | Collecting duct acid secretion | 0 | 2 | 7 | 2 | 18 | 1.00 | 0.00 | 0.07
path:hsa04970 | Salivary secretion | 0 | 4 | 14 | 4 | 36 | 0.64 | 0.36 | 0.13
path:hsa04971 | Gastric acid secretion | 0 | 5 | 17 | 4 | 36 | 0.47 | 0.53 | 0.07
path:hsa04972 | Pancreatic secretion | 0 | 2 | 7 | 2 | 18 | 0.64 | 0.36 | 0.02
path:hsa04973 | Carbohydrate digestion and absorption | 0 | 6 | 21 | 1 | 9 | 0.94 | 0.06 | 0.02
path:hsa04974 | Protein digestion and absorption | 0 | 4 | 14 | 3 | 27 | 0.74 | 0.26 | 0.02
path:hsa04975 | Fat digestion and absorption | 0 | 2 | 7 | 1 | 9 | 0.75 | 0.25 | 0.02
path:hsa04976 | Bile secretion | 0 | 15 | 52 | 4 | 36 | 0.05 | 0.95 | 0.07
path:hsa04977 | Vitamin digestion and absorption | 0 | 3 | 10 | 1 | 9 | 0.48 | 0.52 | 0.02
path:hsa04978 | Mineral absorption | 0 | 7 | 24 | 2 | 18 | 0.55 | 0.45 | 0.02
path:hsa05010 | Alzheimer's disease | 1 | 13 | 45 | 4 | 36 | 0.00 | 1.00 | 0.06
path:hsa05012 | Parkinson's disease | 1 | 7 | 24 | 5 | 45 | 0.16 | 0.84 | 0.05
path:hsa05014 | Amyotrophic lateral sclerosis (ALS) | 1 | 18 | 62 | 8 | 73 | 0.00 | 1.00 | 0.08
path:hsa05016 | Huntington's disease | 1 | 11 | 38 | 4 | 36 | 0.00 | 1.00 | 0.07
path:hsa05020 | Prion diseases | 1 | 14 | 48 | 6 | 55 | 0.00 | 1.00 | 0.07
path:hsa05030 | Cocaine addiction | 0 | 7 | 24 | 4 | 36 | 0.03 | 0.97 | 0.04


Table S4. Cont.

KEGG ID | Pathway | Disease / Biological Pathway* | Leukemogens (n = 29) No. | Leukemogens (n = 29) % | Non-Leukemogenic Carcinogens (n = 11) No. | Non-Leukemogenic Carcinogens (n = 11) % | Cluster 0 Probability | Cluster 1 Probability | Mean Decrease Gini
path:hsa05031 | Amphetamine addiction | 0 | 7 | 24 | 4 | 36 | 0.06 | 0.94 | 0.05
path:hsa05100 | Bacterial invasion of epithelial cells | 0 | 4 | 14 | 4 | 36 | 0.65 | 0.35 | 0.13
path:hsa05110 | Vibrio cholerae infection | 0 | 5 | 17 | 2 | 18 | 0.67 | 0.33 | 0.02
path:hsa05120 | Epithelial cell signaling in Helicobacter pylori infection | 0 | 12 | 41 | 6 | 55 | 0.00 | 1.00 | 0.04
path:hsa05130 | Pathogenic Escherichia coli infection | 0 | 2 | 7 | 2 | 18 | 0.90 | 0.10 | 0.04
path:hsa05131 | Shigellosis | 1 | 13 | 45 | 5 | 45 | 0.01 | 1.00 | 0.05
path:hsa05132 | Salmonella infection | 1 | 12 | 41 | 5 | 45 | 0.00 | 1.00 | 0.05
path:hsa05133 | Pertussis | 1 | 13 | 45 | 6 | 55 | 0.00 | 1.00 | 0.08
path:hsa05134 | Legionellosis | 1 | 14 | 48 | 4 | 36 | 0.00 | 1.00 | 0.09
path:hsa05140 | Leishmaniasis | 1 | 10 | 34 | 8 | 73 | 0.00 | 1.00 | 0.11
path:hsa05142 | Chagas disease (American trypanosomiasis) | 1 | 14 | 48 | 6 | 55 | 0.00 | 1.00 | 0.08
path:hsa05143 | African trypanosomiasis | 0 | 8 | 28 | 6 | 55 | 0.00 | 1.00 | 0.08
path:hsa05144 | Malaria | 1 | 10 | 34 | 6 | 55 | 0.00 | 1.00 | 0.11
path:hsa05145 | Toxoplasmosis | 1 | 17 | 59 | 6 | 55 | 0.00 | 1.00 | 0.09
path:hsa05146 | Amoebiasis | 1 | 11 | 38 | 6 | 55 | 0.00 | 1.00 | 0.07
path:hsa05150 | Staphylococcus aureus infection | 0 | 3 | 10 | 2 | 18 | 0.31 | 0.69 | 0.05
path:hsa05152 | Tuberculosis | 1 | 16 | 55 | 7 | 64 | 0.00 | 1.00 | 0.06
path:hsa05160 | Hepatitis C | 0 | 12 | 41 | 6 | 55 | 0.00 | 1.00 | 0.08
path:hsa05162 | Measles | 1 | 14 | 48 | 7 | 64 | 0.00 | 1.00 | 0.11
path:hsa05164 | Influenza A | 1 | 13 | 45 | 6 | 55 | 0.00 | 1.00 | 0.07
path:hsa05166 | HTLV-I infection | 1 | 16 | 55 | 7 | 64 | 0.02 | 0.98 | 0.08
path:hsa05168 | Herpes simplex infection | 0 | 12 | 41 | 7 | 64 | 0.00 | 1.00 | 0.08
path:hsa05200 | Pathways in cancer | 1 | 23 | 79 | 6 | 55 | 0.00 | 1.00 | 0.10
path:hsa05202 | Transcriptional misregulation in cancer | 1 | 14 | 48 | 7 | 64 | 0.00 | 1.00 | 0.12


Table S4. Cont.

KEGG ID | Pathway | Disease / Biological Pathway* | Leukemogens (n = 29) No. | Leukemogens (n = 29) % | Non-Leukemogenic Carcinogens (n = 11) No. | Non-Leukemogenic Carcinogens (n = 11) % | Cluster 0 Probability | Cluster 1 Probability | Mean Decrease Gini
path:hsa05210 | Colorectal cancer | 1 | 20 | 69 | 6 | 55 | 0.00 | 1.00 | 0.10
path:hsa05211 | Renal cell carcinoma | 0 | 8 | 28 | 4 | 36 | 0.00 | 1.00 | 0.04
path:hsa05212 | Pancreatic cancer | 1 | 18 | 62 | 8 | 73 | 0.09 | 0.91 | 0.11
path:hsa05213 | Endometrial cancer | 1 | 16 | 55 | 7 | 64 | 0.00 | 1.00 | 0.05
path:hsa05214 | Glioma | 1 | 15 | 52 | 6 | 55 | 0.00 | 1.00 | 0.09
path:hsa05215 | Prostate cancer | 1 | 20 | 69 | 7 | 64 | 0.14 | 0.86 | 0.14
path:hsa05216 | Thyroid cancer | 1 | 11 | 38 | 7 | 64 | 0.00 | 1.00 | 0.09
path:hsa05217 | Basal cell carcinoma | 1 | 9 | 31 | 6 | 55 | 0.12 | 0.88 | 0.18
path:hsa05218 | Melanoma | 1 | 19 | 66 | 7 | 64 | 0.00 | 1.00 | 0.08
path:hsa05219 | Bladder cancer | 1 | 19 | 66 | 6 | 55 | 0.10 | 0.91 | 0.08
path:hsa05220 | Chronic myeloid leukemia | 1 | 18 | 62 | 7 | 64 | 0.01 | 1.00 | 0.08
path:hsa05221 | Acute myeloid leukemia | 1 | 11 | 38 | 3 | 27 | 0.06 | 0.94 | 0.06
path:hsa05222 | Small cell lung cancer | 1 | 18 | 62 | 9 | 82 | 0.07 | 0.93 | 0.07
path:hsa05223 | Non-small cell lung cancer | 1 | 16 | 55 | 6 | 55 | 0.00 | 1.00 | 0.09
path:hsa05310 | Asthma | 1 | 7 | 24 | 4 | 36 | 0.00 | 1.00 | 0.06
path:hsa05320 | Autoimmune thyroid disease | 1 | 10 | 34 | 6 | 55 | 0.00 | 1.00 | 0.06
path:hsa05322 | Systemic lupus erythematosus | 1 | 8 | 28 | 5 | 45 | 0.37 | 0.64 | 0.10
path:hsa05323 | Rheumatoid arthritis | 1 | 11 | 38 | 6 | 55 | 0.00 | 1.00 | 0.07
path:hsa05330 | Allograft rejection | 0 | 8 | 28 | 5 | 45 | 0.00 | 1.00 | 0.07
path:hsa05332 | Graft-versus-host disease | 0 | 10 | 34 | 6 | 55 | 0.00 | 1.00 | 0.06
path:hsa05340 | Primary immunodeficiency | 0 | 2 | 7 | 4 | 36 | 0.55 | 0.45 | 0.13
path:hsa05410 | Hypertrophic cardiomyopathy (HCM) | 1 | 11 | 38 | 5 | 45 | 0.01 | 0.99 | 0.04
path:hsa05412 | Arrhythmogenic right ventricular cardiomyopathy (ARVC) | 0 | 2 | 7 | 2 | 18 | 0.29 | 0.71 | 0.03
path:hsa05414 | Dilated cardiomyopathy | 0 | 9 | 31 | 4 | 36 | 0.04 | 0.96 |
path:hsa05416 | Viral myocarditis | 1 | 8 | 28 | 4 | 36 | 0.03 | 0.97 |

* 0 indicates biological pathway and 1 indicates disease pathway.
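
The percentage columns in Table S4 appear to be each count divided by the group size (n = 29 leukemogens, n = 11 non-leukemogenic carcinogens) and rounded to a whole number. The short Python sketch below reproduces that arithmetic for three pathways listed in the table. The rounding rule and the helper name `percent` are assumptions made for illustration and are not stated in the source.

```python
# Group sizes as stated in the Table S4 column headers.
N_LEUKEMOGENS = 29
N_NON_LEUKEMOGENIC = 11


def percent(count, group_size):
    """Share of chemicals targeting a pathway, as a whole-number percentage.

    Assumption: the table rounds to the nearest integer.
    """
    return round(100 * count / group_size)


# (pathway, leukemogen count, non-leukemogenic carcinogen count) from Table S4.
rows = [
    ("Drug metabolism - cytochrome P450", 13, 5),
    ("Apoptosis", 18, 9),
    ("Pathways in cancer", 23, 6),
]

table = [
    (name, percent(leuk, N_LEUKEMOGENS), percent(non, N_NON_LEUKEMOGENIC))
    for name, leuk, non in rows
]

for name, leuk_pct, non_pct in table:
    print(f"{name}: {leuk_pct}% of leukemogens, "
          f"{non_pct}% of non-leukemogenic carcinogens")
```

The computed values match the percentages printed next to these counts in the table (45 and 45, 62 and 82, 79 and 55).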


Table S5. Probability that each of the 40 chemicals is a member of each of the 7 clusters in Figure 4.

Chemical Name | Cluster Number | Cluster Membership: Cluster 0 | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6
1,3-butadiene | 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0
3,3'-dichlorobenzidine | 6 | 0.045 | 0 | 0 | 0 | 0 | 0.097 | 0.858
adriamycin | T | 0 | 0 | 0 | 0 | 0 | 0 | 1
bcnu | 6 | 0 | 0 | 0 | 0 | 0.027 | 0.098 | 0.875
benzene | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 1
busulfan | 6 | 0 | 0 | 0.001 | 0 | 0.485 | 0.017 | 0.497
ccnu | 6 | 0.012 | 0.249 | 0.176 | 0 | 0.007 | 0.16 | 0.396
chlorambucil | 6 | 0 | 0 | 0 | 0 | 0 | 0.025 | 0.975
chloramphenicol | 6 | 0 | 0 | 0 | 0 | 0 | 0.005 | 0.995
cisplatin | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 1
cyclophosphamide | 5 | 0 | 0.004 | 0.005 | 0 | 0 | 0.979 | 0.012
ddt | 6 | 0 | 0 | 0 | 0 | 0 | 0.005 | 0.995
ethylene_oxide | 2 | 0.003 | 0 | 0.948 | 0 | 0.023 | 0.026 | 0
etoposide | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 1
formaldehyde | 6 | 0.021 | 0 | 0 | 0.003 | 0.782 | 0.005 | 0.189
alpha-hexachlorocyclohexane | 6 | 0.045 | 0 | 0 | 0 | 0 | 0.091 | 0.864
beta-hexachlorocyclohexane | 6 | 0 | 0 | 0 | 0 | 0.015 | 0 | 0.985
delta-hexachlorocyclohexane | 6 | 0.009 | 0.06 | 0 | 0.088 | 0 | 0.029 | 0.814
lindane | 6 | 0 | 0 | 0 | 0.002 | 0.055 | 0.142 | 0.801
me_ccnu | 6 | 0.001 | 0 | 0 | 0 | 0 | 0.093 | 0.906
melphalan | 6 | 0 | 0 | 0.004 | 0 | 0.19 | 0.175 | 0.631
nitrogen_mustard | 6 | 0 | 0 | 0 | 0 | 0 | 0.045 | 0.955
oxymethalone | 6 | 0 | 0 | 0 | 0 | 0 | 0.028 | 0.972
propylthiouracil | 6 | 0.041 | 0 | 0.542 | 0.336 | 0 | 0 | 0.081
styrene | 5 | 0 | 0 | 0.143 | 0 | 0.09 | 0.47 | 0.297
teniposide | 5 | 0 | 0.016 | 0.037 | 0.179 | 0.021 | 0.735 | 0.012


Table S5. Cont.

Chemical Name | Cluster Number | Cluster Membership: Cluster 0 | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6
thiotepa | 5 | 0 | 0 | 0 | 0 | 0 | 1 | 0
trichloroethylene | 6 | 0 | 0 | 0 | 0.001 | 0 | 0.028 | 0.971
vinyl_chloride | 4 | 0 | 0 | 0 | 0 | 1 | 0 | 0
2-NAPHTHYLAMINE | 6 | 0 | 0 | 0 | 0.334 | 0 | 0.001 | 0.665
4-AMINOBIPHENYL | 3 | 0 | 0 | 0 | 1 | 0 | 0 | 0
AZATHIOPRINE | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 1
BENZIDINE | 6 | 0 | 0 | 0 | 1 | 0 | 0 | 0
CYCLOSPORINE | 6 | 0.899 | 0 | 0 | 0 | 0.033 | 0 | 0.068
DIETHYLSTILBESTROL | 6 | 0 | 0 | 0 | 0.028 | 0.012 | 0.043 | 0.917
PHENACETIN | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0
SULFUR_MUSTARD | 6 | 0 | 0 | 0 | 0 | 0 | 0.001 | 0.999
TAMOXIFEN | 6 | 0 | 0 | 0 | 0 | 0 | 0 | 1
TCDD | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0
TETRACHLOROETHYLENE | 6 | 0 | 0 | 0 | 0.001 | 0 | 0.306 | 0.693
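
Each row of Table S5 gives one chemical's membership probabilities over the 7 clusters, so the seven values in a row should sum to approximately 1. The Python sketch below checks this for three rows copied from the table and reports the cluster with the highest probability. The helper name `summarize` and the tolerance are illustrative assumptions, not part of the original analysis.

```python
# Membership probabilities for Cluster 0 .. Cluster 6, copied from Table S5.
memberships = {
    "ccnu": [0.012, 0.249, 0.176, 0.0, 0.007, 0.16, 0.396],
    "cyclophosphamide": [0.0, 0.004, 0.005, 0.0, 0.0, 0.979, 0.012],
    "styrene": [0.0, 0.0, 0.143, 0.0, 0.09, 0.47, 0.297],
}


def summarize(probs):
    """Return (sum of probabilities, index of the most probable cluster)."""
    total = sum(probs)
    best = max(range(len(probs)), key=lambda k: probs[k])
    return total, best


summary = {name: summarize(probs) for name, probs in memberships.items()}

for name, (total, best) in summary.items():
    # Allow a small tolerance for the three-decimal rounding used in the table.
    ok = abs(total - 1.0) < 0.005
    print(f"{name}: sum = {total:.3f} (valid: {ok}), most probable cluster = {best}")
```

For these three rows the most probable cluster agrees with the Cluster Number column of the table (6, 5 and 5 respectively).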


Table S6. Grouping of chemicals into 5 groups for the five-fold cross-validation performed in the One-class SVM analyses.

Chemical group (fold) no.: 1, 2, 3, 4, 5

Chemical Name: 1,3-butadiene, 3,3'-dichlorobenzidine, adriamycin, bcnu, benzene, busulfan, ccnu, chlorambucil, chloramphenicol, cisplatin, cyclophosphamide, ddt, ethylene_oxide, etoposide, formaldehyde, alpha-hexachlorocyclohexane, beta-hexachlorocyclohexane, delta-hexachlorocyclohexane, lindane, me_ccnu, melphalan, nitrogen_mustard, oxymethalone, propylthiouracil, styrene, teniposide, thiotepa, trichloroethylene, vinyl_chloride
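
As a rough illustration of how a five-fold split over the 29 leukemogens can be organized, the Python sketch below assigns chemicals to folds in round-robin order and yields the training and held-out sets for each fold. The round-robin assignment and the function name `five_fold_splits` are assumptions made for illustration. They are not the grouping reported in Table S6, and the One-class SVM fitting step itself is omitted.

```python
# The 29 leukemogens listed in Table S6.
CHEMICALS = [
    "1,3-butadiene", "3,3'-dichlorobenzidine", "adriamycin", "bcnu",
    "benzene", "busulfan", "ccnu", "chlorambucil", "chloramphenicol",
    "cisplatin", "cyclophosphamide", "ddt", "ethylene_oxide", "etoposide",
    "formaldehyde", "alpha-hexachlorocyclohexane",
    "beta-hexachlorocyclohexane", "delta-hexachlorocyclohexane", "lindane",
    "me_ccnu", "melphalan", "nitrogen_mustard", "oxymethalone",
    "propylthiouracil", "styrene", "teniposide", "thiotepa",
    "trichloroethylene", "vinyl_chloride",
]


def five_fold_splits(items, n_folds=5):
    """Yield (fold_no, training, held_out) for each fold.

    Hypothetical grouping: item i goes to fold i % n_folds. The actual
    fold membership used by the authors is not reproduced here.
    """
    folds = [items[i::n_folds] for i in range(n_folds)]
    for fold_no, held_out in enumerate(folds, start=1):
        training = [c for c in items if c not in held_out]
        yield fold_no, training, held_out


splits = list(five_fold_splits(CHEMICALS))

for fold_no, training, held_out in splits:
    # A one-class model would be fit on `training` and scored on `held_out`.
    print(f"fold {fold_no}: train on {len(training)}, hold out {len(held_out)}")
```

With 29 chemicals and 5 folds, four folds hold out 6 chemicals and one holds out 5, and every chemical is held out exactly once.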

© 2012 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
