The Science of the Total Environment, Supplement 1993. Elsevier Science Publishers B.V., Amsterdam. 1459. Statistical approach to chemicals classification.
Statistical approach to chemicals classification
Monika Nendza* and Andrea Wenzel
Fraunhofer-Institut für Umweltchemie und Ökotoxikologie, D-57392 Schmallenberg-Grafschaft, FRG
ABSTRACT
Principal component analysis was employed to test its applicability for classifying potential environmental contaminants by modes of action with respect to various targets. Discrimination of baseline from specific toxicants is necessary to allow rational QSAR prediction from either log Pow or other descriptors, and thus the selection of appropriate endpoints and models. Taking pesticides from different classes with known effects as an example, a reduction of the relevant toxicity parameters was achieved by combining data sets. Based on the principal components, QSARs reflecting the underlying common processes were derived. The predominance of lipophilicity-dependent baseline toxicity is indicated for various classes of compounds with respect to most endpoints, whereas the pattern of excess toxicity proved to be specific for each target. Multivariate statistics proved to be a useful tool to recognize compounds of analogous mode of action, as required for subsequent QSAR application.

Key words: QSAR; Mode of action; Lipophilicity; Specific toxicity; Multivariate statistics

INTRODUCTION
The ecotoxic hazard of a chemical can be described best by its activity towards a variety of species assumed to represent the diverse biota in the environment, hence reflecting the possible modes of interaction with various organisms. For series of compounds, this leads to a multidimensional parameter space, and inherent possible intercorrelations are not easily detected. Principal component analysis (PCA) represents a means to analyse such data matrices containing results for a number of chemicals obtained in
* Current address: Analytisches Laboratorium, Bahnhofstr. 1, D-24816 Luhnstedt, Germany.
a series of test systems simultaneously (Jolliffe, 1986; Johnson and Wichern, 1988; Martens and Naes, 1989). The objective of this method is to reveal the extent and the nature of the basic properties encoded in the data sets. By explaining the variance-covariance structure of the data matrix through linear combinations, a reduction in the dimensionality of the data set can be achieved. Hypothetical variables, the principal components (PCs), are extracted from the data matrix, condensing the information of the test results, so that collinearities can be recognized. Each PC is assumed to represent a characteristic scale for conjoint effects and interactions underlying the test results. Corresponding patterns are combined, and the respective PCs are used to identify and characterize, for example, the relative influence of a compound's physico-chemical properties on its toxicity in several tests simultaneously. If chemicals act uniformly, their bioactivity can be described by the same quantitative structure-activity relationship (QSAR) model. A combination of these two approaches is used to discriminate chemicals of different modes of action and subsequently to apply QSAR analysis. Such a procedure may make QSAR-based predictions of the properties and effects of compounds more reliable.
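The extraction of PCs described above can be illustrated with a minimal sketch based on the singular value decomposition of the column-centred data matrix. This is not the software used for the study (the reference list names Unscrambler). The function name `pca`, the matrix `X_demo` and its values are hypothetical and serve only to show how explained variance, loadings and scores relate to each other. Note that which quantities are called "loadings" and which "scores" depends on whether compounds or tests form the rows of the matrix.

```python
import numpy as np

def pca(X, n_components=None):
    """PCA of a data matrix X (rows = objects, columns = variables) via SVD."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # centre each column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2 / (X.shape[0] - 1)         # variance captured by each PC
    explained = 100.0 * var / var.sum()     # percent variance explained
    k = n_components or len(S)
    loadings = Vt[:k].T                     # variables x PCs
    scores = U[:, :k] * S[:k]               # objects x PCs
    return explained, loadings, scores

# Hypothetical 6 x 3 matrix with three strongly collinear columns
# (illustrative numbers only, not data from the study).
X_demo = np.array([
    [1.0, 2.1, 0.9],
    [2.0, 3.9, 2.1],
    [3.0, 6.2, 2.9],
    [4.0, 7.8, 4.2],
    [5.0, 10.1, 4.8],
    [6.0, 12.0, 6.1],
])

explained, loadings, scores = pca(X_demo)
print(np.round(explained, 1))               # percent variance per PC
print(np.round(np.cumsum(explained), 1))    # cumulative percent
```

Because the three hypothetical columns are nearly collinear, almost all of the variance falls on the first PC, which is the situation in which a reduction of dimensionality is possible.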
TABLE 1
Toxicity data {log 1/[mol/l(kg)]} and log Pow values for the compounds analysed: herbicides (H1-H5), insecticides (I1-I5), rodenticides (R1-R4)

Compound             log Pow  LC50 fish  EC50 daphnia  EC50 algae  LD50 bird  LD50 rat
H1 Benazolin-ethyl   2.28     4.84       4.64          5.43        …          …
H2 Ethofumesat       2.16     3.05       2.99          5.46        …          …
H3 Atrazin           2.82     3.86       3.74          6.28        …          …
H4 Phenmedipham      3.52     5.03       4.70          6.64        …          …
H5 Diuron            2.86     4.28       5.27          7.61        …          …
I1 Fenoxycarb        4.63     4.70       6.88          5.44        …          …
I2 IV1               2.82     4.71       7.07          5.29        …          …
I3 IV2               2.79     4.73       7.56          4.80        …          …
I4 IV3               4.48     7.04       9.03          5.14        …          …
I5 IV4               0.33     4.88       9.09          3.32        …          …
R1 Chlorphacinon     4.95     5.89       5.95          5.95        …          …
R2 RV1               2.52     3.67       3.23          3.30        …          …
R3 Bromadiolon       6.00     5.38       6.34          6.40        …          …
R4 RV2               7.88     6.68       5.76          5.80        …          …
[…] daphnia > bird, rat), the insecticides (daphnia > algae, fish > bird, rat), and the rodenticides (rat > fish, daphnia, algae > bird), respectively. For each class of these toxicants, the presumed most sensitive species in the data set is characterized by the presence of target biomolecules mediating the specific effects, i.e. algae for herbicides, daphnia for insecticides, rat for rodenticides. If such targets are not present in an organism, as, for example, the herbicide target in fish, unspecific toxicity is observed. The effects are then governed by distribution processes and can be related to the compounds' lipophilicity. Hence, chemicals may mostly act in an unspecific manner, but further interactions become relevant if specific targets are present. As a consequence, mode of toxic action cannot be regarded solely as a compound-specific property; it also depends on features of the exposed organism. The response pattern obtained from a series of tests is therefore characteristic for the respective mode of action. Comparison of compounds from different classes reveals that they may act by the same mode in some organisms, but by dissimilar modes in other organisms. Applying this rationale to QSAR studies implies that the 'outliers' cannot be the same for all models; rather, toxicity exceeding the lipophilicity-mediated baseline effects occurs for different structures with different endpoints.
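The species-sensitivity argument can be checked for the aquatic endpoints with a short sketch. It is restricted to the fish, daphnia and algae values of Table 1; the helper name `most_sensitive` is hypothetical. Since the data are given as log 1/C, the largest value marks the most sensitive species.

```python
# Aquatic toxicity values from Table 1, log 1/[mol/l]: (fish, daphnia, algae).
AQUATIC = {
    "H1": (4.84, 4.64, 5.43), "H2": (3.05, 2.99, 5.46), "H3": (3.86, 3.74, 6.28),
    "H4": (5.03, 4.70, 6.64), "H5": (4.28, 5.27, 7.61),
    "I1": (4.70, 6.88, 5.44), "I2": (4.71, 7.07, 5.29), "I3": (4.73, 7.56, 4.80),
    "I4": (7.04, 9.03, 5.14), "I5": (4.88, 9.09, 3.32),
    "R1": (5.89, 5.95, 5.95), "R2": (3.67, 3.23, 3.30),
    "R3": (5.38, 6.34, 6.40), "R4": (6.68, 5.76, 5.80),
}
SPECIES = ("fish", "daphnia", "algae")

def most_sensitive(values):
    """Return the species with the highest log 1/C (ties are all reported)."""
    top = max(values)
    return [name for name, v in zip(SPECIES, values) if v == top]

for code, values in AQUATIC.items():
    print(code, most_sensitive(values))
```

Among the three aquatic tests, algae come out as the most sensitive species for every herbicide and daphnia for every insecticide, whereas the rodenticides show no consistent aquatic target, in line with the rat being their presumed target organism.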
In order to classify the mode-of-action-related toxicity pattern in a series of tests, PCA was applied to the data set representing three different modes of action by five (four) compounds each. The objective is to classify the original variables into clusters with respect to their intercorrelations.
Fig. 1. Toxicity profiles for chemicals of different modes of action: herbicides, insecticides, rodenticides. [Three panels (Herbicides, Insecticides, Rodenticides); y-axis: log 1/[mol]; x-axis: fish, daphnia, algae, bird, rat.]
All variables in a particular group are highly correlated among themselves, but have relatively small correlations with variables in a different group. Each group of chemicals corresponds to an underlying principal factor (i.e. mode of action) which is responsible for the observed correlations. Three PCs were extracted, representing the substantial information of the original data with > 90% of the total variance explained (Table 2). A possible fourth PC may be attributed to data scatter. The selected PCs are interpreted from the PC loadings, which represent the correlations between the PCs and the original variables. The larger the absolute value (< 1) of a PC loading, the more information from the original variable is contained in this PC. From the loadings/variables matrix, those original variables can be identified which contribute to the same PC(s). The herbicides and insecticides show similar contributions to PC(T1) and PC(T3), but they can be discriminated by their loadings on PC(T2), which are positive for the insecticides but negative for the herbicides. The rodenticides load significantly on PC(T3) and contribute only to a minor extent to PC(T1) and PC(T2). The PC scores reflect the contribution of the relative species sensitivity to the classification of the compounds. PC(T1) scores reveal positive contributions only from the aquatic tests (fish, daphnia, algae) and may hence represent the
TABLE 2
Principal component analysis of five toxicity parameters for 14 pesticides

% variance explained
PC      Percent (%)  Sum (%)
(T1)    54.5          54.5
(T2)    24.5          79.0
(T3)    12.4          91.4
(T4)     8.6         100.0

Loadings of principal components: 3 PCs extracted.
        PC(T1)   PC(T2)   PC(T3)
H1       0.289   -0.129   -0.042
H2       0.188   -0.295   -0.116
H3       0.264   -0.314   -0.121
H4       0.392   -0.308   -0.186
H5       0.392   -0.308   -0.186
I1       0.396    0.105   -0.091
I2       0.228    0.166   -0.053
I3       0.180    0.287   -0.158
I4       0.363    0.368    0.168
I5       0.243    0.608   -0.128
R1       0.251   -0.060    0.518
R2      -0.049   -0.024    0.520
R3       0.211   -0.058    0.474
R4      -0.036   -0.023    0.289

Scores of principal components: 3 PCs extracted.
          PC(T1)   PC(T2)   PC(T3)
Fish       1.515   -0.225    0.843
Daphnia    5.548    4.335   -0.266
Algae      4.217   -4.707   -0.673
Bird      -6.330    0.656   -0.543
Rat       -4.950   -0.058    2.639
respective baseline toxicity. PC(T2) scores discriminate between herbicides and insecticides, corresponding to the positive contribution of daphnia toxicity data and the negative contribution of algae toxicity data. The high scoring of rat toxicity in PC(T3) relates to the rodenticides. The results of
Fig. 2. Rotated vectors of the loadings for PC(T1), PC(T2), PC(T3) calculated by PCA reflecting modes of action (data prior to rotation: Table 2). The compound classes are discriminated: H1-5, herbicides; I1-5, insecticides; R1-4, rodenticides. [Vector plot; axes: PC(T1), PC(T2).]
this analysis allow one to clearly identify patterns that are characteristic for a mode of action: herbicides: + - -, insecticides: + + -, rodenticides: + with respect to the three PCs. The discriminative power of the PCA technique can also be visualized in the vector plot (Fig. 2). The intercorrelations between the test results correspond to the cosine of the angle between variable vectors, i.e. the larger the angle between the vectors, the lower the intercorrelation; the plot thus represents the clustering of the compounds into three groups according to their mode of action. Such statistical classification by mode of action can be of use for the further application of predictive QSARs, since it provides rationales for selecting the appropriate models as well as indications of potential outliers. To test this hypothesis, another PCA was conducted with the data matrix transposed (Table 3), yielding PC scores that represent the relative ranking of the compounds with respect to principal components that now classify the test systems. These quantities can be used for diagnostic purposes as well as inputs to a subsequent analysis to understand, for example, the collinear ranking of compounds by various tests. Like the original test results, the PC scores can be subjected to QSAR analysis. The PC loadings confirm that the majority (> 80%) of the information from the five tests is contained in the first three PCs. One PC, PC(C2), is
TABLE 3
Principal component analysis of 14 pesticides for five toxicity parameters

% variance explained
PC      Percent (%)  Sum (%)
(C1)    43.2          43.2
(C2)    19.1          62.3
(C3)    18.3          80.6
(C4)    11.1          91.7
(C5)     8.3         100.0

Loadings of principal components: 3 PCs extracted.
          PC(C1)   PC(C2)   PC(C3)
Fish       0.346    0.011    0.49
Daphnia    0.729   -0.577   -0.035
Algae     -0.189   -0.169    0.831
Bird       0.358    0.141   -0.182
Rat        0.430    0.787    0.183

Scores of principal components: 3 PCs extracted.
        PC(C1)   PC(C2)   PC(C3)
H1      -1.893   -0.63    -0.173
H2      -3.72     0.316   -0.967
H3      -3.049   -0.254    0.085
H4      -2.012   -0.867    0.924
H5      -2.04    -1.36     1.342
I1       0.309   -1.925   -0.313
I2       1.256   -0.33    -0.412
I3       2.09    -0.284   -0.954
I4       3.448   -1.332    0.656
I5       3.192   -1.351   -2.183
R1       0.664    0.956    1.287
R2      -1.259    3.282   -1.953
R3       1.245    1.325    1.408
R4       2.386    2.453    1.252
predominantly loaded by the rat toxicity data, two PCs, PC(C3) and PC(C1), are loaded by fish and either algae or daphnia toxicity data, respectively. The information from toxicity testing with fish is comprised similarly in the two components, revealing that algae and daphnia toxicity information
TABLE 4
Baseline toxicity QSARs based on the loadings obtained from PCA of 14 pesticides for five toxicity parameters PC(C1), PC(C2), PC(C3) (data: Tables 1, 3)

                                                                           n    r     s
PC(C1) = 0.92 (±0.14) log Pow - 4.79 (±0.61)                               9   0.93  0.62
PC(C2) = 1.11 (±0.42) log Pow - 0.25 (±0.08) (log Pow)² - 1.61 (±0.57)    10   0.73  0.47
PC(C3) = 0.82 (±0.19) log Pow - 3.35 (±0.79)                               7   0.89  0.38

n: number of compounds analysed; r: correlation coefficient; s: standard deviation of the residuals.
Fig. 3a. [Plot; x-axis: log Pow.]
cover this aspect already for the investigated compounds and at the same time allow the discrimination of specific modes of action. Analysis of the PC scores reveals the underlying baseline QSARs (Table 4) which are common for all compounds of analogous mode of interaction with the
Fig. 3b. [Plot; x-axis: log Pow; labelled points: R1, R2, R3, R4.]
target (Nendza and Klein, 1990). With respect to algae, the insecticides and rodenticides can be combined into the same log Pow-dependent linear baseline model. An analogous model was derived for the herbicides and rodenticides with respect to daphnia. The similarity of both aquatic models indicates analogous underlying processes, namely partitioning (Fig. 3). Concerning rat toxicity, the herbicides and insecticides follow a parabolic baseline relationship with maximum effects for compounds with a log Pow of about 2. The derived QSARs are in good agreement with models based directly on the toxicity data (Könemann, 1981; Veith et al., 1983; Hermens, 1986; Nendza and Russom, 1991; Nendza, 1991), thus revealing that QSARs based on the results of multivariate analysis allow one to recognize the basic properties related to the various toxicity parameters even from very small data sets. Simultaneously, compounds of deviating mode of action can be recognized easily. The findings of the presented analyses support the rationale that a chemical's toxicity may be classified on the basis of a few tests that are not intercorrelated and can serve as indicators of specific modes of action.
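The two forms of baseline relationship can be examined numerically with a small sketch. It assumes that the linear PC(C1) model refers to the herbicides H1-H5 and the rodenticides R1-R4, which is an interpretation of the text and not stated in Table 4, so the fitted coefficients need not coincide exactly with the published ones. The log Pow values are taken from Table 1, the PC(C1) scores from Table 3, and the parabola coefficients from the PC(C2) equation in Table 4.

```python
import numpy as np

# Assumed compound subset: H1-H5 followed by R1-R4.
log_pow = np.array([2.28, 2.16, 2.82, 3.52, 2.86, 4.95, 2.52, 6.00, 7.88])
pc_c1 = np.array([-1.893, -3.72, -3.049, -2.012, -2.04,
                  0.664, -1.259, 1.245, 2.386])

# Ordinary least-squares line PC(C1) = slope * log Pow + intercept.
slope, intercept = np.polyfit(log_pow, pc_c1, 1)
r = np.corrcoef(log_pow, pc_c1)[0, 1]
print(f"PC(C1) = {slope:.2f} log Pow {intercept:+.2f}, r = {r:.2f}")

# Position of the maximum of the PC(C2) parabola a*x + b*x**2 + c,
# obtained by setting the derivative a + 2*b*x to zero.
a, b = 1.11, -0.25
log_pow_max = -a / (2.0 * b)
print(f"maximum of the parabola at log Pow = {log_pow_max:.2f}")
```

The vertex of the parabola lies at log Pow = 1.11 / (2 × 0.25) ≈ 2.2, which agrees with the maximum effects reported for compounds with a log Pow of about 2.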
Fig. 3. Correlation of the scores for PC(C1), PC(C2), PC(C3) and log Pow (data: Tables 1, 3; functions: Table 4), revealing the mode-of-action-related deviations for the different classes of compounds: insecticides vs. PC 1 (daphnia), rodenticides vs. PC 2 (rat), herbicides vs. PC 3 (algae).
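The deviations from the baseline shown for PC(C1) can be expressed as residuals from the Table 4 equation. The following sketch uses the log Pow values of Table 1 and the PC(C1) scores of Table 3; the flagging threshold of twice the residual standard deviation s is an assumption made here for illustration and is not a criterion given in the paper.

```python
# Baseline QSAR for PC(C1) from Table 4 and its residual standard deviation s.
SLOPE, INTERCEPT, S = 0.92, -4.79, 0.62
CUTOFF = 2.0 * S  # assumed flagging threshold, not taken from the paper

# compound: (log Pow from Table 1, PC(C1) score from Table 3)
DATA = {
    "H1": (2.28, -1.893), "H2": (2.16, -3.72), "H3": (2.82, -3.049),
    "H4": (3.52, -2.012), "H5": (2.86, -2.04),
    "I1": (4.63, 0.309), "I2": (2.82, 1.256), "I3": (2.79, 2.09),
    "I4": (4.48, 3.448), "I5": (0.33, 3.192),
    "R1": (4.95, 0.664), "R2": (2.52, -1.259),
    "R3": (6.00, 1.245), "R4": (7.88, 2.386),
}

def excess(log_pow, score):
    """Score minus the value predicted by the log Pow baseline."""
    return score - (SLOPE * log_pow + INTERCEPT)

residuals = {code: excess(*pair) for code, pair in DATA.items()}
flagged = [code for code, res in residuals.items() if res > CUTOFF]

for code, res in residuals.items():
    print(f"{code}: {res:+.2f}")
print("above baseline by more than 2s:", flagged)
```

With the values as tabulated, only the insecticides I2 to I5 exceed the assumed cutoff, while the herbicides and rodenticides stay close to the baseline for this daphnia-dominated component.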
ACKNOWLEDGEMENT
Thanks to G. Pitzen for support in data acquisition.

REFERENCES
Hermens, J.L.M., 1986. Quantitative structure-activity relationships in aquatic toxicology. Pestic. Sci., 17: 287-296.
IVA Industrieverband Agrar e.V., 1992. Informationen über Wirkstoffe. IVA, Frankfurt.
Johnson, R.A. and D.W. Wichern, 1988. Applied Multivariate Statistical Analysis. Prentice-Hall Inc., Englewood Cliffs, NJ.
Jolliffe, I.T., 1986. Principal Component Analysis. Springer, New York.
Könemann, H., 1981. Quantitative structure-activity relationships in fish toxicity studies. Toxicology, 19: 209-221.
Martens, H. and T. Naes, 1989. Multivariate Calibration. Wiley & Sons Ltd, New York.
Mayer, F.L. and M.R. Ellersieck, 1986. Manual of acute toxicity: interpretation and database for 410 chemicals and 66 species of freshwater animals. Resource Publication, U.S. Department of the Interior, Washington, DC.
Medchem Software, 1989. Release 3.54. Daylight Chemical Information Systems, Irvine, CA.
Nendza, M. and W. Klein, 1990. Comparative QSAR study on freshwater and estuarine toxicity. Aquat. Toxicol., 17: 63-74.
Nendza, M., 1991. Predictive QSAR models estimating ecotoxic hazard of phenylurea herbicides: Mammalian toxicity. Chemosphere, 22: 613-623.
Nendza, M. and C.L. Russom, 1991. QSAR modelling of the ERL-D fathead minnow acute toxicity database. Xenobiotica, 21: 147-170.
Perkow, W., 1988. Wirksubstanzen der Pflanzenschutz- und Schädlingsbekämpfungsmittel I-II. Verlag Paul Parey.
Register of Toxic Effects of Chemical Substances (RTECS), 1989. National Library of Medicine, Bethesda, MD.
Unscrambler Software, 1991. Release 3.10. Camo AS, Trondheim.
Veith, G.D., D.J. Call and L.T. Brooke, 1983. Structure-toxicity relationships for the fathead minnow, Pimephales promelas: narcotic industrial chemicals. Can. J. Fish. Aquat. Sci., 40: 743-748.
Worthing, C.R., 1983. The Pesticide Manual. The British Crop Protection Council.