Water 43.2 (D121) corr
3/2/01
3:39 pm
Page 133
Y.W. Lee,* S.Y. Chung,* I. Bogardi,** M.F. Dahab** and S.E. Oh***
* Department of Environmental Engineering, Chonnam National University, 300 Yongbong-dong, Puk-ku, Kwangju 500–757, Korea
** Department of Civil Engineering, University of Nebraska, W348 Nebraska Hall, Lincoln, NE 68588, U.S.A.
*** Department of Environmental Engineering, Daejon National University of Technology, 305–3 Samsung-dong, Dong-ku, Daejon 300–717, Korea
Abstract Regression analysis has been used to characterize the relationship between an exposure dose and the incidence of an adverse health effect such as cancer. However, regression rarely describes the true relationship because of uncertainties in the dose-response data and relationships. Therefore, a method is developed to perform dose-response assessments by fuzzy linear regression, which explicitly represents these uncertainties. This method is applied to define the relationship between a particular nitrate dose to humans and the corresponding cancer risk.
Keywords Regression; dose-response assessment; uncertainty; deviation; fuzzy number; nitrate
Introduction
The purpose of this study is to develop a fuzzy linear-regression method to characterize dose-response relationships under uncertainty and to apply it to estimate human cancer risk due to nitrate intake. Dose-response assessment, a critical step in the risk assessment procedure, is the process of characterizing the relationship between an exposure dose and the incidence of an adverse health effect such as cancer. Regression analysis, which determines how a dependent variable y is related to an independent variable x, has been used to investigate dose-response relationships from existing data (i.e., measured values). Deviations between the measured values and the values estimated by regression analysis can occur because of measurement errors and/or modelling errors (Bardossy, 1990). The concept of measurement errors is clear. Modelling errors arise from omitting independent variable(s) and/or using an improper regression model (linear, quadratic, etc.) because of a lack of available information. In conventional regression analysis, which is based on probability theory, the method of least-squares estimation (Neter et al., 1985) has usually been employed to find “good” values of the regression parameters by minimizing the sum of the squared differences between the measured and the estimated values. In the least-squares method, the deviations are assumed to be due to measurement errors. A regression method using fuzzy sets was proposed by Tanaka et al. (1982) and Tanaka and Watada (1988); in their method, the deviations are assumed to be due to modelling errors rather than measurement errors. However, the deviations can result from both measurement errors and modelling errors. In this study, a fuzzy linear-regression methodology is formulated under the assumption that the deviations can be attributed to both.
As an application, the methodology is used to estimate human cancer risk corresponding to a particular nitrate dose.
Water Science and Technology Vol 43 No2 pp 133–140 © IWA Publishing 2001
Dose-response assessment by a fuzzy linear-regression method
Fuzzy number and extension principle
A special case of fuzzy sets is the L-R fuzzy number (Zimmermann, 1985). As an example, let U be an L-R fuzzy number whose membership function, µ(U), is given by:
Y.W. Lee et al.
$$\mu(U) = \begin{cases} 1 - (u - U)/\sigma, & u - \sigma \le U \le u \\ 1 - (U - u)/\tau, & u < U \le u + \tau \\ 0, & \text{otherwise} \end{cases} \tag{1}$$
where u is the center value of U, and σ (σ > 0) and τ (τ > 0) are the largest left and right fuzzinesses from the center value u, respectively. Symbolically, U can be written as U = (u, σ, τ)LR (Zimmermann, 1985). By convention, U is a nonfuzzy (crisp) number when σ and τ are equal to zero; as σ and τ increase, U becomes increasingly fuzzy. Basic operations and functions on fuzzy numbers can be defined with the help of the extension principle (Tanaka and Watada, 1982; Zimmermann, 1985; Bardossy, 1990). If X and Y are two sets and f: X → Y is a mapping such that f(x) = y ∈ Y for all x ∈ X, then f can be extended to operate on fuzzy subsets of X as follows. Let A be a fuzzy subset of X with membership function µA; then the image of A in Y is the fuzzy subset B with the membership function µB (Zimmermann, 1985):
$$\mu_B(y) = \begin{cases} \max\{\mu_A(x) : x \in X,\ y = f(x)\}, & \text{if such an } x \text{ exists} \\ 0, & \text{if there is no } x \in X \text{ such that } f(x) = y \end{cases} \tag{2}$$
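A minimal Python sketch of Equations 1 and 2. The triangular membership function follows Equation 1 directly. For the extension principle, the sketch assumes a fuzzy set with finite support, stored as a dictionary from elements to membership degrees; this is an illustrative simplification of Equation 2, and the function names lr_triangular and extend are hypothetical.

```python
from typing import Callable, Dict, Hashable


def lr_triangular(U: float, u: float, sigma: float, tau: float) -> float:
    """Membership degree mu(U) of the triangular L-R fuzzy number (u, sigma, tau)_LR (Eq. 1)."""
    if sigma < 0 or tau < 0:
        raise ValueError("left and right fuzziness must be non-negative")
    if U == u:
        return 1.0  # centre value; also covers the crisp case sigma = tau = 0
    if u - sigma <= U < u:
        return 1.0 - (u - U) / sigma  # left branch
    if u < U <= u + tau:
        return 1.0 - (U - u) / tau  # right branch
    return 0.0


def extend(f: Callable[[Hashable], Hashable], A: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Image B = f(A) of a finite fuzzy set A under the extension principle (Eq. 2).

    mu_B(y) is the largest mu_A(x) over all x with f(x) = y; any y that is
    not the image of some x is simply absent, i.e. has membership 0.
    """
    B: Dict[Hashable, float] = {}
    for x, mu in A.items():
        y = f(x)
        B[y] = max(B.get(y, 0.0), mu)
    return B


# Usage: squaring maps both -1 and 1 to 1, so mu_B(1) = max(0.5, 0.8) = 0.8.
A = {-1: 0.5, 1: 0.8, 2: 1.0}
print(extend(lambda x: x * x, A))  # {1: 0.8, 4: 1.0}
```

The maximum in `extend` is what distinguishes the extension principle from a simple relabelling: when several elements of X map to the same y, only the strongest membership survives.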
A fuzzy linear-regression method
This study is concerned with linear regression and with models that can be transformed into the linear case. A linear regression model can be defined as follows:
$$Y_i = Q_0 X_{0,i} + Q_1 X_{1,i} + \cdots + Q_{m-1} X_{m-1,i} \tag{3}$$
where Yi is the value of the dependent variable Y for the i-th datum (i = 1, …, n); X0,i, …, Xm−1,i are the values of the independent variables X0, …, Xm−1, respectively, in the i-th datum; and Q0, …, Qm−1 are parameters. Note that X0 is a dummy variable identically equal to one and that there are no interaction effects between the independent variables X1, …, Xm−1. To simplify the presentation, consider a linear regression model with one independent variable (i.e., m − 1 = 1). Equation 3 may then be rewritten as:
$$Y_i = Q_0 + Q X_i \tag{4}$$
In general, no crisp values of Q0 and Q satisfy Equation 4 for all data i = 1, …, n. In other words, deviations between the measured value (Yi) and the estimated value (Q0 + QXi) generally occur around the regression line Y = Q0 + QX because of uncertainties in the functional relationship between the variables and/or in the input (Xi) and the output (Yi). A methodology for incorporating these uncertainties into the regression analysis is formulated in this section.
Fuzzy parameters
Even when precise (error-free) data for Yi and Xi are known, deviations due to modelling errors can occur around the regression line described by Equation 4. In this case, the parameters Q0 and Q may be estimated as fuzzy parameters (i.e., fuzzy numbers) to characterize the uncertainty associated with modelling errors (Tanaka et al., 1982); symbolically, Q0 = (q0, α0, β0)LR and Q = (q, α, β)LR, using the notation of Equation 1. If Q0 is a symmetric fuzzy parameter (i.e., α0 = β0) and its membership function, µ(Q0), is triangular, then µ(Q0) can be described by:
$$\mu(Q_0) = \begin{cases} 1 - |Q_0 - q_0| / \alpha_0, & q_0 - \alpha_0 \le Q_0 \le q_0 + \alpha_0 \\ 0, & \text{otherwise} \end{cases} \tag{5}$$
Analogously to Equation 5, the membership function µ(Q) of Q (with α = β) can be given by:
$$\mu(Q) = \begin{cases} 1 - |Q - q| / \alpha, & q - \alpha \le Q \le q + \alpha \\ 0, & \text{otherwise} \end{cases} \tag{6}$$
In Equation 4, the use of the fuzzy parameters Q0 = (q0, α0, α0)LR and Q = (q, α, α)LR results in a fuzzy number for Yi. Substituting the fuzzy parameters into Equation 4 and applying the extension principle (Equation 2), the membership function µ(Yi) of Yi can be written as follows (Tanaka and Watada, 1988):
$$\mu(Y_i) = \begin{cases} 1 - \dfrac{|Y_i - (q_0 + qX_i)|}{\alpha_0 + \alpha|X_i|}, & X_i \ne 0 \\ 1, & X_i = 0,\ Y_i = 0 \\ 0, & X_i = 0,\ Y_i \ne 0 \end{cases} \tag{7}$$
where q0 + qXi is the center value of the fuzzy number Yi, and α0 + α|Xi| is the spread about q0 + qXi (i.e., the fuzziness of the model).
Fuzzy input and fuzzy output
As mentioned earlier, the deviations may be attributed to uncertainties in the values of the input Xi and the output Yi as well as to the fuzziness of the parameters. Uncertainties in the values of Yi and Xi are generally caused by measurement errors. In conventional regression analysis, measured values are represented by crisp (i.e., single-valued) numbers even when measurement errors (sometimes large) exist. In this section, the input Xi and the output Yi are interpreted as fuzzy numbers to characterize their uncertainties. If µ(Xi) is the membership function for Xi and its shape is triangular, then µ(Xi) can be defined by:
$$\mu(X_i) = \begin{cases} 1 - (x_i - X_i)/v_i, & x_i - v_i \le X_i \le x_i \\ 1 - (X_i - x_i)/w_i, & x_i < X_i \le x_i + w_i \\ 0, & \text{otherwise} \end{cases} \tag{8}$$
where xi is the center value of Xi, and vi and wi are the largest left and right fuzzinesses from xi, respectively. Symbolically, Xi = (xi, vi, wi)LR. If vi = 0 and wi = 0, Xi is a crisp value (i.e., Xi = xi). Analogously to Equation 8, the membership function µ(Yi) for Yi may be expressed as:
$$\mu(Y_i) = \begin{cases} 1 - (y_i - Y_i)/c_i, & y_i - c_i \le Y_i \le y_i \\ 1 - (Y_i - y_i)/e_i, & y_i < Y_i \le y_i + e_i \\ 0, & \text{otherwise} \end{cases} \tag{9}$$
where yi is the center value of Yi, and ci and ei are the largest left and right fuzzinesses from yi, respectively. Symbolically, Yi = (yi, ci, ei)LR. If ci = 0 and ei = 0, Yi is a crisp value (i.e., Yi = yi).
Formulation of linear programming
When a set of fuzzy data (Xi, Yi) described by Equations 8 and 9 is given, the parameters q0, q, α0 and α in Equation 7 can be found by minimizing the index V of the fuzziness of the linear regression model (Equation 10), subject to the condition that the membership function µ(Yi) of Equation 7 is at least h = 0 (i.e., µ(Yi) ≥ 0). Because the fuzziness (α0 + α|Xi|) about the center value (q0 + qXi) is largest at h = 0, the condition µ(Yi) ≥ 0 ensures that, at h = 0, Yi falls between the lower- and upper-bound lines of Y = Q0 + QXi. Imposing this condition on Equation 7 yields the following linear programming problem for the parameter values:
$$\text{Minimize } V(\alpha_0, \alpha) = \sum_{i=1}^{n} \left[ \left(\alpha_0 + \alpha|X_i^-|\right) + \left(\alpha_0 + \alpha|X_i^+|\right) \right] \tag{10}$$
subject to the following constraints for i = 1, …, n:
$$\alpha_0 \ge 0 \quad \text{and} \quad \alpha \ge 0 \tag{11a}$$
$$q_0 + qX_i^- - \left(\alpha_0 + \alpha|X_i^-|\right) \le Y_i^- \tag{11b}$$
$$q_0 + qX_i^- + \left(\alpha_0 + \alpha|X_i^-|\right) \ge Y_i^+ \tag{11c}$$
$$q_0 + qX_i^+ - \left(\alpha_0 + \alpha|X_i^+|\right) \le Y_i^- \tag{11d}$$
$$q_0 + qX_i^+ + \left(\alpha_0 + \alpha|X_i^+|\right) \ge Y_i^+ \tag{11e}$$
where Xi− = xi − (1 − p)vi; Xi+ = xi + (1 − p)wi; Yi− = yi − (1 − p)ci; Yi+ = yi + (1 − p)ei; and p (0 ≤ p ≤ 1) is a membership degree of the fuzzy data (Equations 8 and 9). The membership degree p can be determined by “expert judgement” based on experience and observed measurement variability (Lee et al., 1991). Minimizing the index V of the fuzziness of Y (Equation 10) under the constraints of Equations 11a through 11e yields the parameter values (q0, α0, α0) and (q, α, α) that define the fuzzy parameters Q0 and Q, respectively. Because the fuzziness of Y varies with the membership degree h, the domain of Y is defined as follows:
$$q_0 + qX - (1-h)(\alpha_0 + \alpha|X|) \le Y \le q_0 + qX + (1-h)(\alpha_0 + \alpha|X|) \tag{12}$$
Figures 1, 2 and 3 show examples of the lower- and upper-bound lines of Y = Q0 + QX for h = 1.0, h = 0.0 and h = 0.5, respectively. When h = 1.0 (Figure 1), it is assumed that modelling errors (the fuzziness of the regression model) do not exist; that is, all of the deviations around the regression line Y = q0 + qX are due only to measurement errors. By contrast, when h = 0.0 (Figure 2), all of the fuzzy data fall between the lower- and upper-bound lines of Y, and it is assumed that only modelling errors exist. Since neither extreme generally occurs in practice, both measurement errors and modelling errors should be reflected in the regression analysis by choosing an appropriate value for the membership degree h, as shown in Figure 3. The appropriate value of h can be determined from the goodness of fit to the data as well as from expert judgement.
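The linear programme of Equations 10 and 11 has only four unknowns (q0, q, α0, α), so a dependency-free sketch can solve it exactly by enumerating the vertices of the feasible region. Vertex enumeration is a solution technique swapped in for illustration (the text does not say which LP solver was used), and the helper names p_cut, fit_fuzzy_line and band, as well as the example data, are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[float, float]  # (lower, upper) bound of a fuzzy datum at level p


def p_cut(center: float, left: float, right: float, p: float) -> Interval:
    """Bounds c - (1-p)*left and c + (1-p)*right, i.e. X_i^-/X_i^+ or Y_i^-/Y_i^+."""
    return center - (1.0 - p) * left, center + (1.0 - p) * right


def _solve(A: List[List[float]], b: List[float]) -> Optional[List[float]]:
    """Gauss-Jordan elimination with partial pivoting; None if the system is singular."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            return None
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]


def fit_fuzzy_line(X: Sequence[Interval], Y: Sequence[Interval], tol: float = 1e-9):
    """Minimize V (Eq. 10) subject to Eqs. 11a-11e; returns (q0, q, alpha0, alpha).

    Each constraint is stored as (g, rhs), meaning g . z <= rhs with
    z = (q0, q, alpha0, alpha). Every combination of four constraints taken as
    equalities is a candidate vertex; the feasible one with the smallest V wins.
    """
    cons = [([0.0, 0.0, -1.0, 0.0], 0.0), ([0.0, 0.0, 0.0, -1.0], 0.0)]  # Eq. 11a
    for (x_lo, x_hi), (y_lo, y_hi) in zip(X, Y):
        for x in (x_lo, x_hi):
            cons.append(([1.0, x, -1.0, -abs(x)], y_lo))     # Eqs. 11b/11d: lower edge <= Y_i^-
            cons.append(([-1.0, -x, -1.0, -abs(x)], -y_hi))  # Eqs. 11c/11e: upper edge >= Y_i^+
    cost = [0.0, 0.0, 2.0 * len(X), sum(abs(lo) + abs(hi) for lo, hi in X)]  # Eq. 10
    best_v, best_z = float("inf"), None
    for idx in combinations(range(len(cons)), 4):
        z = _solve([cons[k][0] for k in idx], [cons[k][1] for k in idx])
        if z is None:
            continue
        if any(sum(gj * zj for gj, zj in zip(g, z)) > rhs + tol for g, rhs in cons):
            continue  # candidate vertex violates some constraint
        v = sum(cj * zj for cj, zj in zip(cost, z))
        if v < best_v:
            best_v, best_z = v, z
    if best_z is None:
        raise ValueError("no feasible vertex; at least two distinct X values are needed")
    return tuple(best_z)


def band(params, x: float, h: float) -> Interval:
    """Lower and upper bound lines of Y at membership degree h (Eq. 12)."""
    q0, q, a0, a = params
    spread = (1.0 - h) * (a0 + a * abs(x))
    return q0 + q * x - spread, q0 + q * x + spread


# Usage with illustrative data: crisp inputs, outputs y +/- 0.5, evaluated at p = 0.
xs = [p_cut(x, 0.0, 0.0, 0.0) for x in (0.0, 1.0, 2.0)]
ys = [p_cut(y, 0.5, 0.5, 0.0) for y in (1.0, 3.0, 5.0)]
params = fit_fuzzy_line(xs, ys)
print(params, band(params, 1.0, 0.5))
```

For these illustrative data (interval centres on Y = 1 + 2X with half-width 0.5) the fit should give q0 = 1, q = 2, α0 = 0.5 and α = 0 up to rounding, and band(params, x, 0.0) then encloses every data interval. Enumeration is exact here because V is bounded below by zero and, with at least two distinct input values, the feasible region has vertices, so an optimum is attained at one of them. It does, however, examine C(4n + 2, 4) candidate vertices (14,950 for n = 6), so for larger data sets a dedicated LP solver such as scipy.optimize.linprog would be the practical choice.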
Figure 1 Measurement data and fuzzy regression line when h = 1.0
Figure 2 Measurement data and fuzzy regression line when h = 0.0
Figure 3 Measurement data and fuzzy regression line when h = 0.5
A case study
The hazard that nitrate poses to human health arises because bacteria in the stomach convert nitrate to nitrite. This nitrite can then induce human gastric cancer by reacting in the stomach with amines and amides to form nitrosamines and nitrosamides, which are etiologic agents for human gastric cancer (Mirvish, 1991). Table 1 shows dose-response data obtained from tests on laboratory rats (Terracini et al., 1967). Since rats differ from people, the dose data recorded for the rats must be converted into a standard measure that can be applied to humans. The health effects caused by nitrosamines and nitrosamides can be represented by dose data for the ingestion of N-nitrosodimethylamine (NDMA); this assumes that NDMA is representative of the carcinogenicity of all nitrosamines and nitrosamides (Mirvish, 1991). Assuming also that NDMA is as carcinogenic in humans as it is in animals, the amount of NDMA ingested by a rat can be converted into the amount of nitrate ingested by a human as follows (Mirvish, 1991; Curtis, 1992):
$$D_i = \lambda\,(W_i)^{0.5} \tag{13}$$
where i is the datum index (i = 1, …, 6; Table 1); Di is the daily nitrate dose to a human (grams/day); Wi is the daily NDMA dose to a rat (milligrams/day); and λ is a coefficient equal to 0.716 (Curtis, 1992). Because of a lack of available information, several assumptions must be made about the input values (such as available dietary substrates for nitrosation, gastric pH, gastric volume, and emptying time) required to estimate λ. Uncertainty therefore exists in the estimated coefficient λ, which may be interpreted as a fuzzy number (fuzzy coefficient) to reflect this uncertainty, that is, λ = (0.716, 0.179, 0.179)LR (Lee, 1992). Using the fuzzy coefficient λ in Equation 13 yields a fuzzy number for Di, i.e., Di = (di, ρi, ρi)LR (Table 2). The more rats tested, the more reliable the response data are likely to be. Accordingly, the human cancer response Ri corresponding to the dose Di may be estimated from the rat response data and the total number of rats tested. Ri may also be interpreted as a fuzzy number, Ri = (ri, ϕi, ϕi)LR, to characterize the uncertainty in its estimated value (Table 2). Several mathematical models, such as the one-hit, multistage, and logit models, are available to describe the dose-response data estimated as fuzzy numbers (Table 2). However, different dose-response models may lead to very different cancer-risk estimates. In the present case, a logit model is used to describe the relationship between human nitrate dose (Di) and the corresponding cancer response (Ri) (Lee, 1992). The logit model can be written as:
$$R = \frac{1}{1 + \exp\left[T_1 + T_2(\log_e D)\right]} \tag{14}$$
where T1 and T2 are fuzzy parameters. Symbolically, T1 and T2 can be written as T1 = (t1, δ1, δ1)LR and T2 = (t2, δ2, δ2)LR.
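Equation 14 can be brought into the linear form of Equation 4 by elementary algebra, valid for D > 0 and 0 < R < 1; this is the transformation used when formulating the linear programming problem for this case study:

$$\begin{aligned} R &= \frac{1}{1 + \exp\left[T_1 + T_2(\log_e D)\right]} \\ \frac{1}{R} - 1 &= \exp\left[T_1 + T_2(\log_e D)\right] \\ \log_e\left[\frac{1}{R} - 1\right] &= T_1 + T_2 \log_e D \end{aligned}$$

With Y = loge[(1/R) − 1] and X = loge D, T1 plays the role of Q0 and T2 that of Q, so symmetric fuzzy parameters T1 and T2 with spreads δ1 and δ2 give, by analogy with Equation 7, a band centred on t1 + t2 loge D with spread δ1 + δ2|loge D|.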
Table 1 Dose-response data obtained from an animal experiment (Terracini et al., 1967)
Order in data, i | Dose of NDMA in rat diet (mg/day) | Number of rats on test | Number of rats with liver tumour | Response (probability)
1 | 0 | 41 | 0 | 0.000
2 | 2 | 37 | 1 | 0.027
3 | 5 | 83 | 8 | 0.096
4 | 10 | 5 | 2 | 0.400
5 | 20 | 23 | 15 | 0.652
6 | 50 | 12 | 10 | 0.833
Table 2 Nitrate dose to humans and the corresponding response (Lee, 1992)
Order in data, i | Nitrate dose to human (g/day), Di = (di, ρi, ρi)LR | Response (probability), Ri = (ri, ϕi, ϕi)LR
1 | (0.000, 0.000, 0.000) | (0.000, 0.000, 0.000)
2 | (1.013, 0.253, 0.253) | (0.027, 0.006, 0.006)
3 | (1.602, 0.400, 0.400) | (0.096, 0.010, 0.010)
4 | (2.265, 0.565, 0.565) | (0.400, 0.162, 0.162)
5 | (3.203, 0.801, 0.801) | (0.652, 0.166, 0.166)
6 | (5.065, 1.265, 1.265) | (0.833, 0.265, 0.265)
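The dose column of Table 2 can be recomputed from Table 1 with Equation 13. The sketch below assumes that the fuzzy coefficient λ is multiplied by the crisp factor (Wi)^0.5, which by the extension principle scales the centre and both spreads; the names LAMBDA, TABLE1 and human_nitrate_dose are hypothetical. The response spreads ϕi are not reproduced because the text does not state how they were derived.

```python
import math

# Fuzzy coefficient lambda = (centre, left spread, right spread)_LR (Lee, 1992).
LAMBDA = (0.716, 0.179, 0.179)

# Table 1 rows: (NDMA dose in rat diet W_i [mg/day], rats on test, rats with liver tumour).
TABLE1 = [(0, 41, 0), (2, 37, 1), (5, 83, 8), (10, 5, 2), (20, 23, 15), (50, 12, 10)]


def human_nitrate_dose(w: float, lam=LAMBDA):
    """Fuzzy human nitrate dose D_i = lambda * W_i**0.5 (Eq. 13), in g/day.

    Multiplying an L-R fuzzy number by a crisp factor s >= 0 scales its
    centre and both spreads by s (a consequence of the extension principle).
    """
    s = math.sqrt(w)
    centre, left, right = lam
    return centre * s, left * s, right * s


for i, (w, n_rats, n_tumour) in enumerate(TABLE1, start=1):
    d, rho, _ = human_nitrate_dose(w)
    r = n_tumour / n_rats  # observed response fraction; matches the centre r_i in Table 2
    print(f"i={i}: d={d:.3f} g/day, rho={rho:.3f}, r={r:.3f}")
```

The computed centres di and spreads ρi agree with Table 2 to within about 0.003 g/day, the largest difference being for i = 6, which suggests the tabulated values were rounded from slightly different intermediate figures.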
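After the logit model (Equation 14) is transformed into a linear function of the regression parameters T1 and T2, a linear programming problem corresponding to Equations 10 and 11 can be formulated to find the values of t1, t2, δ1 and δ2 as follows. Datum i = 1 is excluded because D1 = 0 and R1 = 0, for which loge D and loge[(1/R) − 1] are undefined.
$$\text{Minimize } V(\delta_1, \delta_2) = \sum_{i=2}^{6} \left[ \left(\delta_1 + \delta_2|\log_e D_i^-|\right) + \left(\delta_1 + \delta_2|\log_e D_i^+|\right) \right] \tag{15}$$
subject to the following constraints for i = 2, …, 6:
$$\delta_1 \ge 0, \quad \delta_2 \ge 0 \tag{16a}$$
$$t_1 + t_2 \log_e D_i^- + \left(\delta_1 + \delta_2|\log_e D_i^-|\right) \ge \log_e\left[(1/R_i^-) - 1\right] \tag{16b}$$
$$t_1 + t_2 \log_e D_i^- - \left(\delta_1 + \delta_2|\log_e D_i^-|\right) \le \log_e\left[(1/R_i^+) - 1\right] \tag{16c}$$
$$t_1 + t_2 \log_e D_i^+ + \left(\delta_1 + \delta_2|\log_e D_i^+|\right) \ge \log_e\left[(1/R_i^-) - 1\right] \tag{16d}$$
$$t_1 + t_2 \log_e D_i^+ - \left(\delta_1 + \delta_2|\log_e D_i^+|\right) \le \log_e\left[(1/R_i^+) - 1\right] \tag{16e}$$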
where Di− = di − ρi; Di+ = di + ρi; Ri− = ri − ϕi; and Ri+ = ri + ϕi (Table 2). Because loge[(1/R) − 1] decreases as R increases, the transformed value of Ri− is the upper end of each transformed response interval and that of Ri+ the lower end; Equations 16a through 16e are therefore the counterparts of Equations 11a through 11e with the roles of the lower and upper response bounds interchanged. Minimizing the index V(δ1, δ2) of the fuzziness of the regression model (Equation 15) under these constraints gives t1, δ1, t2 and δ2 as 3.331, 1.138, −3.429 and 0.881, respectively. As a result, the relationship between nitrate dose to humans and the corresponding cancer response can be described with lower and upper tolerance levels as follows:
$$\frac{1}{1 + \exp(Z_1)} \le R \le \frac{1}{1 + \exp(Z_2)} \tag{17}$$
$$Z_1 = 3.331 - 3.429(\log_e D) + (1-h)\left(1.138 + 0.881\,|\log_e D|\right) \tag{18a}$$
$$Z_2 = 3.331 - 3.429(\log_e D) - (1-h)\left(1.138 + 0.881\,|\log_e D|\right) \tag{18b}$$
$$D = N_W + N_F \tag{18c}$$
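A sketch of how Equations 17 and 18 could be evaluated for a given drinking-water nitrate dose NW, using the parameter estimates reported for this case study and the approximate value NF = 0.15 g/day given in the text. The function name risk_bounds is hypothetical, and the spread uses |loge D| by analogy with the α0 + α|Xi| term of Equation 7; this makes no difference for D ≥ 1 g/day.

```python
import math

# Estimates reported for the case study: centres t1, t2 and spreads delta1, delta2.
T1, DELTA1 = 3.331, 1.138
T2, DELTA2 = -3.429, 0.881
NF = 0.15  # approximate nitrate dose from sources other than drinking water, g/day


def risk_bounds(nw: float, h: float, nf: float = NF):
    """Lower and upper lifetime gastric-cancer risk R from Eqs. 17-18.

    nw: nitrate dose from drinking water (g/day); h: membership degree in [0, 1].
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("h must lie in [0, 1]")
    d = nw + nf  # Eq. 18c: total nitrate dose, g/day
    x = math.log(d)
    centre = T1 + T2 * x
    spread = (1.0 - h) * (DELTA1 + DELTA2 * abs(x))
    z1 = centre + spread  # Eq. 18a
    z2 = centre - spread  # Eq. 18b
    return 1.0 / (1.0 + math.exp(z1)), 1.0 / (1.0 + math.exp(z2))  # Eq. 17


for h in (0.0, 0.5, 1.0):
    low, high = risk_bounds(nw=0.5, h=h)
    print(f"h={h}: {low:.3f} <= R <= {high:.3f}")
```

At h = 1 both bounds collapse to the centre curve 1/{1 + exp[3.331 − 3.429(loge D)]}; lowering h widens the tolerance band, and because t2 is negative the risk increases with the total dose D.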
where D is the total nitrate dose to a human (grams/day); NW is the nitrate dose from drinking water (grams/day); NF is the nitrate dose from sources other than drinking water (grams/day), approximately 0.15 grams/day based on the literature (ECETOC, 1988), including the amount obtained by conversion of ingested nitrite to nitrate; R is the lifetime probability of developing human gastric cancer; and h (0 ≤ h ≤ 1) is the membership degree, which defines the tolerance level, that is, the domain of the response R at various degrees of membership.
Conclusions
Dose-response data obtained from animal experiments have been used to define a model for estimating the human cancer risk from nitrate dose, because there are no dose-response data related directly to cancer incidence in humans. Although this interspecies conversion (from animals to humans) is controversial, it may be the only method, short of extensive epidemiological studies, for modelling the relationship between nitrate dose to humans and the corresponding cancer response (Allen et al., 1988). Unfortunately, dose-response assessments based on interspecies conversion are often plagued by large uncertainties in the dose-response data (arising from converting the dose data recorded for animals into data for humans) and in the dose-response relationships. The larger these uncertainties, the larger the deviations between the measured and estimated values may be. The case study presented above suggests that a fuzzy linear-regression method can be a useful tool for defining dose-response relationships when conventional regression analysis, such as the least-squares method (Neter et al., 1985), is precluded by an imprecise dose-response relationship and by data that are inaccurate and too few for conventional regression analysis. The method takes into account uncertainties in both the data and the relationship, so that dose-response assessments can be made that are more realistic and appropriate than those made without considering uncertainty.
References
Allen, B.C., Crump, K.S. and Shipp, A.M. (1988). Correlation between Carcinogenic Potency of Chemicals in Animals and Humans. Risk Analysis, 8, 531–544.
Bardossy, A. (1990). Note on Fuzzy Regression. Fuzzy Sets and Systems, 37, 65–75.
Curtis, B.A. (1992). Health Risk Analysis of Groundwater Nitrate Contamination. Ph.D. Dissertation, Department of Civil Engineering, University of Nebraska, Lincoln, NE 68588, U.S.A.
ECETOC (1988). Nitrate and Drinking Water. Technical Report No. 27, European Chemical Industry Ecology and Toxicology Centre, Brussels, Belgium.
Lee, Y.W., Bogardi, I. and Stansbury, J. (1991). Fuzzy Decision Making in Dredged-Material Management. J. of Environmental Engineering, ASCE, 117, 614–630.
Lee, Y.W. (1992). Risk Assessment and Risk Management for Nitrate-Contaminated Groundwater Supplies. Ph.D. Dissertation, Department of Civil Engineering, University of Nebraska, Lincoln, NE 68588, U.S.A.
Mirvish, S.S. (1991). The Significance for Human Health of Nitrate, Nitrite and N-nitroso Compounds, pp. 253–266. In: Nitrate Contamination: Exposure, Consequence and Control. Springer-Verlag, Berlin, Germany.
Neter, J., Wasserman, W. and Kutner, M.H. (1985). Applied Linear Statistical Models. Richard D. Irwin, Inc., Homewood, IL.
Tanaka, H., Uejima, S. and Asai, K. (1982). Linear Regression Analysis with Fuzzy Model. IEEE Transactions on Systems, Man and Cybernetics, 12, 903–907.
Tanaka, H. and Watada, J. (1988). Possibilistic Linear Systems and their Applications to the Linear Regression Model. Fuzzy Sets and Systems, 27, 275–289.
Terracini, B., Magee, P.N. and Barnes, J.M. (1967). Hepatic Pathology in Rats on Low Dietary Levels of Dimethylnitrosamine. British J. of Cancer, 21, 559–565.
Zimmermann, H.J. (1985). Fuzzy Set Theory and its Application. Kluwer-Nijhoff Publishing, Hingham, Mass.