Optimization of regional water-quality-monitoring strategies

Integrated Design of Hydrological Networks (Proceedings of the Budapest Symposium, July 1986). IAHS Publ. no. 158,1986.

Optimalisation des programmes régionaux de surveillance de la qualité des eaux

J. PINTÉR & L. SOMLYÓDY
Water Resources Research Centre (VITUKI), H-1453 Budapest, P.O. Box 27, Hungary

ABSTRACT The problem of optimizing a regional water-quality observation network is considered. The basic purpose of the network is to provide a statistically sound estimate of the annual nutrient load discharged into a lake by its tributaries. Following a brief description of the underlying lake eutrophication problem, a single-period, single-component decision model is formulated, and several possible extensions are proposed. The methodology is illustrated by a numerical example.

INTRODUCTION
The protection of the human environment is an issue of vital importance that must be treated as an integral part of all technical, economic, and social development strategies. This directly implies the need for regular environmental monitoring, i.e. the design and operation of problem-oriented observational networks. The determination of a coherent environmental monitoring strategy should be based on the joint consideration of a number of interdependent theoretical and practical issues (see, e.g., Beckers & Chamberlain, 1974; Bendat & Piersol, 1971; Lettenmaier et al., 1982; Sanders, 1979; Schilperoort & Groot, 1983; Somlyódy & van Straten, 1985; Somlyódy et al., 1986).

This paper is devoted to a specific subject in regional water-quality monitoring: the optimal temporal and spatial allocation of discrete observations on the tributaries of a lake. Following a brief characterization of the underlying water-quality-management problem (artificial eutrophication in a lake), our basic (single-period, single-quality-indicator) model is formulated. Several possible extensions (stratification, multiple quality indicators, sensitivity issues, scheduling of measurements) are also highlighted. Finally, the proposed methodology is illustrated by a numerical example. For more details on this complex environmental monitoring strategy, only a part of which is reported here, the reader is referred to Pintér et al. (1985).

ARTIFICIAL EUTROPHICATION OF LAKES: MONITORING OBJECTIVES
Man-made eutrophication in lakes, caused primarily by increasing municipal and industrial waste-water discharges and the intensive use of agricultural chemicals, has recently come to be regarded as one of the major water-quality problems. The typical symptoms of eutrophication (algal blooms, water coloration, floating water plants, organic debris, unpleasant taste and odor) frequently lead to serious limitations of water use. As a rule, both the main causes of artificial eutrophication and the most important water-quality-control measures originate in the region surrounding the lake; their impact on lake water quality is the result of interdependent in-lake processes. Thus, eutrophication management requires a complex analysis of the whole region, including all relevant natural phenomena and human activities.

The most frequently used indicator for describing lake eutrophication is (Chl-a)max, the annual peak value of the chlorophyll-a concentration, which characterizes the algal biomass (OECD, 1982). As shown by both empirical and dynamic simulation models (OECD, 1982; Somlyódy & van Straten, 1985), for large lakes (Chl-a)max is determined primarily by the annual nutrient (phosphorus and/or nitrogen) load, and the functional relationship between indicator and load is often linear in character. Thus, from the viewpoint of proper eutrophication control, observation of the total annual load reaching the water body is of major importance.

In most cases, nutrients carried by rivers represent a considerable portion of the lake's load. These tributaries drain watersheds that differ in size, slope, pollution origin (point vs. nonpoint sources), and contribution. Consequently, the relative magnitude and dynamics of the loads may be quite different.
Observation (sampling) is performed in most countries by water authorities, following certain basic guidelines and rules. For large lakes, it is fairly typical that the samples are transported to several laboratories, often belonging to different district water authorities. In such situations, a number of transportation routes are possible, as it might often be economical to collect samples from several (but not necessarily all) monitoring stations during the same tour.

The above discussion shows that a meaningful monitoring strategy can be based on the "optimal" temporal and spatial allocation of sampling effort, viz. the number of measurements and the number of routes to be accomplished. This strategy yields a statistically well-based estimate of the total yearly nutrient load to the lake, a basic factor for characterizing and managing eutrophication.

A BASIC MODEL FOR REGIONAL WATER-QUALITY MONITORING
Because the sampling problem is significantly affected by (partially) unknown processes and hydrometeorological stochasticities, it can be formulated naturally by applying statistical concepts. Specifically, we want to determine a minimum-cost regional monitoring program that assures the estimation of the yearly average (or total) load within given error-variance bounds. The principal decision variables are the number of measurements to be carried out at each monitoring station and the number of tours per year on each of the different routes used for sample collection.

There exist classical results in finite-population statistics (Cochran, 1962) that allow the computation of the variance of the mean value of a random variable as estimated from a reduced sample (instead of the total sample). Moreover, Bayley & Hammersley (1946) provided a formula for taking into consideration the interdependence (autocorrelation structure) of subsequent measurements. Thus, the effect of sample reduction on the accuracy of the estimated mean can be approximated in the case of a single observation station. On the other hand, no directly applicable methodology was known to us concerning the statistical characterization of our regional monitoring problem, viz. several stations with possibly correlated observations. Therefore, an extension of the above-mentioned results was necessary. Below we summarize the proposed statistical background of our model; the details are given in Pintér et al. (1985).
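As a minimal sketch of the two classical single-station ingredients mentioned above, the following assumes the textbook forms of both results: the finite-population variance of a sample mean, $(S^2/n)(N-n)/N$, and the variance inflation factor $1 + 2\sum_{k=1}^{n-1}(1-k/n)\rho_k$ for equally spaced, autocorrelated observations. The function names, the sample sizes, and the geometrically decaying autocorrelation are illustrative assumptions and are not taken from the model described here.

```python
def variance_of_sample_mean(variance, n, N):
    """Finite-population variance of the mean of n values drawn from N.

    `variance` is the population variance S^2 (textbook form, assumed here).
    The factor (N - n) / N vanishes when the whole population is sampled.
    """
    if not 1 <= n <= N:
        raise ValueError("need 1 <= n <= N")
    return variance / n * (N - n) / N


def autocorrelation_inflation(n, rho):
    """Inflation of var(mean) for n equally spaced, autocorrelated observations.

    `rho` maps a lag k >= 1 to the autocorrelation at that lag.
    Returns 1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho(k); equals 1 if rho == 0.
    """
    return 1.0 + 2.0 * sum((1.0 - k / n) * rho(k) for k in range(1, n))


if __name__ == "__main__":
    # Hypothetical station: daily total sample (N = 365), monthly reduced sample.
    print(variance_of_sample_mean(variance=100.0, n=12, N=365))
    # Hypothetical autocorrelation decaying as 0.5 ** k.
    print(autocorrelation_inflation(n=12, rho=lambda k: 0.5 ** k))
```

The two effects are kept in separate functions on purpose: how they combine across several, possibly correlated, stations is exactly the extension that the paper refers to Pintér et al. (1985).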

FIG. 1 Regional water-quality monitoring on the tributaries of a lake.

Denote by $m = 1, \ldots, M$ the given monitoring stations (one station on each tributary, see Fig. 1), by $\xi_m$ the nutrient load at station $m$, and by $N_m$ and $n_m$ the total and the reduced sample sizes at station $m$, respectively. (We shall assume that the total sample permits the exact calculation of the average load at each station.)

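If $\xi = \sum_{m=1}^{M} \xi_m$ is the summed amount of nutrient load reaching the lake, then, based on the reduced sample, the variance of the mean estimate can be approximated by formula (1), an explicit function of the sample sizes $n_m$ and $N_m$.

[Formulae (1) and (2) and relation (6) are illegible in this copy. The surviving fragments of formula (1) show the finite-population factor $(N_m - n_m)/(n_m N_m)$ summed over $m = 1, \ldots, M$, together with a double sum over station pairs $m_1, m_2 = 1, \ldots, M$. Relations (3) through (5) are restated here from their verbal description in the text; $D^2(\bar{\xi})$ stands for the variance given by formula (1).]

\[ \min \sum_{j=1}^{J} c_j x_j \tag{3} \]

\[ \sum_{j=1}^{J} a_{mj} x_j = n_m, \qquad m = 1, \ldots, M \tag{4} \]

\[ D^2(\bar{\xi}) \le K \tag{5} \]

\[ x_j \ge 0 \ \text{integer}, \qquad j = 1, \ldots, J \tag{7} \]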

The decision variables of the model defined by relations (3) through (7) are $n_m$, the annual number of samples to be taken at monitoring station $m$, and $x_j$, the annual number of tours on route $j$ for collecting samples. It is assumed that for each route the unit cost $c_j$ of a tour (including sample collection, transportation, and laboratory analysis) is known; thus, the objective function (3) expresses our aim to minimize the yearly operational costs of the monitoring network. The constraints defined by equation (4) link the variables $x_j$ and $n_m$: defining $a_{mj} = 1$ if, and only if, station $m$ belongs to route $j$, and $a_{mj} = 0$ otherwise, equation (4) expresses the fact that, with a proper combination and scheduling of routes, exactly $n_m$ measurements are taken at each station $m$. The constraint defined by inequality (5) is an accuracy requirement with respect to the mean estimate; it is an explicit function of the $n_m$'s, as shown by formula (1) ($K > 0$ is a model parameter). Finally, inequalities (6) and (7) are obvious logical constraints concerning the decision variables (note that relations (4) and (7) also imply the integrality of $n_m$, $m = 1, \ldots, M$).

From the above interpretation, one can see that this relatively simple model contains the basic characteristics of our regional monitoring-network optimization problem. Evidently, some modifications could be fitted into this framework; e.g. inequality (5) may be replaced by another accuracy requirement, or by several. Before turning to some numerical issues concerning the solution of our basic model, its most practically relevant generalizations are outlined below.
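The bookkeeping behind the objective and the route-station balance described above can be sketched in a few lines. This is an illustration under stated assumptions only: routes are represented as sets of station numbers (so that $a_{mj} = 1$ exactly when station $m$ is in route $j$), and the two-station instance with its costs is hypothetical.

```python
def samples_per_station(routes, tours, n_stations):
    """Route-station balance: n_m = sum over routes j containing m of x_j.

    routes: list of sets of station numbers (1-based); tours: list of x_j.
    """
    n = [0] * n_stations
    for stations, x in zip(routes, tours):
        for m in stations:
            n[m - 1] += x  # a_mj = 1 exactly when m is in route j
    return n


def yearly_cost(unit_costs, tours):
    """Cost objective: sum over routes of c_j * x_j."""
    return sum(c * x for c, x in zip(unit_costs, tours))


if __name__ == "__main__":
    # Hypothetical instance: two stations, three routes.
    routes = [{1}, {2}, {1, 2}]
    unit_costs = [10, 12, 15]
    tours = [2, 0, 5]  # x_1, x_2, x_3
    print(samples_per_station(routes, tours, n_stations=2))
    print(yearly_cost(unit_costs, tours))
```

Because the $n_m$ are determined by the $x_j$ in this way, only the tour counts need to be treated as free variables in a solver; the accuracy requirement is then evaluated on the resulting $n_m$.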

EXTENSIONS

Stratification
Assume that for some random variable $\xi$ we have a total sample of size $N$ that can be divided into homogeneous strata of sizes $N_\ell$, so that $N = \sum_{\ell} N_\ell$. The statistical properties of each stratum $\ell$ may be different, reflecting, for example, seasonality effects in the random nutrient-load pattern on a selected tributary. In this case, a subproblem analogous to relations (3) through (7) can be formulated for each stratum $\ell$.

Several water-quality constituents
Assume now that, instead of a single nutrient or pollutant component, several different constituents are to be detected. This is the case of multiple water-quality indicators, or of a single indicator influenced by several nutrient or pollutant components. One may then wish to place accuracy bounds on each of them separately; this means that relation (5) is replaced by the respective number of accuracy requirements. Supposing that the costs $c_j$ can be at least partially decomposed with regard to these constituents, one obtains the respective multicomponent extension of the basic model. Note that stratification and several water-quality constituents can be considered simultaneously, while maintaining a model structure analogous to relations (3) through (7).

Parameterization (sensitivity analysis) issues
Relation (5) and, hence, the optimum value and the optimal decision variables evidently depend on $K$, the prescribed accuracy of the mean-nutrient-load estimate. Therefore, by taking different values of $K$ and solving the parameterized problem, the tradeoff function between the acceptable estimation error and the respective monitoring program costs can be approximated, at least in the basic single-period, single-indicator case.

The above extensions lead to nontrivial multiobjectivity aspects and analytically intractable interrelations between the parameters of the extended models. In the case of stratification with a single indicator, ideally all possible subdivisions of an overall accuracy bound among the strata should be taken into account to yield an optimal parameterization of the stratified problem. Lacking knowledge about the analytical dependence of the total cost on the model parameters, one has to be satisfied with some heuristic (suboptimal) parameterization of the stratified model system (Cochran, 1962; Pintér et al., 1985).
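The idea of solving the problem for a "menu" of $K$ values can be illustrated on a deliberately reduced case. The sketch below assumes a single station served by a single route, independent observations, and a total sample large enough for the finite-population correction to be ignored, so that the accuracy requirement becomes simply variance$/n \le K$. The function names and the numbers used are illustrative assumptions, not the regional model itself.

```python
import math


def cheapest_single_station_plan(variance, unit_cost, K):
    """Smallest n with variance / n <= K, and the cost of taking n samples."""
    n = max(1, math.ceil(variance / K))
    return n, unit_cost * n


def tradeoff(variance, unit_cost, bounds):
    """Cost vs. accuracy table: one (K, n, cost) row per variance bound K."""
    return [(K,) + cheapest_single_station_plan(variance, unit_cost, K)
            for K in bounds]


if __name__ == "__main__":
    # Illustrative values: load variance in (kg/day)^2, cost per sampling tour.
    for K, n, cost in tradeoff(5067.0, 695, [500, 1000, 2000, 5000]):
        print(K, n, cost)
```

Even in this reduced case the cost falls roughly in proportion to $1/K$, which is the kind of tradeoff information the parameterization is meant to give the decision maker.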
For the case of several quality components, methods of multiobjective programming (with noncommensurable objectives) can be proposed; a review of such methods is given, e.g., by Hwang & Masud (1979). It is worth noting that, in spite of the enumerated methodological difficulties, a fairly representative sensitivity investigation can be accomplished simply by selecting a properly detailed "menu" of parameterizations; the only basic prerequisite is the easy numerical handling of the parameterized set of problems. In this way the decision maker can obtain quantitative information about the cost vs. accuracy tradeoff of the rather "soft" (i.e. not unambiguously definable) problem of "optimal" monitoring.

Scheduling of measurements
The numerical solution of our basic model or its respective extension provides information about the total (yearly or seasonal) number of measurements to be taken at each station and the number of tours on each route. This information serves as a strategic guideline for timing the measurements in a reasonable manner. Without going into mathematical details, we note that it is relatively simple to formulate an easy-to-handle linear-programming model that assures an acceptable pattern for the sampling procedure.

COMPUTATIONAL ASPECTS
As stated before, simple numerical handling of the model(s) is advantageous, especially when a number of different parameterizations are considered. Therefore, some simplifying assumptions and other computational issues are summarized below.

First of all, the integrality assumptions on $x_j$ (and, hence, also on $n_m$) can be omitted. The motivation for this is the significantly reduced computational complexity of the relaxed problem and the simple way of "rounding up" the continuous optimal solution of the relaxation. (In practice, both $n_m$ and $x_j$ are large enough to make the "round-up" error rather inessential, especially when compared to other, unavoidable model inaccuracies.)

The continuous relaxation of relations (3) through (7) is basically a linear-programming (LP) problem, except for its single nonlinear constraint, inequality (5). It is not difficult to see that in many practically relevant cases (e.g. when the observations per station are independent or positively correlated) inequality (5) leads to a convex constraint. Thus, relations (3) through (7) define a convex-programming problem with a rather simple structure. The literature of mathematical programming abounds in algorithms capable of solving such problems (see Himmelblau, 1972; Simmons, 1975). By exploiting the special structure of relations (3) through (7), the supporting hyperplane method of Veinott (1967) can be applied advantageously to our problem; this method yields a successively refined sequence of linear approximations to the model (3) through (7), and the resulting LPs are easy to handle.
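One step of a supporting-hyperplane scheme can be sketched as follows. The sketch is not the authors' implementation: it assumes the simplified convex accuracy function $g(n) = \sum_m D_m^2 / n_m - K$ (independent stations, no finite-population correction), and the two-station data, the interior point, and the exterior point are hypothetical. Given a point that satisfies the constraint strictly and an LP solution that violates it, the boundary point on the connecting segment is found by bisection, and the tangent plane there is returned as a linear cut that could be added to the LP.

```python
def g(n, variances, K):
    """Accuracy function: feasible when g(n) <= 0 (assumed simplified form)."""
    return sum(d / x for d, x in zip(variances, n)) - K


def grad_g(n, variances):
    """Gradient of g; each component is -D_m^2 / n_m^2."""
    return [-d / (x * x) for d, x in zip(variances, n)]


def boundary_point(n_in, n_out, variances, K, tol=1e-9):
    """Bisection on the segment from a feasible n_in to an infeasible n_out."""
    lo, hi = 0.0, 1.0  # fraction of the way from n_in to n_out
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        point = [a + mid * (b - a) for a, b in zip(n_in, n_out)]
        if g(point, variances, K) <= 0.0:
            lo = mid
        else:
            hi = mid
    return [a + lo * (b - a) for a, b in zip(n_in, n_out)]


def supporting_cut(n_b, variances):
    """Linear cut (coeffs, rhs), read as coeffs . n <= rhs, tangent at n_b."""
    coeffs = grad_g(n_b, variances)
    rhs = sum(c * x for c, x in zip(coeffs, n_b))
    return coeffs, rhs


if __name__ == "__main__":
    variances, K = [392.0, 5067.0], 500.0      # hypothetical two-station case
    n_in, n_out = [50.0, 50.0], [1.0, 1.0]     # feasible / infeasible points
    n_b = boundary_point(n_in, n_out, variances, K)
    print(n_b, supporting_cut(n_b, variances))
```

By convexity of $g$, every feasible point satisfies the cut while the infeasible LP solution violates it, so repeating the step tightens the linear approximation around the optimum.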

A NUMERICAL EXAMPLE
To illustrate the outlined approach, a simple monitoring network is considered; the monitoring system consists of seven observational stations ($M = 7$) and three central laboratories with corresponding road systems (Fig. 1). Denote by $\bar{\xi}_m$ and $D_m^2$ the mean and the variance, respectively, of the nutrient load (total phosphorus load, kg/day) at station $m$. Table 1 contains the data of the example.

TABLE 1 Mean and variance of nutrient loads per station

Station m    1      2      3       4       5       6       7
Mean         6.6    9.4    69.2    225.1   34.6    30      0.8
Variance     392    795    4,789   5,067   1,211   2,025   16

Note that the values $\bar{\xi}_m$ do not figure explicitly in the model formulation. For simplicity's sake, stratification aspects are not considered, and the measurements taken at different stations are assumed to be independent. Taking into consideration the road network and the costs of transportation and laboratory analysis, the following possible routes and respective unit costs are considered (each route is represented by the set of stations belonging to it):

Route i    Stations d_i    Unit cost c_i
1          {1}             979
2          {2}             844
3          {3}             944
4          {1,2}           1,079
5          {1,3}           1,219
6          {2,3}           1,084
7          {1,2,3}         1,319
8          {4}             695
9          {5}             665
10         {6}             665
11         {7}             450
12         {5,6}           765
13         {5,7}           765
14         {6,7}           765
15         {5,6,7}         865

Here $d_i$ is the $i$th route, defined by the station numbers within the braces, and $c_i$ is the unit cost of the $i$th route. With the above data, the problem defined by relations (3) through (7) can be specified and solved for different values of the variance bound $K$. The results of these calculations are summarized in Fig. 2 and Table 2.

TABLE 2

Optimal monitoring strategies for different variance bounds

Variance   Total monitoring   Number of tours per route    Number of observations per station
bound K    costs C            in optimal solution
                              x_6   x_7   x_8   x_15       n_1   n_2   n_3   n_4   n_5   n_6   n_7
500        166,188            13    24    126   38         24    37    37    126   38    38    38
1,000      94,717             3     20    70    19         20    23    23    70    19    19    19
2,000      52,203             4     8     40    11         8     12    12    40    11    11    11
5,000      21,319             5     1     16    4          1     6     6     16    4     4     4
10,000     11,083             1     1     10    2          1     2     2     10    2     2     2
20,000     6,048              1     1     4     1          1     2     2     4     1     1     1
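The entries of Table 2 can be cross-checked against the route definitions with a few lines of arithmetic. The sketch below recomputes the number of observations per station and the total cost from the tour counts of the four routes that appear in the optimal solutions. One assumption is made explicit: the unit cost of route 15 is taken as 865, the value with which all six cost totals of Table 2 are reproduced exactly. The variable and function names are illustrative.

```python
# Routes used in the optimal solutions: route index -> (stations, unit cost).
ROUTES = {
    6: ({2, 3}, 1084),
    7: ({1, 2, 3}, 1319),
    8: ({4}, 695),
    15: ({5, 6, 7}, 865),  # assumed value; reproduces the Table 2 totals
}

# Rows of Table 2: (K, total cost C, tours per route, observations n_1..n_7).
TABLE_2 = [
    (500, 166188, {6: 13, 7: 24, 8: 126, 15: 38}, [24, 37, 37, 126, 38, 38, 38]),
    (1000, 94717, {6: 3, 7: 20, 8: 70, 15: 19}, [20, 23, 23, 70, 19, 19, 19]),
    (2000, 52203, {6: 4, 7: 8, 8: 40, 15: 11}, [8, 12, 12, 40, 11, 11, 11]),
    (5000, 21319, {6: 5, 7: 1, 8: 16, 15: 4}, [1, 6, 6, 16, 4, 4, 4]),
    (10000, 11083, {6: 1, 7: 1, 8: 10, 15: 2}, [1, 2, 2, 10, 2, 2, 2]),
    (20000, 6048, {6: 1, 7: 1, 8: 4, 15: 1}, [1, 2, 2, 4, 1, 1, 1]),
]


def evaluate(tours):
    """Return (observations per station, total cost) implied by tour counts."""
    n, cost = [0] * 7, 0
    for j, x in tours.items():
        stations, unit_cost = ROUTES[j]
        cost += unit_cost * x
        for m in stations:
            n[m - 1] += x
    return n, cost


def table_is_consistent():
    """True when every row's n_m and C follow from its tour counts."""
    return all(evaluate(tours) == (n, C) for _, C, tours, n in TABLE_2)


if __name__ == "__main__":
    print(table_is_consistent())
    # Products K * C, a quick look at how close C(K) is to proportional to 1/K.
    print([K * C for K, C, _, _ in TABLE_2])
```

The same helper can be reused to price any other combination of tours on these routes before checking it against the accuracy requirement.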

From Fig. 2 one can see that the monitoring program cost $C = C(K)$ is, to a good approximation, inversely proportional to the variance bound $K$. This result is in accordance with classical results of probability theory and clearly shows that there exists an upper bound on meaningful monitoring effort. It is also worth noting that, for all examined values of $K$, the same four routes are used in the optimal solution. This fact can be interpreted as a demonstration of some problem-structure stability.

FIG. 2 Trade-off between acceptable variance bound of monitoring and monitoring program cost (horizontal axis: variance bound $K$ [$10^3$ (kg/day)$^2$]; vertical axis: monitoring cost $C(K)$ [$10^3$ Ft]).

As our example illustrates, even very simple monitoring networks give rise to nontrivial optimization issues. This fact and the usually rather significant monitoring program costs justify the use of mathematical modeling tools for determining rational environmental-surveillance strategies.

ACKNOWLEDGEMENTS This research was supported by the National Environmental Protection Office (OKTH), Budapest, Hungary.

REFERENCES
Bayley, G.F. & Hammersley, J.M. (1946) The effective number of independent observations in an autocorrelated time series. J. Roy. Stat. Soc. Ser. B, 8, 184-197.
Beckers, V.V. & Chamberlain, S.G. (1974) Design of cost-effective water quality surveillance systems. Environmental Protection Agency Report No. 600/5-74-004, Washington, D.C.
Bendat, J.S. & Piersol, A.G. (1971) Random Data: Analysis and Measurement Procedures. Wiley-Interscience, New York.
Cochran, W.G. (1962) Sampling Techniques. Wiley, New York.
Himmelblau, D.M. (1972) Applied Nonlinear Programming. McGraw-Hill, New York.
Hwang, Ch.-L. & Masud, A.S. Md. (1979) Multiple Objective Decision Making - Methods and Applications. Springer Verlag, Berlin Heidelberg.
Lettenmaier, D.P., Conquest, L.L. & Hughes, J.P. (1982) Routine streams and rivers water quality trend monitoring review. Technical Report No. 75, C.W. Harris Hydraulics Laboratory, Dept. of Civil Engineering, Univ. of Washington, Seattle, Washington.
OECD (1982) Eutrophication of Waters. Monitoring, Assessment and Control. Paris.
Pintér, J., Somlyódy, L., Koncsos, L., Szilágyi, F. & Hanácsek, I. (1985) Elaboration of an information system for environmental protection. Optimal water quality sampling of lakes and their tributaries (in Hungarian). Technical Report No. 7614/2/797. Water Resources Research Centre (VITUKI), Budapest, Hungary.
Sanders, T.G., ed. (1979) Design of Water Quality Monitoring Networks. Colorado State Univ., Fort Collins, Colorado.
Schilperoort, T. & Groot, S. (1983) Design and optimization of water quality monitoring networks. Delft Hydraulics Lab., Publ. No. 268, Delft, The Netherlands.
Simmons, D.M. (1975) Nonlinear Programming for Operations Research. Prentice-Hall, Inc., Englewood Cliffs, New Jersey.
Somlyódy, L. & van Straten, G. (1985) Modeling and Managing Shallow Lake Eutrophication, with Application to Lake Balaton. Springer Verlag (in press).
Somlyódy, L., Pintér, J., Koncsos, L., Hanácsek, I. & Juhász, I. (1986) Water quality monitoring in lakes and tributaries. Proc. 2nd Scientific Assembly of the IAHS (Budapest, Hungary, July 2-10).
Veinott, A.F., Jr. (1967) The supporting hyperplane method for unimodal programming. Opns. Res. 15, 147-152.