Optimizing disinfection by-product monitoring points

Chemosphere 208 (2018) 512–521

Contents lists available at ScienceDirect

Chemosphere journal homepage: www.elsevier.com/locate/chemosphere

Optimizing disinfection by-product monitoring points in a distribution system using cluster analysis

Ianis Delpla a,*, Mihai Florea b, Geneviève Pelletier c, Manuel J. Rodriguez a

a École supérieure d'aménagement du territoire et de développement régional (ESAD), Université Laval, Pavillon F-A. Savard, 2325, rue des Bibliothèques, local 1612, Québec, QC, G1V 0A6, Canada
b Thales Canada, Defence & Security, Thales Research & Technology (TRT) Canada, 1405, boul. du Parc Technologique, Québec, QC, G1P 4P5, Canada
c Département de génie civil et de génie des eaux, 1065, rue de la Médecine, Pavillon Adrien-Pouliot, local 2986, Québec, QC, G1V 0A6, Canada

Highlights
• Cluster analysis allows identifying locations with water quality issues (THMs/HAAs).
• Different monitoring scenarios and solutions were proposed to help decision-making.
• Free chlorine and water temperature allow optimizing sampling points and periods.

Article info

Article history:
Received 23 December 2017
Received in revised form 23 April 2018
Accepted 2 June 2018
Available online 4 June 2018

Handling Editor: W Mitch

Keywords: Trihalomethanes; Haloacetic acids; Cluster analysis; Monitoring; Drinking water

Abstract

Trihalomethanes (THMs) and haloacetic acids (HAAs) are the main groups of disinfection by-products (DBPs) detected in drinking water and are consequently strictly regulated. However, the increasing quantity of DBP data produced by research projects and regulatory programs remains largely unexploited, despite its great potential for optimizing drinking water quality monitoring to meet specific objectives. In this work, we developed a procedure based on cluster analysis to optimize the locations and periods of DBP monitoring under a set of monitoring scenarios. The optimization procedure used a robust set of spatio-temporal DBP monitoring results (THMs and HAAs) generated from intensive sampling campaigns conducted in a residential sector of a water distribution system. Results show that cluster analysis allows water quality to be classified into different groups of THMs and HAAs according to their similarities, and allows locations presenting water quality concerns to be identified. By applying cluster analysis with different monitoring objectives, this work provides a set of monitoring solutions and a comparison between various monitoring scenarios for decision-making purposes. Finally, it was demonstrated that intensive monitoring data for free chlorine residual and water temperature, used as DBP proxy parameters and processed with cluster analysis, could also help identify the optimal sampling points and periods for regulatory THM and HAA monitoring.
© 2018 Elsevier Ltd. All rights reserved.

* Corresponding author. E-mail address: [email protected] (I. Delpla).
https://doi.org/10.1016/j.chemosphere.2018.06.009
0045-6535/© 2018 Elsevier Ltd. All rights reserved.

1. Introduction

Disinfection is widely used to inactivate microorganisms and ensure the safety of drinking water in the distribution system. However, chemical disinfectants are also powerful oxidants, reacting with natural organic matter, anthropogenic contaminants and bromide/iodide present in source waters to form disinfection by-products (DBPs) (Richardson, 2011). Among the many DBPs
identified, trihalomethanes (THMs) and haloacetic acids (HAAs) are the main groups of DBPs usually detected in drinking water. These DBPs exhibit important spatio-temporal variability within the distribution system (Legay et al., 2010; Rodriguez and Sérodes, 2001; Rodriguez et al., 2003, 2004; Scheili et al., 2015). This variability is associated mainly with raw water characteristics, treatment operations and distribution. The following variables are commonly proposed to explain DBP variability: organic matter content (amount and composition), water temperature, pH, residence time, chlorine dose, and bromide concentration (Krasner, 1999; Oxenford, 1996; Rodriguez and Sérodes, 2001). Because of the potential impact of DBPs on human health, such
as cancer (bladder and colorectal) and reproductive outcomes (stillbirth and growth retardation) (Levallois et al., 2010; Richardson et al., 2007; Toledano et al., 2004; Villanueva et al., 2003), some groups of compounds such as THMs and HAAs are regulated in several countries. In the province of Québec (Canada), the drinking water regulation sets an acceptable annual maximum concentration of 80 µg/L for THM4 (chloroform, bromodichloromethane, chlorodibromomethane and tribromomethane) and 60 µg/L for HAA5 (monochloroacetic acid, dichloroacetic acid, trichloroacetic acid, monobromoacetic acid and dibromoacetic acid), based on the annual average of quarterly (trimester) samples taken at the extremity of the distribution system. When several points are sampled during a trimester, the maximum THM4 and HAA5 concentrations must be identified for that trimester, and the mean of these maximum values over four trimesters must be calculated (MDDELCC, 2017). Parameters that affect DBP variability (e.g., free chlorine, water residence time) can be used to identify the monitoring points where the highest DBP concentrations are observed (MDDELCC, 2017). However, the regulation does not describe how the sampling locations can be identified optimally. In this context, different methods have been developed to identify optimal monitoring points for water quality. Some studies seek to optimize drinking water monitoring by optimizing demand coverage in the distribution network (Kumar et al., 1997; Lee and Deininger, 1992; Liu et al., 2010) or by simulating free chlorine residual decay (Woo and Kim, 2003). However, very few studies have attempted to optimize the monitoring of DBPs and their explanatory variables in distribution systems. In humid continental climates, which undergo large temperature variations and receive significant annual precipitation, DBP control is of particular importance during the warm season.
The highest DBP levels are observed in this period because of a greater presence of organic precursors in raw waters and higher water temperatures, which favor DBP formation in the distribution system (Abdullah et al., 2009; Rodriguez et al., 2007). For example, the highest levels of commonly detected DBPs, such as THMs and HAAs, were measured during the warm season in Canada (LeBel et al., 1997; Rodriguez et al., 2007; Williams et al., 1997), China (Chen et al., 2008; Wei et al., 2010), and Poland (Buszewski and Ligor, 2001; Dojlido et al., 1999). In Québec, Canada, THM levels can be 2.5- to 5-fold higher in summer than average winter levels (Rodriguez et al., 2003, 2007). In other countries with a similar climate, such as North China and Poland, THM and HAA levels can be 2- to 6-fold higher in late summer than in winter (Buszewski and Ligor, 2001; Chen et al., 2008; Dojlido et al., 1999; Wei et al., 2010). Ultimately, these observations highlight possible issues concerning the control of, and regulatory compliance for, THMs and HAAs (Rodriguez et al., 2007). In Québec, DBP regulation is based on quarterly samples (from 1 to 8 per season, depending on the size of the distribution system; MDDELCC, 2017), and the regulatory exceedances that are observed are very often associated with values measured in summer. Research projects and monitoring for regulatory purposes (online and manual) have generated large, complex, multidimensional water quality databases. The analysis of large data sets is challenging, but can be achieved by applying various multivariate statistical approaches (principal component analysis, multiple regression, cluster analysis, factor analysis, discriminant analysis, etc.) (Hamzaoui-Azaza et al., 2011; Iscen et al., 2008; Kazi et al., 2009; Shrestha and Kazama, 2007; Simeonov et al., 2003; Singh et al., 2005; Vega et al., 1998; Wang et al., 2013). These approaches, when applied to multiple
parameters, might lead to an improved understanding of water quality and a better identification of the factors and/or sources that impact water supply systems; they might also constitute a tool for reliable water resources management and help improve the response to contamination events (Liu et al., 2015; Morales et al., 1999; Reghunath et al., 2002; Wunderlin et al., 2001). However, these techniques are rarely applied to the optimization of drinking water monitoring in distribution networks. The main objective of this study is to optimize water quality monitoring design in distribution networks using data mining techniques applied to high-frequency water quality data. Two specific objectives were defined: i) optimize the locations and periods of DBP monitoring, and ii) evaluate the use of parameters that affect DBP variability in order to optimize DBP monitoring. To achieve this, we used a high-frequency spatio-temporal water quality data set acquired at multiple monitoring points in a distribution system for two regulated groups of DBPs: THMs and HAAs. The data mining method chosen was cluster analysis. This method is often used to optimize monitoring networks in rivers, but is rarely applied to distribution systems with DBP data, possibly because high-frequency data for these compounds are rarely available (due mainly to the relatively high cost of their analysis). Only the study of Astel et al. (2006) has used a clustering method to assess DBP spatial variability in distribution networks. However, that study was conducted with a reduced number of low-frequency measurements (12 sites, 24 samples per site, 1 sampling/season) on a large distribution system over a period of 6 years. Using data from high-frequency sampling at multiple monitoring locations, covering both the spatial and temporal dimensions, could help assess the suitability of cluster analysis for optimizing drinking water monitoring.
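The Québec compliance calculation summarized earlier in this introduction (keep the maximum of each trimester, then average those maxima) can be written as a short Python function. This is a sketch under stated assumptions: each sample is taken to be a (trimester, concentration in µg/L) pair, and the function name `compliance_value` and the THM4 values are hypothetical, not taken from MDDELCC (2017) or from the study data.

```python
from collections import defaultdict


def compliance_value(samples):
    """Mean of per-trimester maxima (µg/L).

    `samples` is an iterable of (trimester, concentration) pairs. When several
    points are sampled in a trimester, only the highest value is kept; the
    result is the mean of those trimester maxima.
    """
    trimester_max = defaultdict(lambda: float("-inf"))
    for trimester, concentration in samples:
        trimester_max[trimester] = max(trimester_max[trimester], concentration)
    if not trimester_max:
        raise ValueError("no samples given")
    return sum(trimester_max.values()) / len(trimester_max)


# Hypothetical THM4 results (µg/L), two sampling points per trimester.
thm4 = [(1, 20.0), (1, 25.0), (2, 30.0), (2, 28.0),
        (3, 60.0), (3, 90.0), (4, 35.0), (4, 40.0)]
value = compliance_value(thm4)
print(value, value <= 80.0)  # 80 µg/L is the THM4 standard quoted above
```

With these invented values the third-trimester maximum (90 µg/L) is above 80 µg/L while the four-trimester mean (46.25 µg/L) is not. Because only the trimester maxima enter the mean, where and when samples are taken during the warm season strongly influences the result.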

2. Materials and methods

2.1. Area of study and data

This study is based on a large database acquired through a robust monitoring program conducted previously to assess residence times and the spatio-temporal variability of DBPs in the distribution system of the city of Québec, Canada (Rochette, 2016). This distribution system provides water to almost 306 000 citizens. The specific sector under study is a residential area built in the 1960s, representative of Québec City suburbs. In this sector, 6000 citizens are supplied by the distribution system, over an area of approximately 2 km². Two valve chambers (A and B) supplied the sector, with flows measured at A being about 9% of those measured at B on average during the study period (Rochette et al., 2017). The distributed water comes from a surface water source and undergoes complete treatment, including coagulation/flocculation, sedimentation, inter-ozonation and filtration, followed by chlorination before distribution. The climate in the area is classified as humid continental and is consequently characterized by large variations during the year, with average air temperatures ranging from −12.8 °C to 19.3 °C. These variations affect the variability of raw water and the quality of distributed water. Because there are very small variations in drinking water quality and DBP levels during the cold season (from October to April), we decided to focus sampling on the warm season (spring: May to June; summer: July to September). Indeed, the highest DBP levels are observed during the warm season, due to the presence of organic precursors that increase chlorine demand and to higher temperatures that favor DBP occurrence (Rodriguez et al., 2007).

Our previous work in the system under study has shown that the levels of THMs and HAAs are usually very low during the cold periods […] (75th percentile of the population distribution in the dissemination areas). It should be noted that regulatory compliance is the underlying objective behind the three objectives elicited. The three objectives helped define the areas under study; they are mapped in Supplementary material 1. The monitoring objectives were developed into different scenarios by modifying the parameters, frequency and number of sampling locations (Table 1). These scenarios take the cost of water quality analysis into account in their design, in order to propose solutions at similar costs. Four different scenarios were applied for each of the three objectives. In Table 1, scenario 1 corresponds to high-frequency DBP monitoring with, consequently, fewer monitoring points. Scenario 2 corresponds to the sampling frequency and number of monitoring points defined in the drinking water regulation of the province of Québec for a distribution system of this size (MDDELCC, 2017). Finally, scenarios 3 and 4 correspond to the monitoring of DBP proxy parameters, with scenario 3 giving priority to sampling frequency and scenario 4 to the number of monitoring points.

2.3. Statistical methods

Cluster analysis can be used for data reduction, hypothesis generation and testing, and prediction based on groups (Halkidi et al., 2001). It can be conducted with a hierarchical (agglomerative or divisive) or a non-hierarchical (k-group partitioning) method. In hierarchical clustering, clusters are formed
sequentially, starting with the most similar pair of objects and forming higher clusters step by step. Similarity (the difference between the analytical values of two observations) is calculated using squared Euclidean distances. Ward's method uses an analysis of variance approach to evaluate the distances between clusters: at each grouping step, it merges the two clusters whose union produces the smallest increase in the within-cluster sum of squares (Alberto et al., 2001). This method is often used for grouping analysis, particularly in studies focusing on optimizing the number of sampling points and/or the water quality parameters measured in river water quality monitoring networks (Kazi et al., 2009; Shrestha and Kazama, 2007; Singh et al., 2005; Vega et al., 1998).

Fig. 1. Location of the sampling sites and valve chambers in the distribution network (adapted from Rochette, 2016 and Ville de Québec).
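The agglomeration rule described above can be sketched in a few lines of Python. This is not the authors' MATLAB code but a minimal, dependency-free illustration, assuming each monitoring site is described by a vector of measurements (for example THM4 concentrations on successive sampling dates). Columns are z-scored first, as the study standardized its data; inter-cluster distances start as squared Euclidean distances and are updated with the Lance–Williams formula for Ward's method. The function names and site values are hypothetical; in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage(X, method="ward")` would normally be used.

```python
import itertools
import statistics


def standardize(rows):
    """Z-score each column so that every variable carries equal weight."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.pstdev(c) or 1.0 for c in columns]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def ward_linkage(points):
    """Agglomerative clustering with Ward's criterion.

    Returns merges as (cluster_a, cluster_b, height). Ids 0..n-1 are the
    original points; merge number t creates cluster id n + t. Heights are
    proportional to the increase in within-cluster sum of squares.
    """
    n = len(points)
    size = {i: 1 for i in range(n)}
    # Initial dissimilarities: squared Euclidean distances between points.
    dist = {
        (i, j): sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        for i, j in itertools.combinations(range(n), 2)
    }
    merges = []
    for new_id in range(n, 2 * n - 1):
        (a, b), d_ab = min(dist.items(), key=lambda item: item[1])
        na, nb = size.pop(a), size.pop(b)
        for k, nk in size.items():
            d_ka = dist[(min(k, a), max(k, a))]
            d_kb = dist[(min(k, b), max(k, b))]
            # Lance-Williams update for Ward's method.
            dist[(k, new_id)] = ((na + nk) * d_ka + (nb + nk) * d_kb - nk * d_ab) / (na + nb + nk)
        # Forget every distance involving the two merged clusters.
        dist = {pair: d for pair, d in dist.items() if a not in pair and b not in pair}
        size[new_id] = na + nb
        merges.append((a, b, d_ab))
    return merges


def cut_tree(n, merges, k):
    """Flat cluster labels obtained by replaying the first n - k merges."""
    members = {i: [i] for i in range(n)}
    for new_id, (a, b, _) in enumerate(merges[: n - k], start=n):
        members[new_id] = members.pop(a) + members.pop(b)
    labels = [0] * n
    # Number clusters by their smallest site index for stable labels.
    for label, group in enumerate(sorted(members.values(), key=min)):
        for i in group:
            labels[i] = label
    return labels


# Hypothetical THM4 (µg/L) at six sites on three sampling dates.
sites = [[20, 22, 25], [21, 23, 24], [19, 21, 26],
         [40, 45, 50], [42, 44, 49], [41, 46, 52]]
merges = ward_linkage(standardize(sites))
print(cut_tree(len(sites), merges, k=2))  # [0, 0, 0, 1, 1, 1]
```

Cutting the tree at k = 2 recovers the two groups of sites. In the study, the number of clusters was instead adapted to each monitoring scenario, and the retained clusters were checked for stability.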

Table 1
Description of the scenarios under study.

Scenario  Number of locations  Frequency    Parameters
Objective 1. Reflect maximum mean concentration
1         1                    1/month      THM4, HAA6
2         4                    4/year (a)   THM4, HAA6
3         2                    1/week       Free chlorine, UV254, temperature, residence times
4         8                    1/month      Free chlorine, UV254, temperature, residence times
Objective 2. Reflect maximum exposure of vulnerable populations
5         1                    1/month      THM4, HAA6
6         4                    4/year (a)   THM4, HAA6
7         2                    1/week       Free chlorine, UV254, temperature, residence times
8         8                    1/month      Free chlorine, UV254, temperature, residence times
Objective 3. Reflect maximum exposure of inhabitants
9         1                    1/month      THM4, HAA6
10        4                    4/year (a)   THM4, HAA6
11        2                    1/week       Free chlorine, UV254, temperature, residence times
12        8                    1/month      Free chlorine, UV254, temperature, residence times

(a) Corresponding to 1 sample/season (seasons were defined as May–June (spring) and July–September (summer)).
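The cluster validation described in the paragraph after Table 1 (the Jaccard coefficient averaged over bootstrap resamples, following the idea of Hennig, 2007) can be sketched as follows. This is an illustrative Python sketch, not the study's MATLAB implementation: `split_at_largest_gap` is a hypothetical one-dimensional stand-in for Ward's clustering, and the site means are invented. In each bootstrap sample, every original cluster, restricted to the sites actually drawn, is compared with the most similar cluster found in the resample.

```python
import random


def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two sets of site indices."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def split_at_largest_gap(values):
    """Hypothetical stand-in for Ward's method: two clusters on 1-D data,
    cut at the widest gap between sorted values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    gaps = [values[order[k + 1]] - values[order[k]] for k in range(len(order) - 1)]
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    labels = [0] * len(values)
    for i in order[cut + 1:]:
        labels[i] = 1
    return labels


def bootstrap_stability(values, cluster_fn, n_boot=100, seed=0):
    """Mean best-match Jaccard coefficient of each original cluster."""
    original = {}
    for i, label in enumerate(cluster_fn(values)):
        original.setdefault(label, set()).add(i)
    rng = random.Random(seed)
    totals = {label: 0.0 for label in original}
    counts = {label: 0 for label in original}
    n = len(values)
    for _ in range(n_boot):
        drawn = [rng.randrange(n) for _ in range(n)]  # resample sites with replacement
        drawn_set = set(drawn)
        boot = {}
        for pos, label in enumerate(cluster_fn([values[i] for i in drawn])):
            boot.setdefault(label, set()).add(drawn[pos])
        for label, members in original.items():
            present = members & drawn_set
            if not present:  # cluster absent from this resample: skip it
                continue
            totals[label] += max(jaccard(present, b) for b in boot.values())
            counts[label] += 1
    return {label: totals[label] / counts[label] for label in original if counts[label]}


# Hypothetical mean THM4 (µg/L) at six sites.
site_means = [20.1, 21.0, 19.5, 41.2, 43.0, 40.4]
print(split_at_largest_gap(site_means))  # [0, 0, 0, 1, 1, 1]
print(bootstrap_stability(site_means, split_at_largest_gap, n_boot=100))
```

A mean Jaccard value close to 1 indicates a cluster that is recovered consistently across resamples; the clusters retained in this study have values between 0.78 and 0.91.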

In this study, Ward's agglomerative hierarchical cluster analysis was applied. k-means clustering was also tested but showed convergence problems with our data and was consequently not used. Clusters were validated by calculating the Jaccard coefficient, a similarity measure between sets computed as the number of elements in the intersection of two sets divided by the number of elements in their union (Hennig, 2007). Bootstrap resampling was used to evaluate the stability of each cluster, by calculating the mean value of its Jaccard coefficient over all bootstrap iterations (n = 100). Data were standardized before running the analysis. All statistical analyses were performed using Matlab version R2016a.

In this project, the clustering method was used to: i) assess spatio-temporal variation in drinking water quality (THM4 and HAA6) by highlighting specific points in the distribution network (i.e., DBP maximums) and identifying areas with homogeneous water quality, and ii) identify proxy parameters for which DBP maximums were observed. To account for the spatial and temporal dimensions of DBP variability, the cluster analysis was conducted separately for each spatial and temporal component. The number of clusters and periods was then adapted to the scenarios defined in section 2.2, and the maximum value of the parameter within each cluster was identified.

3. Results and discussion

First, the application of cluster analysis to identify particular sub-sectors of the studied area of the distribution system is presented, using THM4 and HAA6 as examples. Then, DBP monitoring is optimized by applying cluster analysis according to the previously defined scenarios, identifying each sub-sector and the maximum of the parameters of interest within each sub-sector.

3.1. Spatial similarity and site grouping for DBPs

The dendrograms for THM4 and HAA6 obtained from cluster analysis and the spatial locations of the clusters are presented in Figs. 2 and 3. In each dendrogram, the x-axis represents the monitoring locations and the y-axis is a measure of the closeness of clusters. DBP concentrations depend on geographic location. The cluster analysis identified two stable clusters for THM4 (Jaccard coefficients of 0.87 and 0.91 for sub-sectors A and B, respectively) and
for HAA6 (Jaccard coefficients of 0.78 and 0.90 for sub-sectors A and B, respectively). Increasing the number of clusters led to more unstable clusters, which are consequently not presented here. For THM4, sub-sector A corresponds to the area where the highest mean and maximum concentrations were observed over the whole study period (34.0 µg/L and 51.4 µg/L, respectively) and where the lowest mean free chlorine residual was also observed (0.15 mg/L). Sub-sector B corresponds to the area that is hydraulically closest to the two valve chambers and the main supply pipes. Conversely, for HAA6, the highest mean and maximum concentrations were observed in sub-sector B (23.1 µg/L and 35.5 µg/L, respectively). Cluster analysis also defined a single-point area (sub-sector A), which corresponds to the lowest HAA6 concentration. From a hydraulic point of view, this point is located in an isolated loop where the water residence time could be high and where the lowest free chlorine residual was measured. It has been suggested that some HAA species are degraded by microbial activity (e.g., biofilm regrowth) (Bayless and Andrews, 2008; Pluchon et al., 2013; Rodriguez et al., 2004). At this point, the low free chlorine residual could have favored microbial activity. These factors might explain the possible degradation of HAA6 and the lowest concentrations observed. Our results suggest that cluster analysis is helpful in identifying spatially homogeneous groups of sampling points (from a drinking water quality perspective) and in linking a range of DBP concentrations to a geographical location in the distribution system. Furthermore, specific points, such as sampling points with minimum DBP concentrations, were also revealed by this analysis. In the study of Astel et al.
(2006), the clustering technique was applied to the entire distribution system of the city of Gdansk, Poland (comparable in size to the city of Québec), and revealed only three distinct zones relative to the concentration levels of various groups of DBPs (THMs and organohalogen compounds). Our results also showed that DBP levels can follow very different patterns at a much finer scale, i.e., inside a sub-sector of the distribution system. We analyzed the spatial distribution of THM4 and HAA6 according to the deprivation index. Comparing THM4 levels between deprived sectors (points 1, 2, 5, 7 to 9, 12 to 17, 19 and 20) and the wealthiest sectors (points 3, 4, 6, 10 and 11), only slight, non-significant differences were observed in mean values (28.1 µg/L and 26.0 µg/L, respectively) and maximum values (44.0 µg/L and
40.9 µg/L, respectively). No difference was observed for HAA6 levels.

Fig. 2. Dendrogram of THM4 for the neighbourhood under study (left panel). Spatial representation of the corresponding clusters (right panel).

Fig. 3. Dendrogram of HAA6 for the neighbourhood under study (left panel). Spatial representation of the corresponding clusters (right panel).

3.2. Optimal locations and periods for DBP monitoring

In this section, in line with the first objective, we apply cluster analysis under the constraints of the different monitoring scenarios and objectives to find optimal locations and periods for DBP monitoring. We applied the 12 scenarios (Table 1) by adjusting the number of clusters and the sampling frequency for the different parameters (THM4, HAA6 and their explanatory variables) and, finally, by identifying the maximum within each area defined by clustering. As an example, Fig. 4 presents the optimal locations for THM4 monitoring according to the three objectives. The scenarios shown (scenarios 2, 6 and 10) all have the same sampling frequency and number of locations. In this figure, four clusters were defined for each of the two seasons when sampling was conducted (spring and summer) for the three objectives. Then, the maximum THM4 concentration within each cluster was found and highlighted. The results reveal that the optimal monitoring locations vary between seasons and between
the three objectives tested. Given the differences observed between seasons for the different scenarios and objectives, the variability in clusters reflected the variability of THM4 patterns within each season. The pattern of temporal THM4 variation within a season showed similarities in some areas of the distribution network, and the areas where these similarities occur change from one season to another, which explains the spatial variations observed in the clustering. Residence time also played an important role in THM4 variability; for scenario 2, the two blue clusters (dark and light) grouped the points that are hydraulically closest to the two valve chambers. As underlined by a recent review of approaches to optimize water quality monitoring programs (WQMP), the determination of monitoring objectives is almost always noted in the literature as the most important step (Behmel et al., 2016). In our study, we propose various types of monitoring objectives so that a drinking water manager has a sufficient number of options when making optimization choices. Results show that the choice of monitoring objectives can have a major impact on the location of the optimal sampling points in the distribution system. However, it should be
noted that some points are always selected irrespective of the objective (i.e., point 14 for the spring season and points 15, 17 and 20 for the summer season in Fig. 4), showing that various objectives can be reconciled with the same choice of location. Identifying these points could prove helpful to water quality managers who wish to reconcile different objectives when optimizing drinking water monitoring. Complete results for the 12 scenarios are provided in Supplementary material 2. Some sampling points and dates are also common to several objectives. For example, scenarios 1, 5 and 9 (which have the same number of sampling points and the same frequency) lead mostly to the same sampling points and dates for THM4 and HAA6 across the three objectives. These sampling points and dates might therefore be good candidates for reconciling the three objectives. However, it is more difficult to find common sampling points and dates for different scenarios under the same objective. This result was expected, as the parameters differ between scenarios, resulting in different monitoring points and dates.

Fig. 4. Optimal locations for THM4 monitoring according to different sampling objectives. Circles represent the maximum concentration for each cluster. The color scheme serves only to differentiate the clusters.

3.3. Selection of representative locations and periods for monitoring using DBP proxies

Different parameters can explain the spatio-temporal variability of DBPs (free chlorine residual, water temperature, pH, residence time, TOC or UV254). In line with the second objective of this study, in this last section we identified the optimal sampling points and dates for DBP monitoring using cluster analysis with THM4 and HAA6 proxies. According to Rochette (2016), in the sector of the distribution system under study, the best proxy for THM4 is water temperature and the best proxy for HAA6 is TOC, an organic matter indicator. Free chlorine residual is another common explanatory variable for DBPs (Krasner, 1999; Oxenford, 1996; Rodriguez and Sérodes, 2001). Consequently, we tested water temperature, UV254 (as a proxy of organic matter, since TOC was only measured at one sampling location), and free
chlorine residual to find the best monitoring locations and sampling dates for each scenario. The simulated residence times were not tested, as they showed only slight correlations with THM4 and HAA6 (Rochette, 2016). The corresponding THM4 and HAA6 values for these points and dates were then collected, and the mean of the seasonal maximums was calculated according to the calculation method for DBP compliance in the Québec regulation on drinking water (MDDELCC, 2017). In the regulation, it is mandatory to identify the seasonal maximum and to calculate a mean of the seasonal maximums (MDDELCC, 2017). The large and representative spatio-temporal database of 440 observations for the sector under study allowed us to calculate, with this method, a “factual” maximum for THM4 and HAA6 for comparison. The results of this comparison are presented in Table 2. Large differences appear depending on the scenarios and proxies used. The three proxies tested led to very different configurations of sampling points and dates, which explains the large differences in mean THM4 and HAA6 concentrations (Table 2). Identifying optimal points with UV254 led to a large underestimation of the “factual” THM4 and HAA6 maximum. However, using free chlorine residual as a proxy of THM4 and temperature as a proxy of HAA6, and then applying the sampling constraints of the regulation with cluster analysis, resulted in a good approximation of the “factual” THM4 and HAA6 maximum, with only a small underestimation. The result for THM4 was expected, as free chlorine residual is correlated with this parameter in the database (Spearman r = 0.52, p < 0.01); however, it was surprising for HAA6, since its correlation with temperature is quite weak (r = 0.32, p < 0.01).
One explanation could be that water temperature is well correlated temporally with HAA6 because both parameters follow similar seasonal patterns, and that temporal variability is markedly more important than spatial variability for this parameter (Rochette, 2016). Such differences in proxy type between the two DBP groups were expected, because THM and HAA maximums are generally measured at different locations in the distribution system. THM maximums are generally
measured at the extremities of distribution systems, in zones with a longer contact time between chlorine and water (Bove et al., 2007; Rodriguez et al., 2004; Williams et al., 1997), whereas HAA maximums are generally measured in the middle of distribution systems due to biofilm degradation (Rodriguez et al., 2004; Tung and Xie, 2009).

Table 2
Comparison between mean THM4 and HAA6 values obtained by calculating the “factual” maximum of THM4 and HAA6 and those obtained from the monitoring locations and dates optimized with DBP proxies (free residual chlorine, temperature and UV254). All values in µg/L.

“Factual” maximum: THM4 46.4; HAA6 30.5

Sampling design optimized with:
Objective  Scenario (a)  Free residual chlorine   Water temperature   UV254
                         THM4 / HAA6              THM4 / HAA6         THM4 / HAA6
1          1             40.0 / 19.3              32.8 / 24.0         38.9 / 19.6
1          2             42.6 / 24.0              38.1 / 24.0         35.7 / 21.5
2          5             40.0 / 19.3              30.8 / 24.0         38.9 / 19.6
2          6             42.6 / 24.0              38.1 / 24.0         35.7 / 21.5
3          9             42.0 / 17.3              27.2 / 22.5         39.6 / 19.1
3          10            38.8 / 26.0              44.0 / 29.8         35.7 / 22.8

(a) The other scenarios were not used here, as they are not designed to estimate DBP levels.

According to these results, it would appear that intensive spatio-temporal monitoring of free chlorine residual and water temperature, combined with cluster analysis, could help optimize the monitoring of regulated DBPs in a residential distribution system supplied with treated surface water disinfected with chlorine. This approach appears particularly interesting because these two parameters are quick and inexpensive to measure. This study has some limitations. It covers only two seasons (spring and summer) rather than an entire year, as mandated by the Québec regulation on drinking water quality for DBP monitoring (MDDELCC, 2017). However, as explained previously, the warm season is the critical period for DBP monitoring in the region under study. Moreover, other chemometric analyses could be used to improve data interpretation, in particular data visualization, such as self-organizing maps. This technique reduces the dimensionality of data through the use of self-organizing neural networks (Astel et al., 2007; Tsakovski et al., 2009). It has some advantages for data visualization, as it makes outliers easy to see, presents semi-quantitative spatial information and can show similarities between both positively and negatively correlated variables (Astel et al., 2007; Tsakovski et al., 2009). The optimization procedure proposed here is highly dependent on the characteristics of water quality, the distribution system in the area under study and the type of disinfectant employed in the drinking water treatment plant.
However, it should be noted that in Québec, the majority of municipal drinking water treatment plants use surface water as their drinking water source and chlorine as a secondary disinfectant (MDDELCC, 2016). Furthermore, only one area of the distribution system was subjected to the analysis. Further work should extend the methodology developed for this study to a larger distribution system, such as that of the city of Québec, or at least to all similar residential areas of the city of Québec, in order to optimize monitoring locations and periods for regulatory surveillance. The critical challenge in generalizing this methodology lies in data availability, as an intensive THM4 and HAA6 data acquisition campaign will be needed for comparison purposes. The analysis of these parameters is particularly costly and time-consuming.
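The proxy-based selection evaluated in section 3.3 can be summarized as: within each cluster and season, choose the site and date where the proxy is highest, apply the regulatory mean of seasonal maximums to the DBP values measured there, and compare the result with the same statistic computed from all observations (the “factual” value). The Python sketch below illustrates that logic under stated assumptions: the records, cluster labels and field names are hypothetical, and the clustering step is assumed to have been done already. The `pick` argument is our addition; the text describes keeping the maximum of the parameter within each cluster, and `pick=min` is offered only as an option, not as the study's documented choice.

```python
from collections import defaultdict


def mean_of_seasonal_maxima(pairs):
    """Regulatory-style statistic: mean over seasons of the seasonal maximum."""
    season_max = defaultdict(lambda: float("-inf"))
    for season, value in pairs:
        season_max[season] = max(season_max[season], value)
    return sum(season_max.values()) / len(season_max)


def proxy_based_estimate(records, proxy, dbp, pick=max):
    """Keep one record per (season, cluster), chosen by the proxy value,
    then summarize the DBP measured at the selected sites and dates."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["season"], r["cluster"])].append(r)
    selected = [pick(group, key=lambda r: r[proxy]) for group in groups.values()]
    return mean_of_seasonal_maxima((r["season"], r[dbp]) for r in selected)


def factual_estimate(records, dbp):
    """Same statistic computed from every observation."""
    return mean_of_seasonal_maxima((r["season"], r[dbp]) for r in records)


# Hypothetical records: season, cluster, water temperature (°C), THM4 (µg/L).
rows = [
    ("spring", 0, 12.0, 20.0), ("spring", 0, 15.0, 28.0),
    ("spring", 1, 13.0, 30.0), ("spring", 1, 14.0, 26.0),
    ("summer", 0, 20.0, 40.0), ("summer", 0, 22.0, 38.0),
    ("summer", 1, 21.0, 44.0), ("summer", 1, 19.0, 46.0),
]
records = [dict(zip(("season", "cluster", "temperature", "thm4"), row)) for row in rows]
print(proxy_based_estimate(records, "temperature", "thm4"))  # 36.0
print(factual_estimate(records, "thm4"))  # 38.0
```

With these invented records, the proxy-based design underestimates the value obtained from all observations (36.0 versus 38.0 µg/L), which mirrors the kind of comparison reported in Table 2; how close the two are depends on how well the proxy tracks the DBP within each cluster and season.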

4. Conclusions

Cluster analysis is helpful for classifying water quality into different groups of THMs and HAAs based on their similarity, and for identifying locations presenting water quality issues in a residential distribution system supplied with treated surface water disinfected with chlorine. By applying cluster analysis under different scenarios, coupled with the identification of maximum levels, this study determined a set of monitoring solutions for three monitoring objectives and assessed the differences between the various monitoring scenarios. Finally, the results showed that applying cluster analysis to intensive monitoring data for free chlorine and water temperature, used as DBP proxy parameters, could help drinking water managers identify optimal sampling points and periods for regulatory DBP monitoring. Further work should apply this methodology to larger sectors of this distribution system.

Note

The code used for this work is available on request to readers interested in applying our statistical approach; they are invited to contact the corresponding author directly.

Conflicts of interest

The authors declare that they have no conflict of interest.

Acknowledgments

We acknowledge Simon Rochette for designing and conducting the sampling program. The authors would like to thank the personnel of the Drinking Water Chair of Université Laval, and Sabrina Simard in particular, for the water quality analysis. We acknowledge François Proulx of the City of Québec for providing help in the definition of scenarios. This work was funded by NSERC and Mitacs in partnership with Thales Canada and the Joint Research Unit in urban sciences (UMRsu).

Appendix A. Supplementary data

Supplementary data related to this article can be found at https://doi.org/10.1016/j.chemosphere.2018.06.009.

References

Abdullah, M.P., Yee, L.F., Ata, S., Abdullah, A., Ishak, B., Abidin, K.N.Z., 2009. The study of interrelationship between raw water quality parameters, chlorine demand and the formation of disinfection by-products. Phys. Chem. Earth, Parts A/B/C 34 (13), 806–811.
Alberto, W.D., María del Pilar, D., María Valeria, A., Fabiana, P.S., Cecilia, H.A., María de los Ángeles, B., 2001. Pattern recognition techniques for the evaluation of spatial and temporal variations in water quality. A case study: Suquía river basin (Córdoba-Argentina). Water Res. 35 (12), 2881–2894. https://doi.org/10.1016/S0043-1354(00)00592-3.
Astel, A., Biziuk, M., Przyjazny, A., Namiesnik, J., 2006. Chemometrics in monitoring spatial and temporal variations in drinking water quality. Water Res. 40 (8), 1706–1716. https://doi.org/10.1016/j.watres.2006.02.018.
Astel, A., Tsakovski, S., Barbieri, P., Simeonov, V., 2007. Comparison of self-organizing maps classification approach with cluster and principal components analysis for large environmental data sets. Water Res. 41 (19), 4566–4578.
Balazs, C., Morello-Frosch, R., Hubbard, A., Ray, I., 2011. Social disparities in nitrate-contaminated drinking water in California's San Joaquin Valley. Environ. Health Perspect. 1272–1278. https://doi.org/10.1289/ehp.1002878.
Bayless, W., Andrews, R.C., 2008. Biodegradation of six haloacetic acids in drinking water. J. Water Health 6, 15–22. https://doi.org/10.2166/wh.2007.002.
Behmel, S., Damour, M., Ludwig, R., Rodriguez, M.J., 2016. Water quality monitoring strategies - A review and future perspectives. Sci. Total Environ. 571, 1312–1329. https://doi.org/10.1016/j.scitotenv.2016.06.235.
Bove, G.E., Rogerson, P.A., Vena, J.E., 2007. Case control study of the geographic variability of exposure to disinfectant byproducts and risk for rectal cancer. Int. J. Health Geogr. 6, 18. https://doi.org/10.1186/1476-072X-6-18.
Buszewski, B., Ligor, T., 2001. Application of different extraction methods for the quality control of water. Water, Air, Soil Pollut. 129 (1), 155–165.
Chen, C., Zhang, X.J., Zhu, L.X., Liu, J., He, W.J., Han, H.D., 2008. Disinfection by-products and their precursors in a water treatment plant in North China: seasonal changes and fraction analysis. Sci. Total Environ. 397, 140–147. https://doi.org/10.1016/j.scitotenv.2008.02.032.
Dojlido, J., Zbięć, E., Świetlik, R., 1999. Formation of the haloacetic acids during ozonation and chlorination of water in Warsaw waterworks (Poland). Water Res. 33 (14), 3111–3118.
Hales, S., Black, W., Skelly, C., Salmond, C., Weinstein, P., 2003. Social deprivation and the public health risks of community drinking water supplies in New Zealand. J. Epidemiol. Community Health 57, 581–583.
Halkidi, M., Batistakis, Y., Vazirgiannis, M., 2001. On clustering validation techniques. J. Intell. Inf. Syst. 17 (2–3), 107–145.
Hamzaoui-Azaza, F., Ketata, M., Bouhlila, R., Gueddari, M., Riberio, L., 2011. Hydrogeochemical characteristics and assessment of drinking water quality in Zeuss-Koutine aquifer, southeastern Tunisia. Environ. Monit. Assess. 174, 283–298. https://doi.org/10.1007/s10661-010-1457-9.
Hennig, C., 2007. Cluster-wise assessment of cluster stability. Comput. Stat. Data Anal. 52, 258–271. https://doi.org/10.1016/j.csda.2006.11.025.
Iscen, F.C., Emiroglu, Ö., Ilhan, S., Arslan, N., Yilmaz, V., Ahiska, S., 2008. Application of multivariate statistical techniques in the assessment of surface water quality in Uluabat Lake, Turkey. Environ. Monit. Assess. 144, 269–276. https://doi.org/10.1007/s10661-007-9989-3.
Kazi, T.G., Arain, M.B., Jamali, M.K., Jalbani, N., Afridi, H.I., Sarfraz, R.A., Baig, J.A., Shah, A.Q., 2009. Assessment of water quality of polluted lake using multivariate statistical techniques: a case study. Ecotoxicol. Environ. Saf. 72, 301–309. https://doi.org/10.1016/j.ecoenv.2008.02.024.
Krasner, S.W., 1999. Chemistry of disinfection by-product formation. In: Formation and Control of Disinfection By-Products in Drinking Water. American Water Works Association, Denver, CO, pp. 5–27.
Kumar, A., Kansal, M.L., Arora, G., 1997. Identification of monitoring stations in water distribution system. J. Environ. Eng. 123 (8), 746–752.
LeBel, G.L., Benoit, F.M., Williams, D.T., 1997. A one-year survey of halogenated disinfection by-products in the distribution system of treatment plants using three different disinfection processes. Chemosphere 34 (11), 2301–2317.
Lee, B.H., Deininger, R.A., 1992. Optimal locations of monitoring stations in water distribution system. J. Environ. Eng. 118 (1), 4–16.
Legay, C., Rodriguez, M.J., Sérodes, J.B., Levallois, P., 2010. The assessment of population exposure to chlorination by-products: a study on the influence of the water distribution system. Environ. Health 9, 59. https://doi.org/10.1186/1476-069X-9-59.
Levallois, P., Gingras, S., Marcoux, S., Legay, C., Catto, C., Rodriguez, M., Tardif, R., 2012. Maternal exposure to drinking-water chlorination by-products and small-for-gestational-age neonates. Epidemiology 23, 267–276. https://doi.org/10.1097/EDE.0b013e3182468569.
Liu, S., Li, Z., Chen, J., Wang, Q., Meng, F., 2010. Flaw of demand coverage based method for optimal locations of monitoring stations and modification. Huanjing Kexue 31 (1), 88–92.
Liu, S., Smith, K., Che, H., 2015. A multivariate based event detection method and performance comparison with two baseline methods. Water Res. 80, 109–118.
MDDELCC, 2016. Assessment of drinking water quality in Québec (Bilan de la qualité de l'eau potable au Québec) 2010–2014. 2016, p. 80. http://www.mddelcc.gouv.qc.ca/ (last accessed 9 January 2016).
MDDELCC, 2017. Regulation on the Quality of Drinking Water (Règlement sur la qualité de l'eau potable). Government of Québec, Québec, Canada.
Morales, M.M., Martih, P., Llopis, A., Campos, L., Sagrado, J., 1999. An environmental study by factor analysis of surface seawater in the Gulf of Valencia (western Mediterranean). Anal. Chim. Acta 394, 109–117.
Oxenford, J.L., 1996. Disinfection by-products: current practices and future directions. In: Minear, R.A., Amy, G.L. (Eds.), Disinfection By-Products in Water Treatment: the Chemistry of Their Formation and Control. Lewis Publishers, Florida, USA, pp. 3–16.
Pluchon, C., Sérodes, J.B., Berthiaume, C., Charette, S.J., Gilbert, Y., Filion, G., Fournier-Larente, J., Rodriguez, M., Duchaine, C., 2013. Haloacetic acid degradation by a biofilm in a simulated drinking water distribution system. Water Sci. Technol. Water Supply 13, 447–461. https://doi.org/10.2166/ws.2013.041.
Reghunath, R., Murthy, T.R.J., Raghavan, B.R., 2002. The utility of multivariate statistical techniques in hydrogeochemical studies: an example from Karnataka, India. Water Res. 36, 2437–2442.
Richardson, S.D., 2011. Disinfection by-products: formation and occurrence in drinking water. In: Nriagu, J.O. (Ed.), Encyclopedia of Environmental Health, vol. 1. Elsevier, Burlington, pp. 110–136.
Richardson, S.D., Plewa, M.J., Wagner, E.D., Schoeny, R., Demarini, D.M., 2007. Occurrence, genotoxicity, and carcinogenicity of regulated and emerging disinfection by-products in drinking water: a review and roadmap for research. Mutat. Res. 636, 178–242. https://doi.org/10.1016/j.mrrev.2007.09.001.
Rochette, S., 2016. Variabilité du temps de séjour, du chlore et des sous-produits chlorés de la désinfection à l'échelle d'un quartier résidentiel [Variability of residence time, chlorine and chlorinated disinfection by-products at the scale of a residential neighbourhood]. Ph.D. thesis.
Rochette, S., Pelletier, G., Bouchard, C., Rodriguez, M., 2017. Variability of water residence time and free chlorine and disinfection by-product concentrations within a residential neighborhood. J. Water Supply Res. Technol. - Aqua 66 (4), 219–228.
Rodriguez, M.J., Sérodes, J.B., 2001. Spatial and temporal evolution of trihalomethanes in three water distribution systems. Water Res. 35, 1572–1586. https://doi.org/10.1016/S0043-1354(00)00403-6.
Rodriguez, M.J., Sérodes, J.B., Levallois, P., 2004. Behavior of trihalomethanes and haloacetic acids in a drinking water distribution system. Water Res. 38, 4367–4382. https://doi.org/10.1016/j.watres.2004.08.018.
Rodriguez, M.J., Sérodes, J.-B., Levallois, P., Proulx, F., 2007. Chlorinated disinfection by-products in drinking water according to source, treatment, season, and distribution location. J. Environ. Eng. Sci. 6, 355–365. https://doi.org/10.1139/s06-055.
Rodriguez, M.J., Vinette, Y., Sérodes, J.B., Bouchard, C., 2003. Trihalomethanes in drinking water of greater Québec region (Canada): occurrence, variations and modelling. Environ. Monit. Assess. 89, 69–93. https://doi.org/10.1023/A:1025811921502.
Scheili, A., Rodriguez, M.J., Sadiq, R., 2015. Seasonal and spatial variations of source and drinking water quality in small municipal systems of two Canadian regions. Sci. Total Environ. 508, 514–524. https://doi.org/10.1016/j.scitotenv.2014.11.069.
Shrestha, S., Kazama, F., 2007. Assessment of surface water quality using multivariate statistical techniques: a case study of the Fuji river basin, Japan. Environ. Model. Software 22, 464–475. https://doi.org/10.1016/j.envsoft.2006.02.001.
Shanks, C.M., Sérodes, J.B., Rodriguez, M.J., 2013. Spatio-temporal variability of non-regulated disinfection by-products within a drinking water distribution network. Water Res. 47 (9), 3231–3243.
Simeonov, V., Stratis, J.A., Samara, C., Zachariadis, G., Voutsa, D., Anthemidis, A., Sofoniou, M., Kouimtzis, T., 2003. Assessment of the surface water quality in Northern Greece. Water Res. 37, 4119–4124. https://doi.org/10.1016/S0043-1354(03)00398-1.
Singh, K.P., Malik, A., Sinha, S., 2005. Water quality assessment and apportionment of pollution sources of Gomti river (India) using multivariate statistical techniques - a case study. Anal. Chim. Acta 538, 355–374. https://doi.org/10.1016/j.aca.2005.02.006.
Statistics Canada, 2006. 2006 Census of Population. May 16, 2006.
Toledano, M.B., Nieuwenhuijsen, M.J., Best, N., Whitaker, H., Hambly, P., de Hoogh, C., Fawell, J., Jarup, L., Elliott, P., 2004. Relation of trihalomethane concentrations in public water supplies to stillbirth and birth weight in three water regions in England. Environ. Health Perspect. 113, 225–232. https://doi.org/10.1289/ehp.7111.
Tsakovski, S., Kudlak, B., Simeonov, V., Wolska, L., Namiesnik, J., 2009. Ecotoxicity and chemical sediment data classification by the use of self-organising maps. Anal. Chim. Acta 631 (2), 142–152.
Tung, H.H., Xie, Y.F., 2009. Association between haloacetic acid degradation and heterotrophic bacteria in water distribution systems. Water Res. 43 (4), 971–978.
Vega, M., Pardo, R., Barrado, E., Debán, L., 1998. Assessment of seasonal and polluting effects on the quality of river water by exploratory data analysis. Water Res. 32, 3581–3592.
Villanueva, C.M., Fernández, F., Malats, N., Grimalt, J.O., Kogevinas, M., 2003. Meta-analysis of studies on individual consumption of chlorinated drinking water and bladder cancer. J. Epidemiol. Community Health 57, 166–173.
Wang, Y., Wang, P., Bai, Y., Tian, Z., Li, J., Shao, X., Mustavich, L.F., Li, B.L., 2013. Assessment of surface water quality via multivariate statistical techniques: a case study of the Songhua River Harbin region, China. J. Hydro-Environ. Res. 7, 30–40. https://doi.org/10.1016/j.jher.2012.10.003.
Wei, J., Ye, B., Wang, W., Yang, L., Tao, J., Hang, Z., 2010. Spatial and temporal evaluations of disinfection by-products in drinking water distribution systems in Beijing, China. Sci. Total Environ. 408, 4600–4606. https://doi.org/10.1016/j.scitotenv.2010.06.053.
Williams, D.T., LeBel, G.L., Benoit, F.M., 1997. Disinfection by-products in Canadian drinking water. Chemosphere 34 (2), 299–316. https://doi.org/10.1016/S0045-6535(96)00378-5.
Woo, H., Kim, J.H., 2003. Optimal location of water quality monitoring sites in water distribution systems. In: Advances in Water Supply Management: Proceedings of the International Conference on Computing and Control for the Water Industry, 15–17 September 2003. Taylor & Francis, London, UK, p. 471.
Wunderlin, D.A., Diaz, M.P., Ame, M.V., Pesce, S.F., Hued, A.C., Bistoni, M., 2001. Pattern recognition techniques for the evaluation of spatial and temporal variation in water quality. A case study: Suquía river basin (Córdoba, Argentina). Water Res. 35 (12), 2881–2894.