Science of the Total Environment 593–594 (2017) 75–90

Contents lists available at ScienceDirect

Science of the Total Environment journal homepage: www.elsevier.com/locate/scitotenv

Mapping vulnerability of multiple aquifers using multiple models and fuzzy logic to objectively derive model structures

Ata Allah Nadiri a,⁎, Zahra Sedghi a, Rahman Khatibi b, Maryam Gharekhani a

a Department of Earth Sciences, Faculty of Natural Sciences, University of Tabriz, 29 Bahman Boulevard, Tabriz, East Azerbaijan, Iran
b GTEV-ReX Limited, Swindon, UK

HIGHLIGHTS

• DRASTIC vulnerability indices (VI) are improved to protect multiple-aquifer systems.
• The VI values of unconfined/confined aquifers carry a high degree of inherent expert judgment.
• Correlations between basic VI values and measured nitrate-N values are poor.
• SFL/MFL/LFL results correlate acceptably with the distributed nitrate-N values.
• Multiple models of SCFL give more defensible results and can serve as proactive tools.

ARTICLE INFO

Article history:
Received 28 December 2016
Received in revised form 5 March 2017
Accepted 11 March 2017
Available online xxxx

Editor: Simon Pollard

Keywords:
Fuzzy logic (SFL, MFL, LFL)
Groundwater vulnerability indices
Multiple aquifers (confined and unconfined)
Multiple models
SCFL model

ABSTRACT

Driven by contamination risks, this paper investigates the mapping of Vulnerability Indices (VI) of multiple aquifers (both unconfined and confined) by integrating the basic DRASTIC framework with multiple models overarched by Artificial Neural Networks (ANN). The DRASTIC framework is a proactive tool for assessing VI values using data from the hydrosphere, lithosphere and anthroposphere. However, a research case arises for the application of multiple models on the grounds of poor coefficients of determination between the VI values and non-point anthropogenic contaminants. The paper formulates SCFL models, which integrate the multiple-model philosophy of Supervised Committee (SC) machines with Fuzzy Logic (FL), hence the name SCFL. The Fuzzy Logic-based (FL) models include Sugeno Fuzzy Logic (SFL), Mamdani Fuzzy Logic (MFL) and Larsen Fuzzy Logic (LFL) models. The basic DRASTIC framework uses prescribed rating and weighting values based on expert judgment, whereas the four FL-based models (SFL, MFL, LFL and SCFL) derive their values through the internal strategies of these models. The paper reports that FL and multiple models considerably improve the correlation between the modeled vulnerability indices and observed nitrate-N values, and thereby provide evidence that the SCFL multiple models can be an alternative to the basic framework, even for multiple aquifers. The study area, with multiple aquifers, is Varzeqan plain, East Azerbaijan, northwest Iran. © 2017 Elsevier B.V. All rights reserved.

⁎ Corresponding author. E-mail addresses: [email protected] (A.A. Nadiri), [email protected] (M. Gharekhani).

http://dx.doi.org/10.1016/j.scitotenv.2017.03.109 0048-9697/© 2017 Elsevier B.V. All rights reserved.


1. Introduction

Tools are required for defensible decisions on the proactive management of complex aquifer systems, in which anthropogenic contaminant risks directly affect the environment and human health. Complex aquifers are the subject of this paper; they cover the physical variety of groundwater aquifer types sometimes found in a single study area. The assessment of the DRASTIC Vulnerability Indices (VI) of complex aquifers requires appropriate care, as discussed in the paper. Aquifer types depend on their source (alluvial, glacial drift or rock fissures) but, more so, on the hydraulics of the groundwater level, in the form of unconfined and confined aquifers. Multiple aquifers, configured as a number of unconfined and confined aquifers in one basin, are already known to pose problems for vulnerability index assessment using the DRASTIC framework as given by Aller et al. (1987). This paper investigates a data-driven model for the DRASTIC framework, in which the vulnerability indices of multiple aquifers are identified through Artificial Intelligence (AI) techniques. The DRASTIC framework generally comprises seven hydrogeological parameters covering the hydrosphere and lithosphere: Depth to water table (D), net Recharge (R), Aquifer media (A), Soil media (S), Topography or slope (T), Impact of the vadose zone (I), and hydraulic Conductivity (C). Local variations are accounted for by assigning ratings to each parameter, and the relative importance of each parameter is accounted for by assigning weights. Recently, the framework has been successfully applied to unconfined aquifers (Babiker et al., 2005; Huan et al., 2012; Ouedraogo et al., 2016; Shrestha et al., 2016; Sadeghfam et al., 2016; Baghapour et al., 2016; Jafari and Nikoo, 2016). The procedure of groundwater vulnerability assessment in multiple aquifers is the same for both unconfined and confined aquifers except for Depth (D), Aquifer (A) and Impact (I).
This framework is a top-down prescriptive approach, but despite its popularity the DRASTIC framework is susceptible to: (i) the need for expert judgment in assigning weights and ratings to each parameter, which exposes the output vulnerability maps to uncertainties; and (ii) methodological problems in assessing the vulnerability of two adjacent unconfined and confined aquifers in the same study area. The framework is also consensual; as such, there are no right or wrong VI values and they cannot be measured.

Of the available AI techniques, this paper uses two: Fuzzy Logic (FL), to treat subjectivity in DRASTIC indices, and Artificial Neural Networks (ANN), for a specific purpose discussed later. Applications of FL techniques to the DRASTIC framework are categorized in Table 1, with a focus on the treatment of the ratings and weightings required by the basic framework. This paper builds on them and identifies the values of the ratings and weightings for the parameters using a Supervised Committee Machine with Artificial Intelligence (SCMAI) for both unconfined and confined aquifers, except for the D, A and I parameters, as discussed in due course. The Committee Machine with Artificial Intelligence (CMAI) models, introduced by Nadiri et al. (2013), may be implemented as a linear (CMAI) or nonlinear (SCMAI) method. If DRASTIC vulnerability indices are to serve as defensible tools, the following gaps need to be treated: (i) hydrogeological parameters are inherently uncertain and imprecise; and (ii) vulnerability estimates for both unconfined and confined aquifers rely on prescriptive values based on expert judgment. This research attempts to fill these gaps by applying FL to the input and output data to cater for their inherent uncertainty and imprecision, and by employing: (i) three FL models, namely Sugeno Fuzzy Logic (SFL), Mamdani Fuzzy Logic (MFL) and Larsen Fuzzy Logic (LFL), which normally provide similar, acceptable accuracy but have different strengths and weaknesses; and (ii) a nonlinear Supervised Committee Fuzzy Logic (SCFL) model to exploit the synergy inherent in these FL models. The use of ANN in this study is confined to identifying and seeking synergies among the constituent FL models: the ANN receives the outputs of the three individual FL models as its inputs, derives new predictions as its final output, and conditions these outputs on the measured nitrate-N values.
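The committee idea described above (formalized in the paper as Eqs. (5)–(7), with a tangent-sigmoid hidden layer and a linear output) can be illustrated by a minimal forward pass. This is a sketch under stated assumptions: the function name `scfl_forward`, the layer size and all weights are hypothetical and untrained, and the Levenberg–Marquardt training step is not shown (one possible route would be `scipy.optimize.least_squares(..., method="lm")`, which is not necessarily what the authors used).

```python
import numpy as np


def scfl_forward(cvi_fl, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a one-hidden-layer ANN combining three FL outputs.

    cvi_fl   : array (n_cells, 3), columns = SFL, MFL and LFL vulnerability indices
    w_hidden : array (n_hidden, 3), input-to-hidden weights (W_ji)
    b_hidden : array (n_hidden,), hidden-layer biases (b_j)
    w_out    : array (n_hidden,), hidden-to-output weights (W_kj)
    b_out    : float, output bias (b_k)
    """
    cvi_fl = np.asarray(cvi_fl, dtype=float)
    # Hidden layer: tanh is the hyperbolic tangent sigmoid ("Tansig").
    hidden = np.tanh(cvi_fl @ np.asarray(w_hidden, dtype=float).T + b_hidden)
    # Output layer: linear ("Purelin") activation.
    return hidden @ np.asarray(w_out, dtype=float) + b_out


# Hypothetical example: 2 grid cells, 4 hidden nodes, untrained random weights.
rng = np.random.default_rng(0)
cvi = np.array([[0.42, 0.40, 0.45], [0.71, 0.68, 0.74]])
pred = scfl_forward(cvi, rng.normal(size=(4, 3)), np.zeros(4), rng.normal(size=4), 0.0)
print(pred.shape)  # (2,)
```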
Each individual FL model has its own way of handling uncertain parameters in the DRASTIC framework. The program of research currently undertaken by the authors provides evidence for the proof-of-concept of applying AI through Supervised Committee techniques to DRASTIC-based vulnerability indices. Proof-of-concept is Technological Readiness Level 4 (akin to the NASA classification; see https://www.nasa.gov/sites/default/files/trl.png) but is insufficient to ensure the delivery of working tools. This paper takes a step further by applying SCMAI (here, SCFL) to DRASTIC vulnerability indices towards delivering working tools. The challenge of taking SCFL from proof-of-concept towards working tools is pursued through the test case of investigating the vulnerability of multiple aquifers in Varzeqan plain (approx. 500 km²), northwest of Tabriz, East Azerbaijan, Iran. The plain is located near the Sungun Copper Mine, whose tailing dam lies within the basin of the study area; at the same time, groundwater under the basin is the main source of water but is exposed to strong impacts from anthropogenic activities (i.e. drinking water supply, industry, agriculture, livestock and mining). The possibility of groundwater contamination in Varzeqan plain is high because of the ongoing extensive mining activities. Measurements of nitrate-N in groundwater show hotspot zones where concentrations are five times the maximum allowed by the World Health Organization (WHO) standards, together with elevated levels of arsenic. Another particular feature of the basin is that it comprises multiple aquifers (unconfined and confined aquifers).

Table 1
Past applications of FL and ANN types of AI to the DRASTIC framework.

Columns: ID | Model references | AHP | FAHP | FLT | Individual AI modeling (ANN, SFL, MFL, LFL) | SCMAI (ANN, NF, FL; MFL by ANN; SCFL: SFL, MFL, LFL by ANN) | Aquifer type (unconfined aquifer, confined aquifer, multiple aquifer)

1 | Sener and Davraz (2015)
2 | Şener and Şener (2015)
3 | Dixon (2005)
4 | Mohammadi et al. (2009)
5 | Rezaei et al. (2013)
6 | Fijani et al. (2013)
7 | Nadiri et al. (2017a)
8 | Present study

Category 1: AHP modifies the weights of the DRASTIC parameters by a scheme expressing their relative importance, e.g. the reference in row 1; none of the data is fuzzified. FAHP is as AHP, but the DRASTIC data layers are fuzzified (reference in row 2). Category 2: FLT uses GIS software to process the input data and fuzzify overlays of the DRASTIC layers; the output data are defuzzified vulnerability indices, modeled through unsupervised FL modeling, e.g. the references in rows 3 and 4. Note 1: These models involve no optimization or rule definition but assign weights as per Aller et al. (1987). Note 2: These applications used prescribed rules; where they involved rule definition by data clustering to improve output results, they probably used manual processing susceptible to expert opinion, subjectivity and uncertainty. Category 3: FL models applied to the DRASTIC framework, e.g. the references in row 5 (using SFL) and row 6 (using NF). Category 4: SCMAI models implemented by combining multiple AI models, where the combination may be linear, e.g. the reference in row 6, or nonlinear, e.g. the reference in row 7; the SCFL model uses multiple FL models in multiple aquifers (reference in row 8). Note: The term 'Machine' is another word for artificial; 'Committee' refers to drawing information from multiple models; and 'Supervised' refers to using supervised learning techniques in AI modeling. Abbreviations: AHP: Analytic Hierarchy Process; ANN: Artificial Neural Network; FAHP: Fuzzy Analytic Hierarchy Process; NF: Neuro-Fuzzy; LFL: Larsen Fuzzy Logic; MFL: Mamdani Fuzzy Logic; SFL: Sugeno Fuzzy Logic; SCMAI: Supervised Committee Machine with Artificial Intelligence; SCFL: Supervised Committee with Fuzzy Logic.

2. Approach and methodology

A confined aquifer occurs in two ways: (i) it is situated beneath an unconfined aquifer, with the stratification comprising the aquiclude at bedrock level, confined aquifers, an aquiclude (confining layer), the unconfined aquifer (saturated), vadose soil media (unsaturated) and superficial deposits (top soil); and (ii) as above, but with the unconfined aquifer, vadose zone and superficial deposits absent, so that the top aquiclude (confining layer) is exposed at the surface. Though the second possibility is rare, it is the case in Varzeqan plain. The particular focus of the paper is on the unconfined aquifer and the above Type ii confined aquifers.

2.1. Basic DRASTIC framework

The potential for groundwater pollution in the multiple aquifers is assessed using the DRASTIC framework in its basic form given by the US EPA (Aller et al., 1987) for both unconfined and confined aquifers, by considering the set of seven parameters outlined in Table 2 and Fig. 1. Each parameter layer is processed using raw data from different sources (Fig. 1: Column 1) and is assigned ratings and weights according to DRASTIC standards (Fig. 1: Column 2, Box 1). The mathematical expression for the DRASTIC framework is as follows:

$$\text{DRASTIC Vulnerability Index} = D_r D_w + R_r R_w + A_r A_w + S_r S_w + T_r T_w + I_r I_w + C_r C_w \qquad (1)$$

where capital letters are the DRASTIC parameters and the subscripts "r" and "w" refer to the ratings and weights, respectively. Notably, Eq. (1) is also referred to as 'weighted overlay analysis', creating a hierarchy of data: raw data, rated data, and rated and weighted data.
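Eq. (1) for a single grid cell can be sketched as below. The weights are the generic DRASTIC weights as commonly tabulated from Aller et al. (1987) (D = 5, R = 4, A = 3, S = 2, T = 1, I = 5, C = 3); they are assumed here because this section does not restate the study's own values, and the cell ratings are hypothetical. With these weights the index can range from 23 (all ratings 1) to 230 (all ratings 10).

```python
# Generic DRASTIC weights (1-5) as commonly tabulated from Aller et al. (1987);
# assumed here, since the study's own weights are not restated in this section.
DRASTIC_WEIGHTS = {"D": 5, "R": 4, "A": 3, "S": 2, "T": 1, "I": 5, "C": 3}


def drastic_index(ratings, weights=DRASTIC_WEIGHTS):
    """Eq. (1): weighted overlay, i.e. the sum of rating x weight over the seven parameters."""
    missing = sorted(set(weights) - set(ratings))
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    for name in weights:
        if not 1 <= ratings[name] <= 10:
            raise ValueError(f"rating for {name} must lie in 1..10, got {ratings[name]}")
    return sum(ratings[name] * w for name, w in weights.items())


# Hypothetical ratings for one 50 m x 50 m grid cell.
cell = {"D": 9, "R": 6, "A": 8, "S": 6, "T": 10, "I": 8, "C": 4}
print(drastic_index(cell))  # 167
```

In a GIS the same sum is evaluated cell by cell over the rated raster layers; the dictionary form simply makes the rating-times-weight structure explicit.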

Table 2
Outline of the DRASTIC parameters (Parameter | Aquifer type | Definitions and descriptions).

Depth (D)
Unconfined:
• Vulnerability is affected by the depth of the unsaturated layer, measured from the surface to the water table.
• The unsaturated layer restricts the migration of contaminants and the rate of water movement, and hence provides additional time for contaminant attenuation.
Confined:
• Depth covers the ground surface to the base of the confining layer.

Recharge (R)
Both:
• Vulnerability is affected by the total quantity of water (recharge, in cm) infiltrating from the ground surface to the aquifer on an annual basis.
• Recharge transports contaminants vertically to the water table and horizontally within the aquifer.
• Main sources of recharge: rainfall, infiltration from rivers and recycled agricultural water.

Aquifer media (A)
Unconfined:
• Vulnerability is affected by the subsurface water-yielding storage of soil or fissured media below the water table or under piezometric pressure.
• Saturated media contain water within consolidated/unconsolidated strata, pore spaces and fractures, irrespective of their lithology or facies.
• Aquifer media bear directly on vulnerability, as they affect the flow within the aquifer and control contaminant contact within the aquifer.
Confined:
• Vulnerability is affected by the confining boundary, the aquifer being defined in terms of the thickness of the confining layers, in two ways: (i) common-type stratigraphy: aquiclude at bedrock, confined aquifers, aquiclude, unconfined aquifer, vadose soil media and top soil; (ii) less common types (e.g. Varzeqan basin): the unconfined aquifer, vadose zone and superficial deposits are absent and the confining layer is exposed at the surface.

Soil media (S)
Both:
• Vulnerability is affected by the media between the ground surface and the unsaturated zone, often referred to as top soil in soil science.
• It affects vulnerability because, when recharge water infiltrates into the aquifer, contaminants move vertically through the vadose zone.
• Low vulnerability stems from fine-textured materials (e.g. silts and clays) with low permeability, which restrict contaminant migration.
• High vulnerability stems from coarse soil media, whose permeability is high compared with that of fine soil media.

Topography (T)
Both:
• Vulnerability is affected by the topographic slope.
• High topography (steep slope) encourages high flow rates at the surface, biodegradation and attenuation, and hence lower vulnerability.
• Low topography (gentle slope) encourages the retention of water for longer periods and allows greater infiltration, and hence higher vulnerability.

Impact of the vadose zone (I)
Unconfined:
• Vulnerability is affected by the vadose zone, the unsaturated soil zone above the water table and below the top soil, and depends on: (i) its type, related to the attenuation characteristics of the material in the zone; and (ii) the control this zone exerts on the path of contaminant particles to the aquifer system.
Confined:
• The vadose zone of a confined aquifer is delineated by the user from the information available, with its impacts reflecting: (i) the ability of the geologic materials to affect a contaminant moving from the appropriate zone above the water surface; (ii) the dependence of the most significant impacts on pollutant migration on the confining layer, with the user abstracting information from different sources to quantify the appropriate impacts; and (iii) the use of the impermeable layer regardless of the composition of the other media within the vadose zone.

Hydraulic conductivity (C)
Both:
• Vulnerability is affected by hydraulic conductivity, which depends on the material and its ability to transmit water under a given hydraulic gradient.
• Hydraulic conductivity depends on voids, fractures and bedding planes: the more extensive these features, the greater the conductivity.
• The higher the conductivity, the greater the potential for pollution.
• Contaminant movement is controlled by the rate of groundwater flow.


Fig. 1. Flowchart for multiple modeling of multiple aquifers. (Piez = Piezometer, Sat. = Saturation, Unsat. = Unsaturation, GL = Groundwater level).

The treatment of unconfined aquifers differs from that of confined aquifers only in three DRASTIC parameters: Depth (D), Aquifer media (A) and Impact of the vadose zone (I). The A, S and I parameters form thematic data, whose variations within a particular medium are discontinuous and described in qualitative terms. The remaining four parameters (D, R, T and C) vary continuously, e.g. the depth parameter. Thus, the D, R, T and C parameters are rated in the same way, as continuous ranges, for both unconfined and confined aquifers.
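Rating a continuously varying parameter by ranges can be sketched for depth (D) as follows. The class bounds and ratings are assumed from the generic depth-to-water table commonly tabulated from Aller et al. (1987) (0–5, 5–15, 15–30, 30–50, 50–75, 75–100 and more than 100 ft, converted to metres); they are illustrative and should be checked against the ranges actually used in the study. For confined aquifers, the depth value itself is measured to the base of the confining layer (Table 2), but the ranging step is the same.

```python
from bisect import bisect_right

# Upper bounds (m) of the depth-to-water classes and their ratings, assumed from the
# generic Aller et al. (1987) table converted from feet. A depth exactly on a
# boundary falls into the deeper class.
DEPTH_BOUNDS_M = [1.5, 4.6, 9.1, 15.2, 22.9, 30.5]
DEPTH_RATINGS = [10, 9, 7, 5, 3, 2, 1]


def rate_depth(depth_m):
    """Map a continuous depth (m) to a DRASTIC rating (1-10) by ranging."""
    if depth_m < 0:
        raise ValueError("depth must be non-negative")
    # bisect_right returns the index of the class whose range contains depth_m.
    return DEPTH_RATINGS[bisect_right(DEPTH_BOUNDS_M, depth_m)]


print([rate_depth(d) for d in (0.5, 3.0, 10.0, 40.0)])  # [10, 9, 5, 1]
```

The same pattern (sorted bounds plus a rating list) applies to R, T and C with their own prescribed ranges; thematic parameters (A, S, I) would instead use a lookup from media class to rating.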

The study area is divided into square grid cells of 50 m × 50 m (each cell is 2500 m² in area); the total numbers of cells in the unconfined and confined aquifers are therefore 60,000 and 22,800, respectively (i.e. approx. 150 km² and 57 km²). Each DRASTIC parameter can vary within and across the grid cells, and these variations are captured by assigning a rating value. For the continuously varying parameters, the parameter variability is divided into ranges and rated as per the prescribed rules given by Aller et al. (1987). The ratings typically range from 1 to 10. The weighting for each parameter is a measure of its relative importance and is also assigned as per the prescribed rules given by Aller et al. (1987), with values ranging from 1 to 5. The procedure is depicted in Fig. 1, as follows: (i) prepare raw data for each of the seven parameters; (ii) assign prescribed ratings between 1 and 10 to each parameter; (iii) assign designated weights, ranging from 1 to 5, to each reclassified parameter; and (iv) evaluate Eq. (1). All DRASTIC parameter layers and the groundwater vulnerability map are produced using a commercial GIS application.

2.2. Fuzzy logic

Fuzzy Logic (FL), introduced by Zadeh (1965), is an outgrowth of classical set theory that (i) abandons the requirement for sharp boundaries of classical sets; and (ii) replaces membership of an object as a matter of either affirmation or denial with membership as a matter of degree (Demico and Klir, 2004). Fuzzy sets allow partial membership ranging between 0 and 1 and are represented by a Membership Function (MF) with ambiguous boundaries and gradual transitions between the defined sets, to account for inherent uncertainties (Grande et al., 2010). They are therefore better suited to imprecise parameters. An FL model consists of three main parts: (i) fuzzification, (ii) an inference engine (fuzzy rule base), and (iii) defuzzification, as described by Nadiri (2015) among others. In the fuzzification step, the seven crisp inputs are transformed into fuzzy sets for constructing the inference engine. The inference engine consists of rules that connect multiple inputs to a single output using: (i) the AND operator, implemented as the minimum; (ii) the prod (product) operator, implemented as multiplication; (iii) the OR operator, implemented as the maximum; and (iv) the NOT operator, implemented as the complement. Through the implication and aggregation processes, these rules assign an entire fuzzy set to the output in order to make a decision. The process of transforming the aggregation result into a crisp output is termed defuzzification, with methods such as
centroid, bisector, middle of maximum (the average of the maximum values of the output set), largest of maximum and smallest of maximum.

2.3. Rule definitions through fuzzy clustering

Constructing fuzzy models includes identifying model structures within given datasets through clustering techniques, which emerged in the 1980s (e.g. Bezdek et al., 1984; Chiu, 1994). These techniques identify the optimum numbers of rules and clusters of data, e.g. Subtractive Clustering (SC) (Chiu, 1994; Chen and Wang, 1999; Li et al., 2001) and Fuzzy C-Means (FCM) clustering (e.g. Bezdek et al., 1984). SC techniques seek a value of the cluster radius, which controls the number of clusters and fuzzy rules (Chen and Wang, 1999). The selection of the cluster radius is a problem of 'diminishing returns', according to which (i) small cluster radii lead to a large number of clusters, and the resulting excessive number of rules undermines the workability of the model; and (ii) large cluster radii lead to few clusters and hence to few but coarse rules, which undermines the adequacy of the model. Optimum cluster radii are identified by varying the cluster radius between 0 and 1 until a performance criterion reaches its optimum (minimum or maximum) value. The FCM technique classifies datasets by partitioning a multidimensional space into a specific number of clusters through: (i) starting from an initial guess that marks cluster centers at the mean location of each cluster; (ii) assigning each cluster a membership grade at every data point and iteratively moving the cluster centers to correct their locations; (iii) minimizing, at each iteration, an objective function based on the distance from each data point to each cluster center, weighted by the membership grade of that data point; and (iv) identifying a list of cluster centers and the membership grades of each data point (Nadiri et al., 2014).
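The FCM steps (i)–(iv) above can be sketched in NumPy as follows. This is a minimal, generic Bezdek-style implementation with assumed defaults (fuzzifier m = 2, tolerance, fixed random seed); it is not the toolbox routine the authors used, and the small two-group data set is hypothetical. The number of clusters `c` is supplied by the user.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, tol=1e-6, max_iter=300, seed=0):
    """Minimal Fuzzy C-Means: X has shape (n_points, n_features); c is user-defined.

    Returns cluster centres (c, n_features) and memberships U (c, n_points),
    where each column of U sums to 1.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step (i): initial guess of random membership grades, normalised per data point.
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Cluster centres as membership-weighted means of the data.
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Step (iii): distance of every point to every centre, shape (c, n_points).
        dist = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        dist = np.fmax(dist, np.finfo(float).eps)  # guard against division by zero
        # Step (ii): membership update that minimises J = sum_ik u_ik^m * dist_ik^2.
        ratio = (dist[:, None, :] / dist[None, :, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    # Step (iv): centres and membership grades of every data point.
    return centers, U


# Hypothetical data: two well-separated groups of points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, U = fuzzy_c_means(X, c=2)
```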
In the FCM algorithm, the number of clusters is defined by the user. The optimum number of clusters is chosen by measuring the performance of the model while systematically changing the number of clusters from 1 to the number of model data points (Tayfur et al., 2014, among others).

2.4. Fuzzy models

In the Sugeno FL (SFL) method, the output membership functions are constant or linear, giving the zero-order or first-order Sugeno fuzzy model, respectively (Sugeno, 1985); in the Mamdani FL (MFL) method, by contrast, the output membership functions are fuzzy sets, and therefore the output variables need defuzzification (Mamdani, 1976). The SC method is used to construct the SFL model; data clustering has been shown to be efficient and effective in determining the number of membership functions and rules (Nadiri et al., 2014; Tayfur et al., 2014). For groundwater vulnerability predictions by the first-order SFL model, a fuzzy if-then rule i is expressed in this study as:

$$\text{Rule } i:\ \text{If } D \in MF_D^i \text{ and } R \in MF_R^i \text{ and } A \in MF_A^i \text{ and } S \in MF_S^i \text{ and } T \in MF_T^i \text{ and } I \in MF_I^i \text{ and } C \in MF_C^i,\ \text{then } CVI_i = m_i D + n_i R + p_i A + q_i S + d_i T + k_i I + f_i C + c_i \qquad (2)$$

where $m_i$, $n_i$, $p_i$, $q_i$, $d_i$, $k_i$, $f_i$ and $c_i$ are the coefficients. The final output ($Out_j$) is the weighted average of all rule outputs (aggregation):

$$Out_j = \frac{\sum_i w_{ij}\, Out_{ij}}{\sum_i w_{ij}} \qquad (3)$$

where $w_{ij}$ is the firing strength of rule $i$ for output $Out_j$, estimated via the "and" (minimum) operator. In contrast to SFL, FCM clustering is the most appropriate clustering method for the MFL and LFL models (Newton et al., 1992; Lee, 2004). For groundwater vulnerability predictions by MFL, a fuzzy if-then rule i is expressed in this study as:

$$\text{Rule } i:\ \text{If } D \in MF_D^i \text{ and } R \in MF_R^i \text{ and } A \in MF_A^i \text{ and } S \in MF_S^i \text{ and } T \in MF_T^i \text{ and } I \in MF_I^i \text{ and } C \in MF_C^i,\ \text{then } CVI \in MF_{CVI}^i \qquad (4)$$

where $CVI$ is the output, $MF_{CVI}^i$ is the corresponding membership function of the output of rule $i$, $MF_D^i$ is the membership function of the $i$th cluster of input D, $MF_R^i$ is the membership function of the $i$th cluster of input R, and so forth. The operator among the input membership functions is the "and" (minimum) operator, and the outputs of the rules are aggregated via the "or" (maximum) operator. The most popular defuzzification method, the centroid calculation, was applied to produce the crisp output. The LFL method (Larsen, 1980) is similar to the MFL method, the main difference being the use of the product operator for fuzzy implication, which scales the output fuzzy set rather than clipping it as the minimum operator does in MFL. In contrast to the basic DRASTIC framework, which requires the D, A and I parameters to be rated differently for unconfined and confined aquifers, the three FL models are treated as gray-box models that process groundwater vulnerability indices through optimization and the FL concept, without assigning prescriptive ratings and weights to each parameter. Thus, the DRASTIC data layers are used to derive the ratings and weights together for both aquifers.

2.5. Supervised Committee Fuzzy Logic (SCFL) model

This research uses an ANN as a nonlinear combiner in constructing the Supervised Committee Fuzzy Logic (SCFL) model, which reaps the synergy between similarly performing models. The mathematical proof of this synergy is given in Appendix I; the mathematical basis for building the SCFL models in this study is given below. In this study, the nonlinear combiner method is called SCFL (Fig. 2); it employs an ANN as a supervising, or overarching, combiner of the FL models. The supervision operates over three FL models (see Fig. 2) and includes two major steps: Step 1 estimates CVI using the SFL, MFL and LFL models; Step 2 constructs a supervised ANN as a nonlinear combiner. The mathematical expression of the SCFL model is:

$$\widehat{CVI}_i = FL_i(D, R, A, S, T, I, C) \qquad (5)$$

$$O_j = f_1\left(b_j + \sum_i W_{ji}\, \widehat{CVI}_i\right) \qquad (6)$$

$$\widehat{CVI}_{SCFL} = f_2\left(b_k + \sum_j W_{kj}\, O_j\right) \qquad (7)$$

where $\widehat{CVI}_i$ is the output of the $i$th FL model, used as the $i$th input; $f_1$ and $f_2$ are the activation functions of the hidden layer and output layer, respectively; $O_j$ is the output of the $j$th node in the hidden layer; $W_{ji}$ and $W_{kj}$ are weights that control the strength of the connections between two layers; and the biases $b_j$ and $b_k$ adjust the mean value of the hidden layer and output layer, respectively. $\widehat{CVI}_{SCFL}$ is the final output of the ANN model. The transfer functions of the SCFL model in the hidden-layer nodes and the output-layer node are the hyperbolic tangent sigmoid (Tansig) and linear (Purelin) functions, respectively. In the ANN training phase, the Levenberg–Marquardt (LM) algorithm was adopted as the learning algorithm to estimate the weights and biases (Nourani et al., 2008; Asadi et al., 2014; Chitsazan et al., 2015).

Fig. 2. A schematic of SCFL model structure.

3. Study area: Varzeqan plain with multiple aquifers

3.1. Location, physical and other technical details

3.1.1. Location

The Varzeqan sub-basin, 1000 km² in area, is located approx. 70 km northwest of Tabriz, East Azerbaijan, Iran (Fig. 3); the Varzeqan alluvial plain occupies 500 km² of it. The prevailing climate in the study area is cold and semi-arid (according to the classifications of de Martonne, 1925 and Emberger, 1930). The average annual precipitation and mean annual temperature are 358 mm and 13 °C, respectively (Varzeqan hydro-climate station, 1999–2011). Ahar Chay (River) roughly bisects the basin along its west–east axis and provides natural recharge to the plain. The highest elevation of the Varzeqan basin is 2815 m Above Mean Sea Level (AMSL) and the lowest is 1363 m AMSL.

3.1.2. Hydrology

Varzeqan plain, west of the city of Ahar, is drained by Ahar Chay, which collects a significant number of tributaries before it joins the Qarasu (blackwater) approximately 50 km east of Ahar. The Qarasu is a significant tributary of the River Araz, which flows to the Caspian Sea in the Republic of Azerbaijan. The aquifer under Varzeqan plain is recharged (i) by seasonal rivers flowing from both the northern mountain ranges, known as Qara Dag, and the southern ranges, known as Ala Dag; and (ii) through the Pliocene conglomerate covering the plain as the bedrock of the aquifers. Ahar Chay flows from west to east and is impounded by the Sattarkhan hydroelectric dam just outside Varzeqan plain to the east. There is also a tailing dam to the north of Varzeqan plain impounding the wastewater from the Sungun Copper Mine. The groundwater flow direction is thought to coincide with the surface water flow direction, from west to east and towards the Caspian Sea.

Fig. 3. Location map of study area.

Fig. 4. Geological map and location of piezometers (modified from Mehrpartou et al., 1992).

3.1.3. Geology

Based on the work of Mehrpartou et al. (1992), both the confined and unconfined aquifers in the study area are composed of Quaternary formations of both High Level Piedmont Fans and Low Level Piedmont Fans closer to Ahar Chay and its tributaries. The Quaternary alluvium covers the Pliocene Formations, which serve as a basement and are overlain by conglomerate at both their northern and southern boundaries. These Quaternary alluvia by and large consist of alluvial plain and valley terrace deposits composed of silt, sand and clay. The sheet of older alluvium is mostly covered by clay and fine sediments, which may partly explain the existence of the confined aquifers, whereas most of the alluvium covered by coarse grains corresponds to the unconfined aquifer, which, being younger, surrounds the confined aquifers. The central Quaternary and Pliocene Formations are flanked by older Oligocene Formations (granite to diorite) to the north and Oligocene Formations (rhyolitic to rhyodacitic volcanobreccia) to the south. The oldest rock in the region is the Kaleybar Metamorphic Rock Formation, which is older than the Jurassic period (Fig. 4). The northwest of Varzeqan consists of extensive volcanic rocks composed of andesite and trachyandesite. During the Upper Cretaceous, tectonic activity and submarine volcanoes caused considerable morphological change in different parts of the study area, resulting in sedimentary layers intercalated with volcanic layers. The remaining areas are largely Eocene deposits composed of volcanic units and volcanogenic rocks with partial layers of carbonate sandstone. Geological surveys indicate that the development of joint and fissure systems in igneous rocks and limestone is important for the recharge of groundwater resources in Varzeqan plain.
Most faults in the study area run along the southeast–northwest axis.

3.1.4. Hydrogeology

The alluvial aquifers of the study area result from the erosion of the surrounding formations. In the flatter parts of the region, the alluvial layers are not thick. The stratigraphic sections of the aquifer, obtained from 27 exploratory wells, indicate that the alluvial deposits are coarse (e.g. sand and gravel) in the northern part. The exploratory-well and piezometer logs indicate two types of aquifers: (i) the main unconfined aquifer, and (ii) two confined aquifers broadly surrounded by the unconfined aquifer (Fig. 4). The study area therefore has multiple aquifers. Based on sediment size, the Varzeqan multiple aquifers can be divided into three groups: (i) the eastern part of the aquifer is composed of coarse-grained sediments with high hydraulic conductivity (approx. 90 m/day); (ii) the central part is composed of sand, gravel and silt with moderate hydraulic conductivity (approx. 25 m/day); and (iii) this gradually changes to silt, clay and some sand in the west, leading to lower hydraulic conductivity (approx. 8 m/day). The springs in the northern parts of the study area mostly issue from non-alluvial formations and are encouraged by many local faults. The unconfined aquifer covers over 150 km² and has mostly formed from lake sediments, such as clay, silt, sand and gravel. The material of the unconfined aquifer is fine-grained in the center and becomes coarse in the east of the plain. The two confined aquifers, estimated to cover approx. 57 km², have given rise to artesian wells (Fig. 3). Formations with significant permeability consist of alluvial sediments, fluvial deposits, debris sediments and elevated terraces of Quaternary age. Notably, not all of the 500 km² area of the basin is regarded as aquifer (unconfined or confined).
The 21 piezometers in both aquifers confirm that the overall groundwater flow direction is from west to east. The groundwater resources in the study area comprise 252 withdrawal wells, 150 springs and 20 qanats. The springs in the northern parts of the study area mostly issue from non-alluvial formations owing to substantial local faulting.

3.1.5. Land use

The Varzeqan plain remains a rural hinterland in East Azerbaijan, and its rural settlements (with a basin-wide population of just under 50,000) are yet to receive any significant developmental attention. The only large complex in the region is the Sungun Copper Mine; there are a few other, smaller mines, but the predominant activities in the basin are agriculture and animal husbandry. Recent practices in agriculture, animal husbandry and mining are yet to enjoy institutional support to ensure sustainable development and the sustainability of the natural environment.

3.1.6. Pollution baseline

No measured information on pollution is available to the authors other than the field measurements commissioned in October 2015, which serve as the baseline for this research. They comprise 53 samples collected from springs, qanats and water wells distributed over the study area. The groundwater samples were analyzed by standard methods (APHA, 1998) in the hydrogeological laboratory at the University of Tabriz for parameters including pH and electrical conductivity (EC); major elements such as bicarbonate (HCO₃⁻), chloride (Cl⁻), sulfate (SO₄²⁻), calcium (Ca²⁺), magnesium (Mg²⁺), sodium (Na⁺) and potassium (K⁺); minor elements such as nitrate (NO₃⁻), phosphate (PO₄³⁻) and fluoride (F⁻); and trace elements such as arsenic (As) and copper (Cu). The accuracy of the analyses was checked by the charge balance error method (Hounslow, 1995). The charge balance errors for all samples were less than 5%, so the analytical results are considered reliable. An initial desktop study of the results shows that the measured nitrate-N, fluoride and arsenic levels exceed the limits set by the WHO (World Health Organization) in some parts of the study area.
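The charge balance check cited above (Hounslow, 1995) converts each major ion from mg/L to meq/L and compares total cations with total anions. The following is a minimal sketch, assuming the commonly used definition CBE (%) = 100 × (Σcations − Σanions) / (Σcations + Σanions); the ion list, the equivalent weights, the function names and the sample concentrations are illustrative assumptions, not the study's data or code.

```python
# Equivalent weight (g per equivalent) = molar mass / |ionic charge|.
EQUIVALENT_WEIGHT = {
    "Ca": 40.078 / 2,
    "Mg": 24.305 / 2,
    "Na": 22.990,
    "K": 39.098,
    "HCO3": 61.017,
    "Cl": 35.453,
    "SO4": 96.06 / 2,
    "NO3": 62.004,
}
CATIONS = ("Ca", "Mg", "Na", "K")
ANIONS = ("HCO3", "Cl", "SO4", "NO3")


def to_meq_per_l(mg_per_l: dict[str, float]) -> dict[str, float]:
    """Convert mg/L to meq/L; raises KeyError for an ion without an equivalent weight."""
    return {ion: c / EQUIVALENT_WEIGHT[ion] for ion, c in mg_per_l.items()}


def charge_balance_error(mg_per_l: dict[str, float]) -> float:
    """Percentage charge balance error; ions absent from the sample count as zero."""
    meq = to_meq_per_l(mg_per_l)
    cations = sum(meq.get(ion, 0.0) for ion in CATIONS)
    anions = sum(meq.get(ion, 0.0) for ion in ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)


def is_acceptable(mg_per_l: dict[str, float], limit_percent: float = 5.0) -> bool:
    """Apply the |CBE| < 5% acceptance threshold reported for the samples."""
    return abs(charge_balance_error(mg_per_l)) < limit_percent


# Hypothetical sample (mg/L), not one of the 53 field samples.
sample = {"Ca": 60.0, "Mg": 20.0, "Na": 30.0, "K": 2.0,
          "HCO3": 250.0, "Cl": 35.0, "SO4": 40.0, "NO3": 15.0}
print(f"CBE = {charge_balance_error(sample):.2f}%, acceptable: {is_acceptable(sample)}")
```

Minor ions such as fluoride or phosphate could be added to the tables in the same way if they are significant in a given sample.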
In this paper, the focus is on nitrate in the groundwater aquifers; other pollutants at excessive levels will be discussed in separate papers.

3.2. Field data and their preparation

3.2.1. Depth (unconfined aquifers)

The depth to the water table, obtained from observation wells, ranges from 1.6 to 15.75 m in the study area.

3.2.2. Depth (confined aquifers)

The depth to the impermeable layer ranges from 2 m to 30 m in the study area. The Inverse Distance Weighting (IDW) interpolation technique is used to distribute groundwater depth over all the grid cells (Fig. 5a).

3.2.3. Recharge

Net recharge for the unconfined and confined aquifers is estimated using the water-table fluctuation method (see Aller et al., 1987), with an allowance made for the volume of pumped water as per Scanlon et al. (2002); see Fig. 5b.

3.2.4. Aquifer media (unconfined aquifer)

An aquifer media map was prepared using the available geological information. The main components of the aquifer media are gravel, sand, silt and clay. Aquifer media ratings are assigned as per standard DRASTIC, varying between 1 for clay and 10 for gravel (Fig. 5c).

3.2.5. Aquifer media (confined aquifer)

As above, but infiltration depends on the degree of leakage through the aquicludes. Notably, beneath the confining layer, the confined aquifer comprises a combination of clay, silt, sand and some gravel, corresponding to assigned ratings of 3 to 8.

A.A. Nadiri et al. / Science of the Total Environment 593–594 (2017) 75–90

Fig. 5. The DRASTIC parameters: (a) depth to water, (b) net recharge, (c) aquifer media, (d) soil media, (e) topography, (f) impact of the vadose zone and (g) hydraulic conductivity.
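The IDW interpolation used for groundwater depth (Section 3.2.2) and for the nitrate-N data layer (Section 3.3) estimates each grid-cell value as a distance-weighted average of the observed values. The following is a minimal sketch, assuming Euclidean distances and the common power parameter p = 2; the paper does not report its IDW settings, so the power, the function name and the coordinates are illustrative.

```python
import numpy as np


def idw(xy_obs: np.ndarray, z_obs: np.ndarray, xy_grid: np.ndarray,
        power: float = 2.0) -> np.ndarray:
    """Inverse Distance Weighting: z(x) = sum(w_i * z_i) / sum(w_i), w_i = 1 / d_i**power.

    xy_obs: (n, 2) observation coordinates; z_obs: (n,) observed values;
    xy_grid: (m, 2) grid-cell centres.
    """
    # Pairwise Euclidean distances: one row per grid cell, one column per observation.
    d = np.linalg.norm(xy_grid[:, None, :] - xy_obs[None, :, :], axis=2)
    z_grid = np.empty(len(xy_grid))
    for j, dist in enumerate(d):
        hit = np.isclose(dist, 0.0)
        if hit.any():
            # IDW is an exact interpolator: a cell on an observation takes its value.
            z_grid[j] = z_obs[hit][0]
        else:
            w = 1.0 / dist**power
            z_grid[j] = np.sum(w * z_obs) / np.sum(w)
    return z_grid


# Hypothetical depths (m) at three wells, interpolated onto two grid cells.
wells = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])
depth = np.array([2.0, 10.0, 30.0])
cells = np.array([[0.0, 0.0], [50.0, 50.0]])
print(idw(wells, depth, cells))
```

The second cell is equidistant from all three wells, so it receives their plain mean; a larger power would make the surface more local around each well.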

3.2.6. Soil

In the study area, the soil parameter is classified into four classes, and each class is assigned a rating between 1 and 9 on the 1–10 scale (Fig. 5d).

3.2.7. Topography

The topography layer was obtained from ASTER DEM data with 28 m spatial resolution. The slope in the study area varies from 0 to 17%. Slope values are rated on the standardized 1–10 scale, with 1 and 10 being the lowest and highest slope, respectively (Fig. 5e).

3.2.8. Impacts of unsaturated zone (unconfined aquifer)

For the unconfined aquifer, this parameter was processed using 15 geological logs, and the assigned ratings range from 4 to 9. The highest rating was assigned to the gravelly and sandy zone owing to its high permeability and greater sensitivity to groundwater contamination. The map of the impact of the vadose zone is presented in Fig. 5f.

3.2.9. Impacts of unsaturated zone (confined aquifer)

The confining layer is largely composed of silt and clay, which is impermeable or, more usually, of low permeability; notably, in the study area it is composed of silty clay. Therefore, based on the recommendations of Aller et al. (1987), a rating of 1 was assigned to the whole layer without considering the thickness of the confining layer.

3.2.10. Conductivity

The conductivity parameter was estimated using the geological logs of 27 exploratory wells, which showed it to vary between 0.86 m/d and 84 m/d. The ratings were assigned as per the recommended values, and the output is shown in Fig. 5g. The final maps comprise two sets of overlapping maps, each with seven layers and each with its standard ratings and weights. The multiplication and summation operations for each set are carried out using a commercial GIS tool.

3.3. Preliminary data processing

The potential vulnerability index values are calculated for every grid cell using either the framework given by Aller et al. (1987) or the models, through training and testing phases. For model fitting, the dataset is randomly divided into training and testing datasets in the proportions of 80% and 20%, respectively. The vulnerability indices calculated for each grid cell during the training and testing phases were conditioned by the NO3-N data, and those for the testing phase were used to assess the performance of the models. In this research, the following performance measures are used: (i) the Root Mean Square Error (RMSE); and (ii) the determination coefficient (R²). They are computed between the modeled vulnerability indices, conditioned according to the distributed nitrate-N data layer, and the corresponding distributed nitrate-N values at all the grid cells used for training or testing. The best model performance corresponds to the lowest RMSE and an R² closest to 1. This study is presented in terms of NO3-N, and Fig. 6 depicts its spatial distribution in the study area, indicating a number of major hotspots. The distribution is based on Inverse Distance Weighting interpolation. The Conditioning of Vulnerability Indices (CVI) is calculated by Eq. (8):

$$\mathrm{CVI} = \mathrm{VI}_{\max} \times \frac{(\mathrm{NO_3\text{-}N})_i}{(\mathrm{NO_3\text{-}N})_{\max}} \qquad (8)$$

where $\mathrm{VI}_{\max}$ is the maximum vulnerability index calculated from the DRASTIC framework, $(\mathrm{NO_3\text{-}N})_i$ is the nitrate-N concentration at the i-th grid cell and $(\mathrm{NO_3\text{-}N})_{\max}$ is the maximum nitrate-N concentration. The measured nitrate-N values comprise 33 datapoints, referred to as the nitrate-N point data. These are interpolated to derive values at each grid cell, forming a data layer referred to as the nitrate-N data layer. The first use of the nitrate-N data layer is to serve as the basis for conditioning the vulnerability indices (to produce CVI values) obtained from (i) the basic DRASTIC framework; and (ii) the three FL-based models (SFL, MFL and LFL) and the SCFL model. The second use of the observed nitrate-N data is to derive the determination coefficients (R² values) between the conditioned vulnerability indices and the observed nitrate-N datapoints.

4. Results

4.1. Overview of the methodology

The purpose of applying the SFL, MFL and LFL techniques, and thereby the SCFL technique, is: (i) to identify the ratings and weightings of the parameters of the DRASTIC framework so as to obtain more accurate results than the basic DRASTIC framework with its assigned values; and (ii) to test the FL-based models on combined unconfined and confined aquifers. Notably, the basic DRASTIC framework is processed separately for the multiple aquifers (Fig. 1: Boxes 2 and 3) to serve as the baseline for comparison. Multiple aquifers are complex systems that require continuous spatial data, which is not quite possible for the Varzeqan plain aquifer without expert judgment. FL models remove the need for expert judgment by identifying and using the information inherent in the input and output data. The minimum and maximum possible index values at each grid cell are 23 and 230 for the unconfined aquifers and 23 and 180 for the confined aquifers. To the best of the authors' knowledge, there is no recommended guidance on dividing the range into low, medium and high vulnerability indices; in this paper, the authors divide the range into three equal intervals.

4.2. Basic DRASTIC framework (unconfined and confined aquifer)

The methodology for applying the basic DRASTIC framework of Aller et al. (1987) is given in Fig. 1: Boxes 2–3, which shows the procedure for estimating vulnerability indices for the Varzeqan basin comprising unconfined and confined aquifers. The specific information used in the calculations is outlined in Section 3; the index values for each grid cell are calculated by Eq. (1) and conditioned by Eq. (8) using the nitrate-N concentration data. Index values for this integrated model range from 92 to 164 in the unconfined aquifers and from 48 to 93 in the confined aquifers. Evidently, there is a discontinuity in the vulnerability values across the unconfined/confined aquifer zones. Using the vulnerability categorization by Aller et al. (1987), the vulnerability of the Varzeqan plain was assessed as follows: (i) for the unconfined aquifers, low, medium and high vulnerability zones cover 22%, 47% and 31% of the area, respectively; and (ii) for the confined aquifers, low and medium vulnerability zones cover 78% and 22% of the area, respectively. The most vulnerable areas are in the east and center, where the aquifer is unconfined and composed of sand and gravel with high hydraulic conductivity. The confining clay layer in the western part of the plain provides protection and reduces the risk of contamination. The basic DRASTIC framework is evaluated against the NO3-N concentration data to estimate the correlation between its vulnerability indices (Fig. 7a) and the NO3-N levels (Fig. 6). The results in Table 3 show that the determination coefficients are low: (i) 0.5 for the unconfined aquifer zones; and (ii) as low as 0.3 for the confined aquifer zones. This makes the case for seeking improvements and hence for applying FL-based models, as reported below.

Fig. 6. Distribution of NO3-N concentration in the study area and hotspots.

Fig. 7. The vulnerability maps using different methods: (a) DRASTIC framework, (b) SFL, (c) MFL, (d) LFL and (e) SCFL. Note 1: UA stands for Unconfined Aquifer; CA stands for Confined Aquifer. Note 2: Independent color palettes are used for the confined and unconfined aquifers. Note 3: Low vulnerability: 23–91, Moderate vulnerability: 91–158, High vulnerability: 158–226.
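Section 4.2 chains three steps: the DRASTIC index of Eq. (1), the nitrate-N conditioning of Eq. (8) and the performance measures of Section 3.3. The sketch below strings them together under stated assumptions: the standard DRASTIC weights of Aller et al. (1987) (D = 5, R = 4, A = 3, S = 2, T = 1, I = 5, C = 3), which reproduce the 23–230 range quoted in Section 4.1; R² computed as the squared Pearson correlation, one common reading of "determination coefficient" (the paper does not state its definition); and hypothetical ratings, nitrate values and function names.

```python
import numpy as np

# Standard DRASTIC weights (Aller et al., 1987); they sum to 23.
WEIGHTS = {"D": 5, "R": 4, "A": 3, "S": 2, "T": 1, "I": 5, "C": 3}


def drastic_index(ratings: dict[str, np.ndarray]) -> np.ndarray:
    """Eq. (1): weighted sum of the seven rating layers, cell by cell."""
    return sum(WEIGHTS[p] * np.asarray(ratings[p], dtype=float) for p in WEIGHTS)


def condition_by_nitrate(no3n: np.ndarray, vi_max: float) -> np.ndarray:
    """Eq. (8): CVI = VI_max * (NO3-N)_i / (NO3-N)_max."""
    no3n = np.asarray(no3n, dtype=float)
    return vi_max * no3n / no3n.max()


def rmse(obs: np.ndarray, pred: np.ndarray) -> float:
    return float(np.sqrt(np.mean((np.asarray(obs) - np.asarray(pred)) ** 2)))


def r_squared(obs: np.ndarray, pred: np.ndarray) -> float:
    """Squared Pearson correlation coefficient between two series."""
    return float(np.corrcoef(obs, pred)[0, 1] ** 2)


def equal_interval_classes(vi: np.ndarray, vi_min: float = 23.0,
                           vi_max: float = 230.0) -> np.ndarray:
    """Three equal intervals over [vi_min, vi_max]: 0 = low, 1 = moderate, 2 = high."""
    edges = np.linspace(vi_min, vi_max, 4)
    return np.digitize(vi, edges[1:-1])


# Three hypothetical grid cells with ratings on the 1-10 scale.
ratings = {
    "D": np.array([1, 5, 10]), "R": np.array([1, 6, 10]), "A": np.array([1, 6, 10]),
    "S": np.array([1, 4, 10]), "T": np.array([1, 9, 10]), "I": np.array([1, 6, 10]),
    "C": np.array([1, 4, 10]),
}
vi = drastic_index(ratings)
cvi = condition_by_nitrate(np.array([1.0, 5.0, 10.0]), vi_max=float(vi.max()))
print(vi, equal_interval_classes(vi))
print(f"RMSE = {rmse(cvi, vi):.2f}, R2 = {r_squared(cvi, vi):.3f}")
```

Conditioning only rescales nitrate-N onto the index scale, so RMSE is expressed in index units while R² is unaffected by the rescaling.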

4.3. Fuzzy logic (FL)

The results of identifying groundwater vulnerability indices are now reported using three different FL models (SFL, MFL and LFL). As depicted in Fig. 1: Box 4, these models comprise: (i) the input data of D, R, A, S, T, I and C; (ii) output data specified in terms of nitrate-N values using Eq. (8); and (iii) the three different formulations required for each of SFL, MFL and LFL.

4.3.1. SFL

The SFL modeling used: (i) the trapezoidal membership function for the input parameters; (ii) the linear Eq. (2) for the output membership function; and (iii) a cluster radius identified by minimizing RMSE. The number of input and output dataset clusters in SFL was identified to be 7 using the SC method. The optimum cluster radius was found by repeating the clustering process while gradually increasing the radius from 0 to 1 in steps of 0.05. This led to selecting 0.3 as the cluster radius, and the number of corresponding optimum rules was the same as the number of clusters (i.e. 7). Table 3 gives the values of the performance measures for SFL, in which the identified model achieves the lowest RMSE value of 8.4 for the testing dataset. The determination coefficient between the conditioned SFL vulnerability indices and the nitrate-N data layer values is 0.95 for the training phase and 0.9 for the testing phase. This is a considerable improvement over the basic DRASTIC framework. The SFL vulnerability map is given in Fig. 7b.

4.3.2. MFL

The number of input and output dataset clusters in the MFL model was identified to be 30 by minimizing RMSE values, and the optimum solution was found to comprise 50 rules linking input to output through the “and” operator. The outputs of the rules are aggregated via the “or” (maximum) operator. The most popular defuzzification method, centroid calculation, was applied to produce the crisp output. The values of the performance measures for the MFL model are presented in Table 3, in which the identified model achieves the lowest RMSE value of 14.7 for the testing phase. The determination coefficient between the conditioned MFL vulnerability indices and the nitrate-N data layer values is 0.83 for the training phase and 0.61 for the testing phase. This is a considerable improvement over the basic DRASTIC framework. The MFL vulnerability map is given in Fig. 7c.

4.3.3. LFL

The implementation of this model is the same as the MFL model except for the fuzzy implication, for which LFL uses the product operator instead of the “and” (minimum) operator. The cluster radius and number of rules for LFL are identical to those of MFL. Table 3 presents the values of the performance measures for the LFL model, in which the identified model achieves the lowest RMSE value of 14.6 for the testing phase. The determination coefficient between the conditioned LFL vulnerability indices and the nitrate-N data layer values is 0.82 for the training phase and 0.63 for the testing phase. This is also a considerable improvement over the basic DRASTIC framework. The LFL vulnerability map is given in Fig. 7d.

4.4. Supervised Committee Fuzzy Logic (SCFL) model

The SCFL model (Eqs. (5), (6) and (7)), as schematized in Fig. 1 (Box 7), was constructed to determine the overall prediction of groundwater vulnerability by integrating the predictions of SFL, MFL and LFL. An ANN model is used to integrate these models. The ANN employs the MLP-LM structure (a multilayer perceptron trained by the Levenberg–Marquardt algorithm), with three neurons in the input layer, six in the hidden layer and one in the output layer. The transfer function is TANSIG for the hidden layer and PURELIN for the output layer. The LM algorithm was used to optimize the weights and biases of the ANN model, requiring 100 epochs. The values of the performance measures for the SCFL model are presented in Table 3, in which the identified model achieves the lowest RMSE value of 3.3 for the testing phase. The determination coefficient between the conditioned SCFL vulnerability indices and the nitrate-N data layer values is 0.99 for the training phase and 0.98 for the testing phase. This is a considerable improvement over the basic DRASTIC framework and the best overall performance among all the results. The SCFL vulnerability map is given in Fig. 7e, and a visual comparison with Fig. 7a–d confirms that the five sets of results reproduce the hotspots to varying degrees, with SCFL revealing the most and missing the fewest.

4.5. Overview of model performance

The four modeling results (SCFL and the three FL models SFL, MFL and LFL) are presented in Table 3. The three FL models produce acceptable results in terms of the performance measures and may be ranked as SFL, MFL and LFL. In contrast to SFL, the LFL model performs better in the vicinity of the confined/unconfined aquifer boundary, which underlines that SCFL is able to make use of inherent synergies.
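The synergy argument rests on the inequality derived in Appendix I: the squared error of a simple-average committee cannot exceed the average squared error of its members. The sketch below checks this numerically on synthetic data. It is an illustration only: it uses simple averaging rather than the MLP-LM combiner of the SCFL model, and the member models, biases and noise levels are invented.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic conditioned indices (CVI) and three imperfect member models standing
# in for SFL, MFL and LFL; the biases and noise levels are arbitrary choices.
n_cells = 200
cvi_true = rng.uniform(23.0, 230.0, size=n_cells)
members = np.stack([
    cvi_true + rng.normal(0.0, 8.0, size=n_cells),    # unbiased, moderate noise
    cvi_true + rng.normal(-5.0, 12.0, size=n_cells),  # tends to underestimate
    cvi_true + rng.normal(4.0, 12.0, size=n_cells),   # tends to overestimate
])


def mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error, the sample estimate of the expectation of e**2."""
    return float(np.mean((pred - target) ** 2))


# Average squared error of the members acting alone (E_avg).
e_avg = float(np.mean([mse(m, cvi_true) for m in members]))
# Squared error of the simple-average committee output (E_CFL).
committee = members.mean(axis=0)
e_cfl = mse(committee, cvi_true)

print(f"E_avg = {e_avg:.1f}, E_CFL = {e_cfl:.1f}")
# The inequality holds cell by cell, so it also holds for the sample means.
print("E_CFL <= E_avg:", e_cfl <= e_avg)
```

The inequality guarantees only that the committee beats the members' average, not their best member; a trained combiner such as the ANN used for SCFL may do better than simple averaging.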
Although ranking performance in this way is standard practice, it ignores synergies among the respective results, whereas SCFL provides an alternative approach that reaps the synergy of the constituent models, as confirmed by the results.

Table 3. Performance measures of the models (basic DRASTIC, SFL, MFL, LFL, SCFL). Training: 80% of the dataset; testing: 20% of the dataset.

Model | Training RMSE | Training R² | Testing RMSE | Testing R² | Key
SCFL | 0.83 | 0.99 | 3.3 | 0.97 | Very good
SFL | 4.7 | 0.95 | 8.4 | 0.9 | Good
MFL | 6.2 | 0.83 | 14.7 | 0.61 | Good enough
LFL | 6.4 | 0.82 | 14.6 | 0.63 | Good enough
Basic DRASTIC | R² for UA: 0.5; R² for CA: 0.3 | Poor

The determination coefficient values are discussed in the previous section and provide supporting evidence for the strength of multi-model approaches. A closer view of the modeling results in Fig. 7a–e shows that most of the northeast and central unconfined aquifers are classified as having moderate and high pollution potential, where the aquifer media are mainly composed of coarse alluvial deposits such as sand and gravel. Since most real aquifer systems are heterogeneous and complex, the presented SCFL model has the potential to estimate other hydrochemical or hydrogeological parameters. The vulnerability indices of the unconfined aquifers are significantly higher than those of the confined aquifers, which is attributed to the protection provided by the impermeable layer.

5. Discussion

Fig. 7a–e show the 33 nitrate-N data points overlaid as graduated circle symbols on the modeled vulnerability indices. These are used in the following quantitative model evaluation. At the 33 observation wells, the measured nitrate-N concentrations and their corresponding vulnerability index values were processed as follows: (i) group the concentrations into three categories: low, moderate and high; (ii) group the modeled vulnerability values into low, moderate and high bands; (iii) at each observation well, assign a score of 3 to a model if the difference between the NO3-N concentration category and the vulnerability category is 0, and scores of 2 or 1 when the difference is 1 or 2, respectively; and (iv) sum the scores for each model to obtain its Correlation Index (CI). Consider the following example of obtaining the CI for the SCFL model. The analysis shows that there are 19 wells with the same vulnerability and nitrate-N concentration categories, 13 wells with a difference of 1 between the categories, and 1 well with a difference of 2.
The multipliers for these three groups of wells are 3, 2 and 1, respectively. Hence, the calculated CI is 19 × 3 + 13 × 2 + 1 × 1 = 84. A higher CI means a higher correlation. The coincidence of wells and vulnerability categories predicted by the DRASTIC, SFL, MFL, LFL and SCFL models is presented in Table 4. The results in Tables 3 and 4 provide proof-of-concept evidence for applying multi-model supervised committee models to condition the estimated DRASTIC vulnerability indices of multi-aquifer systems. The basic DRASTIC framework has been criticized for using constant weightings and ratings without considering local conditions and the inherent uncertainties/imprecision in the parameters.

Table 4. Correlation Index (CI) between the vulnerability indices (DRASTIC, SFL, MFL, LFL and SCFL) and contamination levels at the observation wells. Cell values are numbers of wells.

Model | Vulnerability category | Nitrate-N Low | Nitrate-N Moderate | Nitrate-N High | CI
1a. DRASTIC (UA) | Low | 3 | 3 | 1 | 53
 | Moderate | 2 | 4 | 3 |
 | High | 1 | 4 | 2 |
1b. DRASTIC (CA) | Low | 2 | 2 | – | 13
 | Moderate | 3 | 2 | – |
 | High | – | – | – |
2. SFL | Low | 5 | 5 | 3 | 80
 | Moderate | 5 | 10 | 1 |
 | High | 0 | 2 | 2 |
3. MFL | Low | 6 | 6 | 3 | 78
 | Moderate | 4 | 8 | 2 |
 | High | 0 | 3 | 1 |
4. LFL | Low | 6 | 6 | 2 | 79
 | Moderate | 4 | 8 | 3 |
 | High | 0 | 3 | 1 |
5. SCFL | Low | 8 | 5 | 1 | 84
 | Moderate | 2 | 10 | 4 |
 | High | 0 | 2 | 1 |
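The Correlation Index of Section 5 reduces to a short scoring rule: each well scores 3, 2 or 1 when its nitrate-N category and its vulnerability category differ by 0, 1 or 2 classes, so the score is 3 minus the class difference. The sketch below assumes the encoding 0 = low, 1 = moderate, 2 = high; the per-well lists are hypothetical and constructed only to match the SCFL counts of the worked example, while the count table is the SFL block of Table 4.

```python
def correlation_index(nitrate_cat: list[int], vuln_cat: list[int]) -> int:
    """CI over wells: 3, 2 or 1 for a category difference of 0, 1 or 2 (classes 0..2)."""
    if len(nitrate_cat) != len(vuln_cat):
        raise ValueError("both series need one category per observation well")
    return sum(3 - abs(n - v) for n, v in zip(nitrate_cat, vuln_cat))


def ci_from_table(counts: list[list[int]]) -> int:
    """CI from a 3x3 count table: rows = vulnerability class, columns = nitrate-N class."""
    return sum(
        counts[i][j] * (3 - abs(i - j))
        for i in range(3)
        for j in range(3)
    )


# Hypothetical lists matching the SCFL counts: 19 exact matches,
# 13 wells one class apart and 1 well two classes apart.
nitrate = [0] * 33
vulnerability = [0] * 19 + [1] * 13 + [2]
print(correlation_index(nitrate, vulnerability))  # 84

# SFL block of Table 4 (rows: low, moderate, high vulnerability).
sfl_counts = [[5, 5, 3], [5, 10, 1], [0, 2, 2]]
print(ci_from_table(sfl_counts))  # 80
```

Working from the count table makes the CI reproducible directly from Table 4 without the underlying well list.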

However, the FL-based models are data-driven approaches, which extract quantitative information by collectively considering all the local data. The scatter diagrams of modeled CVI values against measured nitrate-N values in Fig. 8 provide visual insight into the performance of the five sets of results. These results confirm that the performances rank, in ascending order, as: basic DRASTIC framework, LFL, MFL, SFL and SCFL. Taking the nitrate-N values as the measured values: (i) the indices modeled by the basic DRASTIC framework are overestimates for the unconfined aquifers, whereas a mixed picture (overestimation and underestimation) is observed for the confined aquifers, and both scatter strongly; (ii) SFL tends to both overestimate and underestimate, whereas underestimation predominates for both MFL and LFL, and all three models scatter moderately; and (iii) the SCFL results provide visual evidence of its ability to reap the synergy of the three FL models by reducing the scatter. The contributions of the paper are of two types. (i) State-of-the-art research findings: the high performance of the FL-based models provides further evidence of the inherent ability of FL models to cope with uncertain/imprecise parameters, reinforcing similar investigations by Şener and Şener (2015), Dixon (2005) and Mohammadi et al. (2009); it also provides further evidence of the capability of different FL-based models to estimate CVI in multiple unconfined/confined aquifers, paralleling previously published works, e.g. Rezaei et al. (2013), Fijani et al. (2013) and Nadiri et al. (2017a, 2017b).
(ii) Innovative development: the paper introduces supervised committee multiple models to complex unconfined/confined aquifers, and the results show that SCFL reaps the synergy of the three FL-based models and provides a significant improvement on the performance of the individual models. Improved performance measures for a set of modeling results should not encourage the view that the resulting vulnerability map is correct. For instance, the SCFL model attains determination coefficients of 0.99 in the training phase and 0.97 in the testing phase, but it completely misses the observed hotspots at the northeast corner of the study area shown in Fig. 6. The modeler has to remain mindful of the results and of the uncertainties underlying the data. Various examples are discussed below. The need for mindfulness over modeling procedures and results reflects practical experience over the years, but the topic is yet to become a subject of knowledge extraction. One obvious example in the modeling procedure is the uncertainty in delineating the unconfined/confined aquifers. Such delineation is normally possible if an extensive number of logged wells exists, but this is not the case in this study. Nonetheless, all the available logs were used for a more meaningful delineation, and geological maps were used where logs were absent. The boundaries between the unconfined/confined aquifers mark the interfaces through which recharge, and thereby contaminants, feed the confined aquifers. Stratigraphic cross-sections through the aquifers would be valuable for gaining further insight but were not available for this study. Such work is now underway, including a hydrochemical investigation of the aquifers to determine trace and minor element anomalies, which may shed light on the delineation of the interfaces.
Vulnerability assessment has become a professional practice for the proactive management of groundwater resources in many countries. For instance, in England and Wales, the Environment Agency (EA), which is responsible for the management of aquifers, has carried out groundwater vulnerability mapping for the whole country through a procedure similar to DRASTIC. The EA defines vulnerability as the likelihood of a pollutant discharged at ground level (i.e. above the soil zone) reaching groundwater in superficial and bedrock aquifers (EA, 2014). Vulnerability is worked out through a similar procedure but using different data layers (baseflow index, soil leaching class, drift properties, unsaturated zone), where ratings are called scores and, likewise, these are weighted for the assessment of vulnerability and thereby aquifer designations. These designations are then used as ‘development control constraints’, whereby no proposed development is approved in highly vulnerable aquifer zones. Similar practices may be found in other countries. The focus of this paper is on the science base of aquifer vulnerability rather than on practice, but the insight gained should evidently be valuable for the future of aquifer management in the Varzeqan basin. Attention may also be drawn to encouraging the use of the term “framework” in association with aquifer vulnerability assessments rather than the term “method”. This is justified by the authors elsewhere; see Nadiri et al. (2017a, 2017b). The authors are actively investigating the suitability of AI techniques for improving the defensibility of vulnerability assessments, and this paper in particular presents further evidence of the improved performance of multi-models for complex mixed multiple aquifers. As a result of this paper, the Technology Readiness Level (TRL) of supervised committee machines using multiple models should be beyond proof-of-concept, but the authors have further investigations underway before demonstrating its status as a working tool.

Fig. 8. Scatter diagrams: vulnerability index values versus directly measured NO3-N concentration at the observation wells: (a) DRASTIC - unconfined, (b) DRASTIC - confined, (c) SFL model, (d) MFL model, (e) LFL model, (f) SCFL model.

6. Conclusions

In arid and semi-arid areas, aquifer water management plays an important role in human health within river basins, and aquifers can therefore be protected from anthropogenic contamination sources by proactive tools based on aquifer vulnerability indices. This study uses the DRASTIC framework to assess groundwater vulnerability for a plain with multiple aquifers (Varzeqan basin, East Azerbaijan, Iran). It uses Geographical Information Systems (GIS) to map the vulnerability of the study area by processing seven data layers related to hydrosphere and lithosphere parameters: Depth to water table (D), net Recharge (R), Aquifer media (A), Soil media (S), Topography or slope (T), Impact of the vadose zone (I) and hydraulic Conductivity (C).
The results of the basic DRASTIC framework show that: (i) the determination coefficient between the index values and nitrate-N values for the whole study area is 0.5 for the unconfined aquifers and 0.3 for the confined aquifers; and (ii) the index values for the whole study area range between 92 and 164 for the unconfined aquifers and between 48 and 93 for the confined aquifers. These are zoned as per the recommendations to provide the vulnerability map of the study area using the basic DRASTIC framework. The poor determination coefficients of the basic DRASTIC framework make a research case for applying Artificial Intelligence (AI) techniques. The performance of four AI models is investigated: Supervised Committee Fuzzy Logic (SCFL) and three Fuzzy Logic (FL) models, namely Sugeno Fuzzy Logic (SFL), Mamdani Fuzzy Logic (MFL) and Larsen Fuzzy Logic (LFL). The results show that SCFL performs consistently better than the individual FL models, considerably improving on RMSE, as evidence of its ability to combine the synergies between the SFL, MFL and LFL models. The groundwater vulnerability map obtained from the SCFL model was conditioned using nitrate concentration values, and its determination coefficient for the testing data is as high as 0.97. Since most real aquifer systems are heterogeneous and complex, the considerably improved performance of the multiple model philosophy provides strong evidence that the SCFL model can derive information from heterogeneous, imprecise hydrochemical and hydrogeological parameters. The study reports a research investigation for a complex unconfined/confined aquifer system and shows that defensible results are obtainable using local data subject to uncertainty and imprecision. The authors' approach to extracting the synergy of multiple models is an ongoing research program aimed at producing working tools tested on a wide range of practical problems. Working tools would correspond to Technology Readiness Level 9 (akin to the NASA classification, see: https://www.nasa.gov/sites/default/files/trl.png), whereas the status of this program should now be TRL5 or TRL6. Achieving TRL9 requires testing the performance of the SCFL philosophy against different conditioning strategies, as well as conditioning towards multiple contaminants.
Some of this work is already underway.

Acknowledgement

The authors gratefully acknowledge the advice of Prof. Asghari Moghaddam and Mr. Oroji, director and expert, respectively, of the hydrogeological laboratory at the University of Tabriz, and their help with the hydrochemical analysis.


Appendix I. Mathematical context to supervised committee models

Committee Fuzzy Logic (CFL) was introduced recently, using simple averaging and weighted averaging of different fuzzy logic modeling results with similar performance and efficiency. These combine the results of similar models to reap the synergies within the constituent FL models under study and produce the final output (Kadkhodaie-Ilkhchi et al., 2009; Labani et al., 2010; Tayfur et al., 2014; Nadiri et al., 2015). The fundamentals of CFL models are described by Chen and Lin (2006) and Kadkhodaie-Ilkhchi et al. (2009). The mathematical treatment below shows that committee machines indeed improve on the average performance of the individual models. The assumption is that there are N trained FL output vectors $\widehat{\mathrm{CVI}}_n$ (say, SFL, MFL and LFL as used in this study) predicting the modeled vector CVI. The prediction error can be written as:

$$e_n = \widehat{\mathrm{CVI}}_n - \mathrm{CVI} \qquad (1)$$

$$n = 1, \ldots, N \qquad (2)$$

The expectation of the squared error for the nth FL model $\widehat{\mathrm{CVI}}_n$ is

$$E_n = \xi\left[\left(\widehat{\mathrm{CVI}}_n - \mathrm{CVI}\right)^2\right] = \xi\left[e_n^2\right] \qquad (3)$$

in which $\xi[\cdot]$ denotes the expectation. The average error of the FL models acting alone is:

$$E_{avg} = \frac{1}{N}\sum_{n=1}^{N} E_n = \frac{1}{N}\sum_{n=1}^{N} \xi\left[e_n^2\right] \qquad (4)$$

Applying the averaging method, the output vector $\widehat{\mathrm{CVI}}_{CFL}$ of the CFL is

$$\widehat{\mathrm{CVI}}_{CFL} = \frac{1}{N}\sum_{n=1}^{N} \widehat{\mathrm{CVI}}_n \qquad (5)$$

Therefore, the CFL has the prediction squared error:

$$E_{CFL} = \xi\left[\left(\widehat{\mathrm{CVI}}_{CFL} - \mathrm{CVI}\right)^2\right] = \xi\left[\left(\frac{1}{N}\sum_{n=1}^{N}\widehat{\mathrm{CVI}}_n - \mathrm{CVI}\right)^2\right] = \xi\left[\left(\frac{1}{N}\sum_{n=1}^{N} e_n\right)^2\right] \qquad (6)$$

Considering Cauchy's inequality:

$$\left(a_1 b_1 + a_2 b_2 + \cdots + a_i b_i\right)^2 \le \left(a_1^2 + a_2^2 + \cdots + a_i^2\right)\left(b_1^2 + b_2^2 + \cdots + b_i^2\right) \qquad (7)$$

and applying it to $E_{CFL}$ with $a_n = 1$ and $b_n = e_n$, which gives $\left(\sum_{n=1}^{N} e_n\right)^2 \le N \sum_{n=1}^{N} e_n^2$:

$$E_{CFL} = \xi\left[\left(\frac{1}{N}\sum_{n=1}^{N}\widehat{\mathrm{CVI}}_n - \mathrm{CVI}\right)^2\right] \le \frac{1}{N}\sum_{n=1}^{N}\xi\left[e_n^2\right] = E_{avg} \qquad (8)$$

which indicates that a CFL using simple averaging as the linear combiner has an expected squared error no greater than the average error of the individual FL models. The same argument holds for weighted averaging with non-negative weights summing to one, provided the committee error is compared with the correspondingly weighted average of the individual errors.

References

Aller, L., Bennett, T., Lehr, J.H., Petty, R.J., Hackett, G., 1987. DRASTIC: a standardized system for evaluating ground water pollution potential using hydrogeologic settings. EPA 600/2-87-035. U.S. Environmental Protection Agency, Ada, Oklahoma.
American Public Health Association, 1998. Standard Methods for the Examination of Water and Wastewater. 17th ed. (Washington, DC).
Asadi, S., Hassan, M., Nadiri, A., Heather, D., 2014. Artificial intelligence modeling to evaluate field performance of photocatalytic asphalt pavement for ambient air purification. Environ. Sci. Pollut. Res. 21:8847–8857. http://dx.doi.org/10.1007/s11356-014-2821-z.
Babiker, I.S., Mohamed, M.A.A., Hiyama, T., Kato, K., 2005. A GIS-based DRASTIC model for assessing aquifer vulnerability in Kakamigahara Heights, Gifu Prefecture, central Japan. Sci. Total Environ. 345 (1–3):127–140. http://dx.doi.org/10.1016/j.scitotenv.2004.11.005.
Baghapour, M.A., Nobandegani, A.F., Talebbeydokhti, N., Bagherzadeh, S., Nadiri, A.A., Gharekhani, M., Chitsazan, N., 2016. Optimization of DRASTIC method by artificial neural network, nitrate vulnerability index, and composite DRASTIC models to assess groundwater vulnerability for unconfined aquifer of Shiraz Plain, Iran. J. Environ. Health Sci. Eng. 14 (1):13. http://dx.doi.org/10.1186/s40201-016-0254-y.
Bezdek, J.C., Ehrlich, R., Full, W., 1984. The fuzzy c-means clustering algorithm. Comput. Geosci. 10:191–203. http://dx.doi.org/10.1016/0098-3004(84)90020-7.
Chen, C.H., Lin, Z.S., 2006. A committee machine with empirical formulas for permeability prediction. J. Comput. Geosci. 32, 485–496.
Chen, M.S., Wang, S.W., 1999. Fuzzy clustering analysis for optimizing fuzzy membership functions. Fuzzy Sets Syst. 103 (2):239–254. http://dx.doi.org/10.1016/S0165-0114(98)00224-3.
Chitsazan, N., Nadiri, A.A., Tsai, F.T.-C., 2015. Prediction and structural uncertainty analyses of artificial neural networks using hierarchical Bayesian model averaging. J. Hydrol. 528, 52–62.
Chiu, S., 1994. Fuzzy model identification based on cluster estimation. J. Intell. Fuzzy Syst. 2:267–278. http://dx.doi.org/10.3233/IFS-1994-2306.
de Martonne, E., 1925. Traité de Géographie Physique: 3 tomes, Paris.
Demico, R.V., Klir, G.J., 2004. Fuzzy Logic in Geology. Elsevier Academic Press, San Diego, p. 347.
Dixon, B., 2005. Applicability of neuro-fuzzy techniques in predicting ground-water vulnerability: a GIS-based sensitivity analysis. J. Hydrol. 309 (1–4):17–38. http://dx.doi.org/10.1016/j.jhydrol.2004.11.010.
EA, 2014. New Groundwater Vulnerability Mapping Methodology (Report: SC040016).
Fijani, E., Nadiri, A.A., Moghaddam, A.A., Tsai, F.T.-C., Dixon, B., 2013. Optimization of DRASTIC method by supervised committee machine artificial intelligence to assess groundwater vulnerability for Maragheh–Bonab plain aquifer, Iran. J. Hydrol. 503:89–100. http://dx.doi.org/10.1016/j.jhydrol.2013.08.038.
Grande, J., Andújar, J., Aroba, J., Beltrán, R., de la Torre, M., Cerón, J., Gómez, T., 2010. Fuzzy modeling of the spatial evolution of the chemistry in the Tinto River (SW Spain). Water Resour. Manag. 24 (12):3219–3235. http://dx.doi.org/10.1007/s11269-010-9603-2.
Hounslow, A.W., 1995. Water Quality Data: Analysis and Interpretation. CRC Press LLC, Lewis Publishers (397 pp. http://www.sciencedirect.com/science/article/pii/S0098300405001846).
Huan, H., Wang, J., Teng, Y., 2012. Assessment and validation of groundwater vulnerability to nitrate based on a modified DRASTIC model: a case study in Jilin City of northeast China. Sci. Total Environ. 440:14–23. http://dx.doi.org/10.1016/j.scitotenv.2016.01.144.
Jafari, S.M., Nikoo, M.R., 2016. Groundwater risk assessment based on optimization framework using DRASTIC method. Arab. J. Geosci. 9:742. http://dx.doi.org/10.1007/s12517-016-2756-4.
Kadkhodaie-Ilkhchi, A., Rezaee, M.R., Rahimpour-Bonab, H., Chehrazi, A., 2009. Petrophysical data prediction from seismic attributes using committee fuzzy inference system. Comput. Geosci. 35:2314–2330. http://dx.doi.org/10.1016/j.cageo.2009.04.010.
Labani, M.M., Kadkhodaie-Ilkhchi, A., Salahshoor, K., 2010. Estimation of NMR log parameters from conventional well log data using a committee machine with intelligent systems: a case study from the Iranian part of the South Pars gas field, Persian Gulf Basin. J. Pet. Sci. Eng. 72:175–185. http://dx.doi.org/10.1016/j.petrol.2010.03.015.
Larsen, P.M., 1980. Industrial application of fuzzy logic control. Int. J. Man Mach. Stud. 12:3–10. http://dx.doi.org/10.1016/S0020-7373(80)80050-2.
Lee, K.H., 2004. First Course on Fuzzy, Theory and Applications. Springer, Berlin, 335 pp.
Li, H., Philip, C.L., Huang, H.P., 2001. Fuzzy Neural Intelligent Systems: Mathematical Foundation and the Applications in Engineering. CRC Press, Inc., Boca Raton, FL (http://dl.acm.org/citation.cfm?id=527743).
Mamdani, E.H., 1976. Advances in the linguistic synthesis of fuzzy controllers. Int. J. Man Mach. Stud. 8 (6):669–678. http://dx.doi.org/10.1016/S0020-7373(76)80028-4.
Mehrpartou, M., Amini Fazl, A., Radfar, J., 1992. Geologic Map of Varzeghan, Scale 1:100000. Geological Survey of Iran.
Mohammadi, K., Niknam, R., Majd, V.J., 2009. Aquifer vulnerability assessment using GIS and fuzzy system: a case study in Tehran-Karaj aquifer, Iran. Environ. Geol. 58:437–446. http://dx.doi.org/10.1007/s00254-008-1514-7.
Nadiri, A.A., 2015. Application of artificial intelligence methods. Geosciences and Hydrology. OMICS International Publications. http://dx.doi.org/10.4172/978-1-63278-0614-062.
Nadiri, A.A., Fijani, E., Tsai, F.T.C., Asghari Moghaddam, A., 2013. Supervised committee machine with artificial intelligence for prediction of fluoride concentration. Hydroinform. J. 15 (4):1474–1490. http://dx.doi.org/10.2166/hydro.2013.008.
Nadiri, A.A., Chitsazan, N., Tsai, F., Moghaddam, A., 2014. Bayesian artificial intelligence model averaging for hydraulic conductivity estimation. J. Hydrol. Eng.:520–532. http://dx.doi.org/10.1061/(ASCE)HE.1943-5584.0000824.
Nadiri, A.A., Marwa, H., Asadi, S., 2015. Supervised intelligence committee machine to evaluate field performance of photocatalytic asphalt pavement for ambient air purification. J. Transp. Res. Board 2528:96–105. http://dx.doi.org/10.1007/s11356-014-2821-z.
Nadiri, A.A., Gharekhani, M., Khatibi, R., Sadeghfam, S., Asgari Moghaddam, A., 2017a. Groundwater vulnerability indices conditioned by supervised intelligence committee machine (SICM). Sci. Total Environ. 574:691–706. http://dx.doi.org/10.1016/j.scitotenv.2016.09.093.
Nadiri, A.A., Gharekhani, M., Khatibi, R., Asgari Moghaddam, A., 2017b. Assessment of groundwater vulnerability using supervised committee to combine fuzzy logic models. Environ. Sci. Pollut. Res. http://dx.doi.org/10.1007/s11356-017-8489-4 (In press).
Newton, S.C., Pemmaraju, S., Mitra, S., 1992. Adaptive fuzzy leader clustering of complex data sets in pattern recognition. IEEE Trans. Neural Netw. 5, 794–800.
Nourani, V., Asgharimoghaddam, A., Nadiri, A.A., 2008. Forecasting spatiotemporal water levels of Tabriz aquifer. Trends Appl. Sci. Res. 3 (4), 319–329.
Ouedraogo, I., Defourny, P., Vanclooster, M., 2016. Mapping the groundwater vulnerability for pollution at the pan African scale. Sci. Total Environ. 544:939–953. http://dx.doi.org/10.1016/j.scitotenv.2015.11.135.
Rezaei, F., Safavi, H.R., Ahmadi, A., 2013. Groundwater vulnerability assessment using fuzzy logic: a case study in the Zayandehrood aquifers, Iran. J. Environ. Manag. 51, 267–277.
Sadeghfam, S., Hassanzadeh, Y., Nadiri, A.A., Zarghami, M., 2016. Localization of groundwater vulnerability assessment using catastrophe theory. Water Resour. Manag. 30 (13), 4585–4601.
Scanlon, B., Healy, R., Cook, D., 2002. Choosing appropriate techniques for quantifying groundwater recharge. Hydrogeol. J. 10 (1):18–39. http://dx.doi.org/10.1007/s10040-001-0176-2.
Sener, E., Davraz, A., 2015. Assessment of groundwater vulnerability based on a modified DRASTIC model, GIS and an analytic hierarchy process (AHP) method: the case of Egirdir Lake basin (Isparta, Turkey). Hydrogeol. J. 21:701–714. http://dx.doi.org/10.1007/s10040-012-0947-y.
Şener, E., Şener, Ş., 2015. Evaluation of groundwater vulnerability to pollution using fuzzy analytic hierarchy process method. Environ. Earth Sci. 73 (12):8405–8424. http://dx.doi.org/10.1007/s12665-014-4001-3.
Shrestha, S., Semkuyu, D.J., Pandey, V.P., 2016. Assessment of groundwater vulnerability and risk to pollution in Kathmandu Valley, Nepal. Sci. Total Environ. 556:23–35. http://dx.doi.org/10.1016/j.scitotenv.2016.03.021.
Sugeno, M., 1985. Industrial Application of Fuzzy Control. North Holland, New York, p. 269.
Tayfur, G., Nadiri, A.A., Moghaddam, A., 2014. Supervised intelligent committee machine method for hydraulic conductivity estimation. Water Resour. Manag. 28 (4):1173–1184. http://dx.doi.org/10.1007/s11269-014-0553-y.
Zadeh, L.A., 1965. Fuzzy sets. Inf. Control 8 (3), 338–353.
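The committee inequality of Eq. (8) in Appendix I can be checked numerically. The Python sketch below is illustrative only and is not the authors' implementation: it uses synthetic, hypothetical data (a random target vector and three noisy, biased stand-ins for the SFL, MFL and LFL outputs), and the helper names `mse`, `committee` and `committee_errors` are our own. Sample means stand in for the expectation ξ[∘]. Because the Cauchy inequality holds at every data point, the sample estimate of E_CFL cannot exceed E_avg, whatever the data; for weighted averaging, the comparison is made against the correspondingly weighted average of the individual errors. Note that the SCFL model of the paper uses a supervised committee machine rather than the fixed simple or weighted averaging sketched here.

```python
import random


def mse(pred, target):
    """Sample estimate of xi[(pred - target)^2], i.e. E_n in Eq. (3)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def committee(preds, weights=None):
    """Combine N model output vectors point by point.

    weights=None gives the simple average of Eq. (5); otherwise the weights
    are assumed to be non-negative and to sum to one (weighted averaging).
    """
    if weights is None:
        weights = [1.0 / len(preds)] * len(preds)
    return [sum(w * p[i] for w, p in zip(weights, preds))
            for i in range(len(preds[0]))]


def committee_errors(preds, target, weights=None):
    """Return (E_avg, E_CFL): the (weighted) mean of the individual errors
    and the squared error of the committee output, cf. Eqs. (4), (6), (8)."""
    if weights is None:
        weights = [1.0 / len(preds)] * len(preds)
    individual = [mse(p, target) for p in preds]
    e_avg = sum(w * e for w, e in zip(weights, individual))
    e_cfl = mse(committee(preds, weights), target)
    return e_avg, e_cfl


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical synthetic data, not the study's measurements:
    # a target index vector and three noisy, biased stand-in "models".
    target = [rng.uniform(50.0, 200.0) for _ in range(500)]
    preds = [[t + bias + rng.gauss(0.0, sd) for t in target]
             for bias, sd in [(5.0, 10.0), (-3.0, 15.0), (0.0, 8.0)]]

    e_avg, e_cfl = committee_errors(preds, target)
    print(f"simple:   E_avg = {e_avg:.2f}, E_CFL = {e_cfl:.2f}")

    e_avg_w, e_cfl_w = committee_errors(preds, target, weights=[0.2, 0.2, 0.6])
    print(f"weighted: E_avg = {e_avg_w:.2f}, E_CFL = {e_cfl_w:.2f}")
```

With independent errors in the stand-in models, E_CFL typically falls well below E_avg; if all models made identical errors, the two would coincide, which is the equality case of Eq. (7).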
