Hydrological Sciences-Journal-des Sciences Hydrologiques, 46(2) April 2001
227
Mapping groundwater quality variables using PCA and geostatistics: a case study of Bajo Andarax, southeastern Spain
FRANCISCO SÁNCHEZ-MARTOS Department of Hydrogeology, University of Almeria, Campus Universitario, E-04120 Almeria, Spain e-mail: fmartos@ual.es
R. JIMÉNEZ-ESPINOSA Department of Geology, University of Jaén, Virgen de la Cabeza 2, E-23071 Jaén, Spain e-mail:
[email protected]
A. PULIDO-BOSCH Department of Hydrogeology, University of Almeria, Campus Universitario, E-04120 Almeria, Spain e-mail:
[email protected]
Abstract A case study is presented for the application of statistical and geostatistical methods to the problem of estimating groundwater quality variables. This methodology has been applied to the investigation of the detrital aquifer of the Bajo Andarax (Almeria, Spain). The use of principal components analysis is proposed, as a first step, for identifying relevant types of groundwater and the processes that bring about a change in their quality. As a result of this application, three factors were obtained, which were used as three new variables (V1: sulphate influence; V2: thermal influence; and V3: marine influence). Analysis of their spatial distribution was performed through the calculation of experimental and theoretical variograms, which served as input for geostatistical modelling using ordinary block kriging. This analysis has allowed a probabilistic representation of the data to be obtained by mapping the three variables throughout the aquifer for each sampling point. In this way, one can evaluate the spatial and temporal variation of the principal physico-chemical processes associated with the three variables V1, V2 and V3 implicated in the groundwater quality of the detrital aquifer.
Key words PCA; geostatistics; groundwater quality; Spain
Mapping groundwater quality variables using PCA and geostatistics: a case study of the Lower Andarax, southeastern Spain
Summary We present a case study of the application of statistical and geostatistical methods to the problem of estimating groundwater quality variables. We applied this methodology to the study of the detrital aquifer of the Lower Andarax (Almeria, Spain). The variables concerned comprise major ions as well as several physico-chemical quantities. We propose the use of principal component analysis, as a first step, to identify the most notable groups of water and the factors that may change their quality. The main groups of processes were thus identified, yielding three factors that were used as three new variables (V1: sulphate influence; V2: thermal influence; and V3: marine influence). Their spatial distribution was analysed by calculating experimental and theoretical variograms, which serve as input for geostatistical estimation by ordinary block kriging. This analysis provided a probabilistic representation of the information by mapping the three variables throughout the aquifer for each sampling run. One can thus evaluate the spatial and temporal variation of the principal physico-chemical processes associated with the variables V1, V2 and V3, which are involved in the groundwater quality of the detrital aquifer.
Key words PCA; geostatistics; groundwater quality; Spain
Open for discussion until 1 October 2001
INTRODUCTION
The study of hydrogeochemical evolution in complex aquifers requires the handling of a wide range of data of diverse origin. The physico-chemical parameters reflect the diversity of the groundwater and point to the processes that may take place within the aquifer. A series of geological variables that determine the chemical evolution of the water must also be considered. The principal objective of the present work is to study the hydrogeochemical evolution of a complex aquifer using a methodology that takes all of these factors into account, considering the physico-chemical characteristics of the groundwater as well as basic data. From this complex multivariate information, principal component analysis (PCA) is used to establish a series of factorial variables that summarize all the hydrogeochemical information. A geostatistical study of these derived variables allows one to work in a reduced multivariate space and to establish their spatial distribution throughout the aquifer by calculating variograms. Maps of groundwater quality are then produced using these factorial variables and ordinary kriging. In this way, it is hoped to verify whether these new variables make it possible to locate the zones where various physico-chemical processes are superimposed, considering the hydrogeochemical and geological parameters. Ultimately, the aim is to identify the spatial development of the principal processes that act on groundwater quality.
HYDROGEOLOGICAL SETTING
The valley of the Lower Andarax River (Bajo Andarax) is flanked by the Sierra Alhamilla and the Sierra Nevada, where pelitic metasediments outcrop. The extreme west corresponds to the Sierra de Gádor, a limestone-dolomite massif with outcrops of phyllites (Fig. 1). The depression is filled by post-orogenic detrital deposits of diverse lithology (marls, sandy loams, sands and conglomerates) with numerous evaporite gypsiferous intercalations. In accordance with the geological characteristics of the area, three hydrogeological units have been defined: the detrital aquifer, the carbonate aquifer and the deep aquifer (Pulido-Bosch et al., 1992; Sánchez-Martos, 1997). The detrital aquifer runs the entire length of the valley and includes Quaternary alluvial and deltaic deposits, together with Pliocene deltaic sandy silt conglomerates. The thickness of the aquifer varies between 200 m (northwestern area) and 20–40 m (delta). The detrital aquifer shows sharp fluctuations in water level, with a clear seasonal recovery.
A study of the hydrogeochemical evolution of the detrital aquifer of the Lower Andarax presents a series of difficulties related to the diversity of its groundwater, in the sense that one can find thermal processes, marine intrusion and high sulphate and boron contents (Sánchez-Martos, 1997; Sánchez-Martos et al., 1999). To this can be added the geological complexity of the area, with an intense tectonic activity
(Bousquet & Phillip, 1976; Sanz de Galdeano & Rodriguez Fernandez, 1985) and the presence of a great lithological diversity with frequent changes of facies (Voermans & Baena, 1983).
Fig. 1 (a) Location of the detrital aquifer in the Lower Andarax, showing the situation of the principal fractures, and (b) hydrogeological cross-section across the Lower Andarax. Legend, (a): limestones and dolomites; metapelitic rocks; conglomerates, silty marls and sandy marls; main fractures; hydrogeological section; boundary of the study area shown in Figs 4, 5 and 6. Legend, (b): detrital aquifer; carbonate aquifer; deep aquifer; limestones and dolomites; sands, silts and conglomerates; silty marls and sandy marls; phyllites and calcoschists.
Fig. 2 Spatial distribution of electrical conductivity (µS cm⁻¹) in the waters of the detrital aquifer (May 1992) and Piper diagram (Mg, SO₄, HCO₃ and Cl axes) of the detrital aquifer for the sampling runs.
The water of this aquifer has several types of facies, with a gradation between magnesium or calcium sulphate, sodium-magnesium and chloro-sulphate and sodium chloride types (Fig. 2). The first belongs to the Gádor area, where conductivity is less than 2000 µS cm⁻¹. The other groups appear mainly in the area around Rioja and the Tabernas gully (4000 µS cm⁻¹), and in the coastal strip, where conductivity exceeds 8000 µS cm⁻¹.
METHODOLOGY
Hydrogeochemical data
The hydrogeochemical information obtained in the Lower Andarax basin was derived from a sampling network of 31 wells in the detrital aquifer. Sampling was carried out during the low-water period (September) and during the high-water period (May). The physico-chemical parameters (electrical conductivity, temperature and pH) were determined directly at the well using a WTW field conductivity meter, model LFT 91, and a WTW field pH meter, model pH 95. The anions Cl⁻, SO₄²⁻ and NO₃⁻ were analysed by ion chromatography with a DIONEX chromatograph, model DX-100. HCO₃⁻ was analysed by potentiometric determination in a Metrohm Titroprocessor, model 686. The concentrations of the cations Na⁺, Mg²⁺, Ca²⁺ and K⁺, together with B³⁺, Li⁺ and Sr²⁺, were measured by inductively-coupled plasma analysis (ICP-OES) using a LEEMAN LABS atomic emission spectrophotometer, model PS-1000. The results are shown in Table 1.

Table 1 Chemical composition of groundwater of the Lower Andarax (September 1992).
No.      T     EC   pH   Cl⁻  SO₄²⁻  HCO₃⁻  NO₃⁻   Na⁺  Mg²⁺  Ca²⁺   K⁺   B³⁺   Li⁺   Sr²⁺
1     21.3   8100  6.5  1486   2314    532   210   812   476   441   12  1.42  0.38  10.05
2     19.9   7680  6.5  1614   2010    423    65   702   522   526   11  0.80  0.29  10.63
3     21.3   2340  6.9   256    765    301    19   183    90   218    7  0.37  0.12   4.26
4     24.1   5830  6.8  1542    794    252    24   651   259   271   15  1.29  0.22   6.42
6     22.4   6220  7.5  1322   1858    976    65   716   417   374   31  1.51  0.13   5.23
7     22.1   3570  6.8   632   1150    256    27   327   164   338   11  0.84  0.19   5.87
8     23.6  13530  6.9  4232   1876    246   138  2547   533   417   45  1.70  0.29   8.03
9     19.7   3210  6.7   430   1317    319    68   288   175   298   11  0.60  0.22   5.49
10    22.3   4620  6.9   564   1644    373    50   442   275   344   19  1.54  0.60  12.70
11    20.8   3160  6.7   368   1140    350    59   266   154   270   10  0.58  0.17   5.29
12    19.5   3520  7.0   422   1280    334    76   324   167   296   11  0.78  0.23   5.90
13    27.5   3050  7.7   655    830    284     0   386   137   182   15  0.82  0.51   5.26
16    20.6   4450  6.7   530   1930    300   118   377   248   382   21  0.75  0.25   7.54
18    20.1   3270  7.8   336   1150    404    74   235   180   266   10  0.31  0.10   4.26
21    23.4   2240  7.2   303    565    306     0   249    72   126   10  0.89  0.46   3.25
22    26.9   2430  7.0   382    640    313     3   231   120   141   11  0.30  0.36   3.24
23    21.0   4430  6.9   724   1490    317    41   451   201   325   10  1.40  0.42   5.88
24    21.3   3040  6.9   528    794    322    28   261   150   188   16  0.60  0.35   4.43
28    24.2   1684  7.0   195    413    296     7   110    77   147    5  0.46  0.10   3.89
29    23.6   4390  9.4  1045    844    144     0   652   217    68   17  0.74  0.72   1.19
30    21.5   3100  7.0   384   1192    374    70   259   166   277  7.2  0.32  0.17   5.44
33    25.1   1479  6.8    80    440    279    18   117    31   154    3  0.07  0.04   1.82
34    19.7   1487  7.1   139    391    279     8    72    58   161    4  0.38  0.01   2.73
35    23.5   3950  7.1   468   2036    158    36   209   277   485    7  0.47  0.29  13.05
37    22.1   1287  6.2    79    351    273     8    65    47   123    4  0.07  0.05   2.32
39    17.4   1820  6.9   106    614    268    25    81    70   190    4  0.06  0.02   3.16
42    19.0   1927  6.9   151    816    295    25   106    93   243    4  0.27  0.03   3.85
44    21.0   1540  7.3   124    504    195    10    91    50   153    4  0.00  0.05   2.17
46    21.8   4830  6.7   362   2606    251     0   389   261   470   19  1.77  0.60   9.82
47    22.2   4830  6.7  1030   1075    350   100   494   238   268   14  1.11  0.23   6.47
80P   22.8   3100  7.0   456    807    296    22   310   101   192   12  0.42  0.43   4.27
T: temperature (°C); EC: electrical conductivity (µS cm⁻¹). Ionic concentrations in mg l⁻¹.
Numerical methods
Principal component analysis
This is the most widely used method of multivariate data analysis owing to the simplicity of its algebra and its straightforward interpretation. A linear transformation is defined which transforms a set of correlated variables into uncorrelated factors. These orthogonal factors can be shown to extract successively a maximal part of the total variance of the variables. The basic problem solved by principal component analysis is to transform a set of correlated variables into uncorrelated quantities, which could be interpreted, in an ideal multi-Gaussian context, as independent factors underlying the phenomenon (Wackernagel, 1995).
Let $Z$ be the $n \times N$ matrix of data from which the means of the variables have already been subtracted. Then the corresponding $N \times N$ variance-covariance matrix $V$ is:
$$V = [\sigma_{ij}] = \frac{1}{n} Z^{T} Z \tag{1}$$
Let $Y$ be an $n \times N$ matrix containing in its rows the $n$ samples of factors $Y_p$ ($p = 1, \ldots, N$), which are uncorrelated and of zero mean. The variance-covariance matrix of the factors is diagonal, owing to the fact that the covariances between factors are nil by definition:
$$D = \frac{1}{n} Y^{T} Y = \begin{pmatrix} d_{11} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & d_{NN} \end{pmatrix} \tag{2}$$
and the diagonal elements $d_{pp}$ are the variances of the factors. An $N \times N$ orthogonal matrix $A$ is sought which linearly transforms the measured variables into synthetic factors:
$$Y = ZA \quad \text{with} \quad A^{T} A = I \tag{3}$$
Multiplying this equation from the left by $1/n$ and $Y^{T}$ and replacing $Y$ by $ZA$ one obtains:
$$\frac{1}{n} Y^{T} Y = \frac{1}{n} Y^{T} Z A = \frac{1}{n} (ZA)^{T} (ZA) = \frac{1}{n} A^{T} Z^{T} Z A = A^{T} \frac{1}{n} (Z^{T} Z) A \tag{4}$$
Finally
$$D = A^{T} V A \quad \Longrightarrow \quad V A = A D \tag{5}$$
It can immediately be seen that the orthonormal matrix $Q$ of eigenvectors of $V$ offers a solution to the problem, and the eigenvalues $\lambda_p$ are then simply the variances of the factors $Y_p$. Principal component analysis is nothing other than a statistical interpretation of the eigenvalue problem:
$$V Q = Q \Lambda \quad \text{with} \quad Q^{T} Q = I \tag{6}$$
defining the factors as:
$$Y = ZQ \tag{7}$$
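The eigenvalue formulation above can be sketched numerically. The following is a minimal illustration, not the computation used in the study: the function name `pca_factors` and the synthetic data are assumptions made here, NumPy is assumed to be available, and the covariance matrix is formed with the divisor $n$ as in equation (1).

```python
import numpy as np


def pca_factors(X):
    """Hypothetical helper: eigenvalues, eigenvectors Q and factors Y = ZQ."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Z = X - X.mean(axis=0)            # centred n x N data matrix
    V = (Z.T @ Z) / n                 # variance-covariance matrix, eq. (1)
    eigvals, Q = np.linalg.eigh(V)    # symmetric eigenproblem VQ = Q Lambda, eq. (6)
    order = np.argsort(eigvals)[::-1] # sort factors by decreasing variance
    eigvals, Q = eigvals[order], Q[:, order]
    Y = Z @ Q                         # factors, eq. (7)
    return eigvals, Q, Y


# Synthetic correlated data (an assumption; not the values of Table 1).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))

eigvals, Q, Y = pca_factors(X)
D = (Y.T @ Y) / len(Y)                # covariance of the factors, eq. (2)
```

In this sketch `D` should come out diagonal to rounding error, with the eigenvalues on its diagonal, which is the statement of equations (2) and (5).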
Another important aspect of principal component analysis is that it allows a sequence of orthogonal factors to be defined which successively absorb a maximal amount of the variance of the data. Take a vector $y_1$ corresponding to the first factor, obtained by transforming the centred data matrix $Z$ with a vector $a_1$ calibrated to unit length:
$$y_1 = Z a_1 \quad \text{with} \quad a_1^{T} a_1 = 1 \tag{8}$$
The variance of $y_1$ is:
$$\operatorname{var}(y_1) = \frac{1}{n} y_1^{T} y_1 = \frac{1}{n} a_1^{T} Z^{T} Z a_1 = a_1^{T} V a_1 \tag{9}$$
To attribute a maximal part of the variance of the data to $y_1$, an objective function $\phi_1$ with a Lagrange parameter $\lambda_1$ is defined, which multiplies the constraint that the transformation vector $a_1$ should be of unit norm:
$$\phi_1 = a_1^{T} V a_1 - \lambda_1 (a_1^{T} a_1 - 1) \tag{10}$$
Setting the derivative with respect to $a_1$ to zero:
$$\frac{\partial \phi_1}{\partial a_1} = 0$$
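As a hedged worked step, assuming the objective function has the form $\phi_1 = a_1^{T} V a_1 - \lambda_1 (a_1^{T} a_1 - 1)$ and that $V$ is symmetric, the stationarity condition can be written out as follows. It suggests that $a_1$ is an eigenvector of $V$ and that the variance of the first factor equals the corresponding eigenvalue, so the largest eigenvalue gives the factor of maximal variance.

```latex
\begin{aligned}
\frac{\partial \phi_1}{\partial a_1} &= 2 V a_1 - 2 \lambda_1 a_1 = 0
  \quad \Longrightarrow \quad V a_1 = \lambda_1 a_1 , \\
\operatorname{var}(y_1) &= a_1^{T} V a_1 = \lambda_1 a_1^{T} a_1 = \lambda_1 .
\end{aligned}
```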
spherical:
$$\gamma(h) = \begin{cases} c_2 \left( \dfrac{3h}{2a} - \dfrac{h^{3}}{2a^{3}} \right), & h \le a \\ c_2, & h > a \end{cases} \tag{19}$$
$a$ = range; $c$ = sill; $c_{Nug}$, $a$, $c_2$ constant
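The spherical model can be evaluated with a few lines of code. This is a minimal sketch under stated assumptions: the function name `spherical_variogram` is hypothetical, and treating the nugget $c_{Nug}$ as an additive constant for $h > 0$ is an assumption made here (the common convention), not something the text confirms; with the default `c_nug=0.0` the function reduces to the two branches of the spherical model written above.

```python
def spherical_variogram(h, a, c2, c_nug=0.0):
    """Spherical variogram value at lag h.

    a     : range
    c2    : sill contribution of the spherical structure
    c_nug : nugget, added for h > 0 (an assumption of this sketch)
    """
    if h < 0:
        raise ValueError("lag distance h must be non-negative")
    if h == 0:
        return 0.0                      # a variogram is zero at zero lag
    if h <= a:
        r = h / a
        return c_nug + c2 * (1.5 * r - 0.5 * r ** 3)
    return c_nug + c2                   # plateau beyond the range


# Illustrative parameters only (not fitted to the Andarax data).
lags = [0.0, 2.5, 5.0, 10.0, 25.0]
values = [spherical_variogram(h, a=10.0, c2=2.0) for h in lags]
```

The model rises from zero, reaches `c2` exactly at `h = a`, and stays constant for larger lags, which is why the range is read as the distance beyond which samples are no longer spatially correlated.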