SCALABLE GENERALIZATION OF HYDRAULIC CONDUCTIVITY IN QUATERNARY STRATA FOR USE IN A REGIONAL GROUNDWATER MODEL
Jānis JĀTNIEKS (1), Konrāds POPOVS (1), Ilze KLINTS (2), Andrejs TIMUHINS (3), Andis KALVĀNS (1), Aija DĒLIŅA (1), Tomas SAKS (1)
(1) Faculty of Geography and Earth Sciences, (2) Faculty of Physics and Mathematics, (3) Laboratory for Mathematical Modelling of Environmental and Technological Processes. University of Latvia, Riga, Latvia.
Contact: [email protected], www.puma.lu.lv
[Figure colour scale: PermeabilityZ [m/day].]
Fig. 8. Squared sum of errors against geometries prepared from different clustering solutions using NCD matrix.
[Plot residue: panel title "Squared error sum distribution by clustering solution and associated experimental geometry structure - NCD matrix"; X axis: number of logical clusters after dendrogram flattening; plot series labels: NCD 12c, NCD 13c, NCD 74c.]
This work aims to present calibration results and detail our experience while integrating a scalable generalization of hydraulic conductivity for Quaternary strata into the regional groundwater modelling system for the Baltic artesian basin – MOSYS V1.
Methods
Fig. 3. Algorithm for calculating Z values for including spatial clustering results into the MOSYS finite-element mesh geometry and resolving structure conflicts arising from different lithological structures of the geological columns selected from the most typical borehole log in each cluster.
[Figure colour scale: PermeabilityXY [m/day].]
In this study the main unit of generalization is the spatial cluster, and the main criterion for finding adequate representative borehole columns is the Normalized Compression Distance metric, a nonparametric data mining technique calculated by the CompLearn ncd utility (Cilibrasi, 2007). Spatial clusters were obtained from flattened complete-link hierarchical clustering dendrogram representations, supplying cluster membership identifiers for each borehole. Using Voronoi tessellation and GIS dissolve spatial analysis functionality on this data, spatial clusters were obtained in GIS format. The procedure is described in detail in "Lithological Uncertainty Expressed by Normalized Compression Distance" by Jatnieks et al. 2012. A full overview of the experiment workflow with software components and key stages is shown in Figure 1. We used two experiment matrices for hierarchical clustering: one where spatial clusters are obtained from the Normalized Compression Distance matrix alone (further identified as the NCD matrix), and a second where this matrix is combined with a range-normalized Euclidean distance matrix of borehole locations in the BalticTM93 projection (further identified as the NCD+E matrix).
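The clustering step described in the Methods can be illustrated with a minimal sketch. It is not the CompLearn ncd utility used in the study: zlib is an assumed stand-in compressor, the serialized borehole logs are hypothetical, and the helper names (clen, ncd, ncd_matrix, flatten) are introduced here only for illustration. The sketch computes a pairwise NCD matrix, builds a complete-link dendrogram with SciPy, and flattens it into a chosen number of logical clusters.

```python
import zlib
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def clen(data: bytes) -> int:
    """Compressed length in bytes (zlib is an assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_matrix(logs):
    """Pairwise NCD matrix; each pair is computed once and mirrored."""
    n = len(logs)
    d = np.zeros((n, n))
    for i, j in combinations(range(n), 2):
        d[i, j] = d[j, i] = ncd(logs[i], logs[j])
    return d


def flatten(dist, n_clusters):
    """Complete-link dendrogram flattened into at most n_clusters clusters."""
    z = linkage(squareform(dist, checks=False), method="complete")
    return fcluster(z, t=n_clusters, criterion="maxclust")


# Hypothetical serialized borehole logs (lithology sequences as text).
logs = [
    b"sand;clay;sand;loam;" * 20,
    b"sand;clay;sand;loam;" * 19 + b"sand;clay;",
    b"peat;gravel;silt;" * 20,
    b"peat;gravel;silt;" * 21,
]
dist = ncd_matrix(logs)
labels = flatten(dist, n_clusters=2)  # one cluster identifier per borehole
```

Changing n_clusters corresponds to flattening the same dendrogram at a different level, which is what makes the generalization scalable.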
Fig. 4. After the algorithm described in Fig. 3 creates the layer structure in the model geometry, the respective conductivity values are assigned to the mesh elements created by the new placeholder layers. The cluster membership of each mesh element is determined by intersecting the element centroids, shown here in white, with the spatial clusters in a GIS Shapefile layer. Log scale for conductivity values on Y axis.
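The centroid intersection mentioned in the caption can be sketched as follows. This is an illustrative stand-in for the GIS operation, assuming simple polygons without holes and an even-odd ray casting test; the cluster polygons, element coordinates and function names are hypothetical.

```python
import numpy as np


def point_in_polygon(x, y, poly):
    """Even-odd ray casting test; poly is a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def element_centroid(vertices):
    """Centroid of a mesh element as the mean of its vertex coordinates."""
    cx, cy = np.mean(np.asarray(vertices, dtype=float), axis=0)
    return float(cx), float(cy)


def assign_clusters(elements, cluster_polygons):
    """Return the cluster identifier containing each element centroid, or None."""
    membership = []
    for vertices in elements:
        cx, cy = element_centroid(vertices)
        found = None
        for cluster_id, poly in cluster_polygons.items():
            if point_in_polygon(cx, cy, poly):
                found = cluster_id
                break
        membership.append(found)
    return membership


# Hypothetical spatial clusters (two adjacent squares) and triangular elements.
cluster_polygons = {
    12: [(0, 0), (10, 0), (10, 10), (0, 10)],
    13: [(10, 0), (20, 0), (20, 10), (10, 10)],
}
elements = [
    [(1, 1), (4, 1), (1, 4)],
    [(12, 2), (18, 2), (15, 8)],
    [(30, 30), (33, 30), (30, 33)],  # lies outside every cluster
]
membership = assign_clusters(elements, cluster_polygons)
```

Using the centroid rather than the element outline gives each element exactly one cluster, even when the element straddles a cluster boundary.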
Fig. 12. Selected experimental model geometries and their response to calibration of model Quaternary strata.
[Plot residue: X axis: number of iterations for MOSYS auto-calibration routine; plot series labels: NCD+E 7c, NCD+E 67c, NCD+E 69c, NCD+E 70c, NCD+E 73c.]
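The auto-calibration routine referred to in Fig. 12 is described in the Results as multiplying layer material properties by coefficients determined by the SciPy L-BFGS-B routine. The sketch below illustrates only that idea and is not the MOSYS implementation: the forward model (simulate_heads), the sensitivity matrix S, the layer assignment and all numerical values are hypothetical stand-ins for a real model run.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical setup: 3 calibration layers, 6 mesh elements, 5 observation wells.
n_layers = 3
layer_of_element = np.array([0, 0, 1, 1, 2, 2])
k_initial = np.array([5.0, 1.0, 0.01, 0.001, 30.0, 10.0])  # m/day

# Hypothetical sensitivity of each observed head to each layer.
S = np.array([
    [1.0, 0.2, 0.0],
    [0.3, 1.0, 0.1],
    [0.0, 0.4, 1.0],
    [0.5, 0.5, 0.5],
    [1.0, 0.0, 0.3],
])


def simulate_heads(k_elements):
    """Toy forward model: heads depend on the mean log10 conductivity per layer."""
    layer_logk = np.array([
        np.log10(k_elements[layer_of_element == i]).mean() for i in range(n_layers)
    ])
    return S @ layer_logk


# Synthetic "observed" heads generated from known multipliers.
true_coeffs = np.array([2.0, 0.5, 1.0])
observed = simulate_heads(true_coeffs[layer_of_element] * k_initial)


def squared_error_sum(coeffs):
    """Fitness function: squared sum of head errors for given layer multipliers."""
    residual = simulate_heads(coeffs[layer_of_element] * k_initial) - observed
    return float(residual @ residual)


result = minimize(
    squared_error_sum,
    x0=np.ones(n_layers),
    method="L-BFGS-B",
    bounds=[(1e-2, 1e2)] * n_layers,
)

# One multiplier per layer: ratios between elements of a layer are unchanged.
k_calibrated = result.x[layer_of_element] * k_initial
```

Because each layer receives a single multiplier, the ratio between any two elements of the same layer stays at its initial value, which is the limitation the Results text points out.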
Fig. 9. Squared error distribution by model calibration layers for each of the geometries derived from different logical clustering solutions using NCD matrix.
[Plot legend, model calibration layers: D2nr, D2ar, D2br, D3gj, D3am, D3pl, D3slp, D3dg, D3ogkt, D3stel, D3jnsk, C1, P, Q.]
Results
The experiment geometries based on the NCD matrix perform better overall. The various Quaternary representations tested have a measurable effect on model results: the worst geometries, such as NCD+E 7, 3 and others with a low logical cluster count, perform more than twice as badly as the best NCD clustering based geometries. NCD clustering based geometries with the better initial calculation results (in Figures 8 and 10), such as the 12 and 13 cluster variants, also tend to respond better to optimization with the MOSYS auto-calibration routine (Figure 12). Some of the geometries with poorer results in the initial model run show only minimal improvement with calibration (Figure 12).
For our experiment territory, consisting of the formerly glaciated territory of Latvia, the best performing generalizations have 12 and 13 clusters. This could have been successfully deduced from the clustering dendrogram in Figure 2. The best performing model geometry before applying the NCD metric based clustering solution had a squared error sum of 814 meters. The best NCD clustering geometry provides a 9.7% improvement, to 735. This may seem modest in nominal terms. It is not, because the experimental Quaternary strata were inserted in a previously calibrated model geometry with initial conductivity values as estimates. Model autocalibration works by multiplying layer material properties with coefficients determined by the SciPy L-BFGS-B routine (Zhu et al. 1997). Since this works on a layer basis, the proportional differences in conductivity values between elements in each layer remain constant. Ideally, the model geometry area built using clustering and the rest of the territory would be calibrated separately. This could yield further improvements.
The response of the selected new model geometries to calibration indicates their influence on the underlying model strata, not just improved performance in the part of the model representing the Quaternary deposits. Geometries NCD 12c and NCD 13c were calibrated further, including the remaining bedrock layers as optimization parameters and yielding a further decrease of the squared error sum, used as the overall fitness function for model performance against observed pressure heads. Examples of cross-sections from the NCD 12c structure are shown in Figures 6 and 7.
Fig. 10. Squared sum of errors against geometries prepared from different clustering solutions using NCD+E matrix.
[Plot residue: panel titles "Number of logical clusters against squared sum of head errors - NCD+E matrix" and "Squared error sum distribution by clustering solution and associated experimental geometry structure - NCD+E matrix"; X axis: number of logical clusters after dendrogram flattening.]
Fig. 11. Squared error distribution by model calibration layers for each of the geometries derived from different logical clustering solutions using NCD+E matrix.
References
1. Bennett, C. H., Gacs, P., Li, M., Vitanyi, P., Zurek, W. 1998. Information Distance. IEEE Transactions on Information Theory, 44(4), 1407-1423.
2. Cilibrasi, R. 2007. Statistical Inference Through Data Compression. ILLC Dissertation Series DS-2007-01, Institute for Logic, Language and Computation, Universiteit van Amsterdam.
3. Klints, I., Virbulis, J., Timuhins, A., Sennikovs, J., Bethers, U. Calibration of the hydrogeological model of the Baltic Artesian Basin. EGU Geophysical Research Abstracts, EGU2012-10003 [this section, poster board A255].
4. Takčidi, E. 1999. Datu bāzes "Urbumi" dokumentācija [Documentation of the database "Boreholes"]. Valsts ģeoloģijas dienests, Rīga. [In Latvian].
5. Valner, L. 2003. Hydrogeological model of Estonia and its applications. Proc. Estonian Acad. Sci. Geol., 52, 3, 179-192.
6. Virbulis, J., Bethers, U., Saks, T., Sennikovs, J., Timuhins, A. Hydrogeological model of the Baltic Artesian Basin. Hydrogeology Journal. [Under revision].
7. Zhu, C., Byrd, R. H., Nocedal, J. 1997. Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization. ACM Transactions on Mathematical Software, 23, 4, 550-560.
[Plot X axis: number of logical clusters after dendrogram flattening. Tick labels in plotted order: 70 69 67 68 42 43 51 50 47 49 46 45 44 81 82 48 66 57 52 55 56 53 62 54 61 64 63 85 83 84 60 65 59 58 78 77 79 80 72 71 73 74 76 37 40 41 39 38 33 32 31 30 29 27 28 35 34 36 26 25 24 23 22 20 21 17 18 19 16 11 14 12 13 15 10 9 8 5 6 4 7]
Scope and objective
The cover of Quaternary sediments, especially in formerly glaciated territories, is usually the most complex part of the sedimentary sequence. In regional hydrogeological models it is often represented as a single layer with uniform or calibrated properties (Valner, 2003). However, the properties and structure of Quaternary sediments control groundwater recharge: they can direct the groundwater flow either horizontally, towards discharge in topographic lows, or vertically, recharging groundwater in the bedrock. Building a representative structure of Quaternary strata in a regional hydrogeological model is important, as this affects all the underlying strata.
[Plot Y axis: squared error sum in meters (smaller is better).]
Introduction
Fig. 1. Overview of the full experiment workflow for using Normalized Compression Distance and hierarchical clustering as an approach for scalable generalization of Quaternary sediment lithological structure in the MOSYS regional groundwater modelling system for the Baltic artesian basin territory.
Fig. 2. Complete-link agglomerative hierarchical clustering dendrogram for Normalized Compression Distance matrix of serialized borehole log lithological structures.
[Plot residue: panel titles "Number of logical clusters against squared sum of head errors - NCD matrix", "Optimization runs for Quaternary strata in 5 model geometries for each of NCD and NCD + E matrices" and "Target function values against optimization iteration count"; Y axis: squared error sum in meters by model calibration layers (smaller is better); plot series labels: NCD 14c, NCD 79c.]
[Plot X axis tick labels in plotted order: 12 13 14 15 17 16 23 50 51 52 53 55 54 22 56 60 61 57 58 20 21 59 48 49 64 9 63 62 6 11 39 19 38 18 37 10 70 71 75 74 73 72 67 47 65 66 46 5 36 35 69 68 24 34 33 32 4 45 77 76 43 42 29 41 28 25 26 40 27 30 8 31 44 85 84 7 82 80 81 83 78 79]
Fig. 5. Spatial distribution of vertical permeability values in the MOSYS V1 modelling system mesh in the clustering experiment territory in Latvia. Screenshot fragment of MOSYS visualization platform HiFiGeo, log scale, EPSG:25884 projection. The cross-section line AB is marked on the map.
Fig. 6. Middle fragment of vertical cross-section AB (overview in Figure 5) across the Vidzeme region, showing horizontal permeability values on the right Y axis in the Quaternary, represented by 8 modelling system layers (only Q layers shown, for simplicity). Left Y axis: Z values in meters above sea level. X axis: distance from cross-section point A.
Fig. 7. Previous implementation of Quaternary strata in the MOSYS V1 modelling system consisted of 4 layers, two sets of aquifer-aquitard transitions. The arrow indicates the border between the previous Quaternary implementation (left) and the new Quaternary implementation, using hierarchical clustering of Normalized Compression Distance, on the right.
Table 1. Initial conductivity values from lithology classifier, used for generalization of borehole log structures for this study.
Class  Lithology                            Horizontal conductivity, m/day  Vertical conductivity, m/day
1      sandstone                            5                               1
2      sandstone-low conductivity           1                               0.5
3      sandstone-high conductivity          50                              20
4      limestone                            30                              10
5      limestone-low conductivity           1                               0.1
6      dolomite-high conductivity           100                             30
7      dolomite                             30                              10
8      domerite (clayey dolomite)           0.1                             0.01
9      clay                                 0.00001                         0.00001
10     loam                                 0.001                           0.001
11     sandy loam                           0.001                           0.001
12     silt                                 0.01                            0.01
13     sand                                 5                               5
14     gravel                               20                              20
15     gypsum                               10                              10
16     peat                                 10                              10
17     gyttja and other biogenic sediments  0.0003                          0.0003
18     soil                                 1                               1
19     other                                1                               1
20     concrete                             0                               0
The degree of similarity within clusters and the spatial heterogeneity of the cluster polygons can be varied by flattening the hierarchical cluster model into a variable number of clusters. Such an approach provides a scalable generalization solution. It can be scaled from broad to more detailed generalization, depending on the model and the structural characteristics of the modelling territory, with the help of the clustering dendrogram (Figure 2).
Using the dissimilarity matrix of the NCD metric, the borehole most similar to all the others from the lithological structure point of view can be identified from the subset of all cluster member boreholes. The log structure of this borehole is then applied throughout the territory of the corresponding spatial cluster. The spatial locations of the same logical cluster can be disjoint, lying in different parts of the modelling territory, and still have the same borehole log column representing the lithological structure throughout all spatial clusters with the same logical cluster identifier.
162 model geometries were prepared, incorporating different candidate representations of Quaternary strata made from spatial clusters with the most typical borehole column, selected using the NCD metric. The borehole log column lithological structure is composed using aggregate lithology types, as shown in Table 1 with their conductivity values.
To integrate these results into the geometry of the regional groundwater model MOSYS V1, the algorithm shown in Figure 3 calculates mesh point heights in meters above sea level for every mesh point in every newly created Quaternary layer. After the layer surface calculations, the mesh element cluster membership is determined by intersecting element centroids (white dots in Figure 4) with the spatial clusters.
The performance of the resulting geometries is detailed in Figures 8-11. For both matrix variants (NCD and NCD+E) 82 model geometries, based on different flattenings of the cluster dendrogram (Figure 2), were prepared and run in the MOSYS V1 modelling environment. From the initial model run results in Figures 8 and 10, the 3 better performing geometries, one median geometry and the worst performing geometry were run through the MOSYS autocalibration routine. This provided insight into their sensitivity to calibration through further optimization of hydraulic conductivity values in Quaternary strata (Figure 12). The model geometry in which the Quaternary clustering experiment geometries were integrated had already been calibrated until convergence before these experiments (Klints et al. 2012).
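The selection of the most typical borehole per cluster, described in the Methods text above, can be sketched as a medoid search over the dissimilarity matrix. Taking the member with the smallest summed distance to the other members of its cluster is an assumption made here for illustration; the poster does not state the exact criterion, and the matrix values and function name below are hypothetical.

```python
import numpy as np


def most_typical_boreholes(dist, labels):
    """Map each cluster identifier to the index of its medoid borehole.

    The medoid is taken as the member with the smallest summed distance to
    all other members of the same cluster (assumed criterion). Ties resolve
    to the member with the lowest index.
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    typical = {}
    for cluster_id in np.unique(labels):
        members = np.flatnonzero(labels == cluster_id)
        within = dist[np.ix_(members, members)]
        typical[int(cluster_id)] = int(members[within.sum(axis=1).argmin()])
    return typical


# Hypothetical NCD dissimilarity matrix for five boreholes in two clusters.
dist = np.array([
    [0.0, 0.2, 0.3, 0.9, 0.8],
    [0.2, 0.0, 0.1, 0.9, 0.9],
    [0.3, 0.1, 0.0, 0.8, 0.9],
    [0.9, 0.9, 0.8, 0.0, 0.4],
    [0.8, 0.9, 0.9, 0.4, 0.0],
])
labels = np.array([1, 1, 1, 2, 2])
typical = most_typical_boreholes(dist, labels)
```

The log column of each returned borehole would then be applied to every spatial cluster polygon that carries the same logical cluster identifier.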
This work was supported by the European Social Fund project "Establishment of interdisciplinary scientist group and modelling system for ground-water research", 2009/0212/1DP/1.1.1.2.0/09/APIA/VIAA/060