Data Mining and Data-Driven Modelling in Engineering Geology Applications
126
Angelo Doglioni, Annalisa Galeandro, and Vincenzo Simeone
Abstract
During the last decade, the increasing availability of monitoring and measurement data, together with widespread computational power, has encouraged scientists and practitioners to use data-mining techniques to improve the knowledge of natural and engineering systems and to build models identified from data, namely data-driven models. Interesting results can be obtained from both the practical and the scientific viewpoints. Here, a review of some of the most widely used data-driven techniques is given, showing how they are used in an Engineering Geology framework. Some specific examples are provided, emphasizing the potential of data-driven modelling applied to Engineering Geology.
Keywords
Data mining · Data-driven · Non-linear behavior · Numerical modelling · Engineering geology
126.1 Introduction
Modelling natural systems and phenomena, such as those addressed by Engineering Geology, is often difficult because of their complexity and non-linearity. Traditional modelling approaches are sometimes not fully adequate for such problems, since they are often unable to interpret the complexity of the underlying phenomena. Data-mining and data-driven approaches make it possible to obtain flexible mathematical relationships that effectively represent the dynamics of systems, starting from historical sequences of input-output data (Ljung 1999). Models identified from measured data can describe systems while being relatively easy for geologists to use, and they sometimes provide physical information, i.e. scientific knowledge discovery. For this purpose, data-mining techniques can be supported by traditional knowledge, allowing for the development of integrated strategic approaches that link available data with the dynamics of the natural system. These approaches can capture correlations that affect the dynamics of the system and are not always obvious.
Data-driven models have proved efficient and effective at modelling manifold problems related to engineering geology: groundwater dynamics, landslide susceptibility and reactivation, foundation settlements, stress-strain relations of rocks and soils, hazard zoning and foundation engineering. However, their use is still limited, and some data-driven techniques in particular are only marginally used. There are many classes of data-driven models, for instance Artificial Neural Networks (ANN), Evolutionary Modelling (EM), Time-Frequency Analysis (TFA), State-Space Modelling (SSM) and Local Regression Tree (LRT). Although other general-purpose data-driven techniques exist, these are probably the best known and most widely applied. A short review follows of three interesting data-driven modelling techniques applied for engineering geology purposes: ANNs, EMs and TFA. For each of them, some applications are reported, showing interesting results and the potential for wider usage.
A. Doglioni, A. Galeandro, V. Simeone: Department of Civil Engineering and Architecture, Technical University of Bari, via E. Orabona 4, 70125 Bari, Italy
G. Lollino et al. (eds.), Engineering Geology for Society and Territory – Volume 5, DOI: 10.1007/978-3-319-09048-1_126, © Springer International Publishing Switzerland 2015
126.2 Artificial Neural Networks
Based on an understanding of the brain and its associated neural systems, ANNs use highly simplified models composed of many processing elements (neurons) connected by links of variable weights (parameters) to form a black-box representation of systems (Haykin 1999). The greatest advantage of ANNs over other modelling techniques is their capability to model complex, non-linear processes without assuming in advance the form of the relationship between input and output variables. Learning in ANNs involves tuning the parameters (weights) of the interconnections in a highly parameterized system. ANNs are quite popular for modelling the complex behaviour of geotechnical materials: Shahin et al. (2009) provide a review of these applications. ANNs are also used to model shallow and deep foundations, in order to assess possible settlements or bearing capacity (e.g. Javadi and Rezania 2009; Kuo et al. 2009). Further applications of ANNs are available for modelling landslide susceptibility. Kawabata and Bandibas (2009) focus on the use of an ANN to quantitatively model the relationship between landslide occurrence and geologic and morphologic factors, such as geology, slope, elevation, aspect, distance from the nearest geologic boundary and density of geologic boundaries, in order to generate landslide susceptibility maps. Giustolisi and Simeone (2006) introduce a hybrid evolutionary approach for the structural and input optimization of ANNs, in order to predict the dynamic response of the groundwater table to rainfall (Fig. 126.1). This is pursued by modelling the non-linear relationship between monthly groundwater table levels and rainfall amount.
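As a hedged illustration of the weight-tuning idea described in this section, the following minimal Python sketch trains a one-hidden-layer network by stochastic gradient descent on synthetic rainfall/groundwater-level pairs. The saturating response curve, the number of neurons, the learning rate and all function names are assumptions made for illustration only; this is not the network design used in the cited studies.

```python
import math
import random


def train_ann(samples, n_hidden=4, epochs=2000, lr=0.05, seed=0):
    """Fit y ~ sum_j v[j] * tanh(w[j] * x + b[j]) + c by stochastic gradient descent."""
    rng = random.Random(seed)
    w = [rng.uniform(-1, 1) for _ in range(n_hidden)]  # input-to-hidden weights
    b = [rng.uniform(-1, 1) for _ in range(n_hidden)]  # hidden biases
    v = [rng.uniform(-1, 1) for _ in range(n_hidden)]  # hidden-to-output weights
    c = 0.0                                            # output bias
    for _ in range(epochs):
        for x, y in samples:
            h = [math.tanh(w[j] * x + b[j]) for j in range(n_hidden)]
            err = sum(v[j] * h[j] for j in range(n_hidden)) + c - y
            for j in range(n_hidden):
                # Gradient of 0.5 * err**2 through the tanh activation.
                grad_h = err * v[j] * (1.0 - h[j] ** 2)
                v[j] -= lr * err * h[j]
                w[j] -= lr * grad_h * x
                b[j] -= lr * grad_h
            c -= lr * err
    return w, b, v, c


def predict(params, x):
    w, b, v, c = params
    return sum(vj * math.tanh(wj * x + bj) for wj, bj, vj in zip(w, b, v)) + c


def mse(params, samples):
    return sum((predict(params, x) - y) ** 2 for x, y in samples) / len(samples)


# Synthetic data (assumption): normalised rainfall x in [0, 1] and a
# saturating, non-linear groundwater-level response y.
samples = [(i / 20.0, 1.0 - math.exp(-3.0 * i / 20.0)) for i in range(21)]

untrained = train_ann(samples, epochs=0)  # same initial weights, no learning
trained = train_ann(samples)
mse_before = mse(untrained, samples)
mse_after = mse(trained, samples)
print(f"MSE before training: {mse_before:.4f}, after: {mse_after:.4f}")
```

The network is never told the form of the input-output relationship; only the weights are adjusted to reduce the prediction error. In practice, the error would be checked on data not used for training.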
126.3 Evolutionary Modelling
Evolutionary Modelling (EM) is an evolutionary-computing-based method that generates a “transparent” and structured representation of the system being studied by means of explicit equations. The most frequently used EM method is so-called symbolic regression, which was proposed by Koza (1992). The technique creates mathematical expressions to fit a set of data using an evolutionary process. The EM process mimics natural selection, as the fitness of the solutions in the population improves through the generations. The term fitness in this instance refers to a measure of how closely the expressions fit the data points. Chang and Chien (2007) proposed a Genetic Algorithm (GA) (Goldberg 1989) for predicting the occurrence of debris flows. A multi-objective technique, namely Multi-objective Evolutionary Polynomial Regression (EPR-MOGA), introduced by Giustolisi and Savic (2009), has been successfully applied to manifold problems related to engineering geology. It proved effective at modelling problems related to groundwater dynamics (Giustolisi et al. 2008; Doglioni et al. 2010, 2011, 2014) and to geotechnical, structural and earthquake engineering (e.g. Javadi et al. 2006; Rezania et al. 2008). Doglioni et al. (2012) used this approach to model the activation of a deep-seated landslide, using past cumulative rainfall values (Fig. 126.2). In all these applications, EPR-MOGA models proved easy to use and more accurate than the existing models.
Fig. 126.1 Sampled and ANNMOGA returned values for a case study investigating the dynamic response of a porous aquifer to rainfall
Fig. 126.2 EPR-MOGA prediction of landslide reactivations: values higher than 0.5 indicate reactivation; 10 out of 11 reactivation events are successfully predicted
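The evolutionary search for an explicit equation can be sketched in a few lines of Python. The example below is a deliberately simplified, single-objective illustration and not EPR-MOGA itself: it evolves the two exponents of a single-term expression y = a * x1^e1 * x2^e2, computes the coefficient a by least squares, and uses the sum of squared errors as the (inverse) fitness. The candidate exponent set, the population size, the mutation rule and the synthetic data are all assumptions.

```python
import random

EXPONENTS = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]  # candidate exponents (assumed)


def fit_term(exps, data):
    """Least-squares coefficient and squared error for y ~ a * x1**e1 * x2**e2."""
    terms = [x1 ** exps[0] * x2 ** exps[1] for x1, x2, _ in data]
    a = sum(t * y for t, (_, _, y) in zip(terms, data)) / sum(t * t for t in terms)
    sse = sum((a * t - y) ** 2 for t, (_, _, y) in zip(terms, data))
    return a, sse


def evolve(data, pop_size=30, generations=200, seed=1):
    """Evolve exponent pairs; a lower squared error means a higher fitness."""
    rng = random.Random(seed)
    pop = [(rng.choice(EXPONENTS), rng.choice(EXPONENTS)) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better half of the population unchanged.
        pop.sort(key=lambda e: fit_term(e, data)[1])
        survivors = pop[: pop_size // 2]
        # Mutation: copy a survivor and resample each exponent with probability 0.5.
        children = []
        while len(survivors) + len(children) < pop_size:
            parent = rng.choice(survivors)
            children.append(
                tuple(rng.choice(EXPONENTS) if rng.random() < 0.5 else g for g in parent)
            )
        pop = survivors + children
    best = min(pop, key=lambda e: fit_term(e, data)[1])
    return best, fit_term(best, data)


# Synthetic, noise-free data (assumption) generated by y = 2 * x1**2 / x2.
rng = random.Random(0)
data = []
for _ in range(25):
    x1, x2 = rng.uniform(1.0, 3.0), rng.uniform(1.0, 3.0)
    data.append((x1, x2, 2.0 * x1 ** 2 / x2))

best_exps, (coef, sse) = evolve(data)
print("exponents:", best_exps, "coefficient:", round(coef, 6), "SSE:", sse)
```

The result is an explicit formula whose structure can be read and interpreted, which is the sense in which such models are called transparent. With noisy field data, several candidate structures would typically be compared for both accuracy and parsimony.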
126.4 Time-Frequency Analysis
Time-Frequency Analysis (TFA) includes mathematical methods that rely on the operators of translation and modulation of data series (Gröchenig 2001). An interesting application of these approaches is the analysis of Digital Elevation Models (DEM) by the Discrete Wavelet Transform (DWT) (Daubechies 1992), for the identification of anomalies or unique features of the surficial topography. Kalbermatten et al. (2012) introduced wavelet transforms as space-frequency descriptors, in order to detect geomorphological structural elements in a valley. The scale of topographical features is analysed through a wavelet-transform-based discretization of the continuous scale, filtering by successive elimination of the low-pass information contained in the DEM. Then, using the inverse wavelet transform, a high-resolution image containing only high-pass information for a series of scales is constructed, which makes it possible to delineate a landslide from a DEM. Doglioni and Simeone (2014) used the DWT, in particular the analysis of the detail coefficients of the wavelet transform, to provide evidence of anomalies or singularities, i.e. discontinuities of the land surface that are not clearly evident from the DEM itself. Doglioni (2013) used the DWT to investigate tectonics at a regional scale. Detail coefficients are analysed because their variations are associated with slope discontinuities or sudden variations of the DEM; see Fig. 126.3, where high-value detail coefficients delineate the main structural discontinuities when compared, for instance, with a geo-structural map (Festa 2003). Therefore, detail coefficients returned by the DWT may be correlated to the discontinuities of the topographic surface and analysed at a regional scale in order to outline the main tectonic structures of a region.
Fig. 126.3 DWT of a DEM: detail coefficients higher than 30 in yellow, showing how they delineate the main tectonic structures
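To make the role of the detail coefficients concrete, the sketch below applies a one-level two-dimensional Haar transform, the simplest DWT, to a small synthetic DEM containing a scarp. The grid, the 40 m step, the threshold of 30 and the function name haar_dwt2 are assumptions chosen for illustration; the cited studies do not state that this particular wavelet or implementation was used.

```python
def haar_dwt2(dem):
    """One-level 2D Haar transform of a grid with even numbers of rows and columns.

    Returns the approximation and three detail grids: differences between
    rows, differences between columns, and diagonal differences.
    """
    approx, d_rows, d_cols, d_diag = [], [], [], []
    for i in range(0, len(dem), 2):
        row_a, row_r, row_c, row_d = [], [], [], []
        for j in range(0, len(dem[0]), 2):
            a, b = dem[i][j], dem[i][j + 1]          # upper cells of the 2x2 block
            c, d = dem[i + 1][j], dem[i + 1][j + 1]  # lower cells of the 2x2 block
            row_a.append((a + b + c + d) / 2.0)  # low-pass (smoothed elevation)
            row_r.append((a + b - c - d) / 2.0)  # change from upper to lower row
            row_c.append((a - b + c - d) / 2.0)  # change from left to right column
            row_d.append((a - b - c + d) / 2.0)  # diagonal change
        approx.append(row_a)
        d_rows.append(row_r)
        d_cols.append(row_c)
        d_diag.append(row_d)
    return approx, d_rows, d_cols, d_diag


# Synthetic 8 x 8 DEM (assumption): a gentle ramp of 1 m per column with a
# 40 m drop between columns 4 and 5, standing in for a structural scarp.
dem = [[100.0 + j - (40.0 if j >= 5 else 0.0) for j in range(8)] for i in range(8)]

approx, d_rows, d_cols, d_diag = haar_dwt2(dem)

# Flag blocks whose column-wise detail coefficient exceeds an assumed threshold.
THRESHOLD = 30.0
flagged = [
    (i, j)
    for i in range(len(d_cols))
    for j in range(len(d_cols[0]))
    if abs(d_cols[i][j]) > THRESHOLD
]
print(flagged)  # → [(0, 2), (1, 2), (2, 2), (3, 2)]
```

The gentle ramp yields small detail coefficients (-1 in every block), whereas the blocks straddling the scarp return 39 and line up along the discontinuity. A single-level, non-overlapping Haar transform only detects jumps that fall inside a 2 x 2 block, so a practical analysis would rely on a dedicated wavelet library and several decomposition levels.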
126.5 Conclusions
The potential of data-driven models and some of their applications in engineering geology were presented here. In particular, this work aims at emphasizing the little-known potential of data-driven modelling applied to engineering geology, thus introducing a valid alternative framework to classical physically based approaches. Among the data-driven approaches, the review focused on ANNs, EMs and TFA, presenting some applications to engineering geology problems. The results emphasize how a data-driven approach can successfully model complex systems, starting from measured data. This is of particular interest for natural systems and specifically for earth science and engineering geology problems, since these often exhibit complex dynamics that can hardly be approached by traditional physically based techniques. Finally, it is also shown how a data-driven technique can be used both for predicting the behaviour of a system and for scientific knowledge discovery, as in the EPR-MOGA applications. Some limits on their use are related to the availability of data. Other problems may be related to potential numerical problems, to data over-fitting and, finally, to the difficulty of generalizing to other situations.
References
Chang TC, Chien TH (2007) The application of genetic algorithm in debris flow prediction. Environ Geol 53(2007):339–347. doi:10.1007/s00254-007-0649-2
Daubechies I (1992) Ten lectures on wavelets. SIAM, Philadelphia, p 377
Doglioni A (2013) The use of discrete wavelet transform for the analysis of topographic surface for geological purposes. Rend Online Soc Geol It 24:104–106
Doglioni A, Simeone V (2014) Geomorphometric analysis based on discrete wavelet transform. Environ Earth Sci 71(7):3095–3108. doi:10.1007/s12665-013-2686-3
Doglioni A, Mancarella D, Simeone V, Giustolisi O (2010) Inferring groundwater system dynamics from time series data. Hydrolog Sci J 55(4):593–608. doi:10.1080/02626661003747556
Doglioni A, Galeandro A, Simeone V (2011) A data-driven model of the shallow porous aquifer of south Basilicata - Italy. Adv Res Aquat Environ Environ Earth Sci 1:233–240. doi:10.1007/978-3-642-19902-8_27
Doglioni A, Fiorillo F, Guadagno FM, Simeone V (2012) Evolutionary polynomial regression to alert rainfall-triggered landslide reactivation. Landslides 9(1):53–62. doi:10.1007/s10346-011-0274-8
Doglioni A, Galeandro A, Simeone V (2014) Evolutionary data-driven modelling of Salento shallow aquifer response to rainfall. In: Lollino G, Arattano M, Rinaldi M, Giustolisi O, Marechal J-C, Grant GE (eds) Engineering geology for society and territory, vol 3. Springer, Berlin
Festa V (2003) Cretaceous structural features of the Murge area (Apulian Foreland, Southern Italy). Eclogae Geologicae Helvetiae 96:11–22. doi:10.1007/s00015-003-1076-3
Giustolisi O, Savic DA (2009) Advances in data-driven analyses and modelling using EPR-MOGA. J Hydroinform 11(3–4):225–236. doi:10.2166/hydro.2009.017
Giustolisi O, Simeone V (2006) Optimal design of artificial neural networks by a multi-objective strategy: groundwater level predictions. Hydrolog Sci J 51(3):502–523. doi:10.1623/hysj.51.3.502
Giustolisi O, Doglioni A, Savic DA, di Pierro F (2008) An evolutionary multi-objective strategy for the effective management of groundwater resources. Water Resour Res 44(1):W01403. doi:10.1029/2006WR005359
Goldberg DE (1989) Genetic algorithms in search, optimization, and machine learning. Addison-Wesley Longman Publishing Co., Inc., Boston, p 432
Gröchenig K (2001) Foundations of time-frequency analysis. Birkhäuser, Berlin, p 359
Haykin S (1999) Neural networks: a comprehensive foundation, 2nd edn. Prentice-Hall Inc., Upper Saddle River, p 842
Javadi AA, Rezania M (2009) Applications of artificial intelligence and data mining techniques in soil modelling. Geomech Eng; Int J 1(1):53–74. doi:10.12989/gae.2009.1.1.053
Javadi AA, Rezania M, Nezhad MM (2006) Evaluation of liquefaction induced lateral displacements using genetic programming. Comput Geotech 33(4–5):222–233. doi:10.1016/j.compgeo.2006.05.001
Kalbermatten M, Van De Ville D, Turberg P, Tuia D, Joost S (2012) Multiscale analysis of geomorphological and geological features in high resolution digital elevation models using the wavelet transform. Geomorphology 138(1):352–363. doi:10.1016/j.geomorph.2011.09.023
Kawabata D, Bandibas DJ (2009) Landslide susceptibility mapping using geological data, a DEM from ASTER images and an artificial neural network (ANN). Geomorphology 113(2009):97–109. doi:10.1016/j.geomorph.2009.06.006
Koza JR (1992) Genetic programming: on the programming of computers by natural selection. MIT Press, Cambridge, p 840
Kuo YL, Jaksa MB, Lyamin AV, Kaggwa WS (2009) ANN-based model for predicting the bearing capacity of strip footing on multilayered cohesive soil. Comput Geotech 36(3):503–516. doi:10.1016/j.compgeo.2008.07.002
Ljung L (1999) System identification: theory for the user, 2nd edn. Prentice-Hall Inc., Upper Saddle River, p 672
Rezania M, Javadi AA, Giustolisi O (2008) An evolutionary-based data mining technique for assessment of civil engineering systems. J Eng Comput 25(6):500–517. doi:10.1108/02644400810891526
Shahin MA, Jaksa MB, Maier HR (2009) Recent advances and future challenges for artificial neural systems in geotechnical engineering applications. Adv Artif Neural Syst 308239:9. doi:10.1155/2009/308239