Original Paper Landslides DOI 10.1007/s10346-015-0565-6 Received: 8 September 2014 Accepted: 12 February 2015 © Springer-Verlag Berlin Heidelberg 2015

Paraskevas Tsangaratos · Ioanna Ilia

Landslide susceptibility mapping using a modified decision tree classifier in the Xanthi Perfection, Greece

Abstract The objective of this study was to validate the outcomes of a modified decision tree classifier by comparing the produced landslide susceptibility map with the actual landslide occurrence in an area of intensive landslide manifestation, the Xanthi Prefecture, Greece. The values of eight landslide conditioning factors for 163 landslide and 163 non-landslide locations were extracted using advanced spatial GIS functions. The eight conditioning factors included in the landslide database used in the training phase were lithological units, elevation, slope angle, slope aspect, distance from tectonic features, distance from hydrographic network, distance from geological boundaries and distance from road network. The landslide and non-landslide locations were randomly divided into two subsets: 80 % of the data (260 instances) were used for training and 20 % of the data (66 instances) for validating the developed classifier. The outcome of the decision tree classifier was a set of rules that expressed the relationship between the landslide conditioning factors and the actual landslide occurrence. The landslide susceptibility belief values were obtained by applying a statistical method, the certainty factor method, to measure the belief in each rule that the decision tree classifier produced, transforming the discrete result into a continuous value that enabled the generation of a landslide susceptibility belief map. In total, four landslide susceptibility maps were produced, using the certainty factor method, the Iterative Dichotomizer version 3 algorithm, the J48 algorithm and the modified Iterative Dichotomizer version 3 model, in order to evaluate the performance of the developed classifier. The validation results showed that the area under the ROC curve for the models varied from 0.7936 to 0.8397 for the success rate curves and from 0.7766 to 0.8035 for the prediction rate curves.
The success rate and prediction rate curves showed that the modified Iterative Dichotomizer version 3 model had a slightly higher performance, with 0.8397 and 0.8035, respectively. From the outcomes of the study, it was concluded that the developed modified decision tree classifier could be efficiently used for landslide susceptibility analysis and in general might be used for classification and estimation purposes in spatial predictive models.

Keywords Landslide susceptibility . Decision tree . Certainty factor . Greece

Introduction

Landslides are considered one of the most frequent and disastrous natural hazards worldwide (Schuster 1996; Glade et al. 2005). Since the early 1990s, there has been keen interest in the international scientific community in the study of landslide phenomena, mainly because of the growing interest in their socioeconomic impact and the growing pressure of development and urbanization on the environment (Aleotti and Chowdhury 1999). The proclamation of the decade 1990–2000 as the International Decade for Natural Disaster Reduction (IDNDR) by the United Nations had a positive impact in encouraging the scientific community and all related communities to find solutions to the problem of natural disasters. The effort was focused on developing effective strategies for predicting and mitigating the effects of natural hazards. In the last two decades, the development of GIS data-processing techniques and the configuration of more advanced qualitative and quantitative techniques have allowed the production of numerous studies concerning landslide risk, hazard and susceptibility analysis (Van Westen et al. 2006; Fell et al. 2008). Assuming that landslides will occur in the future in areas that share the same conditions that produced them in the past, susceptibility assessments can be used to predict the geographical location of future landslides (Guzzetti et al. 1999, 2005, 2006). On this basis, susceptibility zoning can be thought of as a process that provides the spatial distribution and rating of terrain units according to their propensity to produce landslides. Specifically, it refers to a process that provides highly valued knowledge that is influenced by topographical, geological-geotechnical, environmental and anthropogenic factors (Fell et al. 2008). Among the most used methods for the prediction of landslide susceptibility are those based on weighting the landslide-related factors using statistical and probabilistic techniques (Guzzetti et al. 1999; Lee and Min 2001; Lee and Pradhan 2007; Nandi and Shakoor 2010; Cervi et al. 2010; Pourghasemi et al. 2012a, b, c; Tien Bui et al. 2013; Sabatakakis et al. 2013; Tsangaratos et al. 2014; Pourghasemi et al. 2014). Specifically, numerous studies can be found that use bivariate statistical models (Thiery et al. 2007; Magliulo et al. 2008; Yilmaz et al. 2012), multivariate models that implement discriminant analysis (Lee et al. 2008) or linear and logistic regression (Dai and Lee 2003; Ayalew and Yamagishi 2005; Duman et al.
2006; Akgun 2012; Pourghasemi et al. 2013; Kavzoglu et al. 2014), frequency ratio (Lee and Sambath 2006; Lee and Pradhan 2006, 2007; Yilmaz 2010; Akinci et al. 2011), the certainty factor approach (Binaghi et al. 1998; Lan et al. 2004; Sujatha et al. 2012; Devkota et al. 2013), and Dempster-Shafer and weight of evidence models (Lee et al. 2004a; Tangestani 2009; Ilia et al. 2010; Park 2010; Neuhauser et al. 2012; Mohammady et al. 2012; Kouli et al. 2014). However, as stated extensively in the international scientific literature, the spatial analyses required during a landslide assessment involve the manipulation of very large geospatial databases, in which traditional spatial analytical techniques and statistical and probabilistic methods cannot easily discover new and unexpected patterns, trends and relationships (Miller and Han 2001). An area of research that promises to overcome such difficulties is found in the domain of soft computing (SC). Specifically, SC has been applied for landslide susceptibility, hazard and risk evaluation. SC methods include artificial neural networks (Lee et al. 2003; Ermini et al. 2005; Gomez and Kavzoglu 2005; Ferentinou and Sakellariou 2007; Caniani et al. 2008; Melchiore et al. 2008; Lee et al. 2004b; Marjanovic et al. 2009; Pradhan and Lee 2009, 2010b; Sezer et al. 2011; Tien Bui et al. 2012a; Zare et al. 2013; Tsangaratos and Benardos 2014), decision tree models (Flentje et al. 2007; Saito et al. 2009; Wan 2009; Nefeslioglu et al. 2010; Yeon et al. 2010; Tien Bui et al. 2012b; Tsangaratos 2012; Felicisimo et al. 2013; Pradhan 2013), the fuzzy logic approach (Juang et al. 1992; Binaghi et al. 1998; Ercanoglu and Gokceoglu 2002, 2004; Pradhan et al. 2009; Akgun et al. 2012; Pourghasemi et al. 2012a, b, c; Thiery et al. 2014) and neuro-fuzzy models (Elias and Bandis 2000; Kanungo et al. 2006; Pradhan et al. 2010; Vahidnia et al. 2010; Oh and Pradhan 2011; Sezer et al. 2011; Tien Bui et al. 2012c; Pradhan 2013).

Saito et al. (2009) utilized a decision tree (DT) model for landslide susceptibility mapping in the Akaishi Mountains, Japan, and concluded that the decision tree model produced results with appropriate accuracy for estimating the probabilities of future landslides. Yeon et al. (2010) applied a decision tree, using Quinlan's C4.5 algorithm, to produce a landslide susceptibility map in Injae, Korea. After the tree construction process, leaf nodes were evaluated relative to one another by the m-branch smoothing method, a method used for representing susceptibility. The accuracy of a two-fold cross-validation was estimated to be 86.06 %, while the accuracy using all known data was estimated to be 89.26 %. Tien Bui et al. (2012b) investigated and compared the results of three data mining approaches, the support vector machines, decision tree and naive Bayes models, for spatial prediction of landslide hazards in the Hoa Binh province, Vietnam. Although the DT was found to have the lowest prediction capability, it had three strong advantages: (a) it was easy to construct, (b) the resulting models could be easily interpreted and (c) the models provided clear information on the relative importance of the input factors.
Pradhan (2013) also compared the prediction performances of three different approaches, decision tree, support vector machine and adaptive neuro-fuzzy inference system, for landslide susceptibility mapping in the Penang Hill area, Malaysia. For the decision tree model, the author selected the CHAID method on the basis of its properties. According to the results, the DT model had a slightly higher prediction performance when the full ranges of factors were used. The main difference between the proposed methodology and the previously mentioned studies is that the modified DT is grown to its full extent and, to minimize the possibility of overfitting, a phenomenon that is responsible for poor prediction power, the certainty factor method is implemented. The certainty factor method is a statistical method for managing uncertainty in rule-based systems. In our implementation, the certainty factor value (CFv) for each class of each landslide-related variable is calculated. By combining the CFv pairwise, according to specific integration rules, the belief values of each decision rule produced by the DT classifier are calculated. The final CFv expresses the landslide susceptibility belief value for each grid cell, and a landslide susceptibility belief map was produced with the use of GIS technology. To evaluate the performance of the developed methodology, the produced susceptibility map was compared with the outcomes of a DT model that follows the J48 algorithm. The computation process was carried out using WEKA ver. 3.6.6 (Hall et al. 2009) for the J48 and ID3 algorithms and Visual Basic ver. 6.0 (Bradley and Millspaugh 2001) for the modified ID3 algorithm, while ArcGIS 9.3 (Ormsby et al. 2008) was used for compiling and analyzing the data and also for producing the landslide susceptibility maps.

The main objective of the present paper was to provide a methodological approach for predicting landslide susceptibility in an area of interest, based on methods derived from the SC domain, such as those based on DT algorithms. The use of DT algorithms has received considerable attention in recent years in various geo-engineering applications, mainly due to their capability of solving problems and representing the solution in a rule-based format which can be easily understood and interpreted by humans.

Study area and available data

Study area

The study area was the wider area of the Prefecture of Xanthi, between longitudes 560,000 and 600,000 and latitudes 4,584,650 and 4,557,250 in the GGRS 1987 reference coordinate system, covering approximately 800 km². The area is bounded to the north by the Greek-Bulgarian border and extends to the south up to the Neogene Thrace basin (Fig. 1). The area is characterized as highly mountainous, with elevation values ranging between 30 and 1,800 m. The geomorphological pattern includes gullies and small branches, a typical morphological pattern that is found in areas where impermeable rocks cover the surface. The mean annual rainfall of the area is approximately 1,100 mm, which mainly falls during the winter period (Ilias et al. 2000). The vegetation of the area is generally characterized as extensive, with broadleaf forests covering 60 % of the land. The wider area exhibits moderate to low seismic activity.

Data

To extract the landslide parameters, a 1:50,000-scale topographic map and a 1:25,000-scale engineering geological map were used. The ground conditions and the morphological settings of the area could be considered the primary factors for numerous shallow and relatively small landslides triggered by physical (e.g. intense, short-period rainfall) or man-made processes.
The landslide conditioning factors involved a variety of input layers, some being directly digitized from the original thematic maps, while others were derived from spatial GIS calculations. All landslide conditioning factors were converted into 20 × 20 m float-type raster files. These input raster layers included a lithological unit layer (LUL), a distance from geological boundaries layer (DGBL), a distance from tectonic characteristic layer (DTL), a distance from hydrographic network layer (DHNL), a distance from road network layer (DRNL), a slope angle layer (SANL), a slope aspect layer (SASL) and an elevation layer (EL). The most essential stage of the implementation is the process of dividing each input raster layer into classes. Expert knowledge and statistical analysis can be useful means of defining the classes. Specifically, the statistical measures commonly used in descriptive analysis to describe a data set are measures of central tendency and measures of variability or dispersion. The mean, median, mode, standard deviation, range of values, kurtosis and skewness of the variables were estimated and helped to define the classes.

Fig. 1 The study area

The lithological data were obtained in the form of a digital vector map that had the same resolution and projection as those of the digital elevation model. The wider research area consisted of marble and schist formations, magmatites, gneisses, amphibolites and ultramafic rocks of Palaeozoic age. Also, large areas are covered by Tertiary molassic and igneous formations (Liati and Seidel 1996). In the present study, the lithological unit layer was reclassified into five classes of different landslide susceptibility levels, namely, class A, formations of very low susceptibility (marbles, gneiss-leptinite); class B, formations of low susceptibility (Tertiary conglomerate, schist-gneiss, dacite-andesite, gabbro-diorite, metagranodiorite); class C, formations of medium susceptibility (amphibolite, gneiss with intrusions of white coarse crystalline marble); class D, formations of high susceptibility (granodiorite, ophiolite); and class E, formations of very high susceptibility (loose Quaternary deposits, flysch, migmatite and granite-gneiss) (Fig. 2a). The five landslide susceptibility levels were formed by combining our findings during the field survey and the geo-mechanical behaviour described in previous studies conducted in the wider area (Ilias et al. 2000).

As reported in the literature, geological boundaries seem to have an influence on landslide occurrence (Kawabata and Bandibas 2009). It is believed that areas closer to geological boundaries have lower geo-mechanical values than the parent intact formation and as a consequence could be more susceptible to instability. In the present study, the Euclidean distance between each grid cell of the research area and the geological boundaries was calculated, and the areas were reclassified into three (3) classes, A, B and C (Fig. 2b). Class A includes areas that are less than 200 m from geological boundaries, class B includes areas at distances between 201 and 400 m, and class C includes areas at distances greater than 401 m. The range of boundaries for each class was based on the relative spatial distribution and the statistical indexes, like mean distance and standard deviation, of the landslide database.
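The distance-based reclassification described above can be illustrated with a minimal sketch. It assumes that geological boundaries are approximated by a list of vertex coordinates in metres and that the class limits are inclusive at 200 m and 400 m; the function names and the sample coordinates are hypothetical and are not taken from the study, which performed this step in ArcGIS.

```python
import math


def nearest_distance(cell_xy, boundary_points):
    """Euclidean distance (m) from a cell centre to the closest boundary vertex."""
    cx, cy = cell_xy
    return min(math.hypot(cx - bx, cy - by) for bx, by in boundary_points)


def distance_class(distance_m):
    """Map a distance to class A, B or C (limits assumed inclusive at 200 and 400 m)."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if distance_m <= 200:
        return "A"
    if distance_m <= 400:
        return "B"
    return "C"


# Hypothetical boundary vertices and cell centres, in projected metres
boundary = [(0.0, 0.0), (0.0, 100.0)]
cells = [(30.0, 40.0), (300.0, 0.0), (0.0, 600.0)]
classes = [distance_class(nearest_distance(c, boundary)) for c in cells]
print(classes)
```

The same pattern would apply to the other distance layers, with their own class limits substituted.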
The tectonic characteristics, mainly faults, thrusts and overthrusts, were mapped based on field work and aerial photographs (Ilias et al. 2000). The research area was reclassified into three (3) categories: areas at a distance of less than 250 m from tectonic features, areas at a distance between 251 and 500 m and areas at a distance greater than 501 m from tectonic features (Fig. 2c). The range of boundaries for each class was also based on statistical analysis and the spatial distribution of the recorded landslides. It is commonly accepted that an area in proximity to the hydrographic network is more likely to have higher water content than an area that is more distant from the network (Wan et al. 2010). This may influence the stability of the area, since wet conditions affect the stability and the underground flow. In the present study, the values for each class were based upon expert knowledge and were obtained with the ArcGIS spatial module using the Euclidean distance between the sample grid cell and the nearest hydrographic network (Fig. 2d), producing four (4) classes (451 m). Road cuts are considered sites of high instability induced by human intervention. According to Pradhan and Lee (2010a), a given road segment may act as a barrier, a net source, a net sink or a corridor for water flow, and it usually serves as a source of landslides. If a road is near a sample grid cell, the slope may present instability problems due to human disturbance (Fig. 2e). The layer was reclassified into four (4) classes, 601 m. The elevation of a surface is considered to be formed by the combined action of tectonic activity and weathering and erosion processes and is also related to the action of the climatic conditions through a complex interactive influence (Dai et al. 2002). According to Pachauri and Pant (1992), altitude can be considered a variable that indirectly contributes to the manifestation of slope failure. In the present study, the elevation layer was classified into four (4) classes (801 m) based on expert knowledge and statistical analysis (Fig. 2f).
Fig. 2 a Lithological units. b Distance from geological boundaries. c Distance from tectonic features. d Distance from hydrographic network. e Distance from road network. f Elevation. g Slope aspect. h Slope angle

The slope angle and the slope aspect were derived from the DEM file using the second-degree polynomial method of Zevenbergen and Thorne (1987) and were reclassified according to expert knowledge and the local geological and geotechnical conditions. According to Rozos et al. (2006, 2008), in Greece, certain slope aspects are associated with increased snow concentration, longer periods of freeze and thaw action and intensive erosion and weathering processes. For the purpose of the present study, the slope aspect layer was first classified into a nine-class layer (N-NE, NE-E, E-SE, SE-S, S-SW, SW-W, W-NW, NW-N and Flat) and then reclassified into four (4) classes with varying degrees of landslide susceptibility (Fig. 2g): class A, 225°–270° (SW-W), low susceptibility; class B, 45°–90° (NE-E), medium susceptibility; class C, 90°–135° (E-SE) and 270°–315° (W-NW), high susceptibility; and class D, 315°–45° (NW-N, N-NE) and 135°–225° (SE-S, S-SW). The slope angle layer was also reclassified into four (4) classes according to the local geological and geotechnical conditions (Fig. 2h). Class A (46°).

The landslide inventory database was prepared in the form of a GIS geodatabase in which information about the location, features and abundance of 163 mapped landslides and 163 non-landslide locations was archived. The main landslide characteristics were described according to the standard WP/WLI (1993) recommendations. Landslides in the research area were classified as rotational (shallow) slides and rapid earth flows according to the Cruden and Varnes (1996) classification, by analyzing airborne imagery, extensive field investigation and previous research studies (Ilias et al. 2000). The 163 non-landslide locations were identified through the use of a classifier that utilized a distance metric, the Mahalanobis distance, and provided areas of extremely low possibility of instability (Tsangaratos and Benardos 2014). From the field survey conducted, it was estimated that the largest landslide was approximately 3,200 m² in size and the smallest 285 m², while the average size of a typical landslide was about 1,950 m².
For convenience of analysis, landslides were represented as areas of 400 m². Where a landslide was larger, more points were added to the inventory map. During the present study, it was also observed that slope instability phenomena are located not only in zones of previous landslide instability but also in zones where no activity has been reported. In most cases, the landslide activity affected urban and cultivated areas, linear infrastructures and civil engineering works.

Decision tree and landslide susceptibility analysis

A DT model has a hierarchical tree structure and is a non-parametric method capable of identifying non-linear and non-additive relationships between input factors and target or predicted factors (Bell 1999; Jones et al. 2006). A DT describes structural patterns in data, based on a set of rules (Witten and Frank 2005). The factors can be of any type, binary, nominal, ordinal or quantitative, while the classes must be of qualitative type (categorical, binary or ordinal). For a given dataset of factors together with its classes, a DT produces a sequence of rules that can be used to recognize the class of unseen records (Murthy 1998). The model and its possible outcomes can be represented in the form of a tree structure. The tree is composed of a root node, the top-most decision node which corresponds to the best predictor variable; a set of internal nodes, decision nodes that have two or more branches; and a set of terminal nodes, leaf nodes that represent a classification or a decision. Each node of the decision tree structure makes a binary decision that separates either one class or some of the classes from the remaining classes. The processing is carried out by moving down the tree until a terminal node is reached (Breiman et al. 1984). Several decision tree learning algorithms have been proposed, such as classification and regression trees (CART) (Breiman et al. 1984), Iterative Dichotomizer version 3 (ID3) (Quinlan 1986) and C4.5 (Quinlan 1993). They differ in the way they quantify the distinction and diversity criteria, but they all share a common hypothesis, namely that the entities of the input dataset are independent. The core algorithm for building a decision tree, called Iterative Dichotomiser 3 (ID3), was developed by Quinlan in the mid 1980s (Quinlan 1986). ID3 is a classical machine-learning algorithm for generating rules from decision tables; it employs a top-down procedure and uses entropy and information gain as classification criteria in the search process (Quinlan 1986; Mitchell 1997). In information theory, a theory formulated by Shannon (1948), entropy is defined as a probability-based measure used to calculate the amount of uncertainty. If the impurity or randomness of a collection with respect to the target classifier is high, then the expected entropy is high, while if there is no randomness, that is, the collection is completely homogeneous with respect to the target classifier, the entropy is zero. It is applied hierarchically at each level of the decision tree to aggregate the diversities exhibited by the supporting attributes at the top level of the hierarchy (Li and Claramunt 2006). Named after Boltzmann’s H-theorem, Shannon (1948) denoted the entropy H as

$$H(D) = -\sum_{i=1}^{n} P_i \log_2 P_i \tag{1}$$

where n is the number of classes in the domain of the data set D and $P_i$ is the proportion of the number of class i elements over the total number of elements of data set D. The information gain is used to measure the expected reduction in entropy at the immediate lower level of the hierarchy, where datasets are refined using another supporting attribute (Li and Claramunt 2006). Information gain can be written as

$$\mathrm{Gain}(D, A) = H(D) - \sum_{v \in \mathrm{Values}(A)} \frac{|D_v|}{|D|} H(D_v) \tag{2}$$

where Values(A) provides the domain of the supporting attribute A; $D_v$ denotes the subset of D in which attribute A takes the value v for each record, and $|D_v|$ and $|D|$ denote the cardinality of $D_v$ and D, respectively. A typical tree generation process of the ID3 algorithm is as follows: first, the condition attribute with minimal entropy, that is, maximum information gain, is chosen from a decision table. Then, this attribute is used to divide the decision table into different classes. These two steps are repeated until each record of the decision table belongs to only one of these classes. Once built, a decision tree can be used to classify data by starting at the root node of the tree and moving through it until a terminal node is encountered. Each terminal node provides a decision rule, or outcome, that allows us to make a predictive statement about the data. The next step is to choose the variable with the highest information gain as the "best" decision node. Best is defined by how well the variable splits the set into homogeneous subsets that have the same value of the target variable (Bin et al. 2004).

In the past decade, DT was considered to be an unsuitable method to apply in spatial event prediction and landslide susceptibility analysis. This could be explained by the fact that DT models normally require a discrete type of output class, whereas when performing susceptibility analysis, the results need to be represented as a continuous value (Yeon et al. 2010). Another characteristic that made the application of DT in landslide assessments problematic was that, when trying to estimate the ratio between landslide and non-landslide classes, the database proved to be highly imbalanced, since landslides are represented in grid raster spatial data and are composed of a small number of pixels. Consequently, landslides were represented as a minority event class and were treated as noise, that is, data with poor information (Yeon et al. 2010). A third difficulty in implementing DT is that the user has to confront a significant question about the optimal size of the final tree. A DT that is too large risks overfitting the training data and generalizing poorly to new testing samples. On the other hand, a small DT might not capture important structural information. However, it is hard to tell when a DT algorithm should stop training, since it is not possible to evaluate whether the addition of a single extra node will decrease the classification error.
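As a minimal sketch of the attribute selection step of ID3, the entropy of Eq. 1 and the information gain of Eq. 2 can be computed as follows. The record layout (a list of dictionaries), the attribute names and the four toy records are hypothetical and serve only to illustrate the calculation; they are not data from the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (Eq. 1) of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(records, attribute, target):
    """Expected reduction in entropy (Eq. 2) when splitting records on attribute."""
    subsets = {}
    for record in records:
        subsets.setdefault(record[attribute], []).append(record[target])
    remainder = sum(len(s) / len(records) * entropy(s) for s in subsets.values())
    return entropy([r[target] for r in records]) - remainder


def best_attribute(records, attributes, target):
    """Attribute with the highest information gain, as chosen at each ID3 node."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


# Hypothetical toy decision table: class labels per layer and a 0/1 landslide state
records = [
    {"lithology": "E", "slope": "C", "landslide": 1},
    {"lithology": "E", "slope": "A", "landslide": 1},
    {"lithology": "A", "slope": "C", "landslide": 0},
    {"lithology": "A", "slope": "A", "landslide": 0},
]
print(best_attribute(records, ["lithology", "slope"], "landslide"))
```

In this toy table, lithology separates the two states completely while slope carries no information, so lithology would be selected as the root node.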
On the other hand, DT models can incorporate both categorical and continuous factors without strict assumptions with respect to the distribution of the data (Bou Kheir et al. 2010). The outcome of applying a DT model is a set of rules, each consisting of an "AND" combination of nodes from the root to the leaf. When a rule is interpreted, all the nodes in the combination are needed, so the relationship among causal factors is implicitly included in the rule. More specifically, the implementation of a DT model made it possible to explain the relationship among the causal factors that contribute to the landslide phenomena, something that other methods (e.g. artificial neural networks) are not capable of doing. The objective of this study was to implement a method for landslide susceptibility analysis using a modified DT classifier, which followed the ID3 algorithm. A well-balanced landslide dataset, with an equal number of instances representing states of stability and instability, was created and used to develop a full-grown decision tree. The leaf node ranking for representing susceptibility values was achieved by applying a statistical method, the certainty factor (CF) method. The basic principles of the CF method were first introduced in MYCIN, an expert system for the diagnosis and therapy of blood infections and meningitis (Shortliffe and Buchanan 1975). Among the commonly used GIS analysis models for landslide hazard that follow the statistical approach, the CF model has been investigated by numerous researchers (Chung and Fabbri 1993, 1998; Binaghi et al. 1998; Luzi and Pergalani 1999; Lan et al. 2004; Gokceolu et al. 2005; Kanungo et al. 2011; Pourghasemi et al. 2012a, b, c). The CF method is one of the proposed favorability functions for handling the combination of different data layers and the heterogeneity and uncertainty of the input data (Lan et al. 2004). The CF, defined as a function of probability, was originally proposed by Shortliffe and Buchanan (1975) and later modified by Heckerman (1986) according to the equation:

$$CF_{ij} = \begin{cases} \dfrac{f_{ij} - f}{f_{ij}(1 - f)}, & \text{if } f_{ij} \ge f \\ \dfrac{f_{ij} - f}{f(1 - f_{ij})}, & \text{if } f_{ij} < f \end{cases} \tag{3}$$

where $CF_{ij}$ is the certainty factor that corresponds to class i of variable j, $f_{ij}$ is the landslide probability within class i of variable j and f is the landslide probability within the whole dataset. The probabilities are derived following a GIS procedure that overlays each landslide-related variable with the landslide inventory layer and calculates the landslide occurrence frequency. A positive value represents increasing certainty in causality, while a negative value corresponds to the opposite, where the presence of the rule tends to disfavour the occurrence of a landslide. A value close to zero means that it is difficult to give any indication about the causality. Next, the CF values of the causative factors are combined pairwise using the CF combination rule. A combination of two CF values, x and y, from two different layers of information is a CF value z obtained as follows (Chung and Fabbri 1993; Binaghi et al. 1998; Luzi and Pergalani 1999):

$$z = \begin{cases} x + y - xy, & x, y \ge 0 \\ \dfrac{x + y}{1 - \min(|x|, |y|)}, & x, y \text{ of opposite sign} \\ x + y + xy, & x, y < 0 \end{cases} \tag{4}$$

The final outcome measures the belief in the observed outcome. The belief of a rule ranged between a value of −1, which expressed decreasing certainty in landslide occurrence, and a value of +1, which expressed increasing certainty in landslide occurrence.
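A minimal sketch of Eqs. 3 and 4 is given below, followed by a left-to-right pairwise combination of the CF values attached to one rule. The function names, the prior f = 0.5 (taken here from the balanced 163/163 inventory as an assumption) and the sample class probabilities are illustrative only; the treatment of the degenerate pair +1 and −1, for which Eq. 4 is undefined, is also an assumption.

```python
from functools import reduce


def certainty_factor(f_ij, f):
    """Certainty factor of a class (Eq. 3) from class and overall landslide probability."""
    if not 0.0 < f < 1.0 or not 0.0 <= f_ij <= 1.0:
        raise ValueError("probabilities must satisfy 0 < f < 1 and 0 <= f_ij <= 1")
    if f_ij >= f:
        return (f_ij - f) / (f_ij * (1.0 - f))
    return (f_ij - f) / (f * (1.0 - f_ij))


def combine(x, y):
    """Pairwise CF combination rule (Eq. 4)."""
    if x >= 0 and y >= 0:
        return x + y - x * y
    if x < 0 and y < 0:
        return x + y + x * y
    denominator = 1.0 - min(abs(x), abs(y))
    if denominator == 0.0:
        # Assumption: total conflict (+1 against -1) is reported as an error
        raise ValueError("combination undefined for CF values of +1 and -1")
    return (x + y) / denominator


def rule_belief(cf_values):
    """Belief of a rule, obtained by combining its CF values pairwise in order."""
    return reduce(combine, cf_values)


# Illustrative values: prior f = 0.5 and a class with landslide probability 0.8
print(certainty_factor(0.8, 0.5))
print(rule_belief([0.75, 0.5, -0.5]))
```

With these inputs, a class that is more landslide-prone than the dataset as a whole receives a positive CF, and a conflicting negative CF pulls the combined belief of the rule back towards zero.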
The CF method calculated the certainty factor value (CFv) for each class of each landslide-related variable, and after combining the CFv pairwise according to specific integration rules, it measured the belief of each decision rule that was produced by the DT classifier. The final CFv expresses the landslide susceptibility belief value for each grid cell, and a landslide susceptibility belief map was produced with the use of GIS technology.

Procedures and methods

The developed framework could be separated into a five-phase procedure: (a) the data preparation phase, (b) the phase of estimating the CF method values, (c) the phase of constructing the rule database through the application of a DT classifier and estimating the belief values of each rule, (d) the phase of reclassifying and constructing the landslide susceptibility belief map and (e) the performance and validation phase (Fig. 3).
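Phases (c) and (d) can be pictured as a lookup from the class combination of each grid cell to the belief of the matching rule. The sketch below is only an illustration under stated assumptions: rules are stored as a hypothetical dictionary keyed by a tuple of class labels, rules whose belief lies within ±0.05 (the uncertainty window given in the description of the third phase) are dropped with the endpoints assumed to be excluded as well, and cells without a retained rule are left as None. None of these data structures comes from the original implementation.

```python
UNCERTAINTY_BAND = 0.05  # beliefs within +/- this value are treated as uncertain


def retain_confident_rules(rule_beliefs, band=UNCERTAINTY_BAND):
    """Keep only the rules whose belief lies outside the uncertainty window."""
    return {classes: b for classes, b in rule_beliefs.items() if abs(b) > band}


def belief_map(class_grid, rules):
    """Assign to each cell the belief of the rule matching its class combination."""
    return [[rules.get(cell) for cell in row] for row in class_grid]


# Hypothetical rules over two layers (e.g. lithology class, slope angle class)
rule_beliefs = {("E", "A"): 0.82, ("A", "C"): -0.64, ("C", "B"): 0.03}
# Hypothetical 2 x 2 grid holding the class combination of each cell
class_grid = [
    [("E", "A"), ("C", "B")],
    [("A", "C"), ("E", "A")],
]
rules = retain_confident_rules(rule_beliefs)
print(belief_map(class_grid, rules))
```

In this toy grid, the cell covered only by the weak rule receives no value, while the remaining cells carry beliefs between −1 and +1.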

Fig. 3 Flowchart of developed methodology

The preparation phase

The first phase involves a series of actions that assist in preparing the available data for further analysis. It begins with gathering the data and creating a geodatabase, the common data storage and management framework for ArcGIS. In the next step of the preparation phase, the user needs to reclassify the layers into classes according to expert knowledge. This step involves the geoprocessing tool "Extract Values to Points", found in the Spatial Analyst Toolbox. The tool extracts the cell values of a raster based on a set of point features and records the values in the attribute table of an output feature class. The process is repeated for each layer, and finally, a single shapefile is created that includes, for each landslide and non-landslide location, the cell values of each layer. The preparation phase ends with exporting the shapefile to a dbf file and exporting each reclassified layer to ASCII format files for further analysis. In landslide modelling, the landslide data are split into two parts, training and validation datasets. In the present study, the landslide and non-landslide points were randomly divided into two subsets: 80 % of the data (260 instances) were used for training and 20 % of the data (66 instances) for validating the developed classifier.

Estimating the certainty factor values

The second phase involves the estimation of the CFv for each class of each layer, according to the CF method (Eq. 3). The values are stored in the geodatabase in order to assist in calculating the leaf node ranking that corresponds to the landslide susceptibility belief values.

Constructing the rule database and estimating the belief values of the rule database

The third phase involves techniques found in classification and cluster analysis, where traditional classification problems present a set of data to be separated into two or more different groups. Classification of multi-attribute data is an objective of many information processing domains, particularly when applied to the analysis of financial, economic, health, environmental and demographic phenomena where the data are potentially large, complex and not easily observable (Li and Claramunt 2006). In general, classification refers to finding rules to assign data items into pre-existing classes (Miller and Han 2001). In this phase, the user creates a relational database using the dbf file that was made in the preparation phase. The objective is to initialize the ID3 algorithm in order to create a set of rules that assist in classifying unknown areas into the two distinct states of stability, stable or unstable. The outcome of this phase is a single txt file that contains a set of rules in the following format:

$$[A, B, C_1, C_2, \ldots, C_n, D] \tag{5}$$

where A = the id of rule, B = the number of the leaf, Ci = the number of the class (i=B) and D = (0,1) state of stability. The next step is to utilize the CF method combination rule (Eq. 4), in order to measure the belief of each rule that has been produced by the DT classifier. Table 1 shows the description and range of belief values that the produced rules may be assigned. The belief of a rule range between a value of −1, which expressed decreasing certainty in landslide occurrence, and a value of +1 that expressed increasing certainty in landslide occurrence. If the estimated belief value range between −0.05 and +0.05, then there is much quantity of uncertainty in the rule. Those rules that are characterized by uncertainty are excluded from the next phase. The classification and landslide susceptibility belief mapping phase The next phase was to combine all the weighted rules (Eq. 6) and produce the landslide susceptibility map. Each grid of the final landslide susceptibility belief map obtained a value that ranged between −1 and +1, whereas −1 corresponds to the most stable conditions and +1 corresponds to the most critical value of slope instability. 1X rule j n j¼1 n

LSBgrdi ¼

ð6Þ

where $grd_i$ is the ith grid cell and $rule_j$ is the jth rule. In order to proceed easily to the interpretation of the cartographic product, a zonation procedure was enabled by dividing the belief values into different susceptibility classes. The result of the reclassification procedure was to provide five classes of susceptibility, namely, very low, low, medium, high and very high susceptibility, using the Jenks natural breaks classification method (Jenks 1967; Ayalew and Yamagishi 2005; Akgun et al. 2012; Jaafari et al. 2014).

Table 1 Certainty classes (Luzi and Pergalani 1999; Lan et al. 2004)

| Description | Range values |
|---|---|
| Very low certainty–stable conditions | −1.0 to −0.5 |
| Low certainty–moderate stable conditions | −0.5 to −0.05 |
| Uncertain | −0.05 to +0.05 |
| High certainty–moderate unstable conditions | +0.05 to +0.5 |
| Very high certainty–highly unstable conditions | +0.5 to +1.0 |

The performance evaluation and validation phase

The reliability of the developed model was evaluated by estimating the Cohen kappa index (k) (Cohen 1960; Hoehler 2000; Guzzetti et al. 2006; Tien Bui et al. 2012d). The Cohen kappa index was used to estimate the classification power of the model compared to chance selection. In our case, a k value close to 0 indicates that no agreement exists between the landslide model and reality, whereas a k value close to 1 indicates a perfect agreement (Landis and Koch 1977) (Table 2).

$$
k = \frac{P_c - P_{exp}}{1 - P_{exp}}
\tag{7}
$$

where

$$
P_c = \frac{TP + TN}{TP + TN + FP + FN}
\tag{8}
$$

$$
P_{exp} = \frac{(TP + FN)(TP + FP) + (FP + TN)(FN + TN)}{(TP + TN + FP + FN)^2}
\tag{9}
$$

TP stands for true positive, when the outcome from a prediction is p and the actual value is also p. If the actual value is n, then it is said to be a false positive (FP). Conversely, a true negative (TN) has occurred when both the prediction outcome and the actual value are n, and a false negative (FN) is when the prediction outcome is n while the actual value is p (Fawcett 2004). The sum of the four entries TP, TN, FP and FN equals the number of test examples.

Table 2 Model classification power k values (Landis and Koch 1977)

| k value | Agreement |
|---|---|
| 0.8–1.0 | Almost perfect agreement |
| 0.6–0.8 | Substantial agreement |
| 0.4–0.6 | Moderate agreement |
| 0.2–0.4 | Fair agreement |
| 0.0–0.2 | Slight agreement |

The next procedure is to validate how well the modified DT classifier had classified the research area according to the susceptibility classes produced and the cumulative percentage of the observed landslide occurrence. An ideal landslide susceptibility map must have an increasing landslide density ratio when moving from low susceptibility classes to high susceptibility classes and satisfy two spatial effectiveness rules (Can et al. 2005; Pradhan and Lee 2010b): the identified landslide areas should occupy areas of the high susceptibility class, and the high susceptibility class should cover areas of small extent. The validation process was performed by comparing the produced landslide susceptibility map with the actual landslide locations using the success rate and the prediction rate methods (Chung and Fabbri 2003). The success rate curve assesses how many incidences from the training dataset are successfully captured by the susceptibility map and represents a measure of model efficiency. In order to estimate the prediction rate, the validation dataset is used. The prediction rate curve describes the percentage of correctly classified landslide incidences and thus assesses how well "unknown" landslides could be predicted.

Analytical results: preparing the database and mapping the landslide susceptibility belief map

Table 3 illustrates the estimated CF values for each class of each layer, according to the CF method (Eq. 3). Class D of the lithological unit layer, which corresponds to geological formations of highly susceptible values such as granodiorite and ophiolite, has the highest CF value (+0.4228). The next class with a high value belongs to the elevation layer and corresponds to elevations between 401 and 600 m (+0.3418). According to the results, the slope aspect layer with classes that correspond to 45°–90°, 90°–135°, 270°–315°, 315°–45° and 135°–225° shows values near zero, indicating uncertainty in the prediction. The outcomes of the certainty factor method indicate that the class that corresponds to elevations greater than 801 m shows the lowest CF value, −0.8745, followed by class A (marbles, gneiss-leptinite) with a value equal to −0.7450 and the class that corresponds to a distance from the road network greater than 601 m (−0.6624). These three classes represent the most stable conditions. The next phase is to apply the ID3 algorithm in order to create the set of rules that assist in classifying unknown areas into two distinctive states of stability, stable or unstable.
The constructed rule database was applied to the training dataset and then validated on the testing dataset. The ID3 algorithm produced a total of fifty rules, twenty-two of which indicated stable conditions and the remaining twenty-eight indicated unstable conditions. The node rules were logical statements of the following form:

Rule 1: If [EL] is CLASS A AND [SASL] is CLASS A AND [SANL] is CLASS A THEN [STABLE]

As proposed by the methodology, the next step was to estimate the belief of each decision rule by combining the CF values pairwise according to the specific integration rules (Eq. 4). The outcome of this calculation was that each rule was assigned a certainty belief value that corresponded to the landslide susceptibility, ranging from −1, which expresses absolute certainty of stability, to +1, which expresses absolute certainty of instability. The first decision (rule 1) indicates that when an area has an elevation of less than 400 m, a slope aspect between 225° and 270° and a slope angle of less than 15°, the area is characterized as stable. To evaluate the belief of this statement, we calculate the CF value for the first two classes, found to be +0.010124. According to the classification scheme of Table 1, the result is uncertain. As a result, the rule should be excluded from the analysis. Table 4 shows the produced set of rules which were applied along with the calculated CF values of each rule. Ranking the set of rules according to their CF values, rule 14 and rule 37 had the highest values. Rule 14 implies that areas with elevation ranging between 401 and 600 m, with a distance from the road network of less than 200 m, with a distance from tectonic features of less than 250 m and covered by formations of very high susceptible values (quaternary loose, flysch, migmatite, granite-gneiss) are highly unstable. Rule 37 implies that areas with elevation ranging between 401 and 600 m, with a distance from the road network of less than 200 m and covered by formations of highly susceptible values (granodiorite, ophiolite) are also highly unstable. According to the proposed classification scheme of Table 1, ten (10) of the produced rules fall within the class "uncertain", since the calculated values are in the range of −0.05 to +0.05.

Table 3 The CF values

| Thematic layer | Class | Pixels in class | Landslides | CF |
|---|---|---|---|---|
| Lithological units (LUL) | Class A | 18584 | 1 | −0.7450 |
| | Class B | 265991 | 25 | −0.5546 |
| | Class C | 155050 | 44 | +0.2563 |
| | Class D | 24614 | 9 | +0.4228 |
| | Class E | 308186 | 84 | +0.2257 |
| Elevation (EL) | >801 m | 226588 | 6 | −0.8745 |
| Slope angle (SANL) | 46° | 132634 | 17 | −0.3926 |
| Slope aspect (SASL) | 225°–270° | 101361 | 15 | −0.2987 |
| | 45°–90° | 94834 | 20 | −0.0006 |
| | 90°–135°, 270°–315° | 166562 | 34 | −0.0327 |
| | 315°–45°, 135°–225° | 409668 | 94 | +0.0803 |
| Distance from tectonic features (DTL) | 501 m | 284084 | 42 | −0.2994 |
| Distance from hydrographic network (DHNL) | 451 m | 399529 | 59 | −0.3002 |
| Distance from geological boundaries (DGBL) | 401 m | 261831 | 37 | −0.3303 |
| Distance from road network (DRNL) | >601 m | 112307 | 8 | −0.6624 |
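The way a rule belief follows from Eq. 4 and is then screened against Table 1 can be sketched in Python as below. This is only an illustration under stated assumptions: the function names are hypothetical, the class boundaries of Table 1 are treated as inclusive on the "uncertain" side (the table does not say which side the limits ±0.05 and ±0.5 belong to), and the worked example combines only the two CF values quoted in the text (+0.3418 for elevations of 401–600 m and +0.4228 for lithological class D), because the CF value of the road network class used by rule 37 is not available here.

```python
from functools import reduce


def combine_cf(x: float, y: float) -> float:
    """Eq. 4: pairwise combination of two certainty factors.

    Note: the mixed-sign branch is undefined when one value is +1 and the
    other is -1 (division by zero)."""
    if x >= 0 and y >= 0:
        return x + y - x * y
    if x < 0 and y < 0:
        return x + y + x * y
    return (x + y) / (1.0 - min(abs(x), abs(y)))


def rule_belief(cf_values):
    """Belief of a rule: fold Eq. 4 over the CF values of its conditions."""
    return reduce(combine_cf, cf_values)


def certainty_class(belief: float) -> str:
    """Table 1 classes; boundary handling is an assumption of this sketch."""
    if belief < -0.5:
        return "very low"
    if belief < -0.05:
        return "low"
    if belief <= 0.05:
        return "uncertain"
    if belief <= 0.5:
        return "high"
    return "very high"


# Two CF values quoted in the text: elevation 401-600 m and lithology class D.
partial_belief = rule_belief([0.3418, 0.4228])
print(f"{partial_belief:+.5f} -> {certainty_class(partial_belief)}")

# Belief reported in the text for rule 1, which is therefore excluded.
print(certainty_class(0.010124))
```

A rule whose class comes out as "uncertain" would be dropped before the mapping phase, which mirrors the exclusion of the ten uncertain rules described above.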

Table 4 The produced set of rules and the calculated CF values

| Rule id | Description | Calculated CF value |
|---|---|---|
| 1 | If [EL] is CLASS A AND [SASL] is CLASS A AND [SANL] is CLASS A THEN [STABLE] | Uncertain |
| 2 | If [EL] is CLASS A AND [SASL] is CLASS A AND [SANL] is CLASS B AND [LUL] is CLASS B THEN [STABLE] | Uncertain |
| 3 | If [EL] is CLASS A AND [SASL] is CLASS A AND [SANL] is CLASS B AND [LUL] is CLASS E THEN [UNSTABLE] | Uncertain |
| 4 | If [EL] is CLASS A AND [SASL] is CLASS A AND [SANL] is CLASS C THEN [STABLE] | Uncertain |
| 5 | If [EL] is CLASS A AND [SASL] is CLASS A AND [SANL] is CLASS D THEN [STABLE] | Uncertain |
| 6 | If [EL] is CLASS A AND [SASL] is CLASS B AND [LUL] is CLASS A THEN [STABLE] | −0.63289 |
| 7 | If [EL] is CLASS A AND [SASL] is CLASS B AND [LUL] is CLASS C THEN [UNSTABLE] | +0.48340 |
| 8 | If [EL] is CLASS A AND [SASL] is CLASS C THEN [UNSTABLE] | +0.28233 |
| 9 | If [EL] is CLASS A AND [SASL] is CLASS D AND [SANL] is CLASS A THEN [UNSTABLE] | +0.48481 |
| 10 | If [EL] is CLASS A AND [SASL] is CLASS D AND [SANL] is CLASS B THEN [UNSTABLE] | +0.43826 |
| 11 | If [EL] is CLASS A AND [SASL] is CLASS D AND [SANL] is CLASS C THEN [UNSTABLE] | +0.34536 |
| 12 | If [EL] is CLASS A AND [SASL] is CLASS D AND [SANL] is CLASS D THEN [STABLE] | Uncertain |
| 13 | If [EL] is CLASS B AND [DRNL] is CLASS A AND [DTL] is CLASS A AND [LUL] is CLASS B THEN [STABLE] | −0.71199 |
| 14 | If [EL] is CLASS B AND [DRNL] is CLASS A AND [DTL] is CLASS A AND [LUL] is CLASS E THEN [UNSTABLE] | +0.65759 |
| 15 | If [EL] is CLASS B AND [DRNL] is CLASS A AND [DTL] is CLASS B THEN [UNSTABLE] | +0.58154 |
| 16 | If [EL] is CLASS B AND [DRNL] is CLASS A AND [DTL] is CLASS C AND [LUL] is CLASS B THEN [STABLE] | −0.38718 |
| 17 | If [EL] is CLASS B AND [DRNL] is CLASS A AND [DTL] is CLASS C AND [LUL] is CLASS C THEN [STABLE] | +0.45944 |
| 18 | If [EL] is CLASS B AND [DRNL] is CLASS A AND [DTL] is CLASS C AND [LUL] is CLASS D THEN [UNSTABLE] | −0.90179 |
| 19 | If [EL] is CLASS B AND [DRNL] is CLASS A AND [DTL] is CLASS C AND [LUL] is CLASS D THEN [STABLE] | +0.43720 |
| 20 | If [EL] is CLASS B AND [DRNL] is CLASS B AND [SANL] is CLASS B THEN [UNSTABLE] | +0.33578 |
| 21 | If [EL] is CLASS B AND [DRNL] is CLASS B AND [SANL] is CLASS C THEN [UNSTABLE] | +0.23776 |
| 22 | If [EL] is CLASS B AND [DRNL] is CLASS B AND [SANL] is CLASS D AND [LUL] is CLASS B THEN [STABLE] | −0.64159 |
| 23 | If [EL] is CLASS B AND [DRNL] is CLASS B AND [SANL] is CLASS D AND [LUL] is CLASS E THEN [UNSTABLE] | Uncertain |
| 24 | If [EL] is CLASS B AND [DRNL] is CLASS C AND [SASL] is CLASS A AND [LUL] is CLASS B THEN [STABLE] | −0.62862 |
| 25 | If [EL] is CLASS B AND [DRNL] is CLASS C AND [SASL] is CLASS A AND [LUL] is CLASS C THEN [UNSTABLE] | +0.10800 |
| 26 | If [EL] is CLASS B AND [DRNL] is CLASS C AND [SASL] is CLASS A AND [LUL] is CLASS E THEN [UNSTABLE] | +0.30766 |
| 27 | If [EL] is CLASS B AND [DRNL] is CLASS C AND [SASL] is CLASS B THEN [UNSTABLE] | +0.15829 |
| 28 | If [EL] is CLASS B AND [DRNL] is CLASS C AND [SASL] is CLASS C THEN [UNSTABLE] | +0.13039 |
| 29 | If [EL] is CLASS B AND [DRNL] is CLASS C AND [SASL] is CLASS D THEN [UNSTABLE] | +0.22635 |
| 30 | If [EL] is CLASS B AND [DRNL] is CLASS D AND [SASL] is CLASS A AND [LUL] is CLASS B THEN [STABLE] | −0.83977 |
| 31 | If [EL] is CLASS B AND [DRNL] is CLASS D AND [SASL] is CLASS A AND [LUL] is CLASS C THEN [STABLE] | −0.5163 |
| 32 | If [EL] is CLASS B AND [DRNL] is CLASS D AND [SASL] is CLASS A AND [LUL] is CLASS E THEN [UNSTABLE] | −0.53541 |
| 33 | If [EL] is CLASS B AND [DRNL] is CLASS D AND [SASL] is CLASS B THEN [UNSTABLE] | Uncertain |
| 34 | If [EL] is CLASS B AND [DRNL] is CLASS D AND [SASL] is CLASS C THEN [UNSTABLE] | Uncertain |
| 35 | If [EL] is CLASS B AND [DRNL] is CLASS D AND [SASL] is CLASS D THEN [UNSTABLE] | Uncertain |
| 36 | If [EL] is CLASS C AND [DRNL] is CLASS A AND [LUL] is CLASS B THEN [STABLE] | −0.33243 |
| 37 | If [EL] is CLASS C AND [DRNL] is CLASS A AND [LUL] is CLASS D THEN [UNSTABLE] | +0.61484 |
| 38 | If [EL] is CLASS C AND [DRNL] is CLASS B AND [LUL] is CLASS B THEN [STABLE] | −0.54966 |
| 39 | If [EL] is CLASS C AND [DRNL] is CLASS B AND [LUL] is CLASS D THEN [UNSTABLE] | +0.42906 |
| 40 | If [EL] is CLASS C AND [DRNL] is CLASS C AND [DGBL] is CLASS A THEN [UNSTABLE] | +0.11948 |
| 41 | If [EL] is CLASS C AND [DRNL] is CLASS C AND [DGBL] is CLASS B THEN [STABLE] | −0.11081 |
| 42 | If [EL] is CLASS C AND [DRNL] is CLASS C AND [DGBL] is CLASS A THEN [STABLE] | −0.39236 |
| 43 | If [EL] is CLASS D AND [DRNL] is CLASS A THEN [STABLE] | −0.83779 |
| 44 | If [EL] is CLASS D AND [DRNL] is CLASS B THEN [STABLE] | −0.89057 |
| 45 | If [EL] is CLASS D AND [DRNL] is CLASS C THEN [STABLE] | −0.90179 |
| 46 | If [EL] is CLASS D AND [DRNL] is CLASS D AND [DGBL] is CLASS A THEN [STABLE] | −0.94696 |
| 47 | If [EL] is CLASS D AND [DRNL] is CLASS D AND [DGBL] is CLASS B THEN [STABLE] | −0.95848 |
| 48 | If [EL] is CLASS D AND [DRNL] is CLASS D AND [DGBL] is CLASS C AND [DHNL] is CLASS A THEN [STABLE] | −0.95973 |
| 49 | If [EL] is CLASS D AND [DRNL] is CLASS D AND [DGBL] is CLASS C AND [DHNL] is CLASS A THEN [STABLE] | −0.96262 |
| 50 | If [EL] is CLASS D AND [DRNL] is CLASS D AND [DGBL] is CLASS C AND [DHNL] is CLASS D THEN [STABLE] | −0.98014 |

The next phase was to combine all the weighted rules and produce the landslide susceptibility map, according to Eq. 6 (Fig. 4). The landslide susceptibility belief values were reclassified, using the Jenks natural breaks classification method, into five classes of susceptibility, namely, very low, low, medium, high and very high susceptibility (VLS, LS, MS, HS and VHS, respectively). Visual analysis of the final landslide susceptibility map reveals a complex spatial pattern that does not follow the spatial distribution of any single landslide conditioning variable. The estimated landslide susceptibility belief value is a product of the combined influence of each of the eight factors. Table 5 shows the relative landslide frequency as estimated for each landslide susceptibility class, where most of the landslides are located in the areas of high and very high susceptibility. The very high susceptibility class has the highest relative frequency, 67.37 % (93 landslide incidences).

Fig. 4 The landslide susceptibility map of the modified ID3 DT

Table 5 The relative landslide frequency

| Description | Percentage of class (%) | Relative landslide frequency, % (number of landslides) |
|---|---|---|
| Very low susceptibility | 34.14 | 1.36 (5) |
| Low susceptibility | 19.41 | 4.31 (9) |
| Medium susceptibility | 12.81 | 5.08 (7) |
| High susceptibility | 20.81 | 21.88 (49) |
| Very high susceptibility | 12.83 | 67.37 (93) |

Validating the classifier and the produced landslide susceptibility map

The efficiency of the modified ID3 DT model was estimated through the success rate curve using only the landslide data from the training database. The area under the curve (AUC) was estimated to be 0.8397 (Fig. 5). The predictive power of the model was estimated through the prediction rate curve using the landslide data from the validation database. The AUC was estimated to be 0.8035 (Fig. 5). Finally, the reliability of the model was estimated by the Cohen kappa index (k), which was found to be equal to 0.9365, indicating an almost perfect agreement between the model and reality, while the classified rate was estimated to be 96.82 % (Table 6).

Table 6 Comparing the accuracy and Cohen kappa index

| Models | tp | tn | fp | fn | accuracy | k |
|---|---|---|---|---|---|---|
| CF | 22 | 29 | 9 | 3 | 0.8095 | 0.6190 |
| ID3 | 32 | 27 | 1 | 3 | 0.9365 | 0.8730 |
| J48 | 32 | 28 | 0 | 3 | 0.9523 | 0.9047 |
| modified ID3 | 31 | 30 | 1 | 1 | 0.9682 | 0.9365 |

Comparing the performance of the modified ID3 model

In order to compare the performance of the developed model with other techniques, we produced three landslide susceptibility maps utilizing the CF method, the J48 algorithm and the ID3 algorithm. The CF value of each class for all affecting factors had already been calculated during the second phase of the developed methodology (Eq. 3). The next steps involve combining each layer pairwise according to the integration rules (Eq. 4) and calculating the landslide belief value for the whole set of layers. As already mentioned, the belief ranged between a value of −1, which expressed decreasing certainty in landslide occurrence, and a value of +1, which expressed increasing certainty in landslide occurrence. The classified rate was estimated to be 80.36 % while the Cohen kappa index was estimated to be equal to 0.6073, characterizing the model as a model with substantial agreement.

Concerning the initialization of the J48 algorithm, the Java reimplementation of the C4.5 algorithm, the probability of belonging to the landslide or the non-landslide class for each observation was estimated using Laplace smoothing. The most accurate model is selected by determining the minimum number of instances per leaf and the confidence factor, two conditions necessary for the implementation of the J48 algorithm. The highest classification accuracy, 95.38 %, was obtained with a minimum of 9 instances per leaf and a confidence factor greater than 0.20. The decision tree was constructed using the same training and testing datasets. The size of the generated tree was seventeen, including the root, with thirteen leaves. The Cohen kappa index was estimated to be equal to 0.9047, characterizing the model as a model with almost perfect agreement. Finally, the implementation of the ID3 algorithm gave a classification rate equal to 93.65 % while the Cohen kappa index was estimated to be equal to 0.8730, characterizing the model as a model with almost perfect agreement. Comparing the four models, the modified ID3 DT model performed better than the J48, ID3 and CF models. It is also necessary to compare how well the modified DT classifier had classified the research area according to the landslide susceptibility classes produced and the cumulative percentage of the observed landslide occurrence, against the other three techniques.
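The accuracy and Cohen kappa values compared above can be recomputed from confusion-matrix counts with Eqs. 7–9. The following Python sketch does this for the four sets of counts listed in Table 6, taking the expected agreement as the sum of the products of the marginal totals divided by the squared number of test examples (the standard definition, assumed here for Eq. 9). The function name and the dictionary layout are illustrative. Results for the less balanced confusion matrices may differ from the printed values in the third decimal place, depending on rounding and on how the expected agreement was evaluated.

```python
def cohen_kappa(tp: int, tn: int, fp: int, fn: int):
    """Return (accuracy, kappa) following Eqs. 7-9."""
    n = tp + tn + fp + fn
    p_c = (tp + tn) / n  # observed agreement, Eq. 8
    # Expected agreement by chance, Eq. 9 (denominator is n squared).
    p_exp = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n ** 2
    kappa = (p_c - p_exp) / (1.0 - p_exp)  # Eq. 7
    return p_c, kappa


# Confusion-matrix counts (tp, tn, fp, fn) copied from Table 6.
table6 = {
    "CF": (22, 29, 9, 3),
    "ID3": (32, 27, 1, 3),
    "J48": (32, 28, 0, 3),
    "modified ID3": (31, 30, 1, 1),
}

results = {model: cohen_kappa(*counts) for model, counts in table6.items()}

for model, (accuracy, kappa) in results.items():
    print(f"{model:>12}: accuracy = {accuracy:.4f}, k = {kappa:.4f}")
```

For the modified ID3 counts this yields an accuracy of about 0.968 and a kappa of about 0.9365, in line with the values reported for the developed classifier.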
As in the case of the modified ID3 model, the validation process was performed by comparing the produced landslide susceptibility map with the actual landslide locations using the success rate and the prediction rate methods. Figure 6 shows the produced landslide susceptibility maps and Fig. 7 the success and prediction rate curves of the four models. The AUC values calculated for the four models were similar. The efficiency of the modified ID3 was the highest among the models, followed by the J48 model (0.8220), the ID3 (0.8200) and the CF model (0.7936). Also, the predictive power of the modified ID3 was the highest among the models, followed again by the J48 model (0.8024), the ID3 (0.7983) and the CF model (0.7766).

Fig. 5 The success and prediction rate curve of the modified ID3 DT

Fig. 6 Certainty factor, J48 and ID3 landslide susceptibility map

Fig. 7 Comparing the four model success and prediction rate curve

Discussion

Soft computing methods take advantage of their high accuracy and learning ability and seem to be capable of extracting patterns from large amounts of multi-thematic landslide data, making more precise predictions and providing a decision support tool for solving spatial problems. These advantages also affected the methods that were implemented in spatial analysis, allowing a move from a data-poor and computation-poor to a data-rich and computation-rich environment (Miller and Han 2001). In the case of DT models, the applied algorithm makes no statistical assumptions about the frequency distributions of the data and can handle data that are represented on different measurement scales (Pal and Mather 2003). The main advantage is that a DT model represents the results that refer to estimating the order of importance of each explanatory variable in a tree structure format that is easily understood by humans (Witten and Frank 2005). The top-down induction of the decision tree indicates that factors in the higher order of the tree structure are more important. In general, the DT model provides a visual interpretation and is thought to be a powerful predictive tool when compared with other approaches from the machine learning domain. DT models are easy for humans to understand and require little data preparation, while they can handle both numerical and categorical data. Landslide susceptibility analysis performed by DT algorithms has been proven to be a valuable tool. It is significant to point out that the quality of the available data and the expert's judgment during the classification of the landslide conditioning factors are parameters that influence the degree of accuracy of any analysis. In our study, the outcome of the analysis performed by the ID3 algorithm was a set of rules that expressed the relationship between landslide conditioning factors and the actual landslide occurrence. This relationship was illustrated by a set of rules that assisted in classifying unknown areas into the two distinctive states of stability, stable or unstable. The top-most decision node, which corresponded to the best predictor factor according to the developed DT model, was the elevation layer. This was based on 326 input patterns; additional data may yield a different best predictor factor and would likely also improve the predictive power of the analysis.
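How a top-most decision node is selected can be illustrated with the textbook ID3 criterion, information gain: the attribute whose split reduces the class entropy the most becomes the root. The Python sketch below uses a tiny, entirely hypothetical set of four records (the class labels and attribute values are invented for illustration and are not taken from the study's database), and it shows the standard criterion only, since the exact splitting details of the modified classifier are not described here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting rows on one attribute."""
    base = entropy([row[target] for row in rows])
    n = len(rows)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder


# Hypothetical toy records: elevation class (EL), slope aspect class (SASL)
# and the observed state of stability.
rows = [
    {"EL": "A", "SASL": "A", "state": "stable"},
    {"EL": "A", "SASL": "B", "state": "stable"},
    {"EL": "B", "SASL": "A", "state": "unstable"},
    {"EL": "B", "SASL": "B", "state": "unstable"},
]

gains = {attr: information_gain(rows, attr, "state") for attr in ("EL", "SASL")}
root = max(gains, key=gains.get)
print(gains, "-> root attribute:", root)
```

In this toy case the elevation class separates the two states completely, so it receives the larger gain and would be placed at the root, which is the sense in which the top-most node marks the best predictor.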
The advantage of the developed classifier is that it grows to its full extent, allowing the classifier to "learn" in depth from the available data. However, in order to minimize the possibility of overfitting, the belief values of each decision rule produced by the DT classifier were calculated by using the CF method, providing a mechanism analogous to a pruning process.

The CF method is well suited to calculating the belief value since it can handle the problem of combining different data layers and the heterogeneity and uncertainty of the input data. By this procedure, the discrete type of result was transformed into continuous values that express the landslide susceptibility belief value for each grid cell and enabled the generation of a landslide susceptibility belief map in a GIS environment. It also made it possible to exclude rules that, according to the CF method, have belief values that indicate uncertainty.

Conclusion

In the present study, a modified ID3 DT classifier was used to estimate the landslide susceptibility belief values of a certain research area in Xanthi, Greece. A total of eight landslide conditioning factors were analyzed and included in the study, namely lithological units, elevation, slope angle, slope aspect, distance from tectonic features, distance from hydrographic network, distance from geological boundaries and distance from road network. The landslide inventory data were divided into two subsets, one for training (80 % of the total data) and one for estimating the prediction capabilities of the developed methodology. The overall accuracy of the developed classifier reached 96 %, making the classifier a valuable tool that can be used efficiently in spatial prediction models. The qualitative interpretation of the produced landslide susceptibility maps provides strong evidence about how well the outcomes of the analysis agree with field evidence. When compared with the results obtained from the application of the J48 algorithm on the same datasets, the prediction capabilities of the modified DT model were slightly better. The area under the success-rate curve was 0.8397 for the modified DT and 0.8220 for the J48 algorithm. The area under the prediction-rate curve was 0.8035 for the modified DT and 0.8024 for the J48 algorithm.
The reliability of the developed technique was estimated through the Cohen kappa index (k) and was calculated to be equal to 0.9365. The value indicates an almost perfect agreement between the observed and the predicted values.

The outcomes of the study indicated that the modified DT classifier could be regarded as a useful tool for producing high accuracy landslide susceptibility maps and could provide valuable information to local authorities and government agencies concerned with decision making and policy planning.

Acknowledgments

The authors would like to thank the Editorial Office of Landslides for editorial handling and also two anonymous reviewers for their helpful comments and suggestions that improved the quality of the previous version of the manuscript.

References

Akgun A (2012) Landslide susceptibility zonation of the Chamoli region, Garhwal Himalayas, using logistic regression model. Landslides 9(1):93–106
Akgun A, Sezer EA, Nefeslioglu HA, Gokceoglu C, Pradhan B (2012) An easy to use MATLAB program (MamLand) for the assessment of landslide susceptibility using Mamdani fuzzy algorithm. Comput Geosci 38(1):23–34
Akinci H, Dogan S, Kilicoglu C, Temiz MS (2011) Production of landslide susceptibility map of Samsun (Turkey) city centre by using Frequency Ratio Model. Int J Phys Sci 6(5):1015–1025
Aleotti P, Chowdhury R (1999) Landslide hazard assessment: Summary review and new perspectives. Bull Eng Geol Environ 58(1):21–44
Ayalew L, Yamagishi H (2005) The application of GIS-based logistic regression for landslide susceptibility mapping in Kakuda-Yahiko Mountains, central Japan. Geomorphology 65:15–31
Bell JF (1999) Tree-based methods. In: Fielding A (ed) Machine learning methods for ecological applications. Kluwer, Dordrecht, pp 89–105
Bin Z, Xin-gang Z, Ren-Chao W (2004) Automated soil resources mapping based on decision tree and Bayesian predictive modeling. J Zhejiang Univ Sci 5(7):782–795
Binaghi E, Luzi L, Madella P (1998) Slope instability zonation: A comparison between certainty factor and fuzzy Dempster-Shafer approaches. Nat Hazards 17:77–97
Bou Kheir R, Bøcher PK, Greve MB, Greve MH (2010) The application of GIS based decision-tree models for generating the spatial distribution of hydromorphic organic landscapes in relation to digital terrain data. Hydrol Earth Syst Sci 14:847–857
Bradley JC, Millspaugh AC (2001) Advanced programming using visual basic, Version 6.0. Irwin/McGraw-Hill, p 655
Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Chapman & Hall. Wadsworth, Inc, New York, p 368
Can T, Nefeslioglu HA, Gokceoglu C, Sonmez H, Duman TY (2005) Susceptibility assessments of shallow earthflows triggered by heavy rainfall at three catchments by logistic regression analysis. Geomorphology 72(1–4):250–271
Caniani D, Pascale S, Sdao F, Sole A (2008) Neural networks and landslide susceptibility: a case study of the urban area of Potenza. Nat Hazards 45:55–72
Cervi F, Berti M, Borgatti L, Ronchetti F, Manenti F, Corsini A (2010) Comparing predictive capability of statistical and deterministic methods for landslide susceptibility mapping: a case study in the northern Apennines (Reggio Emilia Province, Italy). Landslides 7(4):433–444
Chung CF, Fabbri AG (1993) The representation of geoscience information for data integration. Nonrenewable Resour 2(2):122–139
Chung CF, Fabbri AG (1998) Three Bayesian prediction models for landslide hazard. In: Buccianti A (ed) Proceedings of International Association for Mathematical Geology Annual Meeting (IAMG'98), Ischia, Italy, pp 204–211
Chung CJF, Fabbri AG (2003) Validation of spatial prediction models for landslide hazard mapping. Nat Hazards 30(3):451–472
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20:37–46
Cruden DM, Varnes DJ (1996) Landslides types and processes. In: Turner AK, Schuster RL (eds) Landslides: investigation and mitigation. Transportation Research Board special report 247, pp 36–75
Dai FC, Lee CF (2003) A spatiotemporal probabilistic modeling of storm-induced shallow landsliding using aerial photographs and logistic regression. Earth Surf Proc Landf 28(5):527–545
Dai FC, Lee CF, Ngai YY (2002) Landslide risk assessment and management: an overview. Eng Geol 64(1):65–87
Devkota KC, Regmi AD, Pourghasemi HR, Yoshida K, Pradhan B, Ryu IC, Dhital MR, Althuwaynee OF (2013) Landslide susceptibility mapping using certainty factor, index of entropy and logistic regression models in GIS and their comparison at Mugling-Narayanghat road section in Nepal Himalaya. Nat Hazards 65(1):135–165
Duman TY, Can T, Gokceoglu C, Nefeslioglu HA, Sonmez H (2006) Application of logistic regression for landslide susceptibility zoning of Cekmece Area, Istanbul, Turkey. Environ Geol 51:241–256
Elias PB, Bandis SC (2000) Neurofuzzy systems in landslide hazard assessment. In: Proceedings of 4th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, pp 199–202
Ercanoglu M, Gokceoglu C (2002) Assessment of landslide susceptibility for a landslide prone area (north of Yenice, NW Turkey) by fuzzy approach. Environ Geol 41:720–730
Ercanoglu M, Gokceoglu C (2004) Use of fuzzy relations to produce landslide susceptibility map of a landslide prone area West Black Sea region, Turkey. Eng Geol 75(3–4):229–250
Ermini L, Catani F, Casagli N (2005) Artificial neural networks applied to landslide susceptibility assessment. Geomorphology 66:327–343
Fawcett T (2004) ROC graphs: notes and practical considerations for researchers. Pattern Recogn Lett 27(8):882–891
Felicisimo AM, Cuartero A, Remondo J, Quiros E (2013) Mapping landslide susceptibility with logistic regression, multiple adaptive regression splines, classification and regression trees, and maximum entropy methods: a comparative study. Landslides 10(2):175–189
Fell R, Corominas J, Bonnard C, Cascini L, Leroi E, Savage W (2008) Guidelines for landslide susceptibility, hazard and risk zoning for land-use planning. Eng Geol 102:99–111
Ferentinou M, Sakellariou M (2007) Computational intelligence tools for the prediction of slope performance. Comput Geotech 34:362–384
Flentje P, Stirling D, Chowdhury RN (2007) Landslide susceptibility and hazard derived from a landslide inventory using data mining: an Australian case study. In: Proceedings of the First North American Landslide Conference, Landslides and Society: Integrated Science, Engineering, Management and Mitigation, pp 1–10
Glade T, Anderson M, Crozier MJ (2005) Landslide hazard and risk. John Wiley and Sons, Ltd., Chichester, p 802
Gokceoglu C, Sonmez H, Nefeslioglu HA, Duman TY, Can T (2005) The 17 March 2005 Kuzulu landslide (Sivas, Turkey) and landslide-susceptibility map of its near vicinity. Eng Geol 81:65–83
Gomez H, Kavzoglu T (2005) Assessment of Shallow Landslide Susceptibility using Artificial Neural Networks in Jabonosa River Basin, Venezuela. Eng Geol 78(1–2):11–27
Guzzetti F, Carrara A, Cardinali M, Reichenbach P (1999) Landslide evaluation: a review of current techniques and their application in a multi-scale study, Central Italy. Geomorphology 31:181–216
Guzzetti F, Reichenbach P, Cardinali M, Galli M, Ardizzone F (2005) Probabilistic landslide hazard assessment at the basin scale. Geomorphology 72:272–299
Guzzetti F, Reichenbach P, Ardizzone F, Cardinali M, Galli M (2006) Estimating the quality of landslide susceptibility models. Geomorphology 81:166–184
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor 11(1):10–18
Heckerman D (1986) Probabilistic interpretations for MYCIN's certainty factors. In: Kanal L, Lemmer J (eds) Uncertainty in artificial intelligence. North-Holland, pp 167–196
Hoehler FK (2000) Bias and prevalence effects on kappa viewed in terms of sensitivity and specificity. J Clin Epidemiol 53:499–503
Ilia I, Tsangaratos P, Koumantakis I, Rozos D (2010) Application of a Bayesian approach in GIS-based model for evaluating landslide susceptibility. Case study Kimi area, Euboea, Greece. Bull Geol Soc Greece 3:1590–1600
Ilias P, Rozos D, Konstandopoulou G, Dimadis E, Salapa E, Apostolidis E, Gemitzi A (2000) Engineering geology study of disastrous phenomena in Central Rhodope Mountain, Greek Institute of Geology and Mineral Exploration, Internal Report T-2117 (in Greek)
Jaafari A, Najafi A, Pourghasemi HR, Rezaeian J, Sattarian A (2014) GIS-based frequency ratio and index of entropy models for landslide susceptibility assessment in the Caspian forest, northern Iran. Int J Environ Sci Technol 11(4):909–926
Jenks FG (1967) The data model concept in statistical mapping. Int Yearb Cartogr 7:186–190
Jones JM, Fielding A, Sullivan M (2006) Analysing extinction risk in parrots using decision trees. Biodivers Conserv 15(6):1993–2007
Juang CH, Lee DH, Sheu C (1992) Mapping slope failure potential using fuzzy sets. J Geotech Eng ASCE 118:475–494

Kanungo DP, Arora MK, Sarkar S, Gupta RP (2006) A comparative study of conventional, ANN black box, fuzzy and combined neural and fuzzy weighting procedures for landslide susceptibility zonation in Darjeeling Himalayas. Eng Geol 85:347–366 Kanungo DP, Sarkar S, Sharma S (2011) Combining neural network with fuzzy, certainty factor and likelihood ratio concepts for spatial prediction of landslides. Nat Hazards 59:1491–1512 Kavzoglu Τ, Sahin EK, Colkesen Ι (2014) Landslide susceptibility mapping using GISbased multi-criteria decision analysis, support vector machines, and logistic regression. Landslides, 11(3):425–439 Kawabata D, Bandibas J (2009) Landslide susceptibility mapping using geological data, a DEM from ASTER images and an Artificial Neural Network (ANN). Geomorphology 113(1–2):97–109 Kouli M, Loupasakis C, Soupios P, Rozos D, Vallianatos F (2014) Landslide susceptibility mapping by comparing the WLC and WofE mutli-criteria methods in the West Crete Island, Greece. Environ Earth Sci. doi:10.1007/s12665-014-3389-0 Lan HX, Zhou CH, Wang LJ, Zhang HY, Li RH (2004) Landslide hazard spatial analysis and prediction using GIS in the Xiaojiang watershed, Yunnan, China. Eng Geol 76(1– 2):109–128 Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174 Lee S, Min KD (2001) Statistical analysis of landslide susceptibility at Yongin, Korea. Environ Geol 40(9):1095–1113 Lee S, Pradhan B (2006) Probabilistic landslide hazard and risk mapping on Penang Island, Malaysia. J Earth Syst Sci 115(6):661–672 Lee S, Pradhan B (2007) Landslide hazard mapping at Selangor, Malaysia using frequency ratio and logistic regression models. Landslides 4(1):33–41 Lee S, Sambath T (2006) Landslide susceptibility mapping in the Damrei Romel area, Cambodia using frequency ratio and logistic regression models. 
Environ Geol 50(6):847–855 Lee S, Ryu JH, Min K, Won JS (2003) Landslide susceptibility analysis using GIS and artificial neural network. Earth Surf Process Landf 28(12):1361–1376 Lee S, Choi J, Min K (2004a) Probabilistic landslide hazard mapping using GIS and remote sensing data at Boeun, Korea. Int J Remote Sens 25:2037–2052 Lee S, Ryu J, Won J, Park H (2004b) Determination and application of the weights for landslide susceptibility mapping using an artificial neural network. Eng Geol 71(3– 4):289–302 Lee CT, Huang CC, Lee JF, Pan KL, Lin ML, Dong JJ (2008) Statistical approach to earthquake – induced landslide susceptibility. Eng Geol 100:43–58 Li X, Claramunt C (2006) A spatial entropy-based decision tree for classification of geographical information. Trans GIS 10(3):451–467 Liati A, Seidel E (1996) Metamorphic evolution and geochemistry of kyanite eclogites in central Rhodope, northern Greece. Contrib Mineral Petrol 123(3):293–307 Luzi L, Pergalani F (1999) Slope instability in static and dynamic conditions for urban planning: The ‘Oltre Po Pavese’ Case History (Regione Lombardia – Italy). Nat Hazards 20(1):57–82 Magliulo P, Di Lisio A, Russo F, Zelano A (2008) Geomorphology and landslide susceptibility assessment using GIS and bivariate statistics: a case study in southern Italy. Nat Hazards 47:411–435 Marjanovic M, Bajat B, Kovaevi M (2009) Landslide susceptibility assessment with machine learning algorithms. Proceedings of the International Conference on Intelligent Networking and Collaborative Systems, November 4-6, 2009, IEEE, Barcelona, pp 273–278 Melchiore C, Matteucci M, Azzoni A, Zanchi A (2008) Artificial neural networks and cluster analysis in landslide susceptibility zonation. Geomorphology 94(3–4):379–400 Miller HJ, Han J (2001) Geographic data mining and knowledge discovery. CRC Press, Boca Raton, p486 Mitchell T (1997) Machine learning. 
McGraw-Hill, p 414 Mohammady M, Pourghasemi HR, Pradhan B (2012) Landslide susceptibility mapping at Golestan Province, Iran: a comparison between frequency ratio, Dempster-Shafer, and weights-of-evidence models. J Asian Earth Sci 61:221–236 Murthy S (1998) Automatic construction of decision trees from data: a multidisciplinary survey. Data Min Knowl Disc 2(4):345–389 Nandi A, Shakoor A (2010) A GIS-based landslide susceptibility evaluation using bivariate and multivariate statistical analyses. Eng Geol 110(1–2):11–20 Nefeslioglu HA, Sezer E, Gokceoglu C, Bozkir AS, Duman TY (2010) Assessment of landslide susceptibility by decision trees in the metropolitan area of Istanbul, Turkey. Math Probl Eng doi:10.1155/2010/901095 Article ID 901095 Neuhauser B, Damm B, Terhorst B (2012) GIS-based assessment of landslide susceptibility on the base of the Weights-of Evidence model. Landslides 9(4):511–528

Oh HJ, Pradhan B (2011) Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallow landslides in a tropical hilly area. Comput Geosci 37(9):1264– 1276 Ormsby T, Napoleon E, Burke R, Groessl C, Bowden L (2008) Getting to know ArcGIS Desktop: basics of ArcView, ArcEditor, and ArcInfo. ESRI Press, p 592 Pachauri AK, Pant M (1992) Landslide hazard mapping based on geological attributes. Eng Geol 32(1–2):81–100 Pal M, Mather PM (2003) An assessment of the effectiveness of decision tree methods for land cover classification. Remote Sens Environ 86:554–556 Park NW (2010) Application of Dempster–Shafer theory of evidence to GIS-based landslide susceptibility analysis. Environ Earth Sci 62(2):367–376 Pourghasemi HR, Mohammady M, Pradhan B (2012a) Landslide susceptibility mapping using index of entropy and conditional probability models in GIS: Safarood Basin, Iran. Catena 97:71–84 Pourghasemi HR, Pradhan B, Gokceoglu C, Mohammadi M, Moradi HR (2012b) Application of weights-of-evidence and certainty factor models and their comparison in landslide susceptibility mapping at Haraz watershed, Iran. Arab J Geosci 6(7):2351– 2365 Pourghasemi HR, Pradhan B, Gokceoglu C (2012c) Application of fuzzy logic and analytical hierarchy process (AHP) to landslide susceptibility mapping at Haraz watershed, Iran. Nat Hazards 63(2):965–996 Pourghasemi HR, Moradi HR, Fatemi Aghda SM (2013) Landslide susceptibility mapping by binary logistic regression, analytical hierarchy process, and statistical index models and assessment of their performances. Nat Hazards 69(1):749–779 Pourghasemi HR, Moradi HR, Fatemi Aghda SM, Gokceoglu C, Pradhan B (2014) GISbased landslide susceptibility mapping with probabilistic likelihood ratio and spatial multi criteria evaluation models (North of Tehran, Iran). 
Arab J Geosci 7(5):1857–1878 Pradhan B (2013) A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput Geosci 51:350–365 Pradhan B, Lee S (2009) Landslide risk analysis using artificial neural model focusing on different training sites. Int J Phys Sci 3(11):1–15 Pradhan B, Lee S (2010a) Landslide susceptibility assessment and factor effect analysis: back-propagation artificial neural networks and their comparison with frequency ratio and bivariate logistic regression modeling. Environ Model Softw 25(6):747–759 Pradhan B, Lee S (2010b) Regional landslide susceptibility analysis using back – propagation neural network model at Cameron Highland, Malaysia. Landslides 7(1):13–30 Pradhan B, Lee S, Buchroithner MF (2009) Use of geospatial data for the development of fuzzy algebraic operators to landslide hazard mapping: a case study in Malaysia. Appl Geomatics 1:3–15 Pradhan B, Sezer E, Gokceoglu C, Buchroithner MF (2010) Landslide susceptibility mapping by neuro-fuzzy approach in a landslide prone area (Cameron Highland, Malaysia). IEEE Trans Geosci Remote Sens 48(12):4164–4177 Quinlan JR (1986) Induction of decision trees. Mach Learn 1:81–106 Quinlan JR (1993) C4.5: Programs for machine learning, Morgan Kaufmann, p 302 Rozos D, Tsagaratos P, Markantonis K, Skias S (2006) An application of rock engineering system (RES) method for ranking the instability potential of natural slopes in Achaia County, Greece. In: Proc. Of XIth International Congress of the Society for Mathematical Geology, University of Liege, Belgium, S08, p 10 Rozos D, Pyrgiotis L, Skias S, Tsagaratros P (2008) An implementation of rock engineering system for ranking the instability potential of natural slopes in Greek territory. An application in Karditsa County. Landslides 5:261–270 Sabatakakis N, Koukis G, Vassiliades E, Lainas S (2013) Landslide susceptibility zonation in Greece. 
Nat Hazards 65(1):523–543 Saito H, Nakayama D, Matsuyama H (2009) Comparison of landslide susceptibility based on a decision-tree model and actual landslide occurrence: the Akaishi mountains, Japan,^. Geomorphology 109(3–4):108–121 Schuster RL (1996) Socioeconomic significance of landslides. In A.K. Turner and R.L. Schuster, eds., Landslides – investigation and mitigation, National Res. Council, Washington, D.C. Transp Res Board Spec Rep 247:12–35 Sezer AE, Pradhan B, Gokceoglu C (2011) Manifestation of an adaptive neuro - fuzzy model on landslide susceptibility mapping: Klang valley, Malaysia. Expert Syst Appl 38(7):8208–8219 Shannon C (1948) The mathematical theory of communication. Bell Syst Tech J 27:379– 423 Shortliffe E, Buchanan B (1975) A model of inexact reasoning in medicine. Math Biosci 23:351–379

Landslides

Original Paper Sujatha ER, Rajamanickam GV, Kumaravel P (2012) Landslide susceptibility analysis using probabilistic certainty factor approach: a case study on Tevankarai stream watershed, India. J Earth Syst Sci 121(5):1337–1350 Tangestani MH (2009) A comparative study of Dempster-Shafer and Fuzzy models for landslide susceptibility mapping using a GIS: an experience from Zagros Mountains, SW Iran. Asian J Earth Sci 35:66–73 Thiery Y, Malet JP, Sterlacchini S, Puissant A, Maquaire O (2007) Landslide susceptibility assessment by bivariate methods at large scales: application to a complex mountainous environment. Geomorphology 92:38–59 Thiery Y, Maquaire O, Fressard M (2014) Application of expert rules in indirect approaches for landslide susceptibility assessment. Landslides 11(3):411–424 Tien Bui D, Pradhan B, Lofman O, Revhaug I, Dick OB (2012a) Spatial prediction of landslide hazards in Hoa Binh province (Vietnam): a comparative assessment of the efficacy of evidential belief functions and fuzzy logic models. Catena 96:28–40 Tien Bui D, Pradhan B, Lofman O, Revhaug I, Dick OB (2012b) Landslide susceptibility assessment in the Hoa Binh province of Vietnam using Artificial Neural Network. Geomorphology 172:12–19 Tien Bui D, Pradhan B, Lofman O, Revhaug I, Dick OB (2012c) Landslide susceptibility assessment at Hoa Binh province of Vietnam using an adaptive neuro fuzzy inference system and GIS. Comput Geosci 45:199–211 Tien Bui D, Pradhan B, Lofman O, Revhaug I (2012d) Landslide susceptibility assessment in Vietnam using support vector machines, decision tree and Naïve Bayes models. Mathematical Problems in Engineering, pp 1–26 Tien Bui D, Pradhan B, Lofman O, Revhaug I, Dick OB (2013) Regional prediction of landslide hazard in the Hoa Binh province (Vietnam) using probability analysis of intense rainfall. Nat Hazards 60(2):707–730 Tsangaratos P (2012) Research on the engineering geological behaviour of the geological formations by the use of Information Systems. 
Phd Thesis, Athens, Greece, p 363, (In Greek) Tsangaratos P, Benardos A (2014) Estimating landslide susceptibility through a artificial neural network classifier. Nat Hazards 74(3):1489–1516 Tsangaratos P, Ilia I, Rozos D (2013) Case event system for landslide susceptibility analysis. In: Margottini, Canuti, Sassa (eds) Landslide science and practice. Springer, Berlin, pp 585–593 Vahidnia MH, Alesheikh AA, Alimohammadi A, Hosseinali F (2010) A GIS-based neurofuzzy procedure for integrating knowledge and data in landslide susceptibility mapping. Comput Geosci 36(9):1101–1114

Landslides

Van Westen J, Van Asch J, Soeters R (2006) Landslide hazard and risk zonation — why is still so difficult? Bull Eng Geol Environ 65:167–184 Wan S (2009) A spatial decision support system for extracting the core factors and thresholds for landslide susceptibility map. Eng Geol 108(3–4):237–251 Wan S, Lei TC, Chou TY (2010) A novel data mining technique of analysis and classification for landslide problems. Nat Hazards 52(1):211–230 Witten WI, Frank E (2005) Data mining: practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco, p 560 WP/WLI International Geotechnical Societies UNESCO Working Party on World Landslide Inventory (1993) A suggested method for describing the activity of a landslide. Bull Int Assoc Eng Geol 47:53–57 Yeon YK, Han JG, Ryu KH (2010) Landslide susceptibility mapping in Injae, Korea, using a decision tree. Eng Geol 116(3–4):274–283 Yilmaz I (2010) Comparison of landslide susceptibility mapping methodologies for Koyulhisar, Turkey: conditional probability, logistic regression, artificial neural networks, and Support Vector Machine. Environ Earth Sci 61(4):821–836 Yilmaz C, Topal T, Suzen ML (2012) GIS-based landslide susceptibility mapping using bivariate statistical analysis in Devrek (Zonguldak Turkey). Environ Earth Sci 65(7):2161–2178 Zare M, Pourghasemi HR, Vafakhah M, Pradhan B (2013) Landslide susceptibility mapping at Vaz watershed (Iran) using an artificial neural network model: a comparison between multi-layer perceptron (MLP) and radial basic function (RBF) algorithms. Arab J Geosci 6(8):2873–2888 Zevenbergen LW, Thorne CR (1987) Quantitative analysis of land surface topography. Earth Surf Process Landf 12(1):47–56

P. Tsangaratos (corresponding author), I. Ilia
School of Mining and Metallurgical Engineering, Department of Geological Studies, National Technical University of Athens, Zografou Campus, Heroon Polytechniou 9, 15780 Zografou, Greece