Catena 151 (2017) 147–160
A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility

Wei Chen a, Xiaoshen Xie a, Jiale Wang a, Biswajeet Pradhan b,c, Haoyuan Hong d,⁎, Dieu Tien Bui e, Zhao Duan a, Jianquan Ma a

a College of Geology & Environment, Xi'an University of Science and Technology, Xi'an 710054, China
b Department of Civil Engineering, Faculty of Engineering, University Putra Malaysia, 43400, Selangor, Malaysia
c Department of Energy and Mineral Resources Engineering, Choongmu-gwan, Sejong University, 209 Neungdong-ro Gwangjin-gu, Seoul, 05006, Republic of Korea
d Jiangxi Provincial Meteorological Observatory, Jiangxi Meteorological Bureau, No.109 ShengfuBeier Road, Nanchang 330046, China
e Geographic Information System group, Department of Business Administration and Computer Science, University College of Southeast, Bø i Telemark N-3800, Norway
Article info

Article history:
Received 28 May 2016
Received in revised form 24 November 2016
Accepted 29 November 2016
Available online xxxx

Keywords: Logistic model tree; Random forest; Classification and regression tree; Landslide; China

Abstract

The main purpose of the present study is to use three state-of-the-art data mining techniques, namely, logistic model tree (LMT), random forest (RF), and classification and regression tree (CART) models, to map landslide susceptibility. Long County was selected as the study area. First, a landslide inventory map was constructed using historical reports, interpretation of aerial photographs, and extensive field surveys. A total of 171 landslide locations were identified in the study area. Twelve landslide-related parameters were considered for landslide susceptibility mapping: slope angle, slope aspect, plan curvature, profile curvature, altitude, NDVI, land use, distance to faults, distance to roads, distance to rivers, lithology, and rainfall. The 171 landslides were randomly separated into two groups with a 70/30 ratio for training and validation purposes, and different ratios of non-landslide to landslide grid cells were tested to obtain the highest classification accuracy. The linear support vector machine (LSVM) algorithm was used to evaluate the predictive capability of the 12 landslide conditioning factors. Second, LMT, RF, and CART models were constructed using the training data. Finally, the applied models were validated and compared using receiver operating characteristic (ROC) curves and predictive accuracy (ACC). Overall, all three models exhibit reasonably good performance; the RF model exhibits the highest predictive capability compared with the LMT and CART models. The RF model, with a success rate of 0.837 and a prediction rate of 0.781, is a promising technique for landslide susceptibility mapping. Therefore, these three models are useful tools for spatial prediction of landslide susceptibility. © 2016 Elsevier B.V. All rights reserved.
⁎ Corresponding author.
http://dx.doi.org/10.1016/j.catena.2016.11.032
0341-8162/© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Landslides, as one of the most common geological hazards in the world, cause thousands of casualties and fatalities, hundreds of billions of dollars in damage, and environmental losses each year (Aleotti and Chowdhury, 1999; Gutiérrez et al., 2015). In China, many regions have been seriously affected by landslides, which have posed serious threats to the environment, settlements, and industrial facilities in recent years (Lin et al., 2012; Ma et al., 2015; Wang et al., 2015a; Xu et al., 2015; Xu et al., 2014; Zhou et al., 2013). Generally, landslide damage can be reduced to a certain extent by predicting future landslide locations (Pradhan, 2010). Globally, several statistical models combined with GIS have been used for landslide susceptibility assessment, such as statistical index (Chen et al., 2016a; Constantin et al., 2011; Nasiri Aghdam et al., 2016), index of entropy (IOE) (Constantin et al., 2011; Devkota et al., 2013; Youssef et al., 2015a), weights of evidence (WOE) (Chen et al., 2016c; Oh and Lee, 2011; Ozdemir and Altural, 2013; Sharma and Kumar, 2008), evidential belief function (EBF) (Pradhan et al., 2014; Tien Bui et al., 2013), certainty factor (CF) (Chen et al., 2016d; Devkota et al., 2013; Kanungo et al., 2011), analytical hierarchy process (AHP) (Chen et al., 2016d; Demir et al., 2013; Shahabi et al., 2014; Yalcin et al., 2011), logistic regression models (Costanzo et al., 2014; Devkota et al., 2013; Lee et al., 2007; Nourani et al., 2014; Ozdemir and Altural, 2013), and multiple logistic regression models (Felicísimo et al., 2013; Lee, 2007; Ohlmacher and Davis, 2003). Because the prediction capability of these models is critical, machine learning models have also been investigated, such as fuzzy
logic (Guettouche, 2013; Pourghasemi et al., 2012; Pradhan, 2010; Sharma et al., 2013), fuzzy rule-based classifier (Pham et al., 2016; Tien Bui et al., 2014), neuro fuzzy (Dehnavi et al., 2015; Pradhan, 2013), multivariate adaptive regression splines (MARS) (Conoscenti et al., 2015; Felicísimo et al., 2013; Vorpahl et al., 2012; Wang et al., 2015a), neural networks (Lee et al., 2007; Park et al., 2013; Tien Bui et al., 2016c; Yilmaz, 2010), fuzzy k-nearest neighbor (Tien Bui et al., 2016a), Naïve Bayes (Tsangaratos and Ilia, 2016), support vector machines (Chen et al., 2016b; Colkesen et al., 2016; Xu et al., 2012), least-squares support vector machines (Tien Bui et al., 2016b), and relevant support vector machines (Hoang and Tien Bui, 2016). A literature review shows that each machine learning model has its strengths and weaknesses, and in general, its behavior depends on the characteristics of the study area. Therefore, comparisons of machine learning models for landslide susceptibility assessment are highly desirable. Although several comparison works have been carried out by researchers such as Pradhan (2013), Hong et al. (2015), Youssef et al. (2015b), and Tien Bui et al. (2016c), there are still some state-of-the-art models, such as logistic model tree (LMT), random forest (RF), and classification and regression tree (CART), which have rarely been employed for landslide susceptibility assessment and therefore should be further investigated and compared. We address this gap here by applying, verifying, and comparing three machine learning techniques (LMT, RF, and CART) for landslide susceptibility mapping, with a case study in the Long County area. Twelve landslide conditioning factors were considered using these three models in GIS. The results were validated using the area under the receiver operating characteristic (ROC) curve and statistical measures.

2. General situation of the region

The study area (Long County) is located in Shaanxi Province, China, within latitudes 34°35′17″ N to 35°6′45″ N, and longitudes 106°26′32″ E to 107°8′11″ E (Fig. 1). The land use types of the study area are mainly
farmland, bare land, residential areas, water, forest, and grass. The altitude ranges from 778 m to 2467 m and decreases from west to east. Qian River and Wei River are the main rivers in the study area, both belonging to the Yellow River network. According to a Shaanxi Province Meteorological Bureau (http://www.sxmb.gov.cn) report, the study area has a warm temperate continental monsoon climate, with an average annual temperature of approximately 10.7 °C and annual rainfall of approximately 600 mm. The average annual evaporation is 1363 mm, and the average relative humidity is approximately 70%. The average number of days with precipitation is 120. The rainy season is from May to September, accounting for 75.4% of yearly rainfall. The average annual wind speed is 1.5 m/s, and the maximum wind speed is 8.4 m/s. The study area is located at the borders of the southern margin of the Ordos syncline and Qin-Qi geosyncline. The strata are mainly Mesoproterozoic, Cambrian, Ordovician, Triassic, Cretaceous, Neogene, and Quaternary. Three major faults divide the study area into distinct structural zones: (1) the Guguan-Badu (NW–SE direction), (2) the Xinjichuan-Yabo (NW–SE direction), and (3) the Taoyuan-Guichuansi (NW–SE direction). The main lithologies in the study area are loess, mudstone, sandstone, conglomerate, glutenite, limestone, and igneous rocks (Fig. 2).

Fig. 1. Location of the study area and landslide inventory map.

Fig. 2. Study area geological map.

3. Materials and methods

In the current study, a digital elevation model (DEM) with 30 × 30 m resolution was used to extract a set of topographic factors. The DEM was provided by the International Scientific & Technical Data Mirror Site, Computer Network Information Center, Chinese Academy of Sciences, and is available at http://www.gscloud.cn. LANDSAT-8 satellite images with 30 × 30 m spatial resolution were also provided by the same institution. Study area lithology maps at a scale of 1:200,000 were collected from the local Land and Resources Bureau. Meteorological data were collected and compiled from the Meteorological Bureau of Shaanxi Province (available at http://www.sxmb.gov.cn/). In addition, satellite images available from Google Earth Pro 7.1 were used. All of the above-mentioned data were processed and used to create the landslide-related factors using ArcGIS 10.0.
3.1. Landslide inventory map

Preparing the landslide inventory map was the first step because this modeling approach is based on the assumption that past landslides are the key to the future (Guzzetti et al., 1999). Therefore, compiling a landslide inventory map with high accuracy and detailed information is the first important step (Hong et al., 2016a; Hong et al., 2016b; Regmi et al., 2014). In this study, landslide data come from historical records, field surveys, and interpretation of Google Earth images carried out in Google Earth Pro 7.1. In the end, 171 landslide locations (centroids) were identified (Fig. 1). Landslide analysis shows that the smallest landslide is 120 m³ and the largest is >40,000,000 m³. These
landslides affected 4153 people, and the economic losses are estimated to be approximately 12.3 million USD.

3.2. Landslide conditioning factors

According to previous literature and the landslide characteristics of the study area, 12 conditioning factors were selected: slope angle, slope aspect, plan curvature, profile curvature, altitude, NDVI, land use, distance to faults, distance to roads, distance to rivers, lithology, and rainfall (Table 1). All the landslide conditioning factor maps were converted into raster format with a spatial resolution of 30 × 30 m. Topography-related factors were derived from the DEM with a resolution of 30 × 30 m.

Table 1
Spatial relationship between landslide conditioning factors and landslides by frequency ratio (FR). FR is the percentage of landslides divided by the percentage of domain for each class.

Conditioning factor      Class              Landslides (%)   Domain (%)   FR
Slope aspect             Flat               0.000            0.018        0.000
                         North              10.000           11.984       0.834
                         Northeast          9.167            13.268       0.691
                         East               10.833           14.485       0.748
                         Southeast          15.833           14.092       1.124
                         South              15.000           11.601       1.293
                         Southwest          17.500           10.861       1.611
                         West               9.167            11.718       0.782
                         Northwest          12.500           11.973       1.044
Slope angle (°)          <10                35.000           24.628       1.421
                         10–20              44.167           37.650       1.173
                         20–30              20.000           25.269       0.791
                         30–40              0.833            10.269       0.081
                         40–50              0.000            2.039        0.000
                         >50                0.000            0.145        0.000
Plan curvature           −10.57–−1.10       5.000            5.764        0.867
                         −1.10–−0.35        18.333           21.004       0.873
                         −0.35–0.25         40.833           40.009       1.021
                         0.25–1.08          31.667           26.502       1.195
                         1.08–8.59          4.167            6.721        0.620
Profile curvature        −10.56–−1.54       5.000            4.139        1.208
                         −1.54–−0.49        20.833           19.202       1.085
                         −0.49–0.30         36.667           43.351       0.846
                         0.30–1.35          32.500           27.557       1.179
                         1.35–11.78         5.000            5.750        0.870
Altitude (m)             <1000              19.167           9.776        1.961
                         1000–1200          57.500           24.845       2.314
                         1200–1400          15.833           20.350       0.778
                         1400–1600          4.167            12.089       0.345
                         1600–1800          2.500            11.959       0.209
                         1800–2000          0.000            10.761       0.000
                         2000–2200          0.833            8.447        0.099
                         >2200              0.000            1.773        0.000
Distance to faults (m)   <1000              31.667           24.171       1.310
                         1000–2000          20.833           14.388       1.448
                         2000–3000          5.833            10.860       0.537
                         3000–4000          7.500            9.051        0.829
                         >4000              34.167           41.529       0.823
Distance to rivers (m)   <200               47.500           27.253       1.743
                         200–400            20.833           23.460       0.888
                         400–600            15.833           18.234       0.868
                         600–800            6.667            11.733       0.568
                         >800               9.167            19.321       0.474
Distance to roads (m)    <500               33.333           17.781       1.875
                         500–1000           17.500           13.853       1.263
                         1000–1500          17.500           11.809       1.482
                         1500–2000          6.667            10.380       0.642
                         >2000              25.000           46.176       0.541
Lithology                Group 1            8.333            5.525        1.508
                         Group 2            40.000           25.564       1.565
                         Group 3            11.667           7.163        1.629
                         Group 4            32.500           21.150       1.537
                         Group 5            0.000            0.721        0.000
                         Group 6            2.500            3.409        0.733
                         Group 7            0.000            2.237        0.000
                         Group 8            3.333            20.028       0.166
                         Group 9            1.667            11.331       0.147
                         Group 10           0.000            2.873        0.000
NDVI                     −0.26–0.21         19.167           7.960        2.408
                         0.21–0.35          37.500           16.487       2.275
                         0.35–0.47          30.000           20.752       1.446
                         0.47–0.59          10.833           21.197       0.511
                         0.59–0.75          2.500            33.604       0.074
Landuse                  Farmland           44.167           31.253       1.413
                         Bareland           45.833           18.041       2.541
                         Residential areas  0.833            0.396        2.102
                         Water              0.000            0.072        0.000
                         Forest and grass   9.167            50.238       0.182
Rainfall (mm/yr)         <530               0.000            0.014        0.000
                         530–550            0.000            0.745        0.000
                         550–570            0.000            2.022        0.000
                         570–590            4.167            6.202        0.672
                         590–610            23.333           17.219       1.355
                         610–630            5.833            15.665       0.372
                         630–650            39.167           36.158       1.083
                         >650               27.500           21.976       1.251

Slope aspect, which describes the direction of slope (Oh et al., 2010; Pourghasemi et al., 2012), is frequently used as a landslide conditioning factor. In the present study, this factor was reclassified into nine directional classes (Fig. 3a). Slope angle is considered to be a principal causative factor and is frequently employed in landslide susceptibility mapping (Kanungo et al., 2006; Nourani et al., 2014). In the current study, the slope angle map was reclassified into six classes using an interval of 10° (Fig. 3b). Plan curvature was also considered. This factor influences convergence or divergence of water during downhill flow and has been used by some authors (Yilmaz et al., 2012; Youssef et al., 2015a; Youssef et al., 2015b). Plan curvature values of the study area ranged from −10.57 to 8.59 and were divided into five classes (Fig. 3c). Profile curvature is the curvature in the vertical plane parallel to the slope direction. It influences the velocity of water flowing on the surface (Tien Bui et al., 2016c; Yilmaz et al., 2012; Yilmaz, 2010). Profile curvature values of the study area vary from −10.56 to 11.78
and were also subdivided into five classes (Fig. 3d). Altitude is another frequently used conditioning factor in landslide susceptibility mapping (Nourani et al., 2014; Pradhan, 2010). In this study, altitude values of the study area were reclassified into eight classes with 200 m intervals (Fig. 3e). In general, geological fault areas are highly susceptible to landslides because the surrounding rock strength decreases due to tectonic breaks. In this study, the fault buffers were reclassified into five categories at a 1000 m interval to produce the distance to faults map (Fig. 3f).

Fig. 3. Study area thematic maps: (a) Slope aspect; (b) Slope angle; (c) Plan curvature; (d) Profile curvature; (e) Altitude; (f) Distance to faults; (g) Distance to rivers; (h) Distance to roads; (i) Lithology; (j) NDVI; (k) Land use; (l) Rainfall.

Rivers play an important role in the occurrence of landslides (Park et al., 2013). Rivers may induce failure of banks on account of slope undercutting, and terrain modification caused by gully erosion may also influence landslide initiation (Dai and Lee, 2002; Tien Bui et al., 2011). In the current study, distance to rivers was considered, and five river buffer categories were created using an interval of 200 m (Fig. 3g). Distance to roads is considered an important factor for triggering landslide occurrences, and has been accepted as one of the most important anthropogenic factors by many investigators (Nourani et al., 2014; Yilmaz, 2010). In this study, five different buffer zones with a 500 m interval were generated using ArcGIS 10.0 (Fig. 3h). Lithology is also a frequently used factor in landslide susceptibility analyses (Akgun et al., 2012; Kanungo et al., 2006). The lithology map was prepared using previous geological maps and field surveys (Fig. 3i). The lithology map was constructed with ten groups based on estimated strength, lithofacies, and geological age.
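The interval-based reclassification used above for the distance factors can be illustrated with a minimal sketch. It uses plain Python rather than the ArcGIS 10.0 workflow of the study; the function name `classify_by_interval`, the sample distances, and the convention that a value lying exactly on a break falls into the upper class are illustrative assumptions. Only the break values (200 m steps for rivers, 500 m steps for roads) come from the text.

```python
from bisect import bisect_right

def classify_by_interval(value, breaks):
    """Return the 1-based class index of `value` for ascending class breaks.

    Assumption: a value equal to a break belongs to the upper class.
    """
    return bisect_right(breaks, value) + 1

# Break values taken from the text: five classes per factor.
RIVER_BREAKS = [200, 400, 600, 800]      # metres: <200, 200-400, ..., >800
ROAD_BREAKS = [500, 1000, 1500, 2000]    # metres: <500, 500-1000, ..., >2000

# Hypothetical distances (metres) from five grid cells to the nearest river.
distances_to_river = [35.0, 200.0, 455.5, 799.9, 1250.0]
river_classes = [classify_by_interval(d, RIVER_BREAKS) for d in distances_to_river]
print(river_classes)  # [1, 2, 3, 4, 5]
```

The same helper would apply to any factor reclassified with fixed breaks, such as altitude or rainfall, by swapping in the corresponding break list.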
The normalized difference vegetation index (NDVI) was used to provide a quantitative estimate of the relationship between landslides and vegetation density (Choi et al., 2012). In this study, an NDVI map was extracted from Landsat-8 satellite images. The NDVI values range from −0.259 to 0.745 and were reclassified into five categories using the natural break method (Fig. 3j). As an important landslide-related factor for landslide susceptibility assessment, land use has been taken into account in prior studies (Kanungo et al., 2006; Poudyal et al., 2010). Using interpretation of aerial photos, a land use map was constructed. The land use map was roughly reclassified into five classes (Fig. 3k). The mean annual precipitation at 21 rainfall stations was used to create the rainfall map, which was reclassified into eight categories using an interval of 20 mm: <530 mm, 530–550 mm, 550–570 mm, 570–590 mm, 590–610 mm, 610–630 mm, 630–650 mm, and >650 mm (Fig. 3l).

3.3. Preparation of training and validation datasets

In this study, the 171 landslide locations were randomly divided into two groups with a 70/30 ratio. The first group with 70% of the landslide locations (120 landslide grid cells) was used for training the three models, while the remaining 30% (51 landslide grid cells) were used for model validation. Landslide susceptibility mapping is considered a binary classification problem with two classes, landslide and non-landslide (Tien Bui et al., 2016c). Therefore, it is necessary to generate non-landslide samples. In this study, non-landslide grid cells were randomly selected from landslide-free areas. Because areas with the 171
landslides are small compared to the total study area, different ratios of non-landslide to landslide grid cells were considered to obtain the highest classification accuracy (Tien Bui et al., 2012b). Accordingly, 513 non-landslide grid cells (three times the number of landslide locations) showed the best result for the study area. The 513 non-landslide grid cells were also randomly divided with a 70/30 ratio to build the training and validation datasets. As a result, the training dataset consisted of 120 landslide pixels and 359 non-landslide pixels, while the validation dataset contained 51 landslide pixels and 154 non-landslide pixels. The landslide pixels were assigned a value of 1, while the non-landslide pixels were assigned a value of 0 (Tien Bui et al., 2016c). Because the logistic model tree, random forest, and classification and regression tree models in this study use numeric input factors, a conversion process suggested by Tien Bui
et al. (2012a) was used to convert conditioning factor categories to numeric values. Lastly, a sampling process was performed to extract the values of the 12 landslide conditioning factors for landslide and non-landslide pixels in the training and validation datasets.

3.4. Landslide conditioning factors selection based on the LSVM model

In landslide susceptibility mapping, the quality of the evaluation depends on both the selected models and the quality of the input data (Pradhan, 2013; Pradhan et al., 2014). Not all conditioning factors have equal predictive capability in landslide susceptibility modeling, and some conditioning factors may introduce noise that reduces the predictive capability of the employed models (Tien Bui et al., 2016c). Thus, landslide conditioning factors with low or null predictive capability should be removed in order to obtain more accurate results (Tien Bui et al., 2016c). Several feature selection methods are used to quantify the predictive capability of conditioning factors, such as gain ratio (Nithya and Duraiswamy, 2014), information gain ratio (Tien Bui et al., 2016b), chi-square statistic (Moh'd and Mesleh, 2007; Rao and Scott, 1987; Sharma et al., 2013), and relief significance (Ahmad and Dey, 2005). In this study, the linear support vector machine (LSVM) method was used as the conditioning factor selection method (Guyon et al., 2002). This method can improve classification accuracy by removing unnecessary input factors (Pham et al., 2015). Quantification of the predictive capability of the 12 landslide conditioning factors was carried out using the following equation (Pham et al., 2015):

$$g(x) = \operatorname{sgn}\left(w^{T} m + n\right) \qquad (1)$$
where $w^{T}$ is the transpose of the weight vector $w$, $m = (m_1, m_2, m_3, \ldots, m_{12})$ is the input vector containing the twelve factors, and $n$ is the offset of the hyper-plane from the origin. $w_i$ represents the weight of the $i$th landslide conditioning factor; factors with larger weights have higher predictive capability (Pham et al., 2015).

3.5. Landslide susceptibility models

3.5.1. Logistic model tree

The logistic model tree is a classification model that combines decision tree learning methods and logistic regression (LR) (Landwehr et al., 2005; Quinlan, 1993; Tien Bui et al., 2016c). In the logistic variant, information gain is used for splitting, the LogitBoost algorithm (Landwehr et al., 2005) is used to produce an LR model at every node in the tree, and the tree is pruned using a CART algorithm (Breiman et al., 1984). The LMT uses cross-validation to find the number of LogitBoost iterations that prevents overfitting of the training data. The LogitBoost algorithm uses additive logistic regression of least-squares fits for each class $M$ (Doetsch et al., 2009):

$$L_M(x) = \sum_{i=1}^{n} \beta_i x_i + \beta_0 \qquad (2)$$
where $\beta_i$ is the coefficient of the $i$th component of vector $x$, and $n$ is the number of factors. The linear logistic regression method is used to compute the posterior probabilities of leaf nodes in the LMT model (Landwehr et al., 2005; Tien Bui et al., 2016c):

$$P(M \mid x) = \frac{\exp\left(L_M(x)\right)}{\sum_{M'=1}^{D} \exp\left(L_{M'}(x)\right)} \qquad (3)$$

where $D$ is the number of classes.
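To make Eqs. (2) and (3) concrete, the sketch below evaluates the linear score of each class and turns the scores into posterior probabilities. It is only an illustration of the two formulas, not the LMT training procedure (no tree growing, LogitBoost fitting, or pruning). The coefficient values, the three-factor input vector, and the function names are hypothetical.

```python
import math

def linear_score(x, betas, beta0):
    """Eq. (2): L_M(x) = sum_i beta_i * x_i + beta_0."""
    return sum(b * xi for b, xi in zip(betas, x)) + beta0

def posterior(scores):
    """Eq. (3): exp(L_M) normalised over all D classes.

    The maximum score is subtracted first for numerical stability;
    this does not change the resulting probabilities.
    """
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical leaf models for D = 2 classes and n = 3 numeric factors.
x = [0.8, 0.2, 0.5]
leaf_models = [
    ([-1.0, 0.5, 0.0], 0.3),    # class 0: non-landslide (made-up coefficients)
    ([1.0, -0.5, 0.0], -0.3),   # class 1: landslide (made-up coefficients)
]
scores = [linear_score(x, betas, beta0) for betas, beta0 in leaf_models]
probs = posterior(scores)
print(probs)  # probs[1] can be read as a landslide susceptibility index
```

With two classes, Eq. (3) reduces to the familiar logistic function of the score difference, which is why the output for the landslide class lies between 0 and 1.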
3.5.2. Random forest

Random forest is a powerful ensemble-learning method that was proposed by Breiman (2001). Random forest can be applied for classification, regression, and unsupervised learning (Liaw and Wiener, 2002), and this method has been widely used in many fields and has exhibited good performance (Calderoni et al., 2015; Chen et al., 2014; Hasan et al., 2014; Youssef et al., 2015a). When solving classification problems, the RF prediction is the unweighted majority of class votes (Kohestani et al., 2015). The bagging technique is used to select random samples of variables as the training dataset for model calibration. For each variable, the function determines the model prediction error if the values of that variable are permuted across the out-of-bag observations (Trigila et al., 2015).

3.5.3. Classification and regression tree

Classification and regression tree is a recursive partitioning method that builds classification and regression trees for predicting categorical dependent variables (classification) and continuous dependent variables (regression) (Felicísimo et al., 2013; Youssef et al., 2015b). This method is widely used in many fields (Bevilacqua et al., 2003; Kim et al., 2015; Koon and Petscher, 2015; Malinowska, 2014; Yang et al., 2016). The CART is constructed by repeatedly splitting subsets of the dataset, using all predictor variables, to create two child nodes; the final goal is to produce subsets of the dataset that are as homogeneous as possible with respect to the target variable (Mahjoobi and Etemad-Shahidi, 2008).

3.6. Model assessment

The receiver operating characteristic (ROC) curve is a useful tool to assess the performance of landslide susceptibility models. The ROC curve is constructed by plotting sensitivity on the Y-axis against 1 − specificity on the X-axis for various cut-off thresholds (Hosmer and Lemeshow, 2000).
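The ROC construction described above can be sketched in a few lines: sweep the cut-off over the predicted values, record sensitivity and 1 − specificity at each cut-off, and summarise the curve by its area using the trapezoidal rule. The labels and scores below are a made-up toy example, and the function names are hypothetical; this is not the software used in the study.

```python
def roc_points(labels, scores):
    """Return (1 - specificity, sensitivity) pairs for every distinct cut-off."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= cutoff and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy data: 1 = landslide pixel, 0 = non-landslide pixel.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = area_under_curve(roc_points(labels, scores))
print(round(example_auc, 3))  # 0.889 (= 8/9) for this toy example
```

In the toy data, eight of the nine landslide/non-landslide pairs are ranked correctly, which matches the area of 8/9.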
The area under the ROC curve (AUC) represents the capability of a model to predict landslide and non-landslide pixels. An AUC value of 1 indicates a perfect model, while an AUC value of 0.5 indicates a non-informative model (Tien Bui et al., 2013; Tien Bui et al., 2016c), and a higher AUC value indicates a better predictive capability. According to Tien Bui et al. (2016b) and Kantardzic (2011), the relationship between predictive capability and AUC can be quantified as follows: excellent (0.9–1), very good (0.8–0.9), good (0.7–0.8), average (0.6–0.7), and poor (0.5–0.6). In this study, the standard error of the AUC values was also used, and a smaller standard error indicates a better model (Cascini et al., 2015; Conoscenti et al., 2015; Guo et al., 2015; Hussin et al., 2016). Furthermore, the predictive accuracy (ACC) has also been widely used to assess the predictive capability of landslide models. ACC is the proportion of landslide and non-landslide pixels that a model classifies correctly. In this study, ACC together with AUC was used to evaluate the performance of the three landslide models:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (4)$$
where TP (true positive) and TN (true negative) are the numbers of pixels that are correctly classified, and FP (false positive) and FN (false negative) are the numbers of pixels that are incorrectly classified (Dehnavi et al., 2015; Wang et al., 2015b).

4. Results and discussion

4.1. Landslide conditioning factors selection

The predictive capability of the twelve landslide conditioning factors was obtained using the LSVM method on the training data. The result is shown in Fig. 4. It can be seen that altitude, with an average merit (AM) value of 11.40, has the highest predictive capability. This result is in agreement with research carried out by Tien Bui et al. (2016c) and is likely due to most landslides occurring at altitudes <1200 m. The other conditioning factors have lower predictive capability than altitude: distance to rivers (AM = 11.10), NDVI (AM = 10.20), profile curvature (AM = 8.10), slope aspect (AM = 7.60), rainfall (AM = 7.30), distance to faults (AM = 7.00), lithology (AM = 5.00), land use (AM = 3.90), slope angle (AM = 2.70), distance to roads (AM = 2.40), and plan curvature (AM = 1.30). According to Tien Bui et al. (2016c), conditioning factors with null predictive value should be removed. However, all twelve landslide conditioning factors revealed positive predictive capability values. Moreover, the literature reveals that a single conditioning factor such as slope angle may not necessarily always have high importance in landslide susceptibility modeling. The importance of such a factor is very site specific and depends on the scale adopted for analysis and on the selection method. For this reason, all twelve conditioning factors were used in the analysis for building the three models.

Fig. 4. Predictive capabilities of the twelve landslide conditioning factors.

4.2. Landslide susceptibility mapping results

Landslide susceptibility index (LSI) values were calculated using the LMT model. This model can deal with binary and multi-class target variables, numeric and nominal attributes, and missing values (Landwehr et al., 2005). The LMT model was constructed using the training data. To reduce variability, a ten-fold cross-validation method was used. This method partitions the training data into ten subsets and averages the validation results over ten rounds. The ten-fold cross-validation method was also used for the RF and CART models in this study. After that, the model was applied to calculate landslide susceptibility indices for the whole study area. The calculated LSI values were in the range 0.005 to 0.932.
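The ten-fold cross-validation described above can be sketched as follows, using the 479 training pixels (120 landslide and 359 non-landslide) reported in Section 3.3. The shuffling seed, the function names, and the `evaluate` callback are assumptions made for illustration; the study does not state how its folds were drawn.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(n_samples, evaluate, k=10, seed=0):
    """Average evaluate(train_idx, test_idx) over k rounds.

    In each round one fold is held out for validation and the
    remaining k - 1 folds are used for training.
    """
    folds = k_fold_indices(n_samples, k, seed)
    results = []
    for held_out, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        results.append(evaluate(train_idx, test_idx))
    return sum(results) / k

folds = k_fold_indices(479)                 # 120 + 359 training pixels
fold_sizes = sorted(len(f) for f in folds)  # nine folds of 48 and one of 47

# Placeholder metric (hypothetical): fraction of samples held out per round.
mean_held_out = cross_validate(479, lambda train, test: len(test) / (len(train) + len(test)))
```

In practice the `evaluate` callback would fit a model on `train_idx` and return a score such as ACC or AUC on `test_idx`.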
All LSI values were converted into raster format using ArcGIS 10 to produce the landslide susceptibility map. The landslide susceptibility map was reclassified into five classes: very low (0.005–0.110), low (0.110–0.252), moderate (0.252–0.434), high (0.434–0.638), and very high (0.638–0.932), using the natural break method (Fig. 5a). The area percentages of each class are shown in Fig. 6. The very low class has the largest area (51.25%), followed by low (18.61%), moderate (12.56%), high (10.57%), and very high (7.37%) (Fig. 6).

For the RF model, the training process also employed the ten-fold cross-validation method. For model building, it is necessary to determine the number of trees (numTrees) in the forest; therefore, a heuristic test was carried out to find the best numTrees parameter for the RF model. NumTrees was tested from 100 to 2000 to obtain the highest area under the receiver operating characteristic curve (AUC) and predictive accuracy (ACC) values, and a value of 200 showed the best result. Then, the constructed model was applied to calculate landslide susceptibility indices for the whole study area. The calculated LSI values were in the range 0.000 to 0.948. The LSI values were converted into raster format using ArcGIS 10 to produce the landslide susceptibility map. The landslide susceptibility map was also reclassified into five classes: very low (0.000–0.089), low (0.089–0.245), moderate (0.245–0.413), high (0.413–0.595), and very high (0.595–0.948), using the natural break method (Fig. 5b). The area percentages for each class are shown in Fig. 6. The very low class has the largest area (46.05%), followed by low (18.73%), moderate (16.02%), high (13.01%), and very high (6.55%) (Fig. 6).

Regarding CART, the model was constructed using the training data and the ten-fold cross-validation method. The model was then used to calculate landslide susceptibility indices for the whole study area. The calculated LSI values were in the range 0.000 to 0.999. These values were converted into raster format using ArcGIS 10 to produce the landslide susceptibility map. The landslide susceptibility map was also reclassified into five classes: very low (0.000–0.102), low (0.102–0.302), moderate (0.302–0.517), high (0.517–0.733), and very high (0.733–0.999), using the natural break method (Fig. 5c). The area percentages of each class are shown in Fig. 6. The very low class has the largest area (67.09%), followed by low (8.92%), moderate (8.72%), high (8.45%), and very high (7.18%) (Fig. 6).

4.3. Model performance evaluation and comparison

In landslide modeling, it is necessary to evaluate and assess the quality of the resulting models. In this study, the predictive accuracy (ACC), ROC curves, and AUC values of the three models on the training data are shown in Table 2 and Fig. 7. The RF model has the highest performance in terms of ACC and AUC, with values of 0.772 and 0.837, respectively (Table 2 and Fig. 7b).
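As a small worked illustration of Eq. (4), the sketch below counts TP, FP, TN, and FN at a fixed cut-off and computes ACC. The cut-off of 0.5, the toy labels and scores, and the function names are assumptions for illustration; the study does not report the cut-off behind its ACC values.

```python
def confusion_counts(labels, scores, cutoff=0.5):
    """Count TP, FP, TN, FN when scores >= cutoff are predicted as landslide (1)."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Eq. (4): (TP + TN) / (TP + FP + TN + FN)."""
    return (tp + tn) / (tp + fp + tn + fn)

# Toy data: 1 = landslide pixel, 0 = non-landslide pixel.
labels = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.7, 0.6, 0.3, 0.2, 0.1, 0.55]
counts = confusion_counts(labels, scores)
print(counts, accuracy(*counts))  # (2, 2, 3, 1) 0.625
```

Because non-landslide pixels outnumber landslide pixels three to one in this study, ACC is best read alongside AUC rather than on its own.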
The LMT and CART models exhibited slightly lower ACC and AUC values than the RF model, with ACC values of 0.745 and 0.733 and AUC values of 0.826 and 0.773 for the LMT and CART models, respectively (Table 2 and Fig. 7b). Three evaluation statistics, namely, standard error (Std. error), confidence interval (CI) at 95%, and significance level P, are included. The Std. errors are reasonably small, the CIs are relatively narrow, and the P values are also small for the three models (Fig. 7). All these results indicate a reasonable goodness-of-fit for the models with the training dataset, and the RF model performs better than the other two models. The prediction capabilities of the three constructed landslide models were evaluated using the validation data, and the results are shown in Table 3 and Fig. 8. It can be seen that all models have good prediction capabilities, with the highest ACC value of 0.795 and AUC value of 0.781 for the RF model (Table 3 and Fig. 8b). The other evaluation statistics also indicate that all three models exhibit reasonably good prediction capabilities. These findings agree with Youssef et al. (2015b), who concluded that the RF model performed better than other decision tree models. Finally, to test for statistically significant differences between the three landslide models, pairwise comparisons were conducted using the Wilcoxon signed-rank test on the training data. This method was also used in research carried out by Tien Bui et al. (2016b). The null hypothesis for this method is that there is no significant difference between landslide models at the 95% confidence level. The z and p values are used to evaluate significant differences between landslide models.
When z values fall outside the critical range (−1.96 to +1.96) and p values are smaller than the significance level (0.05), the null hypothesis is rejected and the performances of the landslide models are therefore significantly different (Tien Bui et al., 2016b). The results of the Wilcoxon signed-rank test are shown in Table 4. The LMT and CART model performances are significantly different (p value = 0.001, z value = 3.275), as are those of the RF and CART models (p value = 0.001, z value = 4.233), whereas the performances of the LMT and RF models are not significantly different (p value = 0.362, z value = 0.911).
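The decision rule described above can be illustrated with a minimal Python sketch of a two-sided Wilcoxon signed-rank test using the normal approximation. The text does not state which paired quantities were compared or which software was used, so the paired per-sample model outputs, the function name `wilcoxon_signed_rank`, and the toy values below are assumptions for illustration only. The sketch applies no tie or continuity correction, so its z and p values may differ slightly from those reported by statistical packages.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    x, y: paired samples of equal length (hypothetical model outputs).
    Returns (z, p). Zero differences are dropped and tied absolute
    differences receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks over exact ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Sum of ranks of positive differences and its null distribution.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value
    return z, p


def significantly_different(z, p, z_crit=1.96, alpha=0.05):
    """Rule from the text: |z| beyond 1.96 and p below 0.05."""
    return abs(z) > z_crit and p < alpha


# Toy paired outputs of two hypothetical models on the same samples.
model_a = [0.91, 0.85, 0.78, 0.66, 0.72, 0.59, 0.88, 0.95, 0.61, 0.70]
model_b = [0.80, 0.83, 0.60, 0.61, 0.55, 0.50, 0.70, 0.90, 0.45, 0.40]
z, p = wilcoxon_signed_rank(model_a, model_b)
print(f"z = {z:.3f}, p = {p:.3f}, different = {significantly_different(z, p)}")
```

With the normal approximation, a two-sided p value below 0.05 corresponds to |z| above about 1.96, so the two conditions in the rule agree in practice.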
Fig. 5. (a) Landslide susceptibility maps using LMT model. (b) Landslide susceptibility maps using RF model. (c) Landslide susceptibility maps using CART model.
Fig. 6. Percentages of different landslide susceptibility classes for LMT, RF and CART models.
Table 2
ACC and AUC for the three landslide models on training data.

Parameters   LMT     RF      CART
ACC          0.745   0.772   0.733
AUC          0.826   0.837   0.773

Table 3
ACC and AUC for the three landslide models on validation data.

Parameters   LMT     RF      CART
ACC          0.717   0.795   0.746
AUC          0.752   0.781   0.742
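The ACC and AUC statistics reported in Tables 2 and 3 can be illustrated with a minimal Python sketch. It assumes binary labels (1 = landslide, 0 = non-landslide) and model scores between 0 and 1; the 0.5 cutoff used for ACC, the function names, and the toy data are illustrative assumptions and are not taken from this study. AUC is computed through its rank interpretation, that is, the probability that a randomly chosen landslide cell receives a higher score than a randomly chosen non-landslide cell, with ties counted as one half.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Fraction of samples classified correctly at the given cutoff.

    The 0.5 threshold is an assumption made for this sketch.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for p, t in zip(preds, y_true) if p == t)
    return correct / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute one half
    return wins / (len(pos) * len(neg))


# Toy example: two landslide and two non-landslide cells.
labels = [1, 1, 0, 0]
lsi = [0.9, 0.4, 0.6, 0.1]
print(accuracy(labels, lsi))  # 0.5
print(auc(labels, lsi))       # 0.75
```

The pairwise form is quadratic in the number of samples, which is acceptable for a few hundred cells; a sort-based implementation would be preferable for full raster grids.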
Overall, all three landslide models are acceptable for landslide susceptibility mapping in the study area. The RF model exhibits the best performance in this study. These models could be selected to construct landslide susceptibility maps for mitigating landslide destruction.

Table 4
Pairwise comparison of the three models.

Parameters   LMT vs. RF   LMT vs. CART   RF vs. CART
z value      0.911        3.275          4.233
p value      0.362        0.001          0.001

5. Conclusions

In this research, three machine learning models, namely, LMT, RF, and CART, were systematically analyzed and compared for landslide susceptibility modeling in the Long County area (China). Twelve landslide conditioning factors were extracted for the study, including slope angle, slope aspect, plan curvature, profile curvature, altitude, NDVI, land use, distance to faults, distance to roads, distance to rivers, lithology, and rainfall. The landslide conditioning factors were evaluated using the LSVM method, and, based on the results, all conditioning factors were used. Finally, model performances were evaluated and compared using ACC values, ROC curves, AUC values, Std. error, CI at 95%, and significance level P. According to this case study, all three models exhibit reasonably good performances; the RF model has the highest predictive capability compared with the LMT and CART models. The RF model, with a success rate of 0.837 and a prediction rate of 0.781, is a promising technique for landslide susceptibility mapping. Finally, these study results may be useful for decision makers and land use planning in landslide-prone areas.
Fig. 7. Success rates using the training data.
Fig. 8. Prediction rates using the validation data.
Acknowledgments

The authors would like to express their gratitude to the Editor in Chief (Prof. Markus Egli) and two anonymous reviewers for their helpful comments on the manuscript. This research was supported by the Doctoral Scientific Research Foundation of Xi'an University of Science and Technology (Grant No. 2015QDJ067), Opening fund of State Key Laboratory of Geohazard Prevention and Geoenvironment Protection (Chengdu University of Technology) (Grant No. SKLGP2017K010), and National Science Foundation of China (Grant No. 41602359).

References

Ahmad, A., Dey, L., 2005. A feature selection technique for classificatory analysis. Pattern Recogn. Lett. 26 (1), 43–56. Akgun, A., Sezer, E.A., Nefeslioglu, H.A., Gokceoglu, C., Pradhan, B., 2012. An easy-to-use MATLAB program (MamLand) for the assessment of landslide susceptibility using a Mamdani fuzzy algorithm. Comput. Geosci. 38 (1), 23–34.
Aleotti, P., Chowdhury, R., 1999. Landslide hazard assessment: summary review and new perspectives. Bull. Eng. Geol. Environ. 58 (1), 21–44. Bevilacqua, M., Braglia, M., Montanari, R., 2003. The classification and regression tree approach to pump failure rate analysis. Reliab. Eng. Syst. Saf. 79 (1), 59–67. Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 1984. Classification and Regression Trees. Wadsworth International Group, Belmont, CA. Calderoni, L., Ferrara, M., Franco, A., Maio, D., 2015. Indoor localization in a hospital environment using random forest classifiers. Expert Syst. Appl. 42 (1), 125–134. Cascini, L., Ciurleo, M., Di Nocera, S., Gulla, G., 2015. A new-old approach for shallow landslide analysis and susceptibility zoning in fine-grained weathered soils of southern Italy. Geomorphology 241, 371–381. Chen, W., Chai, H., Sun, X., Wang, Q., Ding, X., Hong, H., 2016a. A GIS-based comparative study of frequency ratio, statistical index and weights-of-evidence models in landslide susceptibility mapping. Arab. J. Geosci. 9 (3), 1–16. Chen, W., Chai, H., Zhao, Z., Wang, Q., Hong, H., 2016b. Landslide susceptibility mapping based on GIS and support vector machine models for the Qianyang County, China. Environ. Earth Sci. 75 (6), 1–13. Chen, W., Ding, X., Zhao, R., Shi, S., 2016c. Application of frequency ratio and weights of evidence models in landslide susceptibility mapping for the Shangzhou District of Shangluo City, China. Environ. Earth Sci. 75 (1), 1–10. Chen, W., Li, W., Chai, H., Hou, E., Li, X., Ding, X., 2016d. GIS-based landslide susceptibility mapping using analytical hierarchy process (AHP) and certainty factor (CF) models for the Baozhong region of Baoji City, China. Environ. Earth Sci. 75 (1), 1–14. Chen, W., Li, X., Wang, Y., Chen, G., Liu, S., 2014. Forested landslide detection using LiDAR data and the random forest algorithm: a case study of the Three Gorges, China.
Remote Sens. Environ. 152, 291–301. Choi, J., Oh, H.-J., Lee, H.-J., Lee, C., Lee, S., 2012. Combining landslide susceptibility maps obtained from frequency ratio, logistic regression, and artificial neural network models using ASTER images and GIS. Eng. Geol. 124, 12–23.
Colkesen, I., Sahin, E.K., Kavzoglu, T., 2016. Susceptibility mapping of shallow landslides using kernel-based Gaussian process, support vector machines and logistic regression. J. Afr. Earth Sci. 118, 53–64. Conoscenti, C., Ciaccio, M., Caraballo-Arias, N.A., Gomez-Gutierrez, A., Rotigliano, E., Agnesi, V., 2015. Assessment of susceptibility to earth-flow landslide using logistic regression and multivariate adaptive regression splines: a case of the Belice River basin (western Sicily, Italy). Geomorphology 242, 49–64. Constantin, M., Bednarik, M., Jurchescu, M.C., Vlaicu, M., 2011. Landslide susceptibility assessment using the bivariate statistical analysis and the index of entropy in the Sibiciu Basin (Romania). Environ. Earth Sci. 63 (2), 397–406. Costanzo, D., Chacón, J., Conoscenti, C., Irigaray, C., Rotigliano, E., 2014. Forward logistic regression for earth-flow landslide susceptibility assessment in the Platani river basin (southern Sicily, Italy). Landslides 11 (4), 639–653. Dai, F.C., Lee, C.F., 2002. Landslide characteristics and slope instability modeling using GIS, Lantau Island, Hong Kong. Geomorphology 42 (3–4), 213–228. Dehnavi, A., Aghdam, I.N., Pradhan, B., Varzandeh, M.H.M., 2015. A new hybrid model using step-wise weight assessment ratio analysis (SWARA) technique and adaptive neuro-fuzzy inference system (ANFIS) for regional landslide hazard assessment in Iran. Catena 135, 122–148. Demir, G., Aytekin, M., Akgün, A., İkizler, S.B., Tatar, O., 2013. A comparison of landslide susceptibility mapping of the eastern part of the north Anatolian fault zone (Turkey) by likelihood-frequency ratio and analytic hierarchy process methods. Nat. Hazards 65 (3), 1481–1506. Devkota, K.C., Regmi, A.D., Pourghasemi, H.R., Yoshida, K., Pradhan, B., Ryu, I.C., Dhital, M.R., Althuwaynee, O.F., 2013.
Landslide susceptibility mapping using certainty factor, index of entropy and logistic regression models in GIS and their comparison at Mugling–Narayanghat road section in Nepal Himalaya. Nat. Hazards 65 (1), 135–165. Doetsch, P., Buck, C., Golik, P., Hoppe, N., Kramp, M., Laudenberg, J., Oberdörfer, C., Steingrube, P., Forster, J., Mauser, A., 2009. Logistic Model Trees with AUC Split Criterion for the KDD Cup 2009 Small Challenge. 7, 77–88. Felicísimo, Á.M., Cuartero, A., Remondo, J., Quirós, E., 2013. Mapping landslide susceptibility with logistic regression, multiple adaptive regression splines, classification and regression trees, and maximum entropy methods: a comparative study. Landslides 10 (2), 175–189. Guettouche, M.S., 2013. Modeling and risk assessment of landslides using fuzzy logic. Application on the slopes of the Algerian tell (Algeria). Arab. J. Geosci. 6 (9), 3163–3173. Guo, C.B., Montgomery, D.R., Zhang, Y.S., Wang, K., Yang, Z.H., 2015. Quantitative assessment of landslide susceptibility along the Xianshuihe fault zone, Tibetan plateau, China. Geomorphology 248, 93–110. Gutiérrez, F., Linares, R., Roqué, C., Zarroca, M., Carbonel, D., Rosell, J., Gutiérrez, M., 2015. Large landslides associated with a diapiric fold in Canelles reservoir (Spanish Pyrenees): detailed geological–geomorphological mapping, trenching and electrical resistivity imaging. Geomorphology 241, 224–242. Guyon, I., Weston, J., Barnhill, S., Vapnik, V., 2002. Gene selection for cancer classification using support vector machines. Mach. Learn. 46 (1), 389–422. Guzzetti, F., Carrara, A., Cardinali, M., Reichenbach, P., 1999. Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, Central Italy. Geomorphology 31 (1–4), 181–216. Hasan, M.A.M., Nasser, M., Pal, B., Ahmad, S., 2014. Support vector machine and random forest modeling for intrusion detection system (IDS). J. Intell. Learn. Syst. Appl. 6 (1), 45. 
Hoang, N.-D., Tien Bui, D., 2016. A novel relevance vector machine classifier with cuckoo search optimization for spatial prediction of landslides. J. Comput. Civ. Eng. http://dx.doi.org/10.1061/(ASCE)CP.1943-5487.0000557. Hong, H., Chen, W., Xu, C., Youssef, A.M., Pradhan, B., Tien Bui, D., 2016a. Rainfall-induced landslide susceptibility assessment at the Chongren area (China) using frequency ratio, certainty factor, and index of entropy. Geocarto Int. 1–16. Hong, H., Naghibi, S.A., Pourghasemi, H.R., Pradhan, B., 2016b. GIS-based landslide spatial modeling in Ganzhou City, China. Arab. J. Geosci. 9 (2), 1–26. Hong, H., Pradhan, B., Xu, C., Bui, D.T., 2015. Spatial prediction of landslide hazard at the Yihuang area (China) using two-class kernel logistic regression, alternating decision tree and support vector machines. Catena 133, 266–281. Hosmer, D.W., Lemeshow, S., 2000. Applied Logistic Regression. A Wiley-Interscience Publication, New York. Hussin, H.Y., Zumpano, V., Reichenbach, P., Sterlacchini, S., Micu, M., van Westen, C., Balteanu, D., 2016. Different landslide sampling strategies in a grid-based bi-variate statistical susceptibility model. Geomorphology 253, 508–523. Kantardzic, M., 2011. Data Mining: Concepts, Models, Methods, and Algorithms. John Wiley & Sons, Hoboken, New Jersey. Kanungo, D.P., Arora, M.K., Sarkar, S., Gupta, R.P., 2006. A comparative study of conventional, ANN black box, fuzzy and combined neural and fuzzy weighting procedures for landslide susceptibility zonation in Darjeeling Himalayas. Eng. Geol. 85 (3–4), 347–366. Kanungo, D.P., Sarkar, S., Sharma, S., 2011. Combining neural network with fuzzy, certainty factor and likelihood ratio concepts for spatial prediction of landslides. Nat. Hazards 59 (3), 1491–1512. Kim, K.N., Kim, D.W., Jeong, M.A., 2015. The usefulness of a classification and regression tree algorithm for detecting perioperative transfusion-related pulmonary complications. Transfusion 55 (11), 2582–2589.
Kohestani, V.R., Hassanlourad, M., Ardakani, A., 2015. Evaluation of liquefaction potential based on CPT data using random forest. Nat. Hazards 79 (2), 1079–1089. Koon, S., Petscher, Y., 2015. Comparing Methodologies for Developing an Early Warning System: Classification and Regression Tree Model versus Logistic Regression. REL 2015–077. Southeast, Regional Educational Laboratory. Landwehr, N., Hall, M., Frank, E., 2005. Logistic model trees. Mach. Learn. 59 (1), 161–205.
Lee, S., 2007. Comparison of landslide susceptibility maps generated through multiple logistic regression for three test areas in Korea. Earth Surf. Process. Landf. 32 (14), 2133–2148. Lee, S., Ryu, J.-H., Kim, I.-S., 2007. Landslide susceptibility analysis and its verification using likelihood ratio, logistic regression, and artificial neural network models: case study of Youngin, Korea. Landslides 4 (4), 327–338. Liaw, A., Wiener, M., 2002. Classification and regression by randomForest. R News 2 (3), 18–22. Lin, G.-W., Chen, H., Shih, T.-Y., Lin, S., 2012. Various links between landslide debris and sediment flux during earthquake and rainstorm events. J. Asian Earth Sci. 54, 41–48. Ma, T., Li, C., Lu, Z., Bao, Q., 2015. Rainfall intensity–duration thresholds for the initiation of landslides in Zhejiang Province, China. Geomorphology 245, 193–206. Mahjoobi, J., Etemad-Shahidi, A., 2008. An alternative approach for the prediction of significant wave heights based on classification and regression trees. Appl. Ocean Res. 30 (3), 172–177. Malinowska, A., 2014. Classification and regression tree theory application for assessment of building damage caused by surface deformation. Nat. Hazards 73 (2), 317–334. Moh'd, A., Mesleh, A., 2007. Chi square feature extraction based SVMs Arabic language text categorization system. J. Comput. Sci. 3 (6), 430–435. Nasiri Aghdam, I., Varzandeh, M.H.M., Pradhan, B., 2016. Landslide susceptibility mapping using an ensemble statistical index (Wi) and adaptive neuro-fuzzy inference system (ANFIS) model at Alborz Mountains (Iran). Environ. Earth Sci. 75 (7), 1–20. Nithya, N.S., Duraiswamy, K., 2014. Gain ratio based fuzzy weighted association rule mining classifier for medical diagnostic interface. Sadhana 39 (1), 39–52. Nourani, V., Pradhan, B., Ghaffari, H., Sharifi, S.S., 2014.
Landslide susceptibility mapping at Zonouz plain, Iran using genetic programming and comparison with frequency ratio, logistic regression, and artificial neural network models. Nat. Hazards 71 (1), 523–547. Oh, H.-J., Lee, S., 2011. Landslide susceptibility mapping on Panaon Island, Philippines using a geographic information system. Environ. Earth Sci. 62 (5), 935–951. Oh, H.-J., Lee, S., Soedradjat, G.M., 2010. Quantitative landslide susceptibility mapping at Pemalang area, Indonesia. Environ. Earth Sci. 60 (6), 1317–1328. Ohlmacher, G.C., Davis, J.C., 2003. Using multiple logistic regression and GIS technology to predict landslide hazard in northeast Kansas, USA. Eng. Geol. 69 (3–4), 331–343. Ozdemir, A., Altural, T., 2013. A comparative study of frequency ratio, weights of evidence and logistic regression methods for landslide susceptibility mapping: Sultan Mountains, SW Turkey. J. Asian Earth Sci. 64, 180–197. Park, S., Choi, C., Kim, B., Kim, J., 2013. Landslide susceptibility mapping using frequency ratio, analytic hierarchy process, logistic regression, and artificial neural network methods at the Inje area, Korea. Environ. Earth Sci. 68 (5), 1443–1464. Pham, B.T., Tien Bui, D., Pourghasemi, H.R., Indra, P., Dholakia, M.B., 2015. Landslide susceptibility assessment in the Uttarakhand area (India) using GIS: a comparison study of prediction capability of naïve Bayes, multilayer perceptron neural networks, and functional trees methods. Theor. Appl. Climatol. 1–19. Pham, B.T., Tien Bui, D., Prakash, I., Dholakia, M.B., 2016. Rotation forest fuzzy rule-based classifier ensemble for spatial prediction of landslides using GIS. Nat. Hazards 1–31. Poudyal, C.P., Chang, C., Oh, H.-J., Lee, S., 2010. Landslide susceptibility maps comparing frequency ratio and artificial neural networks: a case study from the Nepal Himalaya. Environ. Earth Sci. 61 (5), 1049–1064. Pourghasemi, H.R., Pradhan, B., Gokceoglu, C., 2012.
Application of fuzzy logic and analytical hierarchy process (AHP) to landslide susceptibility mapping at Haraz watershed, Iran. Nat. Hazards 63 (2), 965–996. Pradhan, B., 2010. Landslide susceptibility mapping of a catchment area using frequency ratio, fuzzy logic and multivariate logistic regression approaches. J. Indian Soc. Remote Sens. 38 (2), 301–320. Pradhan, B., 2013. A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput. Geosci. 51, 350–365. Pradhan, B., Abokharima, M.H., Jebur, M.N., Tehrany, M.S., 2014. Land subsidence susceptibility mapping at Kinta Valley (Malaysia) using the evidential belief function model in GIS. Nat. Hazards 73 (2), 1019–1042. Quinlan, J.R., 1993. C4.5: Programs For Machine Learning. Morgan Kaufmann Publishers Inc. Rao, J., Scott, A., 1987. On simple adjustments to chi-square tests with sample survey data. Ann. Stat. 385–397. Regmi, N.R., Giardino, J.R., McDonald, E.V., Vitek, J.D., 2014. A comparison of logistic regression-based models of susceptibility to landslides in western Colorado, USA. Landslides 11 (2), 247–262. Shahabi, H., Khezri, S., Ahmad, B.B., Hashim, M., 2014. Landslide susceptibility mapping at central Zab basin, Iran: a comparison between analytical hierarchy process, frequency ratio and logistic regression models. Catena 115, 55–70. Sharma, L.P., Patel, N., Ghose, M.K., Debnath, P., 2013. Synergistic application of fuzzy logic and geo-informatics for landslide vulnerability zonation—a case study in Sikkim Himalayas, India. Applied Geomatics 5 (4), 271–284. Sharma, M., Kumar, R., 2008. GIS-based landslide hazard zonation: a case study from the Parwanoo area, lesser and outer Himalaya, H.P., India. Bull. Eng. Geol. Environ. 67 (1), 129–137. Tien Bui, D., Lofman, O., Revhaug, I., Dick, O., 2011. 
Landslide susceptibility analysis in the Hoa Binh province of Vietnam using statistical index and logistic regression. Nat. Hazards 59 (3), 1413–1444. Tien Bui, D., Nguyen, Q.-P., Hoang, N.-D., Klempe, H., 2016a. A novel fuzzy K-nearest neighbor inference model with differential evolution for spatial prediction of rainfall-induced shallow landslides in a tropical hilly area using GIS. Landslides http://dx.doi.org/10.1007/s10346-016-0708-4. Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., Dick, O.B., 2012a. Landslide susceptibility assessment in the Hoa Binh province of Vietnam: a comparison of the Levenberg-Marquardt and Bayesian regularized neural networks. Geomorphology 171–172 (0), 12–29. Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., Dick, O.B., 2012b. Landslide susceptibility mapping at Hoa Binh province (Vietnam) using an adaptive neuro-fuzzy inference system and GIS. Comput. Geosci. 45 (0), 199–211. Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., Dick, Ø.B., 2013. Regional prediction of landslide hazard using probability analysis of intense rainfall in the Hoa Binh province, Vietnam. Nat. Hazards 66 (2), 707–730. Tien Bui, D., Pradhan, B., Revhaug, I., Trung Tran, C., 2014. A comparative assessment between the application of fuzzy unordered rules induction algorithm and J48 decision tree models in spatial prediction of shallow landslides at Lang Son City, Vietnam. In: Srivastava, P.K., Mukherjee, S., Gupta, M., Islam, T. (Eds.), Remote Sensing Applications in Environmental Research. Society of Earth Scientists Series. Springer International Publishing, Cham, Switzerland, pp. 87–111. Tien Bui, D., Tuan, T.A., Hoang, N.-D., Thanh, N.Q., Nguyen, D.B., Van Liem, N., Pradhan, B., 2016b. Spatial prediction of rainfall-induced landslides for the Lao Cai area (Vietnam) using a hybrid intelligent approach of least squares support vector machines inference model and artificial bee colony optimization. Landslides 1–12. Tien Bui, D., Tuan, T.A., Klempe, H., Pradhan, B., Revhaug, I., 2016c. Spatial prediction models for shallow landslide hazards: a comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides 13 (2), 361–378. Trigila, A., Iadanza, C., Esposito, C., Scarascia-Mugnozza, G., 2015. Comparison of logistic regression and random forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy). Geomorphology 249, 119–136. Tsangaratos, P., Ilia, I., 2016.
Comparison of a logistic regression and Naïve Bayes classifier in landslide susceptibility assessments: the influence of models complexity and training dataset size. Catena 145, 164–179. Vorpahl, P., Elsenbeer, H., Märker, M., Schröder, B., 2012. How can statistical models help to determine driving factors of landslides? Ecol. Model. 239 (1), 27–39. Wang, L.-J., Guo, M., Sawada, K., Lin, J., Zhang, J., 2015a. Landslide susceptibility mapping in Mizunami City, Japan: a comparison between logistic regression, bivariate statistical analysis and multivariate adaptive regression spline models. Catena 135, 271–282.
Xu, C., Dai, F., Xu, X., Lee, Y.H., 2012. GIS-based support vector machine modeling of earthquake-triggered landslide susceptibility in the Jianjiang River watershed, China. Geomorphology 145, 70–80. Xu, C., Xu, X., Shyu, J.B.H., 2015. Database and spatial distribution of landslides triggered by the Lushan, China Mw 6.6 earthquake of 20 April 2013. Geomorphology 248, 77–92. Xu, C., Xu, X., Shyu, J.B.H., Zheng, W., Min, W., 2014. Landslides triggered by the 22 July 2013 Minxian–Zhangxian, China, Mw 5.9 earthquake: inventory compiling and spatial distribution analysis. J. Asian Earth Sci. 92, 125–142. Yalcin, A., Reis, S., Aydinoglu, A.C., Yomralioglu, T., 2011. A GIS-based comparative study of frequency ratio, analytical hierarchy process, bivariate statistics and logistics regression methods for landslide susceptibility mapping in Trabzon, NE Turkey. Catena 85 (3), 274–287. Yang, T., Gao, X., Sorooshian, S., Li, X., 2016. Simulating California reservoir operation using the classification and regression-tree algorithm combined with a shuffled cross-validation scheme. Water Resour. Res. 52 (3), 1626–1651. Yilmaz, C., Topal, T., Süzen, M.L., 2012. GIS-based landslide susceptibility mapping using bivariate statistical analysis in Devrek (Zonguldak-Turkey). Environ. Earth Sci. 65 (7), 2161–2178. Yilmaz, I., 2010. Comparison of landslide susceptibility mapping methodologies for Koyulhisar, Turkey: conditional probability, logistic regression, artificial neural networks, and support vector machine. Environ. Earth Sci. 61 (4), 821–836. Youssef, A.M., Al-Kathery, M., Pradhan, B., 2015a. Landslide susceptibility mapping at AlHasher area, Jizan (Saudi Arabia) using GIS-based frequency ratio and index of entropy models. Geosci. J. 19 (1), 113–134. Youssef, A.M., Pourghasemi, H.R., Pourtaghi, Z.S., Al-Katheeri, M.M., 2015b.
Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslides 1–18. Zhou, J.-W., Cui, P., Yang, X.-G., 2013. Dynamic process analysis for the initiation and movement of the Donghekou landslide-debris flow triggered by the Wenchuan earthquake. J. Asian Earth Sci. 76, 70–84.