Modeling of rainfall-induced landslide hazard for the Hoa Binh province of Vietnam

Modellering av jord- og steinrasfare for Hoa Binh-provinsen i Vietnam

Philosophiae Doctor (PhD) Thesis
Dieu Tien Bui
Department of Mathematical Sciences and Technology
Norwegian University of Life Sciences
Ås 2012

Thesis number 2012:63 ISSN 1503-1667 ISBN 978-82-575-1099-2

Acknowledgements

The PhD research was funded by the Norwegian government through the Quota Scheme. The data analysis and write-up of the thesis were carried out at the Department of Mathematical Sciences and Technology (IMT), Norwegian University of Life Sciences (UMB). I would like to acknowledge these institutions for their support during my PhD study.

The thesis would not have been possible without the guidance, collaboration, help, and support of several individuals who in one way or another contributed valuable assistance in the preparation and completion of this study. Foremost, I would like to thank Prof. Inge Revhaug, Prof. Owe Løfman, and Prof. Øystein B. Dick for their supervision, patience, steadfast encouragement, and for always being very cheerful and caring during my PhD work. Many thanks to Prof. Biswajeet Pradhan for feedback, resourcefulness, and creativity, which contributed greatly to my work. Thanks to Associate Professor Håvard Tveite for help and support. I am grateful to Dr. Razak Seidu for discussion and interpretation of the study results as well as for the insights he has shared. I am very grateful to Mrs. Kari Strande at the Ministry of the Environment of Norway for her encouragement and support in diverse ways to me and my family.

Furthermore, I would like to express my sincere thanks to Dr. Tran Tan Van, director of the Vietnam Institute of Geosciences and Mineral Resources (VIGMR), for support and valuable comments during the preparation of thematic maps and landslide data. Special thanks to Dr. Nguyen Dai Trung, Mr. Ho Tien Chung, and Mr. Pham Viet Ha at VIGMR for providing instructions about interpreting SPOT imagery for landslides during the fieldwork phase. I also thank Mr. Vu Manh Hao at the Centre for Geological Appraisal & Technology, Ministry of Natural Resources and Environment of Vietnam, for providing the geological data for the study area. I extend thanks for help and support in data collection for the study area to Profs. Nguyen Ngoc Thach and Nhu Thi Xuan at Vietnam National University, to researchers Bui Ngoc Quy and Tran Trung Chuyen at Hanoi University of Mining and Geology, and to Dr. Nghiem Van Tuan at the National Remote Sensing Center of Vietnam.

Thanks to all my friends in Norway for many nice times outside working hours, such as during holidays, football, barbecues, and all the other relaxing moments. Focusing on my PhD study is only possible if there are moments when I do not have to focus.

Finally, I would like to say special thanks to my wife Nguyen Thi Hanh Minh and my son Bui Minh Hai for their understanding and love, which allowed me to spend most of my time on the research work. I devote my deepest gratitude to my parents for their unlimited love and support and to my brothers for their encouragement of this work.

Dieu Tien Bui
Ås, September 2012

Table of Contents

Acknowledgements
Table of contents
Summary
Sammendrag
List of papers
Nomenclature
1. Introduction
2. Objective
3. Research methodology
   3.1. Description of the study area
   3.2. Conceptual framework
   3.3. Data used
   3.4. Methods
      3.4.1 Statistical index and logistic regression models
      3.4.2 Neuro-fuzzy models
      3.4.3 Fuzzy logic and evidential belief functions models
      3.4.4 Artificial neural networks
      3.4.5 Support vector machines with kernel function analysis
      3.4.6 Decision tree and Naïve Bayes models
      3.4.7 Temporal prediction of landslide and landslide hazard assessment
4. Summary of results
   4.1. Paper I. Landslide susceptibility analysis in the Hoa Binh province of Vietnam using statistical index and logistic regression
   4.2. Paper II. Landslide susceptibility mapping at Hoa Binh province (Vietnam) using an adaptive neuro-fuzzy inference system (ANFIS) and GIS
   4.3. Paper III. Spatial prediction of landslide hazards in Hoa Binh province (Vietnam): a comparative assessment of the efficacy of evidential belief functions and fuzzy logic models
   4.4. Paper IV. Landslide susceptibility assessment in the Hoa Binh province of Vietnam: A comparison of the Levenberg-Marquardt and Bayesian regularized neural networks
   4.5. Paper V. Application of support vector machines in landslide susceptibility assessment for the Hoa Binh province (Vietnam) with kernel functions analysis
   4.6. Paper VI. Landslide susceptibility assessment in Vietnam using Support vector machines, Decision tree, and Naïve Bayes models
   4.7. Paper VII. Regional prediction of landslide hazard in the Hoa Binh province (Vietnam) using probability analysis of intense rainfall
5. Discussions and conclusion
   5.1. Landslide inventory and conditioning factors
   5.2. Landslide susceptibility model and sampling strategy
   5.3. Temporal probability model
   5.4. Validation and comparison
6. Future works
   6.1. Landslide risk assessment
   6.2. Design and development of a Web-GIS interface and related service
   6.3. Monitoring and prediction
   6.4. Dissemination and communication
   6.5. Response capability
References

Summary

Landslides are among the major types of natural hazards that cause different kinds of damage affecting people, organizations, industries, and the environment. Vietnam has been identified as a place particularly vulnerable to some of the worst manifestations of climate change. Together with flooding, landslides are among the recurrent natural hazard problems that are widespread and have caused large losses of life and property in the mountainous region in northwestern Vietnam. Landslide disasters can be reduced by understanding the triggering mechanism and by hazard assessment. However, to date very few attempts have been made to assess the landslide hazards in the region. The process of creating landslide hazard maps is thus still challenging. This thesis addresses this key problem by proposing a new methodology for assessment of landslide hazards.

One of the most difficult aspects of landslide hazard assessment is to estimate the spatial probability of landslides. In general, the quality of landslide susceptibility models is influenced both by the methods used and the sampling strategies employed. Although many different methods and techniques for landslide susceptibility analysis have been proposed and implemented, no agreement has been reached so far regarding the best method and technique for landslide susceptibility mapping. In this thesis, various new landslide susceptibility models have been successfully developed and applied for the study area. The models are derived using methods including support vector machines, neuro-fuzzy (six different models), artificial neural networks (Levenberg-Marquardt and Bayesian regularized), evidential belief functions, decision tree, and Naïve Bayes. All of these models were developed based on the assumption that landslides will occur in the future under the same conditions and triggering factors that influenced them in the past.
When building these models, landslide inventories were used to derive quantitative relationships between landslide occurrences and landslide conditioning factors (slope, aspect, curvature, relief amplitude, lithology, land use, soil type, distance to roads, distance to rivers, distance to faults, and rainfall). The performance of the models was compared with landslide susceptibility models obtained from conventional methods such as bivariate statistics, logistic regression, and fuzzy logic. The result shows that the support vector machine model has the highest prediction capability compared to the other models. Estimation of temporal probability of landslide occurrence is based on the assumption that probability of occurrence of a landslide is related to probability of occurrence of the triggering rainfall threshold. Landslide activity will not occur, or occur only rarely, when rainfall amounts are below the rainfall threshold. The rainfall threshold was established based on daily and 15-day antecedent rainfalls for past landslide events. The temporal probability of a landslide to occur was then calculated based on probability of occurrence of episodes of rainfall exceeding the rainfall threshold for a period of 21 years (1990 to 2010) using a Poisson probability model. Finally, landslide hazard maps were obtained by integrating spatial and temporal probability maps.

Sammendrag

Landslides and rockfalls are serious natural disasters and cause various kinds of damage affecting people, organizations, businesses, and the environment. Vietnam is considered particularly vulnerable to the effects of climate change. Together with floods, landslides and rockfalls are among the worst of the country's frequently occurring natural disasters. Floods and landslides have caused large losses of life and property in the mountainous areas of northwestern Vietnam. The impact of landslide disasters can be reduced by understanding the triggering mechanisms and carrying out risk assessment. So far, very few attempts have been made to assess the landslide hazards in the region. The process of producing maps of landslide-prone areas is therefore a challenge. This thesis addresses the problem and proposes a new methodology for estimating landslide hazard over larger areas.

One of the most difficult aspects of assessing possible damage from landslides is estimating spatial probability. In general, the quality of models that estimate the probability of landslide hazard is influenced both by the analysis methods chosen and by the strategies chosen for data collection. Although many different methods and techniques have been proposed and applied, no agreement has yet been reached on which is best for estimating landslide probability and recording it on maps. In this thesis, several new models for estimating landslide susceptibility from given factors (terrain slope, soil and rock type, vegetation, etc.) have been developed and applied to the study area. Various methods for generating the models were tested: support vector machines, neuro-fuzzy (six different models), artificial neural networks (Levenberg-Marquardt and Bayesian regularized), evidential belief functions, decision tree, and Naïve Bayes. All models were developed on the assumption that landslides will occur in the future under the same conditions and with the same triggering factors as in the past.

The models were developed from a database of information on past landslides. The database was used to derive quantitative relationships between landslide occurrences and conditioning factors (slope, aspect, curvature, relief amplitude, lithology, land use, soil type, distance to roads, distance to rivers, distance to faults, and rainfall). The models were compared, and the results show that the model developed with support vector machines has the best predictive ability.

The temporal probability of landslides is calculated from the probability that the triggering factor exceeds a threshold. Here, rainfall is assumed to be the triggering factor. It is assumed that landslide or rockfall activity will not occur, or will occur only rarely, when rainfall is below the rainfall threshold. The rainfall threshold was estimated from rainfall measurements compared with landslide activity. The best agreement was obtained with a combination of the daily rainfall and the rainfall of the preceding 15 days. The temporal probability of a landslide occurring was then calculated from the probability of occurrence of rainfall episodes above the rainfall threshold over a period of 21 years (1990 to 2010), using a Poisson probability model. Finally, landslide hazard maps were produced by combining susceptibility maps with maps of temporal probability.

List of papers

This thesis is based on the following papers:

I. Tien Bui, D., Lofman, O., Revhaug, I., Dick, O.B., 2011. Landslide susceptibility analysis in the Hoa Binh province of Vietnam using statistical index and logistic regression. Natural Hazards, 59, 1413–1444.

II. Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., Dick, O.B., 2011. Landslide susceptibility mapping at Hoa Binh province (Vietnam) using an adaptive neuro-fuzzy inference system and GIS. Computers & Geosciences, 45, 199–211.

III. Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., Dick, O.B., 2012. Spatial prediction of landslide hazards in Hoa Binh province (Vietnam): a comparative assessment of the efficacy of evidential belief functions and fuzzy logic models. CATENA, 96, 28–40.

IV. Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., Dick, O.B., 2012. Landslide susceptibility assessment in the Hoa Binh province of Vietnam: A comparison of the Levenberg-Marquardt and Bayesian regularized neural networks. Geomorphology, 171–172, 12–29.

V. Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., Dick, O.B., 2012. Application of support vector machines in landslide susceptibility assessment for the Hoa Binh province (Vietnam) with kernel functions analysis. Proceedings of the iEMSs Fourth Biennial Meeting: International Congress on Environmental Modelling and Software (iEMSs 2012). International Environmental Modelling and Software Society, Leipzig, Germany, July 2012.

VI. Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., 2012. Landslide susceptibility assessment in Vietnam using Support vector machines, Decision tree and Naïve Bayes models. Mathematical Problems in Engineering. doi: 10.1155/2012/974638.

VII. Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., Dick, O.B., 2012. Regional prediction of landslides hazards in the Hoa Binh province (Vietnam) using probability analysis of intense rainfall. Natural Hazards. doi: 10.1007/s11069-012-0510-0. (Accepted).

Published papers were reprinted with permission from the publishers.

Supervisors

Main supervisor:
Prof. Owe Løfman, Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, Ås, Norway.

Co-supervisors:
Prof. Inge Revhaug, Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, Ås, Norway.
Prof. Øystein B. Dick, Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, Ås, Norway.

Nomenclature

ANFIS     Adaptive Neuro-Fuzzy Inference Systems
ANN       Artificial Neural Networks
AUC       Areas Under the Curve
Bel       Degree of Belief
BR        Bayesian Regularized
BRNN      Bayesian Regularized Neural Networks
CF        Confidence Factor
DEM       Digital Elevation Model
DIP       Digital Image Processing
Dis       Degree of Disbelief
DT        Decision Tree
EBF       Evidential Belief Function
FIS       Fuzzy Inference System
FN        False Negative
FP        False Positive
GIS       Geographic Information System
GPS       Global Positioning System
IEEE      The Institute of Electrical and Electronics Engineers
LL        Log Likelihood
LM        Levenberg-Marquardt
LMNN      Levenberg-Marquardt Neural Networks
LN        Linear Kernel Function
LR        Logistic Regression
LSI       Landslide Susceptibility Index
MLE       Maximum Likelihood Estimation
MLP       Multi-Layered Perceptron
MNI       Minimum Number of Instance
NB        Naïve Bayes
PL        Polynomial Function
Pls       Degree of Plausibility
PL-SVM    Polynomial Function-Support Vector Machines
RBF       Radial Basis Function
RBF-SVM   Radial Basis Function-Support Vector Machines
RMSE      Root-Mean Squared Error
ROC       Receiver Operating Characteristic
SIG       Sigmoid Kernel Function
SIG-SVM   Sigmoid Kernel Function-Support Vector Machines
SPSS      Statistical Package for Social Sciences
SVM       Support Vector Machines
TN        True Negative
TOL       Tolerance
TP        True Positive
Unc       Degree of Uncertainty
VAF       Values Account For
VIF       Variance Inflation Factor
WEKA      Waikato Environment for Knowledge Analysis

1. Introduction

Landslides are among the most significant natural hazards that cause different kinds of damage affecting people, organizations, industries, and the environment (Glade, 1998). Globally, landslides cause thousands of deaths and injuries, and many billions of USD of expense due to direct and indirect damage annually (Roberds, 2005). Developing countries suffer the most, with 95% of landslide disasters recorded in developing countries and about 0.5% of the Gross National Product lost through landslides (Chung et al., 1995).

The probability of landslide occurrence is influenced by two types of factors: conditioning factors (slope, geology, soil type, land use, etc.) and triggering factors (earthquake, rainfall event, snow melting, etc.). Conditioning factors evolve very slowly and are driven by processes such as erosion and weathering, whereas triggering factors can change slope condition over a very short time (Corominas and Moya, 2008). Worldwide, around 89.6% of the fatalities were caused by landslides triggered by precipitation, whereas landslides triggered by earthquakes, mining and quarrying, and construction caused 0.7%, 1.8%, and 3.4% of deaths, respectively (Petley, 2008).

Climate change, with global warming and its anticipated consequences, is expected to lead to an increase in natural hazards, loss of lives, and infrastructure damage (Nadim et al., 2006; Korup et al., 2012). A statistical analysis from the Centre for Research on the Epidemiology of Disasters (CRED, 2012) shows that approximately 17% of fatalities due to natural hazards are from landslides (Kjekstad and Highland, 2008). In general, there will be an increasing trend in the number of people affected by natural disasters in the future (EM-DAT, 2012). Asia is identified as the continent in which landslides have caused the greatest number of fatalities.
This is due to development of megacities, large changes in the size and distribution of the population as well as land use, and of course changes in climate (Petley, 2010). According to the International Landslide Centre at University of Durham, recorded landslide occurrences in 2007 show that China was the most seriously affected country with 695 landslide-induced deaths, followed by Indonesia (465), India (352), Nepal (168), Bangladesh (150), and Vietnam (130).

Vietnam, the object of this research, has many characteristics that make it prone to frequent and severe landslides. Vietnam is located in one of the storm centers of the world. It is one of the countries most frequently hit by natural disasters and lies within one of the regions most vulnerable to the effects of climate change (Alkema, 2010). In Vietnam about 16,000 people were killed and almost 74 million people were affected by natural disasters including landslides from 1980 to 2010. The estimated economic damage amounted to around 8 billion USD (UN-ISDR, 2012). The number of people affected by disasters in Vietnam is shown in Fig. 1.

Landslide disasters can be reduced by understanding the triggering mechanism and developing appropriate tools for landslide prediction, assessment, risk management, and early warning (Sassa and Canuti, 2008). Four common approaches have been widely employed to counter risk from landslides (Schuster and Highland, 2007): (i) avoiding landslide hazardous areas and restricting development in landslide-prone areas; (ii) establishing standardized codes for excavation, construction, and grading in landslide-prone areas; (iii) providing protection for existing development such as property, infrastructure, and people by technical mitigation measures; (iv) developing landslide hazard monitoring devices and early warning systems. Of these approaches, avoidance is still considered to be the most effective and economical way to cope with the hazards and minimize damage potential (Elliott and Paula, 2005). Avoidance can be achieved by incorporating landslide hazard information into land use planning. Local authorities can then give developers advance notice of potential landslides so that developments are located on stable ground and landslide-prone areas are avoided. However, the process of creating landslide hazard maps is often difficult in developing countries (Harp et al., 2009), and this is also the case for Vietnam. This thesis addresses this issue through development of methods for assessing landslide hazard for the Hoa Binh province of Vietnam.

[Figure 1: bar chart; x-axis: year, 1981 to 2010; y-axis: number of people affected (million), 0 to 7]

Fig. 1 Number of people affected by natural disasters in Vietnam 1981-2010 (Data source: International Disaster Database. http://www.emdat.be/)

Landslide hazard can be expressed as probability of occurrence of a potentially damaging landslide within a specified period of time and within a given area (Varnes, 1984; Van Westen et al., 2006). This definition incorporates the concepts of both location and time. It means that when assessing landslide hazards, one has to predict both “where” a landslide will occur (spatial probability) and “when” or how frequently it will occur (temporal probability).

Since the quality of landslide susceptibility models is influenced both by the methods used and the sampling strategies followed (Yilmaz, 2010b), a variety of methods and techniques have been proposed for landslide modeling during the last two decades. Generally they can be classified into four main categories: landslide inventories, index-based methods, statistically-based models, and deterministic approaches (Aleotti and Chowdhury, 1999; Guzzetti et al., 1999; Chacon et al., 2006; Guzzetti et al., 2006; Van Westen et al., 2006).

Deterministic methods are generally considered to have the highest precision (Harp et al., 2009). They are based on modeling of factors from geotechnical material properties, slope, and other triggering factors. However, deterministic methods are only feasible in areas where landslide types are simple and geomorphic and geologic properties are fairly homogeneous (Van Westen and Terlien, 1996).

Index-based methods depend mostly on the experience and knowledge of the earth scientists who carry out the analysis. Many factors influence landslides, so a landslide analysis requires knowledge from many specialized fields. The index-based methods are generally less precise due to the subjectivity involved in assigning weights to the factors.

Statistically-based models analyze functional relationships between instability factors and existing landslides using a landslide inventory (Ermini et al., 2005; Lee and Pradhan, 2007; Akgun and Turk, 2010; Pradhan, 2010; Oh and Lee, 2011). These models are less subjective than index-based models and may be applied to a large geographic area, providing rapid spatial assessment of the correlation between landslides and topographic and other mappable attributes (Gorsevski et al., 2001). However, statistically-based models require collection of a large amount of data to produce reliable results, which is a time-consuming and complex process.

In recent years, some new approaches to landslide susceptibility modeling have been proposed using soft computing techniques such as fuzzy logic (Kanungo et al., 2006; Lee, 2007a; Pradhan, 2011), neuro-fuzzy (Pradhan et al., 2010; Oh and Pradhan, 2011; Sezer et al., 2011), artificial neural networks (Lee, 2007b; Falaschi et al., 2009; Pradhan and Lee, 2010b), support vector machines (Yao et al., 2008; Yilmaz, 2010a), and decision-tree models (Saito et al., 2009; Yeon et al., 2010). However, no agreement has been reached so far as to which method and set of techniques is the best for landslide hazard modeling (Carrara and Pike, 2008; Tien Bui et al., 2012). The research presented in this thesis is focused on this problem.

2. Objective

The overall goal of the research presented in this thesis is to develop models for regional landslide hazard assessment for the Hoa Binh province of Vietnam. The approach taken was to integrate spatial and temporal probability of landslide occurrences. Specific objectives of the research are:

1. To identify landslide conditioning factors in the study area based on analysis of historical landslides and fieldwork survey (Papers I, II, III, IV, V).
2. To develop and apply models for spatial prediction of landslide hazard, assess the role of the conditioning factors in the susceptibility models, and validate and compare the results. The models are developed using the following methods: statistical index and logistic regression (Paper I); neuro-fuzzy (Paper II); fuzzy logic and evidential belief functions (Paper III); artificial neural networks (Paper IV); support vector machines, decision tree, and Naïve Bayes (Papers V, VI).
3. To determine a regional rainfall threshold for landslide initiation, develop a temporal probability model, and perform landslide hazard assessment (Paper VII).
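The integration of spatial and temporal probability named in the overall goal can be illustrated with a minimal Python sketch. It rests on assumptions that this section does not confirm: rainfall episodes exceeding the threshold are treated as a Poisson process whose rate is estimated as the number of exceedance episodes divided by the length of the record, and "integrating" the two probabilities is taken to mean a simple product. The function names and the example numbers are hypothetical and are not taken from the thesis.

```python
import math


def poisson_exceedance_probability(n_events: int, obs_years: float,
                                   horizon_years: float) -> float:
    """Probability of at least one threshold-exceeding rainfall episode
    within `horizon_years`, assuming a Poisson process (assumption)."""
    if n_events < 0 or obs_years <= 0 or horizon_years < 0:
        raise ValueError("n_events >= 0, obs_years > 0, horizon_years >= 0 required")
    rate = n_events / obs_years  # mean number of episodes per year
    return 1.0 - math.exp(-rate * horizon_years)


def landslide_hazard(spatial_probability: float,
                     temporal_probability: float) -> float:
    """Combine spatial and temporal probability as a product.
    The product form is an assumption made for illustration."""
    for p in (spatial_probability, temporal_probability):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return spatial_probability * temporal_probability


# Hypothetical example: 21 exceedance episodes in a 21-year record,
# evaluated for a 1-year horizon at a cell with susceptibility 0.5.
p_t = poisson_exceedance_probability(21, 21.0, 1.0)
hazard = landslide_hazard(0.5, p_t)
```

In this form the temporal term depends only on the rainfall record, so it can be computed once per rain gauge or rainfall zone and then multiplied cell by cell with any of the susceptibility maps.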

3. Research methodology

3.1. Description of the study area

The Hoa Binh province is a hilly area located between the Northwestern mountainous region and the Red River plain of Vietnam. It lies from 104°48′ to 105°50′ East longitude and from 20°17′ to 21°08′ North latitude. The province has a total area of about 4660 km², and administratively comprises one city and 10 districts: Hoa Binh city, Luong Son, Cao Phong, Da Bac, Kim Boi, Ky Son, Lac Son, Lac Thuy, Mai Chau, Tan Lac, and Yen Thuy (Fig. 2).

Fig. 2 Location of the Hoa Binh province of Vietnam

The province is situated in the tropical monsoon region where the climate is characterized as being hot, rainy, and dry depending on the season. The average annual temperature is 24°C. The warmest period is from May to August with an average temperature of 26.7°C. The lowest average monthly temperature, 14.9°C, occurs in January. The observations for the last two decades showed maximum and minimum temperatures of 38°C and 5°C in July and January, respectively. The average monthly humidity is 85%. The highest and lowest humidity are approximately 90% (July to November) and 75% (December).

[Figure 3: bar chart; x-axis: month, I to XII; y-axis: total rainfall per month (mm), 0 to 350]

Fig. 3 Average monthly rainfall for 1973-2002 in the Hoa Binh province (Source: Vietnam Institute of Meteorology and Hydrology, 2010)

Seasons in the province are classified as rainy or dry. The rainy season is normally from May to October, with a high frequency of intense rainfall and a monthly average rainfall of about 200 mm. In August and September, rainfall peaks at 300 to 400 mm per month. About 84–90% of annual rainfall falls during the rainy season. The annual maximum for daily rainfall is often over 100 mm. In some years, the maximum daily rainfall can reach 350 mm. In Mai Chau a record high of 950 mm fell on 26 September 2006. Rainfall is concentrated over a short period of time, often triggering landslides, flooding, and erosion in the study area.

Elevation in the province ranges from 0 to 1,510 m above sea level, gradually decreasing from northwest to southeast. The landscape in the province is rather diverse. Generally speaking, it can be classified into three basic classes: the mountainous complex, the hilly complex, and the valley. The topography includes mountains, small valleys, hills, mounts, cliffs, and plains. The mountainous region is strongly dissected and steep. The plains are small and intermixed with valleys, whereas the hills are dispersed between the mountains and the plains. This topography makes the province one of the regions in Vietnam most prone to natural disasters such as floods and landslides.

The reasons for selecting this study area are:
- The area typically has intense rainfall. Landslides and flooding are among the recurrent natural hazard problems affecting the province, leading to loss of lives and property. Landslides have mainly occurred during heavy rainfall events, especially during tropical storms.
- The population of the province increased from 786,694 in 1999 to 832,532 in 2009, leading to growth of new residential areas. Settlement pressure is increasing towards the highlands. New settlements combined with inappropriate land use management are the main factors increasing the frequency of landslides in this area. The terrain, in combination with operation of the Hoa Binh hydropower station and population growth, has caused natural disasters such as flooding, erosion, and in particular, landslides.

3.2. Conceptual framework

The conceptual framework underpinning this thesis is the United Nations International Strategy for Disaster Reduction (UN-ISDR) Platform for the Promotion of Early Warning (UN, 2006). Fig. 4 presents the components of a landslide early warning system. The framework consists of four interlinked components: (1) landslide hazard and risk mapping; (2) monitoring and warning service; (3) dissemination and communication; and (4) response capability. Key components of the early warning system are landslide hazard modeling and landslide risk assessment. High quality landslide hazard and risk maps play a very important role. The quality of landslide models is influenced both by the methods used and the sampling strategies followed (Yilmaz, 2010b); a variety of methods and techniques have been proposed during the last two decades. Agreement has not yet been reached as to which method and set of techniques is the best for landslide hazard modeling (Carrara and Pike, 2008; Tien Bui et al., 2012). The landslide hazard assessment component of the framework was the topic of the work presented here.

(A) LANDSLIDE RISK KNOWLEDGE
Systematically collect data and undertake risk assessment
- Are the landslide hazards and the vulnerabilities well known?
- What are the patterns and trends in these factors?
- Are risk maps and data available?

(B) MONITORING & WARNING SERVICE
Develop landslide hazard monitoring and early warning service
- Are the right parameters being monitored?
- Is there a sound scientific basis for making a forecast?
- Can accurate and timely warnings be generated?

(C) DISSEMINATION & COMMUNICATION
Communicate landslide risk information and early warnings
- Do warnings reach all of those at landslide risk?
- Are the landslide risks and warnings understood?
- Is the landslide risk warning information clear and usable?

(D) RESPONSE CAPABILITY
Build national and community response capabilities
- Are response plans up to date and tested?
- Are local capacities and knowledge made use of?
- Are people prepared and ready to react to a warning?

Fig. 4 Conceptual framework for the early warning system for rainfall-induced landslides (Source: UN-ISDR Platform for the Promotion of Early Warning)

3.3. Data used

The landslide inventory map of the study area was compiled based mainly on landslide inventories from three projects: (1) “Investigation and assessment of the types of geological hazard in the territory of Vietnam and recommendation of remedial measures. Phase II: A Study of the northern mountainous province of Vietnam” (Hue et al., 2004); (2) “The report of an investigation of natural hazards in the northwest of Vietnam” (Thinh et al., 2005); (3) “Construction of the environmental hazard zonation map for northwest territory of Vietnam” (My, 2007). In addition, some recent landslide data were collected through interpretation of SPOT satellite imagery with a resolution of 2.5 m.

In this thesis, eleven landslide conditioning factors were used: slope, aspect, curvature, relief amplitude, lithology, soil type, land use, distance to roads, distance to rivers, distance to faults, and rainfall. Slope, aspect, curvature, and relief amplitude were extracted from a digital elevation model (DEM) with a spatial resolution of 20×20 m. The DEM was generated from national topographic maps at a scale of 1:25,000. Lithology and distance to faults were extracted from four tiles of the Geological and Mineral Resources Map Series of Vietnam at a scale of 1:200,000. The four tiles are: the Hanoi F-48-XXVII; the Ninh Binh F-48-XXXIV; the Van Yen F-48-XXVII; and the Sam Nua F-48-XXXIII. Soil type was extracted from the National Pedology map at a scale of 1:100,000. Land use was extracted from the Land Use Status Map of the Hoa Binh province at a scale of 1:50,000. Distance to roads and distance to rivers were calculated from the road and river networks of the Hoa Binh province. The networks were extracted from the national topographic maps at a scale of 1:50,000. Rainfall data used in this study were extracted from a database from the Institute of Meteorology and Hydrology in Vietnam.

3.4. Methods

3.4.1 Statistical index and logistic regression models

Based on a literature review, I concluded that the statistical index and logistic regression methods should be investigated first. These are conventional methods that are widely used in landslide modeling and have shown high prediction capability in many case studies.
The main purpose of using the two methods was to check their potential application for the study area and to compare the results with other published research. The results will also be compared with the results from other models later. In the statistical index analysis, weights for each input map (representing one landslide conditioning factor) are determined based on their correlation with existing landslides (Bednarik et al., 2010). The weights are then summed to obtain the landslide susceptibility map. In the logistic regression, the dependent variable is dichotomous (a landslide pixel was set to value 1 and a no-landslide pixel was set to value 0). The independent variables (landslide conditioning factors) can be continuous, discrete, dichotomous, or a mix of any of these. The predicted value is calculated as a probability between 0 and 1.

3.4.2 Neuro-fuzzy models

The neuro-fuzzy models, which combine neural networks and fuzzy logic, are a relatively new approach in landslide analysis (Kanungo et al., 2006; Pradhan et al., 2010; Vahidnia et al., 2010; Oh and Pradhan, 2011; Sezer et al., 2011). There are several ways to combine neural networks with fuzzy logic. Kanungo et al. (2006) used weights obtained from a trained neural network and integrated them with ratings obtained from fuzzy logic to generate landslide susceptibility indexes. In another case study, Vahidnia et al. (2010) used the output of a fuzzy inference system (FIS) as the target for a neural network. There is no doubt that expert knowledge played an important role in obtaining accurate results in these studies, and such subjectivity is not easy to eliminate. Another combined method is the Adaptive Neuro-Fuzzy Inference System (ANFIS) developed by Jang (1993). ANFIS has been widely used in many fields (Soyguder and Alli, 2009). However, its application in landslide studies is still limited to a very few cases such as Pradhan et al. (2010), Oh and Pradhan (2011), and Sezer et al. (2011). The disadvantage of ANFIS is that it is difficult to objectively determine the epoch at which the landslide model starts over-fitting in the training phase. Those authors suggest that expert opinion be used to determine the number of membership functions for the inputs, the physical meanings of the inputs, and the number of training epochs for preventing over-learning.

The ANFIS models developed in this thesis address the above problems by using the subtractive clustering method proposed by Chiu (1994). The problem of over-fitting was controlled by using the method proposed by Jang et al. (1997). Six different membership functions were used to build six ANFIS models: Sigmf, Dsigmf, Psigmf, Gaussmf, Gauss2mf, and Gbellmf. Finally, the models were compared to find the most suitable one for the study area.

3.4.3 Fuzzy logic and evidential belief function models

Fuzzy logic has been widely used in many fields (Cheng and Agterberg, 1999; Carranza and Hale, 2001; Porwal et al., 2003; Porwal et al., 2006; Topcu and Sarıdemir, 2008).
The advantage of fuzzy logic is that it is straightforward to apply, and the process of weighting landslide conditioning factors is totally controlled by the experts (Lee, 2007a). Integration of GIS and fuzzy logic has shown a high potential and robustness for landslide hazard predictions (Gorsevski et al., 2003). The knowledge-driven Evidential Belief Functions (EBF) approaches have been widely used in mineral potential mapping (Moon, 1989; An et al., 1992; Carranza and Hale, 2001; Carranza et al., 2005; Carranza et al., 2008b; Carranza et al., 2008a; Carranza, 2009; Carranza et al., 2009; Carranza and Sadeghi, 2010), while application of a data-driven EBF model is still limited to a few case studies. Carranza and Castro (2006) showed that the data-driven EBF model can be used to predict areas that can be inundated by volcanic lahars at Mount Pinatubo (Philippines). Ghosh and Carranza (2010) have shown that the data-driven EBF model can be used to map rockslide-prone areas in Darjeeling Himalaya (India). In a different approach, Park (2011) applied the data-driven Dempster-Shafer model in the Jangheung area (Korea) and concluded that it shows a better prediction capacity than logistic regression. Park (2011) also stated that more research should be done on the application of EBF in extensive case studies.


In this thesis, the efficacy of two data-driven methods (fuzzy logic and evidential belief functions) for spatial prediction of landslide hazards in the study area was compared.

3.4.4 Artificial neural networks

A literature review of recent landslide studies shows that susceptibility maps produced using artificial neural networks (ANN) are more realistic than those produced by conventional methods (Garcia-Rodriguez and Malpica, 2010). Several researchers have compared ANN with conventional methods such as logistic regression and frequency ratio using different datasets. Yesilnacar and Topal (2005), Nefeslioglu et al. (2008), and Falaschi et al. (2009) stated that ANNs give a more realistic result than logistic regression. Yilmaz (2009; 2010a) and Pradhan and Lee (2010b) found ANN superior to logistic regression and frequency ratio, whereas Poudyal et al. (2010) reported that the accuracy of ANN and frequency ratio is similar. Pradhan and Lee (2010a) concluded that the prediction capability of the logistic regression model is better than that of the frequency ratio and ANN methods.

However, one of the difficulties in designing an ANN model is determining the number of hidden neurons. Too many neurons will lead to over-fitting. Conversely, a network with an insufficient number of hidden nodes will have difficulty in learning. An ANN model which is too simple or too complex will have poor prediction performance. This thesis addresses this problem by using a Bayesian regularized neural network (BRNN). This is a relatively new method that has seldom been applied in landslide susceptibility assessment. In addition, the BRNN was compared with the Levenberg-Marquardt neural network (LMNN).

3.4.5 Support vector machines with kernel function analysis

Support vector machines (SVM) are a relatively new supervised learning method based on statistical learning theory and the principle of structural risk minimization (Vapnik, 1998).
This method is well suited to non-linear, high-dimensional data modeling problems and provides promising perspectives in landslide susceptibility mapping (Bai et al., 2008). Micheletti et al. (2011) stated that SVM methods can be used in landslide studies because of their ability to deal with high-dimensional spaces effectively and with high classification performance. However, the performance of the SVM model depends on the choice of kernel parameters. In the literature, few studies investigate SVM using different kernel functions as related to landslide susceptibility. This thesis addresses this problem by evaluating a potential application of SVM with four kernel functions (linear, radial basis, polynomial, sigmoid) for landslide modeling in the study area.

3.4.6 Decision tree and Naïve Bayes models

Decision tree, Naïve Bayes, and SVM belong to the top 10 data mining algorithms identified by the IEEE (Wu et al., 2008). Although Decision tree has been successfully applied in many real-world situations (Murthy, 1998), its application in landslide studies is still limited to a few case studies such as Saito et al. (2009), Nefeslioglu et al. (2010), and Yeon et al. (2010). In general, the Decision tree model was reported to have appropriate accuracy for estimating the probability of future landslides.


Naïve Bayes is regarded as a fast supervised-learning algorithm for data mining applications based on the Bayes theorem. Although the method has been successfully applied in many domains (Ratanamahatana and Gunopulos, 2003), it has seldom been used in landslide susceptibility assessment. This thesis investigates the results of Decision tree and Naïve Bayes when used for spatial prediction of landslide hazards in the study area. The results are compared with those obtained from SVM and logistic regression.

3.4.7 Temporal prediction of landslide and landslide hazard assessment

The probability of occurrence of episodes of rainfall and the rainfall threshold were deduced from records of rainfall for a period of 21 years (1990 to 2010). The result was used to estimate temporal probability of a landslide occurrence using a Poisson probability model (Mateos et al., 2007). Daily and 15-day antecedent rainfall were used to establish a threshold model. Finally, landslide hazard maps were obtained by integrating spatial and temporal probability maps.
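As an illustration of the rainfall variables used in the threshold model, the sketch below computes a 15-day antecedent rainfall total from a daily series. It assumes that the antecedent total for a given day is the sum of the preceding 15 days and excludes the day itself; this definition, the function name, and the sample values are hypothetical and are not taken from the thesis.

```python
def antecedent_rainfall(daily_mm, day_index, window=15):
    """Sum of rainfall (mm) over the `window` days before `day_index`.

    Assumption: the day itself is excluded, so that daily rainfall and
    antecedent rainfall are separate variables. Returns None when fewer
    than `window` earlier days are available.
    """
    if day_index < window:
        return None
    return sum(daily_mm[day_index - window:day_index])


# Hypothetical daily rainfall record (mm), 20 days long.
rain = [0, 5, 12, 0, 0, 30, 45, 3, 0, 0, 8, 60, 22, 0, 1, 0, 90, 14, 0, 120]

event_day = 19  # last day of the record, with 120 mm of rain
r15 = antecedent_rainfall(rain, event_day)
print(rain[event_day], r15)  # daily rainfall and 15-day antecedent rainfall
```

Each landslide event can then be described by a pair of values (daily rainfall, antecedent rainfall), which is the kind of input a threshold model needs.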

4. Summary of results

4.1. Paper I: Landslide susceptibility analysis in the Hoa Binh province of Vietnam using statistical index and logistic regression

Using logistic regression and statistical index, two spatial landslide occurrence probability maps were computed. The areas under the ROC curves are 0.946 for the statistical index and 0.950 for the logistic regression methods. This indicates that both models have a high and almost equal prediction capability. In general, the two methods have been shown to be relatively simple and cost-effective for assessing landslide susceptibility. For the logistic regression, distance to roads, slope, and lithology have the highest contribution to the model.

4.2. Paper II. Landslide susceptibility mapping at Hoa Binh province (Vietnam) using an adaptive neuro-fuzzy inference system (ANFIS) and GIS

A total of six ANFIS models were constructed. The ANFIS model with Sigmf has the highest prediction capability (84.8%), followed by Gaussmf (82.5%), Dsigmf (80.7%), Psigmf (80.6%), and Gauss2mf (80.3%). The results show that landslide susceptibility mapping for the Hoa Binh province of Vietnam is viable using ANFIS.

4.3. Paper III. Spatial prediction of landslide hazards in Hoa Binh province (Vietnam): a comparative assessment of the efficacy of evidential belief functions and fuzzy logic models.

A total of ten landslide susceptibility models were constructed: one evidential belief functions (EBF) model and nine fuzzy logic models (the fuzzy GAMMA with seven values of λ (0.1, 0.3, 0.5, 0.7, 0.9, 0.95, and 0.975), the fuzzy PRODUCT, and the fuzzy SUM). All of the susceptibility models have reasonably good prediction capability. The EBF model has the highest prediction capability (93.7%), whereas the fuzzy SUM model has the lowest prediction capability (91.9%). The remaining models, with almost equal prediction capabilities, are intermediate between the EBF and fuzzy SUM models (from 92.7 to 92.6%).
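The three fuzzy operators compared in Paper III can be sketched as follows. This is a minimal illustration using the standard textbook definitions of the fuzzy algebraic product, the fuzzy algebraic sum, and the fuzzy gamma operator; the function names and the membership values are hypothetical and are not taken from the paper.

```python
from math import prod


def fuzzy_product(memberships):
    """Fuzzy algebraic product: the product of all membership values."""
    return prod(memberships)


def fuzzy_sum(memberships):
    """Fuzzy algebraic sum: 1 minus the product of the complements."""
    return 1.0 - prod(1.0 - m for m in memberships)


def fuzzy_gamma(memberships, lam):
    """Fuzzy gamma operator: (sum ** lam) * (product ** (1 - lam)), 0 <= lam <= 1."""
    return fuzzy_sum(memberships) ** lam * fuzzy_product(memberships) ** (1.0 - lam)


# Hypothetical membership values of one pixel for three conditioning factors.
pixel = [0.8, 0.6, 0.3]

# The seven lambda values listed in the text.
for lam in (0.1, 0.3, 0.5, 0.7, 0.9, 0.95, 0.975):
    print(lam, fuzzy_gamma(pixel, lam))
```

With λ = 0 the gamma operator reduces to the fuzzy PRODUCT and with λ = 1 to the fuzzy SUM, so the seven λ values span the range between the two.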

4.4. Paper IV. Landslide susceptibility assessment in the Hoa Binh province of Vietnam: A comparison of the Levenberg-Marquardt and Bayesian regularized neural networks

Two landslide susceptibility models derived from the Levenberg-Marquardt and Bayesian regularized neural networks were successfully developed. The results show that they are effective for complex problems such as landslide susceptibility analysis, although the internal processing steps are difficult to follow. The prediction capability was 86.1% for the LMNN and 90.3% for the BRNN. In general, the BRNN performed better than the LMNN in terms of both success and prediction rates, and was also found to be far more robust and efficient. Although the BRNN performed well, its prediction capability was slightly lower than that of logistic regression.

4.5. Paper V. Application of support vector machines in landslide susceptibility assessment for the Hoa Binh province (Vietnam) with kernel functions analysis

Four support vector machines (SVM) models for landslide susceptibility were successfully developed for the study area using the linear kernel function (LN), radial basis function (RBF), polynomial function (PL), and sigmoid function (SIG). The highest prediction capability is provided by RBF-SVM (95.5%) and PL-SVM (95.6%). They are followed by LN-SVM (95.2%) and SIG-SVM (94.5%). The prediction capability of the RBF-SVM and PL-SVM models seems to be slightly better than that of logistic regression and the Bayesian regularized neural network.

4.6. Paper VI. Landslide susceptibility assessment in Vietnam using Support vector machines, Decision tree, and Naïve Bayes models

Two landslide susceptibility models were successfully constructed using Decision tree (DT) and Naïve Bayes (NB) methods. The prediction capabilities of the DT and NB models are 90.7% and 93.5%, respectively. These are lower than those obtained from logistic regression and support vector machines using the same data.

4.7. Paper VII. Regional prediction of landslide hazard in the Hoa Binh province (Vietnam) using probability analysis of intense rainfall

The 15-day antecedent rainfall gave the best fit as a presumed trigger for the existing landslides in the inventory. This result was obtained from a correlation analysis between the daily rainfall associated with past landslide events and six corresponding antecedent rainfall periods (3, 5, 7, 10, 15, and 30 days). The rainfall threshold equation for the initiation of landslides was RTH = 128.5 – 0.164 R15Ad, where RTH is the rainfall threshold and R15Ad is the accumulated rainfall for the antecedent 15 days. Temporal probability for 11 sub-regions (Table 1) was estimated based on the threshold equation and the number of times the threshold was exceeded during the 21-year period (1990-2010).
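The threshold equation can be applied as in the following sketch. The equation itself is taken from the text; the rule that the threshold is exceeded when the daily rainfall is at least RTH, the assumption that all values are in mm, and the function names are illustrative and not confirmed by this summary.

```python
def rainfall_threshold(r15ad):
    """RTH = 128.5 - 0.164 * R15Ad, as given in the text.

    r15ad: accumulated rainfall for the antecedent 15 days (assumed mm).
    """
    return 128.5 - 0.164 * r15ad


def exceeds_threshold(daily_rain, r15ad):
    """Assumed decision rule: daily rainfall at or above RTH counts as an exceedance."""
    return daily_rain >= rainfall_threshold(r15ad)


# Hypothetical day: 120 mm of daily rain after 100 mm in the previous 15 days.
print(rainfall_threshold(100.0))
print(exceeds_threshold(120.0, 100.0))
```

Under this reading, wetter antecedent conditions lower the daily rainfall needed to reach the threshold, and counting the days on which `exceeds_threshold` is true over the rainfall record gives the number of exceedances used for the temporal probability.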


Five landslide susceptibility maps obtained from support vector machines, logistic regression, evidential belief functions, neuro-fuzzy, and Bayesian neural networks, were selected. A total of five landslide hazard maps with return period of one year were obtained by integrating spatial and temporal probabilities of landslides.

Table 1 Temporal probability of landslide hazard for the Hoa Binh province with different return periods.

No   Sub-region     Temporal probability
                    (1 year)   (3 years)   (5 years)
1    Luong Son      0.749      0.984       0.999
2    Cao Phong      0.760      0.986       0.999
3    HoaBinh        0.802      0.992       1
3    Kim Boi        0.736      0.982       0.999
4    Tu Ly          0.736      0.982       0.999
5    MuongChieng    0.614      0.943       0.991
6    PhiengVe       0.595      0.934       0.989
7    Mai Chau       0.771      0.988       0.999
8    Tan Lac        0.681      0.968       0.997
9    Lac Son        0.749      0.984       0.999
10   Hung Thi       0.760      0.986       0.999
11   Chi Ne         0.888      0.999       1
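The relationship between threshold exceedances and the probabilities in Table 1 can be illustrated with a Poisson model. The sketch below assumes the commonly used form P = 1 - exp(-N t / T), where N is the number of exceedances in a record of T years and t is the return period considered. The count of 29 exceedances is a back-calculated assumption chosen for illustration; it is not a value reported in this summary.

```python
from math import exp


def exceedance_probability(n_exceedances, record_years, horizon_years):
    """Probability of at least one exceedance within `horizon_years`.

    Poisson model with the mean rate estimated as
    n_exceedances / record_years.
    """
    rate = n_exceedances / record_years
    return 1.0 - exp(-rate * horizon_years)


# Assumed count for illustration: 29 exceedances in the 21-year record.
for years in (1, 3, 5):
    print(years, round(exceedance_probability(29, 21, years), 3))
```

With this assumed count the sketch gives 0.749, 0.984, and 0.999 for 1, 3, and 5 years, which agrees with the Luong Son row of Table 1 to three decimals.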

5. Discussion and conclusion

5.1 Landslide inventory and conditioning factors

The landslide inventory is of crucial importance for any susceptibility, hazard, and risk analysis (Van Westen et al., 2006). It can be used as input data for the calculation, verification, and validation of a landslide susceptibility or hazard model (Chung and Fabbri, 1999; Glade et al., 2005). Landslide inventories generally fall into one of two classes: (1) landslide events associated with a trigger; and (2) historical landslide inventories that include landslide events over time in a region (Malamud et al., 2004). A comprehensive landslide inventory should provide information on the locations, types, failure mechanisms, causal factors, frequency of occurrence, volumes, and damage caused (Van Westen et al., 2008). With the rapid development of digital tools such as geographic information systems (GIS), global positioning system (GPS), and Digital Image Processing (DIP) in the last decade, landslide inventory databases are becoming available for more countries (Van Westen et al., 2008). However, no significant attempt has been made so far to investigate and create a landslide inventory database in Vietnam. Landslide events have only been investigated through some separate projects conducted during the last 15 years.

A total of 118 landslide events were recorded in the study area. The inventory map was mainly based on three projects recording landslides that occurred during the last 10 years, where the main trigger was high-intensity rainfall (Tien Bui et al., 2011). The recorded events are mostly mass movements near the road system or near populated areas.

This means that many small landslides that may have occurred in mountainous areas far from roads and far from populated areas have not been investigated. In addition, no information about landslide depth and volume was reported. Furthermore, knowledge of the time of occurrence of the landslides is limited and is not recorded for all events.

Factors that affect slope stability are numerous and depend on landslide types as well as the failure mechanisms of a particular area (Varnes, 1984). The identification of landslide conditioning factors for assessing landslide susceptibility is an important task that influences the quality of the resulting landslide models. Factors are selected depending on the map scale of analysis, the landslide type, the failure mechanisms, and the characteristics of the study area (Glade and Crozier, 2005). Factors related to topography, geology, soil types, hydrology, geomorphology, and land use are the most commonly used in landslide analyses (Van Westen et al., 2008). In this thesis, slope, aspect, curvature, relief amplitude, lithology, distance to faults, land use, soil type, distance to roads, distance to rivers, and rainfall were selected based on the analysis of the landslide inventory and the review of literature. Including more conditioning factors does not necessarily increase the prediction capability of the resulting landslide models, because the more variables that are included, the larger the uncertainty among the conditioning factors (Glade and Crozier, 2005). It may also increase the cost of the analysis.

Another issue addressed in this thesis is the use of landslide conditioning factors from maps at different scales, such as the DEM generated from topographic maps at the scale of 1:25,000; the land use (1:50,000); lithology (1:200,000); soil type (1:100,000); etc.
In general, the input data are required to be of identical quality and resolution and a downscaled high-resolution raster may lead to an incorrect result (Glade and Crozier, 2005). However, this is not a problem for the study area because landslide models were constructed at the regional scale. Landslide conditioning factors were generalized to be of identical resolution.

5.2 Landslide susceptibility model and sampling strategy

Published studies show that the quality of landslide susceptibility maps is affected by the method used and the sampling strategy (Nefeslioglu et al., 2008). For this reason, it is essential to investigate the prediction and generalization capacities of different methods and techniques for the study area. The results in this thesis show that soft computing techniques can be used to create qualitative and quantitative maps of landslide-prone areas. The support vector machines model has the highest prediction capability. The findings agree with Yao et al. (2008), Marjanovic et al. (2011), and Ballabio and Sterlacchini (2012), who found that SVM outperformed logistic regression, linear discriminant, and other conventional methods.

The quality of landslide models is also influenced by data sampling strategies. When building landslide models, the landslide data need to be split into two parts, training and validation data (Dixon, 2005). The literature review shows that some researchers used the same dataset for both training and validation of a landslide model (Lee and Sambath, 2006; Biswajeet and Saied, 2010). Using the same dataset for training and validation may reduce the reliability of the validation. Chung and Fabbri (2008) suggested dividing the study area into sub-regions such as left and right: one region for training of the model and the other for validation. However, given the extensiveness of the Hoa Binh province and the variable geological conditions, the prediction capability may not be transferable from a sub-region to the entire study area. Due to the above limitations of the landslide inventory, partitioning by time is also impossible. Therefore, the landslide data were randomly split into training and validation datasets in this thesis. The main disadvantage of this method is that the estimated prediction capability of a model may be too optimistic if the spatial separation between training and validation pixels is small (Brenning, 2005). An additional sampling method will be used in the future to address this issue.

5.3. Temporal probability model

The temporal probability of landslide occurrence is one of the key components of landslide hazard assessment. Two independent approaches have traditionally been used to estimate the temporal probability of landslides: (1) analysis of the potential for slope failure and (2) a statistical treatment of past landslide events (Lopez Saez et al., 2012). The first approach is based on the analysis of present slope conditions and evaluates the potential for instability. The second uses frequency analysis of historical landslide events (Corominas and Moya, 2008) and was used in this thesis. Since complete landslide records for a long time span are not available for the Hoa Binh province, landslide probability cannot be calculated directly. Instead, information on the relative recurrence of landslide-triggering events was used indirectly to estimate the temporal probability of landslides. The rainfall threshold was established based on 15-day antecedent rainfall and daily rainfall of past landslide events.
Established thresholds were then used to calculate the annual probability that the thresholds would be exceeded (paper VII). This approach is considered preferable for regional scale cases such as this study area. Since the map scale is too small to depict the potential travel distances (Corominas and Moya, 2008), run-out analysis was not carried out to quantify hazard.

5.4. Validation and comparison

In paper I, the two landslide susceptibility maps were verified using two rules for spatially effective landslide susceptibility maps: (i) the observed landslide pixels should belong to the high susceptibility class and (ii) the high susceptibility class should cover only small areas (Can et al., 2005; Bai et al., 2010). In general, these are qualitative rules. More quantitative methods should therefore be used to assess model classification, such as ROC curve analysis, the success-rate and prediction-rate method, and Cohen’s Kappa index.

The ROC curve and area under the ROC curve (AUC) have been used to estimate the prediction capability of the landslide models (Gorsevski et al., 2006; Garcia-Rodriguez et al., 2008; Nandi and Shakoor, 2010). These have been considered to be an appropriate evaluation and validation tool because they are not sensitive to prevalence, that is, to a considerable difference between the numbers of no-landslide and landslide pixels (Van Den Eeckhaut et al., 2009). When using landslide pixels, the ROC curve measures the degree of model fit using the training data and measures prediction capability using the validation data.

The ROC curve is sometimes criticized. This is because the AUC value is affected by several factors such as: (i) the statistical model; (ii) the selection of landslide conditioning factors; (iii) the landslide inventory; and (iv) the study area. Therefore, the AUC value does not always give an accurate impression of a landslide model's predictive capability. In this case, the success-rate and prediction-rate methods (Chung and Fabbri, 1999; 2003; Van Westen et al., 2003; Guzzetti et al., 2006; Lee and Pradhan, 2007; Pradhan and Lee, 2010b) were recommended (papers II, III, IV, V, VI). The area under the success-rate and prediction-rate curves can provide quantitative values to estimate the degree of fit as well as the prediction capability of the landslide models. Cohen’s Kappa index (Cohen, 1960; Hoehler, 2000; Guzzetti et al., 2006; Garcia-Rodriguez and Malpica, 2010) is another quantitative measure of the models’ classification or prediction skills (Van Den Eeckhaut et al., 2009) (paper VI).
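Two of the quantitative measures discussed in this section, the AUC and Cohen's Kappa index, can be computed as in the sketch below. It uses the standard definitions (AUC as the probability that a landslide pixel scores higher than a no-landslide pixel, and kappa from a 2×2 confusion matrix); the function names and sample scores are hypothetical, and the pairwise AUC loop is written for clarity rather than for the pixel counts of a regional study.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve by pairwise comparison.

    Equals the fraction of (landslide, no-landslide) pixel pairs in which
    the landslide pixel has the higher score; ties count as 0.5.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa from the four cells of a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical susceptibility scores for validation pixels.
landslide_scores = [0.9, 0.4]
no_landslide_scores = [0.6, 0.1]
print(auc(landslide_scores, no_landslide_scores))
print(cohen_kappa(tp=40, fp=10, fn=10, tn=40))
```

Applying `auc` to the training pixels gives a measure of model fit, and applying it to the validation pixels gives a measure of prediction capability, in line with the distinction made above.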

6. Future work

Landslide disasters can be reduced by understanding the triggering mechanism and developing appropriate tools for landslide prediction, assessment, risk management, and early warning (Sassa and Canuti, 2008). Over the last decade, methods for forecasting and predicting some types of natural hazards have become more sophisticated and reliable; however, a challenge remains in providing early warning in real time for landslide disasters (Kron et al., 2003). Early warning systems will help reduce economic loss and mitigate the number of injuries or deaths from a disaster by providing information that allows individuals and communities to protect their lives and property (Sarun, 2011). Based on the conceptual framework, the following work will be carried out:

6.1. Landslide risk assessment

Elements at risk located in zones with high hazard levels will be analyzed for different types of landslides. We will focus on risk to human life, buildings, facilities, transportation networks, protected areas, cultivated land, and cultural heritage sites. Based on landslide hazard analysis and vulnerability assessment, landslide risk for the study area will be assessed. Finally, a framework for assessment of landslide hazards, vulnerability, and risk will be developed for the study area.

6.2 Design and development of a Web-GIS interface and related services

A Web-GIS-based early warning system for rainfall-induced landslides will be developed on the basis of landslide spatial prediction, in combination with real-time weather information. This system will predict the spatial position and failure time of landslides. An Internet-based service with open architecture, user friendliness through a graphical user interface (GUI), and the capability to host multiple users will be set up. The Web-GIS system will include a GIS database server, a Web-GIS server, and a Web-GIS browser. The GIS database will be constructed for the web map with three main layer groups: (i) a basic geographic information group consisting of administrative boundaries, land use, road and river systems, contour lines, and so on; (ii) a geo-hazard environment information group including lithology, geological structure, soil type, and rainfall distribution maps; (iii) a specific information group containing landslide hazard, risk, and vulnerability maps. The GIS server will store, operate on, and analyze the landslide hazard data and will send the prediction and warning forecasts to the Web-GIS server. Users can access a variety of tools to query and get statistical information through interactive operation with the web maps using the Web-GIS browser.

6.3 Monitoring and prediction

Real-time meteorological and hydrological monitoring is considered to be a backbone of operational landslide warning (NOAA-USGS Debris-Flow Task Force, 2005). This is because the spatial distribution, duration, and intensity of precipitation play an important role in triggering landslides. The real-time space-borne precipitation estimation system (Hong and Adler, 2007) that is available on the NASA TRMM web site (http://trmm.gsfc.nasa.gov) will be considered for use in the early warning system as a dynamic trigger warning. When a weather forecast with real-time precipitation values or precipitation contours is received, the spatial landslide zonation map will be overlaid with the precipitation distribution maps. Mathematical model analysis will be used to estimate areas prone to landslides and create warnings. The rainfall will be forecast for 12 h, 24 h, and 48 h.

6.4 Dissemination and communication

The system will deliver landslide early warning information using existing local communication tools and channels such as SMS, radio, TV, etc. In addition, a Web-GIS technology service will be developed.
The system will ensure that: (1) communication infrastructure hardware is reliable and robust, especially during natural disasters; (2) there is appropriate and effective interaction among the main actors of the early warning process such as the scientific community, stakeholders, decision makers, the public, and the media (the Hoa Binh province committee for prevention of natural disasters, Department of Natural Resources and Environment, Department of Agricultural and Rural Development, the Hoa Binh Radio and Television, the National Center for Hydro-Meteorological Forecasting). Warnings will be translated into adequate warning messages and distributed to the targeted population. The warning messages will be clearly formulated and adapted to the context of the target group, and will contain clear instructions for appropriate protective action.

6.5 Response capability

The overall aim of this phase of the project is to create ownership of solutions that communities can implement by themselves. Round table talks will be organized for officials of local government, management of the subdivision, and residents. Some activities will be developed to increase awareness in the communities, including landslide hazards, risk, warning, and prevention. A step-by-step methodology for landslide risk assessment and early warning will be created and will be followed by a participatory risk and early warning exercise. This could comprise, for example, (1) identification of the community boundary; (2) definition and establishment of roads; (3) determination of low landslide risk areas; (4) delineation of high landslide risk areas and reporting to local authorities for action; and (5) identification of safe areas for evacuation.

The Hoa Binh province Committee for Natural Disasters Prevention, Search and Rescue will take an active role in response activity. We will provide intensive training and technical assistance to strengthen capacity in interpreting landslide hazards and vulnerability maps. Communities and local authorities will be trained to recognize the various levels of warning. This is crucial to creating response plans for every vulnerable location at the municipal and community level. We will develop a plan to increase public awareness of landslide hazards and build confidence in warning services at the community and municipality levels. The plan will actively engage local media, TV and radio broadcasters, schools, volunteer organizations, religious and political leaders and NGOs in annual planning, evaluation and training processes.

References
Akgun, A., Turk, N., 2010. Landslide susceptibility mapping for Ayvalik (Western Turkey) and its vicinity by multicriteria decision analysis. Environmental Earth Sciences, 61, 595-611.
Aleotti, P., Chowdhury, R., 1999. Landslide hazard assessment: summary review and new perspectives. Bulletin of Engineering Geology and the Environment, 58, 21-44.
Alkema, D., 2010. Geo-information technology for hazard risk assessment. A case study site in Yen Bai (Vietnam). The International Institute for Geo-Information Science and Earth Observation (ITC), University Twente.
An, P., Moon, W.M., Bonham-Carter, G.F., 1992. On knowledge-based approach on integrating remote sensing, geophysical and geological information. International Space Year: Space Remote Sensing, Vols 1 and 2, pp. 34-38.
Bai, S., Wang, J., Lu, G., Zhou, P., Hou, S., Xu, S., 2010. GIS-based logistic regression for landslide susceptibility mapping of the Zhongxian segment in the Three Gorges area, China. Geomorphology, 115, 23-31.
Bai, S.B., Wang, J., Lu, G.N., Kanevski, M., Pozdnoukhov, A., 2008. GIS-based landslide susceptibility mapping with comparisons of results from machine learning methods versus logistic regression in basin scale. Geophysical Research Abstracts, EGU, 10, A-06367.
Ballabio, C., Sterlacchini, S., 2012. Support Vector Machines for Landslide Susceptibility Mapping: The Staffora River Basin Case Study, Italy. Mathematical Geosciences, 44, 47-70.
Bednarik, M., Magulova, B., Matys, M., Marschalko, M., 2010. Landslide susceptibility assessment of the Kralovany-Liptovsky Mikulas railway case study. Physics and Chemistry of the Earth, 35, 162-171.
Biswajeet, P., Saied, P., 2010. Comparison between prediction capabilities of neural network and fuzzy logic techniques for landslide susceptibility mapping. Disaster Advances, 3, 26-34.
Brenning, A., 2005. Spatial prediction models for landslide hazards: review, comparison and evaluation. Natural Hazards and Earth System Sciences, 5, 853-862.

Can, T., Nefeslioglu, H.A., Gokceoglu, C., Sonmez, H., Duman, T.Y., 2005. Susceptibility assessments of shallow earthflows triggered by heavy rainfall at three subcatchments by logistic regression analyses. Geomorphology, 72, 250-271.
Carranza, E.J.M., 2009. Controls on mineral deposit occurrence inferred from analysis of their spatial pattern and spatial association with geological features. Ore Geology Reviews, 35, 383-400.
Carranza, E.J.M., Castro, O., 2006. Predicting lahar-inundation zones: case study in West Mount Pinatubo, Philippines. Natural Hazards, 37, 331-372.
Carranza, E.J.M., Hale, M., 2001. Geologically constrained fuzzy mapping of gold mineralization potential, Baguio District, Philippines. Natural Resources Research, 10, 125-136.
Carranza, E.J.M., Hale, M., Faassen, C., 2008a. Selection of coherent deposit-type locations and their application in data-driven mineral prospectivity mapping. Ore Geology Reviews, 33, 536-558.
Carranza, E.J.M., Owusu, E., Hale, M., 2009. Mapping of prospectivity and estimation of number of undiscovered prospects for lode gold, southwestern Ashanti Belt, Ghana. Mineralium Deposita, 44, 915-938.
Carranza, E.J.M., Sadeghi, M., 2010. Predictive mapping of prospectivity and quantitative estimation of undiscovered VMS deposits in Skellefte district (Sweden). Ore Geology Reviews, 38, 219-241.
Carranza, E.J.M., van Ruitenbeek, F.J.A., Hecker, C., van der Meijde, M., van der Meer, F.D., 2008b. Knowledge-guided data-driven evidential belief modeling of mineral prospectivity in Cabo de Gata, SE Spain. International Journal of Applied Earth Observation and Geoinformation, 10, 374-387.
Carranza, E.J.M., Woldai, T., Chikambwe, E.M., 2005. Application of data-driven evidential belief functions to prospectivity mapping for aquamarine-bearing pegmatites, Lundazi District, Zambia. Natural Resources Research, 14, 47-63.
Carrara, A., Pike, R.J., 2008. GIS technology and models for assessing landslide hazard and risk. Geomorphology, 94, 257-260.
Chacon, J., Irigaray, C., Fernandez, T., El Hamdouni, R., 2006. Engineering geology maps: landslides and geographical information systems. Bulletin of Engineering Geology and the Environment, 65, 341-411.
Cheng, Q., Agterberg, F.P., 1999. Fuzzy weights of evidence method and its application in mineral potential mapping. Natural Resources Research, 8, 27-35.
Chiu, S.L., 1994. Fuzzy model identification based on cluster estimation. Journal of Intelligent and Fuzzy Systems, 2, 267-278.
Chung, C.-J., Fabbri, A.G., 2008. Predicting landslides for risk analysis - Spatial models tested by a cross-validation technique. Geomorphology, 94, 438-452.
Chung, C.F., Fabbri, A.G., 1999. Probabilistic prediction models for landslide hazard mapping. Photogrammetric Engineering and Remote Sensing, 65, 1389-1399.
Chung, C.J.F., Fabbri, A.G., 2003. Validation of spatial prediction models for landslide hazard mapping. Natural Hazards, 30, 451-472.
Chung, C.J.F., Fabbri, A.G., Van Westen, C.J., 1995. Multivariate regression analysis for landslide hazard zonation. In: A. Carrara, F. Guzzetti (Eds.), Geographical Information Systems in Assessing Natural Hazards, pp. 107-133.
Cohen, J., 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46.

Corominas, J., Moya, J., 2008. A review of assessing landslide frequency for hazard zoning purposes. Engineering Geology, 102, 193-213.
CRED, 2012. The Centre for Research on the Epidemiology of Disasters. http://www.cred.be.
Dixon, B., 2005. Applicability of neuro-fuzzy techniques in predicting ground-water vulnerability: a GIS-based sensitivity analysis. Journal of Hydrology, 309, 17-38.
Elliott, C.S., Paula, L.G., 2005. National Landslide Hazards Mitigation Strategy - A Framework for Loss Reduction. U.S. Geological Survey.
EM-DAT, 2012. The OFDA/CRED International Disaster Database. http://www.emdat.be/natural-disasters-trends.
Ermini, L., Catani, F., Casagli, N., 2005. Artificial Neural Networks applied to landslide susceptibility assessment. Geomorphology, 66, 327-343.
Falaschi, F., Giacomelli, F., Federici, P.R., Puccinelli, A., Avanzi, G.D., Pochini, A., Ribolini, A., 2009. Logistic regression versus artificial neural networks: landslide susceptibility evaluation in a sample area of the Serchio River valley, Italy. Natural Hazards, 50, 551-569.
Garcia-Rodriguez, M.J., Malpica, J.A., 2010. Assessment of earthquake-triggered landslide susceptibility in El Salvador based on an Artificial Neural Network model. Natural Hazards and Earth System Sciences, 10, 1307-1315.
Garcia-Rodriguez, M.J., Malpica, J.A., Benito, B., Diaz, M., 2008. Susceptibility assessment of earthquake-triggered landslides in El Salvador using logistic regression. Geomorphology, 95, 172-191.
Ghosh, S., Carranza, E.J.M., 2010. Spatial analysis of mutual fault/fracture and slope controls on rocksliding in Darjeeling Himalaya, India. Geomorphology, 122, 1-24.
Glade, T., 1998. Establishing the frequency and magnitude of landslide-triggering rainstorm events in New Zealand. Environmental Geology, 35, 160-174.
Glade, T., Anderson, M., Crozier, M.J., 2005. Landslide Hazard and Risk. John Wiley & Sons Ltd.
Glade, T., Crozier, M.J., 2005. A review of scale dependency in landslide hazard and risk analysis. In: T. Glade, M. Anderson, M.J. Crozier (Eds.), Landslide Hazard and Risk. John Wiley and Sons Ltd, England, pp. 75-138.
Gorsevski, P.V., Foltz, R.B., Gessler, P.E., Cundy, T.W., 2001. Statistical modeling of landslide hazard using GIS. Seventh Federal Interagency Sedimentation Conference, Silver Legacy, Reno, Nevada, pp. 103-109.
Gorsevski, P.V., Gessler, P.E., Foltz, R.B., Elliot, W.J., 2006. Spatial prediction of landslide hazard using logistic regression and ROC analysis. Transactions in GIS, 10, 395-415.
Gorsevski, P.V., Gessler, P.E., Jankowski, P., 2003. Integrating a fuzzy k-means classification and a Bayesian approach for spatial prediction of landslide hazard. Journal of Geographical Systems, 5, 223-251.
Guzzetti, F., Carrara, A., Cardinali, M., Reichenbach, P., 1999. Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, Central Italy. Geomorphology, 31, 181-216.
Guzzetti, F., Reichenbach, P., Ardizzone, F., Cardinali, M., Galli, M., 2006. Estimating the quality of landslide susceptibility models. Geomorphology, 81, 166-184.

Harp, E.L., Reid, M.E., McKenna, J.P., Michael, J.A., 2009. Mapping of hazard from rainfall-triggered landslides in developing countries: Examples from Honduras and Micronesia. Engineering Geology, 104, 295-311.
Hoehler, F.K., 2000. Bias and prevalence effects on kappa viewed in terms of sensitivity and specificity. Journal of Clinical Epidemiology, 53, 499-503.
Hong, Y., Adler, R.F., 2007. Towards an early-warning system for global landslides triggered by rainfall and earthquake. International Journal of Remote Sensing, 28, 3713-3719.
Hue, T.T., Duong, T.V., Toan, D.V., Nghinh, L.T., Minh, V.C., Pho, N.V., Xuan, P.T., Hoan, L.T., Huyen, N.X., Pha, P.D., Chinh, V.V., Thom, B.V., 2004. Investigation and Assessment of the Types of Geological Hazard in the Territory of Vietnam and Recommendation of Remedial Measures. Phase II: A Study of the Northern Mountainous Province of Vietnam. Institute of Geological Sciences, Vietnam Academy of Science and Technology, Hanoi, 361 p.
Jang, J.S.R., 1993. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics, 23, 665-685.
Jang, J.S.R., Sun, C.T., Mizutani, E., 1997. Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence (Matlab Curriculum Series). Prentice Hall.
Kanungo, D.P., Arora, M.K., Sarkar, S., Gupta, R.P., 2006. A comparative study of conventional, ANN black box, fuzzy and combined neural and fuzzy weighting procedures for landslide susceptibility zonation in Darjeeling Himalayas. Engineering Geology, 85, 347-366.
Kjekstad, O., Highland, L., 2008. Economic and Social Impacts of Landslides. In: K. Sassa, P. Canuti (Eds.), Landslides - Disaster Risk Reduction. Springer, Berlin, pp. 573-587.
Korup, O., Gorum, T., Hayakawa, Y., 2012. Without power? Landslide inventories in the face of climate change. Earth Surface Processes and Landforms, 37, 92-99.
Kron, W., Smolka, A., Berz, G., 2003. Benefits of early warning from the viewpoint of the insurance industry. In: J. Zschau, A.N. Küppers (Eds.), Early warning systems for natural disasters reduction. Springer, pp. 95-102.
Lee, S., 2007a. Application and verification of fuzzy algebraic operators to landslide susceptibility mapping. Environmental Geology, 52, 615-623.
Lee, S., 2007b. Landslide susceptibility mapping using an artificial neural network in the Gangneung area, Korea. International Journal of Remote Sensing, 28, 4763-4783.
Lee, S., Pradhan, B., 2007. Landslide hazard mapping at Selangor, Malaysia using frequency ratio and logistic regression models. Landslides, 4, 33-41.
Lee, S., Sambath, T., 2006. Landslide susceptibility mapping in the Damrei Romel area, Cambodia using frequency ratio and logistic regression models. Environmental Geology, 50, 847-855.
Lopez Saez, J., Corona, C., Stoffel, M., Schoeneich, P., Berger, F., 2012. Probability maps of landslide reactivation derived from tree-ring records: Pra Bellon landslide, southern French Alps. Geomorphology, 138, 189-202.
Malamud, B.D., Turcotte, D.L., Guzzetti, F., Reichenbach, P., 2004. Landslide inventories and their statistical properties. Earth Surface Processes and Landforms, 29, 687-711.

Marjanovic, M., Kovacevic, M., Bajat, B., Vozenílek, V., 2011. Landslide susceptibility assessment using SVM machine learning algorithm. Engineering Geology, 123, 225-234.
Mateos, R.M., Azañón, J.M., Morales, R., López-Chicano, M., 2007. Regional prediction of landslides in the Tramuntana Range (Majorca) using probability analysis of intense rainfall.
Micheletti, N., Foresti, L., Kanevski, M., Pedrazzini, A., Jaboyedoff, M., 2011. Landslide susceptibility mapping using adaptive support vector machines and feature selection. Geophysical Research Abstracts, EGU, 13.
Moon, W.M., 1989. Integration of remote sensing and geological/geophysical data using Dempster-Shafer approach, pp. 838-841.
Murthy, S.K., 1998. Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey. Data Mining and Knowledge Discovery, 2, 345-389.
My, N.Q., 2007. Construction of the Environmental Hazard Zonation Map for Northwest Territory of Vietnam. Vietnam Geography Association, Hanoi, 98 p.
Nadim, F., Kjekstad, O., Peduzzi, P., Herold, C., Jaedicke, C., 2006. Global landslide and avalanche hotspots. Landslides, 3, 159-173.
Nandi, A., Shakoor, A., 2010. A GIS-based landslide susceptibility evaluation using bivariate and multivariate statistical analyses. Engineering Geology, 110, 11-20.
Nefeslioglu, H.A., Gokceoglu, C., Sonmez, H., 2008. An assessment on the use of logistic regression and artificial neural networks with different sampling strategies for the preparation of landslide susceptibility maps. Engineering Geology, 97, 171-191.
Nefeslioglu, H.A., Sezer, E., Gokceoglu, C., Bozkir, A.S., Duman, T.Y., 2010. Assessment of landslide susceptibility by Decision Trees in the Metropolitan area of Istanbul, Turkey. Mathematical Problems in Engineering. doi:10.1155/2010/901095.
NOAA-USGS Debris-Flow Task Force, 2005. NOAA-USGS Debris-Flow Warning System - final report. U.S. Geological Survey Circular 1283.
Oh, H.-J., Lee, S., 2011. Cross-application used to validate landslide susceptibility maps using a probabilistic model from Korea. Environmental Earth Sciences, 64, 395-409.
Oh, H.-J., Pradhan, B., 2011. Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallow landslides in a tropical hilly area. Computers & Geosciences, 37, 1264-1276.
Park, N.-W., 2011. Application of Dempster-Shafer theory of evidence to GIS-based landslide susceptibility analysis. Environmental Earth Sciences, 62, 367-376.
Petley, D.N., 2008. The global occurrence of fatal landslides in 2007. Geophysical Research Abstracts, Vol. 10, EGU General Assembly 2008.
Petley, D.N., 2010. On the impact of climate change and population growth on the occurrence of fatal landslides in South, East and SE Asia. Quarterly Journal of Engineering Geology and Hydrogeology, 43, 487-496.
Porwal, A., Carranza, E.J.M., Hale, M., 2003. Knowledge-driven and data-driven fuzzy models for predictive mineral potential mapping. Natural Resources Research, 12, 1-25.
Porwal, A., Carranza, E.J.M., Hale, M., 2006. A hybrid fuzzy weights-of-evidence model for mineral potential mapping. Natural Resources Research, 15, 1-14.

Poudyal, C.P., Chang, C., Oh, H.J., Lee, S., 2010. Landslide susceptibility maps comparing frequency ratio and artificial neural networks: a case study from the Nepal Himalaya. Environmental Earth Sciences, 61, 1049-1064.
Pradhan, B., 2010. Application of an advanced fuzzy logic model for landslide susceptibility analysis. International Journal of Computational Intelligence Systems, 3, 370-381.
Pradhan, B., 2011. Manifestation of an advanced fuzzy logic model coupled with geoinformation techniques to landslide susceptibility mapping and their comparison with logistic regression modelling. Environmental and Ecological Statistics, 18, 471-493.
Pradhan, B., Lee, S., 2010a. Delineation of landslide hazard areas on Penang Island, Malaysia, by using frequency ratio, logistic regression, and artificial neural network models. Environmental Earth Sciences, 60, 1037-1054.
Pradhan, B., Lee, S., 2010b. Landslide susceptibility assessment and factor effect analysis: backpropagation artificial neural networks and their comparison with frequency ratio and bivariate logistic regression modelling. Environmental Modelling & Software, 25, 747-759.
Pradhan, B., Sezer, E.A., Gokceoglu, C., Buchroithner, M.F., 2010. Landslide susceptibility mapping by neuro-fuzzy approach in a landslide-prone area (Cameron Highlands, Malaysia). IEEE Transactions on Geoscience and Remote Sensing, 48, 4164-4177.
Ratanamahatana, C., Gunopulos, D., 2003. Feature selection for the naive Bayesian classifier using decision trees. Applied Artificial Intelligence, 17, 475-487.
Roberds, W., 2005. Estimating temporal and spatial variability and vulnerability. In: Hungr, Fell, Couture, Eberhardt (Eds.), Landslide Risk Management. Taylor and Francis, London.
Saito, H., Nakayama, D., Matsuyama, H., 2009. Comparison of landslide susceptibility based on a decision-tree model and actual landslide occurrence: The Akaishi Mountains, Japan. Geomorphology, 109, 108-121.
Sarun, S., 2011. Disaster risk communication over early warning technologies - A case study of coastal Kerala. Disaster Risk Vulnerability Conference, Kerala, India, pp. 177-183.
Sassa, K., Canuti, P., 2008. Landslides - Disaster Risk Reduction. Springer.
Schuster, R.L., Highland, L.M., 2007. The third Hans Cloos lecture. Urban landslides: socioeconomic impacts and overview of mitigative strategies. Bulletin of Engineering Geology and the Environment, 66, 1-27.
Sezer, E.A., Pradhan, B., Gokceoglu, C., 2011. Manifestation of an adaptive neuro-fuzzy model on landslide susceptibility mapping: Klang valley, Malaysia. Expert Systems with Applications, 38, 8208-8219.
Soyguder, S., Alli, H., 2009. An expert system for the humidity and temperature control in HVAC systems using ANFIS and optimization with Fuzzy Modeling Approach. Energy and Buildings, 41, 814-822.
Thinh, D.V., Dong, N.P., Hong, P.M., Hung, P.V., Khoi, T.N., Ke, T.D., Phu, D.V., Thang, P.X., Thanh, P.V., Thang, P.H., Thay, B.V., Thinh, N.T., Thien, T.V., Tu, M.T., Vinh, B.X., 2005. The Investigated Report of Natural Hazards in the Northwest of Vietnam. Northern Geological Mapping Division, Hanoi, 12 p.

Tien Bui, D., Lofman, O., Revhaug, I., Dick, O., 2011. Landslide susceptibility analysis in the Hoa Binh province of Vietnam using statistical index and logistic regression. Natural Hazards, 59, 1413-1444.
Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., Dick, O.B., 2012. Landslide susceptibility assessment in the Hoa Binh province of Vietnam: A comparison of the Levenberg-Marquardt and Bayesian regularized neural networks. Geomorphology, 171-172, 12-29.
Topcu, İ.B., Sarıdemir, M., 2008. Prediction of mechanical properties of recycled aggregate concretes containing silica fume using artificial neural networks and fuzzy logic. Computational Materials Science, 42, 74-82.
UN-ISDR, 2012. Viet Nam - Disaster Statistics. http://www.preventionweb.net/english/countries/statistics/?cid=190.
UN, 2006. Global Survey of Early Warning Systems: An assessment of capacities, gaps and opportunities towards building a comprehensive global early warning system for all natural hazards.
Vahidnia, M.H., Alesheikh, A.A., Alimohammadi, A., Hosseinali, F., 2010. A GIS-based neuro-fuzzy procedure for integrating knowledge and data in landslide susceptibility mapping. Computers & Geosciences, 36, 1101-1114.
Van Den Eeckhaut, M., Reichenbach, P., Guzzetti, F., Rossi, M., Poesen, J., 2009. Combined landslide inventory and susceptibility assessment based on different mapping units: an example from the Flemish Ardennes, Belgium. Natural Hazards and Earth System Sciences, 9, 507-521.
Van Westen, C.J., Castellanos, E., Kuriakose, S.L., 2008. Spatial data for landslide susceptibility, hazard, and vulnerability assessment: An overview. Engineering Geology, 102, 112-131.
Van Westen, C.J., Rengers, N., Soeters, R., 2003. Use of geomorphological information in indirect landslide susceptibility assessment. Natural Hazards, 30, 399-419.
Van Westen, C.J., Terlien, M.T.J., 1996. An approach towards deterministic landslide hazard analysis in GIS. A case study from Manizales (Colombia). Earth Surface Processes and Landforms, 21, 853-868.
Van Westen, C.J., Van Asch, T.W.J., Soeters, R., 2006. Landslide hazard and risk zonation - why is it still so difficult? Bulletin of Engineering Geology and the Environment, 65, 167-184.
Vapnik, V.N., 1998. Statistical Learning Theory. Wiley-Interscience.
Varnes, D.J., 1984. Landslide Hazard Zonation: A Review of Principles and Practice. UNESCO, Paris.
Wu, X.D., Kumar, V., Quinlan, J.R., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G.J., Ng, A., Liu, B., Yu, P.S., Zhou, Z.H., Steinbach, M., Hand, D.J., Steinberg, D., 2008. Top 10 algorithms in data mining. Knowledge and Information Systems, 14, 1-37.
Yao, X., Tham, L.G., Dai, F.C., 2008. Landslide susceptibility mapping based on Support Vector Machine: A case study on natural slopes of Hong Kong, China. Geomorphology, 101, 572-582.
Yeon, Y.-K., Han, J.-G., Ryu, K.H., 2010. Landslide susceptibility mapping in Injae, Korea, using a decision tree. Engineering Geology, 116, 274-283.
Yesilnacar, E., Topal, T., 2005. Landslide susceptibility mapping: A comparison of logistic regression and neural networks methods in a medium scale study, Hendek region (Turkey). Engineering Geology, 79, 251-266.

Yilmaz, I., 2009. Landslide susceptibility mapping using frequency ratio, logistic regression, artificial neural networks and their comparison: A case study from Kat landslides (Tokat-Turkey). Computers & Geosciences, 35, 1125-1138.
Yilmaz, I., 2010a. Comparison of landslide susceptibility mapping methodologies for Koyulhisar, Turkey: conditional probability, logistic regression, artificial neural networks, and support vector machine. Environmental Earth Sciences, 61, 821-836.
Yilmaz, I., 2010b. The effect of the sampling strategies on the landslide susceptibility mapping by conditional probability and artificial neural networks. Environmental Earth Sciences, 60, 505-519.

Paper I Tien Bui, D., Lofman, O., Revhaug, I., Dick, O.B., 2011. Landslide susceptibility analysis in the Hoa Binh province of Vietnam using statistical index and logistic regression. Natural Hazards, 59, 1413–1444.

Landslide susceptibility analysis in the Hoa Binh province of Vietnam using statistical index and logistic regression Dieu Tien Bui, Owe Lofman, Inge Revhaug & Oystein Dick

Natural Hazards Journal of the International Society for the Prevention and Mitigation of Natural Hazards ISSN 0921-030X Volume 59 Number 3 Nat Hazards (2011) 59:1413-1444 DOI 10.1007/s11069-011-9844-2



Received: 8 February 2011 / Accepted: 29 April 2011 / Published online: 27 May 2011
© Springer Science+Business Media B.V. 2011

Abstract The purpose of this study is to evaluate and compare the results of applying the statistical index and the logistic regression methods for estimating landslide susceptibility in the Hoa Binh province of Vietnam. First, a landslide inventory map was constructed, based mainly on landslide locations investigated in three projects conducted over the last 10 years. In addition, some recent landslide locations were identified from SPOT satellite images, fieldwork, and literature. Secondly, ten factors influencing landslide occurrence were utilized. The slope gradient map, the slope curvature map, and the slope aspect map were derived from a digital elevation model (DEM) with a resolution of 20 × 20 m. The DEM was generated from topographic maps at a scale of 1:25,000. The lithology map and the distance to faults map were extracted from Geological and Mineral Resources maps. The soil type and the land use maps were extracted from National Pedology maps and National Land Use Status maps, respectively. Distance to rivers and distance to roads were computed based on river and road networks from topographic maps. In addition, a rainfall map was included in the models. Actual landslide locations were used to verify and to compare the results of the landslide susceptibility maps. The accuracy of the results was evaluated by ROC analysis. The area under the curve (AUC) was 0.946 for the statistical index model and 0.950 for the logistic regression model, indicating an almost equal predictive capacity.

Keywords Landslide susceptibility · Logistic regression · Statistical index · Hoa Binh province

D. T. Bui (corresponding author), O. Lofman, I. Revhaug, O. Dick. Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, P.O. Box 5003 IMT, 1432 Ås, Norway. e-mail: [email protected]; [email protected]

1 Introduction
Landslides are considered to be one of the most dangerous natural hazards; they may occur suddenly, resulting in loss of human life and substantial property damage. The northwest mountainous areas of Vietnam have been heavily affected by landslides in recent years. In particular, the Hoa Binh province has suffered landslides following heavy rainfall, especially in combination with tropical storms. The results of this study may provide valuable knowledge that helps to forecast such events. The aims of the study also include determining which measures should be considered to mitigate subsequent losses from landslides. For this reason, two different landslide susceptibility analysis techniques, the statistical index and logistic regression, have been applied and validated for this region.

A variety of approaches for modeling landslide hazards have been employed by researchers throughout the world. In general, these models can be classified as either qualitative or quantitative methods. Qualitative methods are subjective methods that are based on expert opinion and portray hazard zoning in descriptive terms. They can be further classified into two groups. The first group is geomorphologic analysis, in which landslide susceptibility is determined directly, either in the field or by the interpretation of air photos and satellite images. Using expert knowledge, a direct relationship between existing landslides and causative terrain parameters is determined and used to construct a landslide susceptibility map. The second group is qualitative map combination, in which a landslide susceptibility map is obtained by combining a number of landslide influence factor maps. Weights are assigned to the subclasses of the thematic maps based on the field knowledge of experts; therefore, a landslide inventory map is not needed. A map resulting from qualitative methods may be strongly influenced by the subjectivity of the experts involved.
In contrast, quantitative methods focus on the analysis of numerical data and statistics that express the relationship between instability factors and landslides. Since information from past and present landslides holds the key to the future, landslide inventories are used to elucidate relationships with instability factors in order to predict future patterns of instability (Guzzetti et al. 1999). Quantitative methods can be further divided into deterministic and statistical subtypes. Deterministic methods focus on analyzing the mechanical equilibrium of a potential slide block and calculating a slope safety factor (Zhou et al. 2003). They are applicable only if the landslide types are simple and the geomorphic and geologic properties are fairly homogeneous. Statistical methods are based on the analysis of the functional relationships between instability factors and existing landslides. These methods require the collection of a large amount of data to produce reliable results. Quantitative predictions are then made for areas where landslides have not yet occurred but where similar conditions exist.

Various statistical methods have been applied to slope instability, such as bivariate statistical analysis, multiple linear regression, logistic regression, and discriminant analysis. In bivariate statistical analysis, weights for each input map are determined by comparing a landslide inventory map with each of the separate input parameter maps (Bednarik et al. 2010). The weights are then summed and ranked to obtain a landslide susceptibility map. To carry out the bivariate statistical analysis, all continuous parameters in the maps have to be converted into categorical classes. The literature, however, does not clearly define how to reclassify the parameters properly (class interval, number of classes, etc.), and most authors rely on expert opinion for the division. According to Van Westen et al. (1997), the bivariate statistical technique is the most suitable model at the medium scale of 1:25,000 to 1:500,000.
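The bivariate weighting described above can be sketched in a few lines. The example assumes the statistical index form commonly attributed to Van Westen (1997), in which the weight of a class is the natural logarithm of the landslide density in that class divided by the landslide density over the whole map. The function name, class labels, and pixel counts are hypothetical and serve only as an illustration; they are not data from this study.

```python
import math

def statistical_index_weights(class_pixels, landslide_pixels):
    """Return ln(class landslide density / map landslide density) per class.

    class_pixels: dict mapping class label -> total pixels in that class
    landslide_pixels: dict mapping class label -> landslide pixels in that class
    """
    total_pixels = sum(class_pixels.values())
    total_landslide = sum(landslide_pixels.values())
    map_density = total_landslide / total_pixels
    weights = {}
    for label, n_pixels in class_pixels.items():
        density = landslide_pixels.get(label, 0) / n_pixels
        # ln(0) is undefined for a class without landslides; it is left as
        # None here, and how to treat such classes is a modelling choice.
        weights[label] = math.log(density / map_density) if density > 0 else None
    return weights

# Hypothetical slope-gradient classes in degrees (pixel counts are made up).
slope_pixels = {"0-10": 5000, "10-25": 3000, "25-45": 2000}
slope_landslides = {"0-10": 5, "10-25": 45, "25-45": 50}
weights = statistical_index_weights(slope_pixels, slope_landslides)
```

Positive weights mark classes whose landslide density is above the map average, and negative weights mark classes below it. Summing the weights of all factor maps cell by cell would then give the susceptibility index that is ranked into classes.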


Logistic regression, discriminant analysis, and likelihood ratio methods are among the most frequently chosen methods. Logistic regression with stepwise variable selection has been identified as an appropriate method for the prediction of landslide susceptibility (Brenning 2005). The main objective of logistic regression is to predict the probability of occurrence of a dichotomous event from a set of variables that may be continuous, discrete, or a combination of both. The primary difference between logistic regression and other multivariate statistical analyses is that the independent variables do not have to be normally distributed or linearly related, and the predicted values are transformed into probabilities between 0 and 1. Therefore, many studies have used logistic regression for landslide susceptibility assessment (Jade and Sarkar 1993; Guzzetti et al. 1999; Dai et al. 2001; Dai and Lee 2002; Lee 2005; Suzen and Doyuran 2004; Ayalew and Yamagishi 2005; Lee et al. 2007; Ohlmacher and Davis 2003; Nandi and Shakoor 2008; Yilmaz 2009b; Falaschi et al. 2009). In recent years, methods such as fuzzy logic, neuro-fuzzy, and neural network models have been proposed as new approaches to evaluating landslide susceptibility in order to overcome the limitations of the aforementioned techniques (Yilmaz 2009a). In summary, although many different methods and techniques for landslide susceptibility analysis have been proposed and implemented, no agreement has so far been reached on which of them is best for landslide susceptibility mapping (Yesilnacar and Topal 2005; Wang et al. 2005).
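As a minimal illustration of how the logistic regression model discussed above turns a linear combination of influencing factors into a probability between 0 and 1, the sketch below evaluates the logistic function for a single grid cell. The intercept, coefficients, and factor values are hypothetical and are not fitted values from this study.

```python
import math

def landslide_probability(intercept, coefficients, factors):
    """Logistic model: p = 1 / (1 + exp(-z)), with z = b0 + sum(b_i * x_i)."""
    z = intercept + sum(b * x for b, x in zip(coefficients, factors))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients and values for two influencing factors.
p = landslide_probability(-2.0, [0.8, 1.5], [1.0, 0.5])
```

Whatever the sign or magnitude of the linear term z, the result stays strictly between 0 and 1, which is why the output can be read directly as a susceptibility value for each cell.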

2 Data and landslide density analysis
2.1 Geographical summary of the study area
The Hoa Binh province is located in the northwest of Vietnam, between longitudes 104°48′E and 105°50′E and between latitudes 20°17′N and 21°08′N, covering an area of about 4,660 km². The province is a hilly area situated between mountains and the Red River plain. The altitude ranges from 0 to 1,510 m above sea level, and the elevation decreases from the northwest to the southeast. The Hoa Binh province is a part of the structure of the Paleozoic Northwest–Northern, with the different structures constituting the Fansipan zone in the northwest, the Son La zone in the southwest, and the remaining Ninh Binh zone. Hoa Binh is situated in the monsoonal region characterized by hot, rainy, and dry seasons. According to observations over the last decade, the coldest month was January, with an average temperature of 14.9 °C, and the hottest month was July, with an average temperature of 26.7 °C. The rainy season normally lasts from May to October, with both a high frequency and a high intensity of rainfall. In August and September, rainfall peaks at 300 to 400 mm per month. The rainfall during the rainy season accounts for 84–90% of the yearly rainfall. Rainfall is concentrated in short periods, and rainstorms and super rainstorms are major contributors to the landslide hazard in the area. The population of the province increased from 786,694 in 1999 to 817,700 in 2008, leading to the growth of new residential areas. In addition, new settlements are being forced toward the highlands. New settlements combined with inappropriate land use are the main factors behind the increasing frequency of landslides in this area. The vegetation cover has been changed considerably by clear-cut logging, especially in the catchment areas of the Buoi River, Boi River, and Bui River.



Nat Hazards (2011) 59:1413–1444

The terrain, in combination with the operation of the Hoa Binh hydropower station and the population growth, has contributed to natural disasters such as flooding, erosion, and especially landslides. According to the report of the project "The hoarding of water in Hoa Binh Lake and the environmental impact," the occurrence of landslides along the banks of the Hoa Binh Lake has increased significantly since the reservoir was filled (Thach et al. 2002).

2.2 Landslide inventory map

Landslide inventory maps are considered to be the basis of most susceptibility mapping techniques. In addition, inventory maps may be used for evaluating and reducing landslide hazards or risks on a regional scale (Wieczorek 1984). The study area has a recorded history of landslides that have caused damage to life, property, and infrastructure. These landslides occur in various types of terrain, mainly after heavy rain and tropical storms. For this analysis, a landslide inventory map was constructed in which landslide areas were depicted by polygons. This map is based on several sources:
(1) The natural hazard map 2005, the result of the project "Investigation of natural hazards in the Northwest of Vietnam" (Thinh et al. 2005). The project was conducted mainly along and near the roads and resulted in a map with 50 landslide positions for the Hoa Binh area.
(2) The environmental hazard map 2007, the result of the project "Construction of the environmental hazard zonation map for the northwest territory of Vietnam" (My 2007). A total of 30 landslide positions were recorded and registered in a map for the Hoa Binh province.
(3) "Investigation and assessment of the types of geological hazard in the territory of Vietnam and recommendation of remedial measures. Phase II: A study of the Northern mountainous province" (Hue et al. 2004). The result of this project showed 34 landslide positions.
(4) Some recent landslides identified by the interpretation of SPOT satellite imagery with a resolution of 2.5 m, plus other information that had been collected regarding landslides in this area. Field checks were conducted at randomly selected sites to verify the landslide positions.

A total of 118 landslides that occurred during the last 10 years were identified and registered in the map (97 landslide areas and 21 rock fall areas). The smallest landslide covers about 383 m², whereas the largest covers 14,343 m². The average landslide size is 3,443 m². Rainfall is the main triggering factor. For example, many landslides in the study area were caused by the heavy rainfall of the tropical storm Lekima from 3 to 5 October 2007, when the accumulated rainfall measured at the rain gauges over these 3 days was 334.0–529.4 mm. For this analysis, the landslide inventory map was randomly split into two separate subsets: a training data set of 70% (82 landslide areas with 684 grid cells) and a testing data set of 30% (36 landslide areas with 315 grid cells). This map was converted into a raster format with a resolution of 20 m. The training data set was used for building landslide models, whereas the testing data set was used for model validation. Figure 1 shows the distribution of the landslide positions used in this analysis. Figure 2 shows two pictures of newly mapped landslide locations in the field (photographs were taken in May 2010 by Dieu Tien Bui).

2.3 Factors influencing landslide susceptibility

2.3.1 The lithology map

Lithology has been considered to be a very important factor and is the one most frequently used in landslide susceptibility analysis so far. Lithology with its structural and property variations


Fig. 1 The landslide inventory map of the study area

Fig. 2 Two pictures of newly mapped landslide locations in field of the study area. a The landslide in the Doc Cun area. b The landslide in the Km7-Da Bac

may lead to differences in strength and permeability of rocks and soils (Ayalew and Yamagishi 2005). Many researchers have used lithology for susceptibility mapping in their studies (Dai et al. 2001; Dai and Lee 2002; Donati and Turrini 2002; Cevik and Topal 2003; Ohlmacher and Davis 2003; Ayalew and Yamagishi 2005; Yalcin 2008). In this study, the lithology map was extracted from four tiles of the Geological and Mineral Resources Map of Vietnam at the scale of 1:200,000. According to Van et al. (2002, 2006), the lithology of the study area can be classified into seven groups based on


the criteria of material components (rich clay or little clay), degree of weathering, as well as estimated strength and density (see "Appendix"). The seven lithological groups are shown in Table 1 and Fig. 3. The landslide density percentage in each lithology group is shown in Fig. 13a.

2.3.2 The slope map

The slope gradient is an important component of slope stability analysis and is frequently used in landslide susceptibility studies. In general, the likelihood of failure increases with the slope gradient. However, soil thickness and strength are two factors that vary over a wide range between sites (Dai et al. 2001). The slope gradient map of the study area was divided into six slope categories (Fig. 4). The landslide density percentage in each slope class is shown in Fig. 13b. Landslide density is highest in the 20–30° category, followed by the 30–40°, 10–20°, and 40–50° categories. There were very few landslides in the 0–10° and >50° categories.

2.3.3 The soil type map

The soil map at the scale of 1:100,000 was extracted from the National Pedology map. The 27 original soil types in this map were simplified and reduced to 13 layers that were included in the analysis (Fig. 5). The 13 aggregated subtypes are as follows: (1) degraded soil (DS), (2) dystric fluvisols (DF), (3) dystric gleysols (DG), (4) eutric fluvisols (EF), (5) ferralic acrisols (FA), (6) gley fluvisols (GF), (7) humic acrisols (HA), (8) humic ferralsols (HF), (9) limestone mountain (LM), (10) luvisols (LS), (11) populated area (PA), (12) rhodic ferralsols (RF), and (13) water (WT). The correlation of soil subtypes with landslide density is shown in Fig. 13c. Landslide density is highest on the DF layer (35.7%), followed by the EF layer (16.4%), the DG layer (14.2%), the FA layer (11.2%), the LM layer (9.8%), and the HA layer (8.5%).
2.3.4 The land use map

The Land Use Status Map of the Hoa Binh province (scale 1:50,000), the result of the National Land Use Survey in Vietnam in 2006, is the base for the land use map in this study. The 53 land use types in the Land Use Status Map were reduced to twelve categories for this project (Fig. 6): (1) grass land (GR), (2) annual crop land (CR), (3) natural forest land (NF), (4) paddy land (PA), (5) orchard land (OR) including citrus and land under fruit trees and nut trees, (6) protective forest land (PT), (7) productive forest land (PD), (8) non-tree rocky mountain (RM), (9) populated area (PO), (10) barren land (BR), (11) specially used forest land (SF), and (12) water area (WT) including lakes, rivers, ponds, and streams. The correlation with landslide density is shown in Fig. 13d. High landslide density is concentrated on four layers: the PT layer (22%), the PD layer (18%), the PO layer (17%), and the RM layer (17%). The high landslide density of these areas can be explained by very intensive clear-cut logging as well as the increase in inappropriate new highland settlements due to population growth. The barren land, orchard land, and mountainous cultivated areas also have a significant number of landslide events.


Table 1 Lithological classification of the Geological and Mineral Resources Map of Vietnam at the scale of 1:200,000

Group

Formation

Main component description

1

Thai Binh Formation (Q32tb)

Chocolate sand, clay, silt, grayish brown sand, clayey silt

2

3

Vinh Phuc Formation (Q31vp)

Yellow sand, silt, motley lateralized clay

Ha Noi Formation (Q2–3 1 hn)

Boulder, pebble, granule, dark yellow sand, clayey silt

Lower–Middle Holocene (Q1–2 2 )

Pebble, granule mixed with sand, grit, grading upward to clay, silt

Upper Pleistocene (Q31)

Pebble, granule, sand, boulder

Upper Yen Chau Subformation (K2yc3)

Red sandstone, siltstone, conglomerate

Middle Yen Chau Subformation (K2yc2)

Calcareous conglomerate, chocolate sandstone, clay stone

Lower Yen Chau Subformation (K2yc1)

Polymictic conglomerate, coarse sandstone, conglomerate, grit stone, sandstone, chocolate siltstone, chocolate sandstone

Nam Thiep Formation (J1–2nt)

Gritstone, sandstone, polymictic conglomerate, chocolate siltstone

Upper Suoi Bang Subformation (T3n - rsb2)

Sandstone, siltstone, clay shale, coal seams or lenses

Lower Suoi Bang Subformation (T3n - rsb1)

Conglomerate, sandstone, siltstone, black clay shale

Phia Bioc Complex (caT3npd)

Conglomerate, gritstone, sandstone, siltstone, marl, biotitic granite, two mica granite, granophyres

Upper Song Boi Subformation (T2–3sb2)

Sandstone, silty sandstone, black clay shale, siltstone

Lower Song Boi Subformation (T2–3sb1)

Conglomerate, sandstone, tuffaceous silty sandstone, limestone

Tan Lac Formation (T1otl)

Conglomerate, sandstone, tuffaceous sandstone, violetish tuffaceous siltstone, black clay shale, brown-violetish tuffs

Co Noi Formation (T1cn)

Sandstone, tuffaceous siltstone, clay shale, marl

Ban Nguon Formation (D1bn)

Sandstone, siltstone, black clay shale, clayey limestone, cherty shale

Lower SinhVinh Subformation (O3 - Ssv1)

Conglomerate, sandstone, calcareous siltstone

Lower Ben Khe Subformation (e - Obk1)

Conglomerate, gray coarse sandstone, siltstone, sericite shale

Upper Dong Giao Subformation (T2adg2)

Massive limestone, light-colored massive limestone marl, dolomitized limestone

Lower Dong Giao Subformation (T2adg1)

Limestone, thin-bedded black-gray limestone, marl, cherty limestone

Na Vang Formation (P2nv)

Cherty limestone, clayey limestone, thick-bedded to massive gray limestone

Si Phay Formation (P1–2sp)

Cherty limestone, marl, black clay shale, silty sandstone, limestone lenses


Table 1 continued Group

4

5 6

7

Formation

Main component description

Bac Son Formation (C - Pbs)

Light-gray dolomitic limestone, cherty limestone, dark-gray massive oolitic limestone

Ban Pap Formation (D1–2bp)

Thick-bedded to massive limestone, black-gray dolomitic limestone

Upper Ham Rong Subformation (e3 - O1hr2)

Dolomitic marble, quartz- sericite schist

Da Dinh Formation (NP - e1dd)

Dolomite, tremolitized marble

Upper Sinh Vinh Subformation (e3 - O1hr2)

Gray sandy limestone, thin-to-thick-bedded black limestone, dolomitic limestone

Ba Vi Complex (dvT1bv)

Peridotite, dunite, gabbro, gabbro-diabase, diabase

Ban Xang Complex (dT1bx)

Peridotite, dunite

Vien Nam Formation (T1vn)

Aphyric basalt, magnesium-high basalt, andesitic basalt, andesite-dacite, trachyte, porphyritic trachyte, agglomerate

Cam Thuy Formation (P3ct)

Aphyric basalt, basalt, various basaltic tuffs

Bao Ha Complex (vPP - MPbh)

Gabbro, amphibolites

Ban Ngam Complex (cPZ1bn)

Granite, granosyenite

Po Sen Complex (dcPZ1ps)

Tonalite, granodiorite, gneissoid granite

Yen Duyet Formation (P3yd)

Black clay shale, cherty shale, small lenses of limestone, coaly shale, lenses of coal

Ban Cai Formation (D3bc)

Clay shale, cherty shale interbedded with striped cherty limestone, limestone, clayish limestone, manganese lenses

Song Mua Formation (D1sm)

Black clay shale, siltstone, a little sandstone, marl, limestone

Nam Pia Formation (D1np)

Clay shale, marl, sericite schist

Upper Bo Hieng Sub formation (S1bh2)

Clay shale, marl, limestone lenses, coaly shale

Lower Bo Hieng Sub formation (S1bh1)

Clay shale, marl, limestone lenses, coaly shale

Nam Tham Formation (T2lnt)

Clay shale, siltstone, marl

Xom Giau Complex (cPP - MPxg)

Gneissoid biotite-microcline granite, granite, granosyenite

Suoi Chieng Formation (PP - Mpsc)

Biotite gneiss, gneiss amphibole, amphibolite, quartzite biotite, biotite-amphibole schist, biotite schist, calciphyre

Upper Ben Khe Sub formation (e - Obk2)

Quartzite, clay shale, marl, calcareous siltstone

Sinh Quyen Formation (PPsc)

Quartzite, biotite gneiss, quartz-mica-feldspar schist, amphibolite with interbeds of magnetite and calciphyre

Unknown in dyke

Aplite

2.3.5 The aspect map

Aspect can be defined as the compass direction that a slope faces, measured in degrees clockwise from north and ranging from 0° to 360°. In landslide susceptibility studies, aspect is considered to be an important factor influencing slope instability, since aspect-related parameters, such as exposure to sunlight and drying winds, control the


Fig. 3 The lithologic map

Fig. 4 The slope map


Fig. 5 The soil type map

concentration of the soil moisture, which in turn is a determinant for the occurrence of landslides (Magliulo et al. 2008). In this study, the slope aspect map was obtained from the DEM and is shown in Fig. 7. The landslide density percentage in each slope aspect class is shown in Fig. 13e. On the north aspect, the landslide percentage is relatively low; it increases with the orientation angle, reaches its maximum on the southwest aspect, and then decreases.

2.3.6 The curvature map

Curvature can be defined as the change in slope angle along a very small arc of the curve. Curvature is the reciprocal of the radius of a circle that is tangent to the given curve at a point (Ohlmacher 2007). The slope curvature map of the study area is shown in Fig. 8, and the correlation with landslide density is shown in Fig. 13f.

2.3.7 The distance to faults map

Geological faults have been considered as a factor that may influence landslides. In addition, the degree of fracturing and shearing plays an important role in determining slope


Fig. 6 The land use map

instability (Varnes 1984). Therefore, a distance to faults map (Fig. 9) was included in the landslide analysis. The fault buffer categories were defined as 0–200 m, 200–400 m, 400–700 m, 700–1,000 m, 1,000–1,500 m, and >1,500 m. The landslide density for each class is shown in Fig. 13g.

2.3.8 The distance to rivers map

Water is considered to be the primary factor triggering landslide mechanisms. Rivers may induce failure of the banks due to slope undercutting. Many studies have shown that the proximity to drainage lines is an important factor controlling the occurrence of landslides (Gokceoglu and Aksoy 1996). This can be attributed to the fact that terrain modification caused by gully erosion may influence the initiation of a landslide (Dai and Lee 2002). In order to assess the influence of drainage lines on landslide occurrence, a map of distance to rivers (Fig. 10) was calculated by a buffer operation in ArcGIS 9.3, based on rivers from the topographic map at the scale of 1:50,000. The river buffer categories were defined as 0–100 m, 100–200 m, 200–300 m, 300–400 m, 400–500 m, and >500 m. The correlation of the respective distance classes with landslide occurrence is shown in Fig. 13h. There is a clear trend showing that locations close to rivers have increased landslide activity.
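The grouping of distances into buffer classes can be illustrated with a minimal sketch, assuming the distance of each grid cell to the nearest river is already available in metres. The function and constant names are illustrative and are not from the study, which used a buffer operation in ArcGIS 9.3; assigning a distance that falls exactly on a class break to the higher class is also an assumption.

```python
import numpy as np

# Upper bounds (m) of the river buffer classes listed in the text:
# 0-100, 100-200, 200-300, 300-400, 400-500, and >500 m.
RIVER_BREAKS_M = [100, 200, 300, 400, 500]
RIVER_LABELS = ["0-100", "100-200", "200-300", "300-400", "400-500", ">500"]


def classify_distance(distance_m, breaks, labels):
    """Return the buffer-class label for each distance value."""
    distance_m = np.asarray(distance_m, dtype=float)
    # np.digitize returns, for each value, the index of the class it falls in.
    # Assumption: a value equal to a break goes to the next class up.
    idx = np.digitize(distance_m, breaks, right=False)
    return [labels[i] for i in idx]


def landslide_share_percent(cell_classes, is_landslide, labels):
    """Percentage of all landslide cells that fall in each class."""
    counts = {label: 0 for label in labels}
    for label, hit in zip(cell_classes, is_landslide):
        if hit:
            counts[label] += 1
    total = sum(counts.values())
    return {label: (100.0 * n / total if total else 0.0)
            for label, n in counts.items()}


# Hypothetical cells: distance to river (m) and landslide flag.
classes = classify_distance([50, 60, 150, 800], RIVER_BREAKS_M, RIVER_LABELS)
share = landslide_share_percent(classes, [True, True, True, False], RIVER_LABELS)
```

The same two functions could be reused for the fault and road buffers by passing their own break values and labels.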


Fig. 7 The aspect map

Fig. 8 The curvature map

2.3.9 The distance to roads map

Distance to roads has been considered as one of the most important anthropogenic factors influencing landslides because road-cuts are usually the sites that induce instability (Ayalew and Yamagishi 2005).


Fig. 9 The distance to faults map

Fig. 10 The distance to rivers map

In this study, the road network was extracted from the topographic map in the same way as the rivers. In order to determine the effect of roads on the stability of a slope, a distance to roads map (Fig. 11) was constructed using buffer algorithms in ArcGIS 9.3. The road buffer categories were defined as 0–100 m, 100–200 m, 200–300 m,


Fig. 11 The distance to roads map

Fig. 12 The rainfall map

300–400 m, 400–500 m, and >500 m. The landslide density for each distance class was calculated, and the result is shown in Fig. 13i. Nearly 80% of the landslides occurred less than 100 m from the roads, and the frequency decreases rapidly beyond that distance.


2.3.10 The rainfall map

Rainfall is widely considered to be the main triggering factor of landslides. The study area is strongly influenced by the tropical monsoon climate. The rainy season lasts from May to October and accounts for 85 to 90% of the total yearly precipitation. The average seasonal precipitation from 1973 to 2002 was compiled from the precipitation database of the Institute of Meteorology and Hydrology, Vietnam. The data were interpolated with the inverse distance weighted method to create a map of mean rainy-season precipitation (Fig. 12). The landslide density percentage in each rainfall class is shown in Fig. 13j. The highest concentrations of landslides occurred in the two highest rainfall classes, 1,535 to 1,635 mm (45.4%) and 1,635 to 1,858 mm (39.4%). This clearly indicates that a large amount of rainfall has a considerable impact on landslide activity.
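As an illustration of the interpolation step, the following is a minimal sketch of inverse distance weighted interpolation. The text does not state the distance exponent or the software settings used, so the power of 2 is an assumption, and the station coordinates and precipitation values in the example are invented for demonstration only.

```python
import numpy as np


def idw_interpolate(xy_known, values, xy_query, power=2.0):
    """Inverse distance weighted estimate at each query point.

    xy_known: (n, 2) station coordinates; values: (n,) station values;
    xy_query: (m, 2) points to estimate. `power` is an assumed exponent.
    """
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.asarray(xy_query, dtype=float)

    # Pairwise distances, shape (m, n).
    dist = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)

    result = np.empty(len(xy_query))
    for i, row in enumerate(dist):
        exact = row == 0
        if exact.any():
            # Query point coincides with a station: use the station value.
            result[i] = values[exact][0]
        else:
            w = 1.0 / row ** power
            result[i] = np.sum(w * values) / np.sum(w)
    return result


# Hypothetical example: two stations with seasonal precipitation in mm.
stations = [(0.0, 0.0), (10.0, 0.0)]
precip_mm = [1400.0, 1800.0]
estimates = idw_interpolate(stations, precip_mm, [(5.0, 0.0), (0.0, 0.0)])
```

A point halfway between two stations receives the mean of their values, and a point located at a station reproduces that station's value.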

3 Mapping of landslide susceptibility

3.1 The statistical index method

The statistical index method, proposed by Van Westen (1997), is applied for landslide susceptibility analysis in this study. The method was later applied by other researchers (Cevik and Topal 2003; Oztekin and Topal 2005). In this method, the weight for a parameter class, such as a slope class or elevation class, is defined as the natural logarithm of the landslide density in the class divided by the landslide density in the entire map. The method is based on the formula given by Van Westen:

$$
w_i = \ln\left(\frac{\mathrm{Densclas}}{\mathrm{Densmap}}\right)
    = \ln\left(\frac{N_{pix}(S_i) / N_{pix}(N_i)}{\sum N_{pix}(S_i) / \sum N_{pix}(N_i)}\right)
$$

where w_i is the weight given to the parameter class, Densclas is the landslide density within the parameter class, and Densmap is the landslide density within the entire map. Npix(S_i) is the number of landslide pixels in parameter class i, and Npix(N_i) is the total number of pixels in the same parameter class. The statistical index method is based on the statistical correlation of the landslide inventory map with the attributes of the parameter maps. This means that w_i is only calculated for classes in which landslides occur; a parameter class that contains no landslides has no correlation with the landslide inventory. Using the training data set (82 landslide areas with 684 pixels), the landslide density in each parameter class was calculated by crossing the respective layer with the landslide inventory map. In the next step, the w_i value of each parameter class was calculated. Finally, all weighted layers were summed to build the landslide susceptibility index (LSI), resulting in a susceptibility map. Table 2 presents the detailed distribution of landslides in the class layers. The LSI was ranked into four classes.
The classes are as follows: low susceptibility (-23 to -0.89), moderate susceptibility (-0.89 to 1.03), high susceptibility (1.03 to 2.40), and very high susceptibility (2.40 to 5.85). There are many methods for dividing weight values into classes, such as the equal interval, natural break, and standard deviation methods. In this study, the manual classification method is used. This


Fig. 13 Correlation of landslide density. a Lithology groups, b slope (°), c soil type, d land use, e aspect (°), f curvature, g distance to faults × 100 (m), h distance to rivers (m), i distance to roads (m), and j rainfall (mm)

method is based on the assumption that the expected number of landslide pixels in a given susceptibility class is twice the expected number in the next lower susceptibility class (Long 2008; Galang 2004). Based on this rule, the landslide


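Table 2 Distribution of landslides in class layers

| Data layer | Class | Pixels in class | Landslide pixels | Weight | Logistic regression coefficient |
|---|---|---|---|---|---|
| Lithology | Group 1 | 468,851 | 36 | 0.255 | 1.702 |
| Lithology | Group 2 | 4,552,855 | 264 | -0.026 | 2.438 |
| Lithology | Group 3 | 3,740,521 | 183 | -0.196 | 1.543 |
| Lithology | Group 4 | 1,338,571 | 131 | 0.497 | 5.178 |
| Lithology | Group 5 | 135,801 | 0 | -4.086 | -16.276 |
| Lithology | Group 6 | 645,785 | 59 | 0.428 | 1.440 |
| Lithology | Group 7 | 607,922 | 11 | -1.191 | 0 |
| Soils | Degraded soil (DS) | 3,006 | 0 | -4.086 | 0 |
| Soils | Gley fluvisols (GF) | 9,043 | 0 | -4.086 | 0 |
| Soils | Humic ferralsols (HF) | 131,881 | 0 | -4.086 | 1.002 |
| Soils | Rhodic ferralsols (RF) | 1,031,126 | 30 | -0.716 | 17.330 |
| Soils | Humic acrisols (HA) | 3,551,123 | 199 | -0.060 | 19.568 |
| Soils | Limestone mountain (LM) | 1,657,233 | 104 | 0.053 | 19.795 |
| Soils | Eutric fluvisols (EF) | 400,659 | 45 | 0.635 | 19.059 |
| Soils | Ferralic acrisols (FA) | 4,196,906 | 294 | 0.163 | 20.049 |
| Soils | Dystric fluvisols (DF) | 84,133 | 6 | 0.181 | 16.565 |
| Soils | Dystric gleysols (DG) | 45,288 | 6 | 0.800 | 19.011 |
| Soils | Luvisols (LS) | 52,858 | 0 | -4.086 | -1.645 |
| Soils | Populated area (PA) | 50,440 | 0 | -4.086 | 0 |
| Soils | Water (WT) | 276,610 | 0 | -4.086 | 0 |
| Land use | Grass land (GR) | 49,547 | 0 | -4.086 | |
| Land use | Annual crop land (CR) | 184,205 | 2 | -1.702 | 18.914 |
| Land use | Natural forest land (NF) | 3,666,190 | 119 | -0.606 | 19.920 |
| Land use | Paddy land (PA) | 1,053,442 | 27 | -0.843 | 19.427 |
| Land use | Orchard land (OR) | 426,524 | 3 | -2.136 | 19.157 |
| Land use | Protective forest land (PT) | 985,889 | 120 | 0.715 | 21.745 |
| Land use | Productive forest land (PD) | 1,347,068 | 158 | 0.678 | 21.916 |
| Land use | Non-tree rocky mountain (RM) | 468,692 | 42 | 0.409 | 20.946 |
| Land use | Populated area (PO) | 864,840 | 109 | 0.750 | 21.898 |
| Land use | Barren land (BR) | 1,947,114 | 103 | -0.118 | 20.619 |
| Land use | Specially used forest land (SF) | 41,129 | 0 | -4.086 | 7.125 |
| Land use | Water (WT) | 455,666 | 1 | -3.300 | 18.098 |
| Aspect | Flat (-1) | 6,557 | 0 | -4.086 | 0.000 |
| Aspect | North (0–22.5) | 739,496 | 9 | -1.587 | -2.051 |
| Aspect | Northeast (22.5–67.5) | 1,672,940 | 82 | -0.194 | -1.305 |
| Aspect | East (67.5–112.5) | 1,385,498 | 44 | -0.628 | -2.810 |
| Aspect | Southeast (112.5–157.5) | 1,383,072 | 106 | 0.253 | 1.858 |
| Aspect | South (157.5–202.5) | 1,482,483 | 138 | 0.447 | 0.877 |
| Aspect | Southwest (202.5–247.5) | 1,677,042 | 174 | 0.556 | 0.132 |
| Aspect | West (247.5–292.5) | 1,299,469 | 65 | -0.174 | 0.707 |
| Aspect | Northwest (292.5–337.5) | 1,202,391 | 48 | -0.400 | 2.402 |
| Aspect | North (337.5–360) | 641,358 | 18 | -0.752 | 0.000 |
| Rainfall (mm) | 1,200–1,435 | 1,347,729 | 9 | -2.188 | 0.010 |
| Rainfall (mm) | 1,435–1,535 | 2,345,073 | 73 | -0.648 | |
| Rainfall (mm) | 1,535–1,635 | 4,877,617 | 369 | 0.240 | |
| Rainfall (mm) | 1,635–1,858 | 2,919,887 | 233 | 0.293 | |
| Curvature | Concave (-) | 6,320,019 | 406 | 0.076 | -1.194 |
| Curvature | Flat (0) | 92,419 | 0 | -4.086 | |
| Curvature | Convex (+) | 5,077,868 | 278 | -0.084 | |
| Slope (°) | 0–10 | 4,919,804 | 2 | -4.086 | 0.252 |
| Slope (°) | 10–20 | 3,346,950 | 222 | 0.108 | |
| Slope (°) | 20–30 | 2,326,636 | 368 | 0.977 | |
| Slope (°) | 30–40 | 785,451 | 92 | 0.677 | |
| Slope (°) | 40–50 | 106,715 | 0 | -4.086 | |
| Slope (°) | >50 | 4,750 | 0 | -4.086 | |
| Distance to faults (m) | 0–200 | 2,078,812 | 179 | 0.369 | 2.174 |
| Distance to faults (m) | 200–400 | 1,832,167 | 106 | 0.007 | 1.355 |
| Distance to faults (m) | 400–700 | 2,285,798 | 86 | -0.238 | 2.336 |
| Distance to faults (m) | 700–1,000 | 1,644,799 | 192 | 0.344 | 1.423 |
| Distance to faults (m) | 1,000–1,500 | 1,768,589 | 83 | -0.165 | 0.553 |
| Distance to faults (m) | >1,500 | 1,880,141 | 38 | -1.080 | 0.000 |
| Distance to rivers (m) | 0–100 | 2,184,247 | 194 | 0.400 | 1.614 |
| Distance to rivers (m) | 100–200 | 1,802,821 | 100 | -0.071 | 0.417 |
| Distance to rivers (m) | 200–300 | 1,428,573 | 72 | -0.166 | 1.602 |
| Distance to rivers (m) | 300–400 | 1,127,884 | 72 | 0.070 | 0.481 |
| Distance to rivers (m) | 400–500 | 891,591 | 72 | 0.305 | 1.242 |
| Distance to rivers (m) | >500 | 4,055,190 | 174 | -0.327 | 0.000 |
| Distance to roads (m) | 0–100 | 1,266,378 | 460 | 1.809 | 5.811 |
| Distance to roads (m) | 100–200 | 1,144,182 | 63 | -0.078 | 4.643 |
| Distance to roads (m) | 200–300 | 1,018,549 | 67 | 0.100 | 3.866 |
| Distance to roads (m) | 300–400 | 900,579 | 17 | -1.149 | 0.689 |
| Distance to roads (m) | 400–500 | 791,866 | 14 | -1.214 | 0.851 |
| Distance to roads (m) | >500 | 6,368,752 | 63 | -1.795 | 0.000 |

For the continuous variables (rainfall, curvature, and slope), a single logistic regression coefficient applies to the whole layer and is listed in the first row of that layer.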

pixels were compared with the LSI, and the graph of the cumulative percentage of observed landslide occurrence against landslide susceptibility index values was constructed (Fig. 14). Three cutoff percentages of existing landslide pixels in the cumulative curve were used to identify four landslide susceptibility classes: (1) 6.7% (LSI value of -0.89) was used for separating the low and moderate classes; (2) 20% (LSI value of 1.03) was used for separating the moderate and high classes; (3) 46.7% (LSI value of 2.39) was used for separating the high and very high classes. The final landslide susceptibility map is shown in Fig. 15.
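The weight calculation and the final ranking of the statistical index method can be sketched as follows. This is a minimal illustration, not the authors' implementation: the pixel counts are the lithology rows of Table 2, the function names are hypothetical, and the treatment of an LSI value lying exactly on a class boundary is an assumption. Classes without landslide pixels are returned as `None` here because the logarithm is undefined for them; Table 2 lists a weight of -4.086 for such classes.

```python
import bisect
import math

# Lithology rows of Table 2: total pixels and landslide pixels per group.
LITHOLOGY_PIXELS = {
    "Group 1": 468_851, "Group 2": 4_552_855, "Group 3": 3_740_521,
    "Group 4": 1_338_571, "Group 5": 135_801, "Group 6": 645_785,
    "Group 7": 607_922,
}
LITHOLOGY_LANDSLIDES = {
    "Group 1": 36, "Group 2": 264, "Group 3": 183, "Group 4": 131,
    "Group 5": 0, "Group 6": 59, "Group 7": 11,
}


def statistical_index_weights(class_pixels, landslide_pixels, empty_weight=None):
    """w_i = ln(class landslide density / map landslide density)."""
    dens_map = sum(landslide_pixels.values()) / sum(class_pixels.values())
    weights = {}
    for name, n_pix in class_pixels.items():
        n_slide = landslide_pixels[name]
        if n_slide == 0:
            # ln(0) is undefined; the caller decides what to assign.
            weights[name] = empty_weight
        else:
            weights[name] = math.log((n_slide / n_pix) / dens_map)
    return weights


# Class boundaries of the LSI as given in the text.
LSI_BREAKS = [-0.89, 1.03, 2.40]
LSI_CLASSES = ["low", "moderate", "high", "very high"]


def classify_lsi(lsi):
    """Map an LSI value (sum of layer weights) to a susceptibility class."""
    return LSI_CLASSES[bisect.bisect_right(LSI_BREAKS, lsi)]


weights = statistical_index_weights(LITHOLOGY_PIXELS, LITHOLOGY_LANDSLIDES)
```

For Group 1, the class density is 36/468,851 and the map density is 684/11,490,306, so the weight is ln(1.29), approximately 0.255, which agrees with the value listed in Table 2.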


Fig. 14 Cumulative percentage of observed landslide occurrence against LSI value

Fig. 15 Landslide susceptibility zonation map of Hoa Binh province based on the statistical index method


3.2 The logistic regression

3.2.1 The logistic regression method

Logistic regression is a mathematical modeling approach that can be used for predicting the presence or absence of an outcome based on the values of a set of predictor variables (Lee 2005). As stated by Nandi and Shakoor (2008), the advantage of logistic regression over multiple linear regression is that it does not assume a linear relationship between the dependent and the independent variables. Instead, it assumes a linear relationship between the independent variables and the logit of the response. In addition, the dependent variable does not need to be normally distributed, and no assumptions are needed about homogeneity of variance or normally distributed error terms. In logistic regression, the dependent variable is dichotomous, while the independent variables can be continuous, discrete, dichotomous, or a mix of any of these. The dependent variable can only have two values, such as presence/absence, success/failure, or an event occurring/not occurring. The predicted value is calculated as a probability between 0 and 1. The predicted probability of the dependent variable in logistic regression will fit an S-shaped curve if the independent variables used for the analysis are good estimators of the model (Dai and Lee 2002; Lee 2005). Logistic regression is based on the logistic function f(z), which is defined as:

$$
f(z) = \frac{1}{1 + e^{-z}}
$$

where z is a linear sum of a constant and the products of the independent variables and their respective coefficients. The value of z varies from -∞ to +∞, so that f(z) ranges from 0 to 1:

$$
z = a + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n
$$

where a is the constant; b_i (i = 1, 2, ..., n) are the coefficients; and x_i (i = 1, 2, ..., n) are the independent variables. Another popular transformation is the logit transformation, which has a relatively simple mathematical form:

$$
\mathrm{Logit}(P) = \ln\left(\frac{P(Y = 1)}{1 - P(Y = 1)}\right) = a + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n
$$

where P is the probability of an event occurring. The expression P(Y = 1)/(1 - P(Y = 1)) is the so-called odds. The constant a and the coefficients b_i are estimated from the data of the independent variables and the landslide condition of the pixels, using the maximum likelihood method, which maximizes the probability of obtaining the observed results given the fitted regression coefficients.

3.2.2 Data sampling

The literature review showed that the choice of data sampling method differs considerably between investigations. According to Atkinson and Massari (1998a), because the area covered by landslides is less than that not covered by landslides, only a single pixel at


the center of 442 landslide rupture areas was extracted together with 1,458 non-landslide cells to build a training data set. Ayalew and Yamagishi (2005) and Ohlmacher and Davis (2003) used all grid points, 1,054,768 and 2,022,861 cells respectively, covering both landslide and non-landslide areas. Undoubtedly, such a large amount of data with unequal proportions of landslide and non-landslide cells may increase the risk of bias. In order to control for bias, Dai and Lee (2002) used all 2,135 landslide cells plus an equal number of points extracted from locations not yet affected by landslides to obtain the training data set. In the same way, Yesilnacar and Topal (2005) used the total number of seed cells (6,018 cells) and randomly selected cells from landslide-free areas (6,018 cells), but they iterated the procedure six times to create six different training data sets. Clearly, using the same landslide data for both training and testing may reduce the reliability of a model. Bai et al. (2010) and Van Den Eeckhaut et al. (2006) used all the landslide seed cells together with random points chosen from the landslide-free area at ratios of 1, 2, 3, 4, and 5. The process was repeated four times to build four different data sets to see whether there was any convergence in the final result of the logistic regression analyses. In general, the accuracy of landslide models is influenced by the data sampling strategy. The way in which the data sets are obtained will affect both the nature of the regression relation and the accuracy of the resulting estimates (Atkinson and Massari 1998b). More specifically, Yilmaz (2010) stated that more realistic landslide model results can be obtained by using scarp and seed cell sampling strategies. In this study, all 684 landslide grid cells in the training data set were used to represent the presence of landslides and assigned the value of 1.
The remaining 315 landslide grid cells in the testing data set were used to validate the model result. In order to avoid the bias caused by unequal proportions of landslide and non-landslide pixels, the same number of grid cells was randomly sampled from the landslide-free area and assigned the value of 0. The final step was to extract the values of the ten independent variables to build a database. This database contains 1,368 observations, one dependent variable, seven categorical independent variables (lithology, aspect, land use, soil type, distance to roads, distance to rivers, and distance to faults), and three continuous independent variables (slope, curvature, and rainfall). Using this database, the logistic regression coefficients were estimated with the SPSS 16.0 software.

3.2.3 Multicollinearity checking

In logistic regression, it is necessary to check for multicollinearity, that is, for strong correlation among the independent variables. Tolerance (TOL) and the variance inflation factor (VIF) are two indexes that are widely used for this purpose. According to Menard (1995), a TOL value less than 0.2 is an indicator of multicollinearity, and serious multicollinearity occurs between independent variables when the TOL values are smaller than 0.1. The VIF is calculated as 1/TOL. A VIF value exceeding 10 is often regarded as indicating multicollinearity. The TOL and VIF values in this study are shown in Table 3. They reveal no multicollinearity between any of the factors.

3.2.4 Model results and assessment

Using the ten independent variables as the input factors and the presence/absence of landslide as the dependent variable, forward stepwise logistic regression was used to analyze the data. First, all independent variables were excluded from the model, then each variable

Table 3 The multicollinearity diagnosis indexes for variables

| Independent variables | Tolerance | VIF |
|---|---|---|
| Slope | 0.907 | 1.103 |
| Lithology | 0.889 | 1.124 |
| Rainfall | 0.871 | 1.148 |
| Soil type | 0.948 | 1.054 |
| Land use | 0.872 | 1.147 |
| Aspect | 0.944 | 1.059 |
| Distance to roads | 0.834 | 1.199 |
| Distance to rivers | 0.901 | 1.110 |
| Distance to faults | 0.921 | 1.086 |
| Curvature | 0.939 | 1.065 |

was sequentially added to the starting model. The independent variables were selected to be included in the regression model based on the descending order of the largest significant correlation coefficient. The process continues until no independent variable has significant contribution when entered into the model. The maximum likelihood estimation method (MLE) is used to calculate the model coefficients. The MLE method identifies the value(s) of the parameter(s) that give rise to the maximum log likelihood (LL) and is calculated through an iterative process. The difference in the -2 log likelihood (-2LL) measures to what extent the final model improves over the null model. It explains that the lower the value of -2LL of the model, the better is the fit of the model to the data. As shown in Table 4, the -2LL value decreased from 1286.860 at the first step to 566.748 at the final step. In addition, the Cox & Snell’s and Nagelkerke’s R-square are used to measure the usefulness (explained variance) of the model. A higher R-square value of Cox & Snell and Nagelkerke indicates a better model. Hence, the most significant model is found in the step 10 (Table 4). We consider a hypothesis testing for each coefficient. Thus, the null hypothesis for the logistic regression coefficient is H0: Bi = 0 versus Bi = 0. If Bi = 0, it will indicate that the independent variables have no effect on the dependent variable. The P-value, which is Table 4 Maximum likelihood estimation, Cox & Snell’s and Nagelkerke’s R-square

| Step | -2 LL | Cox & Snell’s R-square | Nagelkerke’s R-square |
|---|---|---|---|
| 1 | 1286.860 | 0.360 | 0.479 |
| 2 | 1066.961 | 0.455 | 0.606 |
| 3 | 915.987 | 0.512 | 0.682 |
| 4 | 814.405 | 0.547 | 0.729 |
| 5 | 726.980 | 0.575 | 0.766 |
| 6 | 672.843 | 0.591 | 0.788 |
| 7 | 641.663 | 0.600 | 0.801 |
| 8 | 614.092 | 0.608 | 0.811 |
| 9 | 575.146 | 0.619 | 0.826 |
| 10 | 566.748 | 0.622 | 0.829 |



the probability of obtaining a test statistic at least as extreme as the one observed if the null hypothesis is true, is calculated for each individual variable. The lower the P-value, the stronger the evidence against the null hypothesis. Table 5 lists the P-value of each logistic coefficient in the model. It shows that all the factors (slope gradient, slope aspect, rainfall, lithology, distance to faults, soil type, land use, distance to rivers, and distance to roads) have a P-value less than 0.05. This indicates a statistically significant relationship between these factors and landslide occurrence at the 95% confidence level. The linear sum of the constant and the products of the independent variables and their respective coefficients (as listed in Table 5) is given by the following equation:

$$
\begin{aligned}
Z = {} & -69.819 + 0.252\,\text{Slope} \\
& - 16.276\,\text{Lithology}(1) + 1.440\,\text{Lithology}(2) + 1.543\,\text{Lithology}(3) + 5.178\,\text{Lithology}(4) + 1.702\,\text{Lithology}(5) + 2.438\,\text{Lithology}(6) \\
& + 0.010\,\text{Rainfall} \\
& + 19.059\,\text{Soil}(1) + 19.795\,\text{Soil}(2) + 20.049\,\text{Soil}(3) + 17.330\,\text{Soil}(4) + 19.568\,\text{Soil}(5) + 16.565\,\text{Soil}(6) \\
& + 19.011\,\text{Soil}(7) - 1.645\,\text{Soil}(8) + 1.002\,\text{Soil}(9) \\
& + 21.898\,\text{Landuse}(1) + 19.157\,\text{Landuse}(2) + 19.427\,\text{Landuse}(3) + 21.745\,\text{Landuse}(4) + 19.920\,\text{Landuse}(5) + 21.916\,\text{Landuse}(6) \\
& + 18.098\,\text{Landuse}(7) + 18.914\,\text{Landuse}(8) + 20.946\,\text{Landuse}(9) + 20.619\,\text{Landuse}(10) + 7.125\,\text{Landuse}(11) \\
& + 2.174\,\text{Fault}(1) + 1.355\,\text{Fault}(2) + 2.336\,\text{Fault}(3) + 1.423\,\text{Fault}(4) + 0.553\,\text{Fault}(5) \\
& + 5.811\,\text{Road}(1) + 4.643\,\text{Road}(2) + 3.866\,\text{Road}(3) + 0.689\,\text{Road}(4) + 0.851\,\text{Road}(5) \\
& + 1.614\,\text{River}(1) + 0.417\,\text{River}(2) + 1.602\,\text{River}(3) + 0.481\,\text{River}(4) + 1.242\,\text{River}(5) \\
& - 2.051\,\text{Aspect}(1) - 1.305\,\text{Aspect}(2) - 2.810\,\text{Aspect}(3) + 1.858\,\text{Aspect}(4) + 0.877\,\text{Aspect}(5) + 0.132\,\text{Aspect}(6) \\
& + 0.707\,\text{Aspect}(7) + 2.402\,\text{Aspect}(8) \\
& - 1.194\,\text{Curvature}
\end{aligned}
$$

The probability of landslide occurrence was calculated using the above logistic regression coefficients. The probability ranges from 0.0001 to 0.999.
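The model statistics discussed in this subsection can be checked numerically. The sketch below is an illustration only, not the SPSS procedure used in the study. It assumes the standard definitions of the Cox & Snell and Nagelkerke R-square, the standard logistic transform p = 1/(1 + exp(-Z)), and a null model fitted to an exactly balanced sample of 1,368 cells, so that its -2LL equals 2n ln 2. The function names are hypothetical.

```python
import math


def cox_snell_r2(neg2ll_model, neg2ll_null, n):
    """Cox & Snell R-square from the -2LL of the fitted and null models."""
    return 1.0 - math.exp(-(neg2ll_null - neg2ll_model) / n)


def nagelkerke_r2(neg2ll_model, neg2ll_null, n):
    """Cox & Snell R-square rescaled by its maximum attainable value."""
    max_r2 = 1.0 - math.exp(-neg2ll_null / n)
    return cox_snell_r2(neg2ll_model, neg2ll_null, n) / max_r2


def odds_ratio(b):
    """Exp(B): multiplicative change in the odds per unit change of a variable."""
    return math.exp(b)


def landslide_probability(z):
    """Logistic transform of the linear predictor Z, safe for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


N_OBS = 1368  # observations in the database (from the text)
# Assumption: balanced sample, so the intercept-only model predicts p = 0.5.
NEG2LL_NULL = 2.0 * N_OBS * math.log(2.0)

# -2LL values of the first and last steps, taken from Table 4.
for step, neg2ll in [(1, 1286.860), (10, 566.748)]:
    cs = cox_snell_r2(neg2ll, NEG2LL_NULL, N_OBS)
    nk = nagelkerke_r2(neg2ll, NEG2LL_NULL, N_OBS)
    print(f"step {step}: Cox & Snell = {cs:.3f}, Nagelkerke = {nk:.3f}")

# Odds ratio for slope (B = 0.252 in Table 5).
print(f"Exp(B) for slope: {odds_ratio(0.252):.3f}")
```

Under the balanced-sample assumption, the computed R-square values for steps 1 and 10 agree with Table 4 to three decimals, and Exp(B) for slope agrees with Table 5 up to the rounding of B.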
The subsequent landslide susceptibility map was obtained based on the cumulative percentage of the observed landslide occurrence against the probability index values (Fig. 16). The classification method used was the manual classification that was explained in the statistical index method section. The probability map was divided into four susceptibility classes: low (0–0.37), moderate (0.37–0.70), high (0.70–0.92), and very high (0.92–1) (Fig. 17).

3.3 Validation and comparison

In landslide prediction modeling, validation of the result is considered to be one of the most important tasks; without validation, the prediction model has no scientific significance (Chung and Fabbri 2003). The landslide susceptibility maps were validated against the test subset of the landslide inventory (36 landslide areas with 315 grid cells) using the receiver operating characteristic (ROC) technique. The ROC curve is a statistical technique that can be used to assess prediction performance and to compare different models in terms of sensitivity and specificity. An ROC curve is a two-dimensional graph showing the true-positive rate on the vertical axis and the false-positive rate on the horizontal axis. The area under the ROC curve (AUC), which summarizes the information in the plot, can be used to estimate the validity of the model: accuracy or the


Table 5 Logistic regression coefficients

| Independent variables | B | df | P-value | Exp(B) (odds ratio) | 95% CI for Exp(B), lower | 95% CI for Exp(B), upper |
|---|---|---|---|---|---|---|
| Slope | 0.252 | 1 | 0.000 | 1.286 | 1.232 | 1.343 |
| Lithology | | 6 | 0.000 | | | |
| Lithology (1) | -16.276 | 1 | 0.999 | 0.000 | 0.000 | |
| Lithology (2) | 1.440 | 1 | 0.060 | 4.219 | 0.941 | 18.913 |
| Lithology (3) | 1.543 | 1 | 0.019 | 4.680 | 1.288 | 16.999 |
| Lithology (4) | 5.178 | 1 | 0.000 | 177.281 | 39.016 | 805.528 |
| Lithology (5) | 1.702 | 1 | 0.135 | 5.483 | 0.588 | 51.131 |
| Lithology (6) | 2.438 | 1 | 0.000 | 11.454 | 3.370 | 38.928 |
| Rainfall | 0.010 | 1 | 0.000 | 1.010 | 1.007 | 1.014 |
| Soil type | | 9 | 0.002 | | | |
| Soil (1) | 19.059 | 1 | 0.998 | 1.893E8 | 0.000 | |
| Soil (2) | 19.795 | 1 | 0.998 | 3.954E8 | 0.000 | |
| Soil (3) | 20.049 | 1 | 0.998 | 5.096E8 | 0.000 | |
| Soil (4) | 17.330 | 1 | 0.998 | 3.361E7 | 0.000 | |
| Soil (5) | 19.568 | 1 | 0.998 | 3.148E8 | 0.000 | |
| Soil (6) | 16.565 | 1 | 0.998 | 1.564E7 | 0.000 | |
| Soil (7) | 19.011 | 1 | 0.998 | 1.804E8 | 0.000 | |
| Soil (8) | -1.645 | 1 | 1.000 | 0.193 | 0.000 | |
| Soil (9) | 1.002 | 1 | 1.000 | 2.723 | 0.000 | |
| Land use | | 11 | 0.000 | | | |
| Land use (1) | 21.898 | 1 | 0.999 | 3.236E9 | 0.000 | |
| Land use (2) | 19.157 | 1 | 0.999 | 2.089E8 | 0.000 | |
| Land use (3) | 19.427 | 1 | 0.999 | 2.736E8 | 0.000 | |
| Land use (4) | 21.745 | 1 | 0.999 | 2.778E9 | 0.000 | |
| Land use (5) | 19.920 | 1 | 0.999 | 4.477E8 | 0.000 | |
| Land use (6) | 21.916 | 1 | 0.999 | 3.297E9 | 0.000 | |
| Land use (7) | 18.098 | 1 | 0.999 | 7.242E7 | 0.000 | |
| Land use (8) | 18.914 | 1 | 0.999 | 1.638E8 | 0.000 | |
| Land use (9) | 20.946 | 1 | 0.999 | 1.249E9 | 0.000 | |
| Land use (10) | 20.619 | 1 | 0.999 | 9.006E8 | 0.000 | |
| Land use (11) | 7.125 | 1 | 1.000 | 1.242E3 | 0.000 | |
| Aspect | | 8 | 0.000 | | | |
| Aspect (1) | -2.051 | 1 | 0.023 | 0.129 | 0.022 | 0.751 |
| Aspect (2) | -1.305 | 1 | 0.041 | 0.271 | 0.078 | 0.946 |
| Aspect (3) | -2.810 | 1 | 0.000 | 0.060 | 0.013 | 0.271 |
| Aspect (4) | 1.858 | 1 | 0.003 | 6.409 | 1.871 | 21.946 |
| Aspect (5) | 0.877 | 1 | 0.147 | 2.405 | 0.735 | 7.862 |
| Aspect (6) | 0.132 | 1 | 0.825 | 1.141 | 0.355 | 3.665 |
| Aspect (7) | 0.707 | 1 | 0.284 | 2.028 | 0.557 | 7.388 |
| Aspect (8) | 2.402 | 1 | 0.000 | 11.046 | 3.117 | 39.140 |


Table 5 continued

| Independent variables | B | df | P-value | Exp(B) (odds ratio) | 95% CI for Exp(B), lower | 95% CI for Exp(B), upper |
|---|---|---|---|---|---|---|
| Distance to roads | | 5 | 0.000 | | | |
| Road (1) | 5.811 | 1 | 0.000 | 334.034 | 138.403 | 806.191 |
| Road (2) | 4.643 | 1 | 0.000 | 103.872 | 36.925 | 292.199 |
| Road (3) | 3.866 | 1 | 0.000 | 47.754 | 18.345 | 124.310 |
| Road (4) | 0.689 | 1 | 0.229 | 1.992 | 0.648 | 6.125 |
| Road (5) | 0.851 | 1 | 0.198 | 2.343 | 0.641 | 8.564 |
| Distance to rivers | | 5 | 0.000 | | | |
| River (1) | 1.614 | 1 | 0.000 | 5.024 | 2.264 | 11.151 |
| River (2) | 0.417 | 1 | 0.309 | 1.517 | 0.679 | 3.387 |
| River (3) | 1.602 | 1 | 0.000 | 4.962 | 2.037 | 12.088 |
| River (4) | 0.481 | 1 | 0.274 | 1.617 | 0.684 | 3.824 |
| River (5) | 1.242 | 1 | 0.008 | 3.463 | 1.378 | 8.701 |
| Distance to faults | | 5 | 0.000 | | | |
| Fault (1) | 2.174 | 1 | 0.000 | 8.797 | 3.148 | 24.584 |
| Fault (2) | 1.355 | 1 | 0.012 | 3.878 | 1.351 | 11.132 |
| Fault (3) | 2.336 | 1 | 0.000 | 10.338 | 3.782 | 28.254 |
| Fault (4) | 1.423 | 1 | 0.011 | 4.148 | 1.391 | 12.370 |
| Fault (5) | 0.553 | 1 | 0.294 | 1.739 | 0.619 | 4.890 |
| Curvature | -1.194 | 1 | 0.004 | 0.303 | 0.133 | 0.688 |
| Constant | -69.819 | 1 | 0.997 | 0.000 | | |

Fig. 16 Cumulative percentage of observed landslide occurrence against LSI value


Fig. 17 Landslide susceptibility zonation map of Hoa Binh province based on the logistic regression method

overall quality of a model (Hosmer and Lemeshow 2000). The AUC, which varies from 0.5 to 1.0, is a portion of the area of the unit square. It measures the ability of a test to correctly classify pixels with and without landslides. If the AUC value is close to 1, the model accuracy is considered to be high. The main limitation of ROC curve analysis is that the data set must be divided into two separate groups, and in some cases a problem occurs if some data do not clearly fall into one group or the other. To avoid insignificant results and negative effects in the ROC curve analysis, the sample must be large enough and representative of the actual population. In this study, the ROC curves were plotted from the true-positive rate and the false-positive rate of identified landslides as the classification threshold varies. The AUC values of the statistical index and logistic regression models are 0.946 and 0.950, respectively. This indicates that both models have high and nearly equal prediction capabilities (Table 6; Fig. 18). Using the two methods of logistic regression and statistical index, two spatial landslide occurrence probability maps were computed. The map results were also verified using the two rules for spatially effective landslide susceptibility maps: first, the observed landslide pixels should belong to the high susceptibility class, and second, the high susceptibility class should cover only small areas (Bai et al. 2010; Can et al. 2005).
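The ROC construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the software used in the study: the function names and the toy scores are hypothetical, labels are 1 for landslide cells and 0 for landslide-free cells, and the area is obtained with the trapezoidal rule.

```python
def roc_points(scores, labels):
    """(false-positive rate, true-positive rate) pairs as the threshold is lowered."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Cells tied at the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical susceptibility values for six validation cells.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = auc_trapezoid(roc_points(toy_scores, toy_labels))
print(f"AUC = {toy_auc:.3f}")
```

In the toy data, eight of the nine landslide/non-landslide pairs are ranked correctly, so the AUC is 8/9. This equals the probability that a randomly chosen landslide cell receives a higher score than a randomly chosen landslide-free cell.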

Table 6 Comparison of the two ROC curves

| Models | AUC | SE | 95% CI |
|---|---|---|---|
| Logistic regression | 0.950 | 0.00851 | 0.930–0.966 |
| Statistical index | 0.942 | 0.0105 | 0.921–0.959 |

Fig. 18 The receiver operating characteristic (ROC) curve. Vertical axis: sensitivity (0–100); horizontal axis: 100-specificity (0–100). Legend: Logistic Regression, AUC=0.950; Statistical Index, AUC=0.942

Table 7 The four susceptibility classes of the statistical index model

| Class number | Reclassified landslide susceptibility index value | Susceptibility class | Total number of pixels | % Area coverage | Number of pixels with landslides |
|---|---|---|---|---|---|
| 1 | -23 to -0.89 | Low | 9,609,513 | 83.9 | 69 |
| 2 | -0.89 to 1.03 | Moderate | 1,472,775 | 12.9 | 130 |
| 3 | 1.03 to 2.40 | High | 286,166 | 2.5 | 284 |
| 4 | 2.40 to 5.85 | Very high | 121,852 | 1.1 | 516 |

For this study, all of the landslide pixels were overlaid on the two landslide susceptibility maps, and the numbers of existing landslide pixels falling into each of the four susceptibility classes were determined. Tables 7 and 8 show the characteristics of the two models. In total, 80.1% (statistical index method) and 78.8% (logistic regression) of the observed landslide pixels are in areas classified as having high or very high susceptibility. In contrast, 83.9% and 87.5% of the study area fell in the low susceptibility class for the statistical index and for the logistic regression, respectively.
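The percentages quoted above follow directly from the landslide pixel counts in Tables 7 and 8. The short sketch below reproduces that arithmetic; the dictionary layout and function name are hypothetical, and the counts are copied from the two tables.

```python
# Landslide pixel counts per susceptibility class, copied from Tables 7 and 8.
LANDSLIDE_PIXELS = {
    "statistical index": {"low": 69, "moderate": 130, "high": 284, "very high": 516},
    "logistic regression": {"low": 76, "moderate": 136, "high": 273, "very high": 514},
}


def share_in_high_classes(counts):
    """Fraction of landslide pixels that fall in the high or very high class."""
    total = sum(counts.values())
    return (counts["high"] + counts["very high"]) / total


for model, counts in LANDSLIDE_PIXELS.items():
    share = 100.0 * share_in_high_classes(counts)
    print(f"{model}: {share:.1f}% of landslide pixels in high or very high classes")
```

Both models place 999 landslide pixels in total, of which 800 (statistical index) and 787 (logistic regression) fall in the two highest classes, giving 80.1% and 78.8%.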

4 Discussion and conclusion

Landslide susceptibility maps provide fundamental knowledge of the cause and incidence of landsliding and can help in hazard management and to set up the foundation for

Table 8 The four susceptibility classes of the logistic regression model

| Class number | Reclassified landslide susceptibility index value | Susceptibility class | Total number of pixels | % Area coverage | Number of pixels with landslides |
|---|---|---|---|---|---|
| 1 | 0–0.37 | Low | 10,053,632 | 87.5 | 76 |
| 2 | 0.37–0.70 | Moderate | 668,560 | 5.8 | 136 |
| 3 | 0.70–0.92 | High | 454,002 | 4.0 | 273 |
| 4 | 0.92–1 | Very high | 314,098 | 2.7 | 514 |

mitigation measures. Based on this idea, we present the results of landslide susceptibility mapping in the Hoa Binh province, Vietnam. These maps present only a predicted spatial distribution of landslides; they do not include the temporal probability of landsliding events. In landslide analysis generally, a reliable, accurate, and sufficient landslide inventory map is the basis for high-quality landslide models. However, until recent years, there has been no significant attempt to investigate and create a landslide inventory database in Vietnam. Landslide events were investigated only in some separate projects conducted during the last 15 years. In this study, the landslide inventory map was mainly inherited from three projects recording the landslides that occurred during the last 10 years. A total of 118 landslide events across the Hoa Binh province were recorded and mapped. These events mainly occurred near the road system or near populated areas, which means that many landslides in mountainous areas far from roads and populated areas have not been investigated. Almost all of the mapped landslides occurred following heavy rainfall, especially when combined with tropical storms. To obtain a good-quality landslide model, the selection of landslide causal factors is important. However, there are no universally agreed guidelines for selecting the landslide influencing factors. For this study, we considered 10 factors for the analysis (slope, lithology, rainfall, land use, soil type, aspect, distance to roads, distance to rivers, distance to faults, and curvature). There are many available methods and techniques for landslide susceptibility modeling. In this study, the statistical index and logistic regression approaches were applied to generate landslide susceptibility maps.
The statistical index method overlays the existing landslides in the training data set on each landslide causal map separately to compute the weight of each map class. The final landslide susceptibility map was a simple combination of all the weight maps. The logistic regression method computes weights based on the combination of the significant landslide causal factors to construct a susceptibility map. ROC curve analysis and the AUC show that both maps are of good quality. The AUCs were 0.946 and 0.950 for the statistical index and the logistic regression model, respectively, which indicates that the two models have almost equal prediction capabilities. For the logistic regression, the model statistics and regression coefficients are usually used to assess accuracy and the relative importance of the landslide causal factors in the model. The results show that distance to roads, slope, and lithology are the most important factors. They are followed by aspect, land use, rainfall, distance to faults, distance to rivers, soil type, and curvature. The landslide susceptibility analysis conducted for a relatively large area (about 4,660 sq km) requires a large number of


landslide pixels. Therefore, the accuracy of the logistic regression model can be improved if additional landslides are included in the analysis. This work requires a long period of time as well as funding and human resources. A regional-scale map of the study area with detailed data on the geologic properties, soil type, and weathering layer was not available. Therefore, the statistical index method may be considered preferable. The application of this method has been shown to be relatively simple and cost-effective for assessing landslide susceptibility. Landslide susceptibility mapping has been shown to be of great help to planners and engineers in choosing suitable locations for the implementation of developments (Lee and Min 2001). The landslide susceptibility maps of this study may be helpful for planners, decision makers, and engineers in slope management and land use planning. Since the results are given in a medium-scale map, determining the exact extent of the slope instability areas demands further site-specific research. This will clarify the details of the high and very high susceptibility classes.

Acknowledgments

This research was funded by the Norwegian Quota scholarship. The data analysis and write-up were carried out as a part of the first author’s PhD studies at the Geomatics section, Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, Norway. I would like to thank Dr. Tran Tan Van, director of Vietnam Institute of Geosciences and Mineral Resources, for valuable comments.

Appendix

1. Four tiles of the Geological and Mineral Resources Map of Vietnam at the scale of 1:200,000 are: (1) the Hanoi F-48-XXVII; (2) the Ninh Binh F-48-XXXIV; (3) the Van Yen F-48-XXVII; (4) the Sam Nua F-48-XXXIII.

2. Characteristics of lithology groups, which were used in this study

Group 1: Quaternary deposits: Primarily distributed in plains and river valleys, characterized by incoherent textures, diversified components, abundant material size, and essential alluvial facies.

Group 2: Sedimentary aluminosilicate rocks and sedimentary quartz rocks: Consisting of pebbles, cobble, gravel, gritstone, sandstone, siltstone, claystone, carbonates, alternated rhyolites, dacites, andesite sediments, and tuff. Sedimentary quartz rocks consist of quartz–mica sandstone, quartzitic sandstone, and cherty shale.

Group 3: Sedimentary carbonate rocks: Consisting of limestone, dolomitized limestone, cherty limestone, and clayish limestone.

Group 4: Mafic–ultramafic magma rocks: Consisting of dunite, peridotite, pyroxenite, tremolite schist, actinolite schist, gabbro–pyroxenite, gabbro–amphibolite, gabbro–norite, gabbro–anorthosite, gabbro–diorite, gabbro–diabase, diabase, mafic olivine basalt, tholeiite basalt, and dolerite basalt.

Group 5: Acid–neutral magmatic rocks: The extrusive magmas consist of rhyolite, dacite, felsite, and andesite rocks. The intrusive granite magmas consist of plagioclase-granite, granophyre, granosyenite, granodiorite, diorite, and quartz-diorite.

Group 6: Metamorphic rock with rich aluminosilicate component: The high-rank metamorphic rocks consist of biotite–garnet-gneiss, amphibole-biotite-plagiogneiss, biotite-amphibolite, plagioclase-migmatite, quartz-biotite schist, biotite schist… The low-rank metamorphic rocks consist of green schist, chlorite schist, sericite schist, and quartz-sericite schist.

Group 7: Metamorphic rock with rich quartz component: Consists of quartz-mica schist, quartz-sericite schist, quartzite, and sericite-quartzite.

References

Atkinson PM, Massari R (1998a) Generalised linear modelling of susceptibility to landsliding in the central Apennines, Italy. Comput Geosci 24(4):373–385
Atkinson PM, Massari R (1998b) Generalised linear modelling of susceptibility to landsliding in the central Apennines, Italy. Comput Geosci 24:373–385
Ayalew L, Yamagishi H (2005) The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda-Yahiko Mountains, Central Japan. Geomorphology 65(1–2):15–31. doi:10.1016/j.geomorph.2004.06.010
Bai S, Wang J, Lu G, Zhou P, Hou S, Xu S (2010) GIS-based logistic regression for landslide susceptibility mapping of the Zhongxian segment in the Three Gorges area, China. Geomorphology 115:23–31
Bednarik M, Magulova B, Matys M, Marschalko M (2010) Landslide susceptibility assessment of the Kralovany-Liptovsky Mikulas railway case study. Phys Chem Earth 35(3–5):162–171. doi:10.1016/j.pce.2009.12.002
Brenning A (2005) Spatial prediction models for landslide hazards: review, comparison and evaluation. Nat Hazards Earth Syst Sci 5(6):853–862
Can T, Nefeslioglu HA, Gokceoglu C, Sonmez H, Duman TY (2005) Susceptibility assessments of shallow earthflows triggered by heavy rainfall at three subcatchments by logistic regression analyses. Geomorphology 72:250–271
Cevik E, Topal T (2003) GIS-based landslide susceptibility mapping for a problematic segment of the natural gas pipeline, Hendek (Turkey). Environ Geol 44(8):949–962. doi:10.1007/s00254-003-0838-6
Chung CJF, Fabbri AG (2003) Validation of spatial prediction models for landslide hazard mapping. Nat Hazards 30(3):451–472
Dai FC, Lee CF (2002) Landslide characteristics and slope instability modeling using GIS, Lantau Island, Hong Kong. Geomorphology 42(3–4):213–228
Dai FC, Lee CF, Li J, Xu JW (2001) Assessment of landslide susceptibility on the natural terrain of Lantau Island, Hong Kong. Environ Geol 40:381–391
Donati L, Turrini MC (2002) An objective method to rank the importance of the factors predisposing to landslides with the GIS methodology: application to an area of the Apennines (Valnerina; Perugia, Italy). Eng Geol 63(3–4):277–289
Falaschi F, Giacomelli F, Federici PR, Puccinelli A, Avanzi GD, Pochini A, Ribolini A (2009) Logistic regression versus artificial neural networks: landslide susceptibility evaluation in a sample area of the Serchio River valley, Italy. Nat Hazards 50(3):551–569. doi:10.1007/s11069-009-9356-5
Galang JS (2004) A comparison of GIS approaches instability zonation in the central Blue Ridge mountain of Virginia. Master Thesis, State University, Blacksburg
Gokceoglu C, Aksoy H (1996) Landslide susceptibility mapping of the slopes in the residual soils of the Mengen region (Turkey) by deterministic stability analyses and image processing techniques. Eng Geol 44(1–4):147–161
Guzzetti F, Carrara A, Cardinali M, Reichenbach P (1999) Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, Central Italy. Geomorphology 31(1–4):181–216
Hosmer DW, Lemeshow S (2000) Applied logistic regression, 2nd edn. Wiley, New York
Hue TT, Duong TV, Toan DV, Nghinh LT, Minh VC, Pho NV, Xuan PT, Hoan LT, Huyen NX, Pha PD, Chinh VV, Thom BV (2004) Investigation and assessment of the types of geological hazard in the territory of Vietnam and recommendation of remedial measures. Phase II: a study of the Northern mountainous province. Vietnam Academy of Science and Technology, Institute of Geological Sciences, Hanoi
Jade S, Sarkar S (1993) Statistical models for slope instability classification. Eng Geol 36(1–2):91–98
Lee S (2005) Application of logistic regression model and its validation for landslide susceptibility mapping using GIS and remote sensing data journals. Int J Remote Sens 26(7):1477–1491. doi:10.1080/01431160412331331012
Lee S, Min K (2001) Statistical analysis of landslide susceptibility at Yongin, Korea. Environ Geol 40(9):1095–1113


Lee S, Ryu JH, Kim IS (2007) Landslide susceptibility analysis and its verification using likelihood ratio, logistic regression, and artificial neural network models: case study of Youngin, Korea. Landslides 4(4):327–338. doi:10.1007/s10346-007-0088-x
Long NT (2008) Landslide susceptibility mapping of the mountainous area in A Luoi district, Thua Thien Hue province, Vietnam. PhD Thesis, Vrije University Brussel, Brussel
Magliulo P, Di Lisio A, Russo F, Zelano A (2008) Geomorphology and landslide susceptibility assessment using GIS and bivariate statistics: a case study in southern Italy. Nat Hazards 47(3):411–435. doi:10.1007/s11069-008-9230-x
Menard SW (1995) Applied logistic regression analysis. SAGE, Thousand Oaks
My NQ (2007) Construction of the environmental hazard zonation map for northwest territory of Vietnam. Vietnam Geography Association, Hanoi
Nandi A, Shakoor A (2008) Application of logistic regression model for slope instability prediction in Cuyahoga River Watershed, Ohio, USA. Georisk 1:12
Ohlmacher GC (2007) Plan curvature and landslide probability in regions dominated by earth flows and earth slides. Eng Geol 91(2–4):117–134. doi:10.1016/j.enggeo.2007.01.005
Ohlmacher GC, Davis JC (2003) Using multiple logistic regression and GIS technology to predict landslide hazard in northeast Kansas, USA. Eng Geol 69(3–4):331–343. doi:10.1016/s0013-7952(03)00069-3
Oztekin B, Topal T (2005) GIS-based detachment susceptibility analyses of a cut slope in limestone, Ankara-Turkey. Environ Geol 49(1):124–132. doi:10.1007/s00254-005-0071-6
Suzen ML, Doyuran V (2004) A comparison of the GIS based landslide susceptibility assessment methods: multivariate versus bivariate. Environ Geol 45(5):665–679. doi:10.1007/s00254-003-0917-8
Thach NN, Xuan NT, My NQ, Quynh PV, Minh ND, Hoa DB, Bao DV, Dan NV, Thuy TV, Hien NT (2002) Application of remote sensing and geographical information system (GIS) for research and forecast of natural hazard in Hoa Binh province. National University Hanoi, Hanoi
Thinh DV, Dong NP, Hong PM, Hung PV, Khoi TN, Ke TD, Phu DV, Thang PX, Thanh PV, Thang PH, Thay BV, Thinh NT, Thien TV, Tu MT, Vinh BX (2005) The investigated report of natural hazard in the Northwest of Vietnam. Northern Geological Mapping Division, Hanoi
Van Den Eeckhaut M, Marre A, Poesen J (2006) Comparison of two landslide susceptibility assessments in the Champagne-Ardenne region (France). Geomorphology 115(1–2):141–155. doi:10.1016/j.geomorph.2009.09.042
Van Westen CJ (1997) Statistical landslide hazard analysis. ILWIS 2.1 for Windows Application guide. ITC Publication, Enschede
Van Westen CJ, Rengers N, Terlien MTJ, Soeters R (1997) Prediction of the occurrence of slope instability phenomena through GIS-based hazard zonation. Geologische Rundschau 86(2):404–414
Van TT, Tuy PK, Giap NX, Ke TD, Thai TN, Giang NT, Tho HM, Tuat LT, San DN, Hung LQ, Chung HT, Hoan NT (2002) Assessment and prediction of geological hazards in the 8 coastal provinces of central Vietnam from Quang Binh to Phu Yen: current status, causes, prediction and recommendation of remedial measures. Vietnam Institute of Geoscience and Mineral Resources, Hanoi
Van TT, Anh DT, Hieu HH, Giap NX, Ke TD, Nam TD, Ngoc D, Ngoc DTY, Thai TN, Thang DV, Tinh NV, Tuat LT, Tung NT, Tuy PK, Viet HA (2006) Investigation and assessment of the current status and potential of landslide in some sections of the Ho Chi Minh Road, National Road 1A and proposed remedial measures to prevent landslide from threat of safety of people, property, and infrastructure. Vietnam Institute of Geoscience and Mineral Resources, Hanoi
Varnes DJ (1984) Landslide hazard zonation: a review of principles and practice. UNESCO, Paris
Wang HB, Liu GJ, Xu WY, Wang GH (2005) GIS-based landslide hazard assessment: an overview. Prog Phys Geogr 29(4):548–567. doi:10.1191/0309133305pp462ra
Wieczorek GF (1984) Preparing a detailed landslide-inventory map for hazard evaluation and reduction. Bull As Eng Geol 21(3):337–342
Yalcin A (2008) GIS-based landslide susceptibility mapping using analytical hierarchy process and bivariate statistics in Ardesen (Turkey): comparisons of results and confirmations. Catena 72(1):1–12. doi:10.1016/j.catena.2007.01.003
Yesilnacar E, Topal T (2005) Landslide susceptibility mapping: a comparison of logistic regression and neural networks methods in a medium scale study, Hendek region (Turkey). Eng Geol 79(3–4):251–266. doi:10.1016/j.enggeo.2005.02.002
Yilmaz I (2009a) A case study from Koyulhisar (Sivas-Turkey) for landslide susceptibility mapping by artificial neural networks. Bull Eng Geol Environ 68(3):297–306. doi:10.1007/s10064-009-0185-2
Yilmaz I (2009b) Landslide susceptibility mapping using frequency ratio, logistic regression, artificial neural networks and their comparison: a case study from Kat landslides (Tokat-Turkey). Comput Geosci 35(6):1125–1138. doi:10.1016/j.cageo.2008.08.007


Yilmaz I (2010) The effect of the sampling strategies on the landslide susceptibility mapping by conditional probability and artificial neural networks. Environ Earth Sci 60(3):505–519. doi:10.1007/s12665-009-0191-5
Zhou G, Esaki T, Mitani Y, Xie M, Mori J (2003) Spatial probabilistic modeling of slope failure using an integrated GIS Monte Carlo simulation approach. Eng Geol 68(3–4):373–386


Paper II Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., Dick, O.B., 2011. Landslide susceptibility mapping at Hoa Binh province (Vietnam) using an adaptive neuro-fuzzy inference system and GIS. Computers & Geosciences, 45, 199-211.


Computers & Geosciences 45 (2012) 199–211


Landslide susceptibility mapping at Hoa Binh province (Vietnam) using an adaptive neuro-fuzzy inference system and GIS

Dieu Tien Bui a,b,*, Biswajeet Pradhan c, Owe Lofman a, Inge Revhaug a, Oystein B. Dick a

a Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, P.O. Box 5003-IMT, N-1432 Aas, Norway
b Faculty of Surveying and Mapping, Hanoi University of Mining and Geology, Dong Ngac, Tu Liem, Hanoi, Vietnam
c Institute of Advanced Technology, Spatial and Numerical Modelling Laboratory, University Putra Malaysia, Serdang, Selangor Darul Ehsan 43400, Malaysia

Article info

Article history: Received 25 July 2011; received in revised form 6 October 2011; accepted 7 October 2011; available online 12 November 2011

Keywords: Adaptive neuro-fuzzy inference system (ANFIS); Landslide susceptibility; GIS; Hoa Binh province; Vietnam

Abstract

The objective of this study is to investigate a potential application of the Adaptive Neuro-Fuzzy Inference System (ANFIS) and the Geographic Information System (GIS) as a relatively new approach for landslide susceptibility mapping in the Hoa Binh province of Vietnam. Firstly, a landslide inventory map with a total of 118 landslide locations was constructed from various sources. Then the landslide inventory was randomly split into a training dataset of 70% (82 landslide locations) for training the models, and the remaining 30% (36 landslide locations) was used for validation purposes. Ten landslide conditioning factors (slope, aspect, curvature, lithology, land use, soil type, rainfall, distance to roads, distance to rivers, and distance to faults) were considered in the analysis. The hybrid learning algorithm and six different membership functions (Gaussmf, Gauss2mf, Gbellmf, Sigmf, Dsigmf, Psigmf) were applied to generate the landslide susceptibility maps. The validation dataset, which was not considered in the ANFIS modeling process, was used to validate the landslide susceptibility maps using the prediction rate method. The validation results showed that the area under the curve (AUC) for the six ANFIS models varies from 0.739 to 0.848. This indicates that the prediction capability depends on the membership functions used in the ANFIS. The models with Sigmf (0.848) and Gaussmf (0.825) have shown the highest prediction capability. The results of this study show that landslide susceptibility mapping in the Hoa Binh province of Vietnam using the ANFIS approach is viable. As far as the performance of the ANFIS approach is concerned, the results appeared to be quite satisfactory, the zones determined on the map being zones of relative susceptibility. © 2011 Elsevier Ltd. All rights reserved.

1. Introduction

In recent years, the occurrence of natural disasters in Vietnam has increased significantly, mostly due to the effects of climate change. The northwest mountainous area of Vietnam is one of the regions heavily affected by landslide activity and flooding events. In most cases, landslides occurred following heavy rainfall, especially during tropical rainstorms. However, to date, little effort has been made to assess or forecast these events. Through scientific analysis of landslides, geoscientists and civil engineers can assess and delineate landslide-susceptible areas, offering the potential for a decrease in landslide damage through proper slope management (Pradhan, 2011). Research on landslide susceptibility mapping for the Northwest area, including the Hoa Binh province, is therefore an urgent task in Vietnam. The result of landslide research may provide valuable information that helps to forecast such events as

* Corresponding author at: Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, P.O. Box 5003-IMT, N-1432 Aas, Norway. Tel.: +47 64965424. E-mail addresses: [email protected], [email protected] (D. Tien Bui).

0098-3004/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.cageo.2011.10.031

well as to find measures to mitigate subsequent losses from future landslides. Many techniques and methods have been proposed and developed in the landslide literature. Basically, they can be divided into direct and indirect methods (Guzzetti et al., 1999). Direct methods include surveying landslides with total stations, global navigation satellite systems, or other techniques (Van Westen et al., 2003). Through direct methods, using the experience and knowledge of experts, the degree of susceptibility can be directly determined. However, such methods require extensive fieldwork, so they are time-consuming and not cost-effective. In the indirect methods, by contrast, a landslide inventory map is usually used in conjunction with the landslide conditioning factors (Van Westen et al., 2003). Generally, the landslide susceptibility levels are determined based on the correlation between the existing landslide inventory and the conditioning factors. However, the accuracy of landslide susceptibility maps depends on the accuracy with which the different conditioning factors and the landslide inventory are mapped. In recent years, some newer approaches for landslide analysis have been applied, such as artificial neural networks (Ercanoglu, 2005; Ermini et al., 2005; Lee et al., 2003, 2004; Nefeslioglu et al.,


D. Tien Bui et al. / Computers & Geosciences 45 (2012) 199–211

2008; Pradhan and Lee, 2010a, c; Pradhan et al., 2010a), fuzzy logic (Akgun et al., 2011; Ercanoglu and Gokceoglu, 2002, 2004; Kanungo et al., 2008; Pradhan, 2010a), decision tree (Nefeslioglu et al., 2010), and neuro-fuzzy (Kanungo et al., 2006; Oh and Pradhan, 2011; Pradhan et al., 2010b; Sezer et al., 2011; Vahidnia et al., 2010). Generally, these approaches give rise to qualitative and quantitative maps of the landslide hazard areas, and the spatial results are appealing (Pradhan, 2010b).

In the case of neuro-fuzzy, which is a combination of fuzzy logic and neural networks, Kanungo et al. (2006) integrated the layer weights obtained from a trained neural network with the ratings obtained from fuzzy logic to obtain a landslide susceptibility index. The membership degrees of each layer class were determined from the relationship of existing landslides with the classes. Vahidnia et al. (2010) used the output of a fuzzy inference system as the target for a neural network. Expert knowledge undoubtedly played an important role in the accuracy of these results, and the resulting subjectivity is not easy to eliminate.

Another combined method, developed by Jang (1993), is the Adaptive Neuro-Fuzzy Inference System (ANFIS). This method, which uses the Takagi–Sugeno rule format, combines optimized premise membership functions (gradient descent) with optimized consequent equations (linear least squares estimator). Based on a given input and target, ANFIS can construct a fuzzy inference system whose membership function parameters are adjusted using the hybrid learning algorithm. ANFIS has been widely used in modeling complex systems (Soyguder and Alli, 2009). However, its application in landslide studies is still limited to very few cases. Pradhan et al. (2010b) used eight landslide conditioning factors in an ANFIS model for landslide susceptibility mapping in a study area in Malaysia. Their results indicated a very high prediction accuracy of 97%. Oh and Pradhan (2011) and Sezer et al. (2011) have also used ANFIS-based models for landslide susceptibility mapping in different parts of Malaysia. However, a disadvantage of these approaches is that it is difficult to determine objectively the epoch at which the landslide model starts over-fitting in the training phase. It has been suggested that expert opinion be used to determine the number of membership functions of the inputs, the physical meanings of the inputs, and the number of training epochs needed to prevent over-learning. Oh and Pradhan (2011) also asserted that more case studies should be conducted in order to check the performance of the neuro-fuzzy model in landslide susceptibility mapping and to allow the method to be applied more generally.

The main difference between this study and the aforementioned literature is that ANFIS was applied to landslide susceptibility assessment in the Hoa Binh province of Vietnam using the subtractive clustering method proposed by Chiu (1994). This is an automated, data-driven method for constructing the primary fuzzy models. Its main advantages are that it can process a large number of input observations and that it avoids the explosion of the rule base (Eftekhari and Katebi, 2008). In addition, the method proposed by Jang et al. (1997) was used to control over-fitting of the landslide model by testing the fuzzy inference system (FIS) trained on the training data against the checking data. Six ANFIS models were constructed using six different membership functions. Finally, the models were compared to find the most suitable one for the study area.

2. Study area

The Hoa Binh province is situated in the monsoonal region in the northwest of Vietnam. It covers an area of about 4660 km² (Fig. 1), between longitudes 104°48′E and 105°50′E and between latitudes 20°17′N and 21°08′N. The altitude decreases from

Fig. 1. Landslide inventory of the study area.

Northwest to Southeast and varies in the range from 0 to 1510 m. This province is a "transition" area situated between the Northwest Mountains and the Red River delta. The climate of this region is characterized by high temperatures and high humidity, with distinct rainy and dry seasons. The rainy season lasts mainly from May to October and accounts for 84–90% of the yearly rainfall. Rainfall frequency and intensity are highest during August and September, with peaks of 300–400 mm per month.

3. Data used

Since landslide occurrences in the past and present are keys to future spatial prediction (Guzzetti et al., 1999), a landslide inventory map is a prerequisite for such a study. The landslide inventory map of the study area was compiled by inheriting the landslide locations from three projects: (1) the landslide inventory map in the Northern mountainous province (Hue et al., 2004); (2) the landslide inventory map 2005 (Thinh et al., 2005); and (3) the landslide inventory map 2007 (My, 2007). Four recent landslide locations were identified on SPOT satellite imagery (5 August 2009) with a spatial resolution of 2.5 m and other supplementary information. Fieldwork was conducted to verify all the recent landslide locations. A total of 118 landslides that occurred during the last ten years were identified and registered in the landslide inventory map (Fig. 1). The smallest landslide covers about 383 m², the largest 14,343 m², and the average landslide size is 3443 m².

Bui et al. (2011) examined the correlations between landslide occurrence and ten landslide conditioning factors for the Hoa Binh province of Vietnam: slope, aspect, curvature, lithology, land use, soil type, rainfall, distance to roads, distance to rivers, and distance to faults. Based on those findings, we selected these ten factors for landslide modeling in this study.

A Digital Elevation Model (DEM) with a resolution of 20 m was generated from the national topographic maps at 1:25,000 scale, which have a contour interval of 10 m. The slope, aspect, and curvature layers were extracted from the DEM. The slope map was grouped into six classes: 0°–10°, 10°–20°, 20°–30°, 30°–40°, 40°–50°, and >50° (Fig. 2a). The aspect map was prepared with ten conventional classes (Fig. 2b). The slope curvature map was compiled with three categories: convex, concave, and flat (Fig. 2c).


The lithology and tectonic faults were extracted from the Geological and Mineral Resources Map of Vietnam at the scale of 1:200,000. This is the only geological map available for the study area. Based on the criteria of material components (clay composition), degree of weathering, and estimated strength and density, the lithology map (Fig. 2d) was classified into seven


groups (Van et al., 2006, 2002): quaternary deposits; sedimentary aluminosilicate rocks and sedimentary quartz rocks; sedimentary carbonate rocks; mafic–ultramafic magma rocks; acid-neutral magmatic rocks; metamorphic rock with rich aluminosilicate component; metamorphic rock with rich quartz component (more details are presented in the Appendix). Subsequently,

Fig. 2. Ten input landslide conditioning factors used for ANFIS: (a) slope; (b) aspect; (c) curvature; (d) lithology; (e) distance to faults; (f) soil type; (g) land use; (h) distance to roads; (i) distance to rivers; (k) rainfall.




a distance to faults map with six classes was constructed based on the geologic faults (Fig. 2e). The soil type map was compiled from the National Pedology Map (NPM) at a scale of 1:100,000. The 27 original data layers of the NPM were generalized into 13 data categories (Fig. 2f). This is the only soil map available for the study area. The land use map (Fig. 2g) was compiled from the Hoa Binh Land Use Status Map at a scale of 1:50,000. Using expert opinion, the 53 original land use types were grouped into twelve data categories. The roads and rivers were extracted from the national topographic map at a scale of 1:50,000 and were used to construct the distance to roads map (Fig. 2h) and the distance to rivers map (Fig. 2i), respectively. An average rainy season precipitation index, recorded from 1973 to 2002, was used to construct the rainfall map (Fig. 2k) using the Inverse Distance Weighted method. The precipitation data were extracted from the database of the Institute of Meteorology and Hydrology, Vietnam.
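The Inverse Distance Weighted interpolation mentioned for the rainfall map can be illustrated with a minimal sketch. The station coordinates, rainfall values, and the distance exponent of 2 are hypothetical assumptions chosen for illustration; the source does not state which settings were used.

```python
import math

def idw_interpolate(stations, values, query, power=2.0):
    """Inverse Distance Weighted estimate at one query point (x, y).

    `power` is the distance exponent; 2.0 is a common default and an
    assumption here, not a value taken from the study.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (sx, sy), value in zip(stations, values):
        distance = math.hypot(query[0] - sx, query[1] - sy)
        if distance == 0.0:
            return value  # the query point coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical stations (coordinates in km) and rainy-season rainfall (mm).
stations = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
rainfall_mm = [1300.0, 1500.0, 1700.0]

at_station = idw_interpolate(stations, rainfall_mm, (0.0, 0.0))
at_centre = idw_interpolate(stations, rainfall_mm, (5.0, 5.0))
```

In this toy layout the point (5, 5) is equidistant from all three stations, so its estimate is simply the mean of the three station values.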

4. Landslide susceptibility mapping using ANFIS

4.1. Adaptive neuro-fuzzy inference system

4.1.1. Fuzzy inference system

Fuzzy inference is the process of using fuzzy logic to formulate a non-linear mapping from input data to an output. A fuzzy inference system (FIS) consists of three components: (1) a rule base, which contains a selection of fuzzy rules; (2) a database, which defines the membership functions used in the fuzzy rules; and (3) a reasoning mechanism that performs the inference procedure on the rules and given facts (Alakhras, 2005; Jang et al., 1997; Kumanan et al., 2008). Three types of FIS are most frequently used in fuzzy logic applications: (1) the Mamdani model; (2) the Takagi and Sugeno model (TKS); and (3) the Tsukamoto model. They differ mainly in how the consequent parts of the rules are determined. In this study, the Takagi and Sugeno fuzzy inference system was used.

4.1.2. Structure of ANFIS

ANFIS is a multi-layer feed-forward structure formed by a number of nodes connected through directional links. ANFIS possesses the advantages of both fuzzy logic and artificial neural networks: it brings the low-level learning and computational power of artificial neural networks to fuzzy systems and, conversely, the high-level if–then rule reasoning of fuzzy systems to artificial neural networks (Keskin et al., 2006). Fig. 3 shows the typical structure of ANFIS with two input variables x, y and one output f. For the first-order Sugeno fuzzy model, two fuzzy if–then rules were employed (Takagi and Sugeno, 1985):

$$\text{Rule 1: If } x \text{ is } A_1 \text{ and } y \text{ is } B_1 \text{ then } f_1 = p_1 x + q_1 y + r_1 \tag{1}$$

$$\text{Rule 2: If } x \text{ is } A_2 \text{ and } y \text{ is } B_2 \text{ then } f_2 = p_2 x + q_2 y + r_2 \tag{2}$$



where $A_1$, $A_2$, $B_1$, $B_2$ are the membership functions for inputs x and y, and $p_1$, $q_1$, $r_1$, $p_2$, $q_2$, $r_2$ are the parameters of the output function. The fuzzy reasoning for the model is illustrated in Fig. 3.

Fig. 3. The first order Sugeno fuzzy model (a) and a typical ANFIS network architecture (b).

The ANFIS architecture consists of five layers, and each layer is formed by several nodes and node functions. There are two types of nodes: adaptive nodes and fixed nodes. Adaptive nodes are marked by squares and have adjustable parameter sets. Fixed nodes are marked by circles, and their parameter sets are fixed in the system.

Layer 1: All nodes i in this layer are adaptive nodes:

$$O_{1,i} = \mu_{A_i}(x) \tag{3}$$

$$O_{1,i} = \mu_{B_i}(y) \tag{4}$$

for i = 1, 2, where x and y are the inputs, $A_i$ and $B_i$ are the linguistic labels, and $\mu_{A_i}(x)$ and $\mu_{B_i}(y)$ are the membership functions.

Layer 2: Every node of this layer is a fixed node, labeled $\Pi$ and marked by a circle. The output of each node is the product of all the incoming signals:

$$O_{2,i} = \omega_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2 \tag{5}$$

The output $\omega_i$ represents the firing strength of a rule.

Layer 3: Every node of this layer is a fixed node, marked by a circle and labeled N. The outputs of this layer are called normalized firing strengths. Each output is the ratio of the i-th rule's firing strength to the sum of all rules' firing strengths:

$$O_{3,i} = \bar{\omega}_i = \frac{\omega_i}{\omega_1 + \omega_2}, \quad i = 1, 2 \tag{6}$$

Layer 4: Every node in this layer is an adaptive node with the node function

$$O_{4,i} = \bar{\omega}_i f_i = \bar{\omega}_i (p_i x + q_i y + r_i) \tag{7}$$

where $p_i$, $q_i$, $r_i$ are called the consequent parameters and $\bar{\omega}_i$ is the normalized firing strength from layer 3.

Layer 5: This is the final layer, with a single fixed node marked by a circle and labeled $\Sigma$. The output value is computed as the sum of all incoming signals:

$$O_{5,i} = \sum_i \bar{\omega}_i f_i = \frac{\sum_i \omega_i f_i}{\sum_i \omega_i}, \quad i = 1, 2 \tag{8}$$

4.1.3. Hybrid learning algorithm

The hybrid learning algorithm of ANFIS combines the gradient method and the least squares method to adjust the parameters in an adaptive network (Ying and Pan, 2008). Suppose that the training data set has n data pairs. The aim of the learning process is to find the optimal FIS parameters that minimize the error function E:

$$E = \sum_{i=1}^{n} \left(f_{out_i} - t_i\right)^2 \tag{9}$$

where $t_i$ is the target value and $f_{out_i}$ is the output value of ANFIS. The output from layer 5 can be written as

$$f = \bar{\omega}_1 f_1 + \bar{\omega}_2 f_2 = (\bar{\omega}_1 x) p_1 + (\bar{\omega}_1 y) q_1 + \bar{\omega}_1 r_1 + (\bar{\omega}_2 x) p_2 + (\bar{\omega}_2 y) q_2 + \bar{\omega}_2 r_2 \tag{10}$$

This equation can be shortened as $f = A \cdot R$, where

$$
f = \begin{bmatrix} f_{out_1} \\ f_{out_2} \\ \vdots \\ f_{out_n} \end{bmatrix}, \quad
A = \begin{bmatrix}
\bar{\omega}_1 x_1 & \bar{\omega}_1 y_1 & \bar{\omega}_1 & \bar{\omega}_2 x_1 & \bar{\omega}_2 y_1 & \bar{\omega}_2 \\
\bar{\omega}_1 x_2 & \bar{\omega}_1 y_2 & \bar{\omega}_1 & \bar{\omega}_2 x_2 & \bar{\omega}_2 y_2 & \bar{\omega}_2 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
\bar{\omega}_1 x_n & \bar{\omega}_1 y_n & \bar{\omega}_1 & \bar{\omega}_2 x_n & \bar{\omega}_2 y_n & \bar{\omega}_2
\end{bmatrix}, \quad
R^T = \begin{bmatrix} p_1 & q_1 & r_1 & p_2 & q_2 & r_2 \end{bmatrix}
$$

The unknown parameters $p_i$, $q_i$, $r_i$ can be obtained using the following equation:

$$R = (A^T A)^{-1} A^T f \tag{11}$$

where $A^T$ and $R^T$ are the transposes of A and R, respectively.

The learning algorithm includes two processes: a forward pass and a backward pass. In the forward pass, the premise parameters are fixed. Using the input training data, the consequent parameters (in layer 4) are determined by the least squares method as in Eq. (11), and the error is calculated using Eq. (9). The backward pass starts as soon as the optimal consequent parameters are found (Sengur, 2008). In the backward pass, the consequent parameters are fixed and the errors are propagated backward. The gradient method is used to update the premise parameters $\alpha$:

$$\Delta\alpha = -\eta \frac{\partial E}{\partial \alpha} \tag{12}$$

where $\eta$ is the learning rate.
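The two-rule network of Eqs. (1)–(8) and the least squares step of Eq. (11) can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the Gaussian membership functions, the premise parameters in `PREMISE`, the synthetic samples, and the "true" consequent vector are all assumptions introduced here, and the normal equations are solved with a small hand-written Gaussian elimination to keep the example dependency-free.

```python
import math

def gauss(x, centre, sigma):
    return math.exp(-((x - centre) ** 2) / (2.0 * sigma ** 2))

# Hypothetical premise parameters (centre, sigma) for A1, A2, B1, B2.
PREMISE = {"A1": (0.2, 0.30), "A2": (0.8, 0.20),
           "B1": (0.3, 0.25), "B2": (0.7, 0.40)}

def normalized_strengths(x, y):
    """Layers 1-3: memberships, firing strengths, normalization."""
    w1 = gauss(x, *PREMISE["A1"]) * gauss(y, *PREMISE["B1"])
    w2 = gauss(x, *PREMISE["A2"]) * gauss(y, *PREMISE["B2"])
    total = w1 + w2
    return w1 / total, w2 / total

def design_row(x, y):
    """One row of the matrix A in f = A.R (Eq. 10)."""
    n1, n2 = normalized_strengths(x, y)
    return [n1 * x, n1 * y, n1, n2 * x, n2 * y, n2]

def forward(x, y, consequents):
    """Layers 4-5: weighted sum of the two linear rule outputs."""
    return sum(a * r for a, r in zip(design_row(x, y), consequents))

def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - partial) / aug[r][r]
    return solution

def least_squares_consequents(samples, targets):
    """Forward pass of hybrid learning: solve (A^T A) R = A^T f (Eq. 11)."""
    rows = [design_row(x, y) for x, y in samples]
    m = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    atf = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(m)]
    return solve(ata, atf)

# Synthetic check: generate targets from known consequents, then refit them.
true_R = [1.0, -2.0, 0.5, 0.3, 0.8, -1.0]  # p1, q1, r1, p2, q2, r2
samples = [(i / 6.0, j / 6.0) for i in range(7) for j in range(7)]
targets = [forward(x, y, true_R) for x, y in samples]
fitted_R = least_squares_consequents(samples, targets)
max_residual = max(abs(forward(x, y, fitted_R) - t)
                   for (x, y), t in zip(samples, targets))
```

Only the forward pass is shown. The backward pass of Eq. (12), which updates the premise parameters by gradient descent while the consequents are held fixed, is omitted for brevity.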

4.2. Preparation of training and validation data

The generalization capability of ANFIS network models depends heavily on a sufficient number of training data. However, there is no rule of thumb for estimating the number of training data pairs for ANFIS models, and determining the optimal number of training data points for these models is still considered a challenge (Fernandes and Lona, 2005). Generally, in ANFIS modeling, the data need to be split into two parts: training data and validation data (Dixon, 2005). The training dataset is used for calibrating the landslide models, whereas the validation dataset is used to check their performance and to confirm their accuracy (Oh and Pradhan, 2011). In this study, the landslide database was randomly partitioned into two parts: (1) part 1, which accounts for 70% (82 landslide areas comprising 684 landslide grid cells), was used in the training phase of the ANFIS models; (2) part 2, the remaining 30% (36 landslide areas comprising 315 landslide grid cells), was used for the validation of the models. All grid cells denoting the presence of a landslide were assigned the value 1. Since ANFIS models work efficiently only if the training pixels represent the features of the entire study area, 4104 pixels denoting the absence of landslides were randomly sampled from the landslide-free area and assigned the value 0. Then, the



values of the ten landslide conditioning factors were extracted to build a database. This database contains 4788 pixels in total, with ten landslide conditioning factors and one target variable (presence or absence of landslide). The database was further randomly divided into a training dataset and a checking dataset, as recommended for controlling over-fitting of the models (Jang et al., 1997). The idea behind the checking dataset is that, after a certain point in the training phase, the landslide model begins over-fitting. The checking error tends to decrease as training proceeds up to the point where over-fitting begins, and then the checking error increases (Taylan and Karagözoğlu, 2009). Based on this, the best models were determined. The training dataset was used for the calibration of the models by adjusting the membership function parameters to best fit the data, whereas the checking dataset was used to control over-fitting of the models. When splitting the data, there is no agreement on mathematical rules for the relative size of the two

subsets. In this study, approximately 80% (3830 data pairs) of the extracted database was randomly assigned to the training dataset and the remaining 20% (958 data pairs) was used as the checking dataset. Since values of the membership functions of the ANFIS vary between 0 and 1, the input data needed to be scaled to the range 0–1 (Nayak et al., 2004). Each category in the landslide conditioning factor maps was assigned an attribute sequence number, which may avoid the introduction of diverse types of variables (Caniani et al., 2007). The attribute values were normalized to the range 0.1–0.9 (Table 1) using the Max–Min normalization formula (Fernandes and Lona, 2005):

$$v'_i = \frac{v_i - \min(v)}{\max(v) - \min(v)}\,(U - L) + L \tag{13}$$

where $v'$ is the normalized data matrix, $v$ is the original data matrix, and U and L are the upper and the lower normalization bounds, respectively.
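As a worked illustration of Eq. (13), the following sketch normalizes the attribute sequence numbers of the six slope classes and the three curvature classes to the range 0.1–0.9. The function name is a hypothetical choice; the bounds L = 0.1 and U = 0.9 are those stated in the text.

```python
def min_max_normalize(values, lower=0.1, upper=0.9):
    """Max-Min normalization of Eq. (13) with bounds L=lower and U=upper."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        raise ValueError("all values are identical; normalization is undefined")
    return [(v - v_min) / span * (upper - lower) + lower for v in values]

# Attribute sequence numbers of the six slope classes and three curvature classes.
slope_normalized = [round(v, 2) for v in min_max_normalize([1, 2, 3, 4, 5, 6])]
curvature_normalized = [round(v, 2) for v in min_max_normalize([1, 2, 3])]
```

Rounded to two decimals, the results are 0.10, 0.26, 0.42, 0.58, 0.74, 0.90 for slope and 0.10, 0.50, 0.90 for curvature, which agree with the normalized classes listed in Table 1.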

Table 1 Normalized attributes of categories of landslide conditioning factors used in the ANFIS models.

| GIS input data layers | Layer classes | Pixels in classes | Landslide pixels | Attributes | Normalized classes |
|---|---|---|---|---|---|
| Slope (degree) | 0–10 | 4,919,804 | 2 | 1 | 0.10 |
| | 10–20 | 3,346,950 | 299 | 2 | 0.26 |
| | 20–30 | 2,326,636 | 547 | 3 | 0.42 |
| | 30–40 | 785,451 | 143 | 4 | 0.58 |
| | 40–50 | 106,715 | 8 | 5 | 0.74 |
| | >50 | 4750 | 0 | 6 | 0.90 |
| Aspect | Flat (−1) | 6556 | 0 | 1 | 0.10 |
| | North (0–22.5) | 739,496 | 25 | 2 | 0.19 |
| | Northeast (22.5–67.5) | 1,672,941 | 118 | 3 | 0.28 |
| | East (67.5–112.5) | 1,385,498 | 78 | 4 | 0.37 |
| | Southeast (112.5–157.5) | 1,383,072 | 145 | 5 | 0.46 |
| | South (157.5–202.5) | 1,482,483 | 227 | 6 | 0.54 |
| | Southwest (202.5–247.5) | 1,677,042 | 263 | 7 | 0.63 |
| | West (247.5–292.5) | 1,299,469 | 71 | 8 | 0.72 |
| | Northwest (292.5–337.5) | 1,202,391 | 50 | 9 | 0.81 |
| | North (337.5–360) | 641,358 | 22 | 10 | 0.90 |
| Curvature | Concave (−) | 6,320,019 | 574 | 1 | 0.10 |
| | Flat (0) | 92,419 | 0 | 2 | 0.50 |
| | Convex (+) | 5,077,868 | 425 | 3 | 0.90 |
| Lithology | Group 1 | 468,851 | 63 | 1 | 0.10 |
| | Group 2 | 4,552,855 | 334 | 2 | 0.23 |
| | Group 3 | 3,740,521 | 271 | 3 | 0.37 |
| | Group 4 | 1,338,571 | 216 | 4 | 0.50 |
| | Group 5 | 135,801 | 0 | 5 | 0.63 |
| | Group 6 | 645,785 | 78 | 6 | 0.77 |
| | Group 7 | 607,922 | 37 | 7 | 0.90 |
| Land use | Grass land (GR) | 49,547 | 0 | 1 | 0.10 |
| | Annual crop land (CR) | 184,205 | 2 | 2 | 0.17 |
| | Natural forest land (NF) | 3,666,190 | 156 | 3 | 0.25 |
| | Paddy land (PA) | 1,053,442 | 41 | 4 | 0.32 |
| | Orchard land (OR) | 426,524 | 25 | 5 | 0.39 |
| | Protective forest land (PT) | 985,889 | 203 | 6 | 0.46 |
| | Productive forest land (PD) | 1,347,068 | 226 | 7 | 0.54 |
| | Non tree rocky mountain (RM) | 468,692 | 72 | 8 | 0.61 |
| | Populated area (PO) | 864,840 | 140 | 9 | 0.68 |
| | Barren land (BR) | 1,947,114 | 124 | 10 | 0.75 |
| | Specially used forest land (SF) | 41,129 | 0 | 11 | 0.83 |
| | Water (WT) | 455,666 | 10 | 12 | 0.90 |
| Soil type | Degraded soil (DS) | 3006 | 0 | 1 | 0.10 |
| | Gley fluvisols (GF) | 9043 | 0 | 2 | 0.17 |
| | Humic ferralsols (HF) | 131,881 | 0 | 3 | 0.23 |
| | Rhodic ferralsols (RF) | 1,031,126 | 34 | 4 | 0.30 |
| | Humic acrisols (HA) | 3,551,123 | 281 | 5 | 0.37 |
| | Limestone mountain (LM) | 1,657,233 | 151 | 6 | 0.43 |
| | Eutric fluvisols (EF) | 400,659 | 61 | 7 | 0.50 |
| | Ferralic acrisols (FA) | 4,196,906 | 438 | 8 | 0.57 |
| | Dystric fluvisols (DF) | 84,133 | 28 | 9 | 0.63 |
| | Dystric Gleysols (DG) | 45,288 | 6 | 10 | 0.70 |
| | Luvisols (LS) | 52,858 | 0 | 11 | 0.77 |
| | Populated area (PA) | 50,440 | 0 | 12 | 0.83 |
| | Water (WT) | 276,610 | 0 | 13 | 0.90 |
| Rainfall (mm) | 1200–1435 | 1,347,729 | 13 | 1 | 0.10 |
| | 1435–1535 | 2,345,073 | 74 | 2 | 0.37 |
| | 1535–1635 | 4,877,617 | 600 | 3 | 0.63 |
| | 1635–1858 | 2,919,887 | 312 | 4 | 0.90 |
| Distance to roads (m) | 0–100 | 1,266,378 | 674 | 1 | 0.10 |
| | 100–200 | 1,144,182 | 103 | 2 | 0.26 |
| | 200–300 | 1,018,549 | 69 | 3 | 0.42 |
| | 300–400 | 900,579 | 34 | 4 | 0.58 |
| | 400–500 | 791,866 | 25 | 5 | 0.74 |
| | >500 | 6,368,752 | 94 | 6 | 0.90 |
| Distance to rivers (m) | 0–100 | 2,184,247 | 329 | 1 | 0.10 |
| | 100–200 | 1,802,821 | 141 | 2 | 0.26 |
| | 200–300 | 1,428,573 | 93 | 3 | 0.42 |
| | 300–400 | 1,127,884 | 97 | 4 | 0.58 |
| | 400–500 | 891,591 | 84 | 5 | 0.74 |
| | >500 | 4,055,190 | 255 | 6 | 0.90 |
| Distance to faults (m) | 0–200 | 2,078,812 | 240 | 1 | 0.10 |
| | 200–400 | 1,832,167 | 116 | 2 | 0.26 |
| | 400–700 | 2,285,798 | 242 | 3 | 0.42 |
| | 700–1000 | 1,644,799 | 184 | 4 | 0.58 |
| | 1000–1500 | 1,768,589 | 160 | 5 | 0.74 |
| | >1500 | 1,880,141 | 57 | 6 | 0.90 |

Table 2 Membership functions.

| No | Type of MF | Description |
|---|---|---|
| 1 | Gaussmf | Gaussian curve membership function. |
| 2 | Gauss2mf | Two-sided Gaussian membership function. |
| 3 | Gbellmf | Generalized bell curve membership function. |
| 4 | Sigmf | Sigmoid curve membership function. |
| 5 | Dsigmf | Difference of two sigmoid membership functions. |
| 6 | Psigmf | Product of two sigmoidal membership functions. |
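The six membership function shapes in Table 2 can be sketched as plain functions. These follow the commonly documented textbook definitions and are an illustration only; the exact formulas and parameter ordering of the software used in the study may differ (for example, some implementations take the absolute value of the sigmoid difference). The parameter-count check at the end assumes one membership function per input for each of the 41 rules over ten inputs.

```python
import math

def gaussmf(x, sigma, c):
    """Gaussian curve with width sigma and centre c."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def gauss2mf(x, sigma1, c1, sigma2, c2):
    """Two-sided Gaussian: left shoulder below c1, right shoulder above c2."""
    left = gaussmf(x, sigma1, c1) if x < c1 else 1.0
    right = gaussmf(x, sigma2, c2) if x > c2 else 1.0
    return left * right

def gbellmf(x, a, b, c):
    """Generalized bell curve with width a, slope b, and centre c."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2.0 * b))

def sigmf(x, a, c):
    """Sigmoid curve with slope a and crossover point c."""
    return 1.0 / (1.0 + math.exp(-a * (x - c)))

def dsigmf(x, a1, c1, a2, c2):
    """Difference of two sigmoid curves."""
    return sigmf(x, a1, c1) - sigmf(x, a2, c2)

def psigmf(x, a1, c1, a2, c2):
    """Product of two sigmoid curves."""
    return sigmf(x, a1, c1) * sigmf(x, a2, c2)

# Number of adjustable (premise) parameters per membership function.
PARAMS_PER_MF = {"Gaussmf": 2, "Gauss2mf": 4, "Gbellmf": 3,
                 "Sigmf": 2, "Dsigmf": 4, "Psigmf": 4}

# Assumption: 41 rules x 10 inputs, one membership function per rule and input.
nonlinear_parameters = {name: 41 * 10 * k for name, k in PARAMS_PER_MF.items()}
```

Under that assumption the counts come out as 820, 1640, 1230, 820, 1640, and 1640, which are the numbers of nonlinear parameters reported for the six models in Table 3.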

4.3. Training models and generation of landslide susceptibility indexes

The capability of ANFIS models depends on the number and the shape of the membership functions. The type and number of membership functions determine the contribution of each input parameter to the desired output (El-Shafie et al., 2007). In this study, six types of membership functions (MFs) (Table 2) were used and their performances were tabulated and compared. To start the training process, the initial FIS was first generated. This step includes determining the number of membership functions for each of the ten landslide conditioning factor inputs, the shape of the membership functions for the premise part, and the membership functions for the consequent part of the rules. Two methods have been widely used for generating the initial FIS: grid partition and subtractive clustering. In this study, the subtractive clustering technique was used, which resulted in 41 if–then rules. The major advantage of this method is that it can process a large number of input observations. This helps in locating cluster centers within the input space. Each cluster center is used to generate one initial fuzzy rule with rough estimates of the membership functions (Buragohain and Mahanta, 2008). The purpose of subtractive clustering is to distill natural groups of data from a large data set, which results in a concise representation of a system's behavior (Chiu, 1994). A detailed description of this method can be found in Chiu (1997). The parameters used for generating the initial FIS were set as follows: range of influence 0.76, squash factor 1.25, accept ratio 0.5, and reject ratio 0.15 (Cui et al., 2010).

The performances of the trained landslide models were assessed using several statistical evaluation criteria: the root-mean-squared error (RMSE), the coefficient of multiple determination (R²), and the variance accounted for (VAF) (Ayata et al., 2007; Yilmaz and Kaynar, 2011). The RMSE, R², and VAF are evaluated, respectively, as follows:

$$\mathrm{RMSE} = \left[\frac{1}{n}\sum_{i=1}^{n}\left(f_{out_i} - t_i\right)^2\right]^{0.5} \tag{14}$$

$$R^2 = 1 - \left(\sum_{i=1}^{n}\left(f_{out_i} - t_i\right)^2 \Big/ \sum_{i=1}^{n} f_{out_i}^2\right) \tag{15}$$

$$\mathrm{VAF} = \left[1 - \frac{\mathrm{var}(f_{out} - t)}{\mathrm{var}(f_{out})}\right] \times 100\% \tag{16}$$

where var is the variance. The smaller the RMSE value, the better the landslide model. A good fit between the output and the target of the model is indicated by an R² value near 1, in which case the VAF value approaches 100%.

The ANFIS models were trained using the hybrid learning algorithm. The maximum number of epochs was set to 1000 and the error tolerance to 0.001. The computation was implemented in MATLAB 7.11. The structures of the ANFIS models are presented in Table 3. The error curves of the checking and the training datasets during training are shown in Fig. 4. The minimum checking error is marked by a circle in each case. An increasing checking error curve indicates that the model has started over-fitting. Based on this, the final trained ANFIS models were selected: they were obtained at epoch 62 for Gaussmf, epoch 42 for Gauss2mf, epoch 38 for Gbellmf, epoch 141 for Sigmf, epoch 141 for Dsigmf, and epoch 141 for Psigmf.

Table 3 ANFIS model structure for landslide modeling.

| ANFIS parameters | Gaussmf | Gauss2mf | Gbellmf | Sigmf | Dsigmf | Psigmf |
|---|---|---|---|---|---|---|
| Number of nodes | 915 | 915 | 915 | 915 | 915 | 915 |
| Number of linear parameters | 451 | 451 | 451 | 451 | 451 | 451 |
| Number of nonlinear parameters | 820 | 1640 | 1230 | 820 | 1640 | 1640 |
| Total number of parameters | 1271 | 2091 | 1681 | 1271 | 2091 | 2091 |
| Number of training data pairs | 3830 | 3830 | 3830 | 3830 | 3830 | 3830 |
| Number of checking data pairs | 958 | 958 | 958 | 958 | 958 | 958 |
| Number of fuzzy rules | 41 | 41 | 41 | 41 | 41 | 41 |

Fig. 4. The training and checking errors of the ANFIS models.

Table 4 Summary of the statistical criteria of RMSE, R², and VAF of the six trained ANFIS models.

| Model | Membership function | Epoch at which over-fitting starts | Training RMSE | Training VAF (%) | Training R² | Checking RMSE |
|---|---|---|---|---|---|---|
| 1 | Gaussmf | 62 | 0.1105 | 84.40 | 0.891 | 0.1469 |
| 2 | Gauss2mf | 42 | 0.1231 | 80.65 | 0.861 | 0.1641 |
| 3 | Gbellmf | 38 | 0.1287 | 78.85 | 0.846 | 0.1774 |
| 4 | Sigmf | 141 | 0.1159 | 82.87 | 0.879 | 0.1402 |
| 5 | Dsigmf | 141 | 0.1063 | 85.58 | 0.900 | 0.1502 |
| 6 | Psigmf | 141 | 0.1059 | 85.69 | 0.901 | 0.1501 |

The performances of the trained ANFIS models with different MFs are shown in Table 4. The best training performance is obtained with Psigmf: the training RMSE is 0.1059, the R² value is 0.901, and the VAF value is 85.69%. However, the ANFIS model with Sigmf has the smallest checking error (0.1402). In contrast, the ANFIS model with Gbellmf performs worst, with the highest RMSE values of 0.1287 and 0.1774 for training and checking, respectively; its R² and VAF values for training are 0.846 and 78.85%, respectively.
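The evaluation criteria of Eqs. (14)–(16) and the selection of the epoch with the minimum checking error can be sketched as follows. The toy outputs, targets, and checking-error sequence are hypothetical values chosen so that the arithmetic can be followed by hand; they are not results from the study.

```python
import math

def variance(values):
    """Sample variance; the normalization cancels in the VAF ratio."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def rmse(outputs, targets):
    """Root-mean-squared error, Eq. (14)."""
    n = len(targets)
    return math.sqrt(sum((o - t) ** 2 for o, t in zip(outputs, targets)) / n)

def r_squared(outputs, targets):
    """Coefficient of multiple determination as defined in Eq. (15)."""
    sse = sum((o - t) ** 2 for o, t in zip(outputs, targets))
    return 1.0 - sse / sum(o ** 2 for o in outputs)

def vaf(outputs, targets):
    """Variance accounted for in percent, Eq. (16)."""
    residuals = [o - t for o, t in zip(outputs, targets)]
    return (1.0 - variance(residuals) / variance(outputs)) * 100.0

def best_epoch(checking_errors):
    """1-based epoch at which the checking error is smallest."""
    index = min(range(len(checking_errors)), key=checking_errors.__getitem__)
    return index + 1

# Hypothetical model outputs and targets (1 = landslide, 0 = no landslide).
toy_outputs = [0.1, 0.2, 0.8, 0.9]
toy_targets = [0.0, 0.0, 1.0, 1.0]
toy_rmse = rmse(toy_outputs, toy_targets)     # sqrt(0.10 / 4)
toy_r2 = r_squared(toy_outputs, toy_targets)  # 1 - 0.10 / 1.50
toy_vaf = vaf(toy_outputs, toy_targets)       # (1 - 0.2) * 100

# Hypothetical checking-error curve: the error rises again after epoch 3.
selected_epoch = best_epoch([0.20, 0.16, 0.15, 0.17])
```

Selecting the model at the minimum of the checking-error curve is the early-stopping idea described in the text: training beyond that epoch keeps lowering the training error while the checking error grows.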

Fig. 5. Landslide susceptibility zonation maps using the six ANFIS models with the following membership functions: (a) Gaussmf; (b) Gauss2mf; (c) Gbellmf; (d) Sigmf; (e) Dsigmf; (f) Psigmf.



Once the six ANFIS models were successfully trained, six FIS models were determined. The final FIS models were applied to the entire study area to generate landslide susceptibility indexes (LSI). The LSIs were then converted to GIS grid data to create six landslide susceptibility maps (Fig. 5). There are many methods for classifying the LSI into susceptibility zones. In this study, the classification method of Pradhan and Lee (2010a, b, c) was used. The cumulative percentages of all observed landslide occurrences against the LSI values of the six models were computed and are shown in Fig. 6. For easy visual interpretation, the LSI values were classified into five susceptibility classes based on area percentage: very high (10%), high (10%), moderate (20%), low (20%), and very low (40%). Frequency ratio analysis was carried out on the classification results and landslide location data (Kanungo et al., 2006). All of the landslide grid cells were overlaid on the five landslide susceptibility zones, and frequency ratios were calculated for each zone. Theoretically, the frequency ratio value should increase from the very low to the very high susceptibility zone (Pradhan and Lee, 2010b). Fig. 7 shows frequency ratio plots of the five landslide susceptibility zones for the six ANFIS models. Generally, the frequency ratio increases gradually from the very low to the very high susceptibility zone for the study area. Characteristics of the five susceptibility zones of the six ANFIS models are shown in Table 5.
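The area-based zoning and the frequency ratio can be sketched as follows. The frequency ratio is taken here as the percentage of landslides in a zone divided by the percentage of area of that zone, which is an assumed but common definition. The Sigmf percentages are those reported in Table 5; the ten LSI values are hypothetical.

```python
def classify_by_area(lsi_values, shares_percent=(10, 10, 20, 20, 40)):
    """Assign each cell a zone index, 0 (very high) to 4 (very low).

    Cells are ranked by descending LSI and the zones are filled according
    to their share of the total area. Integer arithmetic avoids rounding
    problems at the class boundaries.
    """
    n = len(lsi_values)
    order = sorted(range(n), key=lambda i: lsi_values[i], reverse=True)
    cumulative, running = [], 0
    for share in shares_percent:
        running += share
        cumulative.append(running)
    zones = [None] * n
    for rank, cell in enumerate(order):
        zones[cell] = next(k for k, bound in enumerate(cumulative)
                           if rank * 100 < bound * n)
    return zones

def frequency_ratios(percent_landslides, percent_area):
    """Frequency ratio per zone: share of landslides over share of area."""
    return [l / a for l, a in zip(percent_landslides, percent_area)]

zone_names = ["very high", "high", "moderate", "low", "very low"]
percent_area = [10.0, 10.0, 20.0, 20.0, 40.0]
sigmf_landslides = [85.49, 4.70, 3.30, 1.10, 5.41]  # Sigmf column of Table 5
sigmf_fr = dict(zip(zone_names, frequency_ratios(sigmf_landslides, percent_area)))

# Ten hypothetical LSI values: one cell per 10% of the area.
toy_zones = classify_by_area([0.95, 0.10, 0.80, 0.30, 0.20,
                              0.60, 0.05, 0.70, 0.40, 0.15])
```

For the Sigmf model this gives a frequency ratio of about 8.5 in the very high zone, meaning that landslide cells are roughly 8.5 times more frequent there than they would be under a uniform spatial distribution.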

5. Validation and comparison of landslide susceptibility maps

Using the success rate and the prediction rate methods, the validation process was performed by comparing the existing landslide data with the six landslide susceptibility maps (Brenning, 2005; Chung and Fabbri, 1999, 2003; Lee et al., 2007). The success rate results were obtained by comparing the landslide training pixels (684 landslide pixels) with the six landslide susceptibility maps. Fig. 8 shows the success rate curves for the six ANFIS landslide models. The model with Sigmf has the highest area under the curve (AUC) value (0.949), followed by Gaussmf (0.896), Dsigmf (0.889), Psigmf (0.889), Gauss2mf (0.849), and Gbellmf (0.849). Since the success rate method uses the training landslide pixels that have already been used for building the landslide models, it is not a suitable method for assessing the prediction capability of the models. However, the success rate method may help to determine how well the resulting landslide susceptibility maps have classified the areas of existing landslides. The prediction rate explains how well the model and predictor variables predict the landslides (Lee, 2007). Therefore, the area under the prediction rate curves can assess the prediction accuracy qualitatively (Chung and Fabbri, 2003; Lee et al., 2003; Pradhan and Lee, 2010c). The AUC value varies from 0.5 to 1.0. A value of 1.0 indicates a perfect prediction by the model. A value near 0.5 shows that no or little relationship exists other than by chance. The closer the AUC value is to 1, the better the model is assumed to be. An AUC value in the range 0.90–1.00 is

Fig. 6. Cumulative percentage of observed landslide occurrence against LSI value.

Fig. 7. Frequency ratio plots of five landslide susceptibility zones of the six ANFIS models.

Fig. 8. Success rate curves of the six ANFIS models.

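Table 5 Characteristics of the five susceptibility zones of the six ANFIS models. The membership function columns give the percentage of landslides falling in each zone.

| Landslide susceptibility zone | Percent area | Gaussmf | Gauss2mf | Gbellmf | Sigmf | Dsigmf | Psigmf |
|---|---|---|---|---|---|---|---|
| Very high | 10.0 | 80.48 | 73.07 | 69.57 | 85.49 | 72.97 | 73.07 |
| High | 10.0 | 5.51 | 8.21 | 8.01 | 4.70 | 9.21 | 8.31 |
| Moderate | 20.0 | 2.80 | 3.60 | 6.40 | 3.30 | 6.71 | 7.21 |
| Low | 20.0 | 1.40 | 1.21 | 1.51 | 1.10 | 1.00 | 1.30 |
| Very low | 40.0 | 9.81 | 13.91 | 14.51 | 5.41 | 10.11 | 10.11 |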


Fig. 9. Prediction rate curves of the six ANFIS models.

considered an indicator of excellent model quality, while a value in the range 0.80–0.90 indicates good model quality. For a value in the interval [0.70–0.80] the model quality is fair, and values in [0.60–0.70] and [0.50–0.60] indicate poor quality and failure, respectively (Abul Hasanat et al., 2010). The prediction rate validation was carried out using the landslide grid cells in the validation dataset, which were not used in the training phase (i.e., 36 landslide areas with 315 landslide grid cells). Prediction rate curves and AUC values of the six ANFIS models are shown in Fig. 9. The model with Sigmf has the highest AUC value (0.848), followed by Gaussmf (0.825), Dsigmf (0.807), Psigmf (0.806), Gauss2mf (0.803), and Gbellmf (0.739). This indicates that the ANFIS model with Gbellmf has a fair prediction capacity and that the remaining ANFIS models have a good prediction capacity.
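One way to obtain the area under a success rate or prediction rate curve is sketched below: cells are ranked by descending LSI, the cumulative share of landslide cells is accumulated against the cumulative share of area, and the area is summed with the trapezoidal rule. This is an illustrative assumption about the procedure, since the text does not spell out the computation; ties in LSI are not treated specially, and assigning boundary values to the higher quality class is also an assumption.

```python
def rate_curve_auc(lsi_values, is_landslide):
    """Area under the cumulative-landslide versus cumulative-area curve."""
    order = sorted(range(len(lsi_values)),
                   key=lambda i: lsi_values[i], reverse=True)
    total_cells = len(order)
    total_landslides = sum(is_landslide)
    auc, captured, previous_share = 0.0, 0, 0.0
    for cell in order:
        captured += is_landslide[cell]
        share = captured / total_landslides
        # Trapezoid of width 1/total_cells on the area axis.
        auc += (previous_share + share) / 2.0 / total_cells
        previous_share = share
    return auc

def quality_label(auc):
    """Quality classes quoted in the text (lower bounds taken as inclusive)."""
    if auc >= 0.90:
        return "excellent"
    if auc >= 0.80:
        return "good"
    if auc >= 0.70:
        return "fair"
    if auc >= 0.60:
        return "poor"
    return "fail"

# Four hypothetical cells: the two landslide cells have the highest LSI.
toy_auc = rate_curve_auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
reversed_auc = rate_curve_auc([0.9, 0.8, 0.2, 0.1], [0, 0, 1, 1])

# Prediction rate AUC values reported in the text for Sigmf and Gbellmf.
sigmf_quality = quality_label(0.848)
gbellmf_quality = quality_label(0.739)
```

In the toy case half of the cells are landslide cells, so even a perfect ranking yields 0.75 rather than 1.0; with the small landslide fraction of a real study area the attainable maximum is much closer to 1.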

6. Concluding remarks

In this study, an adaptive neuro-fuzzy inference system (ANFIS) was applied to assess the regional landslide susceptibility for the Hoa Binh province of Vietnam. The landslide inventory, comprising all significant landslide events that occurred during the last ten years, was used in the analysis. Of the 118 landslide areas, 82 cases with 684 landslide grid cells were selected for the ANFIS training process and the remaining 36 cases with 315 landslide grid cells were used for model validation. Ten landslide conditioning factors, namely slope, lithology, rainfall, soil type, land use, aspect, distance to roads, distance to rivers, distance to faults, and curvature, were used in this analysis. The ANFIS modeling for landslide analysis in this study comprised three steps: (1) constructing the ANFIS models; (2) training the models; and (3) operating the models. In the model construction step, the subtractive clustering method was used to generate the initial FIS, and the structure parameters were determined. The training step was conducted to determine the optimal parameters, and in the final step the trained ANFIS models were applied to the entire study area to generate the landslide susceptibility maps. The ANFIS model preserves the advantages of both fuzzy logic and artificial neural networks and can be considered a robust method for landslide modeling. However, ANFIS is very sensitive to over-fitting, so the training process should be implemented carefully. For this reason, in this study, the training database was split into a training set (80%) and a checking set (20%). The training set was used for calibrating the


ANFIS models, whereas the checking set was used to control overfitting in the training phase. A total of six ANFIS models with the hybrid training algorithm and six different membership functions (Gaussmf, Gauss2mf, Gbellmf, Sigmf, Dsigmf, Psigmf) were used. Model results were validated against the known landslide locations in the validation dataset using the prediction rate method. The areas under the prediction curves were used to assess and compare the prediction capabilities of these models. The AUC results showed that the ANFIS model using Gbellmf has a fair quality, with the lowest prediction capacity. The five remaining models have good prediction capabilities. The ANFIS model with Sigmf has the highest prediction capability, followed by Gaussmf, Dsigmf, Psigmf, and Gauss2mf. A limitation of the ANFIS modeling in this study is that the relative importance of the ten input factors was not assessed. Additionally, determining the optimal range of influence for generating the initial FIS is not easy. A large value of the range of influence will produce fewer clusters and hence a coarser model. In contrast, a small value of the range of influence may produce an excessive number of clusters, resulting in a less optimal model (Chiu, 1997). In summary, the results of this study suggest that landslide susceptibility mapping for the Hoa Binh province of Vietnam is viable. The maps may be helpful for planners, decision makers, and engineers in slope management and land use planning in the study area. The maps are produced on a regional scale, so further study needs to be carried out at the site-specific level.
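The effect of the range of influence on the number of clusters can be illustrated with a simplified subtractive clustering routine. This Python sketch is not the implementation used in the study: the function name, the toy data, the `squash` factor, and the single `stop_ratio` stopping rule (which stands in for the accept/reject test of the original method) are all assumptions made for illustration. Input points are assumed to be scaled to the unit hypercube.

```python
import math


def subtractive_clustering(points, radius, squash=1.5, stop_ratio=0.15):
    """Simplified subtractive clustering on points scaled to [0, 1].

    Assumption: a single stop_ratio replaces the accept/reject criteria
    of the original method; squash and stop_ratio are illustrative.
    """
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    alpha = 4.0 / radius ** 2              # neighbourhood used for potentials
    beta = 4.0 / (squash * radius) ** 2    # wider neighbourhood for subtraction

    # Potential of each point: a density measure of its neighbourhood.
    potential = [
        sum(math.exp(-alpha * sq_dist(p, q)) for q in points) for p in points
    ]
    first_peak = max(potential)
    centers = []
    while len(centers) < len(points):
        k = max(range(len(points)), key=potential.__getitem__)
        peak = potential[k]
        if peak < stop_ratio * first_peak:
            break  # remaining potential is too low to justify a new cluster
        centers.append(points[k])
        # Subtract the influence of the new center from every point.
        for i, p in enumerate(points):
            potential[i] -= peak * math.exp(-beta * sq_dist(p, points[k]))
    return centers


# Toy data (hypothetical): two tight groups in a two-dimensional unit square.
group_a = [(0.10, 0.10), (0.12, 0.10), (0.10, 0.12), (0.12, 0.12)]
group_b = [(0.88, 0.88), (0.90, 0.88), (0.88, 0.90), (0.90, 0.90)]
points = group_a + group_b

coarse = subtractive_clustering(points, radius=0.5)
fine = subtractive_clustering(points, radius=0.01)
print(len(coarse), len(fine))
```

On this toy data, the large radius merges each group into a single cluster, whereas the very small radius treats every point as its own cluster, which mirrors the trade-off described above.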

Acknowledgment This research was funded by the Norwegian Quota scholarship program. The data analysis and write-up were carried out as a part of the first author’s Ph.D. studies at the Geomatics section, Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, Norway. Thanks to the anonymous reviewers for their valuable and constructive comments, which were useful to revise the manuscript.

Appendix
Characteristics of the lithology groups used in this study.
Group 1: Quaternary deposits: Distributed primarily in plains and river valleys; characterized by incoherent textures, diverse components, a wide range of material sizes, and essentially alluvial facies.
Group 2: Sedimentary aluminosilicate rocks and sedimentary quartz rocks: Consisting of pebbles, cobbles, gravel, gritstone, sandstone, siltstone, claystone, carbonates, alternating rhyolites, dacites, andesite sediments, and tuff. Sedimentary quartz rocks consist of quartz–mica sandstone, quartzitic sandstone, cherty shale, etc.
Group 3: Sedimentary carbonate rocks: Consisting of limestone, dolomitized limestone, cherty limestone, and clayey limestone.
Group 4: Mafic–ultramafic magmatic rocks: Consisting of dunite, peridotite, pyroxenite, tremolite schist, actinolite schist, gabbro–pyroxenite, gabbro–amphibolite, gabbro–norite, gabbro–anorthosite, gabbro–diorite, gabbro–diabase, diabase, mafic olivine basalt, tholeiitic basalt, dolerite basalt, etc.
Group 5: Acid-neutral magmatic rocks: The extrusive magmas consist of rhyolite, dacite, felsite, and andesite rocks. The intrusive granite magmas consist of plagioclase–granite, granophyre, granosyenite, granodiorite, diorite, and quartz–diorite.
Group 6: Metamorphic rocks rich in aluminosilicate components: The high-rank metamorphic rocks consist of biotite–garnet–gneiss, amphibole–biotite–plagiogneiss, biotite–amphibolite, plagioclase–migmatite, quartz–biotite schist, biotite schist, etc. The low-rank metamorphic rocks consist of green schist, chlorite schist, sericite schist, and quartz–sericite schist.
Group 7: Metamorphic rocks rich in quartz components: Consisting of quartz mica–schist, quartz sericite–schist, quartzite, and sericite–quartzite.

References
Abul Hasanat, M., Ramachandram, D., Mandava, R., 2010. Bayesian belief network learning algorithms for modeling contextual relationships in natural imagery: a comparative study. Artificial Intelligence Review 34, 291–308.
Akgun, A., Sezer, E.A., Nefeslioglu, H.A., Gokceoglu, C., Pradhan, B., 2011. An easy-to-use MATLAB program (MamLand) for the assessment of landslide susceptibility using a Mamdani fuzzy algorithm. Computers & Geosciences. doi:10.1016/j.cageo.2011.04.012 (Article on-line first available).
Alakhras, M.N.Y., 2005. Neural network-based fuzzy inference system for exchange rate prediction. Journal of Computer Science, 112–120.
Ayata, T., Çam, E., Yıldız, O., 2007. Adaptive neuro-fuzzy inference systems (ANFIS) application to investigate potential use of natural ventilation in new building designs in Turkey. Energy Conversion and Management 48, 1472–1479.
Brenning, A., 2005. Spatial prediction models for landslide hazards: review, comparison and evaluation. Natural Hazards and Earth System Sciences 5, 853–862.
Bui, D., Lofman, O., Revhaug, I., Dick, O., 2011. Landslide susceptibility analysis in the Hoa Binh province of Vietnam using statistical index and logistic regression. Natural Hazards, 1–32. doi:10.1007/s11069-011-9844-2.
Buragohain, M., Mahanta, C., 2008. A novel approach for ANFIS modelling based on full factorial design. Applied Soft Computing 8, 609–625.
Caniani, D., Pascale, S., Sdao, F., Sole, A., 2007. Neural networks and landslide susceptibility: a case study of the urban area of Potenza. Natural Hazards 45, 55–72.
Chiu, S.L., 1994. Fuzzy model identification based on cluster estimation. Journal of Intelligent and Fuzzy Systems 2, 267–278.
Chiu, S.L., 1997. An efficient method for extracting fuzzy classification rules from high dimensional data. Advanced Computational Intelligence 1, 1–7.
Chung, C.F., Fabbri, A.G., 1999. Probabilistic prediction models for landslide hazard mapping. Photogrammetric Engineering & Remote Sensing 65, 1389–1399.
Chung, C.J.F., Fabbri, A.G., 2003. Validation of spatial prediction models for landslide hazard mapping. Natural Hazards 30, 451–472.
Cui, Z.-D., Tang, Y.-Q., Yan, X.-X., Yan, C.-L., Wang, H.-M., Wang, J.-X., 2010. Evaluation of the geology-environmental capacity of buildings based on the ANFIS model of the floor area ratio. Bulletin of Engineering Geology and the Environment 69, 111–118.
Dixon, B., 2005. Applicability of neuro-fuzzy techniques in predicting groundwater vulnerability: a GIS-based sensitivity analysis. Journal of Hydrology 309, 17–38.
Eftekhari, M., Katebi, S.D., 2008. Extracting compact fuzzy rules for nonlinear system modeling using subtractive clustering, GA and unscented filter. Applied Mathematical Modelling 32, 2634–2651.
El-Shafie, A., Taha, M., Noureldin, A., 2007. A neuro-fuzzy model for inflow forecasting of the Nile river at Aswan high dam. Water Resources Management 21, 533–556.
Ercanoglu, M., 2005. Landslide susceptibility assessment of SE Bartin (West Black Sea region, Turkey) by artificial neural networks. Natural Hazards and Earth System Sciences 5, 14.
Ercanoglu, M., Gokceoglu, C., 2002. Assessment of landslide susceptibility for a landslide-prone area (north of Yenice, NW Turkey) by fuzzy approach. Environmental Geology 41, 720–730.
Ercanoglu, M., Gokceoglu, C., 2004. Use of fuzzy relations to produce landslide susceptibility map of a landslide prone area (West Black Sea Region, Turkey). Engineering Geology 75, 229–250.
Ermini, L., Catani, F., Casagli, N., 2005. Artificial neural networks applied to landslide susceptibility assessment. Geomorphology 66, 327–343.
Fernandes, F.A.N., Lona, L.M.F., 2005. Neural network applications in polymerization processes. Brazilian Journal of Chemical Engineering 22, 401–418.
Guzzetti, F., Carrara, A., Cardinali, M., Reichenbach, P., 1999. Landslide hazard evaluation: a review of current techniques and their application in a multiscale study, Central Italy. Geomorphology 31, 181–216.
Hue, T.T., Duong, T.V., Toan, D.V., Nghinh, L.T., Minh, V.C., Pho, N.V., Xuan, P.T., Hoan, L.T., Huyen, N.X., Pha, P.D., Chinh, V.V., Thom, B.V., 2004. Investigation and assessment of the types of geological hazard in the territory of Vietnam and recommendation of remedial measures. Phase II: a study of the Northern mountainous province. Vietnam Academy of Science and Technology, Institute of Geological Sciences, Hanoi, p. 361.
Jang, J.S.R., 1993. ANFIS: adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics 23, 665–685.
Jang, J.S.R., Sun, C.T., Mizutani, E., 1997. Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence (Matlab Curriculum Series). Prentice Hall.

Kanungo, D., Arora, M., Gupta, R., Sarkar, S., 2008. Landslide risk assessment using concepts of danger pixels and fuzzy set theory in Darjeeling Himalayas. Landslides 5, 407–416.
Kanungo, D.P., Arora, M.K., Sarkar, S., Gupta, R.P., 2006. A comparative study of conventional, ANN black box, fuzzy and combined neural and fuzzy weighting procedures for landslide susceptibility zonation in Darjeeling Himalayas. Engineering Geology 85, 347–366.
Keskin, M.E., Taylan, D., Terzi, O., 2006. Adaptive neural-based fuzzy inference system (ANFIS) approach for modelling hydrological time series. Hydrological Sciences, 51.
Kumanan, S., Jesuthanam, C.P., Kumar, R.A., 2008. Application of multiple regression and adaptive neuro fuzzy inference system for the prediction of surface roughness. International Journal of Advanced Manufacturing Technology 35, 778–788.
Lee, S., 2007. Landslide susceptibility mapping using an artificial neural network in the Gangneung area, Korea. International Journal of Remote Sensing 28, 4763–4783.
Lee, S., Ryu, J.H., Kim, I.S., 2007. Landslide susceptibility analysis and its verification using likelihood ratio, logistic regression, and artificial neural network models: case study of Youngin, Korea. Landslides 4, 327–338.
Lee, S., Ryu, J.H., Lee, M.J., Won, J.S., 2003. Use of an artificial neural network for analysis of the susceptibility to landslides at Boun, Korea. Environmental Geology 44, 820–833.
Lee, S., Ryu, J.H., Won, J.S., Park, H.J., 2004. Determination and application of the weights for landslide susceptibility mapping using an artificial neural network. Engineering Geology 71, 289–302.
My, N.Q., 2007. Construction of the environmental hazard zonation map for northwest territory of Vietnam. Vietnam Geography Association, Hanoi, p. 98.
Nayak, P.C., Sudheer, K.P., Rangan, D.M., Ramasastri, K.S., 2004. A neuro-fuzzy computing technique for modeling hydrological time series. Journal of Hydrology 291, 52–66.
Nefeslioglu, H.A., Gokceoglu, C., Sonmez, H., 2008. An assessment on the use of logistic regression and artificial neural networks with different sampling strategies for the preparation of landslide susceptibility maps. Engineering Geology 97, 171–191.
Nefeslioglu, H.A., Sezer, E., Gokceoglu, C., Bozkir, A.S., Duman, T.Y., 2010. Assessment of landslide susceptibility by decision trees in the metropolitan area of Istanbul, Turkey. Mathematical Problems in Engineering.
Oh, H.-J., Pradhan, B., 2011. Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallow landslides in a tropical hilly area. Computers & Geosciences 37, 1264–1276.
Pradhan, B., 2010a. Application of an advanced fuzzy logic model for landslide susceptibility analysis. International Journal of Computational Intelligence Systems 3, 370–381.
Pradhan, B., 2010b. Remote sensing and GIS-based landslide hazard analysis and cross-validation using multivariate logistic regression model on three test areas in Malaysia. Advances in Space Research 45, 1244–1256.
Pradhan, B., 2011. Manifestation of an advanced fuzzy logic model coupled with Geo-information techniques to landslide susceptibility mapping and their comparison with logistic regression modelling. Environmental and Ecological Statistics 18, 471–493.
Pradhan, B., Lee, S., 2010a. Delineation of landslide hazard areas on Penang Island, Malaysia, by using frequency ratio, logistic regression, and artificial neural network models. Environmental Earth Sciences 60, 1037–1054.
Pradhan, B., Lee, S., 2010b. Landslide susceptibility assessment and factor effect analysis: backpropagation artificial neural networks and their comparison with frequency ratio and bivariate logistic regression modelling. Environmental Modelling & Software 25, 747–759.
Pradhan, B., Lee, S., 2010c. Regional landslide susceptibility analysis using backpropagation neural network model at Cameron Highland, Malaysia. Landslides 7, 13–30.
Pradhan, B., Lee, S., Buchroithner, M.F., 2010a. A GIS-based back-propagation neural network model and its cross-application and validation for landslide susceptibility analyses. Computers Environment and Urban Systems 34, 216–235.
Pradhan, B., Sezer, E.A., Gokceoglu, C., Buchroithner, M.F., 2010b. Landslide susceptibility mapping by neuro-fuzzy approach in a landslide-prone area (Cameron Highlands, Malaysia). IEEE Transactions on Geoscience and Remote Sensing 48, 4164–4177.
Sengur, A., 2008. Wavelet transform and adaptive neuro-fuzzy inference system for color texture classification. Expert Systems with Applications 34, 2120–2128.
Sezer, E.A., Pradhan, B., Gokceoglu, C., 2011. Manifestation of an adaptive neuro-fuzzy model on landslide susceptibility mapping: Klang valley, Malaysia. Expert Systems with Applications 38, 8208–8219.
Soyguder, S., Alli, H., 2009. An expert system for the humidity and temperature control in HVAC systems using ANFIS and optimization with fuzzy modeling approach. Energy and Buildings 41, 814–822.
Takagi, T., Sugeno, M., 1985. Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man, and Cybernetics 15, 116–132.
Taylan, O., Karagözoğlu, B., 2009. An adaptive neuro-fuzzy model for prediction of student's academic performance. Computers & Industrial Engineering 57, 732–741.
Thinh, D.V., Dong, N.P., Hong, P.M., Hung, P.V., Khoi, T.N., Ke, T.D., Phu, D.V., Thang, P.X., Thanh, P.V., Thang, P.H., Thay, B.V., Thinh, N.T., Thien, T.V., Tu, M.T., Vinh, B.X., 2005. The investigated report of natural hazard in the Northwest of Vietnam Northern Geological Mapping Division. Hanoi, 12.


Vahidnia, M.H., Alesheikh, A.A., Alimohammadi, A., Hosseinali, F., 2010. A GIS-based neuro-fuzzy procedure for integrating knowledge and data in landslide susceptibility mapping. Computers & Geosciences 36, 1101–1114.
Van, T.T., Anh, D.T., Hieu, H.H., Giap, N.X., Ke, T.D., Nam, T.D., Ngoc, D., Ngoc, D.T.Y., Thai, T.N., Thang, D.V., Tinh, N.V., Tuat, L.T., Tung, N.T., Tuy, P.K., Viet, H.A., 2006. Investigation and assessment of the current status and potential of landslide in some sections of the Ho Chi Minh Road, National Road 1A and proposed remedial measures to prevent landslide from threat of safety of people, property, and infrastructure. Vietnam Institute of Geoscience and Mineral Resources, Hanoi, p. 249.
Van, T.T., Tuy, P.K., Giap, N.X., Ke, T.D., Thai, T.N., Giang, N.T., Tho, H.M., Tuat, L.T., San, D.N., Hung, L.Q., Chung, H.T., Hoan, N.T., 2002. Assessment and prediction of geological hazards in the 8 coastal provinces of central Vietnam from Quang Binh to Phu Yen: current status, causes, prediction and recommendation of remedial measures. Vietnam Institute of Geoscience and Mineral Resources, Hanoi, p. 215.
Van Westen, C.J., Rengers, N., Soeters, R., 2003. Use of geomorphological information in indirect landslide susceptibility assessment. Natural Hazards 30, 399–419.
Yilmaz, I., Kaynar, O., 2011. Multiple regression, ANN (RBF, MLP) and ANFIS models for prediction of swell potential of clayey soils. Expert Systems with Applications 38, 5958–5966.
Ying, L.C., Pan, M.C., 2008. Using adaptive network based fuzzy inference system to forecast regional electricity loads. Energy Conversion and Management 49, 205–211.

Paper III Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., Dick, O.B., 2012. Spatial prediction of landslide hazards in Hoa Binh province (Vietnam): a comparative assessment of the efficacy of evidential belief functions and fuzzy logic models. CATENA, 96, 28-40.



Catena 96 (2012) 28–40


Catena journal homepage: www.elsevier.com/locate/catena

Spatial prediction of landslide hazards in Hoa Binh province (Vietnam): A comparative assessment of the efficacy of evidential belief functions and fuzzy logic models
Dieu Tien Bui a, b,⁎, Biswajeet Pradhan c, Owe Lofman a, Inge Revhaug a, Oystein B. Dick a

a Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, P.O. Box 5003IMT, N-1432, Aas, Norway
b Faculty of Surveying and Mapping, Hanoi University of Mining and Geology, Dong Ngac, Tu Liem, Hanoi, Vietnam
c Faculty of Engineering, Spatial and Numerical Modelling Research Group, University Putra Malaysia, Serdang, Selangor Darul Ehsan 43400, Malaysia

Article info
Article history: Received 24 November 2011; Received in revised form 31 March 2012; Accepted 2 April 2012; Available online xxxx
Keywords: Landslide; GIS; Fuzzy operator; Evidential belief functions; Vietnam

Abstract
The main objective of this study is to evaluate and compare the results of evidential belief functions and fuzzy logic models for spatial prediction of landslide hazards in the Hoa Binh province of Vietnam, using geographic information systems. First, a landslide inventory map showing the locations of 118 landslides that have occurred during the last ten years was constructed using data from various sources. The landslide inventory was then randomly partitioned into training and validation datasets (70% of the known landslide locations were used for training and building the landslide models and the remaining 30% for the model validation). Second, nine landslide conditioning factors were selected (i.e., slope, aspect, relief amplitude, lithology, landuse, soil type, distance to roads, distance to rivers and distance to faults). Using these factors, landslide susceptibility index values were calculated using evidential belief functions and fuzzy logic models. Finally, the landslide susceptibility maps were validated and compared using the validation dataset that was not used in the model building. The prediction-rate curves and the areas under the curves were calculated to assess prediction capability. The results show that all the models have good prediction capabilities. The model derived using evidential belief functions has the highest prediction capability, and the model derived using fuzzy SUM has the lowest. The fuzzy PRODUCT and fuzzy GAMMA models have almost the same prediction capabilities. In general, all the models yield reasonable results that may be used for preliminary landuse planning purposes. © 2012 Elsevier B.V. All rights reserved.

⁎ Corresponding author at: Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, P.O. Box 5003IMT, N-1432, Aas, Norway. Tel.: +47 64965424. E-mail addresses: [email protected], [email protected] (D. Tien Bui). 0341-8162/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.catena.2012.04.001

1. Introduction
Vietnam is located in one of the storm centers of the world, in one of the regions hardest hit by natural disasters, and in one of the regions most vulnerable to the impacts of climate change (Alkema, 2010). Together with flooding, landslides are among the recurrent and widespread natural hazards that have caused large losses of life and property in the mountainous region of northwestern Vietnam (Lee and Dan, 2005). In particular, in the Hoa Binh area, many large landslides occurred during the heavy rainfalls of the tropical storm Lekima in October 2007. Those landslides mainly occurred on cut slopes in mountainous regions, including residential areas. Therefore, understanding landslides and preventing them through suitable landuse planning and management is one of the urgent tasks in Vietnam. However, only a few attempts have been made so far to assess and predict landslide-prone areas. Through scientific analyses, geoscientists and civil engineers can assess and predict landslide-prone areas, offering potential measures to decrease landslide damage through proper slope management (Pradhan, 2011b). The susceptibility to landslide occurrence can be expressed as the probability of spatial occurrence of slope failures over a set of geoenvironmental conditions (Guzzetti et al., 2005). Landslide susceptibility can be estimated using a variety of methods and techniques, including heuristic methods, statistically based classification models and physically based models (Guzzetti et al., 2006). Brief reviews of the advantages and disadvantages of these techniques and methods are given by many researchers, such as Varnes (1984), Aleotti and Chowdhury (1999), Guzzetti et al. (1999) and Chacon et al. (2006). In recent years, various modeling approaches using geographical information systems (GIS) have been widely used as the basic analysis tool for landslide studies and predictions worldwide. GIS has been considered to be effective for spatial data management and manipulation for the analysis of landslides (Lee and Sambath, 2006). The main advantage of the GIS-based approaches is that they can be successfully applied in multisource data analysis, especially with heterogeneous and uncertain data (Binaghi et al., 1998; Chacon et al., 2006). Although many GIS-based models have been proposed in the literature, it is still too early in


the evolution of GIS-based landslide modeling to identify which method or set of techniques is the best for prediction of landslide-prone areas (Carrara and Pike, 2008). In more recent years, some new approaches for landslide susceptibility assessment using soft computing techniques have been proposed, such as knowledge-based systems using the fuzzy set theory (Akgun et al., 2012; Pradhan, 2011a, 2011b), neuro-fuzzy (Oh and Pradhan, 2011; Pradhan et al., 2010b; Sezer et al., 2011; Tien Bui et al., 2011b), artificial neural networks (Biswajeet and Saied, 2010; Caniani et al., 2007; Lee et al., 2007; Melchiorre et al., 2008; Pradhan and Lee, 2010b; Pradhan et al., 2010a; Yilmaz, 2009a, 2010a), support vector machines (Ballabio and Sterlacchini, 2012; Marjanović et al., 2011; Yao et al., 2008; Yilmaz, 2010a), and decision-tree models (Nefeslioglu et al., 2010; Saito et al., 2009; Yeon et al., 2010). Generally, these approaches give rise to qualitative and quantitative maps of landslide-prone areas, and the results are appealing (Pradhan, 2010c). In general, the quality of landslide susceptibility models is influenced both by the methods used and the sampling strategies followed (Yilmaz, 2010b). Therefore, comparative studies of different methods are highly necessary. In the literature, there are some studies comparing the prediction and generalization capabilities of different methods and techniques for landslide susceptibility assessment (Akgun, 2012; Pradhan, 2011a; Pradhan and Lee, 2010a; Yilmaz, 2009b, 2010a). In the case of the evidential belief functions (EBF) model, the application to landslide mapping is still limited, except for a few case studies (Althuwaynee et al., 2012). The EBF model has been widely used in knowledge-driven approaches to mineral potential mapping (An et al., 1992; Carranza, 2009; Carranza and Hale, 2001; Carranza and Sadeghi, 2010; Carranza et al., 2005, 2008a, 2008b, 2009; Moon, 1989).
Carranza and Hale (2002) proposed a data-driven mineral potential mapping method using the EBF model, in which ‘expert knowledge’ was used as a guide for classifying geological maps. To apply EBF models in mineral potential mapping, they used Dempster's rule of combination (Dempster, 1967, 1968) and a GIS (Carranza et al., 2008c). Tangestani (2009) compared the Dempster-Shafer model with fuzzy models for landslide modeling of the Zagros Mountains in Iran, and concluded that the Dempster-Shafer model obtained less reliable results than the fuzzy logic model. Tangestani (2009), however, determined the fuzzy membership function values based on expert opinions, which differs from our study, where data-driven methods were used. Carranza and Castro (2006) showed that the data-driven EBF model can be used for prediction of areas that can be inundated by volcanic lahars at Mount Pinatubo (Philippines). Ghosh and Carranza (2010) showed that the data-driven EBF model can be used for mapping of rockslide-prone areas in Darjeeling Himalaya (India). In a different approach, Park (2011) applied the data-driven Dempster-Shafer model in the Jangheung area (Korea) and concluded that the data-driven Dempster-Shafer model shows better prediction capacity than logistic regression. Park (2011) also stated that more research should be done on the application of EBF in extensive case studies. Fuzzy logic has been widely used in many fields (Carranza and Hale, 2001; Cheng and Agterberg, 1999; Porwal et al., 2003, 2006; Topcu and Sarıdemir, 2008). The advantage of fuzzy logic is that it is straightforward to apply, and the process of weighting landslide conditioning factors is totally controlled by the experts (Lee, 2007a). In addition, the fuzzy logic method provides a variety of fuzzy combination operators for generating landslide susceptibility index values. According to Gorsevski et al. 
(2003), the integration of GIS and fuzzy logic has been shown to be of considerable interest, with high potential and robustness for landslide hazard predictions. In the present study, two data-driven models, EBF and fuzzy logic, were used to obtain more accurate and reliable landslide susceptibility maps. The main objective of this study is to evaluate the data-driven fuzzy logic and evidential belief functions in a GIS for


spatial prediction of landslide hazards in the Hoa Binh province of Vietnam.
2. Study area and spatial database
The Hoa Binh province is a hilly area situated between mountains and the Red River plain in the northwestern region of Vietnam. It covers an area of about 4660 km² between longitudes 104°48′E and 105°50′E, and latitudes 20°17′N and 21°08′N. The altitude varies from 0 to 1510 m and decreases from the northwest to the southeast. Approximately 42.5% of the study area has ground slopes greater than 15°, and about 16% has slopes greater than 25°. The climate is characteristic of a monsoonal region, being hot and rainy with high humidity. According to statistics for the last decade, the coldest month was January, with an average temperature of 14.9 °C, while the warmest month was July, with an average temperature of 26.7 °C. The rainy season falls within the period of May to October and accounts for 84–90% of the yearly rainfall. The study area receives its largest amounts of rainfall, with high frequency and intensity peaks, in August and September, when the rainfall varies between 300 and 400 mm per month. Regarding landuse, the study area comprises approximately 7.5% populated areas, 14.5% agricultural land, 52.6% forest land, 21% barren land and non-forest rocky mountain, 0.4% grass land and 4% water surface. The study area is geologically a part of the Paleozoic with different structures consisting of the Fansipan zone in the northwest, the Son La zone in the southwest, and the remainder in the Ninh Binh zone. Five main fracture zones with a variety of active faults pass through this region, causing rock mass weakness: the Hoa Binh, the Da Bac, the Muong La-Cho Bo, the Son La-Bim Son, and the Song Da fracture zones. Thirty-nine lithologic formations outcrop in this region and their spatial distributions are different. 
Four lithologic formations (Dong Giao, Tan Lac, Vien An, and Song Boi) cover 62.3% of the study area, where the main lithologies are limestone, conglomerate, sandstone, aphyric basalt, massive limestone, tuffaceous sandstone, silty sandstone, magnesium-high basalt and black clay shale. According to Varnes (1984), landslide occurrences in the past and present are keys to the spatial prediction of landslide hazard in the future. Hence, compiling the landslide inventory map, which is a dataset containing a single or multiple landslide events, is the first step in landslide modeling. In this study, the landslide inventory map prepared by Tien Bui et al. (2011a) was used to derive quantitative relationships between landslide occurrences and landslide conditioning factors. A total of 118 landslides during the last ten years were identified and registered in the map, with 97 landslide polygons and 21 rock fall locations. The size of the smallest landslide is about 380 m². The largest landslide covers an area of 14,340 m² and the average landslide size is 3,440 m². The landslide inventory map was randomly partitioned into a training dataset with 70% (82 landslides) for building the landslide models and the remaining 30% (36 landslide locations) for validation. Fig. 1 shows the distribution of landslide locations in the study area. The next step is the construction of landslide conditioning factors. In our previous study (Tien Bui et al., 2011a), we investigated the relationship between landslide occurrences and landslide conditioning factors. Based on the findings, nine landslide conditioning factors were chosen for this study: slope, aspect, relief amplitude, lithology, landuse, soil type, distance to roads, distance to rivers and distance to faults. A digital elevation model (DEM) with a spatial resolution of 20×20 m was generated using 1:25,000 scale national topographic maps. 
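The random 70/30 partition of the landslide inventory described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used by the authors: the function name, the use of integer identifiers for the 118 landslides, the fixed seed, and the truncation of the training size are all assumptions.

```python
import random


def partition_inventory(landslide_ids, train_fraction=0.7, seed=0):
    """Randomly split landslide identifiers into training and validation sets.

    Assumption: the training size is truncated to an integer, which gives
    82 of 118 landslides for a fraction of 0.7.
    """
    ids = list(landslide_ids)
    random.Random(seed).shuffle(ids)  # fixed seed for a repeatable split
    n_train = int(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers 1..118 standing in for the inventory records.
train_ids, validation_ids = partition_inventory(range(1, 119))
print(len(train_ids), len(validation_ids))
```

Keeping the validation identifiers completely separate from the training identifiers is what allows the prediction-rate curves to measure prediction capability rather than goodness of fit.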
Maps of three geomorphometric factors (slope, aspect and relief amplitude) were extracted from the DEM. In the slope map, six slope categories



Fig. 1. Landslide inventory of the study area.

were constructed for the analysis (Fig. 2a). In the aspect map, nine aspect classes were determined (Fig. 2b). For the relief amplitude map, six classes (0–50 m, 50–100 m, 100–150 m, 150–200 m, 200–250 m, 250–532 m) were constructed using the Focal Statistics tool in ArcGIS 10.0 software with a unit area size of 20×20 pixels (Fig. 2c). The lithology and fault data were extracted from the 1:200,000 scale Geological and Mineral Resources Map of Vietnam. This is the only geological map available for the study area. In the lithological map, seven lithological groups were constructed based on the criteria of clay composition, degree of weathering, and estimated strength and density (Arıkan et al., 2007; Van et al., 2006) (Fig. 2d). In the case of the distance-to-faults map, five buffer categories (0–200 m, 200–400 m, 400–700 m, 700–1000 m, and >1000 m) were compiled using the buffer tool in ArcGIS 10.0 software. The 1:50,000 scale landuse map, with 12 categories, was extracted from the National Status Landuse database (Fig. 2e). The soil type map with 13 soil groups (Fig. 2f) was extracted from the 1:100,000 scale National Pedology map. This is the only soil map available for the study area. The roads and rivers that undercut slopes were extracted from the 1:50,000 scale national topographic map, and used to construct the distance-to-roads map and the distance-to-rivers map. Five buffer categories (0–40 m, 40–80 m, 80–120 m and >120 m) were compiled for each of the two maps using the buffer tool of ArcGIS 10.0 software.
3. Landslide susceptibility mapping
3.1. Evidential belief functions
The Dempster-Shafer theory of evidence, which was first developed by Dempster (1967, 1968) and then by Shafer (1976), is a generalization of the Bayesian theory of subjective probability. The main advantage of the Dempster-Shafer theory is its relative flexibility in accepting uncertainty and its ability to combine beliefs from multiple sources of evidence (Thiam, 2005). 
Rather than estimating the probability that a hypothesis is true, the Dempster-Shafer theory estimates how closely the evidence comes to proving the truth of that hypothesis (Pearl, 1989). The Dempster-Shafer theory has been successfully implemented using a GIS in many fields (Malpica et al., 2007). Suppose that we have a set of landslide conditioning factors C = (Ci, i = 1, 2, 3, …, n), consisting of mutually exclusive and exhaustive factors Ci. C is called the frame of discernment. A basic probability assignment is a function m : P(C) → [0, 1], where P(C) is the set of all subsets of C, including the empty set and C itself. This function is also called a mass function and satisfies $m(\emptyset) = 0$ and $\sum_{A \subseteq C} m(A) = 1$, where $\emptyset$ is the empty set and A is any subset of C. The value m(A) measures the degree to which the evidence supports A; it is denoted Bel(A), a belief function. Four basic EBFs are used (Althuwaynee et al., 2012; Carranza et al., 2005): Bel (degree of belief), Dis (degree of disbelief), Unc (degree of uncertainty) and Pls (degree of plausibility). Bel and Pls represent the lower and upper bounds of the probability for the proposition (Awasthi and Chauhan, 2011). Unc is the difference between the plausibility and the belief, and represents ignorance. Dis is the belief that the proposition is false given the evidence. Dis = 1 − Pls = 1 − Unc − Bel, so that Bel + Unc + Dis = 1 always holds. For a class Cij with no landslide occurrence, for which Bel = 0, Dis is reset to zero even if its estimated value is not zero (Carranza and Hale, 2002; Carranza et al., 2008a). The estimation of EBFs can be based on subjective judgment or it can be data-driven (Carranza et al., 2005; Srivastava et al., 2011). By overlaying the landslide inventory map (L) on each of the nine maps of landslide conditioning factors, we determined the numbers of pixels with landslides and pixels without landslides for each factor class.
Assuming that N(L) is the total number of landslide pixels and N(C) is the total number of pixels in the study area, Cij is the j-th class attribute of the landslide conditioning factor Ci (i = 1, 2, …, n), N(Cij) is the total number of pixels in the class Cij, and N(L ∩ Cij) is the number of landslide pixels in Cij. According to Carranza and Hale (2002), data-driven estimation of EBF may be done by:

$$ Bel_{C_{ij}} = \frac{W_{C_{ij}(\mathrm{Landslide})}}{\sum_{j=1}^{n} W_{C_{ij}(\mathrm{Landslide})}} \qquad (1) $$

where

$$ W_{C_{ij}(\mathrm{Landslide})} = \frac{N(L \cap C_{ij}) / N(L)}{\left[ N(C_{ij}) - N(L \cap C_{ij}) \right] / \left[ N(C) - N(L) \right]} \qquad (2) $$

$$ Dis_{C_{ij}} = \frac{W_{C_{ij}(\mathrm{Non\text{-}Landslide})}}{\sum_{j=1}^{n} W_{C_{ij}(\mathrm{Non\text{-}Landslide})}} \qquad (3) $$

where

$$ W_{C_{ij}(\mathrm{Non\text{-}Landslide})} = \frac{\left[ N(L) - N(L \cap C_{ij}) \right] / N(L)}{\left[ N(C) - N(L) - N(C_{ij}) + N(L \cap C_{ij}) \right] / \left[ N(C) - N(L) \right]} \qquad (4) $$

Fig. 2. Landslide conditioning factor maps: (a) Slope; (b) Aspect; (c) Relief amplitude; (d) Lithology; (e) Soil type; (f) Landuse; (g) Distance to roads; (h) Distance to rivers; and (i) Distance to faults.

The numerator in Eq. (2) is the proportion of landslide pixels that occur in factor class Cij. The numerator in Eq. (4) is the proportion of landslide pixels that do not occur in factor class Cij. The denominator in Eq. (2) is the proportion of non-landslide pixels in factor class Cij. The denominator in Eq. (4) is the proportion of non-landslide pixels in other attributes outside factor class of Cij. Parameter WCij(Landslide) is the weight of Cij that supports the belief that landslides are more present than absent. Parameter WCij(Non − Landslide) is the weight of Cij that supports the belief that landslides are more absent than present.
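As a hedged illustration of Eqs. (1)–(4), the following Python sketch computes Bel, Dis, Unc and Pls for the classes of one conditioning factor from pixel counts. The function name `ebf_for_factor` and its list-based inputs are assumptions made for this example only; the study itself was carried out in ArcGIS. The slope counts are those reported in Table 1, N(C) and N(L) are taken as the sums of the class counts, and Dis is reset to zero for classes without landslides, as described above.

```python
def ebf_for_factor(class_pixels, landslide_pixels):
    """Data-driven EBFs for the classes of one factor, following Eqs. (1)-(4).

    class_pixels[j]     : N(Cij), number of pixels in class j
    landslide_pixels[j] : number of landslide pixels in class j
    Hypothetical helper: it assumes every class has non-landslide pixels
    both inside and outside it, so that no denominator is zero.
    """
    n_c = sum(class_pixels)        # N(C)
    n_l = sum(landslide_pixels)    # N(L)
    n_non = n_c - n_l              # N(C) - N(L)

    w_slide, w_non = [], []
    for n_cij, n_lcij in zip(class_pixels, landslide_pixels):
        # Eq. (2): weight supporting landslide presence
        w_slide.append((n_lcij / n_l) / ((n_cij - n_lcij) / n_non))
        # Eq. (4): weight supporting landslide absence
        w_non.append(((n_l - n_lcij) / n_l)
                     / ((n_non - n_cij + n_lcij) / n_non))

    sum_slide, sum_non = sum(w_slide), sum(w_non)
    ebf = []
    for n_lcij, ws, wn in zip(landslide_pixels, w_slide, w_non):
        bel = ws / sum_slide                           # Eq. (1)
        dis = 0.0 if n_lcij == 0 else wn / sum_non     # Eq. (3), reset if Bel = 0
        ebf.append({"bel": bel, "dis": dis,
                    "unc": 1.0 - bel - dis, "pls": 1.0 - dis})
    return ebf


# Slope classes 0-10, 10-20, 20-30, 30-40, 40-50, >50 degrees (Table 1)
slope_pixels = [4919804, 3346950, 2326636, 785451, 106715, 4750]
slope_landslides = [2, 222, 368, 92, 0, 0]
slope_ebf = ebf_for_factor(slope_pixels, slope_landslides)
for row in slope_ebf:
    print({name: round(value, 3) for name, value in row.items()})
```

Under these assumptions the sketch reproduces the slope rows of Table 1 to three decimals, for example Bel ≈ 0.462 and Dis ≈ 0.093 for the 20°–30° class.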

Once the EBFs had been calculated for all the landslide conditioning factors, Dempster's rule of combination was used to obtain the four integrated EBFs (Dempster, 1968). The formulae for combining two landslide conditioning factors C1 and C2 are as follows (Carranza et al., 2005):

$$ Bel_{C_1 C_2} = \frac{Bel_{C_1} Bel_{C_2} + Bel_{C_1} Unc_{C_2} + Bel_{C_2} Unc_{C_1}}{1 - Bel_{C_1} Dis_{C_2} - Dis_{C_1} Bel_{C_2}} \qquad (5) $$

$$ Dis_{C_1 C_2} = \frac{Dis_{C_1} Dis_{C_2} + Dis_{C_1} Unc_{C_2} + Dis_{C_2} Unc_{C_1}}{1 - Bel_{C_1} Dis_{C_2} - Dis_{C_1} Bel_{C_2}} \qquad (6) $$

$$ Unc_{C_1 C_2} = \frac{Unc_{C_1} Unc_{C_2}}{1 - Bel_{C_1} Dis_{C_2} - Dis_{C_1} Bel_{C_2}} \qquad (7) $$

Integrated EBF of the remaining landslide conditioning factors are implemented one after another by using Eqs. (5)–(7). Table 1 shows the estimated EBF for the nine landslide conditioning factors. In the slope map, slope angles in the range of 20°–30° have the highest Bel and low Dis values, indicating the highest probability of landslides, followed by slope ranges of 30°–40° and then of 10°–20°. For the remaining slope ranges, the Bel values are low, indicating low probability of landslide occurrence.
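The pairwise combination in Eqs. (5)–(7), applied to one factor after another, can be sketched as below. This is a minimal example under stated assumptions: the names `combine_ebf` and `combine_all` are hypothetical, each EBF is represented as a (Bel, Dis, Unc) tuple for a single pixel, and the input values are invented for illustration rather than taken from the study.

```python
from functools import reduce


def combine_ebf(first, second):
    """Combine two (Bel, Dis, Unc) triples with Dempster's rule, Eqs. (5)-(7)."""
    bel1, dis1, unc1 = first
    bel2, dis2, unc2 = second
    # Shared denominator: 1 minus the mass assigned to conflicting evidence
    beta = 1.0 - bel1 * dis2 - dis1 * bel2
    bel = (bel1 * bel2 + bel1 * unc2 + bel2 * unc1) / beta   # Eq. (5)
    dis = (dis1 * dis2 + dis1 * unc2 + dis2 * unc1) / beta   # Eq. (6)
    unc = (unc1 * unc2) / beta                               # Eq. (7)
    return bel, dis, unc


def combine_all(ebfs):
    """Fold a sequence of (Bel, Dis, Unc) triples one after another."""
    return reduce(combine_ebf, ebfs)


# Illustrative (invented) EBFs of three factor classes covering one pixel
pixel_ebfs = [(0.5, 0.2, 0.3), (0.4, 0.1, 0.5), (0.2, 0.1, 0.7)]
bel, dis, unc = combine_all(pixel_ebfs)
pls = 1.0 - dis   # plausibility, since Dis = 1 - Pls
print(round(bel, 3), round(dis, 3), round(unc, 3), round(pls, 3))
```

Because each input triple sums to 1, the combined Bel, Dis and Unc also sum to 1, which gives a quick consistency check on any implementation.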



Table 1 Values of fuzzy membership and evidential belief functions for classes of landslide conditioning factors.

| Data layer | Class | Number of class pixels | Landslide pixels | Frequency ratio | Fuzzy value | Bel | Dis | Unc |
|---|---|---|---|---|---|---|---|---|
| Slope (°) | 0–10 | 4,919,804 | 2 | 0.007 | 0.102 | 0.001 | 0.281 | 0.718 |
| | 10–20 | 3,346,950 | 222 | 1.114 | 0.435 | 0.194 | 0.153 | 0.653 |
| | 20–30 | 2,326,636 | 368 | 2.657 | 0.900 | 0.462 | 0.093 | 0.444 |
| | 30–40 | 785,451 | 92 | 1.968 | 0.692 | 0.342 | 0.149 | 0.508 |
| | 40–50 | 106,715 | 0 | 0.000 | 0.100 | 0.000 | 0.000 | 1.000 |
| | >50 | 4750 | 0 | 0.000 | 0.100 | 0.000 | 0.000 | 1.000 |
| Aspect (°) | Flat (−1) | 6556 | 0 | 0.000 | 0.100 | 0.000 | 0.000 | 1.000 |
| | North (0–22.5; 337.5–360) | 1,380,854 | 27 | 0.328 | 0.251 | 0.042 | 0.121 | 0.836 |
| | Northeast (22.5–67.5) | 1,672,941 | 82 | 0.823 | 0.478 | 0.106 | 0.115 | 0.780 |
| | East (67.5–112.5) | 1,385,498 | 44 | 0.533 | 0.345 | 0.068 | 0.118 | 0.813 |
| | Southeast (112.5–157.5) | 1,383,072 | 106 | 1.287 | 0.691 | 0.165 | 0.107 | 0.728 |
| | South (157.5–202.5) | 1,482,483 | 138 | 1.564 | 0.818 | 0.201 | 0.102 | 0.697 |
| | Southwest (202.5–247.5) | 1,677,042 | 174 | 1.743 | 0.900 | 0.224 | 0.097 | 0.679 |
| | West (247.5–292.5) | 1,299,469 | 65 | 0.840 | 0.486 | 0.108 | 0.113 | 0.779 |
| | Northwest (292.5–337.5) | 1,202,391 | 48 | 0.671 | 0.408 | 0.086 | 0.115 | 0.798 |
| Relief amplitude (m) | 0–50 | 3,101,843 | 6 | 0.032 | 0.100 | 0.005 | 0.225 | 0.770 |
| | 50–100 | 2,753,898 | 228 | 1.391 | 0.843 | 0.214 | 0.146 | 0.641 |
| | 100–150 | 2,640,812 | 235 | 1.495 | 0.900 | 0.230 | 0.141 | 0.629 |
| | 150–200 | 1,694,502 | 126 | 1.249 | 0.766 | 0.192 | 0.159 | 0.649 |
| | 200–250 | 811,634 | 53 | 1.097 | 0.682 | 0.169 | 0.165 | 0.667 |
| | 250–532 | 487,617 | 36 | 1.240 | 0.761 | 0.191 | 0.164 | 0.645 |
| Lithology | Group 1 | 468,851 | 36 | 1.290 | 0.728 | 0.196 | 0.141 | 0.663 |
| | Group 2 | 4,552,855 | 264 | 0.974 | 0.574 | 0.148 | 0.145 | 0.707 |
| | Group 3 | 3,740,521 | 183 | 0.822 | 0.500 | 0.125 | 0.155 | 0.720 |
| | Group 4 | 1,338,571 | 131 | 1.644 | 0.900 | 0.250 | 0.130 | 0.619 |
| | Group 5 | 135,801 | 0 | 0.000 | 0.100 | 0.000 | 0.000 | 1.000 |
| | Group 6 | 645,785 | 59 | 1.535 | 0.847 | 0.234 | 0.138 | 0.629 |
| | Group 7 | 607,922 | 11 | 0.304 | 0.248 | 0.046 | 0.148 | 0.806 |
| Land use | Populated Area | 864,840 | 109 | 2.117 | 0.900 | 0.215 | 0.075 | 0.709 |
| | Orchard Land | 426,524 | 3 | 0.118 | 0.145 | 0.012 | 0.086 | 0.902 |
| | Paddy Land | 1,053,442 | 27 | 0.431 | 0.263 | 0.044 | 0.088 | 0.868 |
| | Protective Forest Land | 985,889 | 120 | 2.045 | 0.873 | 0.208 | 0.075 | 0.717 |
| | Natural Forest Land | 3,666,190 | 119 | 0.545 | 0.306 | 0.055 | 0.101 | 0.844 |
| | Productive Forest Land | 1,347,068 | 158 | 1.970 | 0.845 | 0.200 | 0.072 | 0.727 |
| | Water | 455,666 | 1 | 0.037 | 0.114 | 0.004 | 0.086 | 0.910 |
| | Annual Crop Land | 184,205 | 2 | 0.182 | 0.169 | 0.019 | 0.084 | 0.897 |
| | Non Forest Rocky Mountain | 468,692 | 42 | 1.505 | 0.669 | 0.153 | 0.081 | 0.766 |
| | Barren Land | 1,947,114 | 103 | 0.889 | 0.436 | 0.090 | 0.085 | 0.825 |
| | Specially Used Forest Land | 41,129 | 0 | 0.000 | 0.100 | 0.000 | 0.000 | 1.000 |
| | Grass Land | 49,547 | 0 | 0.000 | 0.100 | 0.000 | 0.000 | 1.000 |
| Soil type | Eutric Fluvisols | 400,659 | 45 | 1.887 | 0.778 | 0.210 | 0.075 | 0.715 |
| | Degraded soil | 3006 | 0 | 0.000 | 0.100 | 0.000 | 0.077 | 0.923 |
| | Limestone Mountain | 1,657,233 | 104 | 1.054 | 0.479 | 0.118 | 0.076 | 0.806 |
| | Ferralic Acrisols | 4,196,906 | 294 | 1.177 | 0.523 | 0.131 | 0.069 | 0.800 |
| | Rhodic Ferralsols | 1,031,126 | 30 | 0.489 | 0.276 | 0.054 | 0.081 | 0.865 |
| | Humic Acrisols | 3,551,123 | 199 | 0.941 | 0.438 | 0.105 | 0.079 | 0.816 |
| | Dystric Fluvisols | 84,133 | 6 | 1.198 | 0.531 | 0.134 | 0.077 | 0.790 |
| | Dystric Gleysols | 45,288 | 6 | 2.226 | 0.900 | 0.248 | 0.077 | 0.675 |
| | Luvisols | 52,858 | 0 | 0.000 | 0.100 | 0.000 | 0.000 | 1.000 |
| | Humic Ferralsols | 131,881 | 0 | 0.000 | 0.100 | 0.000 | 0.000 | 1.000 |
| | Populated Area | 50,440 | 0 | 0.000 | 0.100 | 0.000 | 0.000 | 1.000 |
| | Water | 276,610 | 0 | 0.000 | 0.100 | 0.000 | 0.000 | 1.000 |
| | Gley Fluvisols | 9043 | 0 | 0.000 | 0.100 | 0.000 | 0.000 | 1.000 |
| Distance to roads (m) | 0–40 | 160,806 | 306 | 31.966 | 0.900 | 0.698 | 0.035 | 0.267 |
| | 40–80 | 193,382 | 126 | 10.945 | 0.368 | 0.239 | 0.052 | 0.710 |
| | 80–120 | 215,495 | 33 | 2.572 | 0.157 | 0.056 | 0.060 | 0.884 |
| | >120 | 10,920,623 | 219 | 0.337 | 0.100 | 0.007 | 0.853 | 0.139 |
| Distance to rivers (m) | 0–40 | 443,909 | 79 | 2.990 | 0.900 | 0.368 | 0.177 | 0.455 |
| | 40–80 | 519,168 | 74 | 2.394 | 0.684 | 0.295 | 0.179 | 0.526 |
| | 80–120 | 553,405 | 64 | 1.943 | 0.520 | 0.239 | 0.183 | 0.578 |
| | >120 | 9,973,824 | 467 | 0.787 | 0.100 | 0.097 | 0.461 | 0.442 |
| Distance to faults (m) | 0–200 | 2,078,812 | 179 | 1.446 | 0.607 | 0.260 | 0.179 | 0.562 |
| | 200–400 | 1,832,167 | 106 | 0.972 | 0.336 | 0.175 | 0.199 | 0.626 |
| | 400–700 | 2,285,798 | 86 | 0.632 | 0.143 | 0.113 | 0.216 | 0.670 |
| | 700–1000 | 1,644,799 | 192 | 1.961 | 0.900 | 0.352 | 0.166 | 0.481 |
| | >1000 | 3,648,730 | 121 | 0.557 | 0.100 | 0.100 | 0.239 | 0.661 |

Bel, Dis and Unc are the evidential belief functions (EBF).

In the aspect map, high Bel and low Dis values for south- and southwest-facing slopes indicate that these categories have positive spatial associations with landslides. They are followed by the southeast, west and northeast categories. In the case of the relief amplitude factor, the classes 50–100 m, 100–150 m, and 200–250 m have high Bel and low Dis values, indicating high probability of landslide occurrence. Values of Bel for the remaining categories are relatively low and indicate low probability of landslides. For lithology, there are high Bel and low Dis values for group 4 (mafic–ultramafic magmatic rocks), group 6 (metamorphic rocks rich in aluminosilicate components) and group 1 (Quaternary deposits). This suggests a higher probability of landslide occurrence than in other lithologies. Based on the interpretation of Bel and Dis values for the landuse factor, there is high probability of landslides for populated areas (PO), protective forest land (PT), productive forest land (PD), and non-forest rocky mountains (RM). The high probability of landslides in these landuse categories is due to very intensive clearcut logging and the increase in inappropriate new highland settlements caused by population growth during the last 15 years. For the soil type factor, the highest Bel value with a low Dis value is for the dystric gleysols (DG), followed by eutric fluvisols (EF), dystric fluvisols (DF), ferralic acrisols (FA), limestone mountain (LM), and humic acrisols (HA), indicating high probability of landslides. The low Bel and high Dis values for the remaining classes indicate that the probability of landslides is low. For distance to roads and distance to rivers, the Bel and Dis values show that as the distance from roads or rivers increases, the probability of landslides decreases. The highest probability of landslide occurrence is within distances of less than 40 m. There is very low probability of landslide occurrence at distances >120 m. For distance to faults, the Bel and Dis values show that as the distance to faults decreases, the probability of landslide occurrence increases.

Fig. 3. Integrated EBF map: (a) Belief; (b) Disbelief; (c) Uncertainty; and (d) Plausibility.

The integrated results are shown in Fig. 3. Comparison between the belief map (Fig. 3a) and the disbelief map (Fig. 3b) shows that belief values are high for areas where disbelief values are low and vice

versa. This indicates high landslide susceptibility in areas where there is a high degree of belief and a low degree of disbelief in landslide occurrence. The pixel values in the integrated belief function map are used as the landslide susceptibility index values in this study. The uncertainty map (Fig. 3c) shows where information is lacking and the evidence for landslide occurrence is therefore uncertain. The uncertainty is the difference between plausibility and support (belief). The high uncertainty values are in areas where belief values are low. The plausibility map (Fig. 3d) shows high values for areas where both belief and uncertainty values are high.

3.2. Fuzzy logic model

Fuzzy logic, which was introduced by Zadeh (1965), is a way of mapping an input space to an output space by using a list of if-then rules. A fuzzy logic system provides a systematic calculation for processing knowledge that is uncertain, imprecise or incomplete. The difference between classical logic and fuzzy logic is that classical logic gives an output of either 1 or 0, while fuzzy logic allows objects to be partially true or partially false, corresponding to membership function values that can take any value between 0 and 1 (Gottwald, 1995). A fuzzy set contains elements that have no crisp membership values and where there is no clearly defined boundary (Uzkent et


al., 2011). Fuzzy set theory allows an element to be a member of one fuzzy set and also a member of other fuzzy sets with different degrees of membership (Ross, 2004). Assume that Ci (i = 1, 2, 3, …, n) is a set of landslide conditioning factors and that Cij represents the j-th factor class of Ci; then μ(Cij) ∈ [0, 1] is the corresponding membership value in a landslide conditioning factor. The value 0 indicates non-membership and 1 indicates full membership. Fuzzy set theory does not tell us how to specify fuzzy membership functions and it does not require that the sum of all fuzzy membership values in a set equal 1 (Singpurwalla and Booker, 2004). A fuzzy membership function can provide a vehicle for developing operations with fuzzy sets in landslide modeling. Therefore, determination of fuzzy membership values is considered to be the most important task. Various methods have been proposed for assigning fuzzy membership values to elements, and they can be classified into four groups: standard membership functions, problem-specific membership functions, specification of standard membership functions, and fuzzy clustering (Robinson, 2003). In general, these methods are knowledge-based approaches, data-driven approaches, or a combination of both. In knowledge-based approaches for landslide studies, fuzzy membership values are assigned to each factor class mainly on the basis of expert knowledge, whereas in data-driven approaches fuzzy membership values are determined according to correlations between landslide locations and landslide conditioning factors (Champati ray et al., 2007). Many data-driven methods have been developed for determining fuzzy membership values; however, the cosine amplitude (Ercanoglu and Gokceoglu, 2004; Kanungo et al., 2008, 2009; Shujun et al., 2006) and the frequency ratio methods (Lee, 2007a; Pradhan, 2010a, 2010b, 2011b) are the most widely used in landslide modeling.
In the cosine amplitude method, the fuzzy membership value, rij, for each factor class can be estimated as the ratio of the total number of landslide grid cells over the square root of the product of the total number of pixels in that factor class and the total number of landslide grid cells in the study area (Kanungo et al., 2009). Values of rij closer to 1 indicate a strong relationship between landslide occurrence and the factor class. The cosine amplitude method, however, is not suitable for this study area because the ratio of the number of landslide grid cells to the number of pixels in the factor class is too small. In the frequency ratio method, the frequency ratio for each factor class is calculated first using Eq. (8).

$$ FR_{ij} = \frac{N(L \cap C_{ij}) / N(L)}{N(C_{ij}) / N(C)} \qquad (8) $$

A ratio greater than 1 indicates high probability for landslides in the factor class, while a ratio less than 1 indicates low probability. Since fuzzy membership values are in the range from 0 to 1, the next step is the normalization process to transform the frequency ratio into fuzzy membership values by using the Max-Min normalization procedure as:

$$ \mu(C_{ij}) = \frac{FR_{ij} - \mathrm{Min}(FR_{ij})}{\mathrm{Max}(FR_{ij}) - \mathrm{Min}(FR_{ij})} \left[ \mathrm{Max}(\mu(C_{ij})) - \mathrm{Min}(\mu(C_{ij})) \right] + \mathrm{Min}(\mu(C_{ij})) \qquad (9) $$

where μ(Cij) is the fuzzy membership value; Max(μ(Cij)) and Min(μ(Cij)) are the upper and lower normalization bounds. In this study, the frequency ratio was calculated for each class of the landslide conditioning factors using the landslide grid cells in the training dataset. Then, the fuzzy membership values were calculated by normalizing the ratio values into the range of 0.1 to 0.9 (Pradhan, 2011a). The results are shown in Table 1. In order to calculate landslide susceptibility index values, membership values of the factor classes were combined using the fuzzy operator method. This method provides various tools to combine different datasets (Champati ray et al., 2007). Three fuzzy operators (An et al., 1991; Bonham-Carter, 1994; Robinson, 2003; Zimmermann, 1991) were used in this study: fuzzy PRODUCT, fuzzy SUM, and fuzzy GAMMA. Fuzzy PRODUCT is defined as:

$$ \mu_{\mathrm{PRODUCT}} = \prod_{i=1}^{n} \mu(C_{ij}) \qquad (10) $$

Fuzzy SUM is defined as:

$$ \mu_{\mathrm{SUM}} = 1 - \prod_{i=1}^{n} \left[ 1 - \mu(C_{ij}) \right] \qquad (11) $$

Fuzzy GAMMA is defined as:

$$ \mu_{\mathrm{GAMMA}} = (\mathrm{Fuzzy\ SUM})^{\lambda} \cdot (\mathrm{Fuzzy\ PRODUCT})^{1-\lambda} \qquad (12) $$

Fig. 4. Success-rate curves of the EBF and fuzzy logic prediction models.

Fig. 5. Prediction-rate curves of the EBF and fuzzy logic prediction models.

Table 2 Statistics of landslide susceptibility index (LSI) values in each of the ten landslide prediction models.

| No | Landslide susceptibility model | Min | Max | Mean | StD |
|---|---|---|---|---|---|
| 1 | Evidential Belief Functions | 0.285 | 3.238 | 1.100 | 0.330 |
| 2 | Fuzzy SUM | 0.735 | 1.000 | 0.990 | 0.014 |
| 3 | Fuzzy PRODUCT | 0.000 | 0.188 | 0.001 | 0.005 |
| 4 | Fuzzy GAMMA (λ = 0.1) | 0.000 | 0.222 | 0.002 | 0.007 |
| 5 | Fuzzy GAMMA (λ = 0.3) | 0.000 | 0.310 | 0.006 | 0.013 |
| 6 | Fuzzy GAMMA (λ = 0.5) | 0.000 | 0.443 | 0.021 | 0.028 |
| 7 | Fuzzy GAMMA (λ = 0.7) | 0.003 | 0.606 | 0.085 | 0.062 |
| 8 | Fuzzy GAMMA (λ = 0.9) | 0.113 | 0.846 | 0.418 | 0.099 |
| 9 | Fuzzy GAMMA (λ = 0.95) | 0.288 | 0.920 | 0.641 | 0.077 |
| 10 | Fuzzy GAMMA (λ = 0.975) | 0.460 | 0.959 | 0.798 | 0.051 |
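The chain from frequency ratio to fuzzy membership and on to the three fuzzy operators (Eqs. (8)–(12)) can be sketched in a few lines of Python. This is an illustrative sketch, not the implementation used in the study: the function names are hypothetical, the normalization bounds 0.1 and 0.9 follow the text, the slope counts come from Table 1, and the three membership values combined at the end are an assumed example of one pixel falling into three factor classes.

```python
import math


def frequency_ratio(class_pixels, landslide_pixels):
    """Eq. (8): share of landslide pixels in a class over share of area."""
    n_c = sum(class_pixels)        # N(C)
    n_l = sum(landslide_pixels)    # N(L)
    return [(l / n_l) / (c / n_c)
            for c, l in zip(class_pixels, landslide_pixels)]


def fuzzy_membership(ratios, lower=0.1, upper=0.9):
    """Eq. (9): Max-Min normalization of frequency ratios into [lower, upper]."""
    fr_min, fr_max = min(ratios), max(ratios)
    return [(fr - fr_min) / (fr_max - fr_min) * (upper - lower) + lower
            for fr in ratios]


def fuzzy_product(memberships):
    """Eq. (10)."""
    return math.prod(memberships)


def fuzzy_sum(memberships):
    """Eq. (11)."""
    return 1.0 - math.prod(1.0 - m for m in memberships)


def fuzzy_gamma(memberships, lam):
    """Eq. (12): compromise between fuzzy SUM and fuzzy PRODUCT."""
    return fuzzy_sum(memberships) ** lam * fuzzy_product(memberships) ** (1.0 - lam)


# Slope classes 0-10, 10-20, 20-30, 30-40, 40-50, >50 degrees (Table 1)
slope_pixels = [4919804, 3346950, 2326636, 785451, 106715, 4750]
slope_landslides = [2, 222, 368, 92, 0, 0]
slope_fr = frequency_ratio(slope_pixels, slope_landslides)
slope_mu = fuzzy_membership(slope_fr)
print([round(v, 3) for v in slope_fr])
print([round(v, 3) for v in slope_mu])

# Assumed example: memberships of the classes that overlap at one pixel
pixel_mu = [0.9, 0.818, 0.843]
print(fuzzy_product(pixel_mu), fuzzy_sum(pixel_mu), fuzzy_gamma(pixel_mu, 0.975))
```

For any λ between 0 and 1 the GAMMA value lies between the PRODUCT and SUM values, which is the compromise behaviour that the choice of λ controls.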

Using the three fuzzy operators (Eqs. (10)–(12)), the landslide susceptibility index values were calculated by combining fuzzy membership values of all factor classes. In the case of fuzzy GAMMA, values of lambda (λ) are chosen in the range of [0, 1]. The gamma operator enables a compromise between the increasing tendency of the fuzzy SUM and the decreasing effect of the fuzzy PRODUCT (Malins and Metternicht, 2006). In this study, the fuzzy GAMMA with seven values of λ (0.1, 0.3, 0.5, 0.7, 0.9, 0.95, and 0.975), the fuzzy PRODUCT, and the fuzzy SUM were used to generate landslide susceptibility index values (i.e., nine susceptibility cases). The statistics of the landslide susceptibility index values for all landslide models are shown in Table 2.

4. Validation and comparison of landslide susceptibility models

Validation of predictive landslide susceptibility maps is an essential component of landslide modeling. Without validation, prediction models have no scientific significance (Chung and Fabbri, 2003). Using the success-rate and prediction-rate methods, the ten landslide susceptibility maps were validated by comparing them with known landslide locations. The success-rate results were obtained by comparing the landslide grid cells in the training dataset (684 landslide grid cells) with each of the ten landslide susceptibility maps obtained from the fuzzy logic and EBF models. The success-rate curve for each case was obtained by varying the decision threshold and plotting the respective sensitivities against the total proportions of the classified data set (Brenning, 2005). Subsequently, the areas under the success-rate curves (AUC) were calculated for all cases. The results show that all the models have a good fit (Fig. 4). The EBF model had the highest AUC (0.9350) and the fuzzy SUM model had the lowest AUC (0.8965).
The remaining models have almost equal values of AUC (Fig. 4). This indicates that the capability for correctly classifying the areas with existing landslides is highest for the EBF model, lowest for the fuzzy SUM model, and almost equal for the remaining models. Since the success rate measures the goodness of fit of the landslide models to the training data, it is not a suitable method for measuring the prediction capability of the landslide models. The prediction rate can provide the validation and explains how well the model and landslide conditioning factors predict the existing landslides (Chung and Fabbri, 2003; Lee, 2007b; Lee et al., 2003; Pradhan and Lee, 2010c). In this study, the prediction capabilities of the ten landslide models were assessed by comparing the landslide grid cells in the validation dataset (315 landslide grid cells) with the landslide susceptibility maps using the prediction-rate method. Fig. 5 shows ten prediction-rate curves for the ten landslide susceptibility models. In order to compare the accuracy of the ten landslide models quantitatively, the areas under the prediction-rate curves (AUC) were calculated. An AUC equal to 1 indicates perfect prediction accuracy (Lee and Dan, 2005). The results show that values of AUC for the ten models vary from 0.9185 to 0.9370, indicating that all the models have a reasonably good prediction capability. The EBF model has the highest prediction capability. The fuzzy SUM model has the lowest prediction capability. The remaining models, with almost equal prediction capabilities, are intermediate between the EBF and fuzzy SUM models.

Landslide susceptibility index values were visualized by means of five susceptibility levels (very high, high, moderate, low, very low). There are many methods for classifying susceptibility index values, such as the equal interval method, the natural break method and the standard deviation method. In this study, the equal area classification method (Pradhan and Lee, 2010a, 2010b, 2010c) was used. By overlaying all landslide grid cells with the ten landslide susceptibility maps, cumulative percentages of observed landslide occurrence against landslide susceptibility index values were calculated and are shown in Fig. 6. Then, based on percentage of area, five susceptibility classes were determined as very high (10%), high (10%), moderate (20%), low (20%) and very low (40%). The relative frequency ratio analysis was performed on the classification results and landslide location data (Sarkar and Kanungo, 2004) by overlaying the five landslide susceptibility zones with the landslide inventory map. Ideally, the frequency ratio value should increase from very low to very high susceptibility zones, since the high and very high zones are generally more prone to landslides than other zones (Pradhan and Lee, 2010b, 2010c; Pradhan et al., 2010b; Sarkar and Kanungo, 2004). The graph of the relative frequencies of areas with the five landslide susceptibility zones for the ten landslide models (Fig. 7) shows that there is a gradual increase in landslide frequency from the very low susceptibility zone to the very high susceptibility zone. In general, the EBF model performs better than the other models. For visualization purposes, only two landslide susceptibility maps, those of the EBF and fuzzy GAMMA (λ = 0.975) models, are shown (Fig. 8). Characteristics of the five susceptibility zones for the landslide models are shown in Table 3. The percentage of existing landslide pixels that fell into the high and very high susceptibility classes is lowest with the fuzzy SUM model (81.7%) and highest with the EBF model (88.49%).

Fig. 6. Cumulative percentages of observed landslide occurrence according to landslide susceptibility index values.

Fig. 7. Frequency ratio plots of five landslide susceptibility classes of the ten prediction models.
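The rate-curve and zonation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure code used in the study: the names `rate_curve_auc` and `zone_frequency_ratios` are hypothetical, cells are ranked from highest to lowest susceptibility index with ties kept in input order, the area under the curve is obtained by the trapezoid rule, and the ten-cell data set is invented. Feeding training landslides gives a success-rate AUC, while feeding validation landslides gives a prediction-rate AUC.

```python
# Zone names with their share of the study area, from highest to lowest index
ZONES = [("very high", 0.10), ("high", 0.10), ("moderate", 0.20),
         ("low", 0.20), ("very low", 0.40)]


def rate_curve_auc(lsi, is_landslide):
    """Area under the curve of cumulative landslide share vs. cumulative area."""
    order = sorted(range(len(lsi)), key=lambda k: lsi[k], reverse=True)
    n_cells, n_slides = len(lsi), sum(is_landslide)
    auc, x_prev, y_prev, hits = 0.0, 0.0, 0.0, 0
    for rank, k in enumerate(order, start=1):
        hits += is_landslide[k]
        x, y = rank / n_cells, hits / n_slides
        auc += (x - x_prev) * (y + y_prev) / 2.0   # trapezoid rule
        x_prev, y_prev = x, y
    return auc


def zone_frequency_ratios(lsi, is_landslide):
    """Frequency ratio (landslide share / area share) per susceptibility zone."""
    order = sorted(range(len(lsi)), key=lambda k: lsi[k], reverse=True)
    n_cells, n_slides = len(lsi), sum(is_landslide)
    ratios, start, cumulative = {}, 0, 0.0
    for name, share in ZONES:
        cumulative += share
        end = round(cumulative * n_cells)
        cells = order[start:end]
        if cells:
            slide_share = sum(is_landslide[k] for k in cells) / n_slides
            ratios[name] = slide_share / (len(cells) / n_cells)
        else:
            ratios[name] = float("nan")   # zone too small to hold any cell
        start = end
    return ratios


# Invented toy data: ten cells, four of which contain landslides
toy_lsi = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
toy_slides = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(rate_curve_auc(toy_lsi, toy_slides))
print(zone_frequency_ratios(toy_lsi, toy_slides))
```

With this definition, the share of landslides falling in a zone equals its frequency ratio multiplied by its area share, which is how the percentages quoted for the high and very high classes relate to the frequency ratios in Table 3.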



Fig. 8. Landslide susceptibility zonation maps: (a) EBF model and (b) fuzzy GAMMA (0.975) model.

5. Discussions and conclusion

In this study, fuzzy logic and evidential belief functions were used for landslide susceptibility mapping in the Hoa Binh province of Vietnam. The resulting maps represent spatial predictions of landslide hazards; they do not forecast “when” and “how frequently” a landslide will occur. Ten landslide susceptibility maps were prepared. The validation shows that both the EBF model and the nine fuzzy logic models have high prediction capabilities, with the best being the EBF model. Among the fuzzy logic models, the fuzzy SUM model has the lowest prediction capacity (AUC equal to 0.9185); the remaining fuzzy logic models have almost equal prediction capacity (AUC around 0.9265). Many methods and techniques for landslide susceptibility assessment have been proposed so far. However, it is important to note that simpler procedures and techniques with high accuracy give better landslide models. The results of this study showed that both EBF and fuzzy logic models are simple, cost-effective, and easy to apply, with high prediction capability. For the EBF model, four maps (belief map, disbelief map, uncertainty map, and plausibility map) are presented. These maps can



Table 3 Characteristics of the five susceptibility classes in each of the ten landslide prediction models. Values under the model columns are frequency ratios.

| LSZ | PA | Fuzzy GAMMA (0.975) | Fuzzy GAMMA (0.950) | Fuzzy GAMMA (0.900) | Fuzzy GAMMA (0.700) | Fuzzy GAMMA (0.500) | Fuzzy GAMMA (0.300) | Fuzzy GAMMA (0.100) | Fuzzy PRODUCT | Fuzzy SUM | EBF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| VL | 40.0 | 0.018 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 | 0.015 |
| L | 20.0 | 0.135 | 0.117 | 0.117 | 0.117 | 0.117 | 0.117 | 0.117 | 0.117 | 0.204 | 0.135 |
| M | 20.0 | 0.511 | 0.585 | 0.585 | 0.592 | 0.592 | 0.592 | 0.592 | 0.592 | 0.702 | 0.361 |
| H | 10.0 | 1.101 | 1.462 | 1.461 | 1.447 | 1.447 | 1.447 | 1.447 | 1.447 | 1.623 | 0.851 |
| VH | 10.0 | 7.538 | 7.120 | 7.120 | 7.120 | 7.120 | 7.120 | 7.120 | 7.120 | 6.550 | 8.098 |

LSZ: Landslide susceptibility zonation; PA: Percentage of area.

give meaningful interpretations for landslide susceptibility. Landslides in the study area are strongly correlated with many factors. Using the EBF model, the quantitative relationships between landslide occurrence and the nine landslide conditioning factors (slope, aspect, relief amplitude, lithology, landuse, soil types, distance to roads, distance to rivers and distance to faults) were assessed. The results show that slope angles between 10° and 40° have the highest susceptibility to landslides. In the case of the aspect factor, three slope orientations (southeast, south and southwest) have high susceptibility to landslides. For the relief amplitude factor, areas with relief amplitudes from 50 to 150 m have high susceptibility to landslides. For lithology, mafic–ultramafic rocks and metamorphic rocks rich in aluminosilicate components yield high susceptibility to landslides. Populated areas in the landuse factor and ferralic acrisols in the soil type factor showed high susceptibility to landslides. In the distance-to-roads and distance-to-rivers maps, the range of 0–40 m has the highest susceptibility to landslides. High susceptibility to landslides also exists for the 0–200 m class of distance to faults. In the case of the fuzzy logic modeling, the selection of the method for determining fuzzy membership function values plays an important role and influences the final result. In this study, the frequency ratio method was used to derive fuzzy membership values to eliminate the subjectivity of assigning such values. Three fuzzy operators (fuzzy SUM, fuzzy PRODUCT, and fuzzy GAMMA with different lambda values) were used to generate nine landslide susceptibility models. The validation result shows that the selection of fuzzy operator affects the quality of the resulting landslide model.
In the case of the fuzzy GAMMA, AUC values in the success-rate and prediction-rate curves show that there are no significant differences between landslide models with different lambda values in this case study. The fuzzy logic and the evidential belief functions are generally considered useful for regional-scale landslide susceptibility mapping, such as the present study. The results and findings of this study can help developers, planners, and engineers in slope management and landuse planning. Since the output maps are regional-scale, they may be less useful for a site-specific development that requires large-scale maps.

Acknowledgements The authors would like to thank Dr. Emmanuel John M. Carranza, Prof. Isik Yilmaz and two anonymous reviewers for their valuable and constructive comments on the earlier version of the manuscript. This research was funded by the Norwegian Quota scholarship. The data analysis and write-up were carried out as a part of the first author's PhD studies at the Geomatics Section, Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, Norway.

References Akgun, A., 2012. A comparison of landslide susceptibility maps produced by logistic regression, multi-criteria decision, and likelihood ratio methods: a case study at İzmir, Turkey. Landslides 9, 93–106. Akgun, A., Sezer, E.A., Nefeslioglu, H.A., Gokceoglu, C., Pradhan, B., 2012. An easy-to-use MATLAB program (MamLand) for the assessment of landslide susceptibility using a Mamdani fuzzy algorithm. Computers & Geosciences 38, 23–34. Aleotti, P., Chowdhury, R., 1999. Landslide hazard assessment: summary review and new perspectives. Bulletin of Engineering Geology and the Environment 58, 21–44. Alkema, D., 2010. Geo-information technology for hazard risk assessment. A case study site in Yen Bai (Vietnam), The International Institute for Geo-Information Science and Earth Observation (ITC), University Twente. Althuwaynee, O.F., Pradhan, B., Lee, S., 2012. Application of an evidential belief function model in landslide susceptibility mapping. Computers & Geosciences, http://dx.doi.org/10.1016/j.cageo.2012.03.003. An, P., Moon, W.M., Rencz, A., 1991. Application of fuzzy set theory to integrated mineral exploration. Canadian Journal of Exploration Geophysics 27, 1–11. An, P., Moon, W.M., Bonhamcarter, G.F., 1992. On knowledge-based approach on integrating remote sensing, geophysical and geological information. International Space Year: Space Remote Sensing 1 and 2, 34–38. Arıkan, F., Ulusay, R., Aydın, N., 2007. Characterization of weathered acidic volcanic rocks and a weathering classification based on a rating system. Bulletin of Engineering Geology and the Environment 66, 415–430. Awasthi, A., Chauhan, S.S., 2011. Using AHP and Dempster-Shafer theory for evaluating sustainable transport solutions. Environmental Modelling & Software 26, 787–796. Ballabio, C., Sterlacchini, S., 2012. Support vector machines for landslide susceptibility mapping: The Staffora River Basin case study, Italy. Mathematical Geosciences 44, 47–70. 
Binaghi, E., Luzi, L., Madella, P., Pergalani, F., Rampini, A., 1998. Slope instability zonation: a comparison between certainty factor and fuzzy Dempster–Shafer approaches. Natural Hazards 17, 77–97. Biswajeet, P., Saied, P., 2010. Comparison between prediction capabilities of neural network and fuzzy logic Techniques for Landslide Susceptibility Mapping. Disaster Advances 3, 26–34. Bonham-Carter, G.F., 1994. Geographic Information Systems for Geoscientists: Modelling with GIS. Pegamon Press. Brenning, A., 2005. Spatial prediction models for landslide hazards: review, comparison and evaluation. Natural Hazards and Earth System Sciences 5, 853–862. Caniani, D., Pascale, S., Sdao, F., Sole, A., 2007. Neural networks and landslide susceptibility: a case study of the urban area of Potenza. Natural Hazards 45, 55–72. Carranza, E.J.M., 2009. Controls on mineral deposit occurrence inferred from analysis of their spatial pattern and spatial association with geological features. Ore Geology Reviews 35, 383–400. Carranza, E.J.M., Castro, O., 2006. Predicting lahar-inundation zones: case study in West Mount Pinatubo, Philippines. Natural Hazards 37, 331–372. Carranza, E.J.M., Hale, M., 2001. Geologically constrained fuzzy mapping of gold mineralization potential, Baguio District, Philippines. Natural Resources Research 10, 125–136. Carranza, E.J.M., Hale, M., 2002. Evidential belief functions for data-driven geologically constrained mapping of gold potential, Baguio district, Philippines. Ore Geology Reviews 22, 117–132. Carranza, E.J.M., Sadeghi, M., 2010. Predictive mapping of prospectivity and quantitative estimation of undiscovered VMS deposits in Skellefte district (Sweden). Ore Geology Reviews 38, 219–241. Carranza, E.J.M., Woldai, T., Chikambwe, E.M., 2005. Application of data-driven evidential belief functions to prospectivity mapping for aquamarine-bearing pegmatites, Lundazi District, Zambia. Natural Resources Research 14, 47–63. 
Carranza, E.J.M., Hale, M., Faassen, C., 2008a. Selection of coherent deposit-type locations and their application in data-driven mineral prospectivity mapping. Ore Geology Reviews 33, 536–558. Carranza, E.J.M., van Ruitenbeek, F.J.A., Hecker, C., van der Meijde, M., van der Meer, F.D., 2008b. Knowledge-guided data-driven evidential belief modeling of mineral prospectivity in Cabo de Gata, SE Spain. International Journal of Applied Earth Observation and Geoinformation 10, 374–387.

Author's personal copy D. Tien Bui et al. / Catena 96 (2012) 28–40 Carranza, E.J.M., Wibowo, H., Barritt, S.D., Sumintadireja, P., 2008c. Spatial data analysis and integration for regional-scale geothermal potential mapping, West Java, Indonesia. Geothermics 37, 267–299. Carranza, E.J.M., Owusu, E., Hale, M., 2009. Mapping of prospectivity and estimation of number of undiscovered prospects for lode gold, southwestern Ashanti Belt, Ghana. Mineralium Deposita 44, 915–938. Carrara, A., Pike, R.J., 2008. GIS technology and models for assessing landslide hazard and risk. Geomorphology 94, 257–260. Chacon, J., Irigaray, C., Fernandez, T., El Hamdouni, R., 2006. Engineering geology maps: landslides and geographical information systems. Bulletin of Engineering Geology and the Environment 65, 341–411. Champati ray, P.K.C., Dimri, S., Lakhera, R.C., Sati, S., 2007. Fuzzy-based method for landslide hazard assessment in active seismic zone of Himalaya. Landslides 4, 101–111. Cheng, Q., Agterberg, F.P., 1999. Fuzzy weights of evidence method and its application in mineral potential mapping. Natural Resources Research 8, 27–35. Chung, C.J.F., Fabbri, A.G., 2003. Validation of spatial prediction models for landslide hazard mapping. Natural Hazards 30, 451–472. Dempster, A.P., 1967. Upper and lower probabilities induced by a multi-valued mapping. Annals of Mathematical Statistics 325–339. Dempster, A.P., 1968. A generalisation of Bayesian inference. Journal of the Royal Statistical Society 205–247. Ercanoglu, M., Gokceoglu, C., 2004. Use of fuzzy relations to produce landslide susceptibility map of a landslide prone area (West Black Sea Region, Turkey). Engineering Geology 75, 229–250. Ghosh, S., Carranza, E.J.M., 2010. Spatial analysis of mutual fault/fracture and slope controls on rocksliding in Darjeeling Himalaya, India. Geomorphology 122, 1–24. Gorsevski, P.V., Gessler, P.E., Jankowski, P., 2003. 
Integrating a fuzzy k-means classification and a Bayesian approach for spatial prediction of landslide hazard. Journal of Geographical Systems 5, 223–251. Gottwald, S., 1995. An approach to handle partially sound rules of inference. In: Bouchon-Meunier, B., Yager, R., Zadeh, L. (Eds.), Advances in Intelligent Computing — IPMU '94. Springer, Berlin, pp. 380–388. Guzzetti, F., Carrara, A., Cardinali, M., Reichenbach, P., 1999. Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, Central Italy. Geomorphology 31, 181–216. Guzzetti, F., Reichenbach, P., Cardinali, M., Galli, M., Ardizzone, F., 2005. Probabilistic landslide hazard assessment at the basin scale. Geomorphology 72, 272–299. Guzzetti, F., Reichenbach, P., Ardizzone, F., Cardinali, M., Galli, M., 2006. Estimating the quality of landslide susceptibility models. Geomorphology 81, 166–184. Kanungo, D., Arora, M., Gupta, R., Sarkar, S., 2008. Landslide risk assessment using concepts of danger pixels and fuzzy set theory in Darjeeling Himalayas. Landslides 5, 407–416. Kanungo, D.P., Arora, M.K., Sarkar, S., Gupta, R.P., 2009. A fuzzy set based approach for integration of thematic maps for landslide susceptibility zonation. Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards 3, 30–43. Lee, S., 2007a. Application and verification of fuzzy algebraic operators to landslide susceptibility mapping. Environmental Geology 52, 615–623. Lee, S., 2007b. Landslide susceptibility mapping using an artificial neural network in the Gangneung area, Korea. International Journal of Remote Sensing 28, 4763–4783. Lee, S., Dan, N.T., 2005. Probabilistic landslide susceptibility mapping on the Lai Chau province of Vietnam: focus on the relationship between tectonic fractures and landslides. Environmental Geology 48, 778–787. Lee, S., Sambath, T., 2006. 
Landslide susceptibility mapping in the Damrei Romel area, Cambodia using frequency ratio and logistic regression models. Environmental Geology 50, 847–855. Lee, S., Ryu, J.H., Lee, M.J., Won, J.S., 2003. Use of an artificial neural network for analysis of the susceptibility to landslides at Boun, Korea. Environmental Geology 44, 820–833. Lee, S., Ryu, J.H., Kim, I.S., 2007. Landslide susceptibility analysis and its verification using likelihood ratio, logistic regression, and artificial neural network models: case study of Youngin, Korea. Landslides 4, 327–338. Malins, D., Metternicht, G., 2006. Assessing the spatial extent of dryland salinity through fuzzy modeling. Ecological Modelling 193, 387–411. Malpica, J.A., Alonso, M.C., Sanz, M.A., 2007. Dempster–Shafer Theory in geographic information systems: A survey. Expert Systems with Applications 32, 47–55. Marjanović, M., Kovačević, M., Bajat, B., Voženílek, V., 2011. Landslide susceptibility assessment using SVM machine learning algorithm. Engineering Geology 123, 225–234. Melchiorre, C., Matteucci, M., Azzoni, A., Zanchi, A., 2008. Artificial neural networks and cluster analysis in landslide susceptibility zonation. Geomorphology 94, 379–400. Moon, W.M., 1989. Integration of remote sensing and geological/geophysical data using Dempster-Shafer approach. Digest-International Geoscience and Remote Sensing Symposium (IGARSS), pp. 838–841. Nefeslioglu, H.A., Sezer, E., Gokceoglu, C., Bozkir, A.S., Duman, T.Y., 2010. Assessment of Landslide Susceptibility by Decision Trees in the Metropolitan Area of Istanbul, Turkey. Mathematical Problems in Engineering, http://dx.doi.org/10.1155/2010/901095. Oh, H.-J., Pradhan, B., 2011. Application of a neuro-fuzzy model to landslide susceptibility mapping for shallow landslides in a tropical hilly area. Computers & Geosciences 37, 1264–1276. Park, N.-W., 2011. Application of Dempster-Shafer theory of evidence to GIS-based landslide susceptibility analysis. 
Environmental Earth Sciences 62, 367–376. Pearl, J., 1989. Reasoning under uncertainty. Annual Review of Computer Science 4, 37–72. Porwal, A., Carranza, E.J.M., Hale, M., 2003. Knowledge-driven and data-driven fuzzy models for predictive mineral potential mapping. Natural Resources Research 12, 1–25.


Porwal, A., Carranza, E.J.M., Hale, M., 2006. A hybrid fuzzy weights-of-evidence model for mineral potential mapping. Natural Resources Research 15, 1–14. Pradhan, B., 2010a. Application of an advanced fuzzy logic model for landslide susceptibility analysis. International Journal of Computational Intelligence Systems 3, 370–381. Pradhan, B., 2010b. Landslide Susceptibility mapping of a catchment area using frequency ratio, fuzzy logic and multivariate logistic regression approaches. Journal of the Indian Society of Remote Sensing 38, 301–320. Pradhan, B., 2010c. Remote sensing and GIS-based landslide hazard analysis and cross-validation using multivariate logistic regression model on three test areas in Malaysia. Advances in Space Research 45, 1244–1256. Pradhan, B., 2011a. Manifestation of an advanced fuzzy logic model coupled with geoinformation techniques to landslide susceptibility mapping and their comparison with logistic regression modelling. Environmental and Ecological Statistics 18, 471–493. Pradhan, B., 2011b. Use of GIS-based fuzzy logic relations and its cross application to produce landslide susceptibility maps in three test areas in Malaysia. Environmental Earth Sciences 63, 329–349. Pradhan, B., Lee, S., 2010a. Delineation of landslide hazard areas on Penang Island, Malaysia, by using frequency ratio, logistic regression, and artificial neural network models. Environmental Earth Sciences 60, 1037–1054. Pradhan, B., Lee, S., 2010b. Landslide susceptibility assessment and factor effect analysis: backpropagation artificial neural networks and their comparison with frequency ratio and bivariate logistic regression modelling. Environmental Modelling & Software 25, 747–759. Pradhan, B., Lee, S., 2010c. Regional landslide susceptibility analysis using backpropagation neural network model at Cameron Highland, Malaysia. Landslides 7, 13–30. Pradhan, B., Lee, S., Buchroithner, M.F., 2010a. 
A GIS-based back-propagation neural network model and its cross-application and validation for landslide susceptibility analyses. Computers Environment and Urban Systems 34, 216–235. Pradhan, B., Sezer, E.A., Gokceoglu, C., Buchroithner, M.F., 2010b. Landslide susceptibility mapping by neuro-fuzzy approach in a landslide-prone area (Cameron Highlands, Malaysia). IEEE Transactions on Geoscience and Remote Sensing 48, 4164–4177. Robinson, V.B., 2003. A perspective on the fundamentals of fuzzy sets and their use in geographic information systems. Transactions in GIS 7, 3–30. Ross, T.J., 2004. Fuzzy Logic with Engineering Applications. Wiley. Saito, H., Nakayama, D., Matsuyama, H., 2009. Comparison of landslide susceptibility based on a decision-tree model and actual landslide occurrence: The Akaishi Mountains, Japan. Geomorphology 109, 108–121. Sarkar, S., Kanungo, D.P., 2004. An integrated approach for landslide susceptibility mapping using remote sensing and GIS. Photogrammetric Engineering and Remote Sensing 70, 617–625. Sezer, E.A., Pradhan, B., Gokceoglu, C., 2011. Manifestation of an adaptive neuro-fuzzy model on landslide susceptibility mapping: Klang valley, Malaysia. Expert Systems with Applications 38, 8208–8219. Shafer, G., 1976. A Mathematical Theory of Evidence. Princeton University Press, New Jersey. Shujun, S., Baolei, Z., Wenlan, F., Wancun, Z., 2006. Using fuzzy relations and GIS method to evaluate debris flow hazard. Wuhan University Journal of Natural Sciences 11, 875–881. Singpurwalla, N.D., Booker, J.M., 2004. Membership functions and probability measures of fuzzy sets. Journal of the American Statistical Association 99, 867–877. Srivastava, R.P., Mock, T.J., Gao, L., 2011. The Dempster-Shafer theory: an introduction and fraud risk assessment illustration. Australian Accounting Review 21, 282–291. Tangestani, M.H., 2009. 
A comparative study of Dempster–Shafer and fuzzy models for landslide susceptibility mapping using a GIS: An experience from Zagros Mountains, SW Iran. Journal of Asian Earth Sciences 35, 66–73. Thiam, A.K., 2005. An evidential reasoning approach to land degradation evaluation: Dempster-Shafer Theory of Evidence. Transactions in GIS 9, 507–520. Tien Bui, D., Lofman, O., Revhaug, I., Dick, O., 2011a. Landslide susceptibility analysis in the Hoa Binh province of Vietnam using statistical index and logistic regression. Natural Hazards 59, 1413–1444. Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., Dick, O.B., 2011b. Landslide susceptibility mapping at Hoa Binh province (Vietnam) using an adaptive neuro-fuzzy inference system and GIS. Computers & Geosciences, http://dx.doi.org/10.1016/j.cageo.2011.10.031. Topcu, İ.B., Sarıdemir, M., 2008. Prediction of mechanical properties of recycled aggregate concretes containing silica fume using artificial neural networks and fuzzy logic. Computational Materials Science 42, 74–82. Uzkent, B., Barkana, B.D., Yang, J.D., 2011. Automatic environmental noise source classification model using fuzzy logic. Expert Systems with Applications 38, 8751–8755. Van, T.T., Anh, D.T., Hieu, H.H., Giap, N.X., Ke, T.D., Nam, T.D., Ngoc, D., Ngoc, D.T.Y., Thai, T.N., Thang, D.V., Tinh, N.V., Tuat, L.T., Tung, N.T., Tuy, P.K., Viet, H.A., 2006. Investigation and Assessment of the Current Status and Potential of Landslides in Some Sections of the Ho Chi Minh Road, National Road 1A and Proposed Remedial Measures to Prevent Landslides from Threat of Safety of People, Property, and Infrastructure. Vietnam Institute of Geosciences and Mineral Resources, Hanoi. Varnes, D.J., 1984. Landslide Hazard Zonation: A Review of Principles and Practice. UNESCO, Paris. Yao, X., Tham, L.G., Dai, F.C., 2008. Landslide susceptibility mapping based on Support Vector Machine: A case study on natural slopes of Hong Kong, China. Geomorphology 101, 572–582. 
Yeon, Y.-K., Han, J.-G., Ryu, K.H., 2010. Landslide susceptibility mapping in Injae, Korea, using a decision tree. Engineering Geology 116, 274–283.



Yilmaz, I., 2009a. A case study from Koyulhisar (Sivas-Turkey) for landslide susceptibility mapping by artificial neural networks. Bulletin of Engineering Geology and the Environment 68, 297–306. Yilmaz, I., 2009b. Landslide susceptibility mapping using frequency ratio, logistic regression, artificial neural networks and their comparison: A case study from Kat landslides (Tokat-Turkey). Computers & Geosciences 35, 1125–1138. Yilmaz, I., 2010a. Comparison of landslide susceptibility mapping methodologies for Koyulhisar, Turkey: conditional probability, logistic regression, artificial neural networks, and support vector machine. Environmental Earth Sciences 61, 821–836.

Yilmaz, I., 2010b. The effect of the sampling strategies on the landslide susceptibility mapping by conditional probability and artificial neural networks. Environmental Earth Sciences 60, 505–519. Zadeh, 1965. Fuzzy sets. IEEE Information and Control 338–353. Zimmermann, H.J., 1991. Fuzzy Set Theory - and Its Applications. Springer.

Paper IV Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., Dick, O.B., 2012. Landslide susceptibility assessment in the Hoa Binh province of Vietnam: A comparison of the Levenberg-Marquardt and Bayesian regularized neural networks. Geomorphology, 171–172, 12–29.


Author's personal copy Geomorphology 171–172 (2012) 12–29


Geomorphology journal homepage: www.elsevier.com/locate/geomorph

Landslide susceptibility assessment in the Hoa Binh province of Vietnam: A comparison of the Levenberg–Marquardt and Bayesian regularized neural networks

Dieu Tien Bui a,b,⁎, Biswajeet Pradhan c, Owe Lofman a, Inge Revhaug a, Oystein B. Dick a

a Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, P.O. Box 5003 IMT, NO-1432, Aas, Norway
b Faculty of Surveying and Mapping, Hanoi University of Mining and Geology, Dong Ngac, Tu Liem, Hanoi, Vietnam
c Faculty of Engineering, Spatial and Numerical Modelling Research Group, University Putra Malaysia, Serdang, Selangor Darul Ehsan 43400, Malaysia

Article info

Article history: Received 13 June 2011; received in revised form 23 March 2012; accepted 29 April 2012; available online 5 May 2012

Keywords: Artificial neural networks; Landslides; GIS; Levenberg–Marquardt; Bayesian regularization; Hoa Binh province

Abstract

This study investigates the potential application of artificial neural networks in landslide susceptibility mapping in the Hoa Binh province of Vietnam. A landslide inventory map of the study area was prepared by combining landslide locations investigated through three projects during the last 10 years. Some recent landslide locations were identified from SPOT satellite images with a spatial resolution of 2.5 m, field surveys, and existing literature. Ten landslide conditioning factors were utilized in the multilayer feed-forward neural network analysis: slope, aspect, relief amplitude, lithology, land use, soil type, rainfall, distance to roads, distance to rivers and distance to faults. Two back-propagation training algorithms, Levenberg–Marquardt and Bayesian regularization, were utilized to determine synaptic weights using a training dataset. The relative importance of each landslide conditioning factor was assessed using these synaptic weights. The final connection weights obtained in the training phase were applied to the entire study area to produce landslide susceptibility indexes. The results were then imported to a GIS and landslide susceptibility maps were constructed. Landslide locations not used in the training phase were used to verify and compare the two landslide susceptibility maps with the prediction-rate method, and the areas under the prediction-rate curves were assessed. The prediction accuracies of the landslide susceptibility maps produced by the Bayesian regularization neural network and the Levenberg–Marquardt neural network were 90.3% and 86.1%, respectively. These results indicate that the two models seem to have good predictive capability. The Bayesian regularization network model appears more robust and efficient than the Levenberg–Marquardt network model for landslide susceptibility mapping.

© 2012 Elsevier B.V. All rights reserved.

⁎ Corresponding author at: Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, P.O. Box 5003 IMT, NO-1432, Ås, Norway. Tel.: +47 64965424. E-mail addresses: [email protected], [email protected] (D. Tien Bui). 0169-555X/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.geomorph.2012.04.023

1. Introduction

Landslides are considered one of the most dangerous natural hazards that may follow triggering events (e.g. extreme rainfall and earthquakes) in mountainous areas, causing loss of human life and damage to property. The Hoa Binh province, located in the northwestern mountainous area of Vietnam, has been heavily affected by landslides in recent years. Landslides there are normally triggered by heavy rainfall, but very few attempts have been made to forecast their location or prevent the damage they cause. Only a few landslide susceptibility investigations have previously been carried out in Vietnam (Lee and Dan, 2005). We therefore carried out a landslide susceptibility analysis in the Hoa Binh region.

In recent years, various techniques have been developed for landslide hazard assessment using geographical information systems (GIS). Generally, they can be classified into four main categories: landslide inventories, heuristic or index based methods, statistically based models, and deterministic approaches (Aleotti and Chowdhury, 1999; Guzzetti et al., 1999). Of these techniques, deterministic methods, which model geotechnical material properties together with slope and other triggering factors, are considered to have the highest precision. Their main limitation is that they are feasible only for areas where landslide types are simple and geomorphic and geologic properties are fairly homogeneous (Van Westen and Terlien, 1996). Heuristic or index based methods are generally less precise and depend mostly on the experience and knowledge of the earth scientists who carry out the analysis. Because landslides are related to many factors, a landslide analysis requires knowledge from many specialized fields. The main limitation of heuristic methods is the subjectivity of assigning weights to the factors. Statistically based models, which are based on analysis of functional relationships between instability factors and a landslide inventory, are preferred by academics and research institutions for susceptibility assessment at medium scales (Ermini et al., 2005; Lee and Pradhan,


Fig. 1. Landslide inventory map of the study area.

2007; Akgun and Turk, 2010; Pradhan, 2010; Oh and Lee, 2011). These models are less subjective than heuristically based models, can be applied to a large geographic area, and provide rapid spatial correlation assessment of topographic and other mappable attributes (Gorsevski et al., 2001). Statistically based models require collection of a large amount of data to produce reliable results, a time-consuming and complex

Fig. 2. Geologic map of the study area.



process. Furthermore, many of the statistically based models (except the logistic regression method) require normally distributed landslide conditioning factors, a prerequisite which is not always satisfied. Therefore, in the past decade, new methods such as artificial neural networks (ANNs) (Lee et al., 2003a, 2004), fuzzy logic (Ercanoglu and Gokceoglu, 2002; Lee, 2007a; Pradhan et al., 2009; Pradhan, 2011), and neuro-fuzzy approaches (Pradhan et al., 2010; Vahidnia et al., 2010; Oh and Pradhan, 2011; Sezer et al., 2011; Akgun et al., 2012) have been proposed.

Among applications of ANNs to landslide studies, Lee et al. (2003a) used the Multi-Layered Perceptron (MLP) network in the study area of Boun in Korea and found satisfactory agreement between the susceptibility map and the landslide location data. Neaupane and Achet (2004) applied a neural network model to predict slope movements and obtained promising results with good accuracy. Ermini et al. (2005) compared the MLP neural network with the Probabilistic Neural Network; both models gave satisfactory results, but the MLP was slightly better. Kanungo et al. (2006) showed that the landslide susceptibility zonation map derived from a combined neural and fuzzy weighting procedure was the best map among those derived from the weighting methods. Melchiorre et al. (2008) stated that the prediction capability of a model can be improved, and the most robust susceptibility map obtained, when cluster analysis is applied in data preprocessing; the preprocessed data were then used to build a Levenberg–Marquardt (LM) neural network (LMNN) model. Falaschi et al. (2009) concluded that ANNs could be suitable for wider application in landslide susceptibility and hazard assessment.

The advantage of ANNs compared to other statistical models is that ANNs require a smaller amount of training data for an accurate analysis (Paola and Schowengerdt, 1995). 
ANNs are independent of the statistical distribution of the data and do not need specific statistical variables (Lee et al., 2003b). This characteristic allows ANNs to incorporate different types of data into landslide models. In addition, ANNs can automatically approximate any nonlinear mathematical function (Kawabata and Bandibas, 2009; Garcia-Rodriguez and Malpica, 2010), which is useful for predicting the outcome when the relationships between landslide conditioning factors are complex or unknown. Furthermore, ANNs have the ability to generalize in noisy environments, making ANN solutions more robust in the presence of incomplete or imprecise data (Fausett, 1994). ANNs can also incorporate a priori knowledge and realistic physical constraints into the analysis (Foody, 1995).

During ANN model development, determining the architecture parameters (such as the number of neurons in the hidden layer) is not always a straightforward process. Although several heuristic formulas have been proposed to estimate the optimal number of neurons in the hidden layer (Hecht-Nielsen, 1987; Hush, 1989; Aldrich et al., 1994; Kaastra and Boyd, 1996; Kanellopoulos and Wilkinson, 1997), none of them has been accepted as a universal guideline (Mas and Flores, 2008). Moreover, determining the size, representation, and distribution of the training data, as well as the method used to avoid over-fitting, is also difficult. Many rules have been proposed, but no agreement has been reached on a standard rule for choosing these parameters (Mas and Flores, 2008). One of the most criticized features of neural network models is that they lack interpretability at the level of individual variables (Paliwal and Kumar, 2009). Despite these drawbacks, the ANN is still considered a robust method that many researchers have applied to landslide analysis.

The main difference between this study and the aforementioned literature is that the Bayesian regularized neural network (BRNN) was applied in landslide susceptibility assessment. BRNN is a relatively new method that has seldom been applied to landslide susceptibility assessment. The major advantage of the Bayesian regularization (BR) algorithm is that it determines the optimal number of neurons in the hidden layer objectively. Furthermore, a comparison of BRNN with LMNN was performed.

2. Study area and data

2.1. Study area

The Hoa Binh province (Fig. 1) is located in the northwestern part of Vietnam between longitudes 104°48′E and 105°50′E, and latitudes 20°17′N and 21°08′N. It covers an area of about 4660 km². The altitude of the area ranges from 0 to 1510 m. The topographic inclination follows a NW–SE direction. The province is a hilly area situated between mountains and the Red River plain. 
Soil types are mostly ferralic acrisols, humic acrisols, rhodic ferralsols, and eutric fluvisols, which account for 80% of the total study area and 81% of total landslide pixels. Areas with a slope gradient of less than 10° account for 43% of the total study area; 28% of the study area has a slope gradient larger than 20°, and the remaining area falls into the slope category 10°–20°. The province comprises approximately 52.6% forest land, 21.0% barren land and non-tree rocky mountain, 14.5% agricultural land, 7.5% populated areas, 4.0% water surface, and 0.4% grassland. Statistical results show that 58.6% of the landslide pixels observed are in forest land, followed by barren land and non-tree rocky mountain (19.6%), populated areas (14.0%), and agricultural land (6.8%).

Table 1. Description of the main geologic formations in the study area.

No | Formation | Symbol | Area (%) | Landslide pixel (%) | Main characteristic
1 | Dong Giao | T2adg | 28.0 | 25.7 | Limestone, massive limestone, light-colored massive limestone, marl, dolomitized limestone
2 | Tan Lac | T1o tl | 12.5 | 12.8 | Conglomerate, sandstone, tuffaceous sandstone, violetish tuffaceous siltstone, black clay shale, brown-violetish tuffs
3 | Vien Nam | T1vn | 11.1 | 21.6 | Aphyric basalt, magnesium-high basalt, andesitic basalt, andesite-dacite, trachyte, porphyritic trachyte, agglomerate
4 | Song Boi | T2−3sb | 10.7 | 9.7 | Sandstone, silty sandstone, black clay shale, siltstone, conglomerate, tuffaceous silty sandstone, limestone
5 | Suoi Bang | T3n−r sb | 5.7 | 5.0 | Sandstone, conglomerate, siltstone, clay shale, black clay shale, coal seams or lenses
6 | Ben Khe | ε−Obk | 4.8 | 3.6 | Conglomerate, gray coarse sandstone, siltstone, sericite shale, quartzite, clay shale, marl, calcareous siltstone
7 | Sinh Vinh | O3−Ssv | 3.2 | 0.2 | Gray sandy limestone, thin- to thick-bedded black limestone, dolomitic limestone
8 | Yen Chau | K2yc | 2.9 | 0.0 | Red sandstone, calcareous conglomerate, polymictic conglomerate, siltstone, chocolate sandstone, clay stone
9 | Thai Binh | Q23tb | 2.4 | 6.3 | Chocolate sand, clay, silt, grayish brown sand, clayey silt
10 | Song Mua | D1 sm | 2.2 | 1.8 | Black clay shale, siltstone, a little sandstone, marl, limestone
11 | Co Noi | T1 cn | 2.2 | 1.7 | Sandstone, tuffaceous siltstone, clay shale, marl
12 | Ban Nguon | D1 bn | 2.1 | 3.7 | Sandstone, siltstone, black clay shale, clayey limestone
13 | Sinh Quyen | PPsc | 2.0 | 0.1 | Quartzite, biotite gneiss, quartz-mica-feldspar schist
14 | Phia Bioc | γaT3npb | 1.6 | 0.5 | Conglomerate, gritstone, sandstone, siltstone, marl, biotitic granite, two mica granite, granophyres
15 | Ban Pap | D1−2bp | 1.3 | 1.2 | Thick-bedded to massive limestone, black-gray dolomitic limestone
16 | Bo Hieng | S2 bh | 1.1 | 1.5 | Clay shale, marl, limestone lenses, coaly shale
17 | Nam Tham | T2 lnt | 0.9 | 4.5 | Clay shale, siltstone, marl
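One hedged way to read Table 1 is to compare each formation's share of landslide pixels with its share of the study area. The sketch below does this for the five largest formations using only the percentages given in the table. The function name `landslide_to_area_ratio` and the idea of dividing the two shares are illustrative assumptions, not a method stated at this point in the paper.

```python
# (area %, landslide pixel %) for the five largest formations in Table 1.
formations = {
    "Dong Giao": (28.0, 25.7),
    "Tan Lac": (12.5, 12.8),
    "Vien Nam": (11.1, 21.6),
    "Song Boi": (10.7, 9.7),
    "Suoi Bang": (5.7, 5.0),
}


def landslide_to_area_ratio(area_pct, landslide_pct):
    """Hypothetical helper: a ratio above 1 means landslide pixels are
    over-represented relative to the area the formation covers."""
    return landslide_pct / area_pct


ratios = {
    name: round(landslide_to_area_ratio(area, slides), 2)
    for name, (area, slides) in formations.items()
}

# Combined share of landslide pixels held by these five formations.
combined_landslide_share = round(sum(s for _, s in formations.values()), 1)

for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
print("combined landslide share (%):", combined_landslide_share)
```

Under this reading, the Vien Nam basalts stand out: they cover 11.1% of the area but hold 21.6% of the landslide pixels, while the other four formations have ratios close to 1.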


The province consists of rocks of varying age (Paleozoic to Cenozoic), physical/mechanical properties and chemical composition. Five main fracture zones pass through the province and weaken the rock mass: Hoa Binh, Da Bac, Muong La–Cho Bo, Son La–Bim Son, and Song Da (Thach et al., 2002). More than 38 geologic formations are recognized (Fig. 2), and 17 of them account for 94.7% of the study area (Table 1). In particular, 74.9% of the landslide pixels observed in the study area are located in the Dong Giao, Tan Lac, Vien Nam, Song Boi, and Suoi Bang formations. These formations consist mainly of limestone, conglomerate, sandstone, aphyric basalt, magnesium-high basalt, silty sandstone, and black clay shale. The province is situated in the monsoonal region, with hot, rainy and dry seasons. The rainy season is normally from May to October with high rainfall frequency and intensity. Rainfall in the rainy season

Fig. 3. Factor maps (1). (a) Slope map. (b) Aspect map. (c) Relief amplitude map.



Fig. 3 (continued).

accounts for 84–90% of the annual rainfall, with short, intense storms being an important landslide triggering factor. For example, heavy rainfall during the tropical storm Lekima in 2007 gave an accumulated rainfall of 334.0 to 529.4 mm in three days (October 3–5), causing many landslides. Population growth has resulted in settlements in the highlands. This, combined with inappropriate land use practices, contributes to an increasing frequency of landslides. Deforestation and operation of the Hoa Binh hydro-power project have also resulted in natural disasters such as flooding, soil erosion and landslides in the past several years. Landslides on the banks of the Hoa Binh Lake increased significantly following impoundment of the reservoir (Thach et al., 2002).

2.2. Data

2.2.1. Landslide inventory map

A landslide inventory map was constructed from several sources: (1) the 2005 landslide inventory map (Thinh et al., 2005); (2) the 2007 landslide inventory map (My, 2007); (3) the landslide inventory map of the northern mountainous provinces (Hue et al., 2004); and (4) recent landslides identified by interpretation of SPOT satellite imagery (with 2.5 m resolution) and from other information. Fieldwork was carried out at randomly selected landslide sites to verify the landslide locations. A total of 118 landslides that occurred during the last 10 years were identified and registered in the landslide inventory map, depicted as 97 landslide polygons and 21 rock fall locations. The smallest landslide covers 380 m², the largest 14,340 m², and the average is 3440 m². For this analysis the landslide inventory map was converted to a raster format with a spatial resolution of 20 m. Fig. 1 shows the distribution of landslide locations in the study area.

2.2.2. Digital elevation model and derivatives

A digital elevation model (DEM) for the study area with a resolution of 20 × 20 m was generated from national topographic maps (1:25,000). Slope, aspect and relief amplitude were extracted from the DEM. Six slope categories were used as layer classes for analysis (Fig. 3a). The aspect map was constructed with nine layer classes (Fig. 3b). The relief amplitude represents the maximum difference in height per unit area (Vergari et al., 2011). To calculate the relief amplitude map, different sizes of the unit area (10 × 10, 15 × 15, 20 × 20, 30 × 30, and 35 × 35 pixels) were tested, and the 20 × 20 pixel window was chosen as the best one. This work was carried out using the Focal Statistics tool in ArcGIS 9.3. The relief amplitude map for the study area was compiled into six classes (Fig. 3c).

2.2.3. Lithology and distance to faults

The lithology and faults were extracted from four tiles of the Geological and Mineral Resources Map of Vietnam at a scale of 1:200,000. Lithology was classified into seven groups (Fig. 4a) based on material components (clay composition), degree of weathering, and estimated strength and density (Van et al., 2002, 2006; Arıkan et al., 2007). Faults have been considered to be a factor that may influence landslides. The degree of fracturing and weathering also plays an important role in determining slope instability (Varnes, 1984). In this study, a distance-to-faults map was constructed by buffering the fault lines. The fault buffer categories were defined as: 0–200, 200–400, 400–700, 700–1000, and >1000 m.

2.2.4. Soil type and land use

A soil type map at a scale of 1:100,000 was extracted from the National Pedology map (Thach et al., 2002). Using this map, the 27 original soil types were generalized to form the 13 layers used in the analysis (Fig. 4b). The Land Use Status Map of the Hoa Binh province at a scale of 1:50,000 (Tien Bui et al., 2011) was used to extract land use data. The map is a product of the Status Land Use project of the National Land Use Survey in Vietnam in 2006. The 53 land use types in the map were generalized into twelve categories (Fig. 5a).
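The relief amplitude calculation described in Section 2.2.2 (maximum height difference inside a moving window) can be sketched as follows. This is a minimal illustration in plain Python rather than the ArcGIS tool used in the study. The function name `relief_amplitude`, the toy 4 × 4 elevation grid, and the choice to clip windows at the raster edge are all assumptions made for the example; the study selected a 20 × 20 pixel window, whereas the toy call uses a 3 × 3 window so the result can be checked by hand.

```python
def relief_amplitude(dem, window=20):
    """Return, for each cell, max - min elevation inside a square window.

    dem    : list of rows of elevations (metres)
    window : window side length in pixels
    Windows are clipped at the raster edge (an assumption; the source
    does not state how edge cells were treated).
    """
    rows, cols = len(dem), len(dem[0])
    half = window // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Window spans `window` cells starting `half` cells before (r, c).
            r0, r1 = max(0, r - half), min(rows, r - half + window)
            c0, c1 = max(0, c - half), min(cols, c - half + window)
            cells = [dem[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            out[r][c] = max(cells) - min(cells)
    return out


# Toy elevation grid (metres); values are made up for illustration.
dem = [
    [100, 102, 104, 106],
    [101, 103, 105, 107],
    [110, 112, 114, 116],
    [111, 113, 115, 150],
]
relief = relief_amplitude(dem, window=3)
for row in relief:
    print(row)
```

Repeating the call with different `window` values mirrors the window-size test reported in the text, although the criterion the authors used to pick the 20 × 20 window is not given in this section.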


2.2.5. Distance to roads and distance to rivers A road network that undercut slopes was extracted from the topographic map at a scale of 1:50,000. A distance-to-roads map was constructed using buffer algorithms in the ArcGIS 9.3 software with categories defined as 0–40, 40–80, 80–120 and >120 m. Similarly, a hydrological network that undercut slopes was also extracted from the topographic map at a scale of 1:50,000 by buffering the river lines, resulting in a distance-to-rivers map. The river buffer categories were defined as 0–40, 40–80, 80–120, and >120 m.
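The buffer categories above amount to binning a distance raster into four classes. A minimal sketch of that step is given below; the function name `buffer_class` and the treatment of distances that fall exactly on a class boundary are assumptions, since the text does not state how ties were resolved in ArcGIS.

```python
import numpy as np

# Upper edges (m) of the first three buffer classes given in the text:
# 0-40, 40-80, 80-120 and >120 m.
ROAD_RIVER_EDGES = np.array([40.0, 80.0, 120.0])


def buffer_class(distance_m, edges=ROAD_RIVER_EDGES):
    """Return the 1-based buffer class for each distance.

    Assumption: a distance equal to an edge is placed in the farther
    class (40 m -> class 2). The source does not specify this.
    """
    d = np.asarray(distance_m, dtype=float)
    # np.digitize returns 0 for d < 40, 1 for 40 <= d < 80, and so on.
    return np.digitize(d, edges) + 1


classes = buffer_class([10.0, 40.0, 100.0, 500.0])
```

The same function could be reused for the distance-to-faults map by passing the fault edges (200, 400, 700 and 1000 m) as `edges`.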

2.2.6. Rainfall Rainfall in the study area is concentrated during the rainy season from May to October. In the Hoa Binh province, landslides usually occurred after heavy rain exceeding 100 mm day⁻¹, following a continuous period of rain for seven to ten days. Landslides also occurred when rainfall exceeded 100 mm day⁻¹ and continued for three days. The maximum eight-day rainfall (seven rainy days plus a final day with rainfall larger than 100 mm) for the period from 1990 to 2010 was used to create a rainfall map (Fig. 5b) using the Inverse Distance Weighted method. The precipitation data were extracted from a database of the Institute of Meteorology and Hydrology in Vietnam.
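As an illustration of how an inverse-distance-weighted surface is obtained from station values, a minimal sketch follows. The function name `idw`, the power parameter of 2, and the station coordinates and rainfall values in the usage note are all hypothetical; the settings actually used for Fig. 5b are not given in the text.

```python
import numpy as np


def idw(stations_xy, values, query_xy, power=2.0):
    """Inverse-distance-weighted estimate at each query point."""
    stations_xy = np.asarray(stations_xy, dtype=float)
    values = np.asarray(values, dtype=float)
    query_xy = np.atleast_2d(np.asarray(query_xy, dtype=float))
    # Pairwise distances, shape (n_query, n_stations).
    d = np.linalg.norm(query_xy[:, None, :] - stations_xy[None, :, :], axis=2)
    est = np.empty(len(query_xy))
    for k, row in enumerate(d):
        hit = row == 0.0
        if hit.any():
            # Query coincides with a station: return the observed value.
            est[k] = values[hit][0]
        else:
            w = 1.0 / row ** power
            est[k] = np.sum(w * values) / np.sum(w)
    return est
```

For two hypothetical stations at (0, 0) and (10, 0) with eight-day maxima of 400 and 600 mm, `idw([[0, 0], [10, 0]], [400, 600], [5, 0])` gives the midpoint average, because both stations are equally distant.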


3. Landslide susceptibility mapping using artificial neural network 3.1. Artificial neural network 3.1.1. Preview An ANN is defined as a massively parallel-distributed information-processing system made up of simple processing units, having a natural propensity for storing experiential knowledge and making it available for use (Haykin, 1998). The purpose of an ANN is to build a model for problems such as pattern recognition and classification. Once an ANN has been trained on samples of datasets, it can predict outputs from inputs (Lee et al., 2003a). An ANN is constructed from a large number of interconnected simple neurons, each of which is an information-processing unit. The behavior of an ANN depends on its architecture, the method of determining the connection weights, and the activation functions. There are two types of architectural connections in ANNs: feed-forward and feed-back. A feed-forward structure means that all interconnections between the layers propagate forward to the next layer. Feed-back structures have a connection backward from the output to the input neurons. Many different architectural types of ANNs have been introduced to date. In landslide studies, a supervised feed-forward network with back-propagation learning algorithms may be the most widely

Fig. 4. Factor maps (2). (a) Lithological map. (b) Soil type map.




used (Lee et al., 2006). An ANN can be of the single or multiple layers type. In general, the architecture of a feed-forward neural network based on a back-propagation learning rule, consists of three layers: input, output, and hidden. In this study, neural network models were constructed using one hidden layer, one output layer and ten input neurons representing ten landslide conditioning factors (slope, aspect, relief amplitude, lithology, land use, soil type, rainfall, distance to roads, distance to rivers and distance to faults). The output layer of the ANN contains a single neuron that presents the absence or presence of existing landslide occurrence. The structure of the artificial neural network in this study is shown in Fig. 6. There are many algorithms available for training ANN models. In this analysis, the two algorithms (LM and BR) were applied to the training data to calculate the weights between the input and hidden layers, and between the hidden and output layers. The activation function of the hidden layer was set to the log-sigmoid function as follows:

$$ f(x) = \frac{1}{1 + e^{-x}} \qquad (1) $$

where x is the input vector. The tan-sigmoid transfer function was set to the output layer:

$$ f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \qquad (2) $$

3.1.2. Levenberg–Marquardt (LM) algorithm For an ANN with an input vector x = (x1, x2, …, xn) and an output vector y, the equation that expresses the relationship between the input and output can be written as:

$$ y = f_{output}\left[ \sum_{j=0}^{h} w_j \, f_{hidden}\left( \sum_{i=0}^{n} w_{ij} x_i + bias_1 \right) + bias_2 \right] \qquad (3) $$

where n is the number of input units; h is the number of neurons in the hidden layer; xi is the i-th input unit; wij is the weight parameter between input i and hidden neuron j; wj is the weight parameter between hidden neuron j and the output neuron; fhidden is the activation function of the hidden layer; and foutput is the transfer function of the output layer. The weights were estimated and adjusted in the learning process with the aim of minimizing an error function ED as follows:

$$ E_D = \sum_{i=1}^{n} (y_i - t_i)^2 = \sum_{i=1}^{n} e_i^2 \qquad (4) $$

where n is the number of input and output examples of the training dataset D, and t is the target value. The errors were fed backward through the network to adjust the weights until the error ED was acceptable for the network model. Once the ANN is satisfied in the training process, the synaptic weights will be saved and then used


Fig. 5. Factor maps (3). (a) Land use map. (b) Rainfall map.




Fig. 6. General architecture of the neural networks used in this study.

to predict the outcome for new data. To minimize ED, optimal parameters of weights and biases have to be determined. One of the algorithms for solving this problem is the LM algorithm (Hagan and Menhaj, 1994). This algorithm is a modification of the Newton algorithm for finding optimal solutions to a minimization problem. The weights of an LMNN are calculated using the following equation (Wilamowski et al., 1999):

$$ w_{i+1} = w_i - \left( J_i^{T} J_i + \mu_i I \right)^{-1} J_i^{T} e_i \qquad (5) $$

where J is the Jacobian matrix of output errors, I is the identity matrix, and μ is a learning parameter. When μ = 0, it becomes the Gauss–Newton method using the approximate Hessian matrix. If μ is large, the LM algorithm becomes a gradient descent with small step size (the same as in the standard back propagation algorithm). Using the LM algorithm, the quality of the LMNN model was assessed by using the coefficient of multiple determination (R²) as follows (Trowsdale et al., 1998):

$$ R^2 = 1 - \frac{E_D}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad (6) $$

where ȳ is the mean of y. The R² value compares the accuracy of the LMNN model with that of a trivial benchmark model, where the prediction is simply the mean of all the sample cases (Luxhoj, 1998). One of the difficulties in training LMNNs for landslide studies is to control over-fitting. The early stopping technique has been proposed to resolve this problem (Melchiorre et al., 2008), which splits the training dataset into three subsets: training, verification, and test sets. The training set was used to compute the gradient and update the network weights and biases. The verification set was used to control over-fitting by monitoring the error in the verification set. The training process will be stopped when the network performance on the verification data fails to improve or remains the same. The error in the test dataset, which was not used during the training phase, was used to compare the different LMNN models. 3.1.3. Bayesian regularization (BR) algorithm One of the difficulties in designing an ANN model is to determine the number of hidden neurons. Too many neurons will lead to overfitting, and inversely, a network with an insufficient number of hidden nodes will have difficulty in learning. An ANN model which is too simple or too complex will have a poor prediction performance. To overcome this, the BR algorithm was applied to incorporate the Bayes' theorem into the regularization scheme.

A BRNN is basically a back propagation network that combines the conventional sum of the least squares error function with an additional term called "regularization". Thus, from Eq. (4) we have the following equation:

$$ S(w) = \beta E_D + \alpha E_W, \qquad E_W = \sum_{i=1}^{m} w_i^2 \qquad (7) $$

where the terms α and β are called regularization parameters or hyperparameters, and EW is the penalty term, which penalizes large values of the weights, with m being the number of weights. In the Bayesian framework, the learning process takes into account the uncertainty in the weight vector by assigning a probability distribution to the weights that represents the relative degrees of belief in the different values. This function is initially set to some prior distribution. Once the data have been observed, they can be converted to a posterior distribution using the Bayes' theorem (Bishop, 1995):

$$ P(w \mid \alpha, \beta, D) = \frac{P(D \mid w, \beta)\, P(w \mid \alpha)}{P(D \mid \alpha, \beta)} \qquad (8) $$

where P(w|α) is the prior density, which represents the degree of belief in the weights before any data are collected; P(D|w, β) is the likelihood function, which is the probability of error; and P(D|α, β) is the normalization factor, named evidence for the model (MacKay, 1992). Eq. (8) can be expressed in words as:

$$ \text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}} \qquad (9) $$

The optimal weights of the model can be obtained in the training phase by maximizing the posterior probability. This is equivalent to minimizing the regularized objective function of Eq. (7) (Foresee and Hagan, 1997). Prior: Assuming that the weight and data probability distribution is Gaussian, the prior probability over the weight can be written as (Burden and Winkler, 2008):

$$ p(w \mid \alpha) = \frac{1}{Z_W(\alpha)} \exp(-\alpha E_W) \qquad (10) $$

Likelihood: Similarly, the probability of the errors can be written as:

$$ p(D \mid w, \beta) = \frac{1}{Z_D(\beta)} \exp(-\beta E_D) \qquad (11) $$



Fig. 7. Relationship between the ten landslide conditioning factors and landslide occurrence (PO: populated area, OR: orchard land, PA: paddy land, PT: protective forest land, NF: natural forest land, PD: productive forest land, WT: water, CR: annual crop land, RM: non-tree rocky mountain, BR: barren land, SF: specially used forest land, GR: grass land, EF: eutric fluvisols, DS: degraded soil, LM: limestone bedrock (mountain), FA: ferralic acrisols, RF: rhodic ferralsols, HA: humic acrisols, DF: dystric fluvisols, DG: dystric gleysols, LS: luvisols, HF: humic ferralsols, GF: gley fluvisols, N: north, NE: northeast, E: east, SE: southeast, S: south, SW: southwest, W: west, NW: northwest).



Posterior distribution: We can finally obtain the posterior distribution

$$ P(w \mid \alpha, \beta, D) = \frac{1}{Z_S(\alpha, \beta)} \exp(-S(w)) \qquad (12) $$

Regularization parameters α and β: Use of the Bayes' theorem allows us to infer the optimal values of the regularization parameters from the data.

$$ P(\alpha, \beta \mid D) = \frac{P(D \mid \alpha, \beta)\, P(\alpha, \beta)}{P(D)} \qquad (13) $$

Table 2 Normalized classes of landslide conditioning factors used in the neural network models.

Data layer               Class                             Attribute  Normalized class
Slope (°)                0–10                              1          0.10
                         10–20                             2          0.26
                         20–30                             3          0.42
                         30–40                             4          0.58
                         40–50                             5          0.74
                         >50                               6          0.90
Aspect                   Flat (−1)                         1          0.10
                         North (0–22.5 and 337.5–360)      2          0.20
                         Northeast (22.5–67.5)             3          0.30
                         East (67.5–112.5)                 4          0.40
                         Southeast (112.5–157.5)           5          0.50
                         South (157.5–202.5)               6          0.60
                         Southwest (202.5–247.5)           7          0.70
                         West (247.5–292.5)                8          0.80
                         Northwest (292.5–337.5)           9          0.90
Relief amplitude (m)     0–50                              1          0.10
                         50–100                            2          0.26
                         100–150                           3          0.42
                         150–200                           4          0.58
                         200–250                           5          0.74
                         250–532                           6          0.90
Lithology                Group 1                           1          0.10
                         Group 2                           2          0.23
                         Group 3                           3          0.37
                         Group 4                           4          0.50
                         Group 5                           5          0.63
                         Group 6                           6          0.77
                         Group 7                           7          0.90
Land use                 Populated area                    1          0.10
                         Orchard land                      2          0.17
                         Paddy land                        3          0.25
                         Protective forest land            4          0.32
                         Natural forest land               5          0.39
                         Productive forest land            6          0.46
                         Water                             7          0.54
                         Annual crop land                  8          0.61
                         Non-tree rocky mountain           9          0.68
                         Barren land                       10         0.75
                         Specially used forest land        11         0.83
                         Grass land                        12         0.90
Soil type                Eutric fluvisols                  1          0.10
                         Degraded soil                     2          0.17
                         Limestone exposure (mountain)     3          0.23
                         Ferralic acrisols                 4          0.30
                         Rhodic ferralsols                 5          0.37
                         Humic acrisols                    6          0.43
                         Dystric fluvisols                 7          0.50
                         Dystric gleysols                  8          0.57
                         Luvisols                          9          0.63
                         Humic ferralsols                  10         0.70
                         Populated area                    11         0.77
                         Water                             12         0.83
                         Gley fluvisols                    13         0.90
Rainfall (mm)            362–470                           1          0.10
                         470–540                           2          0.37
                         540–610                           3          0.63
                         610–950                           4          0.90
Distance to roads (m)    0–40                              1          0.10
                         40–80                             2          0.37
                         80–120                            3          0.63
                         >120                              4          0.90
Distance to rivers (m)   0–40                              1          0.10
                         40–80                             2          0.37
                         80–120                            3          0.63
                         >120                              4          0.90
Distance to faults (m)   0–200                             1          0.10
                         200–400                           2          0.30
                         400–700                           3          0.50
                         700–1000                          4          0.70
                         >1000                             5          0.90

where P(α, β) is the prior probability for the regularization parameters α and β. P(D|α, β) is the likelihood term, which is called the evidence for α and β (MacKay, 1992). The optimal values of α and β are obtained using Eq. (14) (Burden and Winkler, 2008).

$$ \alpha = \frac{\gamma}{2E_W}, \qquad \beta = \frac{n - \gamma}{2E_D}, \qquad \gamma = m - \alpha \, \mathrm{trace}\left(A^{-1}\right) \qquad (14) $$

The quantity γ is called the effective parameter; m is the number of parameters; A is the Hessian matrix of the objective function S(w). In the Bayesian framework, the optimization process may be carried out to find the optimal weights in Eq. (7) and the optimal values of α and β in Eq. (14). According to Foresee and Hagan (1997) the iterative procedure is as follows: (1) Choose initial values for α and β and the weights. (2) Take one step of the LM algorithm to find the weights that minimize the objective function S(w) in Eq. (7). (3) Compute the effective number of parameters γ and new values for α and β. (4) Iterate steps 2 to 3 until convergence.
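The four-step procedure can be sketched on a toy problem. The example below is only an illustration under stated assumptions: it replaces the neural network with a linear model y = Xw so that the Jacobian of the errors is simply X; it uses a fixed damping value `mu` instead of the adaptive μ of a full LM implementation; and it follows the factor convention of Foresee and Hagan (1997), A ≈ 2βJᵀJ + 2αI with γ = m − 2α·trace(A⁻¹), which may differ by constant factors from Eq. (14). The function name, starting values and synthetic data are hypothetical.

```python
import numpy as np


def bayesian_regularized_fit(X, t, n_iter=50, mu=1e-2):
    """Alternate damped steps on S(w) with updates of alpha, beta, gamma.

    Toy linear model y = X w (an assumption; the study used an ANN).
    """
    n, m = X.shape
    w = np.zeros(m)
    alpha, beta = 0.01, 1.0      # step 1: initial hyperparameters (arbitrary)
    J = X                        # Jacobian of e = X w - t (constant here)
    gamma = float(m)
    for _ in range(n_iter):
        e = X @ w - t
        # Gauss-Newton Hessian and gradient of S(w) = beta*E_D + alpha*E_W.
        A = 2.0 * beta * J.T @ J + 2.0 * alpha * np.eye(m)
        g = 2.0 * beta * J.T @ e + 2.0 * alpha * w
        # Step 2: one damped (LM-style) step that reduces S(w).
        w = w - np.linalg.solve(A + mu * np.eye(m), g)
        # Step 3: effective number of parameters, then new alpha and beta.
        e = X @ w - t
        E_D, E_W = e @ e, w @ w
        gamma = m - 2.0 * alpha * np.trace(np.linalg.inv(A))
        alpha = gamma / (2.0 * E_W)
        beta = (n - gamma) / (2.0 * E_D)
    # Step 4 is the loop itself; a convergence test could replace n_iter.
    return w, alpha, beta, gamma


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
t = X @ w_true + 0.1 * rng.normal(size=200)
w_hat, alpha, beta, gamma = bayesian_regularized_fit(X, t)
```

In this well-determined toy case γ stays close to the number of weights (3), which is the behaviour expected when every parameter is constrained by the data.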

Relief amplitude (m)

Lithology

Land use

3.2. Analysis of relationship between landslide occurrence and conditioning factors In order to explore the relationship between landslide occurrence and conditioning factors, frequency distribution and absolute landslide incidence analyses were performed. The frequency distribution shows the percentage of pixels of a factor class in the study area, whereas the absolute landslide incidence shows the percentage of landslide pixels that falls in the factor class (Kawabata and Bandibas, 2009). Fig. 7 shows the relationship between landslide occurrences and the ten landslide conditioning factors. Most of the landslides have occurred in the slope category 20°–30° (54.8%), followed by 10°–20° (29.1%), and 30°–40° (14.3%) (Fig. 7a). Few landslides have occurred where slope angles were less than 10° or greater than 40°. In the case of aspect, the result (Fig. 7b) shows that most of the landslides occurred in the SW (26.3%), S (22.7%), SE (14.5%), and NE (11.8%) directions. In the case of relief amplitude (Fig. 7c), the landslides occurred mostly in the amplitude 100–150 m (41%), followed by 50–100 (25.4%), 150–200 (20.4%) and 200–250 (8.4%) m. The relationship between landslide occurrence and lithology (Fig. 7d) shows that most landslides occurred in group 2 (sedimentary aluminosilicate and sedimentary quartz rocks) with 33.4%, group 3 (sedimentary carbonate rocks) with 27.1%, and group 4 (mafic–ultramafic magma rocks) with 21.6%. In the case of land use (Fig. 7e), 22.6% of landslide pixels are in protective forest, followed by productive forest (20.3%), natural forest (15.6%), populated area (14%), barren land (12.4%), and non-tree rocky mountain (7.2%). Regarding soil type (Fig. 7f), 43.8% of the landslide pixels occurred in ferralic acrisols, followed by humic acrisols (28.1%), mountainous limestone bare rocks (15.1%), eutric fluvisols (6.1%), rhodic ferralsols (3.4%), and dystric fluvisols (2.8%). In the case of rainfall (Fig. 
7g), 35.8% of landslide pixels occurred in the

Soil type

Rainfall (mm)






category of 470–540 mm, followed by the 610–950 (27.9%), 362–470 (27.2%), and 540–610 (9.0%) mm categories. It is clear that the highest landslide density is in the 610–950 mm category. The relationship between landslide occurrence and distance to roads (Fig. 7h) shows that 67.9% of landslides occurred at distances less than 120 m. In the case of distance to rivers (Fig. 7i), 14.4% of the landslides occurred in the distance category of 0–40 m, 12.4% in 40–80 m and 8.3% in 80–120 m. In the case of distance to faults (Fig. 7j), 78.2% of the landslides occurred at distances less than 1000 m. 3.3. Preparation of training and validation datasets A total of ten maps of landslide conditioning factors (slope, lithology, rainfall, soil type, land use, aspect, distance to roads, distance to rivers, distance to faults, and relief amplitude) were used in this analysis. The maps were converted into a raster format with a grid cell size of 20 × 20 m. Each category of the ten maps was assigned an attribute value, which was then normalized to the range 0.1 to 0.9 (Table 2) using the Max–Min normalization formula (Fernandes and Lona, 2005) as follows:

$$ v' = \frac{v - \min(v)}{\max(v) - \min(v)} \, (U - L) + L \qquad (15) $$

where v′ is the normalized data matrix; v is the original data matrix; and U and L are the upper and lower normalization boundaries. In landslide modeling, a landslide inventory map should be partitioned into two subsets for training and validation. Without the partitioning, it would not be possible to validate the results (Chung and Fabbri, 2003). The training dataset is used to obtain landslide models whereas the validation dataset is used to evaluate the prediction results. When partitioning data, there is no rule of thumb for the relative sizes of the two subsets. In this study, the landslide inventory was randomly partitioned into two subsets. Part 1 comprised 70% of the data (82 landslides with 684 landslide grid cells) and was used in the training phase of the neural network models. Part 2 is a validation dataset with 30% of the data (36 landslides with 315 landslide grid cells) for the validation of the neural network models and to estimate their accuracy. All of the 684 grid cells in the part 1 dataset denoting the presence of landslides were assigned the value of 1. The same number of grid cells denoting the absence of landslide were randomly sampled from the landslide-free area and assigned a value of 0. Values for the ten landslide conditioning factors were then extracted to build a training dataset. This dataset contains a total of 1368 pixels, with one target variable denoting the landslide presence/absence and the 10 landslide conditioning factors.

Table 3 Coefficient of determination (R²) and SSE for different LMNN models.

Hidden neurons  Training set (R²)  Validation set (R²)  Test set (R²)  All (R²)  SSE
1               0.756              0.744                0.751          0.754     65.50
2               0.795              0.799                0.788          0.795     55.80
3               0.833              0.843                0.775          0.825     42.70
4               0.839              0.832                0.836          0.838     37.50
5               0.858              0.841                0.869          0.857     35.40
6               0.862              0.833                0.787          0.847     28.60
7               0.869              0.860                0.798          0.857     26.60
8               0.906              0.868                0.866          0.895     19.30
9               0.912              0.855                0.819          0.890     17.80
10              0.923              0.876                0.852          0.903     12.90
11              0.925              0.860                0.832          0.903     7.52
12              0.930              0.872                0.895          0.916     7.04
13              0.955              0.892                0.848          0.929     5.23
14              0.959              0.913                0.905          0.943     3.88
15              0.976              0.908                0.925          0.958     2.65
16              0.976              0.946                0.912          0.962     2.56
17              0.987              0.957                0.884          0.967     1.92
18              0.994              0.934                0.926          0.974     1.28

Fig. 8. Changes of optimal BRNNs along with the number of hidden neurons: (a) SSE vs. hidden neurons and (b) effective number of parameters vs. hidden neurons.
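As a check on how the normalized classes in Table 2 follow from Eq. (15), the short sketch below applies the Max–Min formula with L = 0.1 and U = 0.9 to the attribute values 1 to 6 of the slope layer. The function name `min_max_normalize` is hypothetical; only the formula and the boundaries come from the text.

```python
import numpy as np


def min_max_normalize(v, lower=0.1, upper=0.9):
    """Eq. (15): rescale v linearly so that min(v) -> lower, max(v) -> upper."""
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min()) * (upper - lower) + lower


# Attribute values of the six slope classes in Table 2.
slope_attributes = np.arange(1, 7)
slope_normalized = min_max_normalize(slope_attributes)
```

Rounded to two decimals, `slope_normalized` reproduces the slope column of Table 2 (0.10, 0.26, 0.42, 0.58, 0.74, 0.90); the other layers follow by passing their own attribute ranges.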

3.4. Training neural network models and generation of landslide susceptibility indexes In the case of the LMNN, the analysis was performed using 10 network inputs and one network output. Using the early stopping technique to control over-fitting, the training dataset (1368 observations) was divided into three subsets: training (70%), verification (15%), and test (15%). Various structures of the LMNN with differing numbers of hidden neurons were tested in the training process, and the values of the sum of squared errors (SSE) were used to determine the best one. The maximum number of training epochs was set to 1000. The SSE goal was set to 0.001. All other training parameters were set to their default values in the Matlab 7.11 software. During the training phase, the data were processed several times to see whether any changes occurred due to the assignment of random initial weights. The results obtained are shown in Table 3. When the number of hidden neurons increases, the performance of the network model improves. SSE decreases with an increase in the number of hidden neurons. The LMNN with one hidden neuron has the highest SSE value (65.5), decreasing to 1.28 with 18 hidden neurons. It can also be seen from Table 3 that the R 2 value increases from 0.756 to 0.994 in the training set, from 0.744 to 0.934 in the verification set, and from 0.751 to 0.926 in the test set. The R 2 value for all three datasets increased from 0.754 with one hidden neuron to 0.974 with 18 hidden neurons. A higher value of R 2 means better agreement between the predicted and observed values. For the comparison of the LMNN model with the BRNN model, a network structure with 18 hidden neurons was selected. In the case of the neural network using the BR algorithm, with the same 10 network inputs and one network output, the BRNN was independently trained several times to ensure that the starting weights were reasonable. The maximum number of epochs was set to 1000. 
The SSE goal was set to 0.001. Other training parameters were set to the default values in Matlab 7.11.



Fig. 9. Landslide susceptibility map of the Hoa Binh province using (a) LMNN and (b) BRNN.

BRNNs automatically control the structural complexity, so there is no need to use separate training and verification sets. The training phase was performed using all the training data. Another main advantage of the BRNN method is its ability to determine optimal network structure. Fig. 8 describes the change of SSE and in the number of effective parameters with varying numbers of neurons in the hidden layer. The effective number of parameters increased from a smallest value of 12.4 with one neuron in the hidden layer to 150 with 18 hidden neurons. At this point, it stabilized even though we increased the number of hidden neurons. In contrast, the value of SSE decreases from the largest value of 87.7 with one hidden neuron to the value of 1.30 with 18 hidden neurons. SSE was then stable at this value with an increasing number of hidden neurons. This suggests that the BRNN is robust and the optimal number of neurons in the hidden layer was determined to be 18. The optimal structure of the BRNN model is determined to be 10 × 18 × 1; γ was determined to be 150; and the sum of squared parameters is 1520.9 and SSE is 1.30.



Fig. 10. Frequency ratio plots of four landslide susceptibility classes of LMNN and BRNN models.

Once the LMNN and BRNN models were successfully trained, the connection weights of the two models were used to calculate the landslide susceptibility indexes (LSI) for all the pixels in the study area. The results were transferred to maps using the ARCGIS 9.3 software. 4. Validation and comparison of landslide susceptibility maps 4.1. Reclassification of landslide susceptibility indexes In order to obtain a landslide susceptibility map, the LSI values were reclassified into four different susceptibility classes. There are many classification methods available such as quantiles, natural breaks, equal intervals and standard deviations (Ayalew and Yamagishi, 2005). Generally, the selection of classification methods may depend on the histogram of landslide susceptibility indexes. In this study, the classification method proposed by Pradhan and Lee (2010c) was used to determine landslide susceptibility class breaks: high (10%), moderate (10%), low (20%), and very low (60%). Fig. 9 shows two landslide susceptibility maps using the (a) LMNN and (b) BRNN models. Relative frequency ratio analysis was performed on the classification results by overlaying the landslide grid cells of the four landslide susceptibility classes for each landslide susceptibility map. Frequency ratio values were then calculated for each of the four susceptibility classes (Sarkar and Kanungo, 2004). Theoretically, the frequency ratio value should increase from a very low to a higher susceptibility class (Pradhan and Lee, 2010b). Fig. 10 shows the plotting frequency values of the four landslide susceptibility classes for the two neural network models. It shows that there is a gradual increase in the frequency from the very low to the high susceptibility class. Characteristics of the four susceptibility classes for the two maps obtained from the two neural network models are shown in Table 4. 
The percentages of existing landslide pixels that fell into the high class are 87.89% and 88.49% for the LMNN and BRNN models respectively (Table 4). About 80% of the pixels in the study area fell into the low and very low susceptibility classes.
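The frequency ratio of a susceptibility class is the percentage of landslide grid cells in that class divided by the percentage of the study area it occupies. The sketch below reproduces this arithmetic for the LMNN column of Table 4; the variable and function names are hypothetical, and the input percentages are taken from Table 4.

```python
# Percentage of the study area in each susceptibility class (Table 4).
area_pct = {"high": 10.0, "moderate": 10.0, "low": 20.0, "very low": 60.0}

# Percentage of landslide grid cells per class for the LMNN model (Table 4).
lmnn_landslide_pct = {"high": 87.89, "moderate": 4.90, "low": 2.20, "very low": 5.01}


def frequency_ratio(landslide_pct, area_pct):
    """Ratio of landslide share to area share for each class."""
    return {c: landslide_pct[c] / area_pct[c] for c in area_pct}


lmnn_fr = frequency_ratio(lmnn_landslide_pct, area_pct)
```

A ratio above 1 means a class holds more landslides than its share of the area would suggest, which is why the values are expected to rise from the very low to the high class.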

Fig. 11. Success-rate curves and area under the curves (AUC) of LMNN and BRNN models in comparison with the logistic regression model.

Table 4 Characteristics of the four susceptibility zones of LMNN and BRNN models.

Susceptibility class  Percentage of area  LMNN: % landslide grid cells  LMNN: frequency ratio  BRNN: % landslide grid cells  BRNN: frequency ratio
High                  10.0                87.89                         8.789                  88.49                         8.849
Moderate              10.0                4.90                          0.490                  5.30                          0.530
Low                   20.0                2.20                          0.110                  5.11                          0.256
Very low              60.0                5.01                          0.084                  1.10                          0.018

4.2. Success and prediction rates for landslide susceptibility maps Using the success-rate and prediction-rate methods, the results of the two landslide susceptibility maps were validated by comparing them with the existing landslide locations (Chung and Fabbri, 2003). The success-rate results were obtained based on a comparison of the landslide grid cells in the training dataset (684 landslide grid cells) with the two landslide susceptibility maps. The success rate measures how the landslide analysis results fit the training dataset. This method divides the area of a landslide susceptibility map into classes, ranging from the highest to lowest LSI values. Then, the number of landslide grid cells in each class is calculated and a cumulative curve is plotted. The success-rate curves of the two landslide susceptibility maps obtained from the LMNN and BRNN models are shown in Fig. 11. For comparison, the success-rate curve of a landslide susceptibility map obtained from logistic regression (using the same data) is also included (Fig. 11). The areas under the success-rate curves (AUC) were calculated. An AUC equal to 1 indicates perfect prediction accuracy (Lee and Dan, 2005). The result shows that the BRNN has the largest AUC value (0.971), followed by logistic regression (0.962) and the LMNN (0.903), indicating that the BRNN correctly classified the areas with existing landslides. The success-rate method uses the training dataset for the evaluation of the neural networks. Strictly speaking, the method may not be suitable for assessing the prediction capacity of the landslide models (Lee et al., 2007). The prediction rate can explain how well a model and conditioning factors predict landslides (Chung and Fabbri, 2003; Brenning, 2005; Lee, 2007b; Pradhan and Lee, 2010c). In this study, the prediction-rate result was obtained by comparing landslide grid cells in the validation dataset (315 cells that were not used in the training phase) with the two landslide susceptibility maps. Fig. 12 shows the prediction-rate results of the two landslide susceptibility maps obtained from the LMNN and BRNN in comparison with the result of logistic regression. Although the LMNN (AUC = 0.861) and the BRNN (AUC = 0.903) have good

Fig. 12. Prediction-rate curves and area under the curves (AUC) of LMNN and BRNN models in comparison with the logistic regression model.



Table 5 Relative importance of input neurons for LMNN and BRNN models.

No  Input neuron         Relative importance, LMNN (%)  Relative importance, BRNN (%)  Difference (%)
1   Slope                11.8                           13.2                           1.4
2   Lithology            13.9                           12.8                           1.1
3   Rainfall             10.1                           10.2                           0.1
4   Soil type            12.4                           12.3                           0.1
5   Land use             7.9                            8.9                            1.0
6   Aspect               7.1                            9.3                            2.2
7   Distance to roads    8.3                            6.7                            1.6
8   Distance to rivers   7.7                            6.5                            1.2
9   Distance to faults   11.2                           10.6                           0.6
10  Relief amplitude     9.7                            9.5                            0.2

prediction capability, the logistic regression (AUC = 0.938) has a higher prediction capability. 4.3. Interpretation of the weights of LMNN and BRNN models Sensitivity analysis is the most commonly used method in the literature to evaluate the effect of the input neurons on outputs (Lamy, 1996; Aggarwal et al., 2005; Palmer et al., 2008). This method involves running the network, varying each input by a small amount and determining how much the output changes (Burden and Winkler, 2008). However, the sensitivity analysis method only demonstrates how a trained network reacts to the change of each input. It does not show the contribution of each input to the output (Joseph et al., 2003). The connection weight matrix of the neural network can be used to assess the relative importance of the various input neurons on the output (Lee et al., 2004). However, due to the non-linear nature of the activation functions, the relative importance can only be a coarse measure of the effect (Yesilnacar and Topal, 2005). Therefore the formula proposed by Pareek et al. (2002) was used:

$$ I_j = \frac{\displaystyle \sum_{i=1}^{h} \left( \frac{|W_{ji}|}{\sum_{l=1}^{n} |W_{li}|} \cdot |WO_i| \right)}{\displaystyle \sum_{k=1}^{n} \left\{ \sum_{i=1}^{h} \left( \frac{|W_{ki}|}{\sum_{l=1}^{n} |W_{li}|} \cdot |WO_i| \right) \right\}} \qquad (16) $$

where Ij is the relative importance of the input factors j for the output, n is the number of input factors, h is the number of hidden neurons, W is the synaptic weight matrix between the input and the hidden layer, and WO is the synaptic weight matrix between the hidden and output layers. Using the weights obtained from the two final neural network models and Eq. (16), the contribution of each landslide input factor for the two network models was assessed. Table 5 shows the relative importance of ten input neurons on the output neuron for the LMNN and BRNN models. The result shows that the relative importance of four input neurons (rainfall, soil type, relief amplitude, and distance to faults) is almost equal for both networks. However, some input neurons have significantly different effects in the two networks: aspect (by 2.2%), distance to roads (1.6%), and slope (1.4%). 5. Discussion The results of this study indicate that landslide susceptibility assessment is viable using the LMNN or BRNN models with the 10 conditioning factors. Qualitative interpretation of the high landslide susceptibility class of the two maps, obtained from the LMNN and BRNN, shows that they agree quite well with field evidence. Areas with high probability of landslides often occur along road-cut

sections. Roads in the study area have been expanded and reconstructed over the last five years. The probability of landslides is also high along active faults, especially in the zones passing through the Tan Mai and Phuc San communes on the Hoa Binh Lake (Fig. 9). In these areas, 350 families had to relocate in 2010 and 2011 due to landslides. Selection of conditioning factors for assessing landslide susceptibility is an important task that influences the quality of ANN models. However, agreement has not been reached on universal guidelines for the selection of conditioning factors. Slope, aspect, lithology, soil type, and land use may be the most widely used conditioning factors. Selection of other conditioning factors (relief amplitude, rainfall, distance to roads, distance to rivers, and distance to faults) is dependent on the characteristics of each study area (Van Westen et al., 2003; Ayalew and Yamagishi, 2005). As shown in Figs. 11 and 12, the inclusion of these factors in our study has provided results comparable to those of others such as Chung and Fabbri (2003), Yesilnacar and Topal (2005), Corsini et al. (2009), Yilmaz (2009), Garcia-Rodriguez and Malpica (2010) and Poudyal et al. (2010). The accuracy of ANN models is also influenced by data sampling strategies. Some researchers used the same dataset for both training and validating a landslide model (Lee and Sambath, 2006; Biswajeet and Saied, 2010), which reduces the reliability of validation. Chung and Fabbri (2008) suggested dividing the study area into sub-regions such as left and right: one for training and the other for validation. However, given the extensiveness of the Hoa Binh province and the variable geological conditions, the prediction capability may not be transferable from a particular sub-region to the entire study area. The occurrence dates of landslides are not known; therefore, partitioning by time is also impossible. 
Therefore, we partitioned the training and validation datasets randomly. The main disadvantage of this method is that the estimated prediction capability of a model may be too optimistic if the spatial separation between training and validation pixels is small (Brenning, 2005). An additional sampling method will be used in the future to address this issue. Application of ANNs is sometimes criticized for the lack of interpretability regarding the contribution of individual variables. We alleviated this problem using objectively determined weights. The relative importance of conditioning factors of landslides has been discussed (Pavel et al., 2008) and slope is widely accepted as the most important factor (Lee et al., 2003a; Van Den Eeckhaut et al., 2006; Pradhan and Lee, 2010a, 2010b, 2010c). In this study, the relative importance of the ten conditioning factors differs between the two models. However, slope, lithology, soil type, and distance to faults are important in both (Table 5), particularly slope in the BRNN, as indicated in the literature. The greatest difference (2.2%) occurs in the contribution of aspect, which may be due to a higher weight assigned during the training phase. In the BRNN, 93.8% of the landslide pixels are correctly located in moderate and high susceptibility zones (Table 4). This is slightly higher (1%) than that from the LMNN. In contrast, about 5% of the landslide-affected pixels are incorrectly classified in the very low susceptibility zone in the LMNN. The rate of misclassification for the LMNN is ca. 4% higher than that for the BRNN. A further analysis was carried out to explain the better performance of the BRNN. We selected slope, lithology and aspect for this analysis because the first two factors have the highest contributions in the BRNN and the last one strongly reflects the difference between the two models. The four landslide susceptibility zones from these three factors are shown in Fig. 13.
The LMNN apparently underestimated the role of steep slopes (>40°). There is also a small difference between the models for the role of the intermediate slope gradient (10–40°) in the moderate and high susceptibility zones. The LMNN has given a wider zone of very low susceptibility (Fig. 13) because of an inadequate weight assignment to the intermediate slope gradient. Concerning lithology, the BRNN overestimates the role of the acid-neutral magmatic rocks (group 5), with 30% of pixels having moderate and high susceptibilities (Fig. 13), due primarily to the small number of landslides available for this rock group. Although the highest landslide density occurs in the mafic–ultramafic magma rock group (group 4), the LMNN classified about 63% of the pixels of the group as very low susceptibility. This also seems to be due to a less accurately assigned weight. A similar problem occurs with aspect: in the LMNN, about 51% of the areas in the flat class were classified as moderate and high susceptibility zones, whereas the rate is only 9% in the BRNN.

Fig. 13. Distribution of slope, lithology and aspect classes of the four landslide susceptibility zones for the landslide susceptibility maps from LMNN and BRNN models.

6. Conclusions

This study has applied the LMNN and BRNN models to assess landslide susceptibility in the Hoa Binh province, Vietnam. It has confirmed that ANN models are effective for complex problems such as landslide susceptibility analysis, although the internal processing steps are difficult to follow. The LMNN model is considered to be one of the fastest. However, the model has difficulty in determining the optimal neural network structure, and is prone to over-fitting. In addition, the dataset used should be divided into three subsets: training, verification and test. In this study, the best results

were achieved with an LMNN structure of 10 input neurons, 18 hidden neurons, and one output neuron. Nevertheless, one cannot clearly conclude that this structure is optimal. The BRNN model is much more robust because it determines an optimal network from a changing number of effective parameters as a function of the number of hidden neurons. Although the optimal structure of the BRNN in this study is the same as that for the LMNN (10 × 18 × 1), the BRNN is less sensitive to overtraining and the training set does not need to be partitioned. This allows the BRNN to be trained with more observation data than the LMNN. Indeed, the BRNN performed better than the LMNN in terms of both success and prediction rates, and was found to be far more robust and efficient. Although the BRNN performed well, its prediction capability was slightly lower than that of logistic regression. The results from this study may be useful for decision making and policy planning in areas prone to landslides.

Acknowledgment

The authors would like to thank Prof. Takashi Oguchi, Prof. Cees van Westen, and two anonymous reviewers for their valuable and constructive comments that improved the paper. The first author gratefully acknowledges Mr Vu Manh Hao at the Centre for Geological Appraisal & Technology, Ministry of Natural Resources



and Environment (Vietnam) for providing the geological data. This research was funded by the Norwegian Quota scholarship program, and was carried out as a part of the first author's PhD studies at the Geomatics Section, Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, Norway. References Aggarwal, K.K., Singh, Y., Chandra, P., Puri, M., 2005. Sensitivity analysis of fuzzy and neural network models. SIGSOFT Software Engineering Notes 30, 1–4. Akgun, A., Turk, N., 2010. Landslide susceptibility mapping for Ayvalik (Western Turkey) and its vicinity by multicriteria decision analysis. Environmental Earth Sciences 61, 595–611. Akgun, A., Sezer, E.A., Nefeslioglu, H.A., Gokceoglu, C., Pradhan, B., 2012. An easy-to-use MATLAB program (MamLand) for the assessment of landslide susceptibility using a Mamdani fuzzy algorithm. Computers & Geosciences 38, 23–34. Aldrich, C., Vandeventer, J.S.J., Reuter, M.A., 1994. The application of neural nets in the metallurgical industry. Minerals Engineering 7, 793–809. Aleotti, P., Chowdhury, R., 1999. Landslide hazard assessment: summary review and new perspectives. Bulletin of Engineering Geology and the Environment 58, 21–48. Arıkan, F., Ulusay, R., Aydın, N., 2007. Characterization of weathered acidic volcanic rocks and a weathering classification based on a rating system. Bulletin of Engineering Geology and the Environment 66, 415–430. Ayalew, L., Yamagishi, H., 2005. The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda-Yahiko Mountains, Central Japan. Geomorphology 65, 15–31. Bishop, C.M., 1995. Neural Networks for Pattern Recognition. Oxford University Press, Oxford. Biswajeet, P., Saied, P., 2010. Comparison between prediction capabilities of neural network and fuzzy logic techniques for landslide susceptibility mapping. Disaster Advances 3, 26–34. Brenning, A., 2005. 
Spatial prediction models for landslide hazards: review, comparison and evaluation. Natural Hazards and Earth System Sciences 5, 853–862. Burden, F., Winkler, D., 2008. Bayesian regularization of neural networks. In: Livingstone, D.J. (Ed.), Artificial Neural Network: Method and Application. Humana Press, Totowa, pp. 25–44. Chung, C.J.F., Fabbri, A.G., 2003. Validation of spatial prediction models for landslide hazard mapping. Natural Hazards 30, 451–472. Chung, C.-J., Fabbri, A.G., 2008. Predicting landslides for risk analysis — spatial models tested by a cross-validation technique. Geomorphology 94, 438–452. Corsini, A., Cervi, F., Ronchetti, F., 2009. Weight of evidence and artificial neural networks for potential groundwater spring mapping: an application to the Mt. Modino area (Northern Apennines, Italy). Geomorphology 111, 79–87. Ercanoglu, M., Gokceoglu, C., 2002. Assessment of landslide susceptibility for a landslide-prone area (north of Yenice, NW Turkey) by fuzzy approach. Environmental Geology 41, 720–730. Ermini, L., Catani, F., Casagli, N., 2005. Artificial Neural Networks applied to landslide susceptibility assessment. Geomorphology 66, 327–343. Falaschi, F., Giacomelli, F., Federici, P.R., Puccinelli, A., Avanzi, G.D., Pochini, A., Ribolini, A., 2009. Logistic regression versus artificial neural networks: landslide susceptibility evaluation in a sample area of the Serchio River valley, Italy. Natural Hazards 50, 551–569. Fausett, L., 1994. Fundamentals of Neural Networks: Architectures, Algorithms, and Applications. Prentice Hall, Englewood Cliffs, NJ, USA. Fernandes, F.A.N., Lona, L.M.F., 2005. Neural network applications in polymerization processes. Brazilian Journal of Chemical Engineering Geology 22, 401–418. Foody, G.M., 1995. Using prior knowledge in artificial neural network classification with a minimal training set. International Journal of Remote Sensing 16, 301–312. Foresee, F.D., Hagan, M.T., 1997. 
Gauss–Newton approximation to Bayesian learning. Proceeding of the 1997 International Joint Conference on Neural Networks, Houston, City, pp. 1930–1935. Garcia-Rodriguez, M.J., Malpica, J.A., 2010. Assessment of earthquake-triggered landslide susceptibility in El Salvador based on an Artificial Neural Network model. Natural Hazards and Earth System Sciences 10, 1307–1315. Gorsevski, P.V., Foltz, R.B., Gessler, P.E., Cundy, T.W., 2001. Statistical modeling of landslide hazard using GIS. Seventh Federal Interagency Sedimentation Conference, Silver Legacy, Reno, Nevada, City, pp. 103–109. Guzzetti, F., Carrara, A., Cardinali, M., Reichenbach, P., 1999. Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, Central Italy. Geomorphology 31, 181–216. Hagan, M.T., Menhaj, M.B., 1994. Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks 5, 989–993. Haykin, S., 1998. Neural Networks: A Comprehensive Foundation, second ed. Prentice Hall, Upper Saddle River, NJ, USA. Hecht-Nielsen, R., 1987. Kolmogorov's mapping neural network existence theorem. Proceedings of the First IEEE International Conference on Neural Networks, San Diego, CA, USA, City, pp. 11–14. Hue, T.T., Duong, T.V., Toan, D.V., Nghinh, L.T., Minh, V.C., Pho, N.V., Xuan, P.T., Hoan, L.T., Huyen, N.X., Pha, P.D., Chinh, V.V., Thom, B.V., 2004. Investigation and Assessment of the Types of Geological Hazard in the Territory of Vietnam and Recommendation of Remedial Measures. Phase II: A Study of the Northern Mountainous Province of Vietnam. Institute of Geological Sciences, Vietnam Academy of Science and Technology, Hanoi.

Hush, D.R., 1989. Classification with neural networks: a performance analysis. IEEE International Conference on Systems Engineering, Dayton, Ohio, USA, City, pp. 277–280. Joseph, H.L., Huang, Y., Dickman, M., Jayawardena, A.W., 2003. Neural network modelling of coastal algal blooms. Ecological Modelling 159, 179–201. Kaastra, I., Boyd, M., 1996. Designing a neural network for forecasting financial and economic time series. Neurocomputing 10, 215–236. Kanellopoulos, I., Wilkinson, G.G., 1997. Strategies and best practice for neural network image classification. International Journal of Remote Sensing 18, 711–725. Kanungo, D.P., Arora, M.K., Sarkar, S., Gupta, R.P., 2006. A comparative study of conventional, ANN black box, fuzzy and combined neural and fuzzy weighting procedures for landslide susceptibility zonation in Darjeeling Himalayas. Engineering Geology 85, 347–366. Kawabata, D., Bandibas, J., 2009. Landslide susceptibility mapping using geological data, a DEM from ASTER images and an Artificial Neural Network (ANN). Geomorphology 113, 97–109. Lamy, D., 1996. Modeling and sensitivity analysis of neural networks. Mathematics and Computers in Simulation 40, 535–548. Lee, S., 2007a. Application and verification of fuzzy algebraic operators to landslide susceptibility mapping. Environmental Geology 52, 615–623. Lee, S., 2007b. Landslide susceptibility mapping using an artificial neural network in the Gangneung area, Korea. International Journal of Remote Sensing 28, 4763–4783. Lee, S., Dan, N.T., 2005. Probabilistic landslide susceptibility mapping on the Lai Chau province of Vietnam: focus on the relationship between tectonic fractures and landslides. Environmental Geology 48, 778–787. Lee, S., Pradhan, B., 2007. Landslide hazard mapping at Selangor, Malaysia using frequency ratio and logistic regression models. Landslides 4, 33–41. Lee, S., Sambath, T., 2006. 
Landslide susceptibility mapping in the Damrei Romel area, Cambodia using frequency ratio and logistic regression models. Environmental Geology 50, 847–855. Lee, S., Ryu, J.H., Lee, M.J., Won, J.S., 2003a. Use of an artificial neural network for analysis of the susceptibility to landslides at Boun, Korea. Environmental Geology 44, 820–833. Lee, S., Ryu, J.H., Min, K.D., Won, J.S., 2003b. Landslide susceptibility analysis using GIS and artificial neural network. Earth Surface Processes and Landforms 28, 1361–1376. Lee, S., Ryu, J.H., Won, J.S., Park, H.J., 2004. Determination and application of the weights for landslide susceptibility mapping using an artificial neural network. Engineering Geology 71, 289–302. Lee, S., Ryu, J.H., Lee, M.J., Won, J.S., 2006. The application of artificial neural networks to landslide susceptibility mapping at Janghung, Korea. Mathematical Geology 38, 199–220. Lee, S., Ryu, J.H., Kim, I.S., 2007. Landslide susceptibility analysis and its verification using likelihood ratio, logistic regression, and artificial neural network models: case study of Youngin, Korea. Landslides 4, 327–338. Luxhoj, J.T., 1998. An artificial neural network for nonlinear estimation of the turbine flow-meter coefficient. Engineering Applications of Artificial Intelligence 11, 723–734. MacKay, D.J.C., 1992. Bayesian interpolation. Neural Computation 4, 415–447. Mas, J.F., Flores, J.J., 2008. The application of artificial neural networks to the analysis of remotely sensed data. International Journal of Remote Sensing 29, 617–663. Melchiorre, C., Matteucci, M., Azzoni, A., Zanchi, A., 2008. Artificial neural networks and cluster analysis in landslide susceptibility zonation. Geomorphology 94, 379–400. My, N.Q., 2007. Construction of the Environmental Hazard Zonation Map for Northwest Territory of Vietnam. Vietnam Geography Assosiation, Hanoi. 98 pp. Neaupane, K.M., Achet, S.H., 2004. 
Use of backpropagation neural network for landslide monitoring: a case study in the higher Himalaya. Engineering Geology 74, 213–226. Oh, H.-J., Lee, S., 2011. Cross-application used to validate landslide susceptibility maps using a probabilistic model from Korea. Environmental Earth Sciences 64, 395–409. Oh, H.-J., Pradhan, B., 2011. Application of a neuro-fuzzy model to landslidesusceptibility mapping for shallow landslides in a tropical hilly area. Computers & Geosciences 37, 1264–1276. Paliwal, M., Kumar, U.A., 2009. Neural networks and statistical techniques: a review of applications. Expert Systems with Applications 36, 2–17. Palmer, A., Montano, J.J., Franconetti, F.J., 2008. Sensitivity analysis applied to artificial neural networks for forecasting time series. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences 4, 80–86. Paola, J.D., Schowengerdt, R.A., 1995. A review and analysis of back propagation neural networks for classification of remotely-sensed multispectral imagery. International Journal of Remote Sensing 16, 3033–3058. Pareek, V.K., Brungs, M.P., Adesina, A.A., Sharma, R., 2002. Artificial neural network modeling of a multiphase photodegradation system. Journal of Photochemistry and Photobiology A—Chemistry 149, 139–146. Pavel, M., Fannin, R.J., Nelson, J.D., 2008. Replication of a terrain stability mapping using an Artificial Neural Network. Geomorphology 97, 356–373. Poudyal, C.P., Chang, C., Oh, H.J., Lee, S., 2010. Landslide susceptibility maps comparing frequency ratio and artificial neural networks: a case study from the Nepal Himalaya. Environmental Earth Sciences 61, 1049–1064. Pradhan, B., 2010. Application of an advanced fuzzy logic model for landslide susceptibility analysis. International Journal of Computational Intelligence Systems 3, 370–381. Pradhan, B., 2011. 
Manifestation of an advanced fuzzy logic model coupled with Geoinformation techniques to landslide susceptibility mapping and their comparison with logistic regression modelling. Environmental and Ecological Statistics 18, 471–493.

Author's personal copy D. Tien Bui et al. / Geomorphology 171–172 (2012) 12–29 Pradhan, B., Lee, S., 2010a. Delineation of landslide hazard areas on Penang Island, Malaysia, by using frequency ratio, logistic regression, and artificial neural network models. Environmental Earth Sciences 60, 1037–1054. Pradhan, B., Lee, S., 2010b. Landslide susceptibility assessment and factor effect analysis: backpropagation artificial neural networks and their comparison with frequency ratio and bivariate logistic regression modelling. Environmental Modelling & Software 25, 747–759. Pradhan, B., Lee, S., 2010c. Regional landslide susceptibility analysis using back-propagation neural network model at Cameron Highland, Malaysia. Landslides 7, 13–30. Pradhan, B., Lee, S., Buchroithner, M.F., 2009. Use of geospatial data for the development of fuzzy algebraic operators to landslide hazard mapping: a case study in Malaysia. Applied Geomatics 1, 3–15. Pradhan, B., Sezer, E.A., Gokceoglu, C., Buchroithner, M.F., 2010. Landslide susceptibility mapping by neuro-fuzzy approach in a landslide-prone area (Cameron Highlands, Malaysia). IEEE Transactions on Geoscience and Remote Sensing 48, 4164–4177. Sarkar, S., Kanungo, D.P., 2004. An integrated approach for landslide susceptibility mapping using remote sensing and GIS. Photogrammetric Engineering and Remote Sensing 70, 617–625. Sezer, E.A., Pradhan, B., Gokceoglu, C., 2011. Manifestation of an adaptive neuro-fuzzy model on landslide susceptibility mapping: Klang valley, Malaysia. Expert Systems with Applications 38, 8208–8219. Thach, N.N., Xuan, N.T., My, N.Q., Quynh, P.V., Minh, N.D., Hoa, D.B., Bao, D.V., Dan, N.V., Thuy, T.V., Hien, N.T., 2002. Application of Remote Sensing and Geographical Information System for Research and Forecast of Natural Hazards in Hoa Binh Province. National University Hanoi, Hanoi. 197 pp. 
Thinh, D.V., Dong, N.P., Hong, P.M., Hung, P.V., Khoi, T.N., Ke, T.D., Phu, D.V., Thang, P.X., Thanh, P.V., Thang, P.H., Thay, B.V., Thinh, N.T., Thien, T.V., Tu, M.T., Vinh, B.X., 2005. The Investigated Report of Natural Hazards in the Northwest of Vietnam. Northern Geological Mapping Division, Hanoi. 12 pp. Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., Dick, O.B., 2011. Landslide susceptibility mapping at Hoa Binh province (Vietnam) using an adaptive neurofuzzy inference system and GIS. Computers & Geosciences, http://dx.doi.org/ 10.1016/j.cageo.2011.10.031. Trowsdale, A.J., Usherwood, T.W., Wadsworth, J.E.J., Patel, M., Farrugia, D.C.J., 1998. Neural networks for providing ‘on-line’ access to discretised modelling techniques. Journal of Materials Processing Technology 80–81, 475–480. Vahidnia, M.H., Alesheikh, A.A., Alimohammadi, A., Hosseinali, F., 2010. A GIS-based neuro-fuzzy procedure for integrating knowledge and data in landslide susceptibility mapping. Computers & Geosciences 36, 1101–1114.


Van Den Eeckhaut, M., Vanwalleghem, T., Poesen, J., Govers, G., Verstraeten, G., Vandekerckhove, L., 2006. Prediction of landslide susceptibility using rare events logistic regression: a case-study in the Flemish Ardennes (Belgium). Geomorphology 76, 392–410. Van Westen, C.J., Terlien, M.T.J., 1996. An approach towards deterministic landslide hazard analysis in GIS. A case study from Manizales (Colombia). Earth Surface Processes and Landforms 21, 853–868. Van Westen, C.J., Rengers, N., Soeters, R., 2003. Use of geomorphological information in indirect landslide susceptibility assessment. Natural Hazards 30, 399–419. Van, T.T., Tuy, P.K., Giap, N.X., Ke, T.D., Thai, T.N., Giang, N.T., Tho, H.M., Tuat, L.T., San, D.N., Hung, L.Q., Chung, H.T., Hoan, N.T., 2002. Assessment and Prediction of Geological Hazards in the 8 Coastal Provinces of Central Vietnam from Quang Binh to Phu Yen—Current Status, Causes, Prediction and Recommendation of Remedial Measures. Vietnam Institude of Geosciences and Mineral Resourses, Hanoi. 215 pp. Van, T.T., Anh, D.T., Hieu, H.H., Giap, N.X., Ke, T.D., Nam, T.D., Ngoc, D., Ngoc, D.T.Y., Thai, T.N., Thang, D.V., Tinh, N.V., Tuat, L.T., Tung, N.T., Tuy, P.K., Viet, H.A., 2006. Investigation and Assessment of the Current Status and Potential of Landslides in Some Sections of the Ho Chi Minh Road, National Road 1A and Proposed Remedial Measures to Prevent Landslides from Threat of Safety of People, Property, and Infrastructure. Vietnam Institute of Geosciences and Mineral Resources, Hanoi. 249 pp. Varnes, D.J., 1984. Landslide Hazard Zonation: A Review of Principles and Practice. UNESCO, Paris. Vergari, F., Seta, M.D., Monte, M.D., Fredi, P., Palmieri, E.L., 2011. Landslide susceptibility assessment in the Upper Orcia Valley (Southern Tuscany, Italy) through conditional analysis: a contribution to the unbiased selection of causal factors. Natural Hazards and Earth System Sciences 11, 1475–1497. Wilamowski, B.M., Chen, Y., Malinowski, A., 1999. 
Efficient algorithm for training neural networks with one hidden layer. International Joint Conference on Neural Networks, Washington, DC, USA City, pp. 1725–1728. Yesilnacar, E., Topal, T., 2005. Landslide susceptibility mapping: a comparison of logistic regression and neural networks methods in a medium scale study, Hendek region (Turkey). Engineering Geology 79, 251–266. Yilmaz, I., 2009. Landslide susceptibility mapping using frequency ratio, logistic regression, artificial neural networks and their comparison: a case study from Kat landslides (Tokat-Turkey). Computers & Geosciences 35, 1125–1138.

Paper V

Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., Dick, O.B., 2012. Application of support vector machines in landslide susceptibility assessment for the Hoa Binh province (Vietnam) with kernel functions analysis. Proceedings of the iEMSs Sixth Biennial Meeting: International Congress on Environmental Modelling and Software (iEMSs 2012). International Environmental Modelling and Software Society, Leipzig, Germany, July 2012.

Managing Resources of a Limited Planet. Table of Contents of the Proceedings of the Sixth Biennial Meeting of the International Environmental Modelling and Software Society, Leipzig, Germany, July 1-5, 2012

Editors Ralf Seppelt Helmholtz-Centre for Environmental Research - UFZ, Leipzig, Germany Alexey A. Voinov ITC, Faculty of Geo-Information Science and Earth Observation of the University of Twente, Enschede, The Netherlands Susanne Lange F+U confirm, Leipzig, Germany Dagmar Bankamp Helmholtz-Centre for Environmental Research - UFZ, Leipzig, Germany

How to cite: R. Seppelt, A.A. Voinov, S. Lange, D. Bankamp (Eds.) (2012): International Environmental Modelling and Software Society (iEMSs) 2012 International Congress on Environmental Modelling and Software Managing Resources of a Limited Planet: Pathways and Visions under Uncertainty, Sixth Biennial Meeting, Leipzig, Germany http://www.iemss.org/society/index.php/iemss-2012-proceedings ISBN: 978-88-9035-742-8

Impressum The copyright of all papers is an exclusive right of the authors. No work can be reproduced without written permission of the authors. Each paper has been peer reviewed by at least two independent reviewers. ISBN: 978-88-9035-742-8 Published by the International Environmental Modelling and Software Society (iEMSs) President: Andrea-Emilio Rizzoli Address: iEMSs Secretariat, c/o IDSIA, Galleria 2, CH - 6928 Manno Contact: [email protected] Website: http://www.iemss.org Online Publication: http://www.iemss.org/society/index.php/iemss-2012-proceedings Date of Publication: December 6, 2012



Application of support vector machines in landslide susceptibility assessment for the Hoa Binh province (Vietnam) with kernel functions analysis

Dieu Tien Bui (a, b, *), Biswajeet Pradhan (c), Owe Lofman (a), Inge Revhaug (a), Oystein B Dick (a)

(a) Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, P.O. Box 5003 IMT, N-1432, Aas, Norway
(b) Faculty of Surveying and Mapping, Hanoi University of Mining and Geology, Dong Ngac, Tu Liem, Hanoi, Vietnam
(c) Institute of Advanced Technology, Spatial and Numerical Modelling Laboratory, University Putra Malaysia, Serdang, Selangor Darul Ehsan 43400, Malaysia
(*) [email protected]; [email protected]

Abstract: The main objective of this study is to investigate the potential application of support vector machines (SVM) with kernel functions analysis for spatial prediction of landslides in the Hoa Binh province, Vietnam. A landslide inventory map of landslides that occurred during the last ten years was constructed using data from various sources. The landslide inventory was randomly divided into a training dataset (70%) for building the models and a validation dataset (the remaining 30%). Ten landslide conditioning factors were prepared: slope angle, aspect, relief amplitude, lithology, soil type, land use, distance to roads, distance to rivers, distance to faults and rainfall. During the model building process, four different SVM kernel functions (linear, polynomial, radial basis function, and sigmoid) were employed and four landslide susceptibility maps were constructed. Validation was performed with the prediction rate method, using landslide locations that were not used during model building. The validation results showed that the areas under the curve (AUC) for the landslide susceptibility maps produced by the SVM linear function, SVM polynomial function, SVM radial basis function, and SVM sigmoid function are 0.956, 0.956, 0.952, and 0.945, respectively. These values indicate that all four landslide models performed well. Compared with the logistic regression (AUC = 0.938) and Bayesian neural network (AUC = 0.903) models, the accuracy of the SVM landslide models in this study (using the radial basis function and the polynomial function) is slightly better. The results show that SVM is a powerful tool for landslide susceptibility mapping at a regional scale. These maps can be very useful for natural hazard assessment and for land use planning.

Keywords: Landslide susceptibility; Support vector machines; Remote sensing; GIS; Hoa Binh province; Vietnam

1. Introduction

Landslides are among the most common recurring natural hazards in Vietnam and have caused great loss of life and property in recent years (Tien Bui et al., 2011b). Landslides mainly occur during heavy rainfall, especially during tropical rainstorms. Landslide susceptibility map preparation is considered the


first important step for landslide hazard mitigation and management. Due to the complex nature of landslides, reliable spatial prediction of landslide hazards is not easy (Ercanoglu et al., 2004). Therefore, various techniques and methods have been proposed, and a review of these methods can be found in the literature (Guzzetti et al., 1999). In recent years, support vector machine (SVM) approaches have been applied in landslide studies, with results reported to outperform conventional methods (Ballabio et al., 2012). However, the performance of the SVM is heavily influenced by the selection of kernel functions. The literature indicates that the effects of the kernel functions on landslide susceptibility models have received little analysis. The main objective of this study is to investigate the potential application of SVM with kernel functions analysis for spatial prediction of landslides in the Hoa Binh province (Vietnam). In addition, the SVM landslide models were compared with models estimated from logistic regression and a Bayesian neural network for the study area.

2. Study area and data

The study area is the mountainous Hoa Binh province (Fig. 1) in the northwestern part of Vietnam. Its area is around 4,660 km², situated between longitudes 104°48'E and 105°50'E, and latitudes 20°17'N and 21°08'N. Elevation ranges from 0 to 1,510 m, descending from northwest to southeast. More than 38 geologic formations crop out in the province. The main lithologies are limestone, conglomerate, aphyric basalt, sandstone, silty sandstone, and black clay shale. The rainy season normally lasts from May to October and accounts for 84–90% of the total yearly rainfall. The high frequency and intensity of rain, especially during tropical rainstorms, is considered to be the most important landslide triggering factor.

Figure 1. Landslide inventory of the study area.

Landslides that have occurred in the past and present are keys to the spatial prediction of landslide hazard in the future (Guzzetti et al., 1999). The landslide inventory map is therefore the first step in landslide modeling. In this study, we used the landslide inventory map (Fig. 1) prepared by Tien Bui et al. (2011a) to analyze the relationships between landslide occurrence and landslide conditioning factors. The landslide inventory map included 118 landslides depicted as polygons. The size of the smallest landslide is about 380 m², the largest is 14,340 m², and the average is 3,440 m². A total of ten landslide conditioning factors were selected for this study: slope angle, aspect, relief amplitude, lithology, soil type, land use, distance to roads, distance to rivers, distance to faults and rainfall. This selection is based on the analysis of the spatial relationship between landslide occurrence and landslide conditioning factors carried out by Tien Bui et al. (2011a). The detailed classes for the ten landslide conditioning factors are shown in Table 1. A digital elevation model (DEM) with a spatial resolution of 20 × 20 m was generated using national topographic maps (scale of 1:25,000). Based on the DEM, three



derivative factors (slope angle, aspect, and relief amplitude) were extracted. The lithology and distance-to-faults maps were constructed from the Geological and Mineral Resources Map of Vietnam (scale 1:200,000). The land use map (scale 1:50,000) was compiled from the national land use status database. The soil type map (scale 1:100,000) was compiled from the National Pedology Map. The distance-to-roads and distance-to-rivers maps were constructed by buffering the sections of the road and river networks that undercut slopes. Both networks were extracted from the national topographic map at a scale of 1:50,000. The rainfall map was constructed from the maximum eight-day rainfall (seven rainfall days plus a last day with rainfall larger than 100 mm) for the period from 1990 to 2010, using the inverse distance weighted method.

Table 1. Landslide conditioning factors and their classes for this study.

Landslide conditioning factor   Class
Slope angle (°)                 (1) 0–10; (2) 10–20; (3) 20–30; (4) 30–40; (5) 40–50; (6) >50
Aspect                          (1) Flat; (2) N; (3) NE; (4) E; (5) SE; (6) S; (7) SW; (8) W; (9) NW
Relief amplitude (m)            (1) 0–50; (2) 50–100; (3) 100–150; (4) 150–200; (5) 200–250; (6) 250–532
Lithology                       (1) Group 1; (2) Group 2; (3) Group 3; (4) Group 4; (5) Group 5; (6) Group 6; (7) Group 7
Land use                        (1) Populated area; (2) Orchard land; (3) Paddy land; (4) Protective forest land; (5) Natural forest land; (6) Productive forest land; (7) Water; (8) Annual crop land; (9) Non-tree rocky mountain; (10) Barren land; (11) Specially used forest land; (12) Grass land
Soil type                       (1) Eutric Fluvisols; (2) Degraded soil; (3) Limestone mountain; (4) Ferralic Acrisols; (5) Rhodic Ferralsols; (6) Humic Acrisols; (7) Dystric Fluvisols; (8) Dystric Gleysols; (9) Luvisols; (10) Humic Ferralsols; (11) Populated area; (12) Water; (13) Gley Fluvisols
Rainfall (mm)                   (1) 362–470; (2) 470–540; (3) 540–610; (4) 610–950
Distance to roads (m)           (1) 0–40; (2) 40–80; (3) 80–120; (4) >120
Distance to rivers (m)          (1) 0–40; (2) 40–80; (3) 80–120; (4) >120
Distance to faults (m)          (1) 0–200; (2) 200–400; (3) 400–700; (4) 700–1,000; (5) >1,000
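The inverse distance weighted interpolation used for the rainfall map can be illustrated with a short sketch. It is not the authors' implementation: the distance power of 2, the function name `idw`, and the two station records are assumptions chosen for illustration, since the text does not state the interpolation settings.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    stations: iterable of (station_x, station_y, rainfall_mm) tuples.
    power: distance exponent (2.0 is an assumed, commonly used value).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # The target coincides with a station: return its value directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical stations: coordinates in km, maximum eight-day rainfall in mm.
example_stations = [(0.0, 0.0, 400.0), (10.0, 0.0, 600.0)]
midpoint_rainfall = idw(example_stations, 5.0, 0.0)
```

At the midpoint both stations receive the same weight, so the estimate is the plain average of the two values; closer to either station the estimate moves toward that station's value.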

3. Landslide susceptibility mapping using support vector machines

3.1 Support vector machines

Support vector machines (SVM) are a supervised learning method based on statistical learning theory (Vapnik, 1998). Given a training dataset that contains a set of landslide conditioning factors as inputs and landslide locations as output values, the goal of the SVM training algorithm is to find an optimal hyper-plane that separates the dataset into two classes, one with landslides and one with no landslides. Maximizing the separation results in two parallel hyper-planes known as boundary planes. The distance between them is called the margin, and the observations lying near the boundary planes are called the support vectors (Vapnik, 1998).

Assume we have a training dataset $(X_i, y_i)$, $i = 1, \dots, l$, with $X_i \in R^n$ and $y_i \in \{1, -1\}$. $X_i$ represents an input vector of ten landslide conditioning factors. The two classes $\{1, -1\}$ denote landslide and no-landslide. The parameters $w$ and $b$ of the optimal separating hyper-plane can be obtained by solving the following optimization problem:



$$\min_{w,\,b,\,\xi}\ \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i \qquad (1)$$

$$\text{subject to}\quad y_i \left( w^T \phi(X_i) + b \right) \ge 1 - \xi_i \qquad (2)$$

where $w$ is a coefficient vector that determines the orientation of the hyper-plane, $b$ is the offset of the hyper-plane from the origin, and $\xi_i$ are the positive slack variables that allow for penalized constraint violation. $C$ is the penalty parameter that controls the trade-off between the maximum margin and the minimum error. Using Lagrange multipliers $\alpha_i$, the dual problem is:

$$\max_{\alpha}\ \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j\, \phi(X_i)^T \phi(X_j) \qquad (3)$$

$$\text{subject to}\quad \sum_{i=1}^{l} \alpha_i y_i = 0 \quad \text{and} \quad 0 \le \alpha_i \le C \qquad (4)$$

The decision function can be written as:

$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{l} y_i \alpha_i K(X_i, X_j) + b \right) \qquad (5)$$

where $K(X_i, X_j) = \phi(X_i)^T \phi(X_j)$ is the kernel function.

Table 2. Kernel functions and their parameters used in this study.

Kernel                          Formula                                          Kernel parameters
Linear kernel function (LN)     $K(X_i, X_j) = X_i^T X_j$                        -
Radial basis function (RBF)     $K(X_i, X_j) = \exp(-\gamma \|X_i - X_j\|^2)$    $\gamma$
Polynomial function (PL)        $K(X_i, X_j) = (\gamma X_i^T X_j + 1)^d$         $\gamma$, $d$
Sigmoid kernel function (SIG)   $K(X_i, X_j) = \tanh(\gamma X_i^T X_j + 1)$      $\gamma$

3.2 Performance assessment of the landslide susceptibility models

The performance of the models was assessed using several statistical evaluation criteria: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). The overall accuracy of the trained landslide susceptibility model is calculated as (TP+TN)/N, where N is the total number of training pixels. The reliability of the landslide susceptibility model is estimated using Cohen's Kappa index (κ) (Guzzetti et al., 2006) as follows:

$$\kappa = \frac{P_{obs} - P_{exp}}{1 - P_{exp}} \qquad (6)$$

where $P_{obs} = (TP+TN)/N$ is the proportion of pixels that are correctly classified as landslide or non-landslide, and $P_{exp} = ((TP+FN)(TP+FP) + (FP+TN)(FN+TN))/N^2$ is the proportion of pixels for which agreement is expected by chance. According to Landis and Koch (1977), the strength of agreement between the model and reality is as follows: ≤0 (poor); 0–0.2 (slight); 0.2–0.4 (fair); 0.4–0.6 (moderate); 0.6–0.8 (substantial); 0.8–1 (almost perfect).

3.3 Preparation of training and validation dataset

In this study, the ten landslide conditioning factor maps were converted into a pixel format with a spatial resolution of 20 × 20 m. In each map, the frequency ratio value of each individual attribute class was calculated. Each attribute class was then assigned a sequence number based on the ratio value. In the next step, Max-Min normalization was carried out to rescale the values to the range 0.1 to 0.9 using Eq. (7):

$$v' = \frac{v - \mathrm{Min}(v)}{\mathrm{Max}(v) - \mathrm{Min}(v)} (U - L) + L \qquad (7)$$



where $v'$ is the normalized data matrix; $v$ is the original data matrix; $U$ and $L$ are the upper and lower normalization boundaries.

The landslide inventory map with 118 landslide polygons was randomly split into two parts. Part 1, with 70% of the data (82 landslides with 684 landslide grid cells), was used in the training phase of the landslide models. Part 2 is a validation dataset with 30% of the data (36 landslides with 315 landslide grid cells). The 684 landslide pixels in Part 1 were assigned the value 1, and the same number of no-landslide pixels was randomly sampled from the landslide-free area and assigned the value -1. Finally, the values of the ten landslide conditioning factors were extracted at these pixels to build the training dataset. This dataset contains a total of 1,368 observations, with ten input variables and one target variable (landslide, no-landslide).

3.4 Training models and generation of landslide susceptibility maps

The performance of an SVM model depends on the choice of kernel function and its parameters. Table 2 shows the SVM kernel functions and their parameters used in this study. $C$ is the regularization parameter, $\gamma$ is the kernel width, and $d$ is the degree of the polynomial kernel. A large value of $C$ leads to few training errors. In contrast, a small value of $C$ generates a larger margin and increases the number of training errors. Parameters $\gamma$ and $d$ control the degree of nonlinearity and the degree of the polynomial kernel, respectively.

In this study, the grid-search method with 5-fold cross-validation was used to find the best kernel parameters. The training dataset was randomly split into 5 equal-sized subsets. Four merged subsets were used to train the models, whereas the remaining subset was used as a test set. The process was repeated five times so that each of the five subsets served once as the test set. The grid space was set to $C = 2^{-5}, 2^{-4}, \dots, 2^{10}$; $\gamma = 2^{10}, 2^{9}, \dots, 2^{-4}$; $d = 1, \dots, 8$.

Table 3 shows the overall accuracy and Cohen's Kappa index of the trained landslide models. The best value of C for LN-SVM is 4, with an overall accuracy of 87.8%. The best C and γ for RBF-SVM are 8 and 0.25, respectively, with an overall accuracy of 91.1%. In the case of PL-SVM, the best C, γ, and d are 1, 0.3536, and 3, respectively, with an overall accuracy of 91.1%. Cohen's Kappa indexes are 0.756, 0.822, 0.823, and 0.727 for the LN-SVM, RBF-SVM, PL-SVM, and SIG-SVM models, respectively (Table 3). The Kappa values indicate that the strength of agreement between the observed and the predicted values is substantial for LN-SVM and SIG-SVM, whereas it is almost perfect for RBF-SVM and PL-SVM.

Table 3. Overall accuracy and Cohen's Kappa index for the four SVM models.

No   Parameters             RBF-SVM   PL-SVM   LN-SVM   SIG-SVM
1    Overall accuracy (%)   91.08     91.15    87.79    86.33
2    Cohen's Kappa index    0.822     0.823    0.756    0.727

Once the landslide susceptibility models had been trained, they were used to calculate the landslide susceptibility indexes (LSI) for all pixels in the study area. The results were then converted to a GIS format.

4. Validation and comparison of landslide susceptibility maps

4.1 Success-rate and prediction-rate curves

The four landslide susceptibility maps were validated by means of success-rate and prediction-rate curves (Chung and Fabbri, 2003; Guzzetti et al., 2006). The success-rate results were obtained by comparing the four landslide susceptibility maps with the landslide pixels in the training dataset (Fig. 2a), and the areas under the success-rate curves (AUC) were then estimated. The results show that RBF-SVM and PL-SVM have the highest AUC values, 0.961 and 0.957, respectively. They are followed by LN-SVM (0.940) and SIG-SVM (0.932). The success rate measures the goodness of fit of the landslide models to the data.



The AUC results indicate that the capacity to correctly classify areas with existing landslides is highest for RBF-SVM, followed by PL-SVM, LN-SVM, and SIG-SVM.
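A success-rate or prediction-rate curve of this kind can be computed by ranking pixels from the highest to the lowest susceptibility index and accumulating the share of landslide pixels captured along the way. The sketch below is only an illustration in plain Python. The function names `rate_curve` and `area_under_curve` and the ten-pixel toy data are assumptions, and the authors' own tooling may have handled ties and sampling differently.

```python
def rate_curve(lsi, is_landslide):
    """Cumulative share of landslide pixels vs. cumulative share of area.

    Pixels are visited from the highest to the lowest susceptibility index.
    Returns two lists (area_share, landslide_share), both starting at 0.0.
    """
    total_landslides = sum(1 for flag in is_landslide if flag)
    if total_landslides == 0:
        raise ValueError("at least one landslide pixel is required")
    n = len(lsi)
    # Stable sort: pixels with equal index values keep their input order.
    order = sorted(range(n), key=lambda i: lsi[i], reverse=True)
    area_share, landslide_share = [0.0], [0.0]
    found = 0
    for rank, i in enumerate(order, start=1):
        if is_landslide[i]:
            found += 1
        area_share.append(rank / n)
        landslide_share.append(found / total_landslides)
    return area_share, landslide_share


def area_under_curve(x, y):
    """Trapezoidal area under a curve given as two equally long lists."""
    return sum((x[k + 1] - x[k]) * (y[k] + y[k + 1]) / 2.0
               for k in range(len(x) - 1))


# Toy example: ten pixels, the two most susceptible ones contain landslides.
toy_lsi = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
toy_landslide = [True, True] + [False] * 8
toy_auc = area_under_curve(*rate_curve(toy_lsi, toy_landslide))
```

Feeding the training landslide pixels into `rate_curve` gives a success-rate curve, while feeding landslide pixels held out from training gives a prediction-rate curve.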

Figure 2. (a) Success-rate curves of the four SVM models; (b) prediction-rate curves of the four SVM models, the Bayesian regularized neural networks, and the logistic regression.

The success rate is not a suitable measure of the prediction capability of the landslide models because it is based on the landslide pixels that have already been used for building the model. The prediction rate may be used to estimate the prediction capability. In this study, the prediction-rate curves and the areas under the curves were obtained (Fig. 2b) by comparing the four susceptibility maps with the landslide pixels in the validation dataset. The results show that the highest prediction capability is for RBF-SVM and PL-SVM, with AUC values of 0.955 and 0.956, respectively, followed by LN-SVM (0.952) and SIG-SVM (0.945). Compared with the results from logistic regression (0.938) and Bayesian regularized neural networks (0.903), the prediction capability of the RBF-SVM and PL-SVM models seems to be slightly better.

4.2 Reclassification of landslide susceptibility indexes and relative importance assessment of landslide conditioning factors

The landslide susceptibility indexes were reclassified into five classes based on the percentage of area (Pradhan and Lee, 2010): high (10%), moderate (10%), low (20%), very low (20%), and no (40%) (Fig. 3a). Landslide density analysis (Sarkar et al., 2008) was performed on the five landslide susceptibility classes. The results show that the landslide density gradually increases from the no to the high susceptibility class (Fig. 3b). The four landslide susceptibility maps are shown in Fig. 4.
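The area-percentage reclassification and the density check can be sketched as follows. This is a minimal illustration and not the authors' GIS workflow. The names `CLASS_SHARES`, `reclassify_by_area`, and `landslide_density` are hypothetical, and landslide density is assumed here to be the share of landslide pixels in a class divided by the share of area in that class.

```python
# Class names and area shares as listed in the text, highest susceptibility first.
CLASS_SHARES = [("high", 0.10), ("moderate", 0.10), ("low", 0.20),
                ("very low", 0.20), ("no", 0.40)]


def reclassify_by_area(lsi):
    """Label each pixel by its rank: the top 10% of pixels are 'high', etc."""
    n = len(lsi)
    order = sorted(range(n), key=lambda i: lsi[i], reverse=True)
    labels = [None] * n
    start, cumulative = 0, 0.0
    for k, (name, share) in enumerate(CLASS_SHARES):
        cumulative += share
        # The last class absorbs any rounding remainder.
        end = n if k == len(CLASS_SHARES) - 1 else round(cumulative * n)
        for i in order[start:end]:
            labels[i] = name
        start = end
    return labels


def landslide_density(labels, is_landslide):
    """Share of landslide pixels per class divided by share of area per class."""
    n = len(labels)
    total_landslides = sum(1 for flag in is_landslide if flag)
    density = {}
    for name, _ in CLASS_SHARES:
        members = [i for i, label in enumerate(labels) if label == name]
        if not members or total_landslides == 0:
            density[name] = 0.0
            continue
        hits = sum(1 for i in members if is_landslide[i])
        density[name] = (hits / total_landslides) / (len(members) / n)
    return density


# Toy example: 20 pixels with increasing index, four of them landslide pixels.
toy_lsi = [i / 20 for i in range(20)]
toy_landslide = [i in (12, 17, 18, 19) for i in range(20)]
toy_labels = reclassify_by_area(toy_lsi)
toy_density = landslide_density(toy_labels, toy_landslide)
```

A density that rises steadily from the "no" class to the "high" class, as in the toy data, is the pattern the text reports for the four maps.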

Figure 3. (a) Cumulative percentage of landslides vs. percentage of landslide susceptibility map; (b) landslide density plots of the five landslide susceptibility classes.

The importance of each factor was estimated by excluding the factor and then recalculating the overall accuracy of the model (Table 4). The highest accuracy was obtained when all ten factors were used for LN-SVM, RBF-SVM, and PL-SVM. For SIG-SVM, however, the soil type factor may have introduced slight noise, reducing the model accuracy by 0.2%. Distance to roads, rainfall, distance to rivers, land use, and slope angle are the most important factors for LN-SVM. For RBF-SVM, the most important factors are distance to roads, soil type, slope angle, land use, and distance to rivers. Distance to roads, land use, distance to rivers, slope angle, and soil type are the most important for PL-SVM, and distance to roads, land use, distance to rivers, slope angle, and distance to faults are the most important for SIG-SVM. It can be observed that LN-SVM includes both rainfall and land use among its important predictors. This may be helpful for scenario analysis, including future climate and land use scenarios.

Table 4. Accuracy of the trained SVM models for landslide susceptibility using all conditioning factors and without one of the factors.

                                 Overall accuracy (%)
No   Conditioning factors        LN-SVM   RBF-SVM   PL-SVM   SIG-SVM
1    Minus slope angle           87.4     89.4      89.3     84.6
2    Minus lithology             87.6     90.6      90.3     85.0
3    Minus rainfall              86.2     90.4      90.0     85.7
4    Minus land use              86.9     89.5      89.0     82.9
5    Minus soil type             87.6     89.3      89.5     86.5
6    Minus aspect                87.7     90.5      90.7     85.5
7    Minus distance to roads     80.4     82.9      83.2     76.8
8    Minus distance to rivers    86.8     90.1      89.1     83.1
9    Minus distance to faults    87.8     90.5      90.1     84.9
10   Minus relief amplitude      80.3     90.4      90.8     85.7
     All                         87.8     91.1      91.1     86.3
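The factor ranking can be reproduced from Table 4 by sorting the factors by the accuracy lost when each one is removed. The sketch below does this for the RBF-SVM column only. The values are copied from the table, while the names `RBF_ACCURACY_ALL`, `RBF_ACCURACY_WITHOUT`, and `rank_factors_by_accuracy_drop` are hypothetical.

```python
# RBF-SVM column of Table 4: overall accuracy (%) with one factor removed.
RBF_ACCURACY_ALL = 91.1
RBF_ACCURACY_WITHOUT = {
    "slope angle": 89.4,
    "lithology": 90.6,
    "rainfall": 90.4,
    "land use": 89.5,
    "soil type": 89.3,
    "aspect": 90.5,
    "distance to roads": 82.9,
    "distance to rivers": 90.1,
    "distance to faults": 90.5,
    "relief amplitude": 90.4,
}


def rank_factors_by_accuracy_drop(accuracy_all, accuracy_without):
    """Return (factor, drop) pairs, largest accuracy drop first."""
    drops = {factor: accuracy_all - accuracy
             for factor, accuracy in accuracy_without.items()}
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)


rbf_ranking = rank_factors_by_accuracy_drop(RBF_ACCURACY_ALL, RBF_ACCURACY_WITHOUT)
rbf_top_five = [factor for factor, _ in rbf_ranking[:5]]
```

For this column the five largest drops belong to distance to roads, soil type, slope angle, land use, and distance to rivers, which matches the order given in the text for RBF-SVM.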

Figure 4. Landslide susceptibility zonation maps: (a) LN-SVM; (b) RBF-SVM; (c) PL-SVM; (d) SIG-SVM.

5. Concluding remarks

In this paper, we investigated the potential application of support vector machines for landslide susceptibility assessment in the Hoa Binh province (Vietnam). Ten landslide conditioning factors (slope angle, aspect, relief amplitude, lithology, soil type, land use, distance to roads, distance to rivers, distance to faults, and rainfall) were used in this analysis. The landslide inventory with 118 landslide polygons that occurred during the last ten years was used. 70% of the landslide inventory was used for building the susceptibility models, whereas the remaining 30% was used for validating and assessing the prediction capability of the models. Four kernel functions were included in the analysis: linear, radial basis, polynomial, and sigmoid. Four landslide susceptibility maps were constructed. Using the success-rate and the prediction-rate methods, the landslide susceptibility maps were validated and compared.

The largest area under the success-rate curve (AUC) is for RBF-SVM (0.961), followed by PL-SVM (0.956), LN-SVM (0.940), and SIG-SVM (0.932). This indicates that RBF-SVM and PL-SVM have a better goodness of fit to the training data. The highest areas under the prediction-rate curve (AUC) are for RBF-SVM (0.954) and PL-SVM (0.955), followed by LN-SVM (0.952) and SIG-SVM (0.945). Compared to logistic regression (AUC = 0.938) and Bayesian regularized neural networks (AUC = 0.903), RBF-SVM and PL-SVM have a slightly better prediction capability.

The reliability of the four susceptibility models was assessed using Cohen's Kappa index (κ). The κ values are 0.822 and 0.823 for RBF-SVM and PL-SVM, respectively, indicating almost perfect agreement. The κ values for LN-SVM and SIG-SVM are 0.756 and 0.727, indicating that the strength of agreement between the observed and predicted values is substantial.

Based on the aforementioned results, we conclude that the RBF-SVM and PL-SVM models have almost equal accuracies and may be somewhat better than logistic regression and Bayesian regularized neural networks. As a final conclusion, the results show that SVM is a powerful tool for landslide susceptibility mapping at medium scale. These maps can be very useful for natural hazard assessment and for land use planning.

ACKNOWLEDGMENTS

This research was funded by the Norwegian Quota scholarship program.
The data analysis and write-up were carried out as a part of the first author's PhD studies at the Geomatics Section, Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, Norway.

REFERENCES

Ballabio, C., Sterlacchini, S., 2012. Support Vector Machines for Landslide Susceptibility Mapping: The Staffora River Basin Case Study, Italy. Mathematical Geosciences 44(1) 47–70.
Chung, C.J.F., Fabbri, A.G., 2003. Validation of spatial prediction models for landslide hazard mapping. Natural Hazards 30(3) 451–472.
Ercanoglu, M., Gokceoglu, C., 2004. Use of fuzzy relations to produce landslide susceptibility map of a landslide prone area (West Black Sea Region, Turkey). Engineering Geology 75(3–4) 229–250.
Guzzetti, F., Carrara, A., Cardinali, M., Reichenbach, P., 1999. Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, Central Italy. Geomorphology 31(1–4) 181–216.
Guzzetti, F., Reichenbach, P., Ardizzone, F., Cardinali, M., Galli, M., 2006. Estimating the quality of landslide susceptibility models. Geomorphology 81(1–2) 166–184.
Landis, J.R., Koch, G.G., 1977. The measurement of observer agreement for categorical data. Biometrics 33 159–174.
Pradhan, B., Lee, S., 2010. Regional landslide susceptibility analysis using backpropagation neural network model at Cameron Highland, Malaysia. Landslides 7(1) 13–30.
Sarkar, S., Kanungo, D., Patra, A., Kumar, P., 2008. GIS based spatial data analysis for landslide susceptibility mapping. Journal of Mountain Science 5(1) 52–62.
Tien Bui, D., Lofman, O., Revhaug, I., Dick, O., 2011a. Landslide susceptibility analysis in the Hoa Binh province of Vietnam using statistical index and logistic regression. Natural Hazards 59 1413–1444.
Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., Dick, O.B., 2011b. Landslide susceptibility mapping at Hoa Binh province (Vietnam) using an adaptive neuro-fuzzy inference system and GIS. Computers & Geosciences. Doi 10.1016/j.cageo.2011.10.031.
Vapnik, V.N., 1998. Statistical Learning Theory. Wiley-Interscience.


Paper VI Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., 2012. Landslide susceptibility assessment in Vietnam using Support vector machines, Decision tree and Naïve Bayes models. Mathematical Problems in Engineering. Doi:10.1155/2012/974638, 2012.

Hindawi Publishing Corporation Mathematical Problems in Engineering Volume 2012, Article ID 974638, 26 pages doi:10.1155/2012/974638

Research Article

Landslide Susceptibility Assessment in Vietnam Using Support Vector Machines, Decision Tree, and Naïve Bayes Models

Dieu Tien Bui (1, 2), Biswajeet Pradhan (3), Owe Lofman (1), and Inge Revhaug (1)

1 Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, P.O. Box 5003IMT, 1432 Aas, Norway
2 Faculty of Surveying and Mapping, Hanoi University of Mining and Geology, Dong Ngac, Tu Liem, Hanoi, Vietnam
3 Department of Civil Engineering, Spatial and Numerical Modelling Research Group, Faculty of Engineering, Universiti Putra Malaysia, Selangor 43400 Serdang, Malaysia

Correspondence should be addressed to Dieu Tien Bui, [email protected]; bui-tien.dieu@umb.no

Received 1 April 2012; Accepted 24 April 2012

Academic Editor: Wei-Chiang Hong

Copyright © 2012 Dieu Tien Bui et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The objective of this study is to investigate and compare the results of three data mining approaches, the support vector machines (SVM), decision tree (DT), and Naïve Bayes (NB) models, for spatial prediction of landslide hazards in the Hoa Binh province (Vietnam). First, a landslide inventory map showing the locations of 118 landslides was constructed from various sources. The landslide inventory was then randomly partitioned into 70% for training the models and 30% for model validation. Second, ten landslide conditioning factors were selected (i.e., slope angle, slope aspect, relief amplitude, lithology, soil type, land use, distance to roads, distance to rivers, distance to faults, and rainfall). Using these factors, landslide susceptibility indexes were calculated using the SVM, DT, and NB models. Finally, landslide locations that were not used in the training phase were used to validate and compare the landslide susceptibility maps. The validation results show that the models derived using SVM have the highest prediction capability. The model derived using DT has the lowest prediction capability. Compared to the logistic regression model, the prediction capability of the SVM models is slightly better. The prediction capability of the DT and NB models is lower.


1. Introduction

Vietnam is identified as a country that is particularly vulnerable to some of the worst manifestations of climate change such as sea level rise, flooding, and landslides. In recent years, together with flooding, landslides have been widespread and recurrent in the northwest mountainous areas of Vietnam and have caused substantial economic losses and property damage. Landslides usually occur during heavy rainfall in the rainy season from May to October every year. In particular, in the Hoa Binh province during the rainy seasons of 2006 and 2007, large landslides occurred frequently due to heavy rainfall. Most of these landslides occurred on cut slopes and alongside roads in mountainous areas.

Landslide disasters can be reduced by understanding the mechanism, prediction, hazard assessment, early warning, and risk management [1]. Therefore, studying landslides and determining measures to mitigate losses are urgent tasks. However, the study of landslides in Vietnam is still limited, except for a few case studies [2–5]. Through scientific analyses of these landslides, we can assess and predict landslide-prone areas, offering potential measures to decrease landslide damage [6, 7].

Preparation of a spatial prediction map of landslide hazard is considered the first important step for landslide hazard mitigation and management [8]. The spatial probability of landslide hazards can be expressed as the probability of spatial occurrence of slope failures given a set of geoenvironmental conditions [9]. However, due to the complex nature of landslides, producing a reliable spatial prediction of landslide hazard is not easy. For this reason, various approaches have been proposed in the literature. Reviews of these approaches have been carried out by Guzzetti et al. [10], Wang et al. [11], and Chacón et al. [12].

In recent years, some soft computing approaches have been applied for landslide hazard evaluation, including fuzzy logic [7, 13–20], neuro-fuzzy [3, 15, 21, 22], and artificial neural networks [6, 23–29]. In general, the quality of landslide susceptibility models is affected by the methods used [30]. For this reason, comparison of those methods with the conventional methods has been carried out using different datasets. Some researchers found that soft computing methods outperform the conventional methods [31–35]; however, other authors found no differences in overall predictive performance [36]. In general, soft computing approaches improve the maps of landslide hazard areas both qualitatively and quantitatively, and the spatial results are appealing [37].

In more recent years, data mining approaches such as SVM, DT, and NB have been considered for landslide studies [38, 39]. They belong to the top 10 data mining algorithms identified by the IEEE [40]. In the case of SVM, the main advantage of this method is that it can use large input data with fast learning capacity. This method is well suited to nonlinear high-dimensional data modeling problems and provides promising perspectives for landslide susceptibility mapping [41]. Micheletti et al. [42] stated that SVM methods can be used for landslide studies because of their ability to deal with high-dimensional spaces effectively and with a high classification performance. In the case of DT, according to Yeon et al. [43], the probability of observations that belong to the landslide class can be used to estimate indexes of susceptibility. Saito et al. [44] used a decision tree model for landslide susceptibility mapping in the Akaishi Mountains (Japan) and stated that the decision tree model has appropriate accuracy for estimating the probabilities of future landslides. Nefeslioglu et al. [45] applied a DT in the metropolitan area of Istanbul (Turkey) and obtained a landslide model with good prediction accuracy. Yeon et al. [43] concluded that DT can be used efficiently for landslide susceptibility mapping. In the case of NB, although the method has been successfully applied in many domains [46], its application in landslide susceptibility assessment is still limited. NB is a popular and fast supervised learning algorithm for data mining applications based on the Bayes theorem. The main advantage of NB is that it can process a large number of variables, both discrete and continuous [47]. NB is suitable for large-scale prediction of complex and incomplete data [48]. The main potential drawback of this method is that it requires independence of attributes. However, this method is considered to be relatively robust [49].

The main objective of this study is to investigate and compare the results of three data mining approaches, that is, SVM, DT, and NB, for spatial prediction of landslide hazards in the Hoa Binh province (Vietnam). The main difference between this study and the aforementioned works is that SVM with two kernel functions (radial basis and polynomial kernels) and NB were applied for landslide susceptibility modeling. To assess these methods, the susceptibility maps obtained from the three data mining approaches were compared to those obtained by the logistic regression model reported by the same authors [2]. The computation process was carried out using MATLAB 7.11 and LIBSVM [50] for SVM and WEKA ver. 3.6.6 (The University of Waikato, 2011) for DT and NB.

2. Study Area and Data Used

2.1. Study Area

Hoa Binh has an area of about 4,660 km² and is located between the longitudes 104°48′E and 105°50′E and the latitudes 20°17′N and 21°08′N in the northwest mountainous area of Vietnam (Figure 1). The province is hilly with elevations ranging between 0 and 1,510 m, with an average value of 315 m and a standard deviation of 271.5 m. The terrain gradient computed from a digital elevation model (DEM) with a spatial resolution of 20 × 20 m is in the range from 0° to 60°, with a mean value of 13.8° and a standard deviation of 10.4°.

More than 38 geologic formations crop out in the province (Figure 2). Six geological formations, Dong Giao, Tan Lac, Vien Nam, Song Boi, Suoi Bang, and Ben Khe, cover about 72.8% of the total area. The main lithologies are limestone, conglomerate, aphyric basalt, sandstone, silty sandstone, and black clay shale. The ages of the rocks vary from the Paleozoic to the Cenozoic, with different physical properties and chemical composition. Five major fracture zones pass through the province causing rock mass weakness: Hoa Binh, Da Bac, Muong La-Cho Bo, Son La-Bim Son, and Song Da.

The soil types are mainly ferralic acrisols, humic acrisols, rhodic ferralsols, and eutric fluvisols, which account for 80% of the total study area. Land use comprises approximately 7.5% populated areas, 14.5% agricultural land, 52.6% forest land, 21% barren land and nontree rocky mountain, 0.4% grassland, and 4% water surface.

The study area receives heavy, high-intensity rainfall, especially during tropical rainstorms, with an average annual precipitation varying from 1,353 to 1,857 mm (data for the period 1973–2002). Precipitation is most abundant from May to October, when rainfall accounts for 84–90% of the annual precipitation. Rainfall usually peaks in August and September, with an average of around 300 to 400 mm per month. The climate is typical of the monsoonal region: hot, rainy, and highly humid. January is usually the coldest month with an average temperature of 14.9°C, whereas the warmest month is July with an average temperature of 26.7°C.

Landslides occurred mostly in the rainy season when heavy rains exceeded 100 mm per day and continued for three days. Landslides also occurred when rainfall continued for

Figure 1: Landslide inventory map of the study area. Legend: landslides used for building models; landslides used for validating models; roads.

five to seven days with rainfall larger than 100 mm on the last day. For example, landslides occurred in the Doc Cun and Doi Thai areas in September 2000 when the 7-day accumulated rainfall was 308 and 383 mm, respectively. Many landslides occurred on 5 October 2007 in the Thung Khe, Toan Son, Phuc San, Tan Mai, Doc Cun, and surrounding areas, with 3-day accumulated rainfall ranging from 334 to 529 mm.

2.2. Data

Landslides are assumed to occur in the future under the same conditions as past and current landslides [10]. Therefore, a landslide inventory map is considered the most important factor for the prediction of future landslides. The landslide inventory map portrays the spatial distribution of a single landslide event (a single trigger) or multiple landslide events over time (historical) [51]. For the study area, the landslide inventory map (Figure 1) constructed by Tien Bui et al. [2] was used to analyze the relationships between landslide occurrence and landslide conditioning factors. The map shows 118 landslides that occurred during the last ten years, including 97 landslide polygons and 21 rock fall locations. The size of the largest landslide is 14,340 m², the smallest is 380 m², and the average landslide size is 3,440 m².



Based on previous research carried out by Tien Bui et al. [2], ten landslide conditioning factors were selected to build the landslide models and to predict the spatial distribution of landslides in this study: slope angle, slope aspect, relief amplitude, lithology, soil type, land use, distance to roads, distance to rivers, distance to faults, and rainfall.

The slope angle, slope aspect, and relief amplitude were extracted from a DEM that was generated from national topographic maps at the scale of 1:25,000. The slope angle map was constructed with 6 categories (Figure 3(a)). The slope aspect map was constructed with nine classes: flat, north, northeast, east, southeast, south, southwest, west, and northwest. The relief amplitude, which represents the maximum difference in height per unit area [52], was constructed with 6 categories: 0–50 m, 50–100 m, 100–150 m, 150–200 m, 200–250 m, and 250–532 m. For the construction of the relief amplitude map, different sizes of the unit area were tested to choose the best one (20 × 20 pixels) using the focal statistics module in the ArcGIS 10 software.

The lithology and faults were extracted from four tiles of the Geological and Mineral Resources Map of Vietnam at the scale of 1:200,000. This is the only geological map available for the study area. The lithology map (Figure 3(b)) was constructed with seven groups based on clay composition, degree of weathering, estimated strength, and density [53, 54]. The distance-to-faults map was constructed by buffering the fault lines with 5 categories: 0–200 m, 200–400 m, 400–700 m, 700–1,000 m, and >1,000 m. The soil type map (Figure 3(c)) was constructed with 13 categories. The land-use map (Figure 3(d)) was constructed with twelve categories.

A road network that undercuts slopes was extracted from the topographic map at the scale of 1:50,000. A distance-to-roads map was constructed with 4 categories: 0–40 m, 40–80 m, 80–120 m, and >120 m. A hydrological network that undercuts slopes was also extracted from the topographic map at the scale of 1:50,000, and a distance-to-rivers map was then constructed with 4 categories: 0–40 m, 40–80 m, 80–120 m, and >120 m. The rainfall map was prepared using the value of maximum rainfall of eight days (seven rainfall days plus a last day of rainfall larger than 100 mm) for the period from 1990 to 2010, using the Inverse Distance Weighted (IDW) method. The precipitation data were extracted from a database of the Institute of Meteorology and Hydrology in Vietnam.
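The relief amplitude described above, the maximum height difference within a moving window, amounts to a focal range operation on the DEM. The sketch below shows the idea on a small grid in plain Python. It is not the ArcGIS focal statistics tool: the function name `relief_amplitude`, the clipping of windows at the grid edge, and the placement of an even-sized window around the centre cell are all assumptions.

```python
def relief_amplitude(dem, size=20):
    """Focal range (max minus min elevation) in a size x size moving window.

    dem: list of equally long rows of elevations in metres.
    Windows are clipped at the grid edges (an assumed edge rule).
    """
    rows, cols = len(dem), len(dem[0])
    before = size // 2        # cells taken before the centre cell
    after = size - before     # cells taken from the centre cell onwards
    result = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - before), min(rows, r + after)
            c0, c1 = max(0, c - before), min(cols, c + after)
            window = [dem[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            result[r][c] = max(window) - min(window)
    return result


# Toy 3 x 3 DEM (metres) with a 3 x 3 window instead of the 20 x 20 used in the study.
toy_dem = [[10, 12, 15],
           [11, 20, 18],
           [9, 14, 30]]
toy_relief = relief_amplitude(toy_dem, size=3)
```

The resulting values would then be grouped into the six relief amplitude categories listed in the text.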

3. Landslide Susceptibility Mapping Using SVM, DT, and NB Models

3.1. Support Vector Machines (SVM)

Support vector machines are a relatively new supervised learning method based on statistical learning theory and the structural risk minimization principle [55]. Using the training data, SVM implicitly maps the original input space into a high-dimensional feature space. Subsequently, in the feature space the optimal hyperplane is determined by maximizing the margins of class boundaries [56]. The training points that are closest to the optimal hyperplane are called support vectors. Once the decision surface is obtained, it can be used for classifying new data.

Consider a training dataset of instance-label pairs $(x_i, y_i)$ with $x_i \in R^n$, $y_i \in \{1, -1\}$, and $i = 1, \dots, m$. In the current context of landslide susceptibility, $x$ is a vector of the input space that contains slope angle, lithology, rainfall, soil type, slope aspect, land use, distance to roads, distance to rivers, distance to faults, and relief amplitude. The two classes $\{1, -1\}$ denote landslide pixels and no-landslide pixels. The aim of the SVM classification is to find


Mathematical Problems in Engineering

Table 1: Normalized classes of landslide conditioning factors used.

Data layer              Class                           Class pixels (%)  Landslide pixels (%)  Frequency ratio  Attribute  Normalized class
Slope angle (°)         0–10                            42.82             0.20                  0.005            2          0.26
                        10–20                           29.13             29.93                 1.028            4          0.58
                        20–30                           20.25             54.75                 2.704            5          0.74
                        30–40                           6.84              14.31                 2.094            6          0.90
                        40–50                           0.93              0.80                  0.862            3          0.42
                        >50                             0.04              0.00                  0.000            1          0.10
Slope aspect            Flat (−1)                       0.06              0.00                  0.000            1          0.10
                        North (0–22.5 and 337.5–360)    12.02             4.70                  0.391            2          0.20
                        Northeast (22.5–67.5)           14.56             11.81                 0.811            6          0.60
                        East (67.5–112.5)               12.06             7.81                  0.648            5          0.50
                        Southeast (112.5–157.5)         12.04             14.51                 1.206            7          0.70
                        South (157.5–202.5)             12.90             22.72                 1.761            8          0.80
                        Southwest (202.5–247.5)         14.60             26.33                 1.804            9          0.90
                        West (247.5–292.5)              11.31             7.11                  0.628            4          0.40
                        Northwest (292.5–337.5)         10.46             5.01                  0.478            3          0.30
Relief amplitude (m)    0–50                            27.00             1.10                  0.041            1          0.10
                        50–100                          23.97             25.43                 1.061            3          0.42
                        100–150                         22.98             41.04                 1.786            6          0.90
                        150–200                         14.75             20.12                 1.364            5          0.74
                        200–250                         7.06              8.41                  1.190            4          0.58
                        250–532                         4.24              3.90                  0.920            2          0.26
Lithology               Group 1                         4.08              6.31                  1.546            6          0.77
                        Group 2                         39.62             33.43                 0.844            4          0.50
                        Group 3                         32.55             27.13                 0.833            3          0.37
                        Group 4                         11.65             21.62                 1.856            7          0.90
                        Group 5                         1.18              0.00                  0.000            1          0.10
                        Group 6                         5.62              7.81                  1.389            5          0.63
                        Group 7                         5.29              3.70                  0.700            2          0.23
Land use                Populated area                  7.53              14.01                 1.862            10         0.75
                        Orchard land                    3.71              2.50                  0.674            7          0.54
                        Paddy land                      9.17              4.10                  0.448            5          0.39
                        Protective forestland           8.58              20.32                 2.368            12         0.90
                        Natural forestland              31.91             15.62                 0.489            6          0.46
                        Productive forestland           11.72             22.62                 1.930            11         0.83
                        Water                           3.97              1.00                  0.252            4          0.32
                        Annual crop land                1.60              0.20                  0.125            3          0.25
                        Nontree rocky mountain          4.08              7.21                  1.767            9          0.68
                        Barren land                     16.95             12.41                 0.732            8          0.61
                        Specially used forestland       0.36              0.00                  0.000            2          0.17
                        Grass land                      0.43              0.00                  0.000            1          0.10
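The relationship between the columns of Table 1 can be sketched in a few lines. The frequency ratio is the landslide-pixel percentage divided by the class-pixel percentage. The rank-based scaling to the interval 0.1 to 0.9 shown below is an inference from the tabulated values, not a procedure stated in this section, and for the relief amplitude rows the Attribute column coincides with the ascending rank of the frequency ratio (this does not hold for every layer, slope angle being one exception). The function names are hypothetical.

```python
def frequency_ratio(class_pct, landslide_pct):
    """Frequency ratio per class: landslide pixels (%) / class pixels (%)."""
    return [l / c for c, l in zip(class_pct, landslide_pct)]


def rank_ascending(values):
    """Rank 1 for the smallest value, n for the largest (ties not handled)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def normalize_ranks(ranks, low=0.1, high=0.9):
    """Assumed linear scaling of ranks onto [low, high]."""
    n = len(ranks)
    return [low + (high - low) * (r - 1) / (n - 1) for r in ranks]


# Relief amplitude rows of Table 1 (0-50 m ... 250-532 m).
class_pct = [27.00, 23.97, 22.98, 14.75, 7.06, 4.24]
landslide_pct = [1.10, 25.43, 41.04, 20.12, 8.41, 3.90]

ratios = frequency_ratio(class_pct, landslide_pct)
ranks = rank_ascending(ratios)
normalized = [round(v, 2) for v in normalize_ranks(ranks)]
```

Because the published percentages are rounded, recomputed frequency ratios can differ from the tabulated ones in the third decimal place.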


Figure 2: Geologic map of the study area.

an optimal separating hyperplane that can distinguish the two classes, that is, landslides and no landslides $\{1, -1\}$, from the mentioned set of training data. For the case of linearly separable data, a separating hyperplane can be defined as

$$ y_i (w \cdot x_i + b) \geq 1 - \xi_i, \tag{3.1} $$

where $w$ is a coefficient vector that determines the orientation of the hyperplane in the feature space, $b$ is the offset of the hyperplane from the origin, and $\xi_i$ are the positive slack variables [57]. The determination of an optimal hyperplane leads to solving the following dual optimization problem using Lagrangian multipliers [58]:

$$ \text{Maximize} \quad \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j), \qquad \text{Subject to} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C, \tag{3.2} $$

Figure 3: Landslide conditioning factor maps: (a) slope, (b) lithology, (c) soil type, and (d) land use. Lithology groups in the legend of panel (b): Group 1: Quaternary deposits; Group 2: sedimentary aluminosilicate and quartz rocks; Group 3: sedimentary carbonate rocks; Group 4: mafic-ultramafic magma rocks; Group 5: acid-neutral magmatic rocks; Group 6: metamorphic rock with rich aluminosilicate components; Group 7: metamorphic rock with rich quartz components.

where $\alpha_i$ are Lagrange multipliers, $C$ is the penalty, and the slack variables $\xi_i$ allow for penalized constraint violation. The decision function, which will be used for the classification of new data, can then be written as

$$ g(x) = \operatorname{sign}\left( \sum_{i=1}^{n} y_i \alpha_i x_i + b \right). \tag{3.3} $$

In cases when it is impossible to find the separating hyperplane using the linear kernel function, the original input data may be transferred into a high-dimensional feature space through some nonlinear kernel functions. The classification decision function is then written as

$$ g(x) = \operatorname{sign}\left( \sum_{i=1}^{n} y_i \alpha_i K(x_i, x_j) + b \right), \tag{3.4} $$


Table 1: Continued.

Data layer              Class                           Class pixels (%)  Landslide pixels (%)  Frequency ratio  Attribute  Normalized class
Soil type               Eutric fluvisols                3.49              6.11                  1.751            12         0.83
                        Degraded soil                   0.03              0.00                  0.000            3          0.23
                        Limestone mountain              14.42             15.12                 1.048            9          0.63
                        Ferralic acrisols               36.53             43.84                 1.200            10         0.70
                        Rhodic ferralsols               8.97              3.40                  0.379            7          0.50
                        Humic acrisols                  30.91             28.13                 0.910            8          0.57
                        Dystric fluvisols               0.73              2.80                  3.828            13         0.90
                        Dystric gleysols                0.39              0.60                  1.524            11         0.77
                        Luvisols                        0.46              0.00                  0.000            4          0.30
                        Humic ferralsols                1.15              0.00                  0.000            5          0.37
                        Populated area                  0.44              0.00                  0.000            2          0.17
                        Water                           2.41              0.00                  0.000            1          0.10
                        Gley fluvisols                  0.08              0.00                  0.000            6          0.43
Rainfall (mm)           362–470                         22.48             27.23                 1.211            3          0.63
                        470–540                         46.40             35.84                 0.772            2          0.37
                        540–610                         22.18             9.01                  0.406            1          0.10
                        610–950                         8.94              27.93                 3.125            4          0.90
Distance to roads (m)   0–40                            1.40              41.64                 29.755           4          0.90
                        40–80                           1.68              21.52                 12.788           3          0.63
                        80–120                          1.88              4.70                  2.509            2          0.37
                        >120                            95.04             32.13                 0.338            1          0.10
Distance to rivers (m)  0–40                            3.86              14.41                 3.731            4          0.90
                        40–80                           4.52              12.41                 2.747            3          0.63
                        80–120                          4.82              8.31                  1.725            2          0.37
                        >120                            86.80             64.86                 0.747            1          0.10
Distance to faults (m)  0–200                           18.09             24.02                 1.328            5          0.90
                        200–400                         15.95             11.61                 0.728            2          0.30
                        400–700                         19.89             24.22                 1.218            3          0.50
                        700–1,000                       14.31             18.42                 1.287            4          0.70
                        >1,000                          31.75             21.72                 0.684            1          0.10

where $K(x_i, x_j)$ is the kernel function. The choice of the kernel function is crucial for successful SVM training and classification accuracy [59]. Four groups of kernel functions are commonly used in SVM: the linear kernel (LN), the polynomial kernel (PL), the radial basis function (RBF) kernel, and the sigmoid kernel (SIG). The LN is considered to be a specific case of the RBF, whereas the SIG behaves like the RBF for certain parameters [60]. According to Keerthi and Lin [61], the LN is not needed when the RBF is used. In general, the classification accuracy of the SIG may not be better than that of the RBF [62]. Therefore, only two kernel functions, RBF and PL, were selected in this study. According to Zhu et al. [63], the main advantage of the RBF is its good interpolation ability; however, it may fail to provide longer-range extrapolation. In contrast, the PL has better extrapolation abilities at lower-order degrees but

Table 2: RBF and PL kernels and their parameters.

Kernel function   Formula                                            Kernel parameters
RBF               $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$      $\gamma$
PL                $K(x_i, x_j) = (\gamma x_i^T x_j + 1)^d$           $\gamma$, $d$

requires higher-order degrees for good interpolation. The formulas and their parameters are shown in Table 2. The performance of the SVM model depends on the choice of the kernel parameters. For the RBF-SVM, the regularization parameter $C$ and the kernel width $\gamma$ are the two parameters that need to be determined, whereas for the PL-SVM there are three: $C$, $\gamma$, and the degree $d$ of the polynomial kernel. Parameter $C$ controls the tradeoff between training errors and margin, which helps to control overfitting of the model. A large value of $C$ leads to few training errors, whereas a small value of $C$ generates a larger margin and thus increases the number of training errors [64]. Parameter $\gamma$ controls the degree of nonlinearity of the SVM model. Parameter $d$ defines the degree of the polynomial kernel. The process of selecting the parameter combination that produces the best classification result is considered to be an important research issue in the data mining area [65]. Many methods have been proposed, such as heuristic parameter selection [66], the gradient descent algorithm [67], the Levenberg-Marquardt method [68], and the cross-validation method [69]. However, the grid search method, which is widely used in the determination of SVM parameters, is still considered to be the most reliable optimization method [70] and was selected for this study. First, the ranges of all parameters and their step sizes were determined. Second, the grid search was performed by varying the SVM hyperparameters. Finally, the performance of every combination was assessed to find the best combination of parameters. However, the grid search is only suitable for the adjustment of a small number of parameters because of its computational complexity [71].
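The two kernels of Table 2 and the three grid-search steps can be sketched as follows. This is a minimal illustration, not the software setup used in the study: the exponentially spaced ranges are hypothetical, and `toy_score` is a stand-in for a cross-validated accuracy whose peak is placed at C = 8 and γ = 0.25 (the values the conclusions report for the RBF kernel) purely for illustration.

```python
import itertools
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel from Table 2: exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def poly_kernel(x, z, gamma, degree):
    """Polynomial kernel from Table 2: (gamma * x.z + 1)^degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + 1.0) ** degree


def grid_search(score, grid):
    """Evaluate score(**params) for every combination; keep the best one."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(**params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Step 1: hypothetical parameter ranges with a step of one power of two.
GRID = {
    "C": [2.0 ** k for k in range(-2, 5)],
    "gamma": [2.0 ** k for k in range(-4, 2)],
}


def toy_score(C, gamma):
    """Placeholder objective; a real run would train and validate an SVM."""
    return -((math.log2(C) - 3) ** 2 + (math.log2(gamma) + 2) ** 2)


# Steps 2 and 3: sweep the grid and keep the best-scoring combination.
best_params, best_score = grid_search(toy_score, GRID)
```

The number of evaluations is the product of the range lengths (42 here), which is why the approach becomes impractical as the number of tuned parameters grows.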

3.2. Decision Tree (DT)

A DT is a hierarchical model composed of decision rules that recursively split independent variables into homogeneous zones [72]. The objective of DT building is to find the set of decision rules that can be used to predict an outcome from a set of input variables. A DT is called a classification tree or a regression tree if the target variable is discrete or continuous, respectively [73]. DTs have been applied successfully in many real-world situations for classification and prediction [74]. The main advantage of DT models is their capability of modeling complex relationships between variables. They can incorporate both categorical and continuous variables without strict assumptions with respect to the distribution of the data [75]. In addition, DTs are easy to construct, and the resulting models can be easily interpreted. Furthermore, the DT model results provide clear information on the relative importance of input factors [76]. The main disadvantage of DTs is that they are susceptible to noisy data and that multiple output attributes are not allowed [77]. Many algorithms for constructing decision tree models have been proposed in the literature, such as the classification and regression tree (CART) [78], the chi-square automatic interaction detector decision tree (CHAID) [79], ID3 [80], and C4.5 [81]. In this study, the J48 algorithm [82], which is a Java reimplementation of the C4.5 algorithm, was used. The C4.5 uses an



entropy-based measure as the selection criterion and is considered to be the fastest algorithm for machine learning with good classification accuracy [83]. Given a training dataset $T$ with subsets $T_i$, $i = 1, 2, \ldots, s$, the C4.5 algorithm constructs a DT using a top-down, recursive-splitting technique. A tree structure consists of a root node, internal nodes, and leaf nodes. The root node contains all the input data. An internal node can have two or more branches and is associated with a decision function. A leaf node indicates the output of a given input vector. The procedure of DT modeling consists of two steps: (1) tree building and (2) tree pruning [84]. The tree building begins by determining the input variable with the highest gain ratio as the root node of the DT. Then the training dataset is split based on the root values, and subnodes are created. For discrete input variables, a subnode of the tree is created for each possible value. For continuous input variables, two subnodes are created based on a threshold that is determined in the threshold-finding process [81]. In the next step, the gain ratio is calculated for all the subnodes individually, and the process is repeated until all examples in a node belong to the same class. Those nodes are called leaf nodes and are labeled with class values. Since the tree obtained in the building step may have a large number of branches and may therefore cause overfitting [85], the tree needs to be pruned for better classification accuracy on new data. There are two types of tree pruning: pre-pruning and post-pruning. In the case of pre-pruning, the growing of the tree is stopped when a certain criterion is satisfied, whereas in the post-pruning case the full tree is constructed first, and the ending subtrees are then replaced by leaves based on a comparison of the error of the tree before and after replacing the subtrees.
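The information gain ratio for attribute $A$ is as follows:

$$ \text{GainRatio}(A, T) = \frac{\text{Gain}(A, T)}{\text{SplitInfo}(A, T)}, \tag{3.5} $$

where

$$ \text{Gain}(A, T) = \text{Entropy}(T) - \sum_{i=1}^{s} \frac{|T_i|}{|T|}\, \text{Entropy}(T_i), \qquad \text{SplitInfo}(A, T) = -\sum_{i=1}^{s} \frac{|T_i|}{|T|} \log_2 \frac{|T_i|}{|T|}. \tag{3.6} $$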
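Equations (3.5) and (3.6) can be checked on a small example. The sketch below is a plain reading of the formulas for a discrete attribute, not the J48 implementation; the attribute values and labels are made up, and returning 0 when the split information is zero is a convention assumed here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of a discrete attribute; values[i] belongs to sample i."""
    total = len(labels)
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    gain = entropy(labels) - sum(
        len(subset) / total * entropy(subset) for subset in subsets.values())
    split_info = -sum(
        len(subset) / total * math.log2(len(subset) / total)
        for subset in subsets.values())
    # Assumed convention: an attribute with a single value cannot split T.
    return gain / split_info if split_info > 0 else 0.0


labels = [1, 1, 0, 0]  # 1 = landslide, 0 = no landslide (toy data)
informative = gain_ratio(["steep", "steep", "gentle", "gentle"], labels)
uninformative = gain_ratio(["a", "b", "a", "b"], labels)
```

An attribute that separates the classes perfectly reaches a gain ratio of 1, whereas one whose subsets are as mixed as the full dataset scores 0.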

A DT can estimate the probability of belonging to a specific class, and this probability is therefore used to predict the probability of landslide pixels. The estimated probability is based on the natural frequency at the tree leaf. However, this frequency-based estimate might not give sound probabilistic estimates; therefore Laplace smoothing [86] was used in this study.
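The effect of Laplace smoothing on a leaf estimate can be sketched as follows. The form (k + 1)/(n + c), with c the number of classes, is the standard Laplace correction and is assumed here; the text does not spell out the exact variant used.

```python
def leaf_probability(class_count, leaf_total, n_classes=2, laplace=True):
    """Probability of a class at a tree leaf.

    class_count: training samples of the class of interest in the leaf.
    leaf_total: all training samples that reached the leaf.
    """
    if laplace:
        return (class_count + 1) / (leaf_total + n_classes)
    return class_count / leaf_total


# A small pure leaf: 10 landslide samples out of 10.
raw = leaf_probability(10, 10, laplace=False)
smoothed = leaf_probability(10, 10)
```

The raw frequency of a small pure leaf is exactly 1, which overstates the certainty; the smoothed value is pulled toward 0.5, and the correction shrinks as the leaf grows.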

3.3. Naïve Bayes (NB)

An NB classifier is a classification system based on Bayes' theorem that assumes that all the attributes are fully independent given the output class, which is called the conditional independence assumption [48]. The main advantage of the NB classifier is that it is very easy to construct without needing any complicated iterative parameter estimation schemes [40]. In addition,



the NB classifier is robust to noise and irrelevant attributes. This method has been successfully applied in many fields [87]. Consider an observation consisting of $k$ attributes $x_i$, $i = 1, 2, \ldots, k$ ($x_i$ is a landslide conditioning factor), where $y_j$, $j \in \{\text{landslide}, \text{no-landslide}\}$, is the output class. NB estimates the probability $P(y_j \mid x_i)$ for all possible output classes. The prediction is made for the class with the largest posterior probability as

$$ y_{NB} = \underset{y_j \in \{\text{landslide},\, \text{no-landslide}\}}{\operatorname{argmax}} \; P(y_j) \prod_{i=1}^{k} P(x_i \mid y_j). \tag{3.7} $$

The prior probability $P(y_j)$ can be estimated using the proportion of the observations with output class $y_j$ in the training dataset. The conditional probability is calculated using

$$ P(x_i \mid y_j) = \frac{1}{\sqrt{2\pi}\,\delta}\, e^{-(x_i - \mu)^2 / (2\delta^2)}, \tag{3.8} $$

where $\mu$ is the mean and $\delta$ is the standard deviation of $x_i$.
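Equations (3.7) and (3.8) translate into a short Gaussian NB sketch. It is an illustration of the formulas, not the WEKA implementation used in the study; the training rows are made-up normalized factor values, and the small floor on the standard deviation is an assumption added to avoid division by zero.

```python
import math


def gaussian_pdf(x, mean, std):
    """Equation (3.8): normal density with the given mean and std."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)


def fit(samples, labels):
    """Return {class: (prior, [(mean, std) for each attribute])}."""
    model = {}
    for cls in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == cls]
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            variance = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, max(math.sqrt(variance), 1e-6)))  # assumed floor
        model[cls] = (len(rows) / len(samples), stats)
    return model


def predict(model, x):
    """Equation (3.7): class with the largest prior times likelihood product."""
    def score(cls):
        prior, stats = model[cls]
        return prior * math.prod(
            gaussian_pdf(value, mean, std) for value, (mean, std) in zip(x, stats))
    return max(model, key=score)


# Toy training set: two normalized conditioning factors per pixel.
samples = [[0.90, 0.74], [0.74, 0.90], [0.90, 0.90],
           [0.10, 0.26], [0.26, 0.10], [0.10, 0.10]]
labels = ["landslide"] * 3 + ["no-landslide"] * 3
model = fit(samples, labels)
```

With many factors the product of densities can underflow; summing logarithms instead is the usual remedy, omitted here to stay close to (3.7).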

3.4. Performance Evaluation

The performances of the trained landslide models were assessed using several statistical evaluation criteria based on the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The TP rate (sensitivity) measures the proportion of landslide pixels that are correctly classified as landslides and is defined as TP/(TP + FN). The TN rate (specificity) measures the proportion of non-landslide pixels that are correctly classified as non-landslide and is defined as TN/(TN + FP). Precision measures the proportion of pixels classified as landslide that are actual landslide occurrences and is defined as TP/(TP + FP). Overall accuracy is calculated as (TP + TN)/total number of training pixels. The F-measure combines precision and sensitivity into their harmonic mean and is defined as 2 ∗ Precision ∗ Sensitivity/(Precision + Sensitivity) [88]. In order to measure the reliability of the landslide susceptibility models, the Cohen kappa index ($\kappa$) [89–91] was used to assess the model classification compared to chance selection:

$$ \kappa = \frac{P_C - P_{exp}}{1 - P_{exp}}, \tag{3.9} $$

where $P_C$ is the proportion of pixels that are correctly classified as landslide or non-landslide and is calculated as (TP + TN)/total number of pixels. $P_{exp}$ is the expected agreement and is calculated as ((TP + FN)(TP + FP) + (FP + TN)(FN + TN))/(total number of training pixels)$^2$. A $\kappa$ value of 0 indicates that no agreement exists between the landslide model and reality, whereas a $\kappa$ value of 1 indicates perfect agreement. A negative $\kappa$ value indicates poor agreement. A $\kappa$ value in the range 0.80–1 is considered as an indicator of almost perfect



agreement while a value in the range 0.60–0.80 indicates a substantial agreement between the model and reality. For a value in the interval 0.40–0.60, the agreement is moderate and the values of 0.20–0.40 and 90% whereas FP rate is low 0.1
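The evaluation criteria of Section 3.4, including the kappa index of (3.9), can be computed from the four confusion-matrix counts as in the sketch below. The counts in the example are invented for illustration and are not results from this study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Evaluation criteria from confusion-matrix counts."""
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # TP rate
    specificity = tn / (tn + fp)   # TN rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / total   # P_C in equation (3.9)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    # Expected (chance) agreement from the row and column totals.
    p_exp = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2
    kappa = (accuracy - p_exp) / (1 - p_exp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
        "kappa": kappa,
    }


# Hypothetical balanced sample: 100 landslide and 100 non-landslide pixels.
metrics = classification_metrics(tp=90, fp=10, tn=90, fn=10)
```

For this balanced example the chance agreement is 0.5, so an accuracy of 0.9 corresponds to a kappa of 0.8.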

Figure 6: Decision tree model for landslide susceptibility assessment for the study area.



3.6.3. Naïve Bayes (NB)

In the case of the NB classifier, the probability is first calculated for each output class (landslide, no landslide), and the classification is then made for the class with the largest posterior probability. The NB model was constructed using the WEKA software. The NB model obtained an overall classification accuracy of 86.1% on average. The TP rate, precision, and F-measure vary from 83% to 89%. The Cohen kappa index of 0.722 indicates that the strength of agreement between the observed and the predicted values is substantial. A summary of the model assessment and performance is shown in Tables 4 and 5. Once the SVM, DT, and NB models were successfully trained in the training phase, they were used to calculate the landslide susceptibility indexes (LSIs) for all the pixels in the study area. The results were then transferred into a GIS and loaded into the ArcGIS 10 software for visualization.

4. Validation and Comparison of Landslide Susceptibility Models

4.1. Success Rate and Prediction Rate for Landslide Susceptibility Maps

The four landslide susceptibility maps were validated by comparing them with the landslide locations using the success-rate and prediction-rate methods [95]. The success-rate results were obtained using the landslide grid cells in the training dataset. Figure 7 shows the success-rate curves of the four landslide susceptibility maps obtained from the RBF-SVM, PL-SVM, DT, and NB models in this study in comparison with the logistic regression model. It can be observed that RBF-SVM and logistic regression have the highest areas under the curve, with AUC values of 0.961 and 0.962, respectively. They are followed by PL-SVM (0.956), DT (0.952), and NB (0.935). Based on these results, we can conclude that the capability of correctly classifying the areas with existing landslides is highest for the RBF-SVM, which is comparable to logistic regression, followed by the PL-SVM, DT, and NB. Since the success-rate method uses the landslide pixels in the training dataset that have already been used for constructing the landslide models, the success rate may not be a suitable measure of the prediction capability of the landslide models [96]. According to Chung and Fabbri [95], the prediction rate can be used to estimate the prediction capability of the landslide models. In this study, the prediction-rate results of the four landslide susceptibility models were obtained by comparing them with the landslide grid cells in the validation dataset. The areas under the prediction-rate curves (AUCs) were then estimated. The closer the AUC value is to 1, the better the landslide model. The prediction-rate curves and AUCs of the four landslide susceptibility maps are shown in Figure 8. The results show that the AUCs for the four models vary from 0.909 to 0.955, which indicates that all the models have a good prediction capability.

The highest prediction capability is for RBF-SVM and PL-SVM, with AUC values of 0.954 and 0.955, respectively. They are followed by NB (0.935) and DT (0.907). Compared with the logistic regression AUC of 0.938, which was obtained using the same data, the prediction capability of the two SVM models may be slightly better, whereas the prediction capability of DT and NB is lower.
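The construction of a success-rate or prediction-rate curve and its AUC can be sketched as follows, assuming the common procedure of ranking pixels from the highest to the lowest susceptibility index and accumulating the share of landslide pixels found; the exact binning used for Figures 7 and 8 is not stated here. Using training landslides gives the success rate, and using validation landslides gives the prediction rate. The five pixels in the example are invented.

```python
def rate_curve(susceptibility, is_landslide):
    """Cumulative share of the map (x) versus share of landslides found (y)."""
    order = sorted(range(len(susceptibility)),
                   key=lambda i: susceptibility[i], reverse=True)
    total_landslides = sum(is_landslide)
    xs, ys, found = [0.0], [0.0], 0
    for rank, index in enumerate(order, start=1):
        found += is_landslide[index]
        xs.append(rank / len(order))
        ys.append(found / total_landslides)
    return xs, ys


def area_under_curve(xs, ys):
    """Trapezoidal integration of the curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))


# Toy map of five pixels; 1 marks a landslide pixel.
scores = [0.9, 0.8, 0.7, 0.2, 0.1]
landslides = [1, 1, 0, 0, 0]
xs, ys = rate_curve(scores, landslides)
auc = area_under_curve(xs, ys)
```

Pixels with tied scores are kept in their input order in this sketch, which can shift the curve slightly when ties are frequent.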

Figure 7: Success-rate curves and areas under the curves (AUCs) of the RBF-SVM, PL-SVM, DT, and NB models in comparison with the logistic regression model. Horizontal axis: percentage of landslide susceptibility map; vertical axis: percentage of landslides. Legend: logistic regression, AUC = 0.962; RBF-SVM, AUC = 0.961; PL-SVM, AUC = 0.957; decision tree, AUC = 0.952; naïve Bayes, AUC = 0.935.

Figure 8: Prediction-rate curves and areas under the curves (AUCs) of the RBF-SVM, PL-SVM, DT, and NB models in comparison with the logistic regression model. Horizontal axis: percentage of landslide susceptibility map; vertical axis: percentage of landslides. Legend: logistic regression, AUC = 0.938; RBF-SVM, AUC = 0.955; PL-SVM, AUC = 0.956; decision tree, AUC = 0.909; naïve Bayes, AUC = 0.932.

4.2. Reclassification of Landslide Susceptibility Indexes

The landslide susceptibility indexes were reclassified into four relative susceptibility classes: high, moderate, low, and very low. In this study, the classification method proposed by Pradhan and Lee [8] was used to determine the landslide susceptibility class breaks based on percentage of area: high (10%), moderate (10%), low (20%), and very low (60%) (Figure 9). Landslide density analysis was performed on the four landslide susceptibility classes [97]. Landslide density is defined as the ratio of landslide pixels to the total number of


Figure 9: Percentage of landslides against percentage of landslide susceptibility maps using the RBF-SVM, PL-SVM, DT, and NB models.

Table 6: Characteristics of the four susceptibility zones of the four landslide susceptibility models obtained from the RBF-SVM, PL-SVM, DT, and NB models.

Landslide susceptibility class   Percentage of area   Landslide density
                                                      RBF-SVM   PL-SVM   DT      NB
High                             10.0                 8.719     8.749    9.069   8.128
Moderate                         10.0                 0.740     0.660    0.571   0.791
Low                              20.0                 0.221     0.241    0.115   0.371
Very low                         60.0                 0.017     0.018    0.022   0.057

pixels in the susceptibility class. An ideal landslide susceptibility map has the landslide density value increasing from the very low- to the high-susceptibility class [32]. A plot of the landslide density for the four landslide susceptibility classes of the four landslide susceptibility models (RBF-SVM, PL-SVM, DT, and NB) is shown in Figure 10. It can be observed that the landslide density gradually increases from the very low- to the high-susceptibility class. Figure 11 shows the landslide susceptibility maps obtained using the RBF-SVM, PL-SVM, DT, and NB models. Table 6 shows the characteristics of the four susceptibility classes of the four maps of the study area. It can be observed that the percentages of existing landslide pixels in the high class are 87.2%, 87.5%, 90.7%, and 81.3% for RBF-SVM, PL-SVM, DT, and NB, respectively. In contrast, 80% of the pixels in the study area are in the low- and very-low-susceptibility classes. These maps satisfy two spatial effective rules [98]: (1) the existing landslide pixels should belong to the high-susceptibility class and (2) the high-susceptibility class should cover only small areas.
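The area-based reclassification and the landslide density can be sketched as below. The density is computed here as the share of landslide pixels in a class divided by the share of the map area in that class; this reading is an assumption, chosen because it reproduces the relation between Table 6 and the percentages quoted above (for example, 8.719 × 10% ≈ 87.2%). The 20-pixel map is invented.

```python
# Area shares of the four classes, from the highest index downwards.
CLASS_SHARES = [("high", 0.10), ("moderate", 0.10), ("low", 0.20), ("very low", 0.60)]


def landslide_density(susceptibility, is_landslide):
    """Landslide density per susceptibility class (area-percentage breaks)."""
    order = sorted(range(len(susceptibility)),
                   key=lambda i: susceptibility[i], reverse=True)
    n = len(order)
    total_landslides = sum(is_landslide)
    densities, start, cumulative = {}, 0, 0.0
    for name, share in CLASS_SHARES:
        cumulative += share
        end = round(cumulative * n)
        members = order[start:end]  # assumes n is large enough for every class
        landslide_share = sum(is_landslide[i] for i in members) / total_landslides
        densities[name] = landslide_share / (len(members) / n)
        start = end
    return densities


# Toy map: 20 pixels with decreasing index; landslides concentrate at the top.
scores = [i / 20 for i in range(20, 0, -1)]
landslides = [1, 1, 1, 0, 1] + [0] * 15
density = landslide_density(scores, landslides)
```

A density above 1 means the class contains more landslides than its area share alone would suggest, which is the pattern expected for the high-susceptibility class.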

5. Discussions and Conclusions

This paper presents a comparative study of three data mining approaches (SVM, DT, and NB) for landslide susceptibility mapping in the Hoa Binh province (Vietnam). The landslide inventory was constructed with 118 polygons of landslides that occurred during the last ten

Figure 10: Landslide density plots of the four landslide susceptibility classes, high (10%), moderate (10%), low (20%), and very low (60%), of the RBF-SVM, PL-SVM, DT, and NB models.

Figure 11: Landslide susceptibility maps of the Hoa Binh province (Vietnam) using (a) RBF-SVM, (b) PL-SVM, (c) DT, and (d) NB. Class breaks of the susceptibility index in the map legends: (a) RBF-SVM: very low (−5.844 to −1.231), low (−1.231 to −0.617), moderate (−0.617 to −0.083), high (−0.083 to 3.611); (b) PL-SVM: very low (−4.933 to −1.208), low (−1.208 to −0.626), moderate (−0.626 to −0.095), high (−0.095 to 5.68); (c) DT (J48 algorithm): very low (0.007–0.028), low (0.028–0.143), moderate (0.143–0.615), high (0.615–0.974); (d) NB: very low (0–0.068), low (0.068–0.263), moderate (0.263–0.561), high (0.561–1).

years. A total of ten landslide conditioning factors were used in this analysis, including slope angle, lithology, rainfall, soil type, slope aspect, landuse, distance to roads, distance to rivers, distance to faults, and relief amplitude. For building the models, a training dataset was extracted with 70% of the landslide inventory, whereas the remaining landslide inventory was used for the assessment of the prediction capability of the models. Using the three data mining algorithms, SVM, DT, and NB, the landslide susceptibility maps were produced. These maps present spatial predictions of landslides. They do not include information “when” and “how frequently” landslides will occur. In the case of SVM, the selection of the kernel function and its parameters play an important role in landslide susceptibility assessment. For the RBF function, the best kernel parameters of C and γ are 8 and 0.25, respectively. For the PL function, it is clear that the degree of polynomial function had significant effect in the model. The SVM model with a polynomial degree of 3 has the highest accuracy. The best kernel parameters of C and γ are 1 and 0.3536 respectively. In the case of DT, the probability that an observation belongs to landslide class using Laplace smoothing was used to calculate the landslide susceptibility index. For building the DT model, the selection of MNI per leaf tree and CF has largely affected the accuracy of the model. In this study, the best decision tree model is found with MNI per leaf tree as 6 and the CF as 0.35. Relative importance of landslide conditioning factors are as follows: distance to roads, slope angle, landuse, slope aspect, rainfall, relief amplitude, distance to rivers, distance to faults, lithology, and soil type. In the case of NB, the application for landslide modeling is relatively robust. This is not a time-consuming method, and techniques required to use are simple. 
The results of this study show that NB gives relatively good prediction capability. Qualitative interpretation of the high landslide susceptibility classes of the four maps shows that they agree quite well with field evidence and assumptions. Areas with a high probability of landslides are distributed along active fault zones and road-cut sections. Using the success-rate and prediction-rate methods, the landslide susceptibility maps were validated against the existing landslide locations. The quantitative results show that all the landslide models have good prediction capability. The highest area under the success-rate curve (AUC) is for RBF-SVM (0.961), followed by PL-SVM (0.956), DT (0.938), and NB (0.935). The highest prediction-rate results are for RBF-SVM and PL-SVM, with areas under the prediction-rate curves (AUC) of 0.954 and 0.955, respectively, followed by NB (0.932) and DT (0.903). Compared with the results obtained from logistic regression (Figure 8), the prediction capabilities of the two SVM models are slightly better. In contrast, the DT and NB models have lower accuracy. The quantitative results of this study are comparable to those obtained in other studies, such as Brenning [99] and Yilmaz [35]. The findings of this study agree with Yao et al. [100], who state that SVM possesses better prediction efficiency than logistic regression. Additionally, the findings agree with Marjanović et al. [101], who reported that SVM outperformed logistic regression and DT. Similarly, the results agree with Ballabio and Sterlacchini [102], who concluded that SVM outperformed logistic regression, linear discriminant analysis, and NB. The reliability of the landslide models was assessed using the Cohen kappa index (κ). In this study, the kappa indexes are 0.822, 0.823, and 0.860 for RBF-SVM, PL-SVM, and DT, respectively, indicating almost perfect agreement between the observed and the predicted values. The Cohen kappa index for NB is 0.722, indicating substantial agreement between the observed and the predicted values. The reliability results are satisfactory compared with other works such as Guzzetti et al. [91] and Saito et al. [44].
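The kappa interpretation above can be illustrated with a short calculation. The following Python sketch computes Cohen's kappa from a 2×2 confusion matrix and maps the value to the verbal agreement categories of Landis and Koch (reference 92 in the list below). The function names and the confusion-matrix counts are hypothetical and serve only as an illustration; they are not the counts obtained in this study.

```python
def cohen_kappa(tp, fn, fp, tn):
    """Cohen's kappa for a 2x2 table (landslide vs. non-landslide)."""
    n = tp + fn + fp + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")
    p_observed = (tp + tn) / n
    # Agreement expected by chance, from the row and column marginals.
    p_expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


def landis_koch_label(kappa):
    """Verbal agreement category after Landis and Koch (1977)."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical counts, for illustration only.
kappa = cohen_kappa(tp=45, fn=5, fp=5, tn=45)
print(round(kappa, 3), landis_koch_label(kappa))

# Kappa values reported in the text.
for model, value in [("RBF-SVM", 0.822), ("PL-SVM", 0.823), ("DT", 0.860), ("NB", 0.722)]:
    print(model, landis_koch_label(value))
```

Under this scale, the three values above 0.80 fall in the "almost perfect" class and the NB value of 0.722 falls in the "substantial" class, which matches the wording used in the text.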



Landslide susceptibility maps are considered a useful tool for territorial planning, disaster management, and natural hazard mitigation. This study shows that SVMs are a powerful tool for landslide susceptibility assessment, with high accuracy. As a final conclusion, the results obtained from this study can provide very useful information for decision making and policy planning in landslide-prone areas.

Acknowledgments This research was funded by the Norwegian Quota scholarship program. The data analysis and write-up were carried out as a part of the first author’s Ph.D. studies at the Geomatics Section, Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, Norway.

References 1 K. Sassa and P. Canuti, Landslides-Disaster Risk Reduction, Springer, New York, NY, USA, 2008. 2 D. Tien Bui, O. Lofman, I. Revhaug, and O. Dick, “Landslide susceptibility analysis in the Hoa Binh province of Vietnam using statistical index and logistic regression,” Natural Hazards, vol. 59, pp. 1413–1444, 2011. 3 D. Tien Bui, B. Pradhan, O. Lofman, I. Revhaug, and O. B. Dick, “Landslide susceptibility mapping at Hoa Binh province Vietnam using an adaptive neuro-fuzzy inference system and GIS,” Computers & Geosciences. In press. 4 D. Tien Bui, B. Pradhan, O. Lofman, I. Revhaug, and O. B. Dick, “Spatial prediction of landslide hazards in Hoa Binh province Vietnam: a comparative assessment of the efficacy of evidential belief functions and fuzzy logic models,” CATENA, vol. 96, pp. 28–40, 2012. 5 S. Lee and T. Dan, “Probabilistic landslide susceptibility mapping on the Lai Chau province of Vietnam: focus on the relationship between tectonic fractures and landslides,” Environmental Geology, vol. 48, no. 6, pp. 778–787, 2005. 6 S. Lee, “Landslide susceptibility mapping using an artificial neural network in the Gangneung area, Korea,” International Journal of Remote Sensing, vol. 28, no. 21, pp. 4763–4783, 2007. 7 B. Pradhan, “Use of GIS-based fuzzy logic relations and its cross application to produce landslide susceptibility maps in three test areas in Malaysia,” Environmental Earth Sciences, vol. 63, no. 2, pp. 329–349, 2011. 8 B. Pradhan and S. Lee, “Regional landslide susceptibility analysis using back-propagation neural network model at Cameron Highland, Malaysia,” Landslides, vol. 7, no. 1, pp. 13–30, 2010. 9 F. Guzzetti, P. Reichenbach, M. Cardinali, M. Galli, and F. Ardizzone, “Probabilistic landslide hazard assessment at the basin scale,” Geomorphology, vol. 72, no. 1–4, pp. 272–299, 2005. 10 F. Guzzetti, A. Carrara, M. Cardinali, and P. 
Reichenbach, “Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, central Italy,” Geomorphology, vol. 31, no. 1–4, pp. 181–216, 1999. 11 H. Wang, L. Gangjun, X. Weiya, and W. Gonghui, “GIS-based landslide hazard assessment: an overview,” Progress in Physical Geography, vol. 29, no. 4, pp. 548–567, 2005. 12 J. Chacón, C. Irigaray, T. Fernández, and R. El Hamdouni, “Engineering geology maps: landslides and geographical information systems,” Bulletin of Engineering Geology and the Environment, vol. 65, no. 4, pp. 341–411, 2006. 13 M. Ercanoglu and C. Gokceoglu, “Assessment of landslide susceptibility for a landslide-prone area north of Yenice, NW Turkey by fuzzy approach,” Environmental Geology, vol. 41, no. 6, pp. 720–730, 2002. 14 M. Ercanoglu and C. Gokceoglu, “Use of fuzzy relations to produce landslide susceptibility map of a landslide prone area West Black Sea region, Turkey,” Engineering Geology, vol. 75, no. 3-4, pp. 229–250, 2004. 15 B. Pradhan, E. A. Sezer, C. Gokceoglu, and M. F. Buchroithner, “Landslide susceptibility mapping by neuro-fuzzy approach in a landslide-prone area Cameron Highlands, Malaysia,” IEEE Transactions on Geoscience and Remote Sensing, vol. 48, no. 12, pp. 4164–4177, 2010.



16 S. Lee, “Application and verification of fuzzy algebraic operators to landslide susceptibility mapping,” Environmental Geology, vol. 52, no. 4, pp. 615–623, 2007. 17 A. Akgun, E. A. Sezer, H. A. Nefeslioglu, C. Gokceoglu, and B. Pradhan, “An easy-to-use MATLAB program MamLand for the assessment of landslide susceptibility using a Mamdani fuzzy algorithm,” Computers and Geosciences, vol. 38, no. 1, pp. 23–34, 2011. 18 B. Pradhan, “Application of an advanced fuzzy logic model for landslide susceptibility analysis,” International Journal of Computational Intelligence Systems, vol. 3, no. 3, pp. 370–381, 2010. 19 B. Pradhan, “Landslide susceptibility mapping of a catchment area using frequency ratio, fuzzy logic and multivariate logistic regression approaches,” Journal of the Indian Society of Remote Sensing, vol. 38, no. 2, pp. 301–320, 2010. 20 B. Pradhan, “Manifestation of an advanced fuzzy logic model coupled with Geo-information techniques to landslide susceptibility mapping and their comparison with logistic regression modelling,” Environmental and Ecological Statistics, vol. 18, no. 3, pp. 471–493, 2011. 21 H. J. Oh and B. Pradhan, “Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallow landslides in a tropical hilly area,” Computers and Geosciences, vol. 37, no. 9, pp. 1264– 1276, 2011. 22 M. H. Vahidnia, A. A. Alesheikh, A. Alimohammadi, and F. Hosseinali, “A GIS-based neuro-fuzzy procedure for integrating knowledge and data in landslide susceptibility mapping,” Computers and Geosciences, vol. 36, no. 9, pp. 1101–1114, 2010. 23 S. Lee, J. H. Ryu, K. Min, and J. S. Won, “Landslide susceptibility analysis using GIS and artificial neural network,” Earth Surface Processes and Landforms, vol. 28, no. 12, pp. 1361–1376, 2003. 24 S. Lee, J. H. Ryu, J. S. Won, and H. J. Park, “Determination and application of the weights for landslide susceptibility mapping using an artificial neural network,” Engineering Geology, vol. 71, no. 3-4, pp. 
289–302, 2004. 25 F. Catani, N. Casagli, L. Ermini, G. Righini, and G. Menduni, “Landslide hazard and risk mapping at catchment scale in the Arno River basin,” Landslides, vol. 2, no. 4, pp. 329–342, 2005. 26 L. Ermini, F. Catani, and N. Casagli, “Artificial neural networks applied to landslide susceptibility assessment,” Geomorphology, vol. 66, no. 1–4, pp. 327–343, 2005. 27 B. Pradhan, S. Lee, and M. F. Buchroithner, “A GIS-based back-propagation neural network model and its cross-application and validation for landslide susceptibility analyses,” Computers, Environment and Urban Systems, vol. 34, no. 3, pp. 216–235, 2010. 28 I. Yilmaz, “A case study from Koyulhisar Sivas-Turkey for landslide susceptibility mapping by artificial neural networks,” Bulletin of Engineering Geology and the Environment, vol. 68, no. 3, pp. 297–306, 2009. 29 B. Pradhan and M. F. Buchroithner, “Comparison and validation of landslide susceptibility maps using an artificial neural network model for three test areas in Malaysia,” Environmental and Engineering Geoscience, vol. 16, no. 2, pp. 107–126, 2010. 30 I. Yilmaz, “The effect of the sampling strategies on the landslide susceptibility mapping by conditional probability and artificial neural networks,” Environmental Earth Sciences, vol. 60, no. 3, pp. 505–519, 2010. 31 I. Yilmaz, “Landslide susceptibility mapping using frequency ratio, logistic regression, artificial neural networks and their comparison: a case study from Kat landslides Tokat-Turkey,” Computers and Geosciences, vol. 35, no. 6, pp. 1125–1138, 2009. 32 B. Pradhan and S. Lee, “Landslide susceptibility assessment and factor effect analysis: backpropagation artificial neural networks and their comparison with frequency ratio and bivariate logistic regression modelling,” Environmental Modelling and Software, vol. 25, no. 6, pp. 747–759, 2010. 33 E. Yesilnacar and T. 
Topal, “Landslide susceptibility mapping: a comparison of logistic regression and neural networks methods in a medium scale study, Hendek region Turkey,” Engineering Geology, vol. 79, no. 3-4, pp. 251–266, 2005. 34 H. A. Nefeslioglu, C. Gokceoglu, and H. Sonmez, “An assessment on the use of logistic regression and artificial neural networks with different sampling strategies for the preparation of landslide susceptibility maps,” Engineering Geology, vol. 97, no. 3-4, pp. 171–191, 2008. 35 I. Yilmaz, “Comparison of landslide susceptibility mapping methodologies for Koyulhisar, Turkey: conditional probability, logistic regression, artificial neural networks, and Support Vector Machine,” Environmental Earth Sciences, vol. 61, no. 4, pp. 821–836, 2010. 36 C. P. Poudyal, C. Chang, H. J. Oh, and S. Lee, “Landslide susceptibility maps comparing frequency ratio and artificial neural networks: a case study from the Nepal Himalaya,” Environmental Earth Sciences, vol. 61, no. 5, pp. 1049–1064, 2010.



37 B. Pradhan, “Remote sensing and GIS-based landslide hazard analysis and cross-validation using multivariate logistic regression model on three test areas in Malaysia,” Advances in Space Research, vol. 45, no. 10, pp. 1244–1256, 2010. 38 A. S. Miner, P. Vamplew, D. J. Windle, P. Flentje, and P. Warner, “A comparative study of various data mining techniques as applied to the modeling of landslide susceptibility on the Bellarine Peninsula, Victoria, Australia,” in Geologically Active, A. L. Williams, G. M. Pinches, C. Y. Chin, and T. J. McMorran, Eds., p. 352, CRC Press, New York, NY, USA, 2010. 39 S. Wan and T. C. Lei, “A knowledge-based decision support system to analyze the debris-flow problems at Chen-Yu-Lan River, Taiwan,” Knowledge-Based Systems, vol. 22, no. 8, pp. 580–588, 2009. 40 X. Wu, V. Kumar, Q. J. Ross et al., “Top 10 algorithms in data mining,” Knowledge and Information Systems, vol. 14, no. 1, pp. 1–37, 2008. 41 S. B. Bai, J. Wang, G. N. Lu, M. Kanevski, and A. Pozdnoukhov, “GIS-Based landslide susceptibility mapping with comparisons of results from machine learning methods versus logistic regression in basin scale,” Geophysical Research Abstracts, EGU, vol. 10,A-06367, 2008. 42 N. Micheletti, L. Foresti, M. Kanevski, A. Pedrazzini, and M. Jaboyedoff, “Landslide susceptibility mapping using adaptive Support Vector Machines and feature selection,” Geophysical Research Abstracts, EGU, vol. 13, 2011. 43 Y. K. Yeon, J. G. Han, and K. H. Ryu, “Landslide susceptibility mapping in Injae, Korea, using a decision tree,” Engineering Geology, vol. 116, no. 3-4, pp. 274–283, 2010. 44 H. Saito, D. Nakayama, and H. Matsuyama, “Comparison of landslide susceptibility based on a decision-tree model and actual landslide occurrence: the Akaishi mountains, Japan,” Geomorphology, vol. 109, no. 3-4, pp. 108–121, 2009. 45 H. A. Nefeslioglu, E. Sezer, C. Gokceoglu, A. S. Bozkir, and T. Y. 
Duman, “Assessment of landslide susceptibility by decision trees in the metropolitan area of Istanbul, Turkey,” Mathematical Problems in Engineering, vol. 2010, Article ID 901095, 2010. 46 C. A. Ratanamahatana and D. Gunopulos, “Feature selection for the naive Bayesian classifier using decision trees,” Applied Artificial Intelligence, vol. 17, no. 5-6, pp. 475–487, 2003. 47 W. Tzu-Tsung, “A hybrid discretization method for naïve Bayesian classifiers,” Pattern Recognition, vol. 45, no. 6, pp. 2321–2325, 2012. 48 D. Soria, J. M. Garibaldi, F. Ambrogi, E. M. Biganzoli, and I. O. Ellis, “A “non-parametric” version of the naive Bayes classifier,” Knowledge-Based Systems, vol. 24, no. 6, pp. 775–784, 2011. 49 J. Kazmierska and J. Malicki, “Application of the naïve Bayesian classifier to optimize treatment decisions,” Radiotherapy and Oncology, vol. 86, no. 2, pp. 211–216, 2008. 50 C.-C. Chang and C.-J. Lin, LIBSVM: a Library for Support Vector Machines, ACM Transactions on Intelligent Systems and Technology, New York, NY, USA, 2011. 51 B. D. Malamud, D. L. Turcotte, F. Guzzetti, and P. Reichenbach, “Landslide inventories and their statistical properties,” Earth Surface Processes and Landforms, vol. 29, no. 6, pp. 687–711, 2004. 52 F. Vergari, M. Della Seta, M. Del Monte, P. Fredi, and E. Lupia Palmieri, “Landslide susceptibility assessment in the Upper Orcia Valley Southern Tuscany, Italy through conditional analysis: a contribution to the unbiased selection of causal factors,” Natural Hazards and Earth System Science, vol. 11, no. 5, pp. 1475–1497, 2011. 53 T. T. Van, D. T. Anh, H. H. Hieu et al., Investigation and Assessment of the Current Status and Potential of Landslide in Some Sections of the Ho Chi Minh Road, National Road 1A and Proposed Remedial Measures to Prevent Landslide from Threat of Safety of People, Property, and Infrastructure, Vietnam Institute of Geoscience and Mineral Resources, Hanoi, Vietnam, 2006. 54 F. Arikan, R. Ulusay, and N.
Aydin, “Characterization of weathered acidic volcanic rocks and a weathering classification based on a rating system,” Bulletin of Engineering Geology and the Environment, vol. 66, no. 4, pp. 415–430, 2007. 55 V. N. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, NY, USA, 1998. 56 S. Abe, Support Vector Machines for Pattern Classification, Springer, London, UK, 2010. 57 C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, 1995. 58 P. Samui, “Slope stability analysis: a Support Vector Machine approach,” Environmental Geology, vol. 56, no. 2, pp. 255–267, 2008. 59 R. Damaševičius, “Optimization of SVM parameters for recognition of regulatory DNA sequences,” Top, vol. 18, no. 2, pp. 339–353, 2011. 60 S. Song, Z. Zhan, Z. Long, J. Zhang, and L. Yao, “Comparative study of SVM methods combined with voxel selection for object category classification on fMRI data,” PLoS ONE, vol. 6, no. 2, Article ID e17191, 2011. 61 S. S. Keerthi and C. J. Lin, “Asymptotic behaviors of Support Vector Machines with gaussian kernel,” Neural Computation, vol. 15, no. 7, pp. 1667–1689, 2003. 62 H.-T. Lin and C.-J. Lin, “A study on sigmoid kernels for SVM and the training of non-PSD kernels by SMO-type methods,” Tech. Rep., National Taiwan University, Taipei, Taiwan, 2003. 63 X. Zhu, S. Zhang, Z. Jin, Z. Zhang, and Z. Xu, “Missing value estimation for mixed-attribute data sets,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 1, pp. 110–121, 2011. 64 R. Damaševičius, “Structural analysis of regulatory DNA sequences using grammar inference and Support Vector Machine,” Neurocomputing, vol. 73, no. 4–6, pp. 633–638, 2010. 65 S. Ali and K. A. Smith, “Automatic parameter selection for polynomial kernel,” in Proceedings of the IEEE International Conference on Information Reuse and Integration (IRI ’03), pp. 243–249, October 2003. 66 D. Mattera and S. Haykin, “Support Vector Machines for dynamic reconstruction of a chaotic system,” in Advances in Kernel Methods, pp. 211–241, MIT Press, Cambridge, Mass, USA, 1999. 67 O. Chapelle, V. Vapnik, O. Bousquet, and S. Mukherjee, “Choosing multiple parameters for Support Vector Machines,” Machine Learning, vol. 46, no. 1–3, pp. 131–159, 2002. 68 J. Platt, Probabilistic Outputs for Support Vector Machines and Comparison to Regularized Likelihood Methods, MIT Press, Cambridge, Mass, USA, 2000. 69 V. Cherkassky and F. Mulier, Learning from Data: Concepts, Theory and Methods, John Wiley and Sons, New York, NY, USA, 2007. 70 L. Zhuang and H. Dai, “Parameter optimization of kernel-based one-class classifier on imbalance text learning,” in Pricai 2006: Trends in Artificial Intelligence, Proceedings, vol. 4099, pp. 434–443, 2006. 71 T. Mu and A. K. Nandi, “Breast cancer detection from FNA using SVM with different parameter tuning systems and SOM-RBF classifier,” Journal of the Franklin Institute, vol. 344, no. 3-4, pp. 285–311, 2007.
72 A. J. Myles, R. N. Feudale, Y. Liu, N. A. Woody, and S. D. Brown, “An introduction to decision tree modeling,” Journal of Chemometrics, vol. 18, no. 6, pp. 275–285, 2004. 73 M. Debeljak and S. Džeroski, “Decision trees in ecological modelling,” in Modelling Complex Ecological Dynamics, F. Jopp, H. Reuter, and B. Breckling, Eds., pp. 197–209, Springer, Berlin, Germany, 2011. 74 S. K. Murthy, “Automatic construction of decision trees from data: a multi-disciplinary survey,” Data Mining and Knowledge Discovery, vol. 2, no. 4, pp. 345–389, 1998. 75 R. Bou Kheir, M. H. Greve, C. Abdallah, and T. Dalgaard, “Spatial soil zinc content distribution from terrain parameters: a GIS-based decision-tree model in Lebanon,” Environmental Pollution, vol. 158, no. 2, pp. 520–528, 2010. 76 G. K. F. Tso and K. K. W. Yau, “Predicting electricity energy consumption: a comparison of regression analysis, decision tree and neural networks,” Energy, vol. 32, no. 9, pp. 1761–1768, 2007. 77 Y. Zhao and Y. Zhang, “Comparison of decision tree methods for finding active objects,” Advances in Space Research, vol. 41, no. 12, pp. 1955–1959, 2008. 78 L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth, Belmont, Calif, USA, 1984. 79 J. A. Michael and S. L. Gordon, Data Mining Technique: For Marketing, Sales and Customer Support, Wiley, New York, NY, USA, 1997. 80 J. R. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1, no. 1, pp. 81–106, 1986. 81 J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, Calif, USA, 1993. 82 I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, Los Altos, Calif, USA, 2nd edition, 2005. 83 T. S. Lim, W. Y. Loh, and Y. S. Shih, “Comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms,” Machine Learning, vol. 40, no. 3, pp. 203–228, 2000. 84 J. H. Cho and P. U.
Kurup, “Decision tree approach for classification and dimensionality reduction of electronic nose data,” Sensors and Actuators B, vol. 160, no. 1, pp. 542–548, 2011. 85 V. T. Tran, B. S. Yang, M. S. Oh, and A. C. C. Tan, “Fault diagnosis of induction motor based on decision trees and adaptive neuro-fuzzy inference,” Expert Systems with Applications, vol. 36, no. 2, pp. 1840–1849, 2009. 86 F. Provost and P. Domingos, “Tree induction for probability-based ranking,” Machine Learning, vol. 52, no. 3, pp. 199–215, 2003. 87 Z. Xie, Q. Zhang, W. Hsu, and M. Lee, “Enhancing SNNB with local accuracy estimation and ensemble techniques,” in Proceedings of the 10th international conference on Database Systems for Advanced Applications (DASFAA ’05), L. Zhou, B. Ooi, and X. Meng, Eds., vol. 3453, p. 983, Springer,



Beijing, China, April 2005. 88 Y. Murakami and K. Mizuguchi, “Applying the Naïve Bayes classifier with kernel density estimation to the prediction of protein-protein interaction sites,” Bioinformatics, vol. 26, no. 15, pp. 1841–1848, 2010. 89 J. Cohen, “A coefficient of agreement for nominal scales,” Educational and Psychological Measurement, vol. 20, no. 1, pp. 37–46, 1960. 90 F. K. Hoehler, “Bias and prevalence effects on kappa viewed in terms of sensitivity and specificity,” Journal of Clinical Epidemiology, vol. 53, no. 5, pp. 499–503, 2000. 91 F. Guzzetti, P. Reichenbach, F. Ardizzone, M. Cardinali, and M. Galli, “Estimating the quality of landslide susceptibility models,” Geomorphology, vol. 81, no. 1-2, pp. 166–184, 2006. 92 J. R. Landis and G. G. Koch, “The measurement of observer agreement for categorical data,” Biometrics, vol. 33, no. 1, pp. 159–174, 1977. 93 B. Pradhan and S. Lee, “Delineation of landslide hazard areas on Penang island, Malaysia, by using frequency ratio, logistic regression, and artificial neural network models,” Environmental Earth Sciences, vol. 60, no. 5, pp. 1037–1054, 2010. 94 C. M. Wang and Y. F. Huang, “Evolutionary-based feature selection approaches with new criteria for data mining: a case study of credit approval data,” Expert Systems with Applications, vol. 36, no. 3, pp. 5900–5908, 2009. 95 C. J. F. Chung and A. G. Fabbri, “Validation of spatial prediction models for landslide hazard mapping,” Natural Hazards, vol. 30, no. 3, pp. 451–472, 2003. 96 S. Lee, J. H. Ryu, and I. S. Kim, “Landslide susceptibility analysis and its verification using likelihood ratio, logistic regression, and artificial neural network models: case study of Youngin, Korea,” Landslides, vol. 4, no. 4, pp. 327–338, 2007. 97 S. Sarkar, D. P. Kanungo, A. K. Patra, and P. Kumar, “GIS based spatial data analysis for landslide susceptibility mapping,” Journal of Mountain Science, vol. 5, no. 1, pp. 52–62, 2008. 98 T. Can, H. A. Nefeslioglu, C.
Gokceoglu, H. Sonmez, and T. Y. Duman, “Susceptibility assessments of shallow earthflows triggered by heavy rainfall at three catchments by logistic regression analyses,” Geomorphology, vol. 72, no. 1–4, pp. 250–271, 2005. 99 A. Brenning, “Spatial prediction models for landslide hazards: review, comparison and evaluation,” Natural Hazards and Earth System Science, vol. 5, no. 6, pp. 853–862, 2005. 100 X. Yao, L. G. Tham, and F. C. Dai, “Landslide susceptibility mapping based on Support Vector Machine: a case study on natural slopes of Hong Kong, China,” Geomorphology, vol. 101, no. 4, pp. 572–582, 2008. 101 M. Marjanović, M. Kovačević, B. Bajat, and V. Voženílek, “Landslide susceptibility assessment using SVM machine learning algorithm,” Engineering Geology, vol. 123, no. 3, pp. 225–234, 2011. 102 C. Ballabio and S. Sterlacchini, “Support Vector Machines for landslide susceptibility mapping: the Staffora River Basin case study, Italy,” Mathematical Geosciences, vol. 44, no. 1, pp. 47–70, 2012.

Paper VII Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., Dick, O.B., 2012. Regional prediction of landslides hazards in the Hoa Binh province (Vietnam) using probability analysis of intense rainfall. Natural Hazards. Doi: 10.1007/s11069-012-0510-0. (Accepted).

Regional prediction of landslide hazard using probability analysis of intense rainfall in the Hoa Binh province, Vietnam

Dieu Tien Bui a,b,*, Biswajeet Pradhan c, Owe Lofman a, Inge Revhaug a, Oystein B. Dick a

a Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, P.O. Box 5003 IMT, NO-1432, Ås, Norway

b Faculty of Surveying and Mapping, Hanoi University of Mining and Geology, Dong Ngac, Tu Liem, Hanoi, Vietnam

c Faculty of Engineering, Department of Civil Engineering, Geospatial Information Science Research Centre (GISRC), University Putra Malaysia, Serdang, Selangor Darul Ehsan 43400, Malaysia

*Corresponding author: Tel: +47 64965424. E-mail addresses: [email protected]/[email protected] (D. Tien Bui)

Abstract


The main objective of this study is to assess regional landslide hazards in the Hoa Binh province of Vietnam. A landslide inventory map was constructed from various sources, with data mainly covering the 21-year period from 1990 to 2010. The historic inventory of these failures shows that rainfall is the main triggering factor in this region. The probability of occurrence of rainfall episodes and the rainfall threshold were deduced from rainfall records for the aforementioned period. The rainfall threshold model was generated based on daily and cumulative values of antecedent rainfall of the landslide events. The result shows that the 15-day antecedent rainfall gives the best fit for the existing landslides in the inventory. The rainfall threshold model was validated using the rainfall and landslide events that occurred in 2010, which were not considered in building the threshold model. The result was used to estimate the temporal probability of landslide occurrence using a Poisson probability model. Prior to this work, five landslide susceptibility maps were constructed for the study area using support vector machines, logistic regression, evidential belief functions, Bayesian regularized neural networks, and neuro-fuzzy models. These susceptibility maps provide information on the spatial probability of landslide occurrence in the area. Finally, landslide hazard maps were generated by integrating the spatial and temporal probabilities of landslides. In total, 15 specific landslide hazard maps were generated for three time periods of 1, 3, and 5 years.


Key words: Landslide hazard; Rainfall threshold; GIS; Hoa Binh; Vietnam


1. Introduction


In the Hoa Binh province, rainfall during the last decade has been particularly heavy, resulting in an increasing frequency of landslides. Landslides mainly occur in the rainy season from May to October, especially during torrential rainstorms. In addition, tectonic activity, steep terrain, and extensive clear-cut logging contribute to the occurrence of landslides (Tien Bui et al. 2011a). In recent years, owing to economic development and extensive land use activities, major infrastructure such as new road networks and settlement expansions has shifted to the mountainous regions. Therefore, areas with a potential risk of landslides should be identified in order to reduce the probability of damage. For that reason, landslide hazard assessment has become an urgent task that can help authorities reduce landslide damage through proper land use management for infrastructural development and environmental protection.


Landslide hazard is expressed as the probability of a potentially damaging landslide in a specified period of time and in a given area (Van Westen et al. 2006; Varnes 1984). This definition incorporates the concepts of both location and time: when assessing landslide hazards, one has to predict “where” a landslide will occur (spatial probability) and “when” or how frequently it will occur (temporal probability). For estimation of the spatial probability of landslide hazards, various methods and models have been successfully developed and used in the literature (Chacon et al. 2006; Guzzetti et al. 2006; Yao et al. 2008; Pradhan and Lee 2010; Pradhan et al. 2010; Yeon et al. 2010; Yilmaz 2010; Marjanovic et al. 2011; Oh and Pradhan 2011; Sezer et al. 2011; Althuwaynee et al. 2012; Ballabio and Sterlacchini 2012; Devkota et al. 2012; Lee et al. 2012; Pourghasemi et al. 2012a; 2012b; Xu et al. 2012; Zare et al. 2012; Tien Bui et al. 2012c; Pradhan 2010a, b, 2011a, b, 2012). However, few attempts have been made to estimate the temporal probability of slope failure (Guzzetti et al. 2005; Jaiswal et al. 2010; Das et al. 2011). Thus, landslide hazard mapping is considerably challenging, owing either to incomplete datasets or to the unavailability of historical data in developing countries (Harp et al. 2009) such as Vietnam.


Two main approaches have been widely used for the assessment of the temporal probability of future landslide occurrence: the first is the analysis of potential slope failure and the second is the statistical treatment of past landslide events (Lopez Saez et al. 2012). The first approach analyses the current slope conditions and assesses the potential for instability. This approach, however, is less suitable for large areas (Jaiswal and van Westen 2009) such as the Hoa Binh province. The second approach focuses on the analysis of the frequency of past landslide events (Brabb 1984) and can be carried out directly, using historical records of landslides, or indirectly, using information on rainfall-triggered landslide events (Corominas and Moya 2008). Direct analysis of fairly complete historical landslide records covering a long time span is considered a good way to obtain the temporal probability. However, it is usually extremely difficult to obtain such data for all existing individual landslides at the regional scale. Therefore, the indirect method, which uses the frequency of occurrence of rainfall to estimate the temporal probability of landslides, was used in this study. Although the indirect method does not require a complete multi-temporal landslide inventory, it requires the establishment of reliable relations between rainfall and the occurrence of landslides (Jaiswal et al. 2010). Once the rainfall threshold is determined, the temporal probability of landslides is calculated from the number of times rainfall exceeded the threshold. Since the frequency of rainfall-triggered landslides only provides an estimate of how often landslides may occur, it needs to be integrated with the spatial prediction of potential landslides to produce a landslide hazard map (Corominas and Moya 2008).
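The indirect approach can be sketched numerically. The Python fragment below assumes the commonly used Poisson form, in which the probability of at least one threshold exceedance within a horizon of t years is 1 − exp(−t/μ), where μ is the mean recurrence interval between exceedances in the rainfall record. The function name and the exceedance count are hypothetical; only the 21-year record length and the 1-, 3-, and 5-year horizons come from the text.

```python
import math


def exceedance_probability(n_exceedances, record_years, horizon_years):
    """P(at least one threshold exceedance within horizon_years), Poisson model."""
    if n_exceedances < 0 or record_years <= 0 or horizon_years < 0:
        raise ValueError("counts and durations must be non-negative, record > 0")
    rate_per_year = n_exceedances / record_years  # 1 / mean recurrence interval
    return 1.0 - math.exp(-rate_per_year * horizon_years)


# Hypothetical example: 7 exceedances observed in a 21-year rainfall record.
for horizon in (1, 3, 5):
    p = exceedance_probability(n_exceedances=7, record_years=21, horizon_years=horizon)
    print(f"{horizon}-year horizon: {p:.3f}")
```

The probability grows with the horizon and approaches 1 for long periods, which is why separate hazard maps are needed for each time period considered.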


Prediction of landslide occurrences using rainfall thresholds has been successfully accomplished in the United States (Chleborad et al. 2006; Chleborad 2000; Salciarini et al. 2008; Godt et al. 2006; Coe et al. 2004), Canada (Jakob and Weatherly 2003; Jakob et al. 2006), Japan (Saito et al. 2010; Matsushi and Matsukura 2007; Osanai et al. 2010), New Zealand (Glade 2000; Schmidt and Glade 2003), Italy (Guzzetti et al. 2005; Aleotti 2004; Salciarini et al. 2012; Giannecchini et al. 2012; Martelloni et al. 2011), Spain (Corominas and Moya 1999), Portugal (Zezere et al. 2005), and Norway (Melchiorre and Frattini 2012). Three main approaches have been proposed to establish rainfall thresholds for landslide initiation (Guzzetti et al. 2007): physically based models (Crosta and Frattini 2003; Montgomery and Dietrich 1994; Wilson and Wieczorek 1995), empirical models (Caine 1980; Reichenbach et al. 1998; Aleotti 2004; Jemec and Komac 2012; Sengupta et al. 2010), and statistically based models (Frattini et al. 2009). Physically based threshold models establish the correlation between rainfall and local terrain characteristics (e.g. slope gradient, soil depth, and lithology) through a dynamic hydrological model (Terlien 1998). Empirical models define rainfall thresholds based on the analysis of past rainfall events that have resulted in landslides. After the rainfall conditions that resulted in landslides are plotted, the thresholds are usually determined visually by drawing lower-bound lines in the graph (Guzzetti et al. 2007). Statistically based models use statistical analysis techniques, such as logistic regression and Bayesian inference, to determine rainfall thresholds (Frattini et al. 2009; Guzzetti et al. 2007). Although a fair number of rainfall thresholds have been proposed, most of them perform reasonably well only in the study areas for which they were developed (Jakob et al. 2006). In general, it is difficult to export these results to other areas because of differences in climate, geology, and geomorphologic settings.
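As an illustration of the empirical approach, the following Python sketch computes the cumulative antecedent rainfall for a chosen window and tests a day against a threshold line. The linear form of the threshold (daily rainfall compared with intercept − slope × antecedent rainfall), its coefficients, the function names, and the rainfall values are all assumptions made for this example; they are not the threshold derived in this study.

```python
def antecedent_rainfall(daily_mm, day_index, n_days):
    """Total rainfall (mm) over the n_days before day_index, excluding that day."""
    if n_days <= 0:
        raise ValueError("n_days must be positive")
    if not 0 <= day_index < len(daily_mm):
        raise IndexError("day_index outside the rainfall record")
    start = max(0, day_index - n_days)  # truncate the window at the record start
    return sum(daily_mm[start:day_index])


def exceeds_threshold(daily_mm, day_index, n_days, intercept, slope):
    """Assumed linear threshold: daily rain >= intercept - slope * antecedent rain."""
    antecedent = antecedent_rainfall(daily_mm, day_index, n_days)
    return daily_mm[day_index] >= intercept - slope * antecedent


# Hypothetical daily rainfall record (mm) and threshold coefficients.
daily = [0.0, 10.0, 20.0, 0.0, 5.0, 40.0]
print(antecedent_rainfall(daily, day_index=5, n_days=3))  # sums 20 + 0 + 5
print(exceeds_threshold(daily, day_index=5, n_days=3, intercept=50.0, slope=0.5))
```

Counting the days on which such a test is true over the whole record gives the number of threshold exceedances, which is the input needed for a frequency-based estimate of temporal probability.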


The main objective of this study is to assess landslide hazard at the regional scale for the Hoa Binh province using temporal rainfall data. The rainfall threshold for landslide initiation was estimated from a correlation analysis of the rainfall and the historical landslide records, and was validated using the landslide events in 2010. The exceedance probability of the rainfall threshold was then obtained, and the temporal probability of landslides was estimated indirectly using a Poisson model. Landslide susceptibility maps generated in previous works by the same authors using support vector machines (Tien Bui et al. 2012a), logistic regression (Tien Bui et al. 2011a), evidential belief functions (Tien Bui et al. 2012d), Bayesian regularized neural networks (Tien Bui et al. 2012b), and neuro-fuzzy models (Tien Bui et al. 2011b) were integrated with the temporal probability to produce the landslide hazard maps.
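The integration step can be sketched as follows, assuming that spatial and temporal probabilities are independent and are combined by cell-wise multiplication. That combination rule, the function name, the tiny 2×2 grids, and the temporal probability values are hypothetical illustrations; only the five model types and the three time periods are taken from the text.

```python
def hazard_map(susceptibility, temporal_probability):
    """Scale each cell's spatial probability by the temporal probability."""
    if not 0.0 <= temporal_probability <= 1.0:
        raise ValueError("temporal probability must lie in [0, 1]")
    return [[cell * temporal_probability for cell in row] for row in susceptibility]


# Hypothetical 2x2 susceptibility grids (spatial probabilities), one per model.
models = [
    "SVM",
    "logistic regression",
    "evidential belief functions",
    "Bayesian regularized neural network",
    "neuro-fuzzy",
]
susceptibility_maps = {name: [[0.1, 0.4], [0.7, 0.9]] for name in models}

# Hypothetical temporal probabilities for the 1-, 3-, and 5-year periods.
temporal = {1: 0.3, 3: 0.6, 5: 0.8}

hazard_maps = {
    (name, years): hazard_map(grid, p)
    for name, grid in susceptibility_maps.items()
    for years, p in temporal.items()
}
print(len(hazard_maps))  # 5 models x 3 periods
```

Pairing each of the five susceptibility maps with each of the three periods yields the set of model-and-period specific hazard maps.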


2. Study area


The Hoa Binh province is located in the north-western region of Vietnam (Fig. 1). It covers an area of about 4660 km² between longitudes 104°48′E and 105°50′E, and latitudes 20°17′N and 21°08′N. The elevation in the province ranges from 0 to 1,510 m above sea level and gradually decreases from northwest to southeast. The landscape is rather complex and can generally be classified into three basic classes: the mountainous complex, the hilly complex, and the valley. The topography is diverse and includes mountains, small valleys, hills, mounts, cliffs, and plains. The mountainous region is strongly dissected and steep. The plains are small and intermixed with valleys, whereas the hills are dispersed between the mountains and the plains. This diverse topography makes the province one of the regions in Vietnam most prone to natural disasters such as floods and landslides. A detailed description of the study area can be found in Tien Bui et al. (2011a).


Geologically, the area comprises limestone, conglomerate, aphyric basalt, sandstone, silty sandstone, and black clay shale. The ages of the rocks vary from Paleozoic to Cenozoic. Five major fracture zones (Hoa Binh, Da Bac, Muong La-Cho Bo, Son La-Bim Son, and Song Da) pass through the province, weakening the rock mass. Land use in the study area comprises forest land (52.6%), barren land and non-forest rocky mountain (21%), agricultural land (14.5%), settlement areas (7.5%), water surface (4%), and grassland (0.4%).


Fig. 1 The study area


The study area is situated in the monsoonal region, with hot, rainy, and dry seasons. The coldest month is January and the hottest month is July, with average temperatures of 14.9 °C and 26.7 °C respectively. Seasons in the province are classified as rainy or dry. The rainy season normally lasts from May to October and has a high frequency of intense rainfall. In the rainy season, the average rainfall is around 200 mm per month. In August and September, rainfall peaks at 300 to 400 mm per month. Rainfall is concentrated, in both frequency and intensity, over short periods, which triggers most of the landslides, flooding, and erosion in the study area. The rainfall data available from 12 rain gauges are based on daily measurements. The historical rainfall record for the 21 years from 1990 to 2010 shows that the mean annual rainfall ranges from 1376.1 mm in the Muong Chieng area to 2075.7 mm in the Kim Boi region (Table 2). The annual rainfall ranged from 937.3 mm (Cao Phong rain gauge in 1991) to 4811.6 mm (Lac Son rain gauge in 2001), with an average value of 3120.7 mm. Rainfall in the rainy season accounts for 86% of the annual rainfall (Fig. 3).


The average annual rainfall is lowest at the Muong Chieng rain gauge (1376 mm) and highest at the Kim Boi rain gauge (2076 mm). The most important characteristic of rainfall in the study area is that it is concentrated in a few days, with maximum daily rainfall exceeding 100 mm. The maximum rainfall recorded in a single day of a year varies from 35 mm (at the Cao Phong rain gauge in 1993) to 950 mm (at the Mai Chau rain gauge in 2006) (Fig. 2).


Fig. 2 Daily rainfall for the period 1990-2010 at the rain gauges: (a) Luong Son; (b) Cao Phong; (c) Hoa Binh; (d) Kim Boi; (e) Tu Ly; (f) Muong Chieng; (g) Phieng Ve; (h) Mai Chau; (i) Tan Lac; (k) Lac Son; (m) Hung Thi; (n) Chi Ne


Fig. 3 Distribution of rainfall at the 12 rain gauges of the Hoa Binh province for the period 1990-2010


3. Landslide inventory map


The landslide inventory map was prepared from historical records for the past 20 years. This map is based on landslide inventory maps from several projects such as (1) “Investigation and assessment of the types of geological hazard in the territory of Vietnam and recommendation of remedial measures. Phase II: A study of the northern mountainous province of Vietnam” (Hue et al. 2004); (2) “The investigated report of natural hazards in the northwest of Vietnam” (Thinh et al. 2005); (3) “Construction of an environmental hazard zonation map for the northwest territory of Vietnam” (My 2007). Additionally, some recent landslides were collected by interpretation of SPOT 5 satellite imagery with 2.5 m spatial resolution (Tien Bui et al. 2011a).


Fig. 1 shows the distribution of landslides in the Hoa Binh province. A total of 97 landslide areas and 21 soil-rock slide areas were registered in the map. The smallest landslide is about 383 m², the largest is 14,343 m², and the average is 3,443 m². A fully detailed landslide inventory is not available for the study area; a landslide was only reported if it had affected infrastructure or caused deaths or injuries in the local community. Analysis of the landslide inventories shows that most of the slope failures in the study area were caused by rainfall infiltration into the soil, which increases soil pore-water pressure (My 2007; Hue et al. 2004). Table 1 shows that most of these landslides occurred in the rainy season from May to October, when daily rainfall exceeded 100 mm. However, antecedent rainfall, which influences the degree of saturation of the soil (Godt et al. 2006), has also played an important role in the initiation of the landslides. Many landslides occurred on October 5, 2007, when daily rainfall and 3-day antecedent rainfall exceeded 172.4 and 161.6 mm respectively. A large number of landslides also occurred on October 31, 2008, when daily rainfall and 15-day antecedent rainfall exceeded 118.0 and 93.3 mm respectively (Table 1).


Table 1 Temporal occurrence of rainfall-triggered landslides in the Hoa Binh province from 1990 to 2010. Dates of occurrence are given as dd/mm/yyyy; all rainfall values are in mm

Episode  Date            Daily   3 days   5 days   7 days  10 days  15 days  30 days  Affected areas
1        14/9/1994         102     61.9     65.4    116.1    164.2    270.3    498.0  Doc Cun, Binh Thanh, Road 6
2        24/7/1996         350     52.4     65.4     65.4     66.6    160.7    238.2  Pu Bin, Phuc San, Tan Mai
3        15/8/1996       243.2    209.5    229.5    239.0    242.2    330.4    825.5  Phuc San, Dong Bang, Tong Da
4        13/9/1996       136.0     60.9     60.9     60.9     60.9    120.0    631.3  Pu Bin, Phuc San
5        14/9/1998       116.2        0      1.7     42.8     75.1     75.1    139.3  Vinh Dong, Vinh Tien
6        11/9/2000       240.3      4.8     27.5     68.2     69.4     69.4    153.6  Doc Cun, Road 6
7        11/9/2000       217.3     19.7     53.8     69.4     91.4     92.6    330.1  Doi Thai, Lac Son
8        18/9/2005       114.3     12.3    169.1    169.1    175.8    175.8    389.4  Doc Quy Hau, Road 6
9        27/9/2005       210.1      6.6      6.6     14.6    177.7    413.8    526.0  Thung Khe, Mai Chau
10       23/7/2004       110.1     50.4     50.4     67.2     95.1    112.3    137.9  Dong Chanh, Vinh Dong, Kim Boi
11       19/8/2006       116.7    248.7    253.9    300.5    300.9    355.0    544.0  Doi Ong Tuong, Dong Tien
12       6/7/2007        131.3      4.2     82.3     82.6     92.2     97.1    174.0  Km 89+700, QL12B
13       6/7/2007        171.5      6.4     10.6     14.1     65.6     79.0    206.1  Suoi Lao, Road 433
14       6/7/2007        168.3      0.7      8.6     10.7     89.7    169.9    312.5  Road 435, Thai Binh, Hoa Binh
15       5/10/2007       239.9    257.3    257.3    257.3    362.1    413.2    612.9  Xom Ong, Nam Phong, Cao Phong
16       5/10/2007       183.0    346.4    346.4    346.4    442.9    483.9    567.3  Pu Pin, So Lo, Phuc San, Tong Dau
17       5/10/2007       234.8    214.3    214.3    214.3    314.3    380.0    763.6  Doc Lao, Toan Son, Da Bac
18       5/10/2007       172.4    161.6    161.6    161.6    240.0    297.6    496.0  Doc Cun, Road 6 - Tan Lac
19       5/10/2007       208.8    254.0    254.0    254.0    343.9    392.7    538.4  Doc Quy Hau, Tan Lac
20       27/9/2008       101.8    120.6    120.6    123.6    130.1    203.9    269.8  Doc Quy Hau, Tan Lac
21       31/10/2008      295.0     67.2     79.4     79.8    127.8    140.2    194.4  Tu Son, Road 21, Kim Boi
22       31/10/2008      163.3     61.4    114.3    130.4    131.8    141.3    247.8  Phuc San, Tan Mai, Dong Bang
23       31/10/2008      218.7     37.7     49.1     74.8     75.3     75.9    161.6  Road 433, Da Bac
24       31/10/2008      286.1     42.9     82.0    104.4    107.5    114.6    151.8  Road 446, Luong Son
25       31/10/2008      118.0     55.2     70.9     82.6     91.4     93.3    175.4  Road 445, Ky Son
26       31/10/2008      159.8     54.9     71.9     73.5     94.1    108.6    208.3  Road 435, Cao Phong
27       31/10/2008      147.5     46.0     73.2     79.2     81.6     87.5    141.7  Road 440, Tan Lac
28       11/5/2009       134.2     10.4     10.4     10.4     37.1     71.2     82.8  Chi Ne, Lac Thuy
29       25/6/2008       113.6     13.7     13.7    105.2    112.4    132.5    180.4  Road 435, Nam Phong, Cao Phong
30       18/7/2010       123.1      0.1      2.8      2.9     69.9     69.9    300.4  Phuc San, Tan Mai, Mai Chau
31       28/8/2010       131.0    107.0    143.7    205.8    210.4    281.3    371.1  Phu Cuong QL 12B, Road 440
32       28/8/2010       152.9    161.3    214.7    310.7    323.1    341.2    456.8  QL 12B, Lac Son
33       28/8/2010       113.5    117.4    160.1    181.1    181.3    242.7    308.4  Road 433, Da Bac
34       28/8/2010       131.1     97.8    164.5    204.8    230.3    288.3    374.8  Road 435, Cao Phong
35       28/8/2010       108.7    163.5    209.0    245.4    259.3    278.6    300.4  Road 432, Phuc San Tan Mai

Table 2 Available precipitation data (see Fig. 1 for the rain gauge locations)

No  Rain gauge     Mean annual precipitation (mm)  Data availability
1   Luong Son      1729.5                          1990-2010
2   Cao Phong      1844.7                          1990-2010
3   Hoa Binh       1832.1                          1990-2010
4   Kim Boi        2075.7                          1990-2010
5   Tu Ly          1808.1                          1990-2010
6   Muong Chieng   1376.1                          1990-2010
7   Phieng Ve      1553.6                          1990-2010
8   Mai Chau       1758.8                          1990-2010
9   Tan Lac        1716.5                          1990-2010
10  Lac Son        1978.0                          1990-2010
11  Hung Thi       1890.7                          1990-2010
12  Chi Ne         1885.2                          1990-2010

Notes: 1990 and 1991 (August to December) missing


4. Temporal assessment of landslide hazards


4.1. Determination of rainfall threshold


A rainfall threshold is defined as the minimum rainfall conditions for triggering landslides in a particular region (Guzzetti et al. 2007). Determining rainfall thresholds for landslide initiation is considered a basic task in landslide hazard assessment, and various methods have been proposed to establish them (Dahal et al. 2009; 2008; Guzzetti et al. 2007; Zezere et al. 2005; Giannecchini et al. 2012; Frattini et al. 2009; Crosta 1998; Corominas and Moya 1999; D'Odorico and Fagherazzi 2003; Glade 2000; Godt et al. 2006; Marques et al. 2008; Saito et al. 2010). In general, thresholds can be classified into five groups: (1) empirical; (2) physically based; (3) intensity-duration; (4) normalized intensity-duration; and (5) antecedent rainfall. The advantages and disadvantages of each group are discussed in Guzzetti et al. (2007; 2008). Intensity-duration thresholds are the most widely used in the literature (Martelloni et al. 2011). For rainfall threshold estimation, the four most common variables used in the literature are daily rainfall (Dahal and Hasegawa 2008), antecedent rainfall (Glade 2000), cumulative rainfall (Polemio and Sdao 1999), and normalized critical rainfall (Aleotti 2004). In general, the selection of the right parameters for constructing a rainfall threshold depends mainly on the landslide type (Martelloni et al. 2011).


In the case of the Hoa Binh province, where only daily rainfall is available, antecedent rainfall is believed to have played an important role in the initiation of landslides because it reduces soil suction and increases pore-water pressure (Thach et al. 2002). Therefore, antecedent rainfall was used to establish the threshold model using the empirical method. Fig. 2 shows the daily rainfall data for the Hoa Binh province between 1990 and 2010.


One of the main difficulties in using antecedent rainfall for landslide prediction is determining the number of days to be used (Guzzetti et al. 2007). A detailed literature review revealed a complex relationship between the number of antecedent rainfall days and the triggering of landslides. Terlien (1998) considered 2, 5, 15, and 25 days for the Manizales area (Colombia). Kim et al. (1992) used 3 days, whereas Heyerdahl et al. (2003) used 4 days. Glade (2000) used 10 days. Aleotti (2004) considered 7, 10, and 15 days. Zezere et al. (2005) used antecedent rainfall for 1, 5, 10, 15, 30, 45, 60, 75, and 90 days. Polemio and Sdao (1999) considered 180-day cumulative daily rainfall data. In summary, antecedent rainfall over 3 to 120 days (Pasuto and Silvano 1998) could be significant for explaining landslide occurrence (Dahal et al. 2009). The large variability in the number of antecedent rainfall days may be due to factors such as (i) diverse lithological, morphological, vegetation, and soil conditions; (ii) different climatic regimes and meteorological circumstances leading to slope instability; and (iii) heterogeneity and incompleteness in the rainfall and landslide data used to determine the thresholds (Guzzetti et al. 2007).


Fig. 4 Relationship between daily rainfall and antecedent rainfall in the Hoa Binh province for the period 1990 to 2010. Red diamonds depict landslide events and blue diamonds show the maximum yearly one-day rainfall without a reported landslide


To determine the number of days for the antecedent rainfall, we carried out a correlation analysis between the daily rainfall associated with past landslide events and the corresponding antecedent rainfall (Zezere et al. 2005) for six different periods: 3, 5, 7, 10, 15, and 30 days. The result is shown in Fig. 4. The red diamonds depict the landslide events, whereas the blue diamonds show the maximum yearly one-day rainfall without a reported landslide for the 21 years from 1990 to 2010. These graphs show that the best discrimination between rainfall events that triggered landslides and those that did not was obtained with the 15-day antecedent rainfall. For the other antecedent periods, the discrimination is less evident. For that reason, the 15-day antecedent rainfall was adopted for the calculation of the rainfall threshold in this study.
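
The antecedent rainfall for each candidate period can be obtained from a daily series as a backward sum. The following Python sketch is illustrative only: the function name `antecedent_rainfall` and the sample series are hypothetical, and it assumes that the triggering day itself is excluded from the sum, which appears consistent with Table 1 (episode 5 has 116.2 mm of daily rainfall but 0 mm of 3-day antecedent rainfall).

```python
def antecedent_rainfall(daily_mm, day_index, window_days):
    """Sum the rainfall (mm) of the `window_days` days before `day_index`.

    The day at `day_index` itself is excluded (an assumption, see the text).
    If fewer than `window_days` earlier days exist, the available ones are used.
    """
    start = max(0, day_index - window_days)
    return sum(daily_mm[start:day_index])


# Hypothetical daily series (mm); the last value is the triggering day.
daily_mm = [10.0, 0.0, 5.0, 20.0, 0.0, 100.0]
event = len(daily_mm) - 1

# Antecedent rainfall for the six periods considered in the text.
periods = [3, 5, 7, 10, 15, 30]
antecedent = {p: antecedent_rainfall(daily_mm, event, p) for p in periods}
print(antecedent)
```

Plotting the daily rainfall of each event against these sums, one panel per period, would give scatter graphs of the kind described for Fig. 4.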


To determine the rainfall threshold RTH, a scatter graph of daily rainfall (on days with one or more landslides) against the corresponding 15-day antecedent rainfall was constructed (Fig. 5). This graph is based on the rainfall-induced landslide episodes from 1990 to 2009. The landslides that occurred in 2010 were used for threshold validation. The equation of the envelope curve for landslides was then obtained from the lower bound of the points plotted in the scatter graph (Chleborad 2000; Chleborad et al. 2006; 2008; Jaiswal and van Westen 2009) as RTH = 128.5 – 0.164 R15Ad, where RTH is the rainfall threshold (mm) and R15Ad is the accumulated rainfall (mm) over the antecedent 15 days.
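
As an illustration of how the threshold equation behaves as a lower envelope, the short Python sketch below evaluates it for episodes 1 to 29 of Table 1 (the episodes not dated 2010). The function names are hypothetical, and the check is only a sketch of the idea; it does not claim to reproduce how the envelope was originally drawn.

```python
def rainfall_threshold(r15_mm):
    """Threshold daily rainfall (mm) from RTH = 128.5 - 0.164 * R15Ad."""
    return 128.5 - 0.164 * r15_mm


# (daily rainfall, 15-day antecedent rainfall) in mm, episodes 1-29 of Table 1.
episodes = [
    (102.0, 270.3), (350.0, 160.7), (243.2, 330.4), (136.0, 120.0),
    (116.2, 75.1), (240.3, 69.4), (217.3, 92.6), (114.3, 175.8),
    (210.1, 413.8), (110.1, 112.3), (116.7, 355.0), (131.3, 97.1),
    (171.5, 79.0), (168.3, 169.9), (239.9, 413.2), (183.0, 483.9),
    (234.8, 380.0), (172.4, 297.6), (208.8, 392.7), (101.8, 203.9),
    (295.0, 140.2), (163.3, 141.3), (218.7, 75.9), (286.1, 114.6),
    (118.0, 93.3), (159.8, 108.6), (147.5, 87.5), (134.2, 71.2),
    (113.6, 132.5),
]

# Margin of each episode above the threshold line (positive = above the line).
margins = [daily - rainfall_threshold(r15) for daily, r15 in episodes]
all_above = all(m > 0 for m in margins)

# 1-based number of the episode that lies closest to the line.
closest_episode = min(range(len(margins)), key=margins.__getitem__) + 1
print(all_above, closest_episode, round(min(margins), 3))
```

With the Table 1 values, every episode lies above the line, and the episodes with the smallest margins (episodes 5 and 10) sit almost exactly on it, which is the behaviour expected of a lower-bound envelope.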


Fig. 5 Rainfall threshold for the Hoa Binh province. RTH is the threshold rainfall and R15Ad is the 15-day antecedent rainfall


4.2. Validation of the rainfall threshold


In landslide hazard modeling, validation of the models employed is considered the most essential component; without validation, prediction models have no scientific significance (Chung and Fabbri 2003). To validate the rainfall threshold, the main recent rainfall events and the landslide data recorded from 1 January 2010 to 31 December 2010 were used. This dataset was not used to build the rainfall threshold model. The result (Fig. 6) shows that in the period from May to October 2010 the rainfall threshold was exceeded once, on 28 August, at Tan Lac, Lac Son, and Tu Ly. It was exceeded twice in Cao Phong (25 June and 28 August) and three times in Mai Chau (18 and 25 June, 28 August). All landslides occurred when the rainfall exceeded the threshold. In contrast, during one event in Mai Chau (25 June) the rainfall threshold was exceeded but no landslide was reported (Fig. 6e). The daily and 15-day antecedent rainfalls on 25 June were 94.6 and 195.8 mm respectively. This rainfall event, however, was quite close to the previous event (18 June) that did cause some landslides (Fig. 6e). In general, the threshold model performed well in forecasting the landslide events of 2010.
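
A validation of this kind can be summarized as a small contingency count. The Python sketch below is a minimal illustration, not the procedure used for Fig. 6: it takes the five episodes of 28 August 2010 from Table 1 and adds two records without landslides that are invented purely to show how false alarms and correct negatives would be counted.

```python
def rainfall_threshold(r15_mm):
    """Threshold daily rainfall (mm) from RTH = 128.5 - 0.164 * R15Ad."""
    return 128.5 - 0.164 * r15_mm


# (label, daily rainfall mm, 15-day antecedent rainfall mm, landslide reported)
records = [
    ("episode 31", 131.0, 281.3, True),   # Table 1, 28/8/2010
    ("episode 32", 152.9, 341.2, True),   # Table 1, 28/8/2010
    ("episode 33", 113.5, 242.7, True),   # Table 1, 28/8/2010
    ("episode 34", 131.1, 288.3, True),   # Table 1, 28/8/2010
    ("episode 35", 108.7, 278.6, True),   # Table 1, 28/8/2010
    ("hypothetical day A", 130.0, 100.0, False),  # invented for illustration
    ("hypothetical day B", 60.0, 150.0, False),   # invented for illustration
]

counts = {"hit": 0, "miss": 0, "false alarm": 0, "correct negative": 0}
for _, daily, r15, landslide in records:
    exceeded = daily > rainfall_threshold(r15)
    if exceeded and landslide:
        counts["hit"] += 1
    elif landslide:
        counts["miss"] += 1
    elif exceeded:
        counts["false alarm"] += 1
    else:
        counts["correct negative"] += 1

print(counts)
```

All five recorded episodes of 28 August 2010 fall above the threshold, in line with the statement that the landslides of 2010 occurred when the threshold was exceeded.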


4.3. Temporal probability of landslide initiation


To estimate the temporal probability of landslide initiation, we made the following assumptions: (1) the probability of occurrence of a landslide is related to the probability of exceedance of the rainfall threshold (Jaiswal and van Westen 2009); (2) landslide activity does not occur, or occurs only rarely, when rainfall amounts are below the rainfall threshold (Chleborad et al. 2006). The probability of occurrence of rainfall episodes exceeding the rainfall threshold over the 21 years from 1990 to 2010 was used to estimate the temporal probability of landslide occurrence using a Poisson probability model.


According to Crovelli (2000), the probability of n landslides during time t can be estimated using the Poisson distribution as follows:

$$P[N(t) = n] = e^{-\lambda t}\,\frac{(\lambda t)^{n}}{n!}, \qquad n = 1, 2, 3, \ldots \tag{1}$$

where $N(t)$ is the number of landslides that occur during time $t$ and $\lambda$ is the rate of occurrence of landslides.

The probability of one or more landslides occurring during time $t$, which is called the exceedance probability, can be estimated as follows:

$$P[N(t) \geq 1] = 1 - \exp(-t/\mu) \tag{2}$$

where $\mu$ is called the future mean recurrence interval and $\mu = \lambda^{-1}$; $t$ is a period of time in the future for which the exceedance probability is calculated. The future mean recurrence interval is estimated using the historical mean recurrence interval, with the assumption that the future occurrence of landslides will remain the same as it was in the past (Crovelli 2000).


Fig. 6 Validation of the threshold equation RTH = 128.5 – 0.164 R15Ad for the Hoa Binh province. Positive values on the vertical axis indicate threshold exceedance (R > RTH). Black triangles are the rainfall events with reported landslides. Rain gauges: (a) Tan Lac; (b) Lac Son; (c) Tu Ly; (d) Cao Phong; (e) Mai Chau


Table 3 Temporal probability of landslide hazard for the Hoa Binh province for return periods of one, three, and five years

                   Number of times the      Temporal probability for different return periods
No  Sub-region     threshold was exceeded   1 year    3 years   5 years
1   Luong Son      29                       0.749     0.984     0.999
2   Cao Phong      30                       0.760     0.986     0.999
3   Hoa Binh       34                       0.802     0.992     1
3   Kim Boi        28                       0.736     0.982     0.999
4   Tu Ly          28                       0.736     0.982     0.999
5   Muong Chieng   20                       0.614     0.943     0.991
6   Phieng Ve      19                       0.595     0.934     0.989
7   Mai Chau       31                       0.771     0.988     0.999
8   Tan Lac        24                       0.681     0.968     0.997
9   Lac Son        29                       0.749     0.984     0.999
10  Hung Thi       30                       0.760     0.986     0.999
11  Chi Ne         46                       0.888     0.999     1

Because of the rainfall variability, the study area was divided into 11 sub-regions (Fig. 1) based on elevation difference, horizontal distance, geographic boundaries, and the topographical location of the rain gauges (Petrucci and Polemio 2009; Petrucci et al. 2009). Using the time series of daily rainfall from 11 rain gauges, the number of times the threshold was exceeded during the 21 years (1990-2010) was calculated. Finally, the temporal probability for each sub-region was obtained (Table 3).
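
The counting step can be sketched in Python as follows. This is only an illustration under stated assumptions: the series is synthetic, every day above the threshold is counted as a separate exceedance (the text does not say whether consecutive days are merged into one episode), and days with fewer than 15 preceding records are skipped. The function name is hypothetical.

```python
def count_exceedances(daily_mm, window_days=15):
    """Count days whose rainfall exceeds RTH = 128.5 - 0.164 * R15Ad.

    R15Ad is the rainfall summed over the `window_days` days before each day.
    Days without a full antecedent window are skipped (an assumption).
    """
    count = 0
    for i in range(window_days, len(daily_mm)):
        r15 = sum(daily_mm[i - window_days:i])
        if daily_mm[i] > 128.5 - 0.164 * r15:
            count += 1
    return count


# Synthetic series (mm): 15 moderately wet days followed by three test days.
daily_mm = [5.0] * 15 + [120.0, 10.0, 100.0]
n_exceedances = count_exceedances(daily_mm)
print(n_exceedances)
```

In this synthetic series the 120 mm and 100 mm days exceed the threshold and the 10 mm day does not. Applied to the 21-year record of a rain gauge, such a count would provide the exceedance numbers listed in Table 3.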


5. Spatial prediction of landslide hazards


Prior to this study, landslide susceptibility maps were constructed by analysing the relationship between the landslide inventory and various conditioning factors. The same authors have developed and successfully applied various models of the spatial probability of landslides for the Hoa Binh province. Based on those results, the landslide susceptibility maps obtained from logistic regression (Tien Bui et al. 2011a), support vector machines (Tien Bui et al. 2012a), evidential belief functions (Tien Bui et al. 2012d), Bayesian neural networks (Tien Bui et al. 2012b), and neuro-fuzzy models (Tien Bui et al. 2011b) were used in this study (Fig. 7).


Fig. 7 Landslide susceptibility maps for the Hoa Binh province obtained from (a) Support vector machines with the radial basis function (RBF); (b) Logistic regression; (c) Evidential belief functions; (d) Bayesian regularized neural networks; (e) Neuro-fuzzy with the Sigmoid curve membership function (Sigmf)


All the landslide susceptibility models were validated by means of the success-rate and prediction-rate methods. The success-rate results were obtained by comparing the landslide susceptibility maps with the landslides in the training dataset, whereas the prediction capability of the susceptibility models was assessed using a validation dataset that is independent of the one used to build the landslide models. Since the results have already been discussed in the previously published papers, we only summarize the main results here: the prediction rate of the models is slightly lower than their success rate. The landslide susceptibility model obtained from support vector machines has the highest prediction capability, followed by logistic regression, evidential belief functions, Bayesian neural networks, and neuro-fuzzy (Fig. 8). This result is in agreement with Yao et al. (2008), Marjanovic et al. (2011), and Ballabio and Sterlacchini (2012), who stated that the prediction capability of support vector machines is better than that of logistic regression and other conventional models.
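
A rate curve of this kind can be sketched in a few lines of Python. The example below is a hedged illustration with invented data: cells are ranked from most to least susceptible, and the cumulative share of landslide cells is plotted against the cumulative share of the area. Using the training landslides gives a success-rate curve and using the validation landslides gives a prediction-rate curve. Summarizing the curve by the area under it is an assumption made here for illustration; the function names are hypothetical.

```python
def rate_curve(susceptibility, is_landslide):
    """Cumulative share of landslide cells vs. share of area, most susceptible first."""
    order = sorted(range(len(susceptibility)),
                   key=lambda i: susceptibility[i], reverse=True)
    total_cells = len(order)
    total_landslides = sum(is_landslide)
    xs, ys = [0.0], [0.0]
    found = 0
    for rank, i in enumerate(order, start=1):
        found += is_landslide[i]
        xs.append(rank / total_cells)
        ys.append(found / total_landslides)
    return xs, ys


def area_under_curve(xs, ys):
    """Trapezoidal area under the rate curve (1.0 would be a perfect model)."""
    return sum((xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2.0
               for k in range(1, len(xs)))


# Invented example: ten cells, the two most susceptible ones contain landslides.
susceptibility = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
is_landslide = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

xs, ys = rate_curve(susceptibility, is_landslide)
auc = area_under_curve(xs, ys)
print(round(auc, 3))
```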


Fig. 8 (a) Success-rate and (b) prediction-rate curves of the five landslide susceptibility models


6. Landslide hazard assessment


For presentation purposes, only two landslide hazard maps, based on the support vector machines and logistic regression models, are shown here. These maps were obtained by multiplying the spatial and temporal probabilities of landslides to delineate the landslide hazard. To evaluate how these maps change over time, the probability was estimated for three scenarios: 1 year, 3 years, and 5 years. Examples of rainfall-induced landslide hazard maps for 1 year, 3 years, and 5 years are shown in Fig. 9.
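
The multiplication can be illustrated with a very small grid. In the Python sketch below, the susceptibility values and the assignment of cells to sub-regions are invented, and it is assumed that susceptibility is expressed as a probability between 0 and 1; the 1-year temporal probabilities are those of Luong Son and Muong Chieng in Table 3.

```python
# 1-year temporal probabilities from Table 3.
temporal_probability = {"Luong Son": 0.749, "Muong Chieng": 0.614}

# Invented 2 x 2 susceptibility grid (spatial probability, 0 to 1).
susceptibility = [
    [0.80, 0.20],
    [0.50, 0.10],
]

# Invented assignment of each grid cell to a rain-gauge sub-region.
sub_region = [
    ["Luong Son", "Luong Son"],
    ["Muong Chieng", "Muong Chieng"],
]

# Hazard = spatial probability x temporal probability, cell by cell.
hazard = [
    [s * temporal_probability[r] for s, r in zip(s_row, r_row)]
    for s_row, r_row in zip(susceptibility, sub_region)
]

for row in hazard:
    print([round(value, 3) for value in row])
```

Because the temporal probability differs between sub-regions, two cells with the same susceptibility can receive different hazard values.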


These landslide hazard maps were visualized using four main groups of risk conditions (Pradhan and Lee 2010): (1) high risk areas (10%); (2) moderate risk areas (15%); (3) low risk areas (15%); and (4) very low risk areas (60%). These groups were drawn based on the graph of cumulative percentage of landslides against the landslide hazard map index (Fig. 10). This graph was constructed by overlaying the landslide inventory on the landslide hazard map produced using the support vector machines model for the 5-year scenario.
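
One way to assign such classes is to rank the cells by hazard index and cut the ranking at fixed shares. The Python sketch below assumes that the percentages refer to the share of map cells in each class, counted from the highest hazard index downwards; the text does not state this explicitly, and the function name and data are hypothetical.

```python
def classify_by_area_share(hazard_values):
    """Label each cell by its rank: top 10% high, next 15% moderate,
    next 15% low, remaining 60% very low (shares of the number of cells)."""
    # Cumulative upper bounds in percent of cells, from the highest hazard down.
    bounds = [(10, "high"), (25, "moderate"), (40, "low"), (100, "very low")]
    n = len(hazard_values)
    order = sorted(range(n), key=lambda i: hazard_values[i], reverse=True)
    classes = [None] * n
    for rank, i in enumerate(order, start=1):
        for upper_percent, label in bounds:
            # Integer comparison avoids floating-point issues at class limits.
            if rank * 100 <= upper_percent * n:
                classes[i] = label
                break
    return classes


# Invented hazard index values for 20 cells.
hazard_values = [k / 20 for k in range(20)]
classes = classify_by_area_share(hazard_values)
counts = {label: classes.count(label)
          for label in ("high", "moderate", "low", "very low")}
print(counts)
```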


Most of the landslide pixels (87.2%) are located in the high risk areas, where the landslide hazard probability is over 58%. Only a few landslide pixels (1%) are located in the very low risk areas. The landslide pixels that fall in the moderate and low risk areas amount to 9.4% and 2.4% respectively. In general, the hazard map clearly separates the high and the low risk areas (Fig. 11). Moreover, the areas with high and moderate risk should be taken into account when planning mitigation and designing landslide remedial measures.


7. Discussions and conclusion


Landslides are a common natural hazard during heavy rainfall in mountainous areas of Vietnam. During the last decade, landslides have increased significantly due to clear-cut logging, deforestation, and infrastructure expansion. However, the assessment of landslide hazard is difficult even in developed countries (Harp et al. 2009). In recent years, the development of geographical information systems (GIS) technology, in combination with mathematical and statistical tools (such as the Matlab software), has led to the growing application of quantitative techniques in many fields of the earth sciences (Carrara and Pike 2008), including landslide studies. Many methods and techniques for the quantitative assessment of landslide hazard have been proposed; however, they may be best classified as susceptibility models because they only provide information on spatial probability (Das et al. 2011). In this study, we present an approach for the regional prediction of landslide hazards in the Hoa Binh province (Vietnam). This approach allows us to assess landslide hazard scenarios in an area where a multi-temporal landslide inventory is not fully available. Since the map is available only at the regional scale, run-out analysis to quantify the hazard was not included. The landslide hazard maps were obtained by integrating the temporal probability of landslides with the landslide susceptibility maps. These hazard maps provide information on “where” and “when” a landslide is expected in terms of probability; land use planning for future development under different scenarios (1, 3, and 5 years) can therefore be carried out.


Fig. 9 Examples of landslide hazard maps for three scenarios from 1 to 5 years. The probability gives the joint probabilities of landslides spatial occurrence and landslides temporal occurrence


In general, these hazard maps clearly separate the high and the low risk areas. Detailed interpretation of the hazard map (example in Fig. 11) revealed areas with a high probability of landslides along the active fault zones passing through the Tan Mai and Phuc San communes (Fig. 11). These communes have been among the areas most vulnerable to landslides in the study area in recent years. In the two communes, a total of 350 families had to relocate due to landslides in 2010 and 2011. Other areas with a high probability of landslides are (as shown in Fig. 11): (1) Phieng Sa and Tan Son areas: on February 16, 2012, a large landslide occurred in these areas, destroying a section of national road number 6 and killing two people (the blue pushpin in Fig. 11). Another large landslide occurred on 13 September 2012, destroying the road section and blocking traffic for some days (the pink pushpin in Fig. 11). (2) Tong Dau and Dong Ban areas: on 2 March 2012, a large landslide (more than 40,000 m³) destroyed more than 100 m of the road section and caused serious traffic problems. Areas highly vulnerable to landslides (as reported in recent years) also include Toan Son and Tu Ly, Dong Tien and Doc Cun, Quy Hoa and Vinh Dong, and Hung Tien. These areas are indicated as having a high probability of landslide hazard on the map (Fig. 11).


Fig. 10 Cumulative percentage of landslide against landslide hazard map index


Fig. 11 Landslide hazard map for the 5-year scenario based on the support vector machines (RBF) model


A further analysis was carried out by overlaying the populated areas on the landslide hazard maps. The result shows that many high risk regions fall within populated areas with extensive road networks (Figs. 11, 12). This is one of the main problems for development in these areas. As shown in the map (Fig. 11), areas with a high probability of landslide hazard have suffered loss of human lives, property, and infrastructure due to recent landslides. Therefore, these areas should be given high priority when developing mitigation measures to reduce the impact of landslides.


In the Hoa Binh province, where a fully detailed landslide inventory is not available, it is impossible to use traditional methods of direct analysis, i.e. the frequency of landslides, to obtain landslide hazard maps. Moreover, details of soil thickness and soil strength properties are not available. For that reason, deterministic models for assessing the absolute or relative stability of slopes in regional analysis are not applicable in the study area. Therefore, the temporal probability of landslide occurrence was calculated using an indirect method based on the mean rate at which the rainfall threshold is exceeded. This method uses the available information on the dates of landslide episodes and daily rainfall data for 21 years (1990-2010).


The comparison between the landslide susceptibility maps and the landslide hazard map shows that some areas with relatively low susceptibility to landslides (such as the Lac Thuy area) have a somewhat higher landslide hazard (Fig. 11) for a given period of one year, and vice versa. This is due to the number of times the rainfall threshold was exceeded during the period, which results in a high temporal probability for those regions. Thus, the probability of landslide occurrence for an area is a conditional function of the probability of a landslide-triggering rainfall event and the landslide susceptibility (Jaiswal et al. 2010).


The spatial prediction models of landslide hazard for the Hoa Binh province were constructed on the assumption that future landslides will occur under the same geo-environmental conditions that produced them in the past. It is important to note that the landslide susceptibility models did not consider the rainfall triggering factor. Since the hazard models have an expected validity of 5 years, one question is whether the conditioning factors may change during that period. Factors such as lithology, faults, and soil type in the study area are not expected to change significantly in the considered period. However, factors such as land use and distance to roads may change over the five years due to anthropogenic activities. This is because changes in vegetation cover caused by clear-cut logging and deforestation in the catchment areas of the Buoi River, Boi River, and Bui River in the province are continuing. In addition, the road network has been expanded during the last few years and will continue to expand in the future. Thus, some landslide conditioning factors used in the landslide susceptibility models may change. Therefore, an assessment of changes in land use and the road system should be carried out for the considered period. If the change is significant at the regional scale, the land use and distance-to-roads conditioning factors should be updated in the susceptibility models.


Fig. 12 An example photograph of a landslide risk area at Dong Tien, Hoa Binh city. Photo courtesy of the Hoa Binh newspaper 2012


Because rainfall is the main triggering factor of landslides, and there have been no reports of earthquake-induced landslides in this region, the temporal probability of landslides was estimated indirectly from the statistical relationship between the historical landslide events and rainfall data. The rainfall threshold was estimated using all landslide episodes, without considering landslide size or the number of landslides in each episode. The role of antecedent rainfall in triggering landslides was demonstrated. The result shows that the cumulative rainfall before the landslide-triggering event should be considered over an antecedent period of 10 to 15 days in this study area.


Comparison of the number of antecedent rainfall days in this study with those reported in the literature shows that very different periods, ranging from a few days to some months, are used in various empirical modelling techniques (Terlien 1998; Polemio and Sdao 1999). In general, it is still difficult to quantify the antecedent period. The number of days of antecedent rainfall depends on the local climate, the slope, and the physical-mechanical properties and permeability of the soils (Aleotti 2004). In the landslide inventory for this study, a landslide was only reported and recorded if it had significantly affected infrastructure or caused deaths of local livestock. Moreover, small landslides are not included in the landslide inventory map. Therefore, the temporal probability model could be improved if rainfall thresholds were constructed for small landslides and separately for different landslide sizes.


Landslide hazard maps are a useful tool for making appropriate decisions and taking measures for landslide prevention and mitigation. The landslide hazard maps developed in this study provide quantitative information on areas prone to future landslides, which can assist local authorities, planners, policy makers, and decision makers in infrastructure planning and development.


Acknowledgements


This research was funded by the Norwegian Quota scholarship program. The first author would like to thank Dr. Razak Seidu (Norwegian University of Life Sciences) and Dr. Tran Tan Van (Director of Vietnam Institute of Geosciences and Mineral Resources) for their valuable comments. The data analysis and write-up were carried out as a part of the first author’s PhD studies at the Geomatics Section, Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, Norway.


References


Aleotti P (2004) A warning system for rainfall-induced shallow failures. Engineering Geology 73 (3-4):247-265. doi:10.1016/j.enggeo.2004.01.007
Althuwaynee OF, Pradhan B, Lee S (2012) Application of an evidential belief function model in landslide susceptibility mapping. Computers & Geosciences. doi:10.1016/j.cageo.2012.03.003
Ballabio C, Sterlacchini S (2012) Support vector machines for landslide susceptibility mapping: The Staffora River Basin Case Study, Italy. Mathematical Geosciences 44 (1):47-70. doi:10.1007/s11004-011-9379-9
Brabb E (1984) Innovative approaches to landslide hazard mapping. In: Proceedings of 4th International Symposium on Landslides. Canadian Geotechnical Society, Toronto, Canada, pp 307–323
Caine N (1980) The rainfall intensity-duration control of shallow landslides and debris flows. Geografiska Annaler Series A, Physical Geography 62 (1-2):23-27. doi:10.2307/520449
Carrara A, Pike RJ (2008) GIS technology and models for assessing landslide hazard and risk. Geomorphology 94 (3-4):257-260. doi:10.1016/j.geomorph.2006.07.042
Chacon J, Irigaray C, Fernandez T, El Hamdouni R (2006) Engineering geology maps: landslides and geographical information systems. Bulletin of Engineering Geology and the Environment 65 (4):341-411. doi:10.1007/s10064-006-0064-z
Chleborad AF (2000) Preliminary Method for Anticipating the Occurrence of Precipitation-Induced Landslides in Seattle, Washington. US Geological Survey Open-File Report 00-0469
Chleborad AF, Baum RL, Godt JW (2006) Rainfall thresholds for forecasting landslides in the Seattle, Washington, area-exceedance and probability. USGS Open-File Report 2006-1064
Chleborad AF, Baum RL, Godt JW, Powers PS (2008) A prototype system for forecasting landslides in the Seattle, Washington, area. Reviews in Engineering Geology 20:103-120. doi:10.1130/2008.4020(06)
Chung CJF, Fabbri AG (2003) Validation of spatial prediction models for landslide hazard mapping. Natural Hazards 30 (3):451-472
Coe JA, Michael JA, Crovelli RA, Savage WZ, Laprade WT, Nashem WD (2004) Probabilistic assessment of precipitation-triggered landslides using historical records of landslide occurrence, Seattle, Washington. Environmental & Engineering Geoscience 10 (2):103-+. doi:10.2113/10.2.103
Corominas J, Moya J (1999) Reconstructing recent landslide activity in relation to rainfall in the Llobregat River basin, Eastern Pyrenees, Spain. Geomorphology 30 (1-2):79-93. doi:10.1016/s0169-555x(99)00046-x
Corominas J, Moya J (2008) A review of assessing landslide frequency for hazard zoning purposes. Engineering Geology 102 (3-4):193-213. doi:10.1016/j.enggeo.2008.03.018


Crosta G (1998) Regionalization of rainfall thresholds: an aid to landslide hazard evaluation. Environmental Geology 35 (2):131-145. doi:10.1007/s002540050300
Crosta GB, Frattini P (2003) Distributed modelling of shallow landslides triggered by intense rainfall. Nat Hazards Earth Syst Sci 3:81-93
Crovelli RA (2000) Probability models for estimation of number and costs of landslides. United States Geological Survey Open File Report 00-249. http://pubs.usgs.gov/of/2000/ofr-00-0249/ProbModels.html
D'Odorico P, Fagherazzi S (2003) A probabilistic model of rainfall-triggered shallow landslides in hollows: A long-term analysis. Water Resources Research 39 (9):1262. doi:10.1029/2002wr001595
Dahal R, Hasegawa S, Nonomura A, Yamanaka M, Masuda T, Nishino K (2009) Failure characteristics of rainfall-induced shallow landslides in granitic terrains of Shikoku Island of Japan. Environmental Geology 56 (7):1295-1310. doi:10.1007/s00254-008-1228-x
Dahal RK, Hasegawa S (2008) Representative rainfall thresholds for landslides in the Nepal Himalaya. Geomorphology 100 (3-4):429-443. doi:10.1016/j.geomorph.2008.01.014
Das I, Stein A, Kerle N, Dadhwal V (2011) Probabilistic landslide hazard assessment using homogeneous susceptible units (HSU) along a national highway corridor in the northern Himalayas, India. Landslides 8 (3):293-308. doi:10.1007/s10346-011-0257-9
Devkota K, Regmi A, Pourghasemi H, Yoshida K, Pradhan B, Ryu I, Dhital M, Althuwaynee O (2012) Landslide susceptibility mapping using certainty factor, index of entropy and logistic regression models in GIS and their comparison at Mugling–Narayanghat road section in Nepal Himalaya. Natural Hazards:1-31. doi:10.1007/s11069-012-0347-6
Frattini P, Crosta G, Sosio R (2009) Approaches for defining thresholds and return periods for rainfall-triggered shallow landslides. Hydrological Processes 23 (10):1444-1460. doi:10.1002/hyp.7269
Giannecchini R, Galanti Y, Avanzi GD (2012) Critical rainfall thresholds for triggering shallow landslides in the Serchio River Valley (Tuscany, Italy). Natural Hazards and Earth System Sciences 12 (3):829-842. doi:10.5194/nhess-12-829-2012
Glade T (2000) Applying Probability Determination to Refine Landslide-triggering Rainfall Thresholds Using an Empirical “Antecedent Daily Rainfall Model”. Pure and Applied Geophysics 157 (6):1059-1079. doi:10.1007/s000240050017
Godt JW, Baum RL, Chleborad AF (2006) Rainfall characteristics for shallow landsliding in Seattle, Washington, USA. Earth Surface Processes and Landforms 31 (1):97-110. doi:10.1002/esp.1237
Guzzetti F, Peruccacci S, Rossi M, Stark C (2008) The rainfall intensity–duration control of shallow landslides and debris flows: an update. Landslides 5 (1):3-17. doi:10.1007/s10346-007-0112-1
Guzzetti F, Peruccacci S, Rossi M, Stark CP (2007) Rainfall thresholds for the initiation of landslides in central and southern Europe. Meteorology and Atmospheric Physics 98 (3):239-267. doi:10.1007/s00703-007-0262-7
Guzzetti F, Reichenbach P, Ardizzone F, Cardinali M, Galli M (2006) Estimating the quality of landslide susceptibility models. Geomorphology 81 (1-2):166-184. doi:10.1016/j.geomorph.2006.04.007
Guzzetti F, Reichenbach P, Cardinali M, Galli M, Ardizzone F (2005) Probabilistic landslide hazard assessment at the basin scale. Geomorphology 72 (1-4):272-299. doi:10.1016/j.geomorph.2005.06.002
Harp EL, Reid ME, McKenna JP, Michael JA (2009) Mapping of hazard from rainfall-triggered landslides in developing countries: Examples from Honduras and Micronesia. Engineering Geology 104 (3-4):295-311. doi:10.1016/j.enggeo.2008.11.010
Heyerdahl H, Harbitz CB, Domaas U, Sandersen F, Tronstad K, Nowacki F, Engen A, Kjekstad O, Devoli G, Buezo SG, Diaz MR, Hernandez W (2003) Rainfall induced lahars in volcanic debris in Nicaragua and El Salvador: practical mitigation. In: Proceedings of international conference on fast slope movements: prediction and prevention for risk mitigation, IC-FSM2003, Naples. Patron Pub, pp 275–282
Hue TT, Duong TV, Toan DV, Nghinh LT, Minh VC, Pho NV, Xuan PT, Hoan LT, Huyen NX, Pha PD, Chinh VV, Thom BV (2004) Investigation and Assessment of the Types of Geological Hazard in the Territory of Vietnam and Recommendation of Remedial Measures. Phase II: A Study of the Northern Mountainous Province of Vietnam. Institute of Geological Sciences, Vietnam Academy of Science and Technology, Hanoi
Jaiswal P, van Westen CJ (2009) Estimating temporal probability for landslide initiation along transportation routes based on rainfall thresholds. Geomorphology 112 (1-2):96-105. doi:10.1016/j.geomorph.2009.05.008


Jaiswal P, van Westen CJ, Jetten V (2010) Quantitative landslide hazard assessment along a transportation corridor in southern India. Engineering Geology 116 (3–4):236-250. doi:10.1016/j.enggeo.2010.09.005
Jakob M, Holm K, Lange O, Schwab JW (2006) Hydrometeorological thresholds for landslide initiation and forest operation shutdowns on the north coast of British Columbia. Landslides 3 (3):228-238. doi:10.1007/s10346-006-0044-1
Jakob M, Weatherly H (2003) A hydroclimatic threshold for landslide initiation on the North Shore Mountains of Vancouver, British Columbia. Geomorphology 54 (3-4):137-156. doi:10.1016/s0169-555x(02)00339-2
Jemec M, Komac M (2012) Rainfall patterns for shallow landsliding in perialpine Slovenia. Natural Hazards:1-13. doi:10.1007/s11069-011-9882-9
Kim SK, Hong WP, Kim YM (1992) Prediction of rainfall-triggered landslides in Korea. Landslides, Vols 1 and 2.
Lee M-J, Choi J-W, Oh H-J, Won J-S, Park I, Lee S (2012) Ensemble-based landslide susceptibility maps in Jinbu area, Korea. Environmental Earth Sciences:1-15. doi:10.1007/s12665-011-1477-y
Lopez Saez J, Corona C, Stoffel M, Schoeneich P, Berger F (2012) Probability maps of landslide reactivation derived from tree-ring records: Pra Bellon landslide, southern French Alps. Geomorphology 138 (1):189-202. doi:10.1016/j.geomorph.2011.08.034
Marjanovic M, Kovacevic M, Bajat B, Vozenílek V (2011) Landslide susceptibility assessment using SVM machine learning algorithm. Engineering Geology 123 (3):225-234. doi:10.1016/j.enggeo.2011.09.006
Marques R, Zêzere J, Trigo R, Gaspar J, Trigo I (2008) Rainfall patterns and critical values associated with landslides in Povoação County (São Miguel Island, Azores): relationships with the North Atlantic Oscillation. Hydrological Processes 22 (4):478-494. doi:10.1002/hyp.6879
Martelloni G, Segoni S, Fanti R, Catani F (2011) Rainfall thresholds for the forecasting of landslide occurrence at regional scale. Landslides:1-11. doi:10.1007/s10346-011-0308-2
Matsushi Y, Matsukura Y (2007) Rainfall thresholds for shallow landsliding derived from pressure-head monitoring: cases with permeable and impermeable bedrocks in Boso Peninsula, Japan. Earth Surface Processes and Landforms 32 (9):1308-1322. doi:10.1002/esp.1491
Melchiorre C, Frattini P (2012) Modelling probability of rainfall-induced shallow landslides in a changing climate, Otta, Central Norway. Climatic Change 113 (2):413-436. doi:10.1007/s10584-011-0325-0
Montgomery DR, Dietrich WE (1994) A physically-based model for the topographic control on shallow landsliding. Water Resources Research 30 (4):1153-1171. doi:10.1029/93wr02979
My NQ (2007) Construction of the Environmental Hazard Zonation Map for Northwest Territory of Vietnam. Vietnam Geography Association, Hanoi
Oh H-J, Pradhan B (2011) Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallow landslides in a tropical hilly area. Computers & Geosciences 37 (9):1264-1276. doi:10.1016/j.cageo.2010.10.012
Osanai N, Shimizu T, Kuramoto K, Kojima S, Noro T (2010) Japanese early-warning for debris flows and slope failures using rainfall indices with Radial Basis Function Network. Landslides 7 (3):325-338. doi:10.1007/s10346-010-0229-5
Pasuto A, Silvano S (1998) Rainfall as a triggering factor of shallow mass movements. A case study in the Dolomites, Italy. Environmental Geology 35:184–189
Petrucci O, Polemio M (2009) The role of meteorological and climatic conditions in the occurrence of damaging hydro-geologic events in Southern Italy. Natural Hazards and Earth System Sciences 9 (1):105-118
Petrucci O, Polemio M, Pasqua A (2009) Analysis of Damaging Hydrogeological Events: The Case of the Calabria Region (Southern Italy). Environmental Management 43 (3):483-495. doi:10.1007/s00267-008-9234-z
Polemio M, Sdao F (1999) The role of rainfall in the landslide hazard: the case of the Avigliano urban area (Southern Apennines, Italy). Engineering Geology 53 (3-4):297-309. doi:10.1016/s0013-7952(98)00083-0
Pourghasemi H, Pradhan B, Gokceoglu C (2012a) Application of fuzzy logic and analytical hierarchy process (AHP) to landslide susceptibility mapping at Haraz watershed, Iran. Natural Hazards 63 (2):965-996. doi:10.1007/s11069-012-0217-2
Pourghasemi HR, Mohammady M, Pradhan B (2012b) Landslide susceptibility mapping using index of entropy and conditional probability models in GIS: Safarood Basin, Iran. Catena 97:71-84. doi:10.1016/j.catena.2012.05.005


Pradhan B (2010a) Application of an advanced fuzzy logic model for landslide susceptibility analysis. International Journal of Computational Intelligence Systems 3 (3):370-381
Pradhan B (2010b) Landslide susceptibility mapping of a catchment area using frequency ratio, fuzzy logic and multivariate logistic regression approaches. Journal of the Indian Society of Remote Sensing 38 (2):301-320. doi:10.1007/s12524-010-0020-z
Pradhan B (2011a) Manifestation of an advanced fuzzy logic model coupled with geo-information techniques to landslide susceptibility mapping and their comparison with logistic regression modelling. Environmental and Ecological Statistics 18 (3):471-493. doi:10.1007/s10651-010-0147-7
Pradhan B (2011b) Use of GIS-based fuzzy logic relations and its cross application to produce landslide susceptibility maps in three test areas in Malaysia. Environmental Earth Sciences 63 (2):329-349. doi:10.1007/s12665-010-0705-1
Pradhan B (2012) A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Computers & Geosciences. doi:10.1016/j.cageo.2012.08.023
Pradhan B, Lee S (2010) Landslide susceptibility assessment and factor effect analysis: backpropagation artificial neural networks and their comparison with frequency ratio and bivariate logistic regression modelling. Environmental Modelling & Software 25 (6):747-759. doi:10.1016/j.envsoft.2009.10.016
Pradhan B, Lee S, Buchroithner MF (2010) A GIS-based back-propagation neural network model and its cross-application and validation for landslide susceptibility analyses. Computers Environment and Urban Systems 34 (3):216-235. doi:10.1016/j.compenvurbsys.2009.12.004
Reichenbach P, Cardinali M, De Vita P, Guzzetti F (1998) Regional hydrological thresholds for landslides and floods in the Tiber River Basin (central Italy). Environmental Geology 35 (2):146-159. doi:10.1007/s002540050301
Saito H, Nakayama D, Matsuyama H (2010) Relationship between the initiation of a shallow landslide and rainfall intensity-duration thresholds in Japan. Geomorphology 118 (1-2):167-175. doi:10.1016/j.geomorph.2009.12.016
Salciarini D, Godt JW, Savage WZ, Baum RL, Conversini P (2008) Modeling landslide recurrence in Seattle, Washington, USA. Engineering Geology 102 (3–4):227-237. doi:10.1016/j.enggeo.2008.03.013
Salciarini D, Tamagnini C, Conversini P, Rapinesi S (2012) Spatially distributed rainfall thresholds for the initiation of shallow landslides. Natural Hazards 61 (1):229-245. doi:10.1007/s11069-011-9739-2
Schmidt M, Glade T (2003) Linking global circulation model outputs to regional geomorphic models: a case study of landslide activity in New Zealand. Climate Research 25 (2):135-150. doi:10.3354/cr025135
Sengupta A, Gupta S, Anbarasu K (2010) Rainfall thresholds for the initiation of landslide at Lanta Khola in north Sikkim, India. Natural Hazards 52 (1):31-42. doi:10.1007/s11069-009-9352-9
Sezer EA, Pradhan B, Gokceoglu C (2011) Manifestation of an adaptive neuro-fuzzy model on landslide susceptibility mapping: Klang valley, Malaysia. Expert Systems with Applications 38 (7):8208-8219. doi:10.1016/j.eswa.2010.12.167
Terlien MTJ (1998) The determination of statistical and deterministic hydrological landslide-triggering thresholds. Environmental Geology 35 (2-3):124-130
Thach NN, Xuan NT, My NQ, Quynh PV, Minh ND, Hoa DB, Bao DV, Dan NV, Thuy TV, Hien NT (2002) Application of Remote Sensing and Geographical Information System for Research and Forecast of Natural Hazards in Hoa Binh Province. National University Hanoi, Hanoi
Thinh DV, Dong NP, Hong PM, Hung PV, Khoi TN, Ke TD, Phu DV, Thang PX, Thanh PV, Thang PH, Thay BV, Thinh NT, Thien TV, Tu MT, Vinh BX (2005) The Investigated Report of Natural Hazards in the Northwest of Vietnam. Northern Geological Mapping Division, Hanoi
Tien Bui D, Lofman O, Revhaug I, Dick O (2011a) Landslide susceptibility analysis in the Hoa Binh province of Vietnam using statistical index and logistic regression. Natural Hazards 59:1413–1444. doi:10.1007/s11069-011-9844-2
Tien Bui D, Pradhan B, Lofman O, Revhaug I (2012a) Landslide susceptibility assessment in Vietnam using Support vector machines, Decision tree and Naïve Bayes models. Mathematical Problems in Engineering 2012:26. doi:10.1155/2012/974638
Tien Bui D, Pradhan B, Lofman O, Revhaug I, Dick OB (2011b) Landslide susceptibility mapping at Hoa Binh province (Vietnam) using an adaptive neuro-fuzzy inference system and GIS. Computers & Geosciences 45:199-211. doi:10.1016/j.cageo.2011.10.031


Tien Bui D, Pradhan B, Lofman O, Revhaug I, Dick OB (2012b) Landslide susceptibility assessment in the Hoa Binh province of Vietnam: A comparison of the Levenberg-Marquardt and Bayesian regularized neural networks. Geomorphology 171–172:12–29
Tien Bui D, Pradhan B, Lofman O, Revhaug I, Dick OB (2012c) Landslide susceptibility mapping at Hoa Binh province (Vietnam) using an adaptive neuro-fuzzy inference system and GIS. Computers & Geosciences 45:199-211. doi:10.1016/j.cageo.2011.10.031
Tien Bui D, Pradhan B, Lofman O, Revhaug I, Dick OB (2012d) Spatial prediction of landslide hazards in Hoa Binh province (Vietnam): a comparative assessment of the efficacy of evidential belief functions and fuzzy logic models. Catena 96:28-40. doi:10.1016/j.catena.2012.04.001
Van Westen CJ, Van Asch TWJ, Soeters R (2006) Landslide hazard and risk zonation: why is it still so difficult? Bull Eng Geol Env 65:167–184. doi:10.1007/s10064-005-0023-0
Varnes DJ (1984) Landslide Hazard Zonation: A Review of Principles and Practice. UNESCO, Paris
Wilson RC, Wieczorek GF (1995) Rainfall thresholds for the initiation of debris flows at La Honda, California. Environmental & Engineering Geoscience 1 (1):11-27
Xu C, Xu XW, Dai FC, Saraf AK (2012) Comparison of different models for susceptibility mapping of earthquake triggered landslides related with the 2008 Wenchuan earthquake in China. Computers & Geosciences 46:317-329. doi:10.1016/j.cageo.2012.01.002
Yao X, Tham LG, Dai FC (2008) Landslide susceptibility mapping based on Support Vector Machine: A case study on natural slopes of Hong Kong, China. Geomorphology 101 (4):572-582. doi:10.1016/j.geomorph.2008.02.011
Yeon YK, Han JG, Ryu KH (2010) Landslide susceptibility mapping in Injae, Korea, using a decision tree. Engineering Geology 116 (3-4):274-283. doi:10.1016/j.enggeo.2010.09.009
Yilmaz I (2010) Comparison of landslide susceptibility mapping methodologies for Koyulhisar, Turkey: conditional probability, logistic regression, artificial neural networks, and support vector machine. Environmental Earth Sciences 61 (4):821-836. doi:10.1007/s12665-009-0394-9
Zare M, Pourghasemi H, Vafakhah M, Pradhan B (2012) Landslide susceptibility mapping at Vaz Watershed (Iran) using an artificial neural network model: a comparison between multilayer perceptron (MLP) and radial basic function (RBF) algorithms. Arabian Journal of Geosciences:1-16. doi:10.1007/s12517-012-0610-x
Zezere JL, Trigo RM, Trigo IF (2005) Shallow and deep landslides induced by rainfall in the Lisbon region (Portugal): assessment of relationships with the North Atlantic Oscillation. Natural Hazards and Earth System Sciences 5 (3):331-344
