Bankruptcy Prediction Using Data Mining ...

MSc in Databases and Web-Based-Systems
School of Computing, Science and Engineering

MSc Dissertation

Bankruptcy Prediction Using Data Mining Classification Techniques
Author: Safwan Umer
Supervisor: Prof. Farid Meziane
2014

Abstract

Data mining now holds a significant place in diverse fields of science, engineering and management because of the need to extract hidden patterns and valuable information from large data sets. It is also used in bankruptcy prediction to classify firms as bankrupt or non-bankrupt on the basis of their financial factors. It is therefore important to provide efficient data mining models for predicting bankruptcy. Such models can help people understand, analyse and forecast a company's financial distress so that bankruptcy can be avoided. This has inspired me to develop and apply a range of past and present data mining models to predict bankruptcy. The objective of bankruptcy prediction in the fields of data mining and machine learning is to develop a model that gives higher prediction accuracy (Tsai, Hsu and Yen, 2014). This is also the main objective of this thesis.

To assess the efficiency of the data mining models, five years of financial ratios of 464 bankrupt and 464 non-bankrupt firms are used. This dissertation presents an application of nearly all the data mining models used in the previous literature, together with many new techniques, using state-of-the-art data mining software. The models are developed using SAS Enterprise Miner, WEKA and IBM SPSS. This study shows the application of 11 models using SAS Enterprise Miner (EM). The bankruptcy prediction accuracy of the Neural Network, Auto Neural, Regression and High Performance Regression models was excellent using SAS Enterprise Miner. This study also presents the application of 21 data mining models using the WEKA data mining software. In WEKA, the Simple Classification and Regression Trees (SimpleCART), Multi-Boost Ada-Boost (MultiBoostAB), OneR and radial basis function network (RBFNetwork) models were efficient at predicting bankruptcy. Finally, 6 models from IBM SPSS were employed to determine the classification accuracy for bankrupt and non-bankrupt firms. The Multi-Layer Perceptron Neural Network model proved to be the best predictor of bankruptcy using IBM SPSS.

Overall, 37 data mining models have been applied and their empirical results analysed; the highest bankruptcy prediction accuracy is achieved using Neural Networks. The results of this study show that it is possible to forecast bankruptcy five years before it happens.

Keywords: Data mining; Neural Network; Auto Neural; Regression; High Performance Regression; Simple Classification and Regression Trees; Multi-Boost Ada-Boost; OneR; Multi-Layer Perceptron Neural Networks

Acknowledgments

I am immensely thankful to my supervisor, Prof. Farid Meziane, who guided me patiently throughout the process of my dissertation. I would never have been able to finish my dissertation without his invaluable support, encouragement, supervision and important suggestions. I am also very thankful to Dr. Mo Saraee, who gave me a strong theoretical and practical understanding of data mining classification concepts during the course work. I would also like to express my immense appreciation to Dr. Rasool Eskandari for providing me with financial data and a basic understanding of financial factors. I would never have been able to complete my research without his help. Finally, I am very thankful to my parents, family and elder brother, who always supported me morally and encouraged me with their best wishes.


Contents

Abstract
Acknowledgments
Chapter 1 Introduction and Motivation
1.1 Introduction
1.2 Research Motivations
1.3 Objectives of the thesis
1.4 Contributions
1.5 Thesis Outline
Chapter 2 Literature Review
2.1 Introduction
2.2 Statistical Techniques
2.3 Uni-variate or Linear statistical methods
2.4 Multiple Discriminant Analysis
2.5 Probability, Regression, Logistic and factor analysis models
2.5.1 Linear probability model
2.5.2 Conditional probability models
2.6 Machine learning Models
2.6.1 Neural Networks
2.6.2 Decision trees
2.6.3 Support Vector Machines
2.6.4 Fuzzy logic
2.6.5 Rough Sets
2.6.6 Case based reasoning
2.7 Other Methods
Chapter 3 Financial Distress and Bankruptcy
3.1 Introduction
3.2 Financial Distress
3.1.1 Stages of Financial Distress
3.1.2 Factors of Financial distress
Internal Factors
External factors
3.1.3 Causes of Financial Distress
3.1.4 Result of corporate financial Distress
3.2 Bankruptcy
3.2.1 Cost of bankruptcy
3.1.3 Determining cost of bankruptcy
3.1.4 Direct costs of bankruptcy endured by the firm
3.1.5 Indirect costs of bankruptcy endured by the firm
Chapter 4 Data
4.1 Introduction
4.2 Importance of Data sample
4.2.1 Population
4.2.2 Sample
4.2.3 Importance
4.3 Source of Data
4.4 Selection of Ratios
Table 4.1 Financial ratios used in this study
4.5 Data Pre-Processing
4.5.1 Missing values
4.5.2 Outliers
4.6 Descriptive Statistics of data samples
4.7 Summary
Chapter 5: Model development and application
5.1 Introduction
Part 1
5.2 Overview
5.3 SAS Enterprise miner and its predictive modelling
5.3 Application of the Models
5.3.1 Decision Trees
5.3.2 Decision Trees Model
5.3.3 High Performance Trees Model
5.3.4 Neural Network
5.3.5 Neural Network Model
5.3.6 Auto Neural Model
5.3.7 High Performance Neural Model
5.3.8 Data Mining Neural Model
5.3.9 Regression Model
5.3.10 High Performance Support Vector Machine Model
5.3.11 High Performance Regression Model
5.3.12 Memory Based Reasoning Model
Part 2
5.4 WEKA
5.4.1 Naïve Bayes
5.4.2 Naïve Bayes Model
5.4.3 BayesNet Model
5.4.4 SMO OR SVM Model
5.4.5 RBFNetwork Model
5.4.6 Kstar Model
5.4.7 LWL Model
5.4.8 AdaBoostM1 Model
5.4.9 ClassificationViaRegression Model
5.4.10 Decorate Model
5.4.11 Dagging Model
5.4.12 LogisticBoost Model
5.4.13 MultiBoostAB Model
5.4.14 Random Committee Model
5.4.15 HyperPipes Model
5.4.17 NNge Model
5.4.18 OneR Model
5.4.19 ZeroR Model
5.4.20 Random Forest Model
5.4.21 J48 Model
5.4.22 SimpleCart Model
5.4.23 END Model
Part 3
5.5 IBM SPSS
5.5.1 MLP neural network Model
5.6 Models implementation using variations of decision trees
5.6.1 CHAID Model
5.6.2 CHAID Exhaustive Model
5.6.3 CART Model
5.6.4 QUEST Model
5.6.5 K-NN Model
5.7 Summary
Chapter 6 Results Analysis and Critical Evaluation
6.1 Introduction
6.2 Type-I Error
6.3 Type-II Error
6.4 Total Error
6.5 Classification Accuracy
6.6 Empirical Results Analysis
6.6.1 Analysis of Results of SAS Enterprise Miner Models
6.6.2 Analysis of Results of WEKA
6.6.3 Analysis of results of IBM SPSS models
6.7 Critical Evaluation
6.8 Summary
Chapter 7 Conclusion and Future Directions
7.1 Conclusions
7.2 Future Directions
Bibliography
Appendix A
Appendix B


List of Figures

Figure 2.1 Neural Network basic understanding
Figure 2.2 Basic understanding of decision trees
Figure 2.3 Basic idea of the Hyperplanes and support vectors
Figure 2.4 Case Based Reasoning 4-step cycle
Figure 2.5 A comparison of different bankruptcy prediction approaches
Figure 2.6 Accuracy of different methods being used in the past
Figure 2.7 Studies using different models of bankruptcy prediction
Figure 4.1 Method used in SPSS to find 5th and 95th percentile
Figure 5.1 Step by step method of creating any project in SAS Enterprise miner
Figure 5.2 The step by step implementation of the model generation using SAS EM
Figure 5.3 Final implementation diagram of models using SAS
Figure 5.14 Final application diagram of models using WEKA
Figure 6.1 Bankrupt and non-Bankrupt firms prediction Accuracy
Figure 6.2 Bankrupt firms five years ahead prediction accuracy using WEKA models
Figure 6.3 Non-Bankrupt firms five years prediction accuracy using WEKA models
Figure 6.4 Bankrupt and non-bankrupt firms prediction accuracy
Figure 5.4 Model Decision Trees
Figure 5.5 Model HP Tree
Figure 5.6 Neural Network Model
Figure 5.7 Auto Neural Model
Figure 5.8 HP Neural Model
Figure 5.9 DMNeural Model
Figure 5.10 Regression Model
Figure 5.11 HP SVM Model
Figure 5.12 HP Regression Model
Figure 5.13 Memory Based Reasoning Model

List of Tables

Table 2.1 Some studies that used Univariate statistical methods to predict bankruptcy
Table 2.2 Studies using MDA model from 1968 to 1996
Table 2.3 The use of the logistic model in different studies
Table 4.1 Financial ratios used in this study
Table 6.1 Bankrupt and non-bankrupt five years ahead prediction accuracy table using SAS Enterprise miner models
Table 6.2 Bankrupt and non-bankrupt firms five years ahead prediction accuracy table using WEKA models
Table 6.3 Bankrupt and non-bankrupt firms five years prediction accuracy table using SPSS
Table 4.2 Containing 5th and 95th percentile for the data one year before bankruptcy
Table 4.3 Containing 5th and 95th percentile for the data 2 years before bankruptcy
Table 4.4 Containing 5th and 95th percentile for the data 3 years before bankruptcy
Table 4.5 Containing 5th and 95th percentile for the data 4 years before bankruptcy
Table 4.6 Containing 5th and 95th percentile for the data 5 years before bankruptcy
Table 4.7 Univariate Statistics for data sample one year before bankruptcy
Table 4.8 Univariate Statistics for data sample two years before bankruptcy
Table 4.9 Univariate Statistics for data sample three years before bankruptcy
Table 4.10 Univariate Statistics for data sample four years before bankruptcy
Table 4.11 Univariate Statistics for data sample five years before bankruptcy
Table 5.1 Prediction accuracy of the model starting from year one to five using Decision Trees Model
Table 5.2 Prediction accuracy of the model starting from year one to five using HP Trees Model
Table 5.3 Prediction accuracy of the model starting from year one to five using Neural Network Model
Table 5.4 Prediction accuracy of the model starting from year one to five using Auto Neural Model
Table 5.5 Prediction accuracy of the model starting from year one to five using HP Neural Model
Table 5.6 Prediction accuracy of the model starting from year one to five using Neural Network Model
Table 5.7 Prediction accuracy of the model starting from year one to five using Neural Network Model
Table 5.8 Prediction accuracy of the model starting from year one to five using HP SVM Model
Table 5.9 Prediction accuracy of the model starting from year one to five using Neural Network Model
Table 5.10 Prediction accuracy of the model starting from year one to five using MBR Model
Table 5.11 Bankruptcy prediction accuracy using Naïve Bayes Model
Table 5.12 Bankruptcy prediction accuracy using BayesNet Model
Table 5.13 Bankruptcy prediction accuracy table using SMO OR SVM Model
Table 5.14 Bankruptcy prediction accuracy table using RBFNetwork Model
Table 5.15 Bankruptcy prediction accuracy table using KSTAR Model
Table 5.16 Bankruptcy prediction accuracy table using LWL Model
Table 5.17 Bankruptcy prediction accuracy table using AdaBoostM1 Model
Table 5.18 Bankruptcy prediction accuracy table using ClassificationviaRegression Model
Table 5.19 Bankruptcy prediction accuracy table using Decorate Model
Table 5.20 Bankruptcy prediction accuracy table using Dagging Model
Table 5.21 Bankruptcy prediction accuracy table using LogisticBoost Model
Table 5.22 Bankruptcy prediction accuracy table using MultiBoostAB Model
Table 5.23 Bankruptcy prediction accuracy table using Random Committee Model
Table 5.24 Bankruptcy prediction accuracy table using HyperPipes Model
Table 5.25 Bankruptcy prediction accuracy table using NNge Model
Table 5.26 Bankruptcy prediction accuracy table using OneR Model
Table 5.27 Bankruptcy prediction accuracy table using ZeroR Model
Table 5.28 Bankruptcy prediction accuracy table using Random Forest Model
Table 5.29 Bankruptcy prediction accuracy table using J48 Model
Table 5.30 Bankruptcy prediction accuracy table using SimpleCart Model
Table 5.31 Bankruptcy prediction accuracy table using END Model
Table 5.32 Bankruptcy prediction accuracy table using MLP neural network Model
Table 5.33 Bankruptcy prediction accuracy table using CHAID Model
Table 5.34 Bankruptcy prediction accuracy table CHAID Exhaustive Model
Table 5.35 Bankruptcy prediction accuracy table CART Model
Table 5.36 Bankruptcy prediction accuracy table QUEST Model
Table 5.37 Bankruptcy prediction accuracy table K-NN Model


Chapter 1 Introduction and Motivation 1.1 Introduction Data mining is used to find hidden patterns in large sets of data. Data mining has been widely used in many different fields to conceive logics in the data stored in databases (Shamsinejad, Saraee and Shekholeslam, 2011). State of the art data mining classification models are being used in the field of bankruptcy prediction. The most popular techniques which are being used now-a-days are decision trees (DT), Artificial Neural Networks (ANN), Support Vector Machines (SVM), Case Base Reasoning (CBR), K-Nearest Neighbour (K-NN), Bayesian Networks, Regression and hybrid methods (Chen et al. 2011). Given the economic and financial consequences of bankruptcy to companies, it is not a surprise that bankruptcy prediction issue was and remains of great attraction to researchers, creditors, shareholders, and auditors. All the stockholders have great attraction in observing the financial performance of their firms (Wilson and Sharda, 1994). Bankruptcy forecast of an organisation has been a paramount subject in the accounting and finance literature (Zhang, Hu, Patuwo & Indro, 1999). Financial failure of a company significantly affects the company, stakeholders, employees, customers and nation. Bankruptcy prediction is one of the areas that have been extensively studied in the fields of accounting and finance (Wilson and Sharda, 1994). The companies cannot be immune against bankruptcy and bankruptcy is not something that happens overnight. Therefore, it is very important to understand and predict the phenomena that lead to bankruptcy (Kim and Kang, 2009). Timely prediction of bankruptcy also helps in making best business decisions for the future of the company. The accuracy of the bankruptcy prediction is very important and if it is not predicted accurately, the results would be catastrophic for the company. 
Prediction of corporate failure is very important because it affects the company's employees, management, auditors and debtors (Jardin, 2014). Companies which do not have enough financial means to operate have to liquidate their assets and pay their debts. If a company does not have enough money to pay its debts, it goes into financial distress. A company must remain solvent in order to keep operating (Blum, 1974). Bankruptcy can be caused by many factors, such as poor management, insufficient funds, a shortage of fund providers, declining revenue, lack of assets, lack of management knowledge, lack of stockholder support in fund raising and lack of shares (David and Denis, 1995).

Many studies are available on the topic of bankruptcy prediction. These studies have analysed different financial distress factors that lead to bankruptcy (Wilson and Sharda, 1994). This dissertation contains a comprehensive literature review spanning 1932 to 2014 and covering various theoretical, statistical and machine learning approaches to bankruptcy prediction. The major purpose of this dissertation is to evaluate bankruptcy prediction through the use of data mining models. The study also illustrates the theoretical concepts and practical results of data mining models in the prediction of bankruptcy. The tools used in this study are well known in the data mining community: SAS Enterprise Miner, WEKA and IBM SPSS.

The process of bankruptcy prediction involves several important steps on data containing financial ratios. First, the data is gathered. Secondly, the data is processed into a meaningful format so that data mining techniques can be applied. Thirdly, the processed data is used to build different data mining classification models. Finally, the results of the different models are compared and the best model is selected. The sample data used here was gathered from the Financial Analysis Made Easy (FAME) database. The sample consists of 464 bankrupt and 464 non-bankrupt companies. This dissertation shows the importance of the data and of its pre-processing phase using an effective statistical method.

The 41 financial ratios used in this study are also important because they have been used in most of the research articles. An important contribution of this dissertation is its use of ratios from five years prior, for different companies from 2000 to 2012, to predict bankruptcy five years ahead.


1.2 Research Motivations

We are witnessing a very competitive era for companies, in which bankruptcy is seen as tarnishing a company's reputation. Bankruptcy prediction is a very challenging subject. A company enters an insolvent state when it lacks the liquidity to pay its debts, and it may fail to return to a solvent state. In this state, the company has either to pay its debts or to file for bankruptcy (Wruck, 1990). Many large organizations, such as Delta Airlines, United Airlines, New Century Financial, Calpine, Lyondell Chemicals, the telecom company Global Crossing, Thornburg Mortgage and Pacific Gas, have filed for bankruptcy in the last two decades (Anon., 2014). These incidents unsettled investors around the world and made it even more important to predict financial distress before bankruptcy. Auditors, as a general duty, use bankruptcy prediction techniques to assess the financial state of a company before investing in the company (Wilson and Sharda, 2009). The managers who make decisions in companies are always looking for the prediction model that gives the best results. Many techniques have been used in the past.

1.3 Objectives of the thesis

Bankruptcy prediction is not a field related only to accounting and finance; it is a versatile area. I have chosen this area because I want to apply the state-of-the-art data mining software available to obtain the best models for bankruptcy prediction using ratios from five years back. The major objectives of this thesis are:
1. Utilization of different data mining methods and algorithms using SAS Enterprise Miner, WEKA and IBM SPSS.
2. Analysis of the results obtained from the implementation of various data mining models.

1.4 Contributions

The contributions of the dissertation in the field of bankruptcy prediction are:
1. Bankruptcy prediction using ratios from 5 years prior, whereas most of the research articles have used ratios from 3 years back.
2. Prediction of bankruptcy five years ahead using ratios from five years back.
3. Use of the 41 most important financial ratios.
4. Use of 11 SAS Enterprise Miner models, 21 WEKA models and 6 IBM SPSS models.
5. Identification of the most effective model for bankruptcy prediction.

1.5 Thesis Outline

On the basis of the theoretical and practical literature review, this dissertation describes different features of bankruptcy prediction models. The thesis is divided into 7 chapters.

Chapter 2 provides a comprehensive literature review of the various statistical and machine learning techniques used in the domain of data mining to predict bankruptcy.
Chapter 3 elaborates on financial distress, its factors and causes, bankruptcy definitions and the costs of bankruptcy.
Chapter 4 illustrates the importance of the data, the ratios and the pre-processing phase. It also describes the winsorizing method used to treat outliers in the data.
Chapter 5 offers a complete analysis and application of different models using SAS Enterprise Miner, WEKA and IBM SPSS. It also presents the prediction accuracy results of each model.
Chapter 6 gives a complete insight into, and critical evaluation of, each data mining model. It also gives the five-years-prior results of each model in detail.
Chapter 7 summarizes the major contributions of this dissertation and gives directions for future work.


Chapter 2 Literature Review

2.1 Introduction

Various methods have been used in the literature for predicting business failure. Each methodology has its own importance and contribution in this area, but every prediction technique is basically used to divide firms into financially healthy and financially failed firms (Dimitras, Zanakis and Zopounidis, 1996). Business failure studies have attracted world-wide interest from many researchers and practitioners. The earliest techniques, used when no statistical or machine learning technique was available, compared two companies, one in a healthy financial state and the other in a failed financial state (Bellovary, Giacomino and Akers, 2007). According to Fitzpatrick (1932) there are five stages of financial failure: incubation, financial embarrassment, financial insolvency, total insolvency and confirmed insolvency. Statistical bankruptcy prediction models then started with Beaver's (1966) one-variable model and Altman's Linear Discriminant Analysis model (Altman, 1968). Since then, bankruptcy prediction has become a hot topic for researchers, who have used different techniques to obtain better and more reliable results. Many researchers used different models to improve on the results of Altman's technique. Data mining techniques were not used until the 1980s; the use of techniques such as SVM, NN and decision trees for bankruptcy prediction started in the late 1980s (Pompe and Feelders, 1997). There are various statistical, machine learning, soft computing, operational and evolutionary approaches to predicting bankruptcy, and each has its own pros and cons (Kumar and Ravi, 2007). The most important methods used in the past, their research procedures and their prediction accuracy results are discussed in the following sections.


2.2 Statistical Techniques

These techniques apply statistical methods to a sample of data containing bankrupt and non-bankrupt companies. Many studies have applied statistical techniques to different financial ratios. A statistical technique uses financial parameters and ratios to predict financial distress. Beaver's univariate model was the starting point of research on these techniques. Examples are Linear Discriminant Analysis (LDA), Multiple Discriminant Analysis (MDA), Quadratic Discriminant Analysis (QDA), logistic regression and factor analysis (Kumar and Ravi, 2007). Traditional statistical methods handle huge data sets better, without losing prediction performance, while machine learning techniques perform better with smaller data sets and are adversely affected by large data sets (Chen, 2011).

2.3 Univariate or Linear statistical methods

Univariate statistical models are the simplest models and are based on the assumption of a linear relationship between all ratios and the failure status. These models use quantitative measures such as the mean, median, mode, range, variance, frequency distribution and standard deviation, and each ratio is used on its own for bankruptcy prediction. In a univariate analysis model of bankruptcy prediction there are two most important aspects (Balcaen and Ooghe, 2006):
1. The optimal cut-off point for each ratio.
2. The classification procedure, carried out for each ratio separately.

These are the earliest techniques used to differentiate between a financially stable and a financially failed firm. Table 2.1 shows some of the studies that used univariate statistical methods to predict bankruptcy. The univariate models were heavily criticised but paved the way for other models such as MDA, the Linear Probability Model (LPM), logistic models and regression.
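The two aspects listed above, a cut-off point per ratio and a separate classification per ratio, can be illustrated with a small sketch. This is only an illustration under stated assumptions: the function name `best_cutoff`, the rule "a firm is predicted to fail when its ratio is below the cut-off" and the eight ratio values are hypothetical and are not taken from any of the cited studies.

```python
def best_cutoff(values, labels):
    """Return (cutoff, accuracy) for the rule 'predict failed if value < cutoff'.

    values: one ratio value per firm.
    labels: 1 for a failed firm, 0 for a non-failed firm.
    """
    ordered = sorted(set(values))
    # Candidate cut-offs are the midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best = (None, 0.0)
    for cut in candidates:
        predicted = [1 if v < cut else 0 for v in values]
        correct = sum(p == y for p, y in zip(predicted, labels))
        accuracy = correct / len(labels)
        # Keep the first cut-off that reaches the highest accuracy.
        if accuracy > best[1]:
            best = (cut, accuracy)
    return best


# Hypothetical working capital / total assets values for eight firms.
ratio = [-0.20, -0.05, 0.02, 0.10, 0.08, 0.25, 0.30, 0.40]
failed = [1, 1, 1, 1, 0, 0, 0, 0]

cutoff, accuracy = best_cutoff(ratio, failed)
print(cutoff, accuracy)
```

In a univariate study this search would be repeated independently for every ratio, which is why the ratios cannot inform one another in such models.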


Table 2.1 Some studies that used univariate statistical methods to predict bankruptcy

Name of the Researcher(s) | Special features of the study
Fitzpatrick (1932) | 1. Compared 13 ratios. 2. Used 20 pairs of healthy and failed firms. 3. The most significant ratios were Net Worth/Debt and Net Profits/Net Worth. 4. The least important ratios were the Current Ratio and the Quick Ratio for firms with long-term liabilities.
Smith and Winakor (1935) | 1. Used ratios of 183 bankrupt firms from a variety of industries. 2. Prediction of bankruptcy was better using the Working Capital/Total Assets ratio. 3. The Current Assets/Total Assets ratio declined when a firm approached bankruptcy.
Merwin (1942) | 1. Used small manufacturers in his study. 2. Prediction could be possible five years before bankruptcy. 3. The most significant ratios for business failure were net working capital to total assets, the current ratio and net worth to total debt.
Chudson (1945) | 1. Financial patterns were studied for the first time. 2. His study specified that bankruptcy prediction models for general application are not as suitable as industry-specific models.
Jackendoff (1962) | 1. Compared the ratios of profitable and unprofitable firms. 2. The Current Ratio and net working capital to total assets were the most significant ratios, while debt to worth was the least significant, for profitable firms.
Beaver (1966) | 1. Used 79 failed and 79 non-failed firms in 38 industries. 2. Used 30 ratios for the first time. 3. Identified which ratios have the highest predictive ability. 4. Net income to total debt had 92%, net income to sales 91% and cash flow to total assets 90% accuracy in bankruptcy prediction.
Pinches, Eubank, Mingo and Caruthers (1975) | 1. Used the specific financial ratios that are most important in predicting bankruptcy. 2. Used financial data of 221 firms and 48 ratios. 3. The financial ratios and their predictive accuracies were: Debt/Total Capital = 99%, Total Income/Total Capital = 97%, Cash/Total Assets = 91%.


2.4 Multiple Discriminant Analysis

MDA is the most commonly used statistical method for bankruptcy prediction. It has been used in more than 70 research studies from 1960 to the present. The method classifies an observation into one of several a priori groups, depending upon the features of that observation. It is also well suited to problems in which the dependent variable is qualitative. The MDA technique examines a complete profile of the features common to the relevant group of corporations, and it also considers the interaction of these characteristics. The major benefit of MDA is that it can deal with the classification problem by observing the complete profile of financial factors at once. The MDA method also reduces the analyst's space dimensionality (Altman, 1968). An MDA model is a linear combination of variables which is used to discriminate between failing and non-failing firms (Balcaen and Ooghe, 2006). Altman (1968) specified the discriminant function of a firm as follows:

$$Z = V_1 X_1 + V_2 X_2 + V_3 X_3 + \dots + V_n X_n$$

where $V_1, V_2, V_3, \dots, V_n$ are the discriminant coefficients and $X_1, X_2, X_3, \dots, X_n$ are the independent variables. MDA computes the discriminant coefficients $V_i$, while the independent variables $X_i$ take their actual values, for $i = 1, 2, 3, \dots, n$.

Many researchers have used the MDA bankruptcy prediction technique, based on the methodology of the Altman Z-Score model. Deakin, Edmister and lis (1972) used the LDA method and obtained prediction accuracies of 80%, 88% and 83% respectively. Table 2.2 shows the studies using the MDA model for predicting bankruptcy from 1968 to 2004. Varun (2009) applied these techniques to 78 failed and 91 non-failed companies in the period 1999 to 2007. His research showed that total debt / total assets, cash flow from operations / interest expense and net profit / total assets were the most differentiating ratios one year before bankruptcy, while short-term debt / total assets and sales / total assets were the most discriminating variables two years before bankruptcy.
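To make the discriminant function concrete, the following minimal sketch computes Z as a weighted sum of ratios and compares it with a cut-off. The coefficient values, the cut-off and the two firms are hypothetical placeholders chosen for easy arithmetic; in an actual MDA study the coefficients are estimated from the sample, and the rule "a higher score means non-failing" is an assumption of this sketch.

```python
def discriminant_score(coefficients, ratios):
    """Z = V1*X1 + V2*X2 + ... + Vn*Xn."""
    if len(coefficients) != len(ratios):
        raise ValueError("need exactly one coefficient per ratio")
    return sum(v * x for v, x in zip(coefficients, ratios))


def classify(z, cutoff):
    """Assign a firm to a group by comparing its score with the cut-off."""
    return "non-failing" if z > cutoff else "failing"


# Hypothetical discriminant coefficients V_i and cut-off (not estimated from data).
V = [1.0, 2.0, 0.5]
CUTOFF = 1.0

firm_a = [0.4, 0.3, 0.6]  # Z = 0.4 + 0.6 + 0.3 = 1.3
firm_b = [0.1, 0.1, 0.2]  # Z = 0.1 + 0.2 + 0.1 = 0.4

for name, ratios in (("A", firm_a), ("B", firm_b)):
    z = discriminant_score(V, ratios)
    print(name, round(z, 2), classify(z, CUTOFF))
```

The single score Z is what reduces the multi-ratio profile of a firm to one dimension, which is the dimensionality reduction mentioned above.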


Table 2.2 Studies using MDA model from 1968 to 2004

Reference

Application

No. of Ratios Used

Altman (1968)

(Manufacturing)

5

79%

Mfg. Firms General

33 failed and 33 non-failed firms.

14

32 failed and 32 non-failed firms.

77% to 96% Failed Firms 78% to 92% Non-Failed Firms

Deakin (1972)

No. of Firms Used in the data Sample.

Accuracy In percentage

Special feature(s).

First use of the MDA method, which is also called the Z-Score. Classification of data into bankrupt and non-bankrupt firms. Decision rule validated over a different sample of firms.

Edmister (1972)

General Small Business

7

562 failed and 562 non-failed firms.

88% Failed Firms and 83% Non-Failed Firms

Some ratios can be used to predict bankruptcy rather than using all financial variables.

Blum(1974)

General Firms

2

57% to 94%

Sinkey Jr.(1975)

Banks

5

115 failed and 115 non-failed firms. 110 banks.

Altman and Loris(1976)

Dealers and Brokers

15

Sample consists of 40 failed broker firms and 113 active entities.

Altman, Haldeman and Narayanan (1977)

General

7

53 Bankrupt Firms and 58 NonBankrupt firms

Failed Firms66.7% to 87.5% and Non-Failed Firms-58.3% to 85.0% Failed Firms61.7% to 92.5% and Non-Failed Firms 84% to 91.4%

Gave the failing company model. First use of the "F" statistic function for bankruptcy prediction.

Bankrupt Bank 53.64% to 71.85%

Specified important factors to discriminate between failed and non-failed banks.


Financial early warning system model was developed to detect the failure.

Z-Score model was updated to Zeta model. Compared linear and quadratic discriminant analyses and obtained efficient results.

Ketz(1978)

General

16

75 failed firms and 597 non-failed firms.

Failed firms56% and Nonfailed firms 93%

The use of general price level statements to distinguish between a failing and non- failing firms.

Castagna and Matolcsy (1981)

Australian Firms

10

A sample of 21 companies.

This study proposed that it is not easy to use a distinct model to predict financial distress efficiently.

Izan (1984)

Australian Firms

5

Keasey and Watson (1986)

Small UK firms

5

A sample of 53 failed and 50 nonfailed firms A sample of 10 failed and 10 nonfailed firms.

Failed firms- 0% to 90% and Nonfailed firms 76% to 100% 40% to 100%

Koh and Killough (1990)

General

5

Laitinen (1991)

Small and mid-size Finnish firms

6

Alici(1996)

UK Mfg. firms

4

29 Failed and 31 Non-failed British corporations

Pidado and Rodriques (2004)

Mfg. firms

15

42-bankrupt and 42- Non-bankrupt firms

A sample of 400 firms. Out of 400 only 14 were bankrupt. 40 randomly selected failed and non-failed firms.

Failed firms70% Non-failed firms 66.7% to 68.3% Failed firms78.6% and Nonfailed firms 88.25% Failed firms57.5% to 90% and Non-failed firms 52.5% to 87.5% Failed firms60.12% Nonfailed firms 71.07% 89.58%


He used company ratios using their industry median and made a combination of five variables for Discriminant model. The use of trade-credit specialists and statistical model to predict financial failure.

SAS 34 and SAS 59 were used to make a prediction model. Development of a prediction model which was accurate approximately 88 percent. Finding the existence of the failure processes in the firms. These processes were used on selected ratios to predict financial failure.

Introduced wavelet networks and pruning techniques were examined in his model.

Used MDA technique in the footwear manufacturing industry.

Lugovskaja (2009) also used the MDA technique to predict the financial failure of Russian small and medium-sized enterprises (SMEs). The study used two MDA models on a data set of 260 bankrupt and 260 non-bankrupt arbitrary SMEs. The first model identified six important bankruptcy prediction ratios, and its classification accuracy was 76.2% for the estimation sample and 68.1% for the holdout sample. The second model combined non-financial variables such as size and age with the financial factors of the SMEs, and its classification accuracy was 77.9% for the estimation sample and 79% for the holdout sample. Ivica Pervan et al. (2011) used this statistical technique on a sample of 78 bankrupt and 78 non-bankrupt companies from the Croatian manufacturing and trade industries. Their study found that financial statements and financial factors are informative for predicting the bankruptcy of a company, and they obtained a bankruptcy prediction accuracy of 79.5%. Recently, Lee and Choi (2013) provided a multi-industry prediction model. Their study used different sets of variables and produced a model that better reflects the characteristics of each industry and the selection of ratios. The accuracy of the MDA model was 74.82%. The study also emphasises that bankruptcy prediction models must be built for each industry specifically in order to generate efficient and reliable prediction results.

2.5 Probability, Regression, Logistic and factor analysis models

Discriminant Analysis (DA) methods have many problems, and Eisenbeis (1977) provided a summary of them. According to him, there are 7 problems related to the application of DA: violation of assumptions, use of a linear function, no separate interpretation of variables, reduced dimensions, no group definition, unsuitable choice of prior probabilities and estimation of classification errors. Because of these problems, researchers started to introduce many other statistical models such as linear probability, factor analysis, logit and probit analysis (Eisenbeis, 1977). These models are discussed below.

2.5.1 Linear probability model

The probability of failure can be used to predict bankruptcy. Researchers therefore started to develop different LPM models as a substitute for DA (Dimitras et al., 1996). The model is a special case of least squares regression in which the dependent variable takes the value 0 or 1.
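A linear probability model of this kind can be sketched with ordinary least squares on a 0/1 dependent variable. The sketch below uses a single explanatory ratio so that the closed-form least squares solution fits in a few lines; the function name `fit_lpm` and the four data points are hypothetical and serve only to illustrate the idea.

```python
def fit_lpm(x, y):
    """Ordinary least squares fit of y = a + b*x, where y is 0 or 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx             # slope
    a = mean_y - b * mean_x   # intercept
    return a, b


# Hypothetical debt / total assets ratios and failure indicators (1 = failed).
debt_ratio = [0.2, 0.4, 0.6, 0.8]
failed = [0, 0, 1, 1]

a, b = fit_lpm(debt_ratio, failed)
for ratio in (0.5, 0.7, 0.9):
    # The fitted value is read as an estimated probability of failure.
    print(ratio, round(a + b * ratio, 2))
```

With these numbers the fitted value for a ratio of 0.9 exceeds 1. Fitted values outside the range 0 to 1 are a well-known limitation of the linear probability model, and the logistic function avoids it by construction.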


Meyer and Pifer (1970) presented a simple least squares linear regression technique with a dummy dependent variable (0 for non-failed and 1 for financially failed banks). They applied this technique to a bank data set consisting of 18 financial ratios, and their empirical classification accuracy was 67% to 100% for failed and 55% to 89% for non-failed banks. Later on, Grammatikos and Gloubos (1984), Theodossiou (1991), Vranas (1992) and Lennox (1999) also used this method in their studies to predict bankruptcy.

2.5.2 Conditional probability models

These models are divided into two subcategories: logistic and probit (probability) models. They have a great deal of importance in the field of bankruptcy prediction. The logistic method gives the probability that a firm is going to be bankrupt. In the logistic model, the probability that company $i$ goes bankrupt, given the vector of variables $X_i$, is (Dimitras et al., 1996):

$$P(X_i, c) = F(d + c X_i)$$

where $F(d + c X_i)$ is the cumulative logistic function, given by

$$F(d + c X_i) = \frac{1}{1 + e^{-(d + c X_i)}}$$
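As a minimal sketch of how the logistic model turns a firm's ratios into a probability, the function below evaluates F(d + cX_i) in its usual form with a negative exponent, so that the result rises from 0 towards 1 as the score d + cX_i increases. The coefficient vector `c`, the intercept `d` and the ratio values are hypothetical; in practice they are estimated by maximum likelihood.

```python
import math


def failure_probability(x, c, d):
    """P(X_i, c) = F(d + c*X_i), with F the cumulative logistic function."""
    score = d + sum(cj * xj for cj, xj in zip(c, x))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical coefficients c and intercept d (not estimated from real data).
c = [3.0, -2.0]
d = -1.0

firm = [0.8, 0.1]  # e.g. debt / total assets and net profit / total assets
print(round(failure_probability(firm, c, d), 3))
```

Unlike a least squares fit on a 0/1 variable, the output always lies strictly between 0 and 1, so it can be read directly as a probability.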

Martin (1977) introduced the logistic regression model to predict the financial failure of banks. He used a data set of about 5700 Federal Reserve member banks, of which 58 had financially failed. He used ratios from up to six years back and obtained a classification accuracy for failed banks ranging from 91.3% one year before failure down to 41.7% six years before. The results for non-failed banks were also remarkable, ranging from 91.1% to 82.2% from one to six years before. Ohlson (1980) proposed the conditional probability model. The data set was taken from 10-K financial statements (the annual report of a firm, which gives a comprehensive summary of the firm's financial performance) for the first time. In this study he elaborated on the following four statistically important factors for bankruptcy prediction:
1. The size of the company.
2. The financial structure of the company.
3. The performance of the company.
4. The current liquidity of the company.

He criticised the MDA technique because of three problems associated with it: (i) the use of matched samples; (ii) MDA behaves like a discriminating device and does not provide the statistical importance of variables; (iii) the MDA model gives its output in the form of a score, which is difficult to interpret. The conditional logistic model avoids all of these problems. The accuracy of this logistic prediction model was 96.12%, 95.55% and 92.84% for prediction within one year, within two years and within one or two years respectively. Mensah (1983) also used the logistic analysis method on a sample of 66 manufacturing firms and 32 factors; the classification accuracy of his model was 18% to 55% for bankrupt firms and 80% to 86% for non-bankrupt firms. Table 2.3 summarises the use of the logistic model in different studies.

Furthermore, Erkki and Teija (2000) used a combination of the logistic model and the Taylor series. They used the logistic model to describe insolvency and the Taylor series to approximate the exponent of the logistic function. They used a sample of 400 firms and concluded that classification accuracy could be increased by using interacting ratios. Kolari et al. (2002) applied this technique to develop an early warning system for predicting the financial distress of banks. The classification accuracy of the model was over 96% one year before failure and 95% two years before failure. In 2003, Foreman analysed bankruptcy within the US local telecommunications industry using the logistic model. Moreover, Jones and Hensher (2004) proposed a mixed logistic analysis model to predict the financial distress of a firm. They specified financial distress in three states: state 0 for non-failed, state 1 for insolvent and state 2 for failed firms. Mei and Lin (2005) applied this approach together with a quadratic interval regression model. Their empirical findings show that the quadratic model can help the logistic model to distinguish between failed and non-failed firms. Recently, Masten and Masten (2012) used the logistic model with a methodology based on Classification and Regression Trees (CART). This was a very simple approach and used dummy variables. Their practical results show that the combination of these methods gives the highest prediction accuracy, 95%.


Table 2.3 The use of logistic model in different studies

Industry

Mensah (1983)

Mfg. firms

No of Ratios used 32

Casey and Bartczak (1985)

General

9

Lau (1987)

General

10

Mahmood and Lawrence (1987)

General

13

Pantalone and Platt(1987a) Peel (1987)

Bank

5

Failed banks-86.7% and non-failed banks 83.4%

Private UK firms General

8

Bankrupt firms-67% to 92% and non-bankrupt firms 79% to 88%. Bankrupt firms-79.6% to 88.8% and non-bankrupt firms 76.7% to 98.0%.

General

10

General

7

Gilbert, Menon and Schwartz (1990)

General

6

Bankrupt firms-29.2% to 62.5% and non-bankrupt firms 90% to 97.9%.

Agarwal (1993)

General

5

Platt, Platt and Pedersen (1994)

Oil and Gas companies

6

Bankrupt firms-40% to 80% and non-bankrupt firms 56.5% to 86.6%. Bankrupt firms-80% to 94% and non-bankrupt firms 91% to 96%.

Dimitras, Slowinski, Susmaga and Zopounidis (1999) Zhang, Hu, Patuwo and Indro (1999)

Greek firms

12

Bankrupt firms-63.2% and non-bankrupt firms 84.2%.

Mfg. Firms

6

Bankrupt firms-85 to 93% and non-bankrupt firms 83% to 87%.

Reference

Aziz, Emanuel and Lawson (1988) Aziz and Lawson (1989) Hopwood, McKeown and Mutchler (1989)

6

Classification Accuracy Results

Bankrupt firms-18% to 55% and non-bankrupt firms 80% to 86%. Bankrupt firms-13% to 63% and non-bankrupt firms 95% to 98%. Bankrupt firms-20% and non-bankrupt firms 85.4% to 93.7%. Bankrupt firms-28.6% to 73.8% and non-bankrupt firms 90% to 96.6%.

Bankrupt firms-53.9% to 92.3% and non-bankrupt firms 70.2% to 79.1%. Bankrupt firms-3.1% to 62.5% and non-bankrupt firms 87.5% to 100%.


The probit (probability) model is similar to the logistic model, but the function used to calculate the probability is different. Grablowsky and Talley (1981) used probit analysis for the classification of credit applicants and found that it could be used as a substitute for discriminant analysis. The studies using this model are less accurate than those using the logistic model, and only a few researchers have worked in this particular area. Hanweck (1977) applied the method to banks' financial data to test for financial distress. He used 6 financial factors and obtained 67% accuracy for failed banks and 99% for non-failed banks on a holdout sample. Zmijewski (1984) investigated this statistical method on a biased sample consisting of 40 bankrupt and 800 non-bankrupt firms. He used probit and bivariate probit analysis to assess the sample bias issue. Skogsvik (1990) used this model to examine the bankruptcy of Swedish mining and manufacturing firms on a data sample with 17 financial factors covering the period 1966 to 1980. His empirical results show a classification accuracy ranging from 84.0% one year before bankruptcy to 71.2% six years before. Moreover, Theodossiou (1995) used the probit model for Greek manufacturing firms and obtained a classification accuracy of 95.5% for bankrupt and 92.6% for non-bankrupt firms. Later on, Boritz and Kennedy (1995) and Lennox (1999) also used this method for bankruptcy prediction. Canbas et al. (2005) presented an Integrated Early Warning System (IEWS) to investigate the financial problems of banks by incorporating logistic regression, DA, probit and principal component analysis. This system helped a great deal in assessing the financial condition of banks. Their calculated failure prediction probabilities for banks were 56%, 99% and 99.9% for years one, two and three respectively.

Factor analysis describes a set of variables in terms of a smaller number of factors, on the basis of the relations between the actual variables. This technique was used in combination with logistic estimation by West (1985) to investigate the financial condition of banks.
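The difference between the two conditional probability models discussed in this section lies in the function that maps the score to a probability. The sketch below assumes that the probability model referred to above is the probit model, whose link is the cumulative standard normal distribution, and compares it with the cumulative logistic function. The function names are illustrative only.

```python
import math


def normal_cdf(z):
    """Cumulative standard normal distribution (the probit link)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    """Cumulative logistic distribution (the logit link)."""
    return 1.0 / (1.0 + math.exp(-z))


# Both links map any score to a value between 0 and 1 and agree at z = 0,
# but the normal curve approaches 0 and 1 faster in the tails.
for z in (-2.0, 0.0, 2.0):
    print(z, round(normal_cdf(z), 3), round(logistic_cdf(z), 3))
```

Because the two curves have different scales, coefficients estimated under the two models are not directly comparable, even when the resulting classifications are similar.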


2.6 Machine learning Models

Different intelligent techniques are in use today because statistical techniques rely on distributional assumptions that financial data do not always satisfy. Machine learning techniques, which do not require such assumptions, overcome this limitation of traditional statistical models. Machine learning models belonging to the data mining domain include artificial neural networks, decision trees, case-based reasoning, SVM, fuzzy logic and rough sets (Kim and Kang, 2010).

2.6.1 Neural Networks

Neural network models have always held a significant place in the history of bankruptcy prediction. Researchers started to use neural networks for financial distress prediction in the early 1990s and are still using the method in its different forms (Jeong et al., 2012). These models simulate the information processing power of the human brain. They learn by example and can make decisions on the basis of previous experience. A neural network consists of three kinds of interconnected layers: an input layer, hidden layers and a target or output layer. The input layer contains the raw data given to the network. The hidden layer assigns weights to the input units based on the connections between the input and hidden units. The output unit likewise depends on the weights between the hidden and output units. Thus, instead of using parameters, NN uses weights for prediction (Tsai and Wu, 2008). Odom and Sharda (1990) were the first to apply this technique to bankruptcy prediction, in a comparison with the MDA technique. They used a data sample of 128 firms and obtained a classification accuracy of 77.8% to 81.5%. There are many variations of NN, such as Back Propagation Neural Networks (BPNN), Self-Organizing Feature Maps (SOM), Probabilistic NN, Auto Associative NN and Cascade Correlation NN. These NNs are divided into categories according to their learning type, algorithm and the way their nodes are connected (Kumar and Ravi, 2007).
Vellido et al. (1999), Wong et al. (1997), Zhang et al. (1999), Atiya (2001) and Paliwal and Kumar (2009) have reviewed the use of NN in business and in other science and engineering domains. Jeong et al. (2012) proposed a new architecture for NN models using a hybrid tuning method. Their practical results show that the tuned model was significant in predicting financial failure. Their research has several advantages: it reflects the non-linear aspects of the ratios using a Generalized Additive Model (GAM), it secures the most favourable parameter values of the variables, and the model was more profitable than non-tuned models such as SVM, Generalized Logistic Model (GLM), MDA, CBR, DT and GAM. Recently, Lee and Choi (2013) applied BPNN and MDA models to the construction, retail and manufacturing industries to predict financial distress. Their study further elaborates on the relative power of each independent variable, and the classification accuracy of the BPNN model was 81.43%. Figure 2.1 gives an example of a neural network architecture with one input layer, one hidden layer and one target layer.

Figure 2.1 Neural network basic understanding (Tsai and Wu, 2008). [Diagram: an input layer (VAR-1, VAR-2, VAR-3, ..., VAR-N) connected to a hidden layer, which is connected to an output layer (Target).]
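The layered structure described in this section, where inputs are weighted into hidden units and hidden units are weighted into an output unit, can be sketched as a single forward pass. This is a simplified illustration under stated assumptions: bias terms are omitted, the sigmoid activation is an assumed choice, and all weights are hypothetical values rather than the result of training.

```python
import math


def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, hidden_weights, output_weights):
    """One forward pass: input layer -> hidden layer -> single output unit."""
    # Each hidden unit holds one weight per input variable (VAR-1 ... VAR-N).
    hidden = [sigmoid(sum(w * x for w, x in zip(weights, inputs)))
              for weights in hidden_weights]
    # The output unit holds one weight per hidden unit.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))


# Hypothetical network: three input ratios, two hidden units, one target.
ratios = [0.2, -0.1, 0.5]
hidden_weights = [[0.4, -0.6, 0.1],
                  [-0.3, 0.8, 0.5]]
output_weights = [1.2, -0.7]

print(round(forward(ratios, hidden_weights, output_weights), 3))
```

Training methods such as back propagation adjust these weights so that the output moves closer to the known target for each training example.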

2.6.2 Decision trees

Decision trees work on the principle of repeatedly dividing a large amount of data into smaller, understandable pieces, using different algorithmic rules, until no further division is possible. A decision tree consists of a root node and leaf nodes. The root nodes are also called decision nodes and the leaf nodes are also called terminal leaves (SAS Inc., 2013). The objective of this partitioning is to group together cases with similar target values. ID3 is an early decision tree algorithm; it was proposed by Quinlan in 1979 and was later enhanced to C4.5. Kumar and Ravi (2007) note that decision trees provide if-then-else rules, which are very simple to understand, and they also describe different decision tree algorithms such as CRT, CHAID, QUEST and C5.0 (the enhanced version of C4.5). CRT and CHAID are newer algorithmic techniques. CRT uses the twoing criterion to find the optimum split, whereas CHAID uses chi-square statistics. Frydman et al. (1985), Bryant (1997), and Curram and Migers (1994) applied decision trees to predict financial failures, whereas Hui et al. (2010) gave a comparative study of decision trees against other data mining models (ANN, SVM and logistic regression) and the MDA statistical method. Decision trees are easily understood by humans and more accurate than NN and SVM, but an excess of rules can sometimes make them difficult to comprehend (Olson et al., 2012). The following figure gives a basic understanding of decision trees.

Figure 2.2 Basic understanding of decision trees (SAS Institute Inc., 2012)
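To illustrate how a tree chooses a split, the sketch below scores candidate thresholds on a single ratio with the twoing criterion mentioned above, (P_L * P_R / 4) * (sum over classes of |p(class | left) - p(class | right)|)^2. The ratio values and labels are invented for illustration, and this is a simplified sketch rather than the implementation used by any particular software package.

```python
from collections import Counter


def twoing(left_labels, right_labels):
    """Twoing score of a binary split; larger values mean a better separation."""
    n_left, n_right = len(left_labels), len(right_labels)
    if n_left == 0 or n_right == 0:
        return 0.0
    n = n_left + n_right
    left_counts, right_counts = Counter(left_labels), Counter(right_labels)
    classes = set(left_counts) | set(right_counts)
    difference = sum(
        abs(left_counts[c] / n_left - right_counts[c] / n_right) for c in classes
    )
    return (n_left / n) * (n_right / n) / 4.0 * difference ** 2


def best_threshold(values, labels):
    """Try each distinct value as a 'value <= threshold' rule and keep the best."""
    best = (None, -1.0)
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        score = twoing(left, right)
        if score > best[1]:
            best = (threshold, score)
    return best


# Hypothetical values of one financial ratio for four firms.
ratio = [0.1, 0.2, 0.8, 0.9]
status = ["bankrupt", "bankrupt", "non-bankrupt", "non-bankrupt"]
print(best_threshold(ratio, status))
```

The selected threshold corresponds to one if-then-else rule of the form "if ratio <= threshold then bankrupt else non-bankrupt"; a full tree repeats this search within each resulting subset.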

Recently, Chih et al. (2014) made a comparative study of different classifiers for bankruptcy prediction. They applied these techniques in combination with bagging and boosting on a data set of 220 failed and 220 non-failed firms, and the empirical results show classification accuracies of 83% for DT-bagging and 85% for DT-boosting.

2.6.3 Support Vector Machines

The concept of support vector machines was proposed from statistical learning theory by Vapnik (1998). This model overcomes the limitation of linear boundaries because it combines linear modelling with instance-based learning. SVM uses a linear model to implement non-linear class boundaries through a non-linear mapping of the input vector into a high-dimensional feature space. In the new space, the maximum separation between the decision classes is given by the maximum margin hyperplane. The training examples closest to this hyperplane are called support vectors; all other training examples are irrelevant for defining the class boundaries. The maximum margin hyperplane is a special kind of linear model. Figure 2.3 gives the basic idea of the hyperplanes and support vectors (circled in the figure).

Figure 2.3 Basic idea of the Hyperplanes and support vectors (Han et al., 2006)
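The geometric idea can be sketched as follows. Given a separating hyperplane w.x + b = 0 that has already been chosen, the code computes each point's distance to the hyperplane and reports the closest points, which play the role of the support vectors. The weights, bias and points are hypothetical, and the sketch does not train an SVM; finding the hyperplane that maximises this margin is the optimisation problem that SVM software solves.

```python
import math


def signed_distance(point, weights, bias):
    """Signed Euclidean distance from a point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(w * w for w in weights))
    return (sum(w * x for w, x in zip(weights, point)) + bias) / norm


def support_vectors(points, weights, bias, tolerance=1e-9):
    """Return the margin (smallest distance) and the points lying on it."""
    distances = [abs(signed_distance(p, weights, bias)) for p in points]
    margin = min(distances)
    closest = [p for p, d in zip(points, distances) if d - margin <= tolerance]
    return margin, closest


# Hypothetical two-dimensional example with the vertical line x1 = 0 as hyperplane.
points = [(1.0, 0.0), (2.0, 1.0), (-1.0, 0.0), (-3.0, 2.0)]
margin, closest = support_vectors(points, weights=(1.0, 0.0), bias=0.0)
print(margin, closest)
```

The sign of `signed_distance` gives the predicted class of a point, and only the points returned in `closest` would change the hyperplane if they were moved.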

SVM is very powerful because it integrates statistical methods and machine learning methods. According to Chaudhuri and De (2011), SVM initially started from the idea of Structural Risk Minimization (SRM) (Shin, Lee and Kim, 2005) to build a model, and it is becoming more popular due to its better predictive accuracy and performance. A wide range of research articles have been written on this topic. In the past, Tay and Cao (2001) and Kim (2004) used SVM in financial time series forecasting; Tay and Cao (2001) applied a modified version of SVM in their research; Shin, Lee and Kim (2005) investigated the efficiency of SVM for bankruptcy prediction and concluded that it works better than BPN for smaller training data sets; Min and Lee (2005) evaluated this technique to find the optimal parameter values of the SVM kernel function; and Chih et al. (2007) implemented a real-valued genetic algorithm to optimize the parameters of a support vector machine for predicting financial distress. Later on, Gao, Cui and Po (2008) predicted enterprise bankruptcy using a Noisy-Tolerant Support Vector Machine. Recently, Fong et al. (2014) also used a comparative method of SVM to predict bankruptcy.

2.6.4 Fuzzy logic

This technique is based on the mathematical theory of fuzzy sets presented by Zadeh (1965). According to Zadeh, a fuzzy set is a collection of distinct objects in which each object is assigned a grade of membership, ranging from 0 to 1, by a membership function. Fuzzy logic also assists with classification problems by extracting ‘if-then’ rules. These rules can easily be used to understand the two-way logic of the data. The basic idea of fuzzy logic is to assimilate experience and observation in order to convert experiential knowledge into a model. According to Shapiro (2002), fuzzy logic provides a structure for approximate reasoning, which can be used to translate qualitative knowledge about a problem into a set of comprehensible rules.
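A membership function can be sketched in a few lines. The triangular shape and the break-points used below for a fuzzy set such as "current ratio is around 1.0" are assumptions made purely for illustration; the studies cited here may use other shapes and parameters.

```python
def triangular(x, left, peak, right):
    """Membership grade in [0, 1] for a triangular fuzzy set (left < peak < right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical fuzzy set: "current ratio is around 1.0".
for current_ratio in (0.5, 0.75, 1.0, 1.25, 2.0):
    print(current_ratio, triangular(current_ratio, 0.5, 1.0, 1.5))
```

An ‘if-then’ rule then uses such grades instead of crisp true or false values, so a firm can belong to a set partially rather than only fully or not at all.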


Shapiro (2002) also pointed out a disadvantage: it is difficult to build and tune the membership functions and rules of a fuzzy logic model. Fuzzy logic has been used in many areas, including credit risk prediction (Chung et al., 2005), a commercial loan analysis system (Levy et al., 1991), correlations of crude oil systems (Sunday et al., 2011), the disease of a firm (Hernan and Antonio, 2008) and forecasting exchange rates (Korol, 2014).

2.6.5 Rough Sets

The concept of rough sets was introduced by Pawlak (1982, 1984) and Pawlak et al. (1995) to address the problem of objects in a set that cannot be distinguished from one another. It is convenient to classify objects into precise classes, but this classification can be imprecise with crisp sets. Bankruptcy prediction using rough sets is a relatively recent technique: the comprehensive literature review by Dimitras et al. (1996) covered 158 research articles related to bankruptcy prediction, and none of them used the rough sets technique. This technique eliminates rule conflicts and gathers additional raw facts and figures about the ordering characteristics of attributes to produce a very simple model (Mckee and Lensberg, 2002). Slowinski and Stenfanowski (1994) explained that the rough sets approach essentially allows the analysis of a large set of predictive ratios in order to identify several reduced sets of ratios that can forecast the characteristic of interest. This technique was first used to predict bankruptcy by Matarazzo et al. (1998a, 1998b). They used both the dominance relation and the indiscernibility relation in their first study and only the dominance relation in their second. Susmaga et al. (1999) also applied this technique to predict bankruptcy in comparison with DA and logistic regression, and deduced that the rough set technique performed better than the other two.
Mckee (2000) employed a rough set model on variables specified by a recursive partitioning technique and a holdout sample of 100 companies; the empirical results show 88% classification accuracy. Popova and Bioch (2001) used the rough set method with a slight modification, using monotone extensions, to predict bankruptcy. Slowinski et al. (2001) and Matarazzo et al. (2002) used the dominance-based rough sets approach and concluded that it is the only data mining method that preserves the preference order of the data. Furthermore, this theory can be used to solve classification problems by using exact and possible induced decision rules (Kumar and Ravi, 2007). Moreover, research articles on this topic by Francis and Lixiang (2002), Indrani (2006), Ching et al. (2010), Chen (2012) and Zhi et al. (2012) also elaborate on the use of rough set techniques for bankruptcy prediction. Recently, Chiang et al. (2014) used rough set and hybrid random forest methods, with intellectual capital as a predictive variable for bankruptcy prediction, and concluded that the hybrid approach provided the best classification rate with the fewest Type-I and Type-II errors.

2.6.6 Case based reasoning

Case-based reasoning (CBR) is an intelligent technique that resolves new problems by reusing solutions to similar problems experienced in the past (Kolodner, 1991). When encountering a new problem, CBR retrieves a similar case from the past cases and, if necessary, modifies it to give the desired result. The new solution is prepared by retrieving and modifying old experiences that closely match the given problem. CBR imitates the problem-solving skill of human beings, who resolve present problems by using past experiences. CBR is a technique for solving problems and making better decisions in a complex and changing business environment (Han, 2002). According to Jeng and Liang (1995), the CBR process requires four steps to solve a problem: (1) accepting the new problem, (2) retrieving applicable cases from the library of cases, (3) modifying the retrieved cases to fit the problem in hand and (4) assessing the solution. This process is illustrated by the 4-step cycle shown in Figure 2.4.

Figure 2.4 Case Based Reasoning 4-step cycle (Aamodt and Plaza, 1994)
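The retrieval and reuse steps of the cycle can be sketched as a nearest-case lookup. The case library, the two ratios per case and the use of Euclidean distance as the similarity measure are all assumptions for illustration; the adaptation step is left as a placeholder because it depends on the application.

```python
import math


def retrieve(case_library, query_ratios):
    """Step 2: return the stored case most similar to the new problem."""
    return min(case_library, key=lambda case: math.dist(case["ratios"], query_ratios))


def adapt(case, query_ratios):
    """Step 3 (placeholder): reuse the retrieved outcome without modification."""
    return case["outcome"]


def solve(case_library, query_ratios):
    """Steps 1-3: accept a new problem, retrieve a similar case and adapt it."""
    nearest = retrieve(case_library, query_ratios)
    return adapt(nearest, query_ratios)


# Hypothetical library of past firms described by two financial ratios.
library = [
    {"ratios": (0.05, 1.8), "outcome": "non-bankrupt"},
    {"ratios": (-0.20, 0.6), "outcome": "bankrupt"},
]
print(solve(library, (-0.15, 0.7)))
```

Step 4, assessing the solution, would compare the proposed outcome with what actually happens and add the confirmed case to the library for future use.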


CBR has not been widely used in the field of bankruptcy prediction, but it has been widely used in the fields of management, engineering, medical diagnosis, conflict resolution in traffic control, creating product indexes for e-shopping malls and the drawing of semiconductors (Turban and Aronson, 2001). For further reading on this topic the reader may refer to the research articles by Ahn and Kim (2009), Sungbin et al. (2010) and Chuang (2013).

2.7 Other Methods

Many statistical and intelligent models have been discussed above with a wide range of literature reviews. Nevertheless, there are many other methods for predicting financial bankruptcy, each with its own advantages and disadvantages, that have not been discussed in this study. These include soft computing methods: Liang et al. (1997), Soltys and Ignizio (1996), Chaudhuri and De (2011), Gordini (2014), Kurniawan et al. (2008), Heo and Yang (2014) and Fong et al. (2014); operational research techniques: Sueyoshia and Goto (2009), Zhang et al. (1999), Leary (1992), Banks and Parakash (1994), Sun and Shenoy (2007), Kao and Liu (2004) and Jardin (2014); and self-organizing maps: Kiviluoto (1998), Kwon et al. (1996), Peltonen et al. (2001), Huysmansa et al. (2006) and Chen et al. (2013). Furthermore, for comprehensive literature on statistical methods for bankruptcy prediction, readers may refer to the literature reviews by Zanakis et al. (1982) and Jodi et al. (2007). In addition, the extensive research reviews by Kumar and Ravi (2007) and Jodi et al. (2007) give a detailed description of the different models used for bankruptcy prediction from 1968 to 2007 and from 1932 to 2000 respectively. Finally, Figures 2.5, 2.6 and 2.7 by Aziz and Dar (2006) give an overall understanding of the different approaches used in the past and their bankruptcy prediction accuracies.

Figure 2.5 A comparison of different bankruptcy prediction approaches (Aziz and Dar, 2006)


Figure 2.6 Accuracy of different methods being used in the past (Aziz and Dar, 2006)

Figure 2.7 Studies using different model of bankruptcy prediction (Aziz and Dar, 2006)


Chapter 3 Financial Distress and Bankruptcy

3.1 Introduction

This chapter describes the basic understanding of financial distress that leads to bankruptcy. Additionally, the causes and outcomes of financial distress are elaborated. Finally, the cost of bankruptcy is discussed in the last section of the chapter.

3.2 Financial Distress

Due to rigorous variation in the worldwide economy and in customer appetite, corporations are facing high competition and an uncertain operating environment. Companies that cannot recognise financial distress and take significant measures at an early stage run into bankruptcy, which not only affects the reputation of the company and the stability of the social economy, but also causes huge losses to the stockholders, creditors, managers and employees of the company (Sun and Li, 2009). Financial distress is also referred to as bankruptcy and liquidation in different studies. If a corporation does not have enough cash flow to pay its current contractual obligations, its debts to the suppliers of its stock and the salaries of its workers, then it is considered to be in a state of financial distress. These obligations may also include debts arising from court legal procedures and the repayment of interest. A breach of a debt contract can be a signal that financial distress is forthcoming. Financial theory proposes that financial distress is the preliminary phase in the life cycle of a corporation and that it also gives an indication to change the management (Wruck, 1990). Moreover, Whitaker (1999) and Gaughan (2011) also considered inadequate cash flow to be a major measure of financial distress. Financial failure is the situation in which profit is lower than invested capital, taking risk into account, even if the same investment is used in a different economic situation at prevailing rates, and in which the average return of the firm is always below the firm's cost of capital.
A firm is not in financial distress merely because it is unable to pay a small amount of debt or has a shortfall in its debts. Insolvency can also be used to describe negative corporate performance. The financial distress of a firm is further described using four general terms in many research studies: failure, default, insolvency and bankruptcy. Furthermore, the financial notion of default also means that a company is not in a position to pay debt or interest to its creditors on time. Lastly, financial distress is elaborated in a technical and a legal sense. Technical financial distress is the case where a company is unable to keep to its contract, and the legal case refers to the failure of the company to meet regular repayments on a loan (Altman and Hotchkiss, 2005). It is important for a financially distressed company to start renegotiating in order to reach a better agreement with its creditors. These negotiations may require closing down loss-making operations and regenerating the company through the temporary or permanent discharge of workers. If, after all negotiations and new contracts with creditors, the company still faces financial distress, then the physical exit of the company is the last option. The financial distress of a company can also be eliminated by placing the company under the control of a new owner (Hashi, 1997). According to Gaughan (2011), financial failure does not mean that a company is unable to meet its due debt obligations. It can happen even when the company has enough net worth to meet its present legal responsibilities. Additionally, financial distress is not a necessary measure of corporate bankruptcy, because some companies also default due to management ineligibility (Perold, 1999). Finally, the reader may refer to Karels and Prakash (1987) and Lin and Mclean (2000) for further definitions and explorations of financial distress.

3.1.1 Stages of Financial Distress

According to research studies, the stages of a financially distressed company are the early stage, the mid-stage and the final or later stage. The symptoms of companies in these stages are:

1. Early stage: customers start complaining about services and quality (Whitaker, 1999), and the company starts to see sales decreasing and stock returns falling below expectations (Opler and Titman, 1994).
2. Mid-stage: the company faces problems such as cash shortages, lower profits (Makridakis, 2001), inability to pay dividends and disruption in the payment of debts to suppliers (Altman and Hotchkiss, 2005).
3. Final or later stage: according to Altman and Hotchkiss (2005), the company has a constant cash deficit and breaches its debt contracts with creditors.

The bankruptcy of a company can be predicted about five to six years before it happens, because some of the researchers listed in Table 2.2 have predicted bankruptcy 5 years ahead.

3.1.2 Factors of Financial distress

Two kinds of factors of financial distress are discussed in different research studies: internal and external.

Internal factors: There are many different internal factors related to financial distress; some of the important ones are (Keskin, 2002):
1. Bad management
2. Lack of communication between the business entities
3. Failure of major projects
4. Expansion of the business without stability
5. No agreement between domain growths

Wruck (1990) and Whitaker (1999) also considered poor management a significant factor in the financial distress of a company.

External factors: Every company has to exist in an environment. The external factors are environmental factors that lead to financial distress; some of those discussed by researchers are as follows:
1. Social Environment (Sevil et al., 1997; Tezcan, 2002)
2. Economic Environment (Buker et al., 1997; Demir, 1997)
3. Legal and Political Environment (Turko, 1999)
4. Technological Environment
5. Natural Environment (Turko, 1999)
6. Industrial Endowment

3.1.3 Causes of Financial Distress

The most significant causes of financial distress following leveraged recapitalizations, as discussed by David and Denis (1995), are:
1. Bad performance due to expanded industry
2. Low proceeds from asset sales
3. Negative stock price reactions


3.1.4 Result of corporate financial Distress

As mentioned above, financial distress is a continuous event, and it takes more or less six years to reach its final stage, bankruptcy. According to Kumar and Ravi (2007), the health of a firm or bank relies on its:
1. Solvency in the beginning
2. Capability, workability and planning for generating cash
3. Accessibility to the capital market
4. Financial ability to endure a random cash deficiency

As a company becomes more and more liquidated, it enters a danger zone, which is called bankruptcy.

3.2 Bankruptcy

The concept of bankruptcy has been used to describe a firm bearing financial troubles. A few researchers have used the generic term “failed” as a synonym for “bankrupt”. Nonetheless, bankruptcy is a process that starts financially and ends legally. It is hard to identify the particular moment at which bankruptcy occurs. It seems to be an intuitive settlement in which financial distress continues until the firm or its creditors file a legal action. Financial failure is a necessary, but not sufficient, condition for bankruptcy (Karels and Prakash, 1987). Firms under the provisions of the National Bankruptcy Act are legally bankrupt, whether they are in receivership or have been granted the right to restructure (Altman, 1968). When a firm is unable to pay its financial obligations as they fall due, as evidenced by a bond default, an overdrawn bank account or a missed preferred stock dividend, the firm is operationally said to be bankrupt, failed or in default (Blum, 1974). According to Deakin (1972), a firm encountering insolvency, bankruptcy or liquidation for the benefit of creditors is said to be a failed firm. A variety of definitions have appeared to explain failure or bankruptcy. From a financial point of view they consist of: negative net worth, non-payment of creditors, bond defaults, inability to pay debt obligations, overdrawn bank accounts, omission of dividends, receivership, etc. (Karels and Prakash, 1987). For more information about bankruptcy definitions, the reader may also refer to the research studies by Elam (1975), Morris (1983), and Taffler and Tisshaw (1977).


3.2.1 Cost of bankruptcy

The bankruptcy cost is generally divided into two categories (Kalay et al., 2007):
1. Direct cost
2. Indirect cost

The entire cost of bankruptcy for a firm, including direct and indirect costs, is 15% of pre-distress firm value, and 7% for retail firms (Altman, 1984). According to Franks and Torous (1994), the cost of formal bankruptcy exceeds the informal cost by 4.5%. In the same year, Opler and Titman (1994) reported that firms with more leverage lose market share. Lastly, Kaplan (1994) concluded that profit from the liquidated financial reshuffle procedure also increased the cost. The bankruptcy cost can be divided into four sub-categories (Branch, 2002):
1. Real costs borne directly by the bankrupt firm.
2. Real costs borne directly by the claimants.
3. Losses to the bankrupt firm that are offset by gains to other institutions.
4. Real costs borne by parties other than the bankrupt firm.

Costs (1), (2) and (3) are considered to be sub-categories of direct costs, while (4) belongs to the indirect costs.

3.1.3 Determining cost of bankruptcy

The cost of financial distress is related to the market value of the firm just before it becomes bankrupt, and is given by the formula (Branch, 2002):

PDV = LCD + TDC + NVR

where
PDV = Pre-distress value
TDC = Cost endured by claims
NVR = Net value retrieved

PDV is considered to be the entire value of the firm's assets according to its last financial report before bankruptcy. Usually, at the final stage of financial distress, the equity value of the company is close to zero when it is about to file for bankruptcy. On the other hand, the balance sheet of the bankrupt firm will not show running losses but will present some overdue asset values (Branch, 2002).

3.1.4 Direct costs of bankruptcy endured by the firm

After filing for bankruptcy, the company has to hire a team of professionals. These professionals may include lawyers, accountants, bankers, auctioneers, actuaries and practitioners who sell the distressed assets. The professionals charge a particular amount of payment in return for their services. Moreover, at the stage of bankruptcy the firm also has to bear the cost of its internal staff and other resources (Branch, 2002).

3.1.5 Indirect costs of bankruptcy endured by the firm

The indirect bankruptcy cost can be described as the lost gains from previous sales, the costs of selling assets at a discount, and the costs of disruptions in the firm during the period of financial distress. These disruptions may affect the investment and financial policies of the firm (Rajeev and Yun, 2013). Managers of the company have to bear the personal cost of bankruptcy: either they lose their jobs or they give up 35% of their previous salary (Gilson and Vetsuypens, 1994). Furthermore, research studies by Thorburn (2000), Bris et al. (2006), Pulvino (1999) and Kaplan (1994) also explain the direct and indirect costs of bankruptcy. Finally, Branch (2002) summarised who bears the costs of bankruptcy in four steps. Firstly, the bankruptcy cost is imposed on landlords, suppliers, customers, employees, etc. Secondly, creditors and claimants also have to face the costs associated with the bankruptcy of the firm. Thirdly, the par value of the liquidated firm's debt before bankruptcy is apportioned as follows: 28% to the loss causing bankruptcy, 16% to the cost of dealing with bankruptcy and 56% to the cost to the claim-holders. Lastly, interest holders also have to bear a cost if the company goes bankrupt.


Chapter 4 Data

4.1 Introduction

In this chapter I discuss the importance of the data sample for bankruptcy prediction and the database source I used to obtain the data. Finally, I discuss the selection of variables, the data pre-processing phase and the statistical description of the data used in this dissertation.

4.2 Importance of Data sample

Before describing the importance of the data sample, it is important to discuss two statistical terms: population and sample.

4.2.1 Population

A population is the complete collection of objects or items that may be the subject of a study (Kathleen and Jonathan, 2011), for instance all manufacturing companies in the UK, all banks in the UK, all bankrupt firms in the UK, or all non-bankrupt companies that are still active.

4.2.2 Sample

A sample is a sub-group of items from a particular population (Kathleen and Jonathan, 2011), for example a group of 63 bankrupt firms randomly selected from a large database containing records of thousands of bankrupt firms. The data sample must be representative of the whole population.

4.2.3 Importance

After reading the literature extensively, I have come to know that the selection of the data sample is the most important aspect of bankruptcy prediction, because computers provide information according to the data they are given to process. If computers are given erroneous data, the results will also be erroneous. Previous studies show that researchers knew the importance of the data sample for predicting bankruptcy. Initially, researchers used data samples containing a limited number of bankrupt and non-bankrupt firms. For example, Beaver (1966) used a data sample of 79 bankrupt and 79 non-bankrupt firms, Piches et al. (1975) used a data sample of 221 firms, and Altman (1968) and Deakin (1972) used a data sample of 32 bankrupt and 32 non-bankrupt firms. Later on, some researchers also used large data samples; for instance, Zmijewski (1984) used a data sample of 40 bankrupt and 800 non-bankrupt firms and Erkki and Teija (2000) used an equally divided data sample of 400 bankrupt and non-bankrupt firms. Since my major concern in this study is to apply data mining classification techniques to predict bankruptcy, it is very important for me to select unbiased training and test data samples. The training data sample I have employed in this study consists of an unbiased sample of 464 bankrupt and 464 non-bankrupt UK and Irish firms from the period 2000 to 2012, while the test data sample contains 64 bankrupt and 64 non-bankrupt companies from the period 2010 to 2012. I selected the ratios from the five years prior to bankruptcy to analyse bankruptcy prediction. Finally, I divided the data into 5 different data files to perform my analysis, as follows:

1. Data sample containing financial ratios one year before bankruptcy (dataset1.xlsx).
2. Data sample containing financial ratios two years before bankruptcy (dataset2.xlsx).
3. Data sample containing financial ratios three years before bankruptcy (dataset3.xlsx).
4. Data sample containing financial ratios four years before bankruptcy (dataset4.xlsx).
5. Data sample containing financial ratios five years before bankruptcy (dataset5.xlsx).


4.3 Source of Data

This data sample has been collected from the Financial Analysis Made Easy (FAME) database. This database gives detailed information on all significant private and public companies in the UK and Ireland. The information provided includes name, number of employees, profile, location, assets, identification number, status, legal form, incorporation date, phone number, industry, stock data, mortgage data, account type, accounting figures, financial statistics, custom data and information related to the directors and owners of the companies. We can access the past 10 years' financial data for a company from this database. Using the FAME database we can obtain detailed statistical descriptions, aggregation, linear regression and segmentation of the data in seconds. Moreover, the FAME database describes the status of companies in two categories:
1. Active
2. Inactive

The inactive companies are further subdivided into two classifications:
1. Dissolved
2. Liquidated

The FAME database contains financial information on approximately 3,147,877 active and 9,186,893 dissolved or liquidated companies. I have selected the bankrupt firms during the period of 2000 to 2012 in active and liquidation state. Similarly, I have selected non-bankrupt firms in liquidation state.

4.4 Selection of Ratios

The literature review shows that different studies have used different numbers of ratios. Some studies have used only 4 financial ratios, while others have used more than 4, as mentioned in chapter 2. I have selected 41 ratios to use in this study. These ratios are important because I selected the significant ratios used in the most cited previous research papers. Each type of financial ratio measures a certain financial aspect of a business. Table 4.1 gives the description of the financial ratios used in this study. In this data set some of the ratios, such as X2-T1, X2-T2, X2-T3, X2-T4 and X2-T5, were saved as string data type. I converted these ratios to numeric data type using IBM SPSS software.


Table 4.1 Financial ratios used in this study

Ratio name used in this study: X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, X17, X18, X19, X20, X21, X22, X23, X24, X25, X26, X27, X28, X29, X30, X31, X32, X33, X34, X35, X36, X37, X38, X39, X40, X41

Financial ratio (Factor/Consideration): Net income / Total assets; Current ratio; Working capital / Total assets; Retained earnings / Total assets; Earnings before interest and taxes / Total assets; Sales / Total assets; Quick ratio; Total debt / Total assets; Current assets / Total assets; Net income / Net worth; Total liabilities / Total assets; Cash / Total assets; Market value of equity / book value of equity; Cash flow from operations / Total assets; Cash flow from operations / Total liabilities; Current liabilities / Total assets; Cash flow from operations / Total debt; Quick assets / Total assets; Current assets / Sales; Earnings before interest and taxes / Interest; Inventory / Sales; Operating income / Total assets; Cash flow from operations / Sales; Net income / Sales; Long-term debt / Total assets; Net worth / Total assets; Total debt / Net worth; Total liabilities / Net worth; Cash / Current liabilities; Cash flow from operations / Current liabilities; Working capital / Sales; Capital / Assets; Net sales / Total assets; Net worth / Total liabilities; Total assets; Cash flow (using net income) / Debt; Cash flow from operations; Operating expenses / Operating income; Quick assets / Sales; Sales / Inventory

Number of approximate research articles containing this ratio: 65, 60, 50, 50, 40, 36, 35, 33, 31, 29, 25, 23, 21, 18, 17, 19, 15, 13, 14, 15, 15, 15, 13, 13, 14, 15, 13, 14, 14, 15, 10, 9, 7, 8, 7, 7, 7, 7, 7, 7, 7

4.5 Data Pre-Processing

To apply data mining techniques, the data must be filtered and prepared so that patterns in the data can be recognised efficiently. According to Han and Kamber (2000), the data mining process involves six important steps: select data, filter data, give meaning (value) to the filtered data, programming, data mining and report generation. Data cleaning is very important as it removes errors from the data and improves its quality, since data obtained from any source may have missing values, outliers and noise. Data pre-processing is the phase in which data is prepared for analysis using different data cleaning and processing methods. If data is not pre-processed before applying different models, the results can be very different from those obtained with processed data. Therefore, it is important to pre-process data for better classification results.

Moreover, the data used in this study is presented in the form of a combination of X and T variables, where X (from 1 to 41) identifies the ratio and T (from 1 to 5) gives the number of years before bankruptcy. Since the data contains ratios for the 5 years prior to bankruptcy and I have to apply data mining to each year's data separately, I made different data files containing the ratios for each year. For example, to apply data mining models to the data 5 years before bankruptcy, I deleted the ratios for years one to four, leaving the year-5 ratios specified as X,T (where X = 1 to 41 and T = 5). I used IBM SPSS to make these data samples. In addition, I deleted the columns that were not required in this study: Company Name, status and Year of Event. Since I also want to find out the most important ratios for bankruptcy prediction, I also made different data files containing different ratios (deleting the others). To make the data cleaner, I rounded values with more than 6 decimal places to 4 decimal places using the Excel function ROUNDUP(number, num_digits), which rounds away from zero.

Finally, the data showed the bankruptcy status of the firms in binary form (0 for a bankrupt firm and 1 for a non-bankrupt firm). I converted this variable to nominal for classification and changed 0 to “bankrupt” and 1 to “non-bankrupt” (string data type) for better classification analysis.
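The file-splitting and recoding steps described above were carried out in IBM SPSS and Excel. Purely as an illustrative sketch of the same logic, the Python code below keeps the ratio columns for one chosen year, drops every other column and recodes the binary target. The column naming pattern (X1T5 and so on) follows the variable names used in this chapter, while the column name "target" and the example row are hypothetical.

```python
import re

# Ratio columns are assumed to be named X<ratio>T<years before bankruptcy>.
RATIO_COLUMN = re.compile(r"^X(\d+)T(\d+)$")
LABELS = {0: "bankrupt", 1: "non-bankrupt"}


def select_year(rows, year, target="target"):
    """Keep only the ratios for one year and recode the 0/1 target as text."""
    prepared = []
    for row in rows:
        kept = {}
        for name, value in row.items():
            match = RATIO_COLUMN.match(name)
            if match and int(match.group(2)) == year:
                kept[name] = value
        kept[target] = LABELS[row[target]]
        prepared.append(kept)
    return prepared


# Hypothetical row with two ratios observed one and five years before the event.
example_rows = [
    {"Company Name": "A Ltd", "status": "Liquidated", "Year of Event": 2011,
     "X1T1": -0.12, "X2T1": 0.8, "X1T5": 0.03, "X2T5": 1.2, "target": 0},
]
print(select_year(example_rows, year=5))
```

Calling `select_year` once for each year from 1 to 5 would give the five per-year data sets; columns such as Company Name, status and Year of Event are dropped automatically because they do not match the ratio naming pattern.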


4.5.1 Missing values Missing values have always been a problem for researchers and it is up to researchers how they deal with the missing values. According to Rubin (2002) there are three major kinds of missing values mechanisms: 1. Missing completely at random: 2. Missing at random: 3. Not missing at random: The missing values in the data is limited or scattered in the whole data. Limited is when only few values are missing in the data and total is when all data is full of missing values. The most commonly method used to solve missing values it to impute missing values with the average value. The SPSS missing values analysis gives the complete insight of the missing values in the data one year before bankruptcy. According to this analysis variables in each sample of data have certain missing value such as : X1T1,X3T1,X4T1,X5T1,X6T1,X8T1,X9T1,X10T1,X11T1,X12T1,X14T1,X16T1,X18T1,x1 9T1,X20T1,X21T1,X22T1,X23T1,X24T1,X25T1,X26T1,X27T1,X28T1,X29T1,X30T1,X31 T1,X32T1,X33T1,X34T1,X35T1,X36T1,x39T1,X40T1,X41T1 contains zero missing values variable X7T1, X13T1,X15T1,X17T1 contains more than 50 missing value while variable X2T1 is having five missing values. Moreover, the SAS and IBM SPSS have methods to impute the missing values in the data. I have used IBM SPSS to detect and impute missing values in the data using mean of nearby point method. 4.5.2 Outliers Outliers are the values in the data that are significantly far away from the other observation in the data (Hansen et al., 1983). The outlier affects the results of analysis method and also skew data from normal distribution. The most commonly used methods to deal with outliers are (Dhiren and Ghosh, 2012): 1. Do not disturb it and treat it like other data values. 2. Winsorizing 3. Eliminating In the trimming method the outliers are eradicated from the data during analysis and winsorizing is a method to assign an outlier highest or lowest value in the data that is not an outlier. 
A general method of winsorizing is to replace any data value above the 95th percentile of the sample data by the 95th percentile, and any value below the 5th percentile by the 5th percentile (Dhiren and Ghosh, 2012).

4.5.2.1 Solution of outliers

I applied the descriptive statistics technique of IBM SPSS to find the 5th and 95th percentiles of each ratio, and then applied the winsorizing method to remove the extreme values in the data. Figure 4.1 shows the method used in SPSS to find the 5th and 95th percentiles of each year's data.

Figure 4.1 Method used in SPSS to find 5th and 95th percentile
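The winsorizing rule described above can be sketched as follows. This is a minimal illustration and not the SPSS procedure used in the study: the helper names `percentile` and `winsorize` are hypothetical, and the linear-interpolation percentile is an assumption that may differ slightly from the percentile definition SPSS applies.

```python
def percentile(values, p):
    """Return the p-th percentile (0-100) by linear interpolation (assumed definition)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty list is undefined")
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction

def winsorize(values, low=5, high=95):
    """Clamp every value to the [low, high] percentile range of the sample."""
    low_cut = percentile(values, low)
    high_cut = percentile(values, high)
    # Values below the lower cut-off are raised to it, values above the upper
    # cut-off are lowered to it, and everything in between is left unchanged.
    return [min(max(v, low_cut), high_cut) for v in values]
```

Unlike trimming, the number of observations is unchanged: extreme ratios are pulled in to the cut-off values rather than deleted.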

Tables 4.2, 4.3, 4.4, 4.5 and 4.6 in Appendix-A present the 5th and 95th percentiles of each year's data.

4.6 Descriptive Statistics of data samples

Descriptive statistics describe the basic characteristics of the data and provide summaries of the samples and measures. They are used to present quantitative measures, such as the mean and standard deviation of the data, in a manageable form (Ibe, 2014). I have used SPSS to determine the descriptive statistics of the data. Tables 4.7, 4.8, 4.9, 4.10 and 4.11 in Appendix-A show the descriptive statistics of the one- to five-year data.

4.7 Summary

The data has now been pre-processed and cleansed using different statistical methods, so it is ready to be used in the development of the bankruptcy prediction models. The next chapter presents this implementation.


Chapter 5: Model development and application

5.1 Introduction

This chapter consists of three parts: Part-1 presents the application of data mining methods using SAS Enterprise Miner, Part-2 elaborates the use of data mining algorithms using the WEKA software, and Part-3 presents the classification of bankruptcy data using IBM SPSS Modeller.

Part-1:

5.2 Overview

This part gives a brief description of SAS Enterprise Miner and its predictive modelling approach. It also introduces the step-by-step implementation of the models, with a brief introduction to the data mining model nodes used and their execution using SAS programming.

5.3 SAS Enterprise Miner and its predictive modelling

SAS Enterprise Miner (EM) is a tool for generating reliable and accurate predictive and illustrative models from large amounts of data. SAS Enterprise Miner uses a data mining process with five steps, known as SEMMA: Sample, Explore, Modify, Model and Assess. Since I have already performed the first three steps on the data in Chapter 4, I perform the last two steps, Model and Assess, in this chapter. SAS Enterprise Miner provides a GUI for performing different data mining tasks. The GUI consists of a workspace into which nodes can be dragged from a toolbar to create a process flow diagram. Figure 5.1 shows the process of creating a project in SAS Enterprise Miner.


Figure 5.1 step by step method of creating any project in SAS Enterprise miner

Open SAS Enterprise miner

Create New Project

Create a Library

Create a Diagram

Assign a Data Source

Place Node in Workspace and Execute

Perform Data mining Using Execution Results

SAS Enterprise Miner has many features, including a set of data mining tools, an easy-to-use GUI, accurate predictions, the development of predictive models that can be reused later, and a text editor for writing code to perform tasks through SAS Enterprise Guide. SAS also supports the goal of the data mining process, which is to develop predictive models. These models help to find rules for prediction using the variables and data from one data source. Once a good predictive model has been created, it can be applied to a new data source for prediction.


5.3 Application of the Models

Developing bankruptcy prediction models requires a data set, which needs to be imported into SAS Enterprise Miner. Because SAS Enterprise Miner cannot read this data set directly, it is first converted to a SAS data set. In the next step, the SAS data set is partitioned and explored using the StatExplore node. SAS Enterprise Miner can divide data into three parts (training, validation and testing); in this study I have used 70% of the data for training and 30% as validation data to test the results. After the data has been divided into these two parts, different predictive models are applied and compared. The validation data set is used to prevent a modelling node from overfitting the training data and to compare different models. Finally, the results of these models are obtained and the best ones are considered. Figure 5.2 presents the step-by-step implementation of model generation using SAS EM:

Figure 5.2 The step by step implementation of the model generation using SAS EM

Data Set

Import Data Set into SAS Miner

Data Set conversion into SAS Data Set

Exploring the Descriptive Statistics

Partition of Data in Training and Validation Sets

Different Models Implementation

Calculating Results including Type-1 and Type-2 Errors

Model Accuracy

Final Results
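The partition step of the flow above (70% training, 30% validation) can be sketched in Python. This is an illustration under stated assumptions and does not reproduce the Data Partition node of SAS Enterprise Miner: the function name `partition` is hypothetical, and stratifying by class so that bankrupt and non-bankrupt firms keep the same proportions in both parts is an assumption of this sketch.

```python
import random

def partition(records, label_of, train_fraction=0.7, seed=0):
    """Split records into (training, validation), stratified by class label."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    by_class = {}
    for record in records:
        by_class.setdefault(label_of(record), []).append(record)
    training, validation = [], []
    for group in by_class.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        # Each class contributes roughly train_fraction of its records to training.
        cut = round(len(shuffled) * train_fraction)
        training.extend(shuffled[:cut])
        validation.extend(shuffled[cut:])
    return training, validation
```

Models are then fitted on the training part only, and the validation part is used to compare models and to check for overfitting.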


In this phase I apply the prediction model nodes to the five data sets separately to perform the bankruptcy prediction task, with a short introduction to each model employed.

5.3.1 Decision Trees

Decision trees divide a large amount of data by applying a series of rules. These rules break the data into smaller pieces, producing subgroups that are less mixed than the overall data sample, so that observations with similar target values are isolated together. SAS Enterprise Miner creates, grows, prunes and assesses decision tree models. Chapter 2 elaborates the structure of decision trees. There are two decision tree variations in SAS, Decision Trees and HP Trees.

5.3.2 Decision Trees Model

The Decision Tree node is used to generate this model. This node allows multipath splitting of the data using the data variables. SAS applies the best of the CHAID, CART and C4.5 algorithms using a hybrid approach (SAS Institute Inc., 2003). The overall model accuracy using the Decision Trees Model is 67.2%, 56.0%, 69.0%, 63.0%, and 67.5% for one to five years respectively. Table 5.1 in Appendix-A presents the bankruptcy prediction accuracy. Moreover, Figure 5.4 in Appendix-B shows the classification bar chart and score mode for each year.

5.3.3 High Performance Trees Model

This model also applies the F-test in finding the splitting rules. It also helps to create a tree model with interval targets. The overall model accuracy using the HP Tree Model is 61%, 68.3%, 70.5%, 62.0%, and 61.3% for one to five years respectively. Table 5.2 in Appendix-A presents the bankruptcy prediction accuracy. Moreover, Figure 5.5 in Appendix-B shows the classification bar graph and score mode for each year.

5.3.4 Neural Network

Neural networks consist of interlinked artificial neurons that send and receive information from each other, loosely modelled on the billions of interlinked neurons in the human brain. They imitate the way humans learn from experience. SAS provides several variations of neural networks: Neural Network, DM Neural, Auto Neural and HP Neural. In this section I have applied each of these to analyse and find their predictive classification accuracy.


5.3.5 Neural Network Model

This node helps to generate, train and test multilayer feed-forward neural networks (SAS Institute Inc., 2003). The overall model accuracy using the Neural Network Model is 95.4%, 97.7%, 93.25%, 92.2%, and 90.1% for one to five years respectively. Table 5.3 in Appendix-A presents the bankruptcy prediction accuracy. Moreover, Figure 5.6 in Appendix-B shows the classification bar graph and score mode for each year's bankrupt and non-bankrupt classification.

5.3.6 Auto Neural Model

The Auto Neural node is found in the Model group of SAS Enterprise Miner. This model is used to find the optimal configuration for a neural network model. The Auto Neural node performs only a small number of searches to find a better network configuration. The model uses several options to handle the configuration: for example, a hidden node may contain more than two neurons, iterations are used to estimate and fit vectors, previously used layers can be frozen, and error functions are also used (SAS Institute Inc., 2013). The overall model accuracy using the Auto Neural Model is 93.5%, 99.5%, 50.0%, 97.7%, and 50.0% for one to five years respectively. Table 5.4 in Appendix-A presents the bankruptcy prediction accuracy. In addition to the classification accuracy table, Figure 5.7 in Appendix-B shows the classification bar graph and score mode for each year's bankrupt and non-bankrupt classification.

5.3.7 High Performance Neural Model

This node produces a multi-layer neural network that passes information between layers to map particular inputs to a predicted value. This helps in creating neural networks on huge data sets in very little time. This model node has two goals (SAS Institute Inc., 2013):

1. Conduct efficient and rapid training of neural networks.
2. Generate an easy-to-use and reliable model.

The overall model accuracy using the HP Neural Model is 51.0%, 47.25.0%, 84.0%, 89.4%, and 54.6% for one to five years respectively. Table 5.5 in Appendix-A presents the bankruptcy prediction accuracy. In addition to the classification accuracy table, Figure 5.8 in Appendix-B shows the classification bar graph, NN diagram and score mode for each year's bankrupt and non-bankrupt classification.


5.3.8 Data Mining Neural Model

The DMNeural node is used to create an additive nonlinear model. The main purpose of the algorithm used in the DMNeural node is to overcome certain problems, such as the nonlinear estimation problem, computing time, and finding a global and optimal solution. The training process of DMNeural creates eight functions. Each function performs a particular task and is optimized individually. The DMNeural node chooses the function that gives the most appropriate results (SAS Institute Inc., 2013). The overall model accuracy using the DMNeural Model is 46.64%, 55.15%, 52.4%, 61.1%, and 64.7% for one to five years respectively. Table 5.6 in Appendix-A presents the bankruptcy prediction accuracy. In addition to the classification accuracy table, Figure 5.9 in Appendix-B shows the classification bar graph, NN diagram and score mode for each year's bankrupt and non-bankrupt classification.

5.3.9 Regression Model

This node also belongs to the Model group of SAS Enterprise Miner. The Regression node can be used to create both linear and logistic regression models. Linear regression predicts the target using one or more input variables. Logistic regression predicts the probability of an event of interest as a function of the input variables. Two functions are used in the Regression node:

1. Link function
2. Error function

The link function is used for the distribution problems and the error function is used to perform linear regression on the data (SAS Institute Inc., 2013). The overall model accuracy using the Regression Model is 46.64%, 55.15%, 52.4%, 61.1%, and 64.7% for one to five years respectively. Table 5.7 in Appendix-A presents the bankruptcy prediction accuracy. In addition to the classification accuracy table, Figure 5.10 in Appendix-B shows the classification bar graph, NN diagram and score mode for each year's bankrupt and non-bankrupt classification.
5.3.10 High Performance Support Vector Machine Model

The High Performance Support Vector Machine is a supervised intelligent technique used to conduct classification and regression analysis. The HP SVM node of SAS Enterprise Miner requires only one binary target variable, in the form of 0 and 1. The input variables can be of any type supported by SAS Enterprise Miner (SAS Institute Inc., 2013). The overall model accuracy using the HP SVM Model is 58.41%, 54.0%, 54.0%, 54.2%, and 48.29% for one to five years respectively. Table 5.8 in Appendix-A presents the bankruptcy prediction accuracy. In addition to the classification accuracy table, Figure 5.11 in Appendix-B shows the classification bar graph, NN diagram and score mode for each year's bankrupt and non-bankrupt classification.

5.3.11 High Performance Regression Model

The HP Regression node also provides linear and logistic regression, but in a high-performance environment and using interval as well as binary class values. It predicts the target values from the input variables. In contrast to the Regression model, this node supports interval, binary, nominal and ordinal class target values. HP Regression can perform particular selection techniques:

1. Forward, backward and stepwise for interval targets.
2. Forward, backward, stepwise, LAR and LASSO for the selection methods.

The overall model accuracy using the HP Regression Model is 99.0%, 50.0%, 47.25%, 49.0%, and 50.5% for one to five years respectively. Table 5.9 in Appendix-A presents the bankruptcy prediction accuracy. In addition to the classification accuracy table, Appendix-B shows the classification bar graph, NN diagram and score mode for each year's bankrupt and non-bankrupt classification.

5.3.12 Memory Based Reasoning Model

Memory-Based Reasoning (MBR) is similar to the Case-Based Reasoning (CBR) method. The MBR node identifies similar cases and applies the information acquired from those cases to a new situation or record. Like CBR, this model uses the k-nearest neighbour method to predict target values. The k-nearest neighbour method works with a data sample and a probe: the data sample contains a collection of variables, and the probe has a specific value for each variable. The distance between the variable values of each observation and the probe is calculated. The observations with the smallest distance to the probe are the k nearest neighbours of that probe (SAS Institute Inc., 2013). The overall model accuracy using the MBR Model is 52.1%, 61.9%, 59.5%, 61.3%, and 59.55% for one to five years respectively.
Table 5.10 in Appendix-A presents the bankruptcy prediction accuracy. In addition to the classification accuracy table, Figure 5.13 in Appendix-B shows the classification bar graph, NN diagram and score mode for each year's bankrupt and non-bankrupt classification accuracy.

In this part I have applied the different data mining node models available to the data samples. Each model has its strengths and weaknesses. Chapter 6 gives a complete insight into each model's results and accuracy. The following is the final implementation diagram of all the SAS data mining models that I have used in this study for data set 1.

Figure 5.3 Final implementation diagram of models using SAS


Part 2:

This part gives a brief introduction to WEKA and the method of applying data mining in WEKA. It also elaborates the application of data mining algorithms to the data samples using the WEKA software.

5.4 WEKA

WEKA is open-source software consisting of a collection of algorithms for performing data mining tasks on large amounts of data. Using WEKA it is possible to apply different data mining techniques to data, such as classification, regression, clustering and association rule mining (Mark et al., 2009). WEKA divides classification algorithms into different groups: Bayes classifiers, functions classifiers, lazy classifiers, meta classifiers, MI classifiers, rules-based classifiers and trees classifiers. Figure 5.14 gives the step-by-step implementation of data mining algorithms on data using WEKA.

Figure 5.14 Final application diagram of models using WEKA

Open Data File

Pre-process

Select Bankrupt/NonBankrupt as Target

Select Classification Algorithms and apply

Select Different Test options

Calculate Results with Type-1 Type-2 Errors

Calculate Model Accuracy
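The last steps of the flow above (selecting a test option and calculating model accuracy) can be sketched in Python, taking k-fold cross-validation as the test option. This is only an illustration: the helper names `k_fold_indices`, `cross_validated_accuracy` and `zero_r` are hypothetical, the folds here are contiguous slices of data that is assumed to be already shuffled, and WEKA's own implementation may differ in details such as shuffling and stratification. The `zero_r` function is a majority-class baseline included only so that the sketch has a classifier to evaluate.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size

def cross_validated_accuracy(features, labels, train, k=10):
    """Pool the predictions of all folds; train(X, y) must return a predict(x) function."""
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        predict = train([features[i] for i in train_idx],
                        [labels[i] for i in train_idx])
        # Every observation is tested exactly once, by a model that never saw it.
        correct += sum(predict(features[i]) == labels[i] for i in test_idx)
    return correct / len(labels)

def zero_r(X, y):
    """Baseline trainer: always predict the most frequent label in the training data."""
    majority = max(set(y), key=y.count)
    return lambda x: majority
```

A call such as `cross_validated_accuracy(features, labels, zero_r, k=10)` returns the fraction of firms classified correctly across the ten held-out folds.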

Since WEKA provides the algorithmic models, this section presents the applications of these models and the empirical findings on their classification accuracy using the implementation approach described above. In each case I process the confusion matrix and calculate the classification accuracy. For every model generated I have used the 10-fold cross-validation technique to validate the accuracy of the model.

5.4.1 Naïve Bayes

The Naïve Bayes classifier offers a very simple approach, with clear semantics, to representing, using and learning probabilistic knowledge. This technique is used for supervised data mining tasks in which the goal is to predict the target class of test instances, while the training instances contain the class information. Naïve Bayes can be used with binary, missing and nominal class values. It is also efficient for binary, numeric, empty nominal, unary, nominal and missing attributes (George and Pat, 1995).

5.4.2 Naïve Bayes Model

The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the Naïve Bayes Model is 85.8%, 51.1%, 85.8%, 52.2%, and 93.70% for one to five years respectively. Moreover, Table 5.11 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms.

5.4.3 BayesNet Model

BayesNet, or Bayes network, is a general network used to infer the probability of an event using observations of other events in the same network (Sankaran and Ramesh, 2005). The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the BayesNet Model is 85.8%, 51.1%, 88.0%, 51.1%, and 50.1% for one to five years respectively. Moreover, Table 5.12 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms.

5.4.4 SMO OR SVM Model

The sequential minimal optimization (SMO) algorithm was presented by Platt (1998) to solve the SVM quadratic programming problem. The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the SMO OR SVM Model is 61.7%, 55.1%, 59.4%, 53.2% and 50.7% for one to five years respectively. Moreover, Table 5.13 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms.


5.4.5 RBFNetwork Model

The radial basis function (RBF) network, which employs radial basis functions as activation functions, is based on neural network logic to solve problems (Schwenker et al., 2001). The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the RBFNetwork Model is 61.7%, 77.5%, 63.5%, 55.7% and 88.3% for one to five years respectively. Moreover, Table 5.14 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms.

5.4.6 Kstar Model

K* belongs to the group of instance-based classifiers, but it uses an entropy-based distance function instead of other distance functions (John and Leonard, 1995). The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the KSTAR Model is 100%, 49.8%, 50.3%, 50.4%, and 50.2% for one to five years respectively. Moreover, Table 5.15 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the Kstar model.

5.4.7 LWL Model

The locally weighted learning (LWL) algorithm is also an instance-based algorithm, but it uses Naïve Bayes for classification problems (Eibe et al., 2003). The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the LWL Model is 51.7%, 50.3%, 52.1%, 93.6%, and 49.05% for one to five years respectively. Moreover, Table 5.16 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the LWL model.

5.4.8 AdaBoostM1 Model

This algorithm belongs to the group of nominal class classifiers and can classify only nominal class problems, using the boosting technique. It resolves the problem of overfitting (Yaov and Robert, 1996). The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the AdaBoostM1 Model is 58.6%, 57.0%, 56.2%, 54.4% and 49.8% for one to five years respectively. Moreover, Table 5.17 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the AdaBoostM1 model.
5.4.9 ClassificationViaRegression Model

This algorithm performs classification by means of regression. The class is binarized and a regression model is built for each class value (Frank et al., 1998). The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the ClassificationViaRegression Model is 50.6%, 47.8%, 67.75%, 48.22%, and 56.9% for one to five years respectively. Moreover, Table 5.18 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the ClassificationViaRegression model.

5.4.10 Decorate Model

Decorate belongs to the meta learner group of algorithms in WEKA. It is regarded as a highly accurate meta algorithm and builds its ensemble of classifiers using specially constructed artificial training examples. Further information about this algorithm can be found in the conference paper by Melville et al. (2003). The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the Decorate Model is 55.0%, 52.6%, 51.8%, 53.4%, and 63.79% for one to five years respectively. Moreover, Table 5.19 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the Decorate model.

5.4.11 Dagging Model

This meta algorithm trains copies of the base classifier on disjoint, stratified folds of the training data. The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the Dagging Model is 62.3%, 59.9%, 63.06%, 61.9% and 63.79% for one to five years respectively. Moreover, Table 5.20 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the Dagging model.

5.4.12 LogisticBoost Model

This meta algorithm also performs classification using regression, with a regression scheme as the base learner. It can also solve multi-class problems in classification data mining. The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the LogisticBoost Model is 54.5%, 62.7%, 69.8%, 45.5% and 66.09% for one to five years respectively. Moreover, Table 5.21 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the LogisticBoost model.

5.4.13 MultiBoostAB Model

This algorithm is a combination of AdaBoost and Wagging. It uses the capabilities of both algorithms to reduce bias and variance.
It is also reported to give a lower error rate than either AdaBoost or Wagging (Geoffrey and Webb, 2000). The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the MultiBoostAB Model is 57.8%, 52.15%, 57.7%, 87.7% and 56.1% for one to five years respectively. Moreover, Table 5.22 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the MultiBoostAB model.

5.4.14 Random Committee Model

This meta algorithm uses randomizable base classifiers. Each base classifier is built using a distinct random seed. The final result is the average of the predictions produced by the individual classifiers. The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the Random Committee Model is 54.0%, 49.5%, 51.9%, 50.4% and 50.4% for one to five years respectively. Moreover, Table 5.23 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the Random Committee model.

5.4.15 HyperPipes Model

This algorithm belongs to the miscellaneous group of WEKA algorithms. It uses HyperPipe classifiers. A hyperpipe is created for each class and contains all the points of that class. A test observation is assigned to the class whose hyperpipe most contains it. The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the HyperPipes Model is 49.6%, 48.50%, 48.60%, 49.3% and 46.9% for one to five years respectively. Moreover, Table 5.24 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the HyperPipes model.

5.4.17 NNge Model

This algorithm is similar to nearest neighbour but uses non-nested generalised exemplars. The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the NNge Model is 50.8%, 50.0%, 52.3%, 44.3% and 48.8% for one to five years respectively. Moreover, Table 5.25 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the NNge model.

5.4.18 OneR Model

This is a very simple classification algorithm that works by creating a 1R classifier. Further information about this algorithm can be obtained from the research article by Holte (1993).


The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the OneR Model is 51.3%, 51.02%, 51.3%, 51.02% and 50.05% for one to five years respectively. Moreover, Table 5.26 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the OneR model.

5.4.19 ZeroR Model

This algorithm belongs to the rules group in WEKA. It is the simplest algorithm and relies completely on the target variable, without taking the predictors into consideration. The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the ZeroR Model is 49.5%, 49.5%, 49.5%, 49.5% and 49.5% for one to five years respectively. Moreover, Table 5.27 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the ZeroR model.

5.4.20 Random Forest Model

This algorithm is a combination of tree predictors in which each tree predictor depends on the values of a random vector. Breiman (2001) gives a complete insight into the random forest algorithm. The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the Random Forest Model is 51.2%, 49.4%, 49.2%, 47.05% and 50.3% for one to five years respectively. Moreover, Table 5.28 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the Random Forest model.

5.4.21 J48 Model

This algorithm is used to produce a pruned or unpruned C4.5 decision tree. The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the J48 Model is 52.5%, 48.6%, 49.5%, 51.0% and 50.9% for one to five years respectively. Moreover, Table 5.29 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the J48 model.

5.4.22 SimpleCart Model

This algorithm belongs to the trees group of algorithms in WEKA and is used to create classification trees using fractional instances. The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the SimpleCart Model is 49.86%, 49.87%, 50.15%, 58.4%, and 53.51% for one to five years respectively. Moreover, Table 5.30 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the SimpleCart model.


5.4.23 END Model

This algorithm belongs to the meta group of algorithms in WEKA. It builds an ensemble of nested dichotomies so that classification problems can be handled with two-class classifiers. The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the END Model is 52.5%, 52.4%, 54.1%, 51.0% and 52.5% for one to five years respectively. Moreover, Table 5.31 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the END model.


Part 3

This part consists of a brief introduction to IBM SPSS and the application of MLP neural networks, different variations of decision trees and the nearest neighbour algorithm to predict bankruptcy.

5.5 IBM SPSS

IBM SPSS is a program developed and designed to perform predictive analytics using different machine learning algorithms. It provides a wide range of algorithms and methods for performing statistical and data mining tasks.

5.5.1 MLP neural network Model

The multilayer perceptron (MLP) is the most extensively used neural network in data analysis and in constructing classifiers (Asil and Shahsavand, 2014).

The multilayer perceptron is based on input layers and hidden units. Every hidden layer accepts a collection of input variables, and the activation function converts the results for the final layer, called the output layer. The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the MLP neural network Model is 100.0%, 86.2%, 94.5%, 58.7% and 51.6% for one to five years respectively. Moreover, Table 5.32 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the MLP neural network model.

5.6 Models implementation using variations of decision trees

As discussed in Chapter 2, there are many types of decision tree algorithms; the most important ones are CHAID, CHAID Exhaustive, CART and QUEST. I now implement these algorithmic models to find the most efficient and reliable model for bankruptcy prediction.

5.6.1 CHAID Model

The CHAID tree-based model was proposed by Kass (1980) to assess the relationship between input variables and target variables. This model divides the explanatory variables into homogeneous subgroups according to the response variable. In its stepwise process, CHAID examines each input (explanatory) variable in turn and identifies the pair of categories that differs least with respect to the target (response) variable. If the difference falls below a particular level (judged by the p-value), the two categories are treated as similar and are combined into one category. The split iteration ensures that the best partition is found for each response (target) variable (Andrea et al., 2014). The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the CHAID Model is 56.2%, 56.1%, 56.0%, 53.0% and 61.2% for one to five years respectively. Moreover, Table 5.33 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the CHAID model.

5.6.2 CHAID Exhaustive Model

The CHAID Exhaustive algorithm was presented by Biggs et al. (1991). This algorithm is based on three steps:

1. Merging
2. Splitting
3. Stopping

In the merging step, the non-significant categories of each explanatory (input) variable are merged, and each final category results in one child node. The merging step also calculates the p-value that is used in the splitting step. The splitting step then finds the best split for each predictor found in the merging step and selects which predictor is to be used to split the node. In the final step, the stopping step halts the tree-growing process:

➢ If the node is pure.
➢ If a further split is not possible.
➢ If the node size is less than the node size specified by the user.
➢ If the split would produce a child node whose size is less than that specified by the user.
➢ If the tree depth reaches the user-specified limit.

The following is the classification accuracy obtained by performing CHAID Exhaustive on the data using SPSS. The overall prediction accuracy considering both bankrupt and non-bankrupt firms using the CHAID Exhaustive Model is 58.5%, 82.2%, 55.2%, 53.0% and 66.3% for one to five years respectively. Moreover, Table 5.34 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the CHAID Exhaustive model.

5.6.3 CART Model

Classification and regression trees (CART) are an intelligent technique for creating a prediction model from the data provided. In CART the models are generated by recursively partitioning the data and fitting a simple model within each partition, so that the result can be represented graphically as a decision tree. Classification trees are built for dependent variables that take a finite number of unordered values, while regression trees are built for continuous or ordered dependent variables (Loh, 2011).

The overall prediction accuracy for both bankrupt and non-bankrupt firms using the CART model is 57.9%, 57.1%, 56.2%, 54.4% and 52.7% for one to five years respectively. Moreover, Table 5.35 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the CART model.

5.6.4 QUEST Model

Quick, Unbiased, Efficient Statistical Tree (QUEST) is a statistical algorithm for classification and data mining proposed by Loh and Shih (1997). The major features of this algorithm are to:
1. Use unbiased variable selection.
2. Use Fisher's LDA technique.
3. Impute missing values.
4. Handle categorical predictor variables with many categories.

The overall prediction accuracy for both bankrupt and non-bankrupt firms using the QUEST model is 94.0%, 82.0%, 78.2%, 50.0% and 50.0% for one to five years respectively. Moreover, Table 5.36 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the QUEST model.

5.6.5 K-NN Model

K-Nearest Neighbour (K-NN) is one of the oldest and simplest non-parametric classification techniques. In K-NN, an object is assigned to the most common class among its k nearest neighbours. In other words, the target is a class membership, and each object is placed into a class by the majority vote of its closest neighbours. The features of K-NN are its simplicity, ease of interpretation and high accuracy rate (Hui et al., 2011).

The overall prediction accuracy for both bankrupt and non-bankrupt firms using the K-NN model is 61.3%, 53.4%, 45.2, % and 47.1% for one to five years respectively. Moreover, Table 5.37 in Appendix-A gives the detailed prediction accuracy for both bankrupt and non-bankrupt firms using the K-NN model.

5.7 Summary

This chapter gives a complete insight into the implementation and generation of the models and the overall classification accuracy of each model using SAS Enterprise Miner, WEKA and IBM SPSS. The next step is to critically analyse these results and select the most efficient model from each data mining package used in this chapter.
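The majority vote behind the K-NN model of Section 5.6.5 can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and k = 3; the distance measure and the value of k used in SPSS are not stated in the text, and the function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two financial ratios per firm.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
train_y = ["bankrupt", "bankrupt", "healthy", "healthy", "healthy"]

print(knn_predict(train_x, train_y, (0.15, 0.15)))
print(knn_predict(train_x, train_y, (0.9, 0.9)))
```

Because the method stores the training data and defers all computation to prediction time, its accuracy depends heavily on how the ratios are scaled and on the choice of k.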

Chapter 6 Results Analysis and Critical Evaluation

6.1 Introduction

This chapter gives a brief description of the Type-I and Type-II errors of bankruptcy prediction models. It also presents the analysis and critical evaluation of the results obtained by applying the models in SAS Enterprise Miner, WEKA and IBM SPSS.

6.2 Type-I Error

A Type-I error, also called an alpha error, occurs when a bankrupt firm is predicted as non-bankrupt. In terms of credit analysis, a Type-I error represents the loss of the loan capital and interest associated with a client that goes bust after being predicted as non-bankrupt. Hence, Type-I errors have a greater cost impact on banks than Type-II errors (Neves and Vieira, 2006). According to Altman et al. (1977), Type-I error costs are 35 times higher for banks than Type-II error costs. According to Neves and Vieira (2006), the overall Type-I error is calculated as:

\[
\text{Type-I Error} = \frac{\text{Number of bankrupt firms predicted as non-bankrupt}}{\text{Number of observations classified as non-bankrupt}}
\]

6.3 Type-II Error

A Type-II error, also called a beta error, occurs when a non-bankrupt firm is predicted as bankrupt. In terms of credit analysis, a Type-II error causes a loss of business, because a potential customer that is healthy was predicted as bankrupt. Type-II error costs could be higher than Type-I error costs if a government decides to impose a formal early warning system. However, Type-I and Type-II costs are not presented in most of the literature and remain largely unrevealed (Neves and Vieira, 2006). According to Neves and Vieira (2006), the overall Type-II error is calculated as:

\[
\text{Type-II Error} = \frac{\text{Number of non-bankrupt firms predicted as bankrupt}}{\text{Number of observations classified as bankrupt}}
\]

6.4 Total Error

The total error of a prediction model is given as the sum of Type-I and Type-II errors divided by the total number of observations in the data. According to Neves and Vieira (2006), the total error is calculated as:

\[
\text{Total Error} = \frac{\text{Type-I Error} + \text{Type-II Error}}{\text{Total number of observations in the data sample}}
\]


6.5 Classification Accuracy

The classification accuracy of a bankruptcy prediction model is generally measured by the percentage of correctly classified observations. The classification accuracy is calculated as (Neves and Vieira, 2006):

\[
\text{Classification Accuracy} = \frac{\text{Total number of correct classifications}}{\text{Total number of observations in the data sample}}
\]
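The four measures defined in Sections 6.2 to 6.5 can be computed directly from lists of actual and predicted labels. The sketch below is illustrative only: the function name `error_rates` and the toy labels are hypothetical, and it assumes that the numerator of the total error is the number of Type-I plus Type-II misclassifications, which is one reading of the formula in Section 6.4.

```python
def error_rates(actual, predicted, positive="bankrupt"):
    """Compute Type-I error, Type-II error, total error and accuracy.

    Type-I:  bankrupt firms predicted as non-bankrupt,
             divided by all observations classified as non-bankrupt.
    Type-II: non-bankrupt firms predicted as bankrupt,
             divided by all observations classified as bankrupt.
    """
    pairs = list(zip(actual, predicted))
    n = len(pairs)
    missed = sum(a == positive and p != positive for a, p in pairs)
    false_alarms = sum(a != positive and p == positive for a, p in pairs)
    predicted_non_bankrupt = sum(p != positive for _, p in pairs)
    predicted_bankrupt = n - predicted_non_bankrupt
    return {
        "type_1": missed / predicted_non_bankrupt if predicted_non_bankrupt else 0.0,
        "type_2": false_alarms / predicted_bankrupt if predicted_bankrupt else 0.0,
        # Assumption: the numerator counts misclassified observations.
        "total_error": (missed + false_alarms) / n,
        "accuracy": (n - missed - false_alarms) / n,
    }

# Hypothetical example: 4 bankrupt and 4 healthy firms, one error of each type.
actual = ["bankrupt"] * 4 + ["healthy"] * 4
predicted = ["bankrupt", "bankrupt", "bankrupt", "healthy",
             "healthy", "healthy", "healthy", "bankrupt"]
rates = error_rates(actual, predicted)
print(rates)
```

Under this reading, total error and classification accuracy always sum to one, so reporting either of them is sufficient.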

6.6 Empirical Results Analysis

After applying the Type-I and Type-II error measures, I calculated the accuracy of each model for every year. This section is divided into three categories according to the software used:
1. Analysis of results of SAS Enterprise Miner models.
2. Analysis of results of WEKA models.
3. Analysis of results of IBM SPSS models.

6.6.1 Analysis of Results of SAS Enterprise Miner Models

The results obtained from the different SAS models show that four SAS Enterprise Miner models predict bankruptcy effectively. Table 6.1 (Part-1) and Figure 6.1, which present the classification accuracy for bankrupt firms over the five years prior to the event, clearly show that four models, NN, HP Neural, Regression and HP Regression, give a bankruptcy prediction accuracy of at least 90% for each year before the event. Table 6.1 (Part-2) and Figure 6.1, which present the classification accuracy for non-bankrupt firms over the same five years, show that the Neural Network and Auto Neural models give a non-bankrupt classification accuracy of more than 90% for each year before the event.

According to Table 6.1 (Part-1), the prediction accuracy of Neural Networks is 95.90%, 97.80%, 95.50%, 95.00% and 95.00% for one to five years before the bankruptcy year respectively, which shows that NN is more efficient than the other three models, as the others fluctuate in some years. Similarly, Table 6.1 (Part-2) shows that the Auto Neural model, which is also a type of NN, gives 93%, 99.5%, 99%, 97.6% and 99% for years one to five respectively.

Researchers in the field of bankruptcy prediction have used different statistical and intelligent methods, but Neural Networks and their variants are the most commonly used intelligent methods (Kumar and Ravi, 2007). Cadden (1991) used a neural network model to predict bankruptcy with a three-years-ahead forecast; his classification accuracy was 90%, 90% and 80% respectively for bankrupt firms and 100%, 90% and 90% for non-bankrupt firms. Moreover, Leshno and Spector (1996) also used the Neural Network method to predict bankruptcy and obtained a prediction accuracy of 76.4% to 76.4% for the two-years-ahead case.

Table 6.1 Bankrupt and non-bankrupt five years ahead prediction accuracy table using SAS Enterprise Miner models

(Part-1) Bankruptcy Prediction Accuracy (%) Prior Event

Model Name       One Year   Two Years   Three Years   Four Years   Five Years
Decision Trees   73.27      60.00       64.50         32.00        39.80
HP Trees         78.44      84.00       71.70         51.00        32.00
Neural Network   95.90      97.80       95.50         95.00        95.00
Auto Neural      94.00      99.50       0             97.80        0
HP Neural        90.00      95.00       95.00         90.00        97.8
DMNeural         59.30      93.70       92.27         74.20        39.80
Regression       98.70      95.00       92.27         94.42        95.90
HP SVM           67.70      40.70       40.70         38.70        33.40
HP Regression    98.00      92.27       95.60         95.60        93.00
MBR              39.80      48.20       49.10         47.20        47.20

(Part-2) Non-Bankruptcy Prediction Accuracy (%) Prior Event

Model Name       One Year   Two Years   Three Years   Four Years   Five Years
Decision Trees   52.00      61.20       94.80         73.50        95.6
HP Trees         50.40      52.90       70.00         73.00        90.20
Neural Network   95.40      97.60       92.00         92.40        93.10
Auto Neural      93.00      99.50       99.00         97.60        99
HP Neural        12.00      0.00        73.00         88.90        12.00
DMNeural         63.57      16.60       13.00         48.00        90.51
Regression       99.50      0.00        0.00          6.20         3.40
HP SVM           49.13      67.70       67.70         70.00        63.57
HP Regression    100.00     0.00        0.00          3.00         3.00
MBR              64.47      75.60       70.00         75.40        71.90

[Chart panels: Bankrupt Prediction Accuracy and Non-Bankrupt Prediction Accuracy, one year to five years before the event, vertical axis 0 to 120]

Figure 6.1 Bankrupt and non-Bankrupt firms prediction Accuracy chart
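The claim that the Neural Network model fluctuates less than the other three strong SAS models can be checked against the Part-1 figures of Table 6.1. The snippet below is a small sketch of that check; the use of the worst-year accuracy and the range (best minus worst year) as stability measures is an assumption made here for illustration, not a measure defined in this study.

```python
# Bankruptcy prediction accuracy (%) for one to five years before the event,
# copied from Table 6.1 (Part-1).
bankrupt_accuracy = {
    "Neural Network": [95.90, 97.80, 95.50, 95.00, 95.00],
    "HP Neural": [90.00, 95.00, 95.00, 90.00, 97.80],
    "Regression": [98.70, 95.00, 92.27, 94.42, 95.90],
    "HP Regression": [98.00, 92.27, 95.60, 95.60, 93.00],
}

# For each model: (worst-year accuracy, range between best and worst year).
summary = {
    model: (min(values), round(max(values) - min(values), 2))
    for model, values in bankrupt_accuracy.items()
}

# The model with the smallest range is the most stable across the five years.
most_stable = min(summary, key=lambda model: summary[model][1])

for model, (worst, spread) in summary.items():
    print(f"{model}: worst year {worst:.2f}%, range {spread:.2f}")
print("Most stable:", most_stable)
```

On these figures the Neural Network has both the highest worst-year accuracy and the narrowest range of the four models.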

6.6.2 Analysis of Results of WEKA

The results obtained from the WEKA data mining algorithms show that WEKA is also very good software for performing classification with different algorithms. Table 6.2 (Part-1) and Figure 6.2 clearly show that SimpleCart, RBFNetwork and MultiBoostAB are the most efficient algorithms for predicting bankruptcy. The prediction accuracy of the SimpleCart algorithm is 89.60%, 70.00%, 89.80%, 86.40% and 85.70% for one to five years respectively in the five-years-ahead forecast of bankruptcy. The MultiBoostAB algorithm also shows good prediction accuracy of 82.10%, 76.29% and 80.40% for the first, second and fourth years, but its classification accuracy is below 70% for the third and fifth years. Figure 6.2 also shows that RBFNetwork is a very good predictor of bankruptcy, with a prediction rate of over 70% in the first two years, over 90% in the third and fifth years, and 62.9% in the fourth year.

The forecast for non-bankrupt firms is handled efficiently by the OneR, HyperPipes and Dagging algorithms. Table 6.2 (Part-2) and Figure 6.3 show that the classification accuracy of OneR is over 95.0% for every year of the five-years-ahead forecast. It can also be observed that the non-bankrupt classification accuracy of the HyperPipes algorithm is more than 80% for the first four years and 77.6% for the fifth year.

[Charts: prediction accuracy (0% to 150%) for each WEKA model, one series per year from one year to five years before the event]

Figure 6.2 Bankrupt firms five years ahead prediction accuracy using WEKA models chart

Figure 6.3 non-Bankrupt firms five years prediction accuracy using WEKA models chart


Table 6.2 Bankrupt and non-bankrupt firms five years ahead prediction accuracy table using WEKA models

(Part-1) Bankruptcy Prediction Accuracy Prior Event

Model Name                    One Year   Two Years   Three Years   Four Years   Five Years
Naïve Bayes                   92.00%     6.20%       79.70%        93.75%       92.80%
BayesNet                      100.00%    6.20%       78.00%        57.10%       57.30%
SMO                           73.70%     58.20%      62.50%        51.50%       55.60%
RBFNetwork                    76.70%     75.40%      92.30%        62.90%       95.00%
KSTAR                         100%       50.20%      49.80%        49.80%       52.80%
LWL                           81.50%     61.20%      74.80%        91.60%       10.60%
AdaBoostM1                    53.20%     51.00%      53.20%        83.80%       45.90%
ClassificationviaRegression   32.30%     29.31%      62.50%        18.70%       24.70%
Decorate                      52.80%     92.70%      23.70%        55.20%       88.20%
Dagging                       50.21%     37.90%      44.60%        52.80%       43.10%
LogisticBoost                 68.10%     73.92%      68.90%        70.00%       60.30%
MultiBoostAB                  82.10%     76.29%      69.40%        80.40%       61.80%
Random Committee              56.50%     51.30%      49.50%        53.20%       53.20%
HyperPipes                    19.20%     16.80%      16.80%        19.50%       16.20%
NNge                          53.01%     55.80%      57.90%        46.90%       53.50%
OneR                          6.00%      4.70%       6.00%         4.70%        5.30%
ZeroR                         39.60%     39.60%      39.60%        39.60%       39.60%
Random Forest                 55.20%     32.80%      42.00%        47.10%       36.60%
J48                           64.10%     40.80%      28.40%        54.00%       49.70%
SimpleCart                    89.60%     70.00%      89.80%        86.40%       85.70%
END                           64.00%     60.00%      66.20%        49.80%       64.00%

(Part-2) Non-Bankruptcy Prediction Accuracy Prior Event

Model Name                    One Year   Two Years   Three Years   Four Years   Five Years
Naïve Bayes                   79.70%     96.30%      92.00%        10.50%       94.60%
BayesNet                      79.70%     96.30%      98.00%        45.20%       43.00%
SMO                           49.70%     51.90%      59.40%        53.20%       45.90%
RBFNetwork                    46.70%     79.50%      34.60%        62.50%       81.70%
KSTAR                         100%       47.40%      54.50%        51.00%       47.60%
LWL                           21.90%     46.70%      29.50%        95.70%       87.50%
AdaBoostM1                    64.00%     64.00%      37.20%        25.00%       49.80%
ClassificationviaRegression   68.90%     66.40%      73.06%        78.50%       89.22%
Decorate                      57.30%     12.50%      79.90%        51.50%       16.60%
Dagging                       74.40%     81.90%      81.50%        71.20%       84.50%
LogisticBoost                 44.80%     51.50%      70.90%        21.20%       72.20%
MultiBoostAB                  33.60%     34.00%      46.10%        95.20%       50.31%
Random Committee              51.50%     47.60%      54.30%        47.60%       50.40%
HyperPipes                    80.20%     80.20%      80.30%        80.20%       77.60%
NNge                          48.70%     44.30%      46.70%        41.59%       44.10%
OneR                          96.70%     97.50%      96.70%        96.70%       95.01%
ZeroR                         59.50%     59.50%      59.50%        59.50%       59.50%
Random Forest                 47.20%     66.16%      56.40%        47.00%       64.20%
J48                           40.90%     56.40%      70.60%        48.00%       52.10%
SimpleCart                    10.10%     29.74%      10.50%        30.40%       21.30%
END                           40.90%     44.80%      40.90%        52.20%       40.90%

6.6.3 Analysis of results of IBM SPSS models

The results obtained from the SPSS models demonstrate that SPSS can also be used to predict the bankruptcy of a firm effectively. Table 6.3 (Part-1) and Figure 6.4 illustrate that the Multi-Layer Perceptron Neural Network (MLP Neural Network) is the most effective model for predicting bankruptcy. The prediction accuracy of this model is 100.00%, 90.40%, 98.10%, 74.40% and 32.10% for the first to fifth year forecasts respectively. It can also be observed that the Classification and Regression Tree (CART) model takes second position in the prediction of bankruptcy. The classification accuracy of the CART model is 84.90%, 72.20%, 86.20%, 83.80% and 95.00% for one to five years before bankruptcy respectively.

Non-bankrupt firms are also predicted well by the MLP Neural Network model. Table 6.3 (Part-2) and Figure 6.4 present its classification accuracy for non-bankrupt firms as 100.00%, 82.00%, 91.00%, 42.50% and 72.20% for one to five years correspondingly. Figure 6.4 demonstrates that the Quick, Unbiased, Efficient Statistical Tree (QUEST) also provides a good classification accuracy for non-bankrupt firms of 88.10%, 56.20%, 100.00% and 100.00% for the first four years, and 0% for the fifth year.

Table 6.3 Bankrupt and non-bankrupt firms five years prediction accuracy table using SPSS

(Part-1) Bankruptcy Prediction Accuracy Prior

Model Name           One Year   Two Years   Three Years   Four Years   Five Years
MLP neural network   100.00%    90.40%      98.10%        74.40%       32.10%
CHAID                78.90%     41.80%      85.30%        12.90%       59.30%
CHAID Exhaustive     65.10%     65.00%      75.20%        12.90%       91.40%
CART                 84.90%     72.20%      86.20%        83.80%       95.00%
QUEST                88.10%     65.00%      56.20%        0.00%        100.00%
K-NN                 61.40%     57.60%      51.80%        61.20%       49.10%

(Part-2) Non-Bankruptcy Prediction Accuracy Prior

Model Name           One Year   Two Years   Three Years   Four Years   Five Years
MLP neural network   100.00%    82.00%      91.00%        42.50%       72.20%
CHAID                33.60%     70.50%      26.70%        93.10%       63.10%
CHAID Exhaustive     51.90%     100.00%     35.10%        93.10%       41.20%
CART                 30.80%     42.00%      26.30%        25.00%       10.30%
QUEST                88.10%     56.20%      100.00%       100.00%      0.00%
K-NN                 50.30%     51.10%      42.70%        47.20%       44.90%

[Chart panels: Bankrupt Prediction Accuracy and non-bankrupt prediction accuracy, one year to five years before the event, vertical axis 0.00% to 120.00%]

Figure 6.4 Bankrupt and non-bankrupt firms prediction accuracy

6.7 Critical Evaluation

Each data mining package used in this empirical study has a pitfall. Despite the various advantages of SAS Enterprise Miner, it has one disadvantage: it works on nodes and does not specify the name of the algorithm used to develop the model. WEKA resolves this problem, but it has a problem of its own, namely that it does not provide a graphical user interface. Both of these problems are eliminated by IBM SPSS, but I did not have access to the complete IBM SPSS Modeler data mining package.

The dataset samples used in this study were also a big hindrance in applying the different data mining techniques. All data samples had a great deal of missing values. Although I applied the IBM SPSS imputation technique to overcome this drawback, I am not sure that all the values were imputed effectively by IBM SPSS.

The final drawback of this approach is that it cannot detect human error and fraud. Financial statements are prepared by accountants and other company staff. If they do not give correct information about the company's ratios, these models are unable to predict the bankruptcy of the company. So, if the financial ratios are faulty, the result will be faulty as well.

6.8 Summary

This chapter contains the results of all the major software used in this study. I have concluded that all the models of the software used in this study have their particular importance in the field of bankruptcy prediction. The most important models for predicting bankruptcy using SAS Enterprise Miner are Neural Network, Auto Neural, Regression and HP Regression. The most efficient models for forecasting bankruptcy using WEKA are SimpleCart, RBFNetwork, OneR and MultiBoostAB. For IBM SPSS, the most reliable models for classifying bankruptcy are MLP Neural Network, CART and QUEST. Finally, the main pitfall in the study is the missing values in the data.

Chapter 7 Conclusion and Future Directions

7.1 Conclusions

In this study I have used a variety of data mining classification methods to deal with bankruptcy prediction. I have applied numerous data mining models, using the most commonly used software, to predict bankruptcy more effectively and accurately. This dissertation had three major objectives, to be achieved using five years of prior financial ratios of 464 bankrupt and 464 non-bankrupt firms. Firstly, to develop different data mining models to predict bankruptcy using three data mining packages: SAS Enterprise Miner, WEKA and IBM SPSS. Secondly, to apply these models and analyse the accuracy of each model separately. Thirdly, to identify the most accurate model provided by each data mining package individually.

The first motivation of this study was to understand the financial distress that leads to bankruptcy, the effects of bankruptcy, the cost of bankruptcy and the factors involved in bankruptcy. The second motivation was to find the most commonly used data mining models from 1932 to the present and apply those models to test their accuracy. A vast amount of research has been carried out in the field of bankruptcy prediction because of the importance of the topic. Nevertheless, each research study has used only a few machine learning or statistical methods to predict bankruptcy.

Developing an effective data mining classification model is a very significant but difficult task for financial organisations. These prediction models test whether or not a new individual or company will go bankrupt. If the classification accuracy of these prediction models is poor, this can lead to wrong decisions and cause huge financial loss (Tsai et al., 2014).
To achieve the goals of my study mentioned above, I developed 6 chapters, each of which is a building block towards my goal: Chapter 1 is the introduction, Chapter 2 is the literature review, Chapter 3 defines bankruptcy and its costs, Chapter 4 gives a complete insight into the data and test samples used, Chapter 5 gives a detailed description of the development and application of each model using SAS EM, WEKA and SPSS, and Chapter 6 provides a critical evaluation of these models.

After carrying out extensive research in the field of bankruptcy prediction, I have understood the importance of an effective model for bankruptcy prediction. Furthermore, bankruptcy is an important phenomenon for any company, big or small. Finally, I concluded that most researchers used only one or two methods to predict bankruptcy. So, I chose to apply a variety of data mining models using SAS Enterprise Miner, WEKA and IBM SPSS.

Then, to give the reader a better understanding of bankruptcy, corporate financial distress, the actual cause of bankruptcy, was defined and elaborated. Moreover, the different stages of financial distress, the factors of financial distress, and the causes and results of corporate distress were discussed. Later on, bankruptcy was defined and four types of costs associated with bankruptcy were illustrated.

In the next step, data was gathered from the FAME (Financial Analysis Made Easy) database. This data was cleansed and pre-processed by applying statistical techniques and tools. Missing values were minimized using the SPSS missing value imputation technique. Outliers were handled using the winsorization method. In addition, the data was divided into five different data sets, one for each year prior to the bankruptcy year. Research in the field of bankruptcy prediction shows that the selection of financial identifiers (ratios) is also a very important factor in creating an effective model. If significant identifiers are not selected, the results of the developed model will not be accurate. Keeping this in mind, I chose the 41 financial ratios most commonly used in various research studies from different ratio groups: liquidity, leverage, solvability, profitability, efficiency and cash flow.

Chapter 5 consists of three parts. Part 1 elaborates the step-by-step procedure of model development using SAS EM. I developed 11 models using the Decision Trees, HP Trees, Neural Network, Auto Neural, HP Neural, DMNeural, Regression, HP SVM, HP Regression and Memory Based Reasoning (MBR) nodes of SAS Enterprise Miner and implemented these models on the five distinct yearly data samples. The best bankruptcy prediction models using SAS EM are Neural Network, Auto Neural, Regression and HP Regression.
Later on, I illustrated the step-by-step process of model generation using WEKA and developed 21 distinct models using the Naïve Bayes, BayesNet, SMO, RBFNetwork, KSTAR, LWL, AdaBoostM1, ClassificationviaRegression, Decorate, Dagging, LogisticBoost, MultiBoostAB, Random Committee, HyperPipes, NNge, OneR, ZeroR, Random Forest, J48, SimpleCart and END algorithms. The best bankruptcy prediction models using WEKA are SimpleCart, RBFNetwork, OneR and MultiBoostAB. Finally, I gave a step-by-step plan of model development using SPSS. I proposed 6 individual models using IBM SPSS: MLP neural network, CHAID, CHAID Exhaustive, CART, QUEST and K-NN. The best classification accuracy for predicting bankruptcy is given by the MLP Neural Network.

Finally, Chapter 6 critically evaluates the results provided by each software package and model separately. It is concluded that the classification accuracy of the Neural Network model is higher than that of all the other models. In the case of SAS EM, the NN model provided results of 95.90%, 97.80%, 95.50%, 95.00% and 95.00% in bankruptcy prediction using five years of prior ratios of the firms, and the Auto Neural model provided classification accuracy of 93%, 99.5%, 99%, 97.6% and 99% for non-bankrupt firms (Table 6.1, Part-2). Moreover, using WEKA, the SimpleCart data mining (DM) algorithm provided 89.60%, 70.00%, 89.80%, 86.40% and 85.70% classification accuracy for one to five years respectively, while the RBFNetwork algorithm, which works with hidden layers, provided 76.70%, 75.40%, 92.30%, 62.90% and 95.00% bankruptcy prediction accuracy on five years of financial ratios of different firms. Finally, the MLP neural network model of IBM SPSS provided classification accuracy of 100.00%, 90.40%, 98.10%, 74.40% and 32.10% for one to five years respectively.

Neural network models hold a significant place in the history of bankruptcy prediction studies. Research on the application of NN models to financial distress prediction began in the 1990s and remains active today. For two decades, researchers have verified the supremacy of NN models over numerous statistical models such as MDA, logistic regression and k-NN (Jeong et al., 2012). This dissertation also acknowledges the supremacy of NN models over other data mining models.

7.2 Future Directions

In this dissertation, I have employed about 37 distinct data mining classification models using SAS EM, WEKA and IBM SPSS, whereas many researchers have used only one or two prediction models. I have come to the conclusion that NN models and their variants are the most effective models for predicting bankruptcy. In future, it would be a fascinating subject to predict bankruptcy using different financial statements instead of financial ratios.
Bankruptcy prediction for five years ahead has been carried out in this study using numerous data mining models, but financial statements, such as balance sheets, income statements and statements of cash flows, could also be used in the near future to predict bankruptcy. Moreover, the models could also be used to predict the bankruptcy of individuals. I have applied many data mining models in this study, but many other methods are also available. Future research can therefore be conducted to predict bankruptcy by applying data mining directly to financial statements rather than to financial ratios.


Bibliography

Guoqiang Zhang, Michael Y. Hu, B. Eddy Patuwo, Daniel C. Indro, 1999. Artificial neural networks in bankruptcy prediction: General framework and cross-validation analysis. European Journal of Operational Research, 116(1), pp. 16-32.
H. Kurniawan, Peter Nwe, Kok Thai, P. Ravi Kumar, V. Ravi, 2008. Soft computing system for bank performance prediction. Applied Soft Computing, 8(1), pp. 305-315.
SAS Institute Inc., 2003. Data Mining Using SAS® Enterprise Miner™: A Case Study Approach. Second Edition ed. Cary, NC: SAS Institute Inc.
A. Aamodt, E. Plaza, 1994. Case-based reasoning: foundational issues, methodological variations, and system approaches. AI Communications, 7(1), pp. 39-59.
A. Garmroodi Asil, A. Shahsavand, 2014. Reliable estimation of optimal sulfinol concentration in gas treatment unit via novel stabilized MLP and regularization network. Journal of Natural Gas Science and Engineering, Volume 21, pp. 791-804.
A. Vellido, P. Lisboa and J. Vaughan, 1999. Neural Networks in Business: A survey of Applications (1992-1998). Expert Systems with Applications, Volume 17, pp. 51-71.
A.I. Dimitras, S.H. Zanakis, C. Zopounidis, 1996. A survey of business failures with an emphasis on prediction methods and industrial applications. European Journal of Operational Research, I(90), pp. 487-513.
Altman E., R. Haldeman, P. Narayanan, 1977. Zeta analysis: A new model to identify bankruptcy risk of corporations. Journal of Banking and Finance, 1(1), pp. 29-51.
Altman, E., B. Loris, 1976. A financial early warning system for over-the-counter broker-dealers. Journal of Finance, 4(12), pp. 1201-1217.
Altman, E.I., Hotchkiss E., 2005. Corporate Financial Distress and Bankruptcy: Predict and Avoid Bankruptcy, Analyze and Invest in Distressed Debt. 3rd ed. New Jersey: John Wiley & Sons.
Altman, E.I., 1968. Financial Ratios, Discriminant Analysis and the prediction of corporate bankruptcy. Journal of Finance, 4(1968), pp. 589-609.
Altman, E. I., 1984. A further Empirical Investigation of the bankruptcy cost question. The Journal of Finance, XXXIX(4), pp. 1067-1089.
Andrea Bichler, Arnold Neumaier, Thilo Hofmann, 2014. A tree-based statistical classification algorithm (CHAID) for identifying variables responsible for the occurrence of faecal indicator bacteria during waterworks operations. Journal of Hydrology, 519 Part A(27), pp. 909-917.
Anon., 2014. The Street. [Online] Available at: http://www.thestreet.com/gallery/tsc-bankruptcy2-decade/0/photo-closed.html [Accessed 09 09 2014].
Arindam Chaudhuri and Kajal De, 2011. Fuzzy Support Vector Machine for Bankruptcy Prediction. Applied Soft Computing, 11(2), pp. 2472-2486.

Arjana Brezigar-Masten, Igor Masten, 2012. CART-based selection of bankruptcy predictors for the logit model. Expert Systems with Applications, Volume 39, pp. 10153-10159.
B. Matarazzo, R. Slowinski and S. Greco, 2002. Rough approximation by dominance relations. International Journal of Intelligent Systems, 17(2), pp. 153-171.
B. Wong, T. Bodnovich and Y. Selvi, 1997. Neural network applications in Business: A review and analysis of the literature (1988-95). Decision Support Systems, Volume 19, pp. 301-320.
Bankruptcy prediction with rough sets. (2001) ERIM Report Series Research in Management (ERS2001-11-LIS).
Beaver, W., 1966. Financial Ratios as predictors of failure. Journal of Accounting Research, 3(1966), pp. 71-111.
Biggs, D., Ville, B., and Suen, E., 1991. A Method of Choosing Multiway Partitions for Classification and Decision Trees. Journal of Applied Statistics, 18(1), pp. 49-62.
Blum, M., 1974. Failing company discriminant analysis. Journal of Accounting Research, 1(12), pp. 125.
Bose, I., 2006. Deciding the financial health of dot-coms using rough sets. Information & Management, 43(7), pp. 835-846.
Branch, B., 2002. A cost of bankruptcy: A review. International Review of Financial Analysis, Volume 11, pp. 39-57.
Breiman, L., 2001. Random Forests. Machine Learning, Volume 45, pp. 5-32.
Bris, A., Welch, I., Zhu, N., 2006. The costs of bankruptcy: Chapter 7 liquidation versus Chapter 11 reorganization. Journal of Finance, Volume 61, pp. 1253-1303.
Bryant, S. M., 1997. A case-based reasoning approach to bankruptcy prediction modeling. Intelligent Systems in Accounting, Finance and Management, Volume 6, pp. 195-214.
Büker, S., Asikoglu, R., Sevil, G., 1997. Finansal Yönetim. 2nd ed. Eskişehir: Anadolu Üniversitesi.
C. Kao, S.-T. Liu, 2004. Prediction bank performance with financial forecasts: A case of Taiwan commercial banks. Journal of Banking & Finance, Volume 28, pp. 2353-2368.
Castagna, A. a. Z. M., 1981. The prediction of corporate failure: Testing the. Australian Journal of Management, 1(6), pp. 23-50.
Chen, M.-Y., 2012. Visualization and dynamic evaluation model of corporate financial structure with self-organizing map and support vector regression. Applied Soft Computing, 12(8), pp. 2274-2288.

Chen, Y.-S., 2012. Classifying credit ratings for Asian banks using integrating feature selection and the CPDA-based rough sets approach. Knowledge-Based Systems, Volume 26, pp. 259-270.
Chih-Fong Tsai, Jhen-Wei Wu, 2008. Using neural network ensembles for bankruptcy prediction and credit scoring. Expert Systems with Applications, Volume 34, pp. 2639-2649.
Chih-Fong Tsai, Yu-Feng Hsu, David C. Yen, 2014. A comparative study of classifier ensembles for bankruptcy prediction. Applied Soft Computing, Volume 24, pp. 977-984.
Chih-Hung Wu, Gwo-Hshiung Tzeng, Yeong-Jia Goo, Wen-Chang Fang, 2007. A real-valued genetic algorithm to optimize the parameters of support vector machine for predicting bankruptcy. Expert Systems with Applications, 32(2), pp. 397-408.
Ching-Chiang Yeh, Der-Jang Chi and Ming-Fu Hsu, 2010. A hybrid approach of DEA, rough set and support vector machines for business failure prediction. Expert Systems with Applications, 37(2), pp. 1535-1541.
Chuang, C.-L., 2013. Application of hybrid case-based reasoning for enhanced performance in bankruptcy prediction. Information Sciences, Volume 236, pp. 174-185.
Chudson, W., 1945. The Pattern of Corporate Financial Structure. New York: National Bureau of Economic Research.
Chulwoo Jeong, Jae H. Min, Myung Suk Kim, 2012. A tuning method for the architecture of neural network models incorporating GAM and GA as applied to bankruptcy prediction. Expert Systems with Applications, 39(3), pp. 3650-3658.
Curram, S. P., & Mingers, J., 1994. Neural networks, decision trees induction and discriminant analysis: An empirical comparison. Journal of the Operational Research Society, 4(45), pp. 440-450.
David J. Denis, Diane K. Denis, 1995. Causes of financial distress following leveraged recapitalizations. Journal of Financial Economics, 37(2), pp. 129-157.
David L. Olson, Dursun Delen, Yanyan Meng, 2012. Comparative analysis of data mining methods for bankruptcy prediction. Decision Support Systems, 52(2), pp. 464-473.
Deakin, E. E., 1972. A Discriminant Analysis of Predictors of Business Failure. Journal of Accounting Research, 1(10), pp. 167-179.
Demir, H., 1997. İşletmelerde Başarısızlığın Nedenleri ve Çıkış Yolları, Dış Ticaret Dergisi, 6. 6 ed. s.l.:s.n.

Dhiren Ghosh and Andrew Vogt, 2012. Outliers: An Evaluation of Methodologies. Section on survey Research Methods., pp. 3455-3460. E. Frank, Y. Wang, S. Inglis, G. Holmes, I.H. Witten, 1998. Using model trees for classification. Machine Learning, 32(1), pp. 63-76. E. Turban, J.E. Aronson, 2001. Decision Support Systems and Intelligent Systems.. 6th ed. Upper Saddle River, NJ: Prentice Hall. E.I.Altman,E. Hotchkiss, 2005. Corporate Financial Distress and Bankruptcy : predict and avoid bankruptcy. 3rd ed. New Jersey.: John Wiley & Sons . Edmister, R., 1972. An Empirical test of financial ratio analysis for small business failurer prediction. Journal of financial and quantitative analysis, 2(7), pp. 1477-1493. Eibe Frank, Mark Hall, Bernhard Pfahringer, 2003. Locally Weighted Naive Bayes. In: 19th Conference in Uncertainty in Artificial Intelligence, 249-256,. New York, s.n. Eisenbeis, R., 1977. Pitfalls in the application of discriminant analysis in business and economics.. The journal of Finance, Issue 32, pp. 875-900. Elam, R., 1975. The Efforts of lease data on the predictive ability of financial ratios.. The accounting Review., pp. 25-43. Erkki K. Laitinen, Teija Laitinen, 2000. Bankruptcy prediction Application of the Taylor's expansion in logistic regression.. International Review of Financial Analysis, 9(4), pp. 327-349. F.E.H. Tay, L. Cao, 2001. Modified support vector machines in financial time series forecasting. Omega, 29(4), pp. 309-317. F.E.H. Tay, L. Cao, 2002. Modified support vector machines in financial time series forecasting. Neurocomputing, 48(1-4), pp. 847-861. Fang-Mei Tseng , Yi-Chung Hub, 2010. Comparing four bankruptcy prediction models: Logit, quadratic interval logit,neural and fuzzy neural networks.. Expert Systems with Applications., Volume 37, pp. 1846-1853. Fang-Mei Tseng, L. Lin, 2005. A quadratic interval logit model for forecasting bankruptcy.. Omega The international Journal of Management Science., Volume 33, pp. 85-91. 
Fitzpatrick, P., 1932. A comparison of ratios of successful industrial enterprises with those of failed companies. s.l.:s.n. Foreman, R. D., 2003. A logistic analysis of bankruptcy within the US local. Journal of Economics and Business, Volume 55, pp. 135-166. Francis E.H. Tay and Lixiang Shen, 2002. Economic and financial prediction using rough sets model. European Journal of Operational Research, 141(3), pp. 641-659.


Frank, J. ,. &. T. W., 1994. A comparison of Financial restructuring is distress exchanges and chapter 11 reorgnization.. Journal of Financial Economics., Volume 27, pp. 315-353. G. Zhang, M. Hu, and B. Patuwo et al., 1999. Artificial neural networks in bankruptcy prediction: General framework and cross-validation analysis.. European Journal operational research., Volume 116, pp. 16-32. Gaughan, P., 2011. Merger, Acquisitions and Corporate Restructuring. 3rd ed. New York: Jhon Wiley. Geoffrey I. Webb, 2000. MultiBoosting: A Technique for Combining Boosting and Wagging. Machine Learning., 40(2), pp. 1-50. Gilson, S. C. and Vetsuypens, M. R, 1994. CEO Compensation in Financial Distressed Firms: An Empirical Analysis.. The Journal of Finance., 48(2), pp. 425-458. Gleb Lanine , Rudi Vander Vennet, 2006. Failure prediction in the Russian bank sector with logit and trait recognition models.. Expert Systems with Applications., Volume 30, pp. 463-478. Gordini, N., 2014. A genetic algorithm approach for SMEs bankruptcy prediction: Empirical evidence from Italy. Expert Systems with Applications, 41(14), p. 6433–6445. Grablowsky, B.J. and Talley, W.K., 1981. "Probit and discriminant factors for classifying credit applicants: A comparison.. Journal of Economics and Business, Volume 33, pp. 254-261. Grammatikos, T., and Gloubos, G., 1984. Predicting bankruptcy of industrial firms in Greece.Spoudai,. The University of Piraeus Journal of Economics and Business Statistics and operations Research, pp. 3-4, 421-443. Guoqiang Zhang, Michael Y. Hu, Eddy Patuwo, Daniel C. Indro, 1999. Artificial neural networks in bankruptcy prediction: General framework and cross-validation analysis. European Journal of Operational Research, 116(1), pp. 16-32. H. Frydman, E.I. Altman, D. Kao, 1985. Introducing recursive partitioning for financial classification: The case of financial distress.. Journal of Finance, 1(40), p. 269–291. H., I., 1984. Corporate distress in Australia. 
Journal of Banking and Finance, Issue 8, pp. 303-320. H.Tisshaw, R. T. a., 1977. Going, Going, Gone - Four Factors Which Predict. Accountancy, p. 50. Han, C.-S. P. a. I., 2002. A case-based reasoning with the feature weights derived by analytic hierarchy process for bankruptcy prediction. Expert Systems with Applications, 23(3), pp. 255-264. Hansen, M., Madow, W., and Tepping, B., 1983. An Evaluation of Model-Dependent and Probability Sampling Inferences in Sample Surveys. J. Amer. Stat. Assoc., Volume 78, pp. 776-793. Hanweck, G., 1977. Predicting bank failures. Research Papers in Banking and Financial Economics, Financial Studies Section, Board of Governors of the Federal Reserve System. Washington D.C.: s.n. Hashi, I., 1997. The Economics of Bankruptcy, Reorganization and Liquidation. Lessons for East European Transition Economics. Russian and East European Finance and Trade, 33(4), pp. 6-34.

Hernan Pedro Vigier and Antonio Terceno, 2008. A model for the prediction of disease of firms by means of fuzzy relations.. Fuzzy sets and systems., 159(17), pp. 2299-2316. Holte, R., 1993. Very simple classification Rules Perform well on most commonly used datasets.. Machine Learning, Volume 11, pp. 63-91. Hsueh-Ju Chen, Shaio Yan Huang and Chin-Shien Kin, 2009. Alternative Diagnosis of corporate bankruptcy: A neuro fuzzy approach.. Expert Systems with appications., 36(4), pp. 7710-7720. Hui Li , Young-Chan Lee , Yan-Chun Zhou , Jie Sun , 2011. The random subspace binary logit (RSBL) model for bankruptcy prediction.. Knowledge-Based Systems, Volume 24, pp. 1380-1388. Hui-Ling Chen, Bo Yang, Gang Wang, Jie Liu, Xin Xu, Su-Jing Wang, Da-You Liu, 2011. A novel bankruptcy prediction model based on an adaptive fuzzy k-nearest neighbor method. KnowledgeBased Systems., 24(8), pp. 1348-1359. Hui-Ling Chen, Bo Yang, Gang Wang, Jie Liu, Xin Xu, Su-Jing Wang, Da-You Liu, 2011. A novel bankruptcy prediction model based on an adaptive fuzzy k-nearest neighbor method. KnowledgeBased Systems, 24(8), pp. 1348-1359. Hyunchul Ahn and Kyoung-jae Kim, 2009. Bankruptcy prediction modeling with hybrid case-based reasoning and genetic algorithms approach.. Applied Soft Computing, 9(2), pp. 599-607. I.M. Premachandra , Gurmeet Singh Bhabra , Toshiyuki Sueyoshi, 2009. DEA as a tool for bankruptcy assessment: A comparative study with logistic regression technique.. European Journal of Operational Research., Volume 193, pp. 412-424. Ibe, O. C., 2014. Introduction to Descriptive Statistics.. 2nd ed. Elsevier Inc.: Academic Press . Ivica Pervan, Maja Pervan, Bruno Vukoja, 2011. PREDICTION OF COMPANY BANKRUPTCY USING STATISTICAL. Croatian Operational Research Review, Volume 2, pp. 158-167. J. C. NEVES, A. VIEIRA, 2006. Improving Bankruptcy Prediction with Hidden Layer Learning Vector Quantization.. European Accounting Review, 15(2), pp. 253-271. J. Levy, E. Mallach, P. Duchessi, 1991. 
A fuzzy logic evaluation system for commercial loan analysis. Omega, International Journal of Management Science, 19(6), pp. 651-669. J. Peltonen, S. Kaski, J. Sinkkonen, 2001. Bankruptcy analysis with self-organizing maps in learning metrics. IEEE Transactions on Neural Networks, 12(4). J.P. Ignizio, J.R. Soltyas, 1996. Simultaneous design and training of ontogenic neural network classifier. Computers Operations Research, 23(6), pp. 535-546. Jackendoff, N., 1962. A Study of Published Industry Financial and Operating Ratios. Philadelphia: Temple University, Bureau of Economic and Business Research. Jae H. Min, Young-Chan Lee, 2005. Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters. Expert Systems with Applications, 28(4), pp. 603-614.


Jardin, P. d., 2014. Bankruptcy prediction using terminal failure processes. European Journal of Operational Research. Jie Sun and Hui Li, 2009. Financial distress early warning based on group decision making. Compters and Operational Research., Volume 36, pp. 885-906. Jodi Bellovary, Don Giacomino, Michael Akers, 2007 . A Review of Bankruptcy Prediction Studies: 1930 to Present. Journal of Financial Education, Volume 33, pp. 1-42. Johan Huysmansa,Bart Baesens,Jan Vanthienen, Tony van Gestel , 2006. Failure prediction with self organizing maps. Expert Systems with Applications, 30(3), p. 479–487. John G. Cleary, Leonard E. Trigg, 1995. K*: An Instance-based Learner Using an Entropic Distance Measure. In: 12th International Conference on Machine Learning. 108-114. s.l., s.n. Junyoung Heo and Jin Yong Yang , 2014. AdaBoost based bankruptcy forecasting of Korean construction companies.. Applied Soft Computing, Volume 24, pp. 494-499. K.C. Lee, I. Han, Y. Kwon, 1996. Hybrid neural network models for bankruptcy predictions. Decision Support Systems, Volume 18, pp. 63-72. K.F. Lam, J.W. Moy, 2002. Combining discriminant methods in solving classification problems in twogroup discriminant analysis. European Journal of Operational Research, Volume 138, pp. 294-301. K.Kim, 2004. Financial time series forecasting using support vector machines. Neurocomputing , Volume 55, pp. 307-319. K.S Shin T.S Lee H.J Kim, 2005. An application of support vector machines in bankruptcy prediction model. Expert Systems with Applications, Volume 28, pp. 127-135. Kalay, A., Singhal, R., Tashjian, E, 2007. Is Chapter 11 costly?. Journal of Financial Economics., Volume 84, pp. 772-796. Kaplan, S., 1994. Campeau's Acquisition of federated post-bankruptcy results.. Journal of Financial Economicss., Volume 35, pp. 123-136. Karels, G. V. and Prakash, A. P., 1987. Multivariate Normality and Forecasting of Business Bankruptcy.. Journal of Business Finance and Accounting. , 14(4), pp. 573-593. 
Kass, G., 1980. An Exploratory Technique for Investigating Large Quantities of Categorical Data. Applied Statistics, 29(2), pp. 119-127. Kathleen McMillan and Jonathan Dundee, 2011. How to Write Dissertation and Project Reports. 2nd ed. Dundee: Pearson. Keasey, K. and R. Watson, 1986. The prediction of small company failure: Some behavioral evidence for the UK. Accounting and Business Research, Issue 17, pp. 49-57. Keskin, Y., 2002. İşletmelerde Finansal Başarısızlığın Tahmini, Çok Boyutlu Model Önerisi ve Uygulaması, Doktora Tezi, Hacettepe Üniversitesi. s.l.:s.n.

Ketz, J. E., 1978. The effect of general price-level adjustments on the predictive ability of. Journal of Accounting Research, Supplement(16), pp. 273-284. Kiviluoto, K., 1998. Predicting bankruptcies with the self-organizing map. Neurocomputing, 21(1-3), p. 191–201. Kolodner, J., 1991. Improving human decision making through case-based decision aiding.. AI Magazine, 12(2), pp. 52-68. Korol, T., 2014. A fuzzy logic model for forecasting exchange rates.. Knowledge-Based Systems, Volume 67, pp. 49-60. Kyung-Shik Shin, Taik Soo Lee, Hyun-jung Kim, 2005. An application of support vector machines in bankruptcy prediction model.. Expert Systems with Applications., 28(1), pp. 127-135. Laitinen, E., 1991. Financial ratios and different failure processes.. Journal of Business Finance & Accounting, 5(18), pp. 649-673. Lennox, C., 1999. The accuracy and incremental information content of audit reports in predicting bankruptcy.. Journal of Business Finance & Accounting., 26(5/6), pp. 757-778. Liang, B. J. a. T., 1995. Fuzzy indexing and retrieval in case-based system.. Expert Systems with Applications., 8(1), pp. 135-142. Lili Sun, Prakash P. Shenoy, 2007. Using Bayesian networks for bankruptcy prediction: Some methodological issues. European Journal of Operational Research, 180(2), pp. 738-753. Lin, F.Y. and McClean, S, 2000. The prediction of Financial Distress Using Structured Financial Data From the Interne.. IJCSS International Journal of Computers Science and Signal, 1(1), pp. 43-57. Lin, T.-H., 2009. A crossmodelstudyofcorporatefinancialdistresspredictioninTaiwan:Multiplediscriminantanalysis,logit,p robitandneuralnetworksmodels.. Neurocomputing, Volume 72, pp. 3507-3516. Loh, W.-Y. and Shih, Y.-S, 1997. Split Selection Method For Classification Trees.. Statistica Sinica, Volume 7, pp. 815-840. Loh, W.-Y., 2011. Classification and Regression Trees.. 1st ed. NY: Willey & Sons Inc. . Lugovskaja, L., 2009. 
Predicting default of Russian SMEs on the basis of financial and non-financial variables. Journal of Financial Services Marketing, 14(4), pp. 301-313. M. Adnan Aziz, Humayon A. Dar, 2006. Predicting corporate bankruptcy: where we stand?. The International Journal of Business in Society, 6(1), pp. 18-33. M. Odom and R. Sharda, 1990. A neural network model for bankruptcy prediction. In: Proc. Int. Joint Conf. Neural Networks. San Diego, CA, s.n. Makridakis, S., 2001. Insider Trading Behavior Prior to Chapter 11 Bankruptcy Announcements. Journal of Business Research, 54(1), pp. 63-70.

Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian H. Witten, 2009. The WEKA Data Mining Software: An Update;. SIGKDD Explorations, 11(1), pp. 1-50. Martin, D., 1977. Early warning of bank failures: A logit regression approach.. Journal of Banking and Finance., Volume 1, pp. 249-276. McKee, T., 2000. Developing a bankruptcy prediction model via rough sets theory. International Journal of Intelligent Systems in Accounting, Finance and Management, Volume 9, pp. 59-173. Mensah, Y. M., 1983. The differential Bankruptcy predictive ability of specific price level adjustments:Some Empirical Evidence. The Accounting Review, LVIII(2), pp. 228-246. Merwin, G., 1942. Financial Small corporations in five manufacturing industries, 1926-1936. New York: National Bureau of Economic Research. Meyer, P. and H. Pifer., 1970. Prediction of bank failures.. JOUHli1l of Finance, 4(25), pp. 853-868. Morris, E. H. a. R., 1983. The significance of Base year in developing Failure prediction models.. Journals of Business Finance and Accounting., pp. 209-223. Myong-Jong Kim and Dae-Ki Kang, 2010. Ensemble with neural networksn for bankruptcy prediction.. Expert systems with applications, Volume 37, pp. 3373-3379. Myoung-Jong Kim, Dae-Ki Kang, 2009. Ensemble with neural networks for bankruptcy prediction. Expert Systems with Applications, 37(4), pp. 3373-3379. Ning Chena, Bernardete Ribeiro, Armando Vieira, An Chena, 2013. Clustering and visualization of bankruptcy trajectory using self-organizing map.. Expert Systems with Applications., 40(1), pp. 385393. Ohloson, J. A., 1980. Financial Ratios and the probabilistic pridiction of Bankruptcy. Journal of Accounting Research , 18(1), pp. 109-131. O'Leary, D., 1992. On bankruptcy information systems. European Journal of Operational Research, 56(1), pp. 67-69. Opler, T. C. and Titman, S., 1994. Financial Distress and Corporate Performance. The journal of Finance, 18(1), pp. 109-131. P. Melville, R. J. Mooney, 2003. 
Constructing Diverse Classifier Ensembles Using Artificial Training Examples. In: Eighteenth International Joint Conference on Artificial Intelligence, 505-510. New York, s.n. P. Ravi Kumar , V. Ravi, 2007. Bankruptcy Prediction in banks and firms via statistical and intelligent techniques - A review. European Journal of Operational Research, Volume I, pp. 1-28. Paliwal, M., and Kumar, U., 2009. Neural networks and statistical techniques: A review of applications.. Expert Systems with Applications, 36(1), pp. 2-17.


Pawlak, Z., 1982. Rough Sets. International journal of Computer and Information Science, Volume 11, pp. 341-356. Perold, F., 1999. Long term Capital Management Case Study Harvard Business School. s.l.:s.n. Pindodo,J. and Rodriques, L.F., 2004. Parsimonious Models of Financial Insolvency in Small Companies. Small Business Economics, pp. 51-66. Pirooz Shamsinejad, Mohammad Saraee and Farid Sheikholeslam, 2010. A New Path Planner for Autonomous Mobile Robots Based on Genetic Algorithm”, the 3rd IEEE International Conference on Computer Science and Information Technology (ICCSIT 2010). Chengdu, China, IEEE, pp. 115-120. Pompe P., Feedlers A., 1997. Using Machine Learning, Neural Networks and statistics to predict Corporate Bankruptcy. s.l., s.n., pp. 267-276. Pulvino, T., 1999. Effects of bankruptcy court protection on asset sales.. Journal of financial Economics., Volume 52, pp. 151-186. R. Slowinski,S. Greco, B. Matarazzo, 2001. Rough sets theory for multicriteria decision analysis.. European Journal of Operational Research., 129(1), pp. 1-47. R. Susmaga,C. Zopounidis,A.I. Dimitras, R. Slowinski, 1999. Business failure prediction using rough sets. European Journal of Operational Research, Volume 114, pp. 263-280. R.Slowinski and J. Stefanowski., 1994. RoughDas: Rough set based data analysis system, Version 2.0, User's Guide Book. Pozan, Poland.. s.l.:s.n. Rajeev Singhal, Yun (Ellen) Zhu, 2013. Bankruptcy risk, costs and corporate diversiﬁcation. Journal of Banking & Finance, Volume 37, pp. 1475-1489. Rubin, D. B., 2002. Statistical Analysis With Missing Data. 2nd ed. New York: Wiley. S. Greco, B. Matarazzo, R. Slowinski, 1998. A new rough set approach to evaluation of bankruptcy risk.. C. Zopounidis (Ed.), Operational Tools in the Management of Financial Risks, Kluwer Academic Publishers, Dordrecht, pp. 121-136. S. Greco, B. Matarazzo, R. Slowinski, 1998. A new rough set approach to multicriteria and multiattribute classification.. 
Rough Sets and Current Trends in Computing, pp. 60-67. S. Jones, D.A. Hensher, 2004. Predicting firm financial distress: A mixed logit model. Accounting Review, 4(79), pp. 1011-1038. S. Balcaen, H. Ooghe, 2006. 35 years of studies on business failure: an overview of the classic statistical methodologies and their related problems. The British Accounting Review, Issue 38, pp. 63-93. Sangjae Lee and Wu Sung Choi, 2013. A multi-industry bankruptcy prediction model using backpropagation neural network and multivariate discriminant analysis. Expert Systems with Applications, Volume 40, pp. 2941-2946.


Sangjae Lee and Wu Sung Choi, 2013. A multi-industry bankruptcy prediction model using backpropagation neural network and multivariate discriminant analysis.. Expert Systems with Applications, 40(8), pp. 2941-2946. Sankaran Mahadevan, , Ramesh Rebba , 2005. Validation of reliability computational models using Bayes networks. Reliability Engineering & System Safety., 87(2), pp. 223-232. SAS Institute Inc., 2012. Applied Analytics Using SAS® Enterprise Miner. Cary: NC: SAS Institute Inc. SAS Institute Inc., 2013. Getting Started with SAS® Enterprise Miner. Cary: NC: SAS Institute Inc.. SAS Institute Inc., 2013. SAS Enterprise Miner 13.2 Reference Help.. 1st ed. Carry: SAS Institute Inc.. Schwenker, Friedhelm; Kestler, Hans A.; Palm, Günthe, 2001. Three Learning Phases For Radial Basis Function Network.. Neural Network, Volume 14, pp. 439-458. Shapiro, A. F., 2002. The merging of neural networks, fuzzy logic, and genetic algorithms. Insurance: Mathematics and Economics., 31(1), pp. 115-131. Sinkey, J., 1975. A multivariate statistical analysis of the characteristics of problem. Journal of Finance, 1(30), pp. 21-36. Skogsvik, K., 1990. Current cost accounting ratios as predictors of business failure: The Swedish case.. Journal of Business Finance and Accounting., 17(1), pp. 137-160. Smith, R. and A. Winakor, 1935. Change in Financial Structure of Unsuccessful Industrial Corporations.. Urbana: University of Illinois Press.. Sunday Olusanya Olatunji, Ali Selamat, Abdul Azeez, Abdul Raheem, 2011. Predicting correlations properties of crude oil systems using type-2 fuzzy logic systems.. Expert Systems with Applications., 38(9), pp. 10911-10922. Sungbin Cho, Hyojung Hong and Byoung-Chun Ha, 2010. A hybrid approach based on the combination of variable selection using decision trees and case-based reasoning using the Mahalanobis distance: For bankruptcy prediction.. Expert Systems with Applications., 37(4), p. 3482– 3488. Sung-Hwan Min, Jumin Lee and Ingoo Han, 2006. 
Hybrid genetic algorithms and support vector machines for bankruptcy prediction. Expert Systems with Applications, 31(3), pp. 652-660. T.-P. Liang, B. Jeng, Y.-M. Jeng, 1997. FILM: A fuzzy learning method for automated knowledge acquisition. Decision Support Systems, Volume 21, pp. 61-73. Takahashi, K., Y. Kurokawa and K. Watase, 1984. Corporate bankruptcy prediction in Japan. Journal of Banking and Finance, 2(8), pp. 229-247. Tezcan, N., 2002. Firmalarda Mali Başarisizliğin Tahmini. Yüksek Lisans Tezi, Yıldız. s.l.:Teknik Üniversitesi, Sosyal Bilimler Enstitüsü.


Theodossiou, P., 1991. Alternative models for assessing the financial condition of business in Greece. Journal of Business Finance and Accounting., 5(18), pp. 697-720.. Theodossiou, P., 1991. Alternative models for assessing the financial condition of business in Greece.. Journal of Business Finance & Accounting., 18(5), pp. 697-720. Thomas E. McKee and Terje Lensberg, 2002. Genetic programming and rough sets: A hybrid approach to bankruptcy classification. European Journal of Operational Research., 138(2), p. 436– 451. Thorburn, K. S., 2000. Bankruptcy auctions: costs, debt recovery and firm survival.. Journal of Financial Economics., Volume 58, pp. 337-368. Toshiyuki Sueyoshia, Mika Goto, 2009. Methodological comparison between DEA (data envelopment analysis) and DEA–DA (discriminant analysis) from the perspective of bankruptcy assessment.. European Journal of Operational Research, 199(2), p. 561–575. Tseng-Chung Tang and Li-Chiu Chi, 2005. Predicting multilateral trade credit risks: comparisons of Logit and Fuzzy Logic models using ROC curve analysis.. Expert Systems with Applications., 28(3), pp. 547-556. Turko, R., 1999. Finansal Yönetim. Istanbul: Alfa Yayin. V. Popova and J.C. Bioch, 2001. Bankruptcy prediction with rough sets, ERIM Report Series Research in Management (ERS-2001-11-LIS). s.l.:s.n. Vapnik, V., 1998. in: S. Haykin (Ed.) Statistical Learning Theory. Adaptive and Learning systems, Volume 736. Varun, B., 2009. Prediction of Business failure: a Comparison of Discriminat And logistic Regression Analyses. Istanbul University Journal of the School of Business Administration, 38(1), pp. 21-36. Vranas, A., 1992. The significance of financial characteristics in predicting business failure: An analysis in the Greek context.. Foundations of Computing and Decision Sciences., 4(17), pp. 257-275. W.J. Banks, L.A. Prakash, 1994. On the performance of linear programming heuristics applied on a quadratic transformation in the classification problem.. 
European Journal of Operational Research, 74(23), pp. 23-28. West, R., 1985. A factor analytic approach to bank condition. Journal of Banking and Finance, Volume 9, pp. 253-266. Wheelen, T. L. and Hunger, J. D., 2000. Strategic Management: Business Policy. 7th ed. New Jersey: Prentice Hall. Whitaker, R. B., 1999. The Early Stages of Financial Distress. Journal of Economics and Finance, 23(2), pp. 123-133. Wruck, K. H., 1990. Financial distress, reorganization, and organizational efficiency. Journal of Financial Economics, Volume 27, pp. 419-444.

Yoav Freund, Robert E. Schapire, 1996. Experiments with a new boosting algorithm. In: Thirteenth International Conference on Machine Learning, 148-156. San Francisco, s.n. Z. Pawlak, J. Grzymala-Busse, R. Slowinski, W. Ziarko, 1995. Rough sets. Communications of the ACM Association for Computing Machinery, 38(11), pp. 89-97. Z. Pawlak, 1984. Rough classification. International Journal of Man–Machine Studies, Volume 20, pp. 469-483. Zhi Xiao, Xianglei Yang, Ying Pang, Xin Dang, 2012. The prediction for listed companies’ financial distress by using multiple prediction methods with rough set and Dempster–Shafer evidence theory. Knowledge-Based Systems, Volume 26, pp. 196-206. Zhong Gao, Meng Cui and Lai-Man Po, 2008. Enterprise Bankruptcy Prediction Using Noisy-Tolerant Support Vector Machine. Leicestershire, International Seminar on Future Information Technology and Management Engineering. Zijiang Yang, Wenjie You, Guoli Ji, 2011. Using partial least squares and support vector machines for bankruptcy prediction. Expert Systems with Applications, 38(7), pp. 8386-8342. Zmijewski, M., 1984. Methodological issues related to the estimation of financial distress prediction models. Journal of Accounting Research, Volume 22, pp. 59-82.


Appendix-A: Table 4.2 Containing 5th and 95th percentiles for the data one year before bankruptcy

Ratio    5th Percentile    95th Percentile
X1T1     -1.8087           0.2034
X2T1     0.03740           7.2770
X3T1     -0.03.100         .498100
X4T1     -11.92            0.533
X5T1     -1.4155           .210230
X6T1     .02918            2.9812
X7T1     -.918965          .188265
X8T1     .00               2.00
X9T1     0.037400          0.9623
X10T1    -.8959            .2414
X11T1    0.03740           2.220
X12T1    .0025             .6847
X13T1    -.918965          .1940
X14T1    -1.1998           .2599
X15T1    -.9189            .1882
X16T1    .04000            1.43100
X17T1    -.920121          0.18837
X18T1    0.037400          1.4271
X19T1    0.03739           5.6254
X20T1    -70.6875          39.1222
X21T1    0.03740           55.2085
X22T1    -1.00             0.00
X23T1    -4.00             .00
X24T1    -5.00             00
X25T1    .000              .788467
X26T1    .03739            1.9779
X27T1    .037739           .9996
X28T1    -50.0074          4.201
X29T1    .005612           5.11744
X30T1    -3.6791           1.4446
X31T1    -.06169           .5469
X32T1    .03739            4.1172
X33T1    .03195            2.9812
X34T1    -3.8257           5.0731
X35T1    -39779.2582       130226.9675
X36T1    -25.6936          52.7188
X37T1    .037395           5.6457
X38T1    .037395           105.4681
X39T1    -.029436          .534420
X40T1    .037395           89.23444
X41T1    -.031650          .497679
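The 5th and 95th percentiles tabulated above can be recomputed from the raw values of a ratio. The following is a minimal sketch in Python, assuming linear interpolation between order statistics. The function name `percentile` and the sample values are illustrative and do not come from the dissertation's data set. Statistical packages apply slightly different interpolation rules, so results may differ from the tabulated figures in the last digits.

```python
from typing import Iterable, Optional


def percentile(values: Iterable[Optional[float]], q: float) -> float:
    """Return the q-th percentile (0 <= q <= 100) using linear interpolation.

    Missing observations (None) are dropped before the calculation.
    """
    data = sorted(v for v in values if v is not None)
    if not data:
        raise ValueError("no non-missing values")
    if not 0 <= q <= 100:
        raise ValueError("q must lie between 0 and 100")
    pos = (len(data) - 1) * q / 100.0   # fractional index into the sorted data
    lo = int(pos)                       # index of the lower neighbour
    hi = min(lo + 1, len(data) - 1)     # index of the upper neighbour
    frac = pos - lo                     # distance from the lower neighbour
    return data[lo] + (data[hi] - data[lo]) * frac


if __name__ == "__main__":
    # Hypothetical values of one financial ratio, with one missing entry.
    ratio = [-2.4, -0.9, 0.04, 0.1, 0.15, None, 0.2, 0.3, 7.5]
    print("5th percentile:", percentile(ratio, 5))
    print("95th percentile:", percentile(ratio, 95))
```

Calling the function once per ratio and per year would produce one row of a table such as Table 4.2.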


Table 4.3 Containing 5th and 95th percentiles for the data two years before bankruptcy

Ratio    5th Percentile    95th Percentile
X1T2     -1.115026         .159229
X2T2     .037400           8.751000
X3T2     -.027860          .501547
X4T2     -8.560863         .519579
X5T2     -.989231          .193456
X6T2     .020722           2.735512
X7T2     -.955825          .201191
X8T2     .037395           1.703400
X9T2     .037395           .953118
X10T2    -.498550          .233775
X11T2    .037395           1.689679
X12T2    .001994           .729849
X13T2    -.955825          .202553
X14T2    -.827642          .206167
X15T2    -.955825          .201191
X16T2    .037395           1.047462
X17T2    -.955825          .201191
X18T2    .037395           1.047462
X19T2    .037395           6.960619
X20T2    -80.712785        30.762810
X21T2    .037400           79.994575
X22T2    -.989223          .192470
X23T2    -4.007975         .435580
X24T2    -5.241754         .331631
X25T2    0.000000          .783154
X26T2    .037395           2.424602
X27T2    .037395           1.184873
X28T2    -44.313079        5.601510
X29T2    .004474           7.137088
X30T2    -3.781695         1.531244
X31T2    -.052498          .628665
X32T2    .037395           5.029267
X33T2    .024548           2.735016
X34T2    -4.211561         3.792801
X35T2    -48565.080866     122014.484723
X36T2    -41.690082        47.516025
X37T2    .037395           6.977095
X38T2    .037395           103.189517
X39T2    -.028087          .539383
X40T2    .037395           104.394512
X41T2    -.027860          .539383



5th Percentile    95th Percentile
.031965
-.972076          .141759
.037400           8.890000
-.016905          .468870
-6.905555         .511092
-.920121          .188337
.026572           2.737783
-.774949          .223634
.037395           1.541811
.037395           .941941
-.375261          .212126
.037395           1.534541
.001649           .710965
-.774949          .225917
-.818240          .213688
-.774949          .223634
.037395           .952067
-.774949          .223634
.037395           .952067
.037395           8.166963
-52.219235        29.680575
.037395           82.063272
-.892493          .201859
-4.720051         .354988
-7.170293         .232486
0.000000          .713828
.037395           2.390569
.037395           1.211031
-29.093187        6.118316
.007012           6.618499
-3.698125         1.328322
-.029597          .587621
.037395           4.839647
.031965           2.721380
-3.305242         3.313479
-26.372779        54.999548
.037395           8.169533
.037395           89.432601
-.016905          .474131
.037395           105.529853
-.016368          .484197           4.839647




5th Percentile    95th Percentile
.037395
-1.219065         .136217
.037400           8.865000
-.013334          .460847
-6.094200         .507600
-.955825          .201191
.037395           2.700640
.028180           2.981235
.037395           1.367371
.037395           .936470
-.416678          .223132
.037395           1.321276
.002393           .746375
.028180           2.981235
-.994880          .195949
.028180           2.981235
.036170           .887481
.028180           2.981235
.036170           .887481
.037395           6.167819
-68.358510        49.605810
.037395           83.020426
-.940580          .208534
-5.099491         .384648
-5.571105         .281673
0.000000          .701105
.037395           2.577224
.037395           1.024419
-20.857862        5.726517
.008468           6.916481
-3.820015         1.493944
-.027031          .624435
.037395           4.408267
.037395           2.683662
-3.345348         4.089810
-22860.596748     92513.787496
-25.852989        43.391217
.037395           6.171712
.037395           105.721041
-.018068          .479826
.037395           109.861564        .496959




5th Percentile    95th Percentile
-1.065177         .184879
.037400           8.210000
-.016852          .484403
-4.427293         .536836
-.774949          .223634
.037198           2.824415
.020722           2.735512
.037395           1.316398
.037395           .939327
-.322255          .233877
.037395           1.275609
.003160           .668179
.020722           2.735512
-.846037          .223866
.020722           2.735512
.037395           .907009
.020722           2.735512
.037395           .895978
.037395           8.613158
-85.765685        70.377725
.037395           70.371335
-.834801          .225834
-6.172182         .467050
-7.063776         .339079
0.000000          .665245
.037395           2.500346
.037395           .999834
-22.505820        7.307410
.009204           6.400484
-3.076038         1.431133
-.016499          .621382
.037395           3.948719
.036859           2.740754
-3.111253         2.892728
-20652.587857     97398.145554
-24.799585        45.484270
.037395           8.202192
.037395           107.601496
-.017567          .495606
.037395           92.653018
-.018878          .493702

Table 4.7 Univariate Statistics for data sample one year before bankruptcy

                                                Missing            No. of Extremes (a)
Variable  N     Mean           Std. Deviation   Count    Percent   Low   High
X1T1      928   -.436121       4.8265051        0        .0        108   21
X2T1      923   2.267315       5.0382807        5        .5        0     71
X3T1      928   .142327        .3068697         0        .0        8     32
X4T1      928   -19.474041     450.1726461      0        .0        129   2
X5T1      928   -.217274       2.1586339        0        .0        101   13
X6T1      928   1.300405       6.9325752        0        .0        0     43
X7T1      927   -.102521       .9911883         1        .1        112   23
X8T1      928   1.64           14.798           0        .0        0     42
X9T1      928   .491088        .3110961         0        .0        0     2
X10T1     928   -.144232       .9620095         0        .0        143   62
X11T1     928   1.620557       14.7969397       0        .0        1     68
X12T1     928   .159110        .2187743         0        .0        0     101
X13T1     927   -.098013       1.0012761        1        .1        112   24
X14T1     928   -.285414       4.1479253        0        .0        126   26
X15T1     927   -.102521       .9911883         1        .1        112   23
X16T1     928   1.032683       9.3955239        0        .0        1     62
X17T1     927   -.102524       .9911746         1        .1        112   23
X18T1     928   1.028998       9.3953480        0        .0        2     65
X19T1     928   8.819350       116.9838048      0        .0        2     103
X20T1     928   20.108646      3566.1037146     0        .0        127   99
X21T1     928   30.772723      259.7486124      0        .0        0     118
X22T1     928   -.25           2.554            0        .0        .     .
X23T1     928   -2.46          28.439           0        .0        .     .
X24T1     928   -1.48          17.134           0        .0        .     .
X25T1     928   .463984        6.9189794        0        .0        0     58
X26T1     928   .404242        .7750083         0        .0        0     141
X27T1     928   .291357        .5574048         0        .0        0     36
X28T1     928   -15.215299     162.8420164      0        .0        145   38
X29T1     928   1.369037       5.9758064        0        .0        1     117
X30T1     928   .890412        30.0707478       0        .0        113   53
X31T1     928   .499001        8.2881821        0        .0        19    51
X32T1     928   7.685521       132.2526113      0        .0        0     67
X33T1     928   1.276195       6.9909388        0        .0        1     44
X34T1     928   .924524        29.4145612       0        .0        96    115
X35T1     928   8351.470574    547133.0107292   0        .0        84    140
X36T1     928   7.815929       159.0286135      0        .0        76    107
X37T1     927   8.847431       117.0451446      1        .1        0     104
X38T1     927   39.087343      325.2562168      1        .1        0     136
X39T1     928   .164003        .3871066         0        .0        8     35
X40T1     928   26.543483      124.7393648      0        .0        0     127
X41T1     928   .138622        .3487470         0        .0        11    31

a. Number of cases outside the range (Q1 - 1.5*IQR, Q3 + 1.5*IQR).
b. . indicates that the inter-quartile range (IQR) is zero.
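Footnote a defines an extreme as a case lying outside the range (Q1 - 1.5*IQR, Q3 + 1.5*IQR). The sketch below shows one way such low and high counts could be reproduced in Python. It is an illustration under stated assumptions, not the procedure of the statistical package that produced the tables: the function name `count_extremes` is hypothetical, the quartiles come from the standard library's `statistics.quantiles` with the "inclusive" method, and other quartile definitions can shift the counts slightly. When the IQR is zero the function returns `None`, mirroring the "." entries described in footnote b.

```python
from statistics import quantiles
from typing import Iterable, Optional, Tuple


def count_extremes(values: Iterable[Optional[float]]) -> Optional[Tuple[int, int]]:
    """Count cases below Q1 - 1.5*IQR (low) and above Q3 + 1.5*IQR (high).

    Missing observations (None) are ignored. Returns None when the
    inter-quartile range is zero, since the fences are then undefined.
    """
    data = [v for v in values if v is not None]
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        return None
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    low = sum(1 for v in data if v < lower_fence)
    high = sum(1 for v in data if v > upper_fence)
    return low, high


if __name__ == "__main__":
    # Hypothetical ratio values: one very low and one very high case.
    sample = [-50, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
    print(count_extremes(sample))
```

Applying the function to each ratio column would give the Low and High entries of a univariate statistics table, subject to the quartile-definition caveat above.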

Table 4.8 Univariate Statistics for data sample two years before bankruptcy

                                                Missing            No. of Extremes (a)
Variable  N     Mean           Std. Deviation   Count    Percent   Low   High
X1T2      928   -.126149       10.6946103       0        .0        101   18
X2T2      925   2.755301       6.3077397        3        .3        0     93
X3T2      928   .145116        .2131140         0        .0        5     34
X4T2      928   -3.554374      26.2637603       0        .0        133   1
X5T2      927   -.210058       1.6877541        1        .1        106   16
X6T2      927   1.183268       5.8505294        1        .1        0     29
X7T2      927   -.116987       .7929120         1        .1        119   15
X8T2      928   1.113855       5.8408280        0        .0        0     55
X9T2      928   .497185        .2880474         0        .0        0     0
X10T2     928   -.327446       5.3577094        0        .0        158   81
X11T2     928   1.110447       5.8406352        0        .0        1     54
X12T2     928   .170353        .2279073         0        .0        0     99
X13T2     927   -.113741       .7996759         1        .1        119   16
X14T2     928   -.657221       10.6261883       0        .0        109   22
X15T2     927   -.116987       .7929120         1        .1        119   15
X16T2     928   .666211        4.7486737        0        .0        1     52
X17T2     927   -.116987       .7929120         1        .1        119   15
X18T2     928   .665877        4.7495542        0        .0        1     52
X19T2     928   5.003501       42.0326727       0        .0        2     110
X20T2     928   60.944146      3137.1724015     0        .0        142   89
X21T2     928   44.484094      309.1130122      0        .0        0     122
X22T2     928   -.183365       1.6769597        0        .0        108   20
X23T2     928   -2.683131      22.9349743       0        .0        154   59
X24T2     928   -2.667787      23.6768343       0        .0        154   37
X25T2     928   .289044        1.4165818        0        .0        0     51
X26T2     928   .609075        .9297194         0        .0        0     51
X27T2     928   .465085        1.3492928        0        .0        0     28
X28T2     928   -41.537831     685.2767919      0        .0        148   55
X29T2     928   1.556448       5.5777834        0        .0        1     125
X30T2     928   .756777        21.4480823       0        .0        113   52
X31T2     928   .367848        2.6778065        0        .0        14    56
X32T2     928   44.045620      1237.6237326     0        .0        0     63
X33T2     928   1.180883       5.8503652        0        .0        1     29
X34T2     928   20.466627      561.3885151      0        .0        93    95
X35T2     928   -6393.958732   546295.4197156   0        .0        70    138
X36T2     928   -2.303385      192.2746617      0        .0        87    101
X37T2     927   5.027756       42.0527685       1        .1        0     111
X38T2     927   27.958886      132.8239327      1        .1        0     127
X39T2     928   .178540        .4327857         0        .0        5     39
X40T2     928   32.422630      231.9920376      0        .0        0     125
X41T2     928   3.159643       83.4001079       0        .0        5     40

a. Number of cases outside the range (Q1 - 1.5*IQR, Q3 + 1.5*IQR).

Table 4.9 Univariate Statistics for data sample three years before bankruptcy

                                                Missing            No. of Extremes (a)
Variable  N     Mean           Std. Deviation   Count    Percent   Low   High
X1T3      928   -.152962       1.0385898        0        .0        104   11
X2T3      927   2.749813       5.7115676        1        .1        0     102
X3T3      928   .146560        .1873058         0        .0        4     20
X4T3      928   -1.803658      13.7484274       0        .0        135   3
X5T3      927   -.102524       .9911746         1        .1        112   23
X6T3      927   1.010380       1.1190518        1        .1        0     36
X7T3      927   -.126381       .9023768         1        .1        112   17
X8T3      928   .694191        1.0725480        0        .0        0     50
X9T3      928   .491663        .2826344         0        .0        0     0
X10T3     928   -.006830       .3131531         0        .0        157   96
X11T3     928   .689235        1.0703604        0        .0        0     48
X12T3     928   .168888        .2205183         0        .0        0     82
X13T3     927   -.123669       .9066376         1        .1        112   18
X14T3     928   -.092008       1.0123813        0        .0        113   22
X15T3     927   -.126381       .9023768         1        .1        112   17
X16T3     928   .362830        .8012899         0        .0        1     47
X17T3     927   -.126381       .9023768         1        .1        112   17
X18T3     928   .376288        .6172879         0        .0        2     47
X19T3     928   5.787770       58.1963690       0        .0        2     108
X20T3     928   -4.660024      482.4983818      0        .0        127   99
X21T3     928   23.629440      103.6703401      0        .0        0     122
X22T3     928   -.095571       .9982193         0        .0        110   25
X23T3     928   -4.637831      45.1477117       0        .0        155   46
X24T3     928   -5.496356      50.9316150       0        .0        157   22
X25T3     928   .209726        .3162577         0        .0        0     38
X26T3     928   .830881        6.3655513        0        .0        0     56
X27T3     928   .889831        8.3839439        0        .0        0     36
X28T3     928   -8.455995      107.5723951      0        .0        133   70
X29T3     928   2.094125       18.7134149       0        .0        1     124
X30T3     928   1.004896       32.5207291       0        .0        120   56
X31T3     928   .281926        1.6874224        0        .0        12    52
X32T3     928   2.094638       20.2466047       0        .0        0     71
X33T3     928   .978278        1.3907070        0        .0        1     36
X34T3     928   -1.258732      30.1861351       0        .0        96    92
X36T3     928   10.128026      265.9225798      0        .0        59    103
X37T3     927   5.807971       58.2262456       1        .1        0     108
X38T3     927   31.327416      229.7271188      1        .1        0     127
X39T3     928   .151201        .1844662         0        .0        4     22
X40T3     928   35.842833      269.1824970      0        .0        0     124
X41T3     928   3.124092       83.5912666       0        .0        3     28


Table 4.10 Univariate Statistics for data sample four years before bankruptcy

| Variable | N | Mean | Std. Deviation | Missing Count | Missing Percent | Extremes Low (a) | Extremes High (a) |
|---|---|---|---|---|---|---|---|
| X1T4 | 928 | -.173175 | .9779860 | 0 | .0 | 121 | 16 |
| X2T4 | 925 | 3.019160 | 7.1987185 | 3 | .3 | 0 | 98 |
| X3T4 | 928 | .145791 | .1779401 | 0 | .0 | 5 | 23 |
| X4T4 | 927 | -1.470527 | 10.3967189 | 1 | .1 | 142 | 5 |
| X5T4 | 927 | -.116987 | .7929120 | 1 | .1 | 119 | 15 |
| X6T4 | 927 | 1.042635 | 1.3907275 | 1 | .1 | 0 | 31 |
| X7T4 | 928 | 1.300405 | 6.9325752 | 0 | .0 | 0 | 43 |
| X8T4 | 928 | .687550 | .9687480 | 0 | .0 | 0 | 42 |
| X9T4 | 928 | .503605 | .2788197 | 0 | .0 | 0 | 0 |
| X10T4 | 928 | -.007038 | .3408096 | 0 | .0 | 170 | 113 |
| X11T4 | 928 | .681218 | .9674495 | 0 | .0 | 0 | 39 |
| X12T4 | 928 | .183454 | .2560843 | 0 | .0 | 1 | 80 |
| X13T4 | 928 | 1.302593 | 6.9324967 | 0 | .0 | 0 | 43 |
| X14T4 | 928 | -.105828 | .8994755 | 0 | .0 | 128 | 17 |
| X15T4 | 928 | 1.300405 | 6.9325752 | 0 | .0 | 0 | 43 |
| X16T4 | 928 | .369948 | .6631395 | 0 | .0 | 1 | 39 |
| X17T4 | 928 | 1.300405 | 6.9325752 | 0 | .0 | 0 | 43 |
| x18T4 | 928 | .374560 | .6326094 | 0 | .0 | 2 | 39 |
| x19T4 | 928 | 10.155274 | 151.9705440 | 0 | .0 | 2 | 125 |
| X20T4 | 928 | -13.382993 | 815.4188959 | 0 | .0 | 125 | 113 |
| X21T4 | 928 | 30.324822 | 194.2054425 | 0 | .0 | 0 | 138 |
| X22T4 | 928 | -.102105 | .7818219 | 0 | .0 | 127 | 21 |
| X23T4 | 928 | -5.795678 | 89.3203831 | 0 | .0 | 164 | 46 |
| X24T4 | 928 | -6.311462 | 91.4136545 | 0 | .0 | 158 | 30 |
| X25T4 | 928 | .218274 | .4325690 | 0 | .0 | 0 | 33 |
| X26T4 | 928 | .668108 | 1.2811048 | 0 | .0 | 0 | 55 |
| X27T4 | 928 | .392867 | .7428170 | 0 | .0 | 0 | 24 |
| X28T4 | 928 | -6.885582 | 100.9061376 | 0 | .0 | 144 | 73 |
| X29T4 | 928 | 1.818821 | 6.8452573 | 0 | .0 | 0 | 129 |
| X30T4 | 928 | 1.139221 | 31.2651135 | 0 | .0 | 114 | 60 |
| X31T4 | 928 | .452752 | 8.6250310 | 0 | .0 | 12 | 56 |
| X32T4 | 928 | 1.467498 | 7.0327693 | 0 | .0 | 0 | 69 |
| X33T4 | 928 | 1.036377 | 1.3866071 | 0 | .0 | 1 | 31 |
| X34T4 | 928 | -.916501 | 16.8631678 | 0 | .0 | 103 | 104 |
| X35T4 | 928 | 26884.588919 | 373743.1971263 | 0 | .0 | 52 | 136 |
| X36T4 | 928 | 4.360924 | 62.1136661 | 0 | .0 | 65 | 96 |
| X37T4 | 927 | 10.165556 | 152.0521921 | 1 | .1 | 0 | 124 |
| X38T4 | 927 | 35.577803 | 267.1218918 | 1 | .1 | 0 | 121 |
| x39T4 | 928 | .142655 | .3950704 | 0 | .0 | 7 | 28 |
| X40T4 | 928 | 39.247402 | 219.7461392 | 0 | .0 | 0 | 124 |
| X41T4 | 928 | 2.616322 | 64.7991745 | 0 | .0 | 4 | 33 |

a. Number of cases outside the range (Q1 - 1.5*IQR, Q3 + 1.5*IQR).

Table 4.11 Univariate Statistics for data sample five years before bankruptcy

| Variable | N | Mean | Std. Deviation | Missing Count | Missing Percent | Extremes Low (a) | Extremes High (a) |
|---|---|---|---|---|---|---|---|
| X1T5 | 928 | -.172782 | 1.0149037 | 0 | .0 | 121 | 20 |
| X2T5 | 927 | 2.829966 | 6.0655194 | 1 | .1 | 0 | 87 |
| X3T5 | 928 | .151702 | .1815187 | 0 | .0 | 5 | 16 |
| X4T5 | 927 | -1.145720 | 6.3335731 | 1 | .1 | 139 | 8 |
| X5T5 | 927 | -.126381 | .9023768 | 1 | .1 | 112 | 17 |
| X6T5 | 927 | 1.040013 | 1.0276340 | 1 | .1 | 0 | 37 |
| X7T5 | 927 | 1.183268 | 5.8505294 | 1 | .1 | 0 | 29 |
| X8T5 | 928 | .659396 | .8909585 | 0 | .0 | 0 | 40 |
| X9T5 | 928 | .502371 | .2808699 | 0 | .0 | 0 | 0 |
| X10T5 | 928 | .006600 | .2795171 | 0 | .0 | 156 | 162 |
| X11T5 | 928 | .646685 | .8831673 | 0 | .0 | 1 | 37 |
| X12T5 | 928 | .176178 | .2386720 | 0 | .0 | 0 | 66 |
| X13T5 | 927 | 1.185012 | 5.8504283 | 1 | .1 | 0 | 28 |
| X14T5 | 928 | -.113455 | .9443782 | 0 | .0 | 122 | 24 |
| X15T5 | 927 | 1.183268 | 5.8505294 | 1 | .1 | 0 | 29 |
| X16T5 | 928 | .378503 | .6858182 | 0 | .0 | 1 | 45 |
| X17T5 | 927 | 1.183268 | 5.8505294 | 1 | .1 | 0 | 29 |
| x18T5 | 928 | .376491 | .6925033 | 0 | .0 | 2 | 44 |
| x19T5 | 928 | 6.138037 | 77.5741934 | 0 | .0 | 2 | 119 |
| X20T5 | 928 | 159.759521 | 5091.7697254 | 0 | .0 | 129 | 117 |
| X21T5 | 928 | 18.588667 | 92.6717126 | 0 | .0 | 0 | 133 |
| X22T5 | 928 | -.116831 | .8905469 | 0 | .0 | 116 | 21 |
| X23T5 | 928 | -4.435434 | 55.1402660 | 0 | .0 | 158 | 58 |
| X24T5 | 928 | -4.851490 | 58.2163124 | 0 | .0 | 159 | 43 |
| X25T5 | 928 | .204382 | .4084191 | 0 | .0 | 0 | 32 |
| X26T5 | 928 | .647224 | 1.1521097 | 0 | .0 | 0 | 52 |
| X27T5 | 928 | .572762 | 6.1248955 | 0 | .0 | 0 | 22 |
| X28T5 | 928 | -8.703963 | 227.1230790 | 0 | .0 | 140 | 90 |
| X29T5 | 928 | 1.656466 | 6.3122065 | 0 | .0 | 1 | 138 |
| X30T5 | 928 | .734257 | 19.9489760 | 0 | .0 | 120 | 52 |
| X31T5 | 928 | -.041617 | 11.6041550 | 0 | .0 | 13 | 56 |
| X32T5 | 928 | 1.307071 | 5.8451373 | 0 | .0 | 0 | 64 |
| X33T5 | 928 | 1.025311 | 1.0571790 | 0 | .0 | 1 | 35 |
| X34T5 | 928 | 2.744535 | 70.1411300 | 0 | .0 | 98 | 98 |
| X35T5 | 928 | 32491.360418 | 426169.2246856 | 0 | .0 | 50 | 126 |
| X36T5 | 928 | -4.221002 | 418.1366420 | 0 | .0 | 60 | 88 |
| X37T5 | 927 | 6.127394 | 77.6160452 | 1 | .1 | 1 | 117 |
| X38T5 | 927 | 39.323022 | 219.8241031 | 1 | .1 | 0 | 123 |
| x39T5 | 928 | .161733 | .2420099 | 0 | .0 | 5 | 17 |
| X40T5 | 928 | 46.842724 | 408.8061708 | 0 | .0 | 0 | 119 |
| X41T5 | 928 | .156068 | .2768338 | 0 | .0 | 7 | 16 |


Table 5.1 Prediction accuracy of the model from year one to five using Decision Trees Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 340 | 124 | 73.27% |
| Non-Bankrupt | 180 | 284 | 61.2% |
| Overall Accuracy % | | | 67.2% |

Classification Table for data two years before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 279 | 185 | 60.0% |
| Non-Bankrupt | 220 | 244 | 52.0% |
| Overall Accuracy % | | | 56.0% |

Classification Table for data three years before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 300 | 164 | 64.5% |
| Non-Bankrupt | 123 | 341 | 73.5% |
| Overall Accuracy % | | | 69.0% |

Classification Table for data four years before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 150 | 314 | 32.0% |
| Non-Bankrupt | 24 | 440 | 94.8% |
| Overall Accuracy % | | | 63.0% |

Classification Table for data five years before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 185 | 279 | 39.8% |
| Non-Bankrupt | 20 | 444 | 95.6% |
| Overall Accuracy % | | | 67.5% |
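The accuracy figures in these classification tables follow directly from the four cell counts. The sketch below shows the arithmetic, using the one-year Decision Trees counts (340, 124, 180, 284) as input. The function name `accuracy_summary` and its parameter names are hypothetical, and it assumes rows are observed classes and columns are predicted classes, as the table headers indicate.

```python
def accuracy_summary(bankrupt_hit, bankrupt_miss, nonbankrupt_miss, nonbankrupt_hit):
    """Return (bankrupt %, non-bankrupt %, overall %) from confusion-matrix counts.

    bankrupt_hit:      observed bankrupt, predicted bankrupt
    bankrupt_miss:     observed bankrupt, predicted non-bankrupt
    nonbankrupt_miss:  observed non-bankrupt, predicted bankrupt
    nonbankrupt_hit:   observed non-bankrupt, predicted non-bankrupt
    """
    bankrupt_total = bankrupt_hit + bankrupt_miss
    nonbankrupt_total = nonbankrupt_miss + nonbankrupt_hit
    if bankrupt_total == 0 or nonbankrupt_total == 0:
        raise ValueError("each observed class needs at least one case")
    total = bankrupt_total + nonbankrupt_total
    bankrupt_acc = 100.0 * bankrupt_hit / bankrupt_total
    nonbankrupt_acc = 100.0 * nonbankrupt_hit / nonbankrupt_total
    overall_acc = 100.0 * (bankrupt_hit + nonbankrupt_hit) / total
    return bankrupt_acc, nonbankrupt_acc, overall_acc

# One-year Decision Trees counts from Table 5.1.
b, n, o = accuracy_summary(340, 124, 180, 284)
print(f"Bankrupt {b:.1f}%  Non-Bankrupt {n:.1f}%  Overall {o:.1f}%")
```

Because the sketch rounds to one decimal place, its last digit can differ slightly from the printed percentages. The same check can be run on any other table whose four counts are available.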


Table 5.2 Prediction accuracy of the model starting from year one to five using HP Trees Model Classification Table for data one year before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 364 100 78.44% NonBankrupt 230 234 50.4%


Overall Accuracy %

Overall Accuracy %

NonBankrupt

220

244

52.9% 68.3%

61%

Classification Table for data Three years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 333 131 71.7% Non325 139 70.0 % Bankrupt

Classification Table for data Four years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 234 230 51.0% Non225 239 73.0% Bankrupt

Overall Accuracy %

Overall Accuracy %

70.5%

62.0%

Classification Table for data Five years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 150 314 32.0% Non44 420 90.2% Bankrupt Overall Accuracy % 61.3 %

Table 5.3 Prediction accuracy of the model starting from year one to five using Neural Network Model Classification Table for data one year before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 445 19 95.9% NonBankrupt 23 441 95.4%


Overall Accuracy %

Overall Accuracy %

NonBankrupt

9

455

97.6% 97.7%

95.4%

Classification Table for data Three years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 444 20 95.5% Non34 430 92.0 % Bankrupt

Classification Table for data Four years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 440 24 95.0% Non30 434 92.4 % Bankrupt

Overall Accuracy %

Overall Accuracy %

93.25%

Classification Table for data Five years before Event: Observed Bankrupt Non-Bankrupt

Bankrupt 440 64

Overall Accuracy %

Predicted Non-Bankrupt 24 400

Accuracy % 95.0% 86.2%

90.1 %


92.2%

Table 5.4 Prediction accuracy of the model starting from year one to five using Auto Neural Model Classification Table for data one year before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 440 24 94.0 % Non26 439 93.0 % Bankrupt Overall Accuracy %

93.5%

Classification Table for data Two years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 463 1 99.5% Non1 463 99.5% Bankrupt 99.5% Overall Accuracy %

Classification Table for data Three years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 0 464 0 Non0 464 100.00 Bankrupt


Overall Accuracy %

Overall Accuracy %

50.0%

97.7%

Classification Table for data Five years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 0 464 0 Non0 464 100.0 Bankrupt Overall Accuracy % 50.0 %

Table 5.5 Prediction accuracy of the model starting from year one to five using HP Neural Model Classification Table for data one year before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 420 44 90.0 % Non404 60 12.0 % Bankrupt Overall Accuracy %

51.0%

Classification Table for data Two years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 440 24 95.0% Non464 0 0.0% Bankrupt 47.25.0% Overall Accuracy %

Classification Table for data Three years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 440 24 95.0% Non225 239 73.0% Bankrupt

Classification Table for data Four years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 420 44 90.0 % Non52 412 88.9% Bankrupt

Overall Accuracy %

Overall Accuracy %

84.0%


Bankrupt 454 404

Overall Accuracy %


Accuracy % 97.8% 12.0%

54.6 %


89.4%

Table 5.6 Prediction accuracy of the model starting from year one to five using Neural Network Model Classification Table for data one year before Event: Observed Predicted Bankrupt NonAccuracy Bankrupt % Bankrupt 275 189 59.30% Non169 295 63.57 % Bankrupt Overall Accuracy %

46.64%

Classification Table for data Three years before Event: Observed Predicted Bankrupt NonAccuracy Bankrupt % Bankrupt 431 33 92.27% Non405 59 13.0% Bankrupt Overall Accuracy %

52.4%

Classification Table for data Two years before Event: Observed Predicted Bankrupt NonAccuracy Bankrupt % Bankrupt 435 29 93.7% Non392 72 16.6% Bankrupt 55.15% Overall Accuracy %

Classification Table for data Four years before Event: Observed Predicted Bankrupt NonAccuracy Bankrupt % Bankrupt 335 129 74.2% Non240 224 48.0% Bankrupt 61.1% Overall Accuracy %

Classification Table for data Five years before Event: Observed Predicted Bankrupt NonAccuracy Bankrupt % Bankrupt 185 279 39.8% Non44 420 90.51% Bankrupt Overall Accuracy % 64.7 %

Table 5.7 Prediction accuracy of the model starting from year one to five using Neural Network Model Classification Table for data one year before Event: Observed Predicted Bankrupt NonAccuracy Bankrupt % Bankrupt 275 189 59.30% Non169 295 63.57 % Bankrupt Overall Accuracy %

46.64%

Classification Table for data Three years before Event: Observed Predicted Bankrupt NonAccuracy Bankrupt % Bankrupt 431 33 92.27%

Classification Table for data Two years before Event: Observed Predicted Bankrupt NonAccuracy Bankrupt % Bankrupt 435 29 93.7% Non392 72 16.6% Bankrupt 55.15% Overall Accuracy %

Classification Table for data Four years before Event: Observed Predicted Bankrupt NonAccuracy Bankrupt % Bankrupt 335 129 74.2%


NonBankrupt

405

59

13.0%

NonBankrupt

240

52.4%

Overall Accuracy %

224

48.0% 61.1%

Overall Accuracy %

Classification Table for data Five years before Event: Observed Predicted Bankrupt NonAccuracy Bankrupt % Bankrupt 185 279 39.8% Non44 420 90.51% Bankrupt Overall Accuracy % 64.7 %

Table 5.8 Prediction accuracy of the model starting from year one to five using HP SVM Model Classification Table for data one year before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 313 151 67.7% Non230 224 49.13 % Bankrupt Overall Accuracy %

58.41%

Classification Table for data Three years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 189 275 40.7% Non151 313 67.7% Bankrupt Overall Accuracy %

54.0%


Classification Table for data Four years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 180 284 38.7% Non163 325 70.0% Bankrupt 54.2% Overall Accuracy %



Table 5.9 Prediction accuracy of the model starting from year one to five using Neural Network Model Classification Table for data one year before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 454 10 98.0% NonBankrupt 0 464 100.0%


Overall Accuracy %

Overall Accuracy %

NonBankrupt

464

0

0.0% 50.0%

99.0%


Classification Table for data Four years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 444 20 95.6% Non450 14 3.0 % Bankrupt

Overall Accuracy %

Overall Accuracy %

47.25%

49.0%

Classification Table for data Five years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 454 10 98.0% Non450 14 3.0 % Bankrupt Overall Accuracy % 50.5 %

Table 5.10 Prediction accuracy of the model starting from year one to five using MBR Model Classification Table for data one year before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 185 279 39.8% Non165 299 64.47 % Bankrupt Overall Accuracy %

52.1%




Overall Accuracy %

Overall Accuracy %

59.5%


Bankrupt 221 130

Overall Accuracy %


Accuracy % 47.2% 71.9%

59.55 %


61.3%

Table 5.11 Bankruptcy prediction accuracy using Naïve Bayes Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 427 | 37 | 92.0% |
| Non-Bankrupt | 94 | 370 | 79.7% |
| Overall Accuracy % | | | 85.8% |




Overall Accuracy %

Overall Accuracy %

85.8%

52.2%


Table 5.12 Bankruptcy prediction accuracy using BayesNet Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 464 | 0 | 100.0% |
| Non-Bankrupt | 12 | 452 | 79.7% |
| Overall Accuracy % | | | 85.8% |

Classification Table for data three years before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 362 | 98 | 78.0% |
| Non-Bankrupt | 7 | 457 | 98.0% |
| Overall Accuracy % | | | 88.0% |





Table 5.13 Bankruptcy prediction accuracy table using SMO or SVM Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 342 | 122 | 73.7% |
| Non-Bankrupt | 233 | 231 | 49.7% |
| Overall Accuracy % | | | 61.7% |





Table 5.14 Bankruptcy prediction accuracy table using RBFNetwork Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 356 | 108 | 76.7% |
| Non-Bankrupt | 247 | 217 | 46.7% |
| Overall Accuracy % | | | 61.7% |






Table 5.15 Bankruptcy prediction accuracy table using KSTAR Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 464 | 0 | 100% |
| Non-Bankrupt | 0 | 464 | 100% |
| Overall Accuracy % | | | 100% |





Table 5.16 Bankruptcy prediction accuracy table using LWL Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 378 | 86 | 81.5% |
| Non-Bankrupt | 362 | 102 | 21.9% |
| Overall Accuracy % | | | 51.7% |






Table 5.17 Bankruptcy prediction accuracy table using AdaBoostM1 Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 247 | 217 | 53.2% |
| Non-Bankrupt | 167 | 297 | 64.0% |
| Overall Accuracy % | | | 58.6% |





Table 5.18 Bankruptcy prediction accuracy table using ClassificationviaRegression Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 150 | 314 | 32.3% |
| Non-Bankrupt | 144 | 320 | 68.9% |
| Overall Accuracy % | | | 50.6% |






Table 5.19 Bankruptcy prediction accuracy table using Decorate Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 245 | 219 | 52.8% |
| Non-Bankrupt | 198 | 266 | 57.3% |
| Overall Accuracy % | | | 55.0% |





Table 5.20 Bankruptcy prediction accuracy table using Dagging Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 233 | 231 | 50.21% |
| Non-Bankrupt | 120 | 344 | 74.4% |
| Overall Accuracy % | | | 62.3% |






Table 5.21 Bankruptcy prediction accuracy table using LogitBoost Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 316 | 148 | 68.1% |
| Non-Bankrupt | 256 | 208 | 44.8% |
| Overall Accuracy % | | | 54.5% |





Table 5.22 Bankruptcy prediction accuracy table using MultiBoostAB Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 381 | 83 | 82.1% |
| Non-Bankrupt | 308 | 156 | 33.6% |
| Overall Accuracy % | | | 57.8% |






Table 5.23 Bankruptcy prediction accuracy table using Random Committee Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 262 | 202 | 56.5% |
| Non-Bankrupt | 225 | 239 | 51.5% |
| Overall Accuracy % | | | 54.0% |





Table 5.24 Bankruptcy prediction accuracy table using HyperPipes Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 89 | 375 | 19.2% |
| Non-Bankrupt | 92 | 372 | 80.2% |
| Overall Accuracy % | | | 49.6% |






Table 5.25 Bankruptcy prediction accuracy table using NNge Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 246 | 218 | 53.01% |
| Non-Bankrupt | 238 | 226 | 48.7% |
| Overall Accuracy % | | | 50.8% |





Table 5.26 Bankruptcy prediction accuracy table using OneR Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 28 | 436 | 6.0% |
| Non-Bankrupt | 15 | 449 | 96.7% |
| Overall Accuracy % | | | 51.3% |






Table 5.27 Bankruptcy prediction accuracy table using ZeroR Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 184 | 280 | 39.6% |
| Non-Bankrupt | 188 | 276 | 59.5% |
| Overall Accuracy % | | | 49.5% |





Table 5.28 Bankruptcy prediction accuracy table using Random Forest Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 256 | 208 | 55.2% |
| Non-Bankrupt | 245 | 219 | 47.2% |
| Overall Accuracy % | | | 51.2% |






Table 5.29 Bankruptcy prediction accuracy table using J48 Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 297 | 167 | 64.1% |
| Non-Bankrupt | 274 | 190 | 40.9% |
| Overall Accuracy % | | | 52.5% |





Table 5.30 Bankruptcy prediction accuracy table using SimpleCart Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 416 | 48 | 89.6% |
| Non-Bankrupt | 417 | 47 | 10.1% |
| Overall Accuracy % | | | 49.86% |






Table 5.31 Bankruptcy prediction accuracy table using END Model

Classification Table for data one year before Event:

| Observed | Predicted Bankrupt | Predicted Non-Bankrupt | Accuracy % |
|---|---|---|---|
| Bankrupt | 297 | 167 | 64.0% |
| Non-Bankrupt | 274 | 190 | 40.9% |
| Overall Accuracy % | | | 52.5% |





Table 5.32 Bankruptcy prediction accuracy table using MLP neural network Model Classification Table for data one year before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 463 0 100.0% NonBankrupt 0 463 100.0%


Overall Accuracy %

Overall Accuracy %

100.0%

NonBankrupt

58

265

82.0% 86.2%

Classification Table for data Three years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 312 6 98.1%

Classification Table for data Four years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 247 85 74.4%

NonBankrupt

91.0%

NonBankrupt

94.5%

Overall Accuracy %

29

293

Overall Accuracy %


Bankrupt 108 88

Overall Accuracy %


Accuracy % 32.1% 72.2%

51.6%


185

137

42.5% 58.7%

Table 5.33 Bankruptcy prediction accuracy table using CHAID Model Classification Table for data one year before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt

Classification Table for data Two years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt

NonBankrupt

NonBankrupt

366 308

98

156

78.9% 33.6%

194

270

137

327

41.8% 70.5%

56.1% 56.2%

Overall Accuracy %

Overall Accuracy %

Classification Table for data Three years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt

Classification Table for data Four years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt

NonBankrupt

NonBankrupt

396 340

68

124

85.3% 26.7%

60

404

12.9%

32

432

93.1%

56.0% Overall Accuracy %

53.0%

Overall Accuracy %

Classification Table for data Five years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt NonBankrupt

275

189

59.3%

171

293

63.1%

Overall Accuracy %

61.2%

Table 5.34 Bankruptcy prediction accuracy table CHAID Exhaustive Model Classification Table for data one year before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 302 162 65.1% Non223 241 51.9% Bankrupt

Classification Table for data Two years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt 300 164 65.0% Non0 464 100.0% Bankrupt

82.2% 58.5%

Overall Accuracy %

Overall Accuracy %

Classification Table for data Three years before Event:

Classification Table for data Four years before Event:

Observed

Observed

Predicted Bankrupt

Non-Bankrupt

Accuracy %

Bankrupt

Predicted Bankrupt

Non-Bankrupt

Accuracy %

Bankrupt 349

115

75.2%

Non-Bankrupt

60

404

12.9%

32

432

93.1%

Non-Bankrupt 301

163

35.1% 55.2%

Overall Accuracy %

Overall Accuracy %

Classification Table for data Five years before Event: Observed Predicted Bankrupt Non-Bankrupt Accuracy % Bankrupt 424 40 91.4% Non-Bankrupt 273 191 41.2%

Overall Accuracy %

66.3%


53.0%

Table 5.35 Bankruptcy prediction accuracy table CART Model Classification Table for data one year before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt


NonBankrupt

30.8%

NonBankrupt

57.9%

Overall Accuracy %

394 321

70

143

Overall Accuracy %

84.9%

335

129

72.2%

269

195

42.0% 57.1%

Classification Table for data Three years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt

Classification Table for data Four years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt

NonBankrupt

NonBankrupt

400 342

64

122

86.2% 26.3%

389

75

83.8%

348

116

25.0%

56.2% Overall Accuracy %

54.4%

Overall Accuracy %

Classification Table for data Five years before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt NonBankrupt

441

23

95.0%

416

48

10.3%

Overall Accuracy %

52.7%

Table 5.36 Bankruptcy prediction accuracy table QUEST Model Classification Table for data one year before Event: Observed Predicted Bankrupt NonAccuracy % Bankrupt Bankrupt


NonBankrupt

88.1%

NonBankrupt

94.0%

Overall Accuracy %

404 404

60 60

Overall Accuracy %

88.1%

298

166

65.0%

200

264

56.2% 82.0%

Classification Table for data Three years before Event:

Classification Table for data Four years before Event:

Observed

Observed

Predicted Bankrupt

Non-Bankrupt

Accuracy %

Bankrupt

Predicted Bankrupt

Non-Bankrupt

Accuracy %

Bankrupt 200

264

56.2%

Non-Bankrupt

0

464

0.0%

0

464

100.0%

Non-Bankrupt 0

464

100.0% 78.2%

Overall Accuracy %

Overall Accuracy %

Classification Table for data Five years before Event: Observed Predicted Bankrupt Non-Bankrupt Accuracy % Bankrupt 464 0 100.0% Non-Bankrupt 464 0 0.0% Overall Accuracy % 50.0%


50.0%

Table 5.37 Bankruptcy prediction accuracy table K-NN Model

Classification table one year before Event

Classification table two years before Event

Classification table three years before Event

Classification table four years before Event

Classification table five years before Event

Appendix B

Figure 5.4 Model Decision Trees

Figure 5.5 Model HP Tree

Figure 5.6 Neural Network Model

Figure 5.7 Auto Neural Model

Figure 5.8 HP Neural Model

Figure 5.9 DMNeural Model

Figure 5.10 Regression Model

Figure 5.11 HP SVM Model

Figure 5.12 HP Regression Model

Figure 5.13 Memory Based Reasoning Model
