Predicting Financial Distress: A Case Study Using Self-organizing Maps
A.M. Mora, J.L.J. Laredo, P.A. Castillo, and J.J. Merelo
Departamento de Arquitectura y Tecnología de Computadores, University of Granada, Spain
{amorag,juanlu,pedro,jmerelo}@geneura.ugr.es
Abstract. In this paper we use Kohonen’s Self-Organizing Map (SOM) to survey the financial status of Spanish companies. From it, we infer which variables are the most relevant, so that a fast diagnosis of a company’s status can be reached and, besides, explained via a few rules of thumb extracted from the behavior of those variables. This map can be used as part of a decision-making process, or as a first stage in an automatic classification tool. Results show that the variables identified in an easy and visual way (using the SOM and the U-Matrix graph) are in agreement with those obtained using parametric and non-parametric tests, which are more complex and difficult to apply.
1 Introduction
The prediction of company failure is one of the main problems that the manager of a company must solve. Sometimes there are warning signs of a foreseeable financial difficulty, and the manager is able to act in order to prevent that situation from happening. Besides, this problem has become a classic among researchers, whose work gives possible investors (or commercial partners) more elements of judgement on which to base their decisions. The history of this subject [1,6,12] shows that, over the years, there has been a search for more generic (less restrictive) tools. Hence, from Beaver’s univariate analysis [2], the field has arrived at the application of artificial neural networks (ANN) [3,4] and Self-Organizing Maps (SOM) [8,9]. The use of statistical techniques requires the existence of a functional relation between dependent and independent variables, and if that assumption is not correct, the method will not yield good predictions. Besides, these methods are very sensitive to exceptions (which gives them poor generalization power), so that a few atypical examples can make the method perform at a lower level. On the other hand, ANN (in general) and Kohonen’s self-organizing map [11] (in particular) can represent non-linear relationships among variables. Also, in changing environments, both are able to adapt their output as new examples become available. In some conditions, SOM can be a better alternative to determine relationships among data, since no assumptions are made about
Supported by NadeWeb project TIC2003-09481-C04-01.
F. Sandoval et al. (Eds.): IWANN 2007, LNCS 4507, pp. 774–781, 2007. © Springer-Verlag Berlin Heidelberg 2007
probability distributions or linearity constraints, so it can be applied to a wider range of problems, imposing fewer restrictions and requiring less strict data pre-processing.
No wonder that SOM has already been applied in this area. Kaski et al. [8] compute a scale to measure the differences between two firms using learning techniques, and then use it on a SOM to visualize the firms’ situation and the direction in which they are moving. The same type of algorithm is used in [9] to predict bankruptcies, comparing it with linear discriminant analysis and Learning Vector Quantization (LVQ, [10]).
We have worked with two types of independent variables, some obtained from financial statements and others being non-financial data, related to 470 Spanish companies between 1998 and 2000. We have considered the results presented by Roman et al. in [14], where they apply parametric and non-parametric tests to these data. In turn, we have applied a self-organizing map to the same data in order to identify significant differences between firms with continued losses and healthy ones. The U-Matrix algorithm and graph and the SOM plane analysis map have been used to perform a visual clustering of the samples, along with an analysis of the variables that most affect continued losses. Afterwards, we have extracted the variables that let us characterize the firms included in each group. Finally, we have compared the results yielded by the SOM with those of the parametric and non-parametric tests, to assess the value of those variables and of the SOM methodology applied to this problem.
The remainder of this paper is organized as follows: Section 2 describes the dataset as well as the methods used to process the samples. Section 3 shows the results yielded by SOM and compares them with those obtained in [14]. The conclusions are reported in Section 4, along with a description of future lines of work on this problem.
2 Data and Methodology
The dataset comes from the Infotel database¹ and contains 470 companies, of which 170 had book losses during the years 2001, 2002 and 2003, and 300 present good financial health. In addition, there are data on these 470 companies for the years 1998, 1999 and 2000. The choice of the independent variables must be made considering their meaning in relation to the company’s financial health. This is a very difficult choice, so we have considered the variables and indicators which have been important in other related works and are described in the aforementioned paper [14]. In addition, we take into account some qualitative information which may warn about financial problems in a company. The dependent variable is either ’1’, in the case of continued book losses (in 3 consecutive years), or ’0’ for a healthy company. Regarding the methodology, on the one hand, Roman et al. [14] choose some key variables using two methods: a parametric test, which identifies the variables that have significant differences in their mean values for healthy and failed
¹ Bought from http://infotel.es
companies in the same period (ANOVA test); and non-parametric tests (Mann-Whitney and Kolmogorov-Smirnov tests). On the other hand, we have applied a Self-Organizing Map to the dataset.
The Self-Organizing Map (SOM) was introduced by Teuvo Kohonen in 1982 [11]. It is an unsupervised neural network [13,7] that tries to imitate the self-organization of the sensory cortex of the human brain, where neighbouring neurons are activated by similar stimuli. It is usually used as a clustering/classification tool, or to find unknown relationships among a set of variables that describe a problem. The main property of the SOM is that it makes a nonlinear projection from a high-dimensional data space (one dimension per variable) onto a regular, low-dimensional (usually 2D) grid of neurons.
The SOM is further processed using Ultsch’s method [16], the unified distance matrix (U-matrix), which takes the SOM’s codevectors (vectors of the variables of the problem) as its data source and generates a matrix in which each component is a distance measure between two adjacent neurons. It allows us to visualize any multivariate dataset in a two-dimensional display, so that we can detect topological relations among neurons and make inferences about the structure of the input data. High values in the U-matrix represent a frontier region between clusters, and low values represent a high degree of similarity among the neurons in that region, that is, a cluster.
Although Kohonen’s SOMs are not as accurate as other pattern recognition tools at the task of classification, they can be applied to many different types of data, yielding a visualization of natural structures in the data and their relations, highlighting groupings and allowing the user to visually discover the number of clusters and their topological relationships. In addition, the SOM makes it easy to estimate which variables have more influence on these groupings.
Other statistical and soft computing tools can also be used for this purpose, but since Kohonen’s SOMs offer a visual way of doing it, they are much more intuitive and take advantage of the capabilities of the human brain as a pattern recogniser.
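The SOM projection and the U-matrix described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the SOM Toolbox implementation used in this paper: the function names train_som and u_matrix, the linear decay schedules, and the rectangular grid with 4-connected neighbours (instead of a hexagonal lattice) are all assumptions made for brevity. The U-matrix here stores, for each neuron, the mean distance to its adjacent neurons.

```python
import math
import random


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a small rectangular SOM; codebook[r][c] is a codevector."""
    rng = random.Random(seed)
    dim = len(data[0])
    codebook = [[[rng.uniform(-1, 1) for _ in range(dim)]
                 for _ in range(cols)] for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
            # Best-matching unit: neuron whose codevector is closest to x.
            br, bc = min(((r, c) for r in range(rows) for c in range(cols)),
                         key=lambda rc: math.dist(codebook[rc[0]][rc[1]], x))
            for r in range(rows):
                for c in range(cols):
                    d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
                    w = codebook[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])     # pull codevector towards x
            step += 1
    return codebook


def u_matrix(codebook):
    """Mean distance from each codevector to those of its grid neighbours."""
    rows, cols = len(codebook), len(codebook[0])
    um = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = [math.dist(codebook[r][c], codebook[nr][nc])
                     for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                     if 0 <= nr < rows and 0 <= nc < cols]
            um[r][c] = sum(dists) / len(dists) if dists else 0.0
    return um


# Hand-made 1x3 codebook: two identical neurons and one distant neuron.
print(u_matrix([[[0.0, 0.0], [0.0, 0.0], [3.0, 4.0]]]))  # [[0.0, 2.5, 5.0]]
```

In the toy codebook, the low value on the left marks two similar neurons (a cluster), while the high values on the right mark the frontier with the distant neuron, which is the reading of the U-matrix given in the text.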
3 Experiments and Results
The failure analyzed refers to the existence of continued losses for three years: if a firm has had losses during the years 2001, 2002 and 2003, then it is marked as a continued loss in all the rows of that firm in the previous years. We will compare both methods considering the same samples and the same time period (1998 to 2000). So we have 3 sets of data (which contain data for almost the same firms), one for each accounting period, distributed as follows:
- 1998: 461 firms, 161 had book losses.
- 1999: 467 firms, 167 had book losses.
- 2000: 479 firms, 179 had book losses.
Regarding the SOM application, we first compose a single dataset by merging the data of the 3 periods. Initially each row in a set was a vector of 38 variables: 37 independent and 1 dependent, which took a value of ’1’ if the firm had book losses and ’0’ if it did not. This set of variables has been modified by deleting useless ones (those without significance in our study, for instance the internal database firm
code) and by transforming categorical ones into non-categorical ones, which again yields a set of 38 variables (1 dependent). We have transformed each categorical variable by creating as many new variables as it has distinct values, and giving each a value of ’0’ or ’1’ depending on the value of the categorical variable and on the new variable being filled in. For example, if we have the variable size, with three possible values, ’1’, ’2’ or ’3’, we can create three new variables size1, size2 and size3. Each one will have a value of ’1’ if the old value of size was ’1’, ’2’ or ’3’, respectively, and a value of ’0’ otherwise.
For these experiments, we have used Matlab 6.5 along with SOM Toolbox [17], version 2.0 beta. We have run some experiments to determine the best configuration for training the SOM. The parameters used for training are shown in Table 1.

Table 1. SOM Parameters

Map Grid Size      8x8
Map Lattice        hexagonal
Size of Vectors    37
Normalization      Var (in [-1,1])

The rest of the parameters take the default values in SOM Toolbox.
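The transformation of the variable size into size1, size2 and size3, and the ’Var’ normalization listed in Table 1, can be illustrated with the following minimal Python sketch. The helper names one_hot and normalize_var are hypothetical, and the use of the sample standard deviation is an assumption; the experiments themselves were run with SOM Toolbox in Matlab, not with this code.

```python
import statistics


def one_hot(values, name):
    """Expand a categorical column into 0/1 columns, one per distinct value."""
    categories = sorted(set(values))
    return {f"{name}{cat}": [1 if v == cat else 0 for v in values]
            for cat in categories}


def normalize_var(column):
    """Scale a numeric column to zero mean and unit variance.

    Assumption: the sample standard deviation (n - 1) is used.
    """
    mean = statistics.fmean(column)
    std = statistics.stdev(column)
    if std == 0:
        return [0.0 for _ in column]  # constant column carries no information
    return [(v - mean) / std for v in column]


size = [1, 3, 2, 1]
print(one_hot(size, "size"))
# {'size1': [1, 0, 0, 1], 'size2': [0, 0, 1, 0], 'size3': [0, 1, 0, 0]}
print(normalize_var([1, 2, 3]))
# [-1.0, 0.0, 1.0]
```

After variance normalization, values within one standard deviation of the mean fall in [-1,1], while more extreme values fall outside that range.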
We have chosen this map size because it gives the best visualization of the clusters in the data (for the amount of data that we have). A bigger size would imply a bigger distance between units (neurons) in the map, so these clusters could not be visualized easily. The lattice is hexagonal because we obtained better results with this type (each neuron has a smaller neighbourhood when using it). The normalization of data is a necessary pre-processing step, because otherwise the variables with larger values would dominate the map organization. So the data supplied by Infotel are pre-processed by assigning the values ’-1’ and ’1’ to the independent variables whose values were ’0’ or ’1’, respectively (the old categorical ones). We normalize the rest of the independent variables using the variance method (’var’ parameter of the function som_normalize), so that their values are located in the range [-1,1], except for outliers.
We have trained the SOM with the whole sample data, corresponding to the accounting periods 1998-2000. After training and post-processing, the U-Matrix graph [16] was obtained. We labelled each sample in the U-Matrix with a ’P’ when it corresponds to a firm with continued losses, and with an ’n’ otherwise. The variables are named in capitals, except for the new non-categorical ones obtained by transforming categorical variables. In this case, the SOM has not been trained with the ideal amount of data to differentiate companies of both classes (maybe we did not have enough data, or maybe the samples were too similar), so there was no clear boundary between clusters of failed companies (those with continued losses) and successful companies. Looking at the U-Matrix graph (see Fig. 1), only
3 clusters stand out: a general cluster, shown in dark gray (blue in color images), which is associated with vectors representing companies close to each other; a warm spot at the lower right corner; and a hot spot at the upper right corner. These spots represent clusters of companies whose features stand out against the rest. There is also a small cluster in the 3rd hexagon from the right in the top row.
Fig. 1. U-Matrix graph (1998-2000 data) (For a color and better-quality image you can visit: http://geneura.ugr.es/~amorag/som/finan/UMATRIX 98 00.bmp)
Also looking at Fig. 2, the warm spot seems to correspond to large companies (high values on the 3rd component, tamano3), which have been audited (5th component, AUDITADA), with the first social form (cod formasoc1, 6th component, corresponding to value ’S.L.’), with a high value in the auditor’s opinion (5th component counting from the last one, opi auditoria1, which means favourable opinion), a high acid-test ratio (TEST ACI), a high current ratio (SOLVENCIA), a high delay in reporting (RETRASO), high leverage (NIV END), and membership of a group (last component, VINC GRUPO). There are more failed companies in this cluster (9) than successful ones (4), so the cluster probably corresponds to old members of company conglomerates, of large size, which are conveniently closed without incurring big losses. This result was also found in one of our previous studies [5]. On the other hand, the hot cluster is composed mainly of successful companies, of small size (tamano1), with most economic indicators in healthy shape. Finally, the small cluster is associated with almost-failed
Fig. 2. SOM Plane Analysis (1998-2000 data) (For a color and better-quality image you can visit: http://geneura.ugr.es/~amorag/som/finan/SOM 98 00.bmp)
companies, which have cod formasoc3, with no other distinctive value. It probably corresponds to generally healthy companies (10 healthy vs. 9 failed) which have had book losses for reasons other than financial matters.
Regarding the application of parametric and non-parametric tests, as we previously said, we consider the results obtained in [14] (ANOVA as the parametric test, and Mann-Whitney and Kolmogorov-Smirnov as non-parametric tests). As a summary, we show the facts obtained which are related to the SOM results, and the variables identified using the tests:
– a) 87% of companies with continued book losses go into bankruptcy. (RIESGO)
– b) 73% of big companies have continued book losses. (tamano3)
– c) 65% of middle and small companies have continued book losses. (tamano1 and tamano2)
– d) 44% of audited companies have continued book losses. (AUDITADA)
– e) 47% of S.A. type companies have continued book losses. (cod formasoc1)
– f) 61% of companies with delay in being audited have continued book losses. (RETRASO)
– g) companies with a bad auditor’s opinion have continued book losses. (opi auditoria3)
Now we compare the results obtained by both methods (SOM and parametric and non-parametric tests). Firstly, following fact a), there is a direct relationship between continued book losses and bankruptcy of companies (legal
failure), so it is important to identify the main factors related to book losses. Considering the U-Matrix analysis discussed previously, the warm spot may correspond to a group of companies which have continued book losses. The outstanding variables (for both methods) in this cluster are: tamano3 (large companies), which is also identified by the other tests (fact b)); AUDITADA (audited), which is mentioned in fact d); and finally ’S.A.’ companies (variable cod formasoc1) as well as companies with delay (variable RETRASO), which are also identified by both methods (facts e) and f)). Regarding the hot cluster, the variable identified using SOM is tamano1 (small companies), which is also connected with the complement of fact c). The third cluster is less important because it represents an exceptional situation. Thus, the variables identified by looking at the U-Matrix graph and the SOM Plane analysis are closely related to those obtained using the parametric and non-parametric tests, being in many cases the same and confirming the facts in almost all cases.
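The cluster compositions quoted in this section (for instance, 9 failed versus 4 successful firms in the warm spot) come from assigning each labelled sample to its closest map unit and tallying the labels. A minimal Python sketch of that bookkeeping follows; the function names, the 1x2 toy codebook and the four toy samples are hypothetical and only illustrate the idea, they are not the data or the code used in this study.

```python
import math
from collections import Counter


def best_matching_unit(codebook, x):
    """Return (row, col) of the neuron whose codevector is nearest to x."""
    rows, cols = len(codebook), len(codebook[0])
    return min(((r, c) for r in range(rows) for c in range(cols)),
               key=lambda rc: math.dist(codebook[rc[0]][rc[1]], x))


def label_counts(codebook, samples, labels):
    """Tally labels ('P' = continued losses, 'n' = healthy) per map unit."""
    counts = {}
    for x, label in zip(samples, labels):
        unit = best_matching_unit(codebook, x)
        counts.setdefault(unit, Counter())[label] += 1
    return counts


# Hypothetical 1x2 map and four toy firms described by two variables.
codebook = [[[0.0, 0.0], [1.0, 1.0]]]
samples = [[0.1, 0.0], [0.9, 1.0], [1.0, 0.8], [0.2, 0.1]]
labels = ["n", "P", "P", "n"]
for unit, tally in sorted(label_counts(codebook, samples, labels).items()):
    print(unit, dict(tally))
# (0, 0) {'n': 2}
# (0, 1) {'P': 2}
```

Summing these tallies over the units that form a visually identified cluster gives the failed/healthy counts for that cluster.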
4 Conclusions and Future Work
In this paper we continue the collaboration with the Axesor S.A. company (Infotel) and the studies on legal failure (bankruptcy) that we developed in previous articles [3,5]. In those papers some statistical methods were used to demonstrate the influence of some financial variables on bankruptcy. Due to the difficulty of interpreting those results, we decided to approach this problem using Kohonen’s Self-Organizing Maps. This new model has the great advantages of its generalization capability and the ease of determining those variables. Thus, the U-Matrix graph makes it easy to identify (in a visual way) the clusters in the dataset, and the SOM Plane analysis allows us to determine the group of key variables for each cluster. We have compared the results obtained against those presented in [14], where parametric and non-parametric tests were used to determine the most important financial variables (those with significant differences) and where some facts were proposed. The results obtained using the SOM are in agreement with those presented in that article. So, U-Matrix and SOM analysis can be used as a valid tool to determine hidden relationships as well as the clusters in a dataset. It also allows a non-expert to identify, in a visual way, the important variables which characterize each cluster.
As future work, we will apply SOM to a bigger dataset with information about more companies and more accounting periods, in order to identify new clusters and other important variables. Along the same lines, we will create a new dataset considering different variables, indicators and qualitative information which describe the status of a company, following the same objective as in the previous case and comparing the results with those obtained with other methods. Finally, we will consider other dependent variables (for instance, bankruptcy) in the same dataset, to test the utility of the SOM methodology in those cases.
References
1. Altman, E.I.: The success of business failure prediction models. An international survey. Journal of Banking, Accounting and Finance 8, 171–198 (1984)
2. Beaver, W.H.: Financial Ratios as Predictors of Failures. Empirical Research in Accounting: Selected Studies. Journal of Accounting Research 5, 71–111 (1966)
3. Castillo, P.A., De la Torre, J.M., Merelo, J.J., Román, I.: Forecasting Business Failure. A Comparison of Neural Networks and Logistic Regression for Spanish Companies. In: 24th European Accounting Association, Athens, Greece (2001)
4. Charalambous, C., Charitou, A., Kaourou, F.: Application of feature extractive algorithm to bankruptcy prediction Neural Networks. In: Proceedings of the IEEE-INNS-ENNS International Joint Conference, IJCNN 2000, vol. 5, pp. 303–308 (2000)
5. De la Torre, J.M., Gómez, M.E., Román, I., Castillo, P.A., Merelo, J.J.: Bankruptcy prediction adapted to firm characteristics. An empirical study. In: Proceedings of the 26th Annual Congress European Accounting Association, p. A-108, Sevilla (2003)
6. Dimitras, A.I., Zanakis, S.H., Zopounidis, C.: A survey of business failure with an emphasis on prediction methods and industrial applications. European Journal of Operational Research 90, 486–513 (1996)
7. Haykin, S.: Neural Networks: A Comprehensive Approach. IEEE Computer Society Press, Piscataway, USA (1994)
8. Kaski, S., Sinkkonen, J., Peltonen, J.: Bankruptcy Analysis with Self-Organizing Maps in Learning Metrics. IEEE Trans. Neural Networks 12(4), 936ss (2001)
9. Kiviluoto, K.: Predicting bankruptcies with the self-organizing map. Neurocomputing 21(1-3), 191–201 (1998)
10. Kohonen, T.: The Self-Organizing Map. In: Proceedings of the IEEE, vol. 78(9), pp. 1464–1480 (1999)
11. Kohonen, T.: The Self-Organizing Maps. Springer, Heidelberg (2001)
12. Laitinen, T., Kankaanpää, M.: A comparative analysis of failure prediction methods: the Finnish case. The European Accounting Review 8(1), 67–92 (1999)
13. Mehra, P., Wah, B.: Artificial Neural Networks. IEEE Computer Society Press, Piscataway, USA (1997)
14. Román, I., Gómez, M.E., De la Torre, J.M., Merelo, J.J., Mora, A.M.: Predicting Financial Distress: Relationship between Continued Losses and Legal Bankruptcy. In: 29th Annual Congress European Accounting Association, Dublin (2006)
15. Tam, K.Y., Kiang, M.Y.: Predicting bank failures: a neural network approach. In: Trippi, R.R., Turban, E. (eds.) Neural Networks in Finance and Investing, pp. 267–301 (2000) ISBN: 1-55738-919-5
16. Ultsch, S.: Kohonen’s Self-organizing maps for exploratory data analysis. In: Proceedings of INNC’90, Paris, Kluwer Academic, Dordrecht, pp. 305–308 (2000)
17. Vesanto, J., Himberg, J., Alhoniemi, E., Parhankangas, J.: Self-organizing map in Matlab: the SOM Toolbox. In: Proceedings of the Matlab DSP Conference 1999, pp. 35–40, Espoo, Finland (1999) http://www.cis.hut.fi/projects/somtoolbox/