Developing Credit Scoring Models With Fuzzy Rule Based Classifiers
Arijit Laha
Institute for Development and Research in Banking Technology, Castle Hills, Hyderabad 500 057, India (e-mail: [email protected])

Abstract – Credit-risk evaluation is a very challenging and important problem in the domain of financial analysis. Many classification methods have been suggested in the literature to tackle this problem. Statistical and neural network based approaches are among the most popular paradigms. However, most of these methods produce so-called “hard” classifiers, which generate decisions without any accompanying confidence measure. In contrast, “soft” classifiers, such as those designed using fuzzy set theory, produce a measure of support for the decision (and also for alternative decisions) that provides the analyst with greater insight. In this paper we propose two schemes for designing fuzzy rule based credit scoring models. First, fuzzy rules are extracted from the training data using a Self-organizing Map (SOM) based method. Then two classifiers, a basic fuzzy rule based classifier and a fuzzy rule based k-nn classifier, are developed for credit scoring. A method of seamlessly integrating business constraints into the model is also proposed.
I. INTRODUCTION

Credit scoring is a method of modelling the potential risk related to a credit portfolio. Statistical and neural network techniques are popularly used with historical data to produce scoring models, which financial institutions can use to evaluate portfolios in terms of risk [7]. Credit scoring tasks can be divided into two distinct types. The first type is credit application scoring, where the task is to classify applicants into “good” and “bad” risk groups. The data used for modelling generally consists of financial and demographic information about the loan applicant. In contrast, the second type of task deals with existing customers, and payment history information is also used. It is distinguished from the first type because it reflects the customer's payment pattern on the loan; this task is called behavior
scoring. The remaining sections of this paper focus on application scoring. However, the techniques developed here, working with an appropriate data set, can also be applied to behavior scoring.

Here we propose a data-driven (i.e., using learning algorithms) scheme for developing credit scoring models that incorporates a Self-organizing Map (SOM) [3] based method for fuzzy rule extraction [4], [5], [6] for classifier design. One of the two variants of the model proposed here uses the fuzzy k-nn rule [2] on the fuzzy labels of the test data point and its k nearest neighbors for more robust and context-sensitive decision making. We also outline the design of a post-processing filter that allows a domain expert to impose business constraints on the decision according to the risk policy. The rest of the paper is organized as follows: Section II describes the scheme in detail, Section III contains the experimental results, and Section IV concludes the paper.

II. DESIGNING A CREDIT SCORING MODEL

The scheme for developing the credit scoring model is depicted in Figure 1. First, a fuzzy rule base is extracted (i.e., learned) from the training set using a SOM based prototype extraction algorithm, followed by a fuzzy rule extraction and tuning algorithm applied to the prototypes. Then we use the rule base to design two types of classifiers for credit scoring. The first uses the fuzzy rules directly, while the other applies the fuzzy k-nn rule to the test data point and its k nearest data points from the training/reference set. Finally, we propose a method of incorporating business constraints into the model.

A. Extraction of prototypes

The most challenging problem in prototype based classification is to find representative prototypes in
Fig. 1. Steps in developing credit scoring model using fuzzy rule based k-nn (FRKNN) classifier. (Diagram labels: The learning system, Training Data, SOM, DYNAGEN, Fuzzy Rule Extraction, FRKNN, Business Constraints, Decision.)
adequate number. Usually a clustering technique such as the k-means algorithm [1] is used to find clusters, and their centers form the set of prototypes. However, for most conventional clustering algorithms, the number of clusters (and hence prototypes) needs to be specified by the user. The number of prototypes is the most important factor influencing the performance of the classifier. If the number is less than adequate, the performance suffers, while a larger-than-required number of prototypes increases the computational load unnecessarily. In the proposed scheme we use the dynamic prototype generation algorithm DYNAGEN, introduced in [4], [5], which dynamically extracts an adequate number of prototypes from the training data. With this method, the user need not guess, or find by trial and error, the number of prototypes adequate for the task. It is also relatively immune to underutilization of prototypes.

The DYNAGEN algorithm first trains a SOM with the training data (note that the label information is not used for training the SOM). Then the SOM prototypes are labelled and fine-tuned iteratively. In each iteration, based on their classification performance, prototypes are deleted, split and/or merged. At the end, the algorithm produces a good set of prototypes, where each prototype satisfies to a good degree the properties of (1) representativeness: the prototype represents a significant number of points, and (2) purity: the majority of the points represented by the prototype come from a single class, which is the class the prototype represents. For more details see [4], [5].

B. Designing fuzzy rule base

Though the prototype based classifier produced by DYNAGEN works well with simpler data sets, its performance suffers when the data dimension is large and/or there is large variation in the variances of different components of the data. This is true
for all purely distance based classifiers, because in such a situation the distance measure fails to reflect the concept of “similarity” properly. Fuzzy rule bases can be used to address these problems. In our scheme we convert the set of prototypes into a set of fuzzy rules and fine-tune them using the method proposed in [6]. The method tunes the rules with respect to their context in the feature space. Of the two variants of the tuning algorithm proposed in [6], we use the one that uses “softmin” as the conjunction operator. The result is a rule base $\{R_i\}$, where rule $R_i$ for class $k$ is of the form

$R_i$: If $x_1$ is CLOSE TO $v_{i1}$ AND $\cdots$ AND $x_p$ is CLOSE TO $v_{ip}$ then class is $k$.
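How a rule of this form might be evaluated can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from [6]: the Gaussian membership function given in this section for CLOSE TO is used for each antecedent, "softmin" is approximated by a power mean with a negative exponent q (which approaches the ordinary minimum as q decreases), and the support for a class is taken to be the largest firing strength among its rules. The function names, the two example rules and the feature values are hypothetical.

```python
import math

def gaussian_membership(x, v, sigma):
    """Membership of x in the fuzzy set CLOSE TO v: exp(-(x - v)^2 / sigma^2)."""
    return math.exp(-((x - v) ** 2) / sigma ** 2)

def softmin(memberships, q=-10.0):
    """Assumed soft minimum: power mean with negative exponent q.

    Computed in log space to avoid overflow for very small memberships.
    This is an illustrative stand-in, not necessarily the operator of [6].
    """
    if min(memberships) <= 0.0:
        return 0.0
    logs = [q * math.log(m) for m in memberships]
    top = max(logs)
    log_mean = top + math.log(sum(math.exp(t - top) for t in logs) / len(logs))
    return math.exp(log_mean / q)

def firing_strength(x, centers, spreads, q=-10.0):
    """Conjunction of the antecedent memberships of one rule."""
    ms = [gaussian_membership(xj, vj, sj)
          for xj, vj, sj in zip(x, centers, spreads)]
    return softmin(ms, q)

def class_support(x, rules, q=-10.0):
    """Assumed aggregation: support for a class is its best-firing rule."""
    support = {}
    for centers, spreads, label in rules:
        s = firing_strength(x, centers, spreads, q)
        support[label] = max(s, support.get(label, 0.0))
    return support

# Hypothetical rule base: (centers v_i, spreads sigma_i, class label).
rules = [
    ((0.8, 0.2), (0.3, 0.2), "good"),
    ((0.3, 0.7), (0.3, 0.2), "bad"),
]
x = (0.75, 0.25)  # hypothetical applicant with two scaled features

support = class_support(x, rules)
label = max(support, key=support.get)
```

Because every class receives a support value, the output is "soft" in the sense used in the abstract: the analyst sees not only the winning label but also how strongly the alternative is supported.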
The fuzzy set CLOSE TO $v_{ij}$ is modelled by a Gaussian membership function:

$$\mu_{ij}(x_j; v_{ij}, \sigma_{ij}) = \exp\left(-\frac{(x_j - v_{ij})^2}{\sigma_{ij}^2}\right),$$

where $v_{ij}$ and $\sigma_{ij}$ are the center of the fuzzy set and a constant controlling the spread of the fuzzy set, respectively. Both of them are initialized from the respective prototype and fine-tuned by the algorithm. For a data point x ∈