Proceedings of the 2017 Industrial and Systems Engineering Research Conference
K. Coperich, E. Cudney, H. Nembhard, eds.

Real Time Credit Card Default Classification Using Adaptive Boosting-Based Online Learning Algorithm

Hongya Lu, Haifeng Wang and Sang Won Yoon
Department of Systems Science and Industrial Engineering
State University of New York at Binghamton, Binghamton, NY 13902

Abstract

This study proposes an adaptive boosting-based (AdaBoost) online learning algorithm to recognize potential credit card default clients in real-time scenarios. Classifying the qualification of credit card applicants is a key and challenging problem for credit card companies seeking to reduce their market risk. In this paper, applicant information is represented by a feature vector that includes education level, credit scores, historical payment records, etc. During the learning process, the training dataset is divided into two parts, original and newly added samples, based on the customer application time. The AdaBoost algorithm is applied to enhance the learning ability of the base decision trees using the original samples. To stabilize the dynamic and stochastic process changes, a re-weighting technique is adopted when new data are received. An online learning mechanism then updates the model parameters so that the algorithm adapts itself online. The AdaBoost-based online learning method is compared with other state-of-the-art classification methods, including Extreme Learning Machine (ELM), Random Forest (RF), Support Vector Machine (SVM), Naive Bayes (NB), and K-Nearest Neighbors (K-NN), using 5-fold cross-validation. The experimental results show that the Online AdaBoost algorithm not only reduces the training time, but also obtains stable and competitive classification accuracy on the credit card default clients dataset.

Keywords

Adaptive Boosting, Extreme Learning Machine, Online Learning, Real Time Default Classification

1. Introduction

This research proposes an application of online learning to the credit card default detection system that achieves real-time model tuning with minimal computational effort. For credit card issuers, the number of credit card clients, the consumption amount, and the default rates are influential factors on the market share of the banks. Banks suffer losses if credit cards are issued to clients who may default on their payments. Thus, providing credit cards only to customers with low delinquency probabilities, and thereby maximizing profit, is a great challenge for credit card issuers.

Artificial Intelligence (AI) has been applied to identify credit card default clients [5]. Various studies specifically investigate the relationship between customer information and the customer default probability. The customer information includes user behavior and personal information such as educational background. In recent years, various data mining techniques have been applied to identify clients with high probabilities of default, and have shown high accuracy. However, most of these techniques use offline machine learning to train the classification models, and applications of online learning to real-time credit card default identification are not fully discussed. For most offline learning techniques, retraining is required whenever new data are added to the model. It therefore becomes quite time-consuming to apply batch learning in real-time systems in which the data size grows over time. Moreover, a large data size puts great pressure on the data processing and storage system. Thus, to implement an efficient default identification model in real-time scenarios, online learning techniques should be considered, since they re-weight the model as new observations arrive sequentially instead of requiring model retraining.

Credit default identification, as a dynamic classification problem, has the following characteristics: 1) real-time incremental data; 2) a large number of features; 3) imbalanced data. Most research has focused on feature selection and data mining method improvements, which address the numerous features and the imbalance of the data. Limited research has been conducted to deal with real-time data or to ease the pressure on storage memory for growing data streams. However, as data collection systems now provide large amounts of data, a feasible real-time credit card data processing and storage system requires not only classification effectiveness, but also computational efficiency.

A real-time credit card default identification system is proposed in this research. Both Online Sequential Extreme Learning Machine (OS-ELM) and Online Adaptive Boosting (Online AdaBoost) methods are applied to the default of credit card clients dataset from the University of California, Irvine (UCI) repository. Experiments are conducted from two perspectives. First, the offline learning of basic ELM and AdaBoost is compared in terms of testing accuracy and training efficiency. Then, OS-ELM and Online AdaBoost are used to improve the computation speed while maintaining the same level of accuracy as the offline algorithms. The features and performance of these two online learning techniques are evaluated and discussed based on the experimental results.

This paper is organized as follows. Section 2 provides a review of current research trends in real-time credit card identification. Section 3 introduces the methodology of OS-ELM and Online AdaBoost. The experimental design and results are presented in Section 4. Section 5 contains the conclusions and prospects for future extensions.

2. Literature Review

Data mining techniques have been applied to credit scoring since the early 1990s [7]. The decision-making processes of credit commerce and consumption have been simulated as static and dynamic models, and a wide range of statistical and data mining methods have been investigated to explore credit card scoring performance [8][9]. The identification of credit card default clients, as an automatic classification task, has employed not only simple classifiers such as logistic regression and discriminant analysis [11], but also Neural Networks, Support Vector Machines, etc. [10][12]. To further extend the applicability of data mining in credit scoring, ensemble learning was proposed to pursue better classification performance [13], and imbalanced credit scoring data sets have been specifically explored to provide empirical evaluations of multiple data mining approaches [14]. However, all of these studies are based on batch training, and little work has been done on sequentially growing training datasets, even though credit scoring systems provide large amounts of real-time data.

Online learning is a machine learning paradigm that allows learning whenever new samples appear by incrementally integrating the previous model information with the effects of the new samples. The application of online learning to credit card default identification systems is particularly appropriate, as the training data size is larger than can be processed efficiently in batch. One characteristic of online learning is that no assumption about the availability of a sufficient training set is required before the learning process [15]. By sequentially learning one example at a time, online learning represents a family of efficient and scalable algorithms compared to traditional batch learning, and tackles the problems of memory consumption and retraining costs with new incoming data [16].

The ideas and performance of two online learning approaches are reviewed in this paper: the Online Sequential Extreme Learning Machine (OS-ELM) and Online Sequential Adaptive Boosting (OS-AdaBoost) methods. The OS-ELM is adapted from the basic Extreme Learning Machine (ELM) technique to allow the model to learn one-by-one or chunk-by-chunk [2]. The basic ELM is a feedforward network with a generalized single hidden layer whose nodes need not strictly function as neurons. A typical ELM is characterized by a random feature mapping and a linear parameter-solving process. The OS-ELM approach shows improved computational power and generalization performance [3], although its stability still needs improvement [4]. The OS-AdaBoost method combines online learning with ensemble learning. By integrating multiple weak learners to form a strong learning model, and by allowing the weak learners to update with incoming new observations one by one, the OS-AdaBoost method achieves a good balance between cost-sensitive training and classification performance, while its robustness remains to be further improved [6].

3. Methodology

The principles of the Online Sequential Extreme Learning Machine (OS-ELM) and the Online Adaptive Boosting method are briefly introduced in this section; they provide the theoretical support for this research on real-time credit card default classification.

3.1 OS-ELM Method

OS-ELM, built on the basis of ELM, was developed for single hidden layer feedforward networks with additive and Radial Basis Function hidden nodes [4]. With a random feature mapping from the input to the hidden layer, and with a specific activation function applied to calculate the network output from the hidden layer output, the memory of the ELM is stored in the output layer weight vector β. The OS-ELM learns incrementally by updating the weight vector β with sequentially incoming data, either as single observations or in chunks.

In the initialization stage of an OS-ELM, the initial hidden layer output matrix H_0 is first calculated with preassigned input weights and the activation function. The output weight β^(0) is then estimated from H_0 and the initial N_0 observations. In the online learning stage, the randomly mapped output H_{k+1} is calculated from the incoming N_{k+1} samples, and the output weight is updated accordingly. For the (k+1)th chunk of data (k = 0 denotes the initialization stage), the weight matrix β^(k+1) is updated from the combined effect of β^(k) and H_{k+1}. The algorithm is described in pseudo code as follows.

Algorithm 1: OS-ELM Method [3]
1: Define: L - number of hidden nodes, g - activation function
2: Initialization phase:
3: Initialize the input weights a_i and biases b_i for all i ∈ {1, ..., L}
4: Calculate the initial hidden layer output matrix H_0
5: Calculate the weight vector β^(0) = P_0 H_0^T X_0, where P_0 = (H_0^T H_0)^{-1} and X_0 = [x_1, ..., x_{N_0}]^T
6: Online learning phase:
7: For the N_{k+1} new samples, calculate the corresponding hidden layer output H_{k+1}
8: if N_{k+1} > 1 then
9:   P_{k+1} = P_k − P_k H_{k+1}^T (I + H_{k+1} P_k H_{k+1}^T)^{-1} H_{k+1} P_k
10:  β^(k+1) = β^(k) + P_{k+1} H_{k+1}^T (X_{k+1} − H_{k+1} β^(k))
11: else
12:  P_{k+1} = P_k − (P_k h_{k+1} h_{k+1}^T P_k) / (1 + h_{k+1}^T P_k h_{k+1})
13:  β^(k+1) = β^(k) + P_{k+1} h_{k+1} (x_{k+1}^T − h_{k+1}^T β^(k))
14: end if
15: Output: T = Hβ
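To make the update rule concrete, the following is a minimal NumPy sketch of Algorithm 1. It is not the authors' implementation: the class name, the hardlim activation (a choice motivated later in Section 4.2), and the bipolar ±1 targets are illustrative assumptions.

```python
import numpy as np

class OSELM:
    """Minimal sketch of Algorithm 1; class and attribute names are illustrative."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Random, fixed input weights and biases: the ELM random feature mapping.
        self.A = rng.standard_normal((n_inputs, n_hidden))
        self.b = rng.standard_normal(n_hidden)
        self.P = None     # running (H^T H)^{-1}, updated recursively
        self.beta = None  # output weight vector beta, the model's "memory"

    def _hidden(self, X):
        # Hardlim activation: 1 if the pre-activation is non-negative, else 0.
        return (X @ self.A + self.b >= 0).astype(float)

    def init_fit(self, X0, T0):
        # Initialization phase on the first N0 observations
        # (N0 >= n_hidden so that H0^T H0 is invertible).
        H0 = self._hidden(X0)
        self.P = np.linalg.inv(H0.T @ H0)
        self.beta = self.P @ H0.T @ T0

    def partial_fit(self, X, T):
        # Online learning phase: recursive least-squares update for one chunk.
        H = self._hidden(X)
        K = np.linalg.inv(np.eye(len(X)) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        # Bipolar (+1/-1) class labels, matching the dataset encoding.
        return np.sign(self._hidden(X) @ self.beta)
```

In use, init_fit would run once on the initial N_0 observations, after which partial_fit processes each subsequent 100-record chunk, so no past data needs to be stored.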

3.2 Online AdaBoost Method

The online AdaBoost method introduced in [6] applies a Poisson distribution to determine the number of training repetitions k required by each weak learner for every chunk of incoming observations. The online AdaBoost method extracts the training error of each weak learner and uses it for an online update with every incoming data chunk. Algorithm 2 depicts an online AdaBoost model with M weak learners processing N data records that arrive in a stream with 100 records in each chunk; it is adapted from the one-by-one training in [6] to chunk-by-chunk model tuning. The "memory" of the algorithm is stored both in the chunk weights (λ) and in the weak learner weights derived from the error rates (e_m). The algorithm learns by updating the weights assigned to the data and the weights assigned to each weak learner through the error rate at each iteration. In the learning process, the nth new data chunk is first assigned a weight of λ = 1. Training is then performed on each weak learner. Depending on whether at least 70% of the chunk is classified correctly, λ and the counters behind e_m are updated in different ways. The Decision Tree (DT) algorithm is selected as the weak learner in this paper because of its advantages in feature selection and its modest data preparation requirements.

Algorithm 2: Online AdaBoost Method for Chunk Data
1: Input: S - dataset, M - number of weak learners, N - number of incoming observations
2: Initialize SC_m and SW_m for all m ∈ {1, ..., M}
3: Set the number of observations in one chunk as k
4: for n = 1, ..., N − k do
5:   Set the weight of the chunk λ = 1
6:   for m = 1, ..., M do
7:     Train the weak learner with the nth to (n + k − 1)th inputs and predict h_m(n) for the chunk
8:     if 70% of h_m(n) are correct then
9:       SC_m = SC_m + λ, e_m = SW_m / (SC_m + SW_m)
10:      λ = λ / (2(1 − e_m))
11:    else
12:      SW_m = SW_m + λ, e_m = SW_m / (SC_m + SW_m)
13:      λ = λ / (2 e_m)
14:    end if
15:  end for
16: end for
17: Output: H(x) = argmax_{y∈Y} Σ_{m=1}^{M} log((1 − e_m)/e_m) I(h_m(x) == y)
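A chunk-wise Python sketch of Algorithm 2 follows. This is a simplified illustration rather than the authors' code: scikit-learn decision trees cannot be updated incrementally, so each tree is refit on a Poisson-weighted buffer of the chunks seen so far, and the class name and max_depth setting are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class OnlineAdaBoostChunks:
    """Sketch of chunk-wise Online AdaBoost with decision-tree weak learners."""

    def __init__(self, n_learners=10, seed=0):
        self.M = n_learners
        self.trees = [DecisionTreeClassifier(max_depth=3) for _ in range(self.M)]
        self.sc = np.zeros(self.M)   # lambda mass of correctly classified chunks (SC_m)
        self.sw = np.zeros(self.M)   # lambda mass of misclassified chunks (SW_m)
        self.buffers = [([], []) for _ in range(self.M)]
        self.rng = np.random.default_rng(seed)

    def partial_fit(self, X, y):
        lam = 1.0                                # weight of the incoming chunk
        for m in range(self.M):
            k = self.rng.poisson(lam)            # Poisson(lambda) training repetitions
            bx, by = self.buffers[m]
            for _ in range(max(k, 1)):           # simplification: every tree sees every chunk
                bx.append(X)
                by.append(y)
            self.trees[m].fit(np.vstack(bx), np.concatenate(by))
            acc = np.mean(self.trees[m].predict(X) == y)
            if acc >= 0.7:                       # the 70% correctness rule of Algorithm 2
                self.sc[m] += lam
                e = self.sw[m] / (self.sc[m] + self.sw[m])
                lam /= 2.0 * (1.0 - e)
            else:
                self.sw[m] += lam
                e = self.sw[m] / (self.sc[m] + self.sw[m])
                lam /= 2.0 * max(e, 1e-12)
            lam = min(lam, 1e6)                  # guard against runaway chunk weights

    def predict(self, X):
        # Weighted vote with log((1 - e_m)/e_m) learner weights, bipolar labels.
        e = self.sw / np.maximum(self.sc + self.sw, 1e-12)
        w = np.log((1.0 - e + 1e-12) / (e + 1e-12))
        votes = np.vstack([t.predict(X) for t in self.trees])   # shape (M, n)
        return np.sign(w @ votes)
```

The 70% threshold and the λ re-scaling mirror lines 8-13 of Algorithm 2; in a production system the per-tree buffers would be bounded to preserve the memory advantage of online learning.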


4. Experimental Results and Analysis

4.1 Dataset Description

The Default of Credit Card Clients Dataset from the University of California, Irvine (UCI) repository [5] is used in this paper to test the performance of the online learning models. The dataset contains 30,000 instances in total, of which 6,626 (22.1%) are default cases. The response variable adopts a bipolar representation of the customer's default condition in the current month (default: 1, no default: -1). The dataset uses the following 23 variables to represent potentially default-related user behavior and personal information:

Table 1: Dataset description for the identification of credit card delinquency.

Variable | Description | Data Type | Range
X1 | Credit line ($) given to the customer | Continuous | (10000, 810000)
X2 | Gender | Discrete | 1, 2
X3 | Educational background | Discrete | 1, 2, 3, 4
X4 | Marital status | Discrete | 0, 1, 2, 3
X5 | Age (years) | Discrete | (21, 75)
X6-X11 | Delays of the most recent 6 monthly payments | Discrete | (−1, 9)
X12-X17 | Amount of bill statements in the most recent 6 months | Continuous | (1, 22722)
X18-X23 | Amount of payments made in the most recent 6 months | Continuous | (1, 7943)
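For concreteness, a hedged loading snippet for this dataset is shown below; the file name and the assumption that columns 1-23 hold X1-X23 with the label in the final column reflect the standard .xls export from the UCI repository and may need adjusting for a local copy.

```python
import numpy as np
import pandas as pd

# Load the UCI "default of credit card clients" export (.xls with a two-row header).
df = pd.read_excel("default_of_credit_card_clients.xls", header=1)
X = df.iloc[:, 1:24].to_numpy(dtype=float)   # X1..X23 (column 0 is the record ID)
y = df.iloc[:, 24].to_numpy()                # 1 = default, 0 = no default in the raw file
y = np.where(y == 1, 1, -1)                  # bipolar encoding used in this paper
```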

The data set is randomly split into a training set and a testing set. The model parameters are tuned on the training set. A 5-fold cross-validation evaluator is applied over the whole data set to calculate the performance measures of each model. The classification accuracy and computation time on the testing set are computed to compare the two online learning models.
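The evaluation loop can be sketched as below, assuming scikit-learn utilities; `model_factory` is an illustrative name for any callable returning a fresh classifier, and hard predictions are used as AUC scores for simplicity.

```python
import time
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, roc_auc_score

def evaluate(model_factory, X, y, n_splits=5, seed=0):
    """5-fold cross-validated accuracy, AUC, and training time for one model."""
    accs, aucs, fit_times = [], [], []
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in skf.split(X, y):
        model = model_factory()                    # fresh model per fold
        t0 = time.perf_counter()
        model.fit(X[train_idx], y[train_idx])
        fit_times.append(time.perf_counter() - t0)
        pred = model.predict(X[test_idx])
        accs.append(accuracy_score(y[test_idx], pred))
        aucs.append(roc_auc_score(y[test_idx], pred))
    return np.mean(accs), np.mean(aucs), np.mean(fit_times)
```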


4.2 Experimental Design

For AdaBoost, the experimental design concerns the number of weak learners in the model. Multiple studies have shown that boosting DT learning algorithms yields good classification performance [17]. The DT algorithm is therefore used as the weak learner in the AdaBoost algorithm. Figure 1 shows the variation of the accuracy on the testing set for the AdaBoost algorithm, using 5-fold cross-validation with 5 replications and a confidence level of 90%. With a small number of weak learners, the testing accuracy is unstable, even though there is a chance of higher accuracy. As the number of base learners increases, the accuracy stabilizes, while the computational burden also increases. To balance stability and computation time, 10 base learners are used in the experiment, which is the starting point of the "stable state" in the base learner design; a sketch of this sweep follows below.

Figure 1: The variation of testing accuracy by the number of base learners in Adaptive Boosting (CI = 90%).
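The sweep behind Figure 1 can be reproduced roughly as follows with scikit-learn's offline AdaBoost; the max_depth value is an assumption, and the `estimator` keyword is called `base_estimator` in scikit-learn releases before 1.2.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Vary the number of decision-tree weak learners from 1 to 20, as in Figure 1,
# scoring each setting with 5-fold cross-validation (X, y as loaded in Section 4.1).
for n in range(1, 21):
    clf = AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=3),
        n_estimators=n)
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{n:2d} learners: accuracy {scores.mean():.4f} (std {scores.std():.4f})")
```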


The design for ELM concerns the number of hidden nodes and the selection of the activation function. An inappropriate node number for the single hidden layer may cause overfitting problems. The selection of the activation function is also decisive in minimizing the computation time and maximizing the stability of the algorithm. Using the same method of calculating confidence intervals at each level of the parameter settings, the ELM is designed with 20 neurons in the hidden layer. To compare the influence of activation functions in the ELM model, the Radial Basis Function (RBF), the Hardlim function (HF), and the Sigmoid function (Sig) are considered; sketches of the three candidates are shown below.
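The three candidate activations can be written as below; the vectorized forms and the scalar RBF center are illustrative simplifications.

```python
import numpy as np

def sigmoid(z):
    # Smooth saturating activation over the hidden-layer pre-activation z.
    return 1.0 / (1.0 + np.exp(-z))

def hardlim(z):
    # Threshold activation: a single comparison per node, hence very cheap.
    return (z >= 0).astype(float)

def rbf(z, center=0.0, gamma=1.0):
    # Gaussian radial basis response around a (scalar, for illustration) center.
    return np.exp(-gamma * (z - center) ** 2)
```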

Experimental results show that RBF and HF outperform Sig in terms of computational effectiveness, and HF outperforms RBF in terms of computational efficiency. Thus, the Hardlim function is used as the activation function for the online ELM model in the experiment.

4.3 Performance Evaluation

The experimental results of both the offline and online learning models are evaluated and discussed in this part. The performance of the models is compared from the effectiveness and efficiency perspectives. From the effectiveness perspective, testing accuracy, area under the ROC curve (AUC), specificity, and sensitivity are calculated. From the efficiency perspective, training time and testing time are measured in seconds. The training times of both online learning algorithms are measured as the time to train one chunk of 100 records. The offline methods include not only ELM and AdaBoost, but also the classical techniques of KNN, SVM, RF, and NB. The program is run on an Apple laptop with a 1.6 GHz Intel Core i5 processor. The model effectiveness measures, together with the training and testing times, are shown in Table 2. The algorithms that achieve a testing accuracy above 70% are compared in Figure 2.

Table 2: Computation results of offline and online techniques.

Model Type | Method | Accuracy (STD) | AUC (STD) | Specificity (STD) | Sensitivity (STD) | Training Time (STD) | Testing Time (STD)
Offline | ELM | 76.86% (0.0036) | 56.31% (0.0064) | 90.97% (0.0046) | 30.94% (0.0034) | 14.0456 (0.0070) | 1.0561 (0.0173)
Offline | AdaBoost | 78.99% (0.0068) | 64.63% (0.0096) | 90.36% (0.0059) | 38.89% (0.0058) | 24.0916 (0.4507) | 2.0654 (0.1353)
Offline | KNN | 81.66% (0.0093) | 64.33% (0.0101) | 95.40% (0.0033) | 33.26% (0.0178) | 202.4704 (15.0424) | 4.7105 (0.9117)
Offline | SVM | 81.83% (0.0079) | 64.70% (0.0051) | 95.57% (0.0037) | 33.83% (0.0085) | 195.6874 (13.7093) | 5.0727 (0.3228)
Offline | RF | 81.96% (0.0012) | 64.36% (0.0026) | 95.92% (0.0019) | 32.81% (0.0028) | 7.6501 (0.1531) | 0.0417 (0.0029)
Offline | NB | 69.97% (0.0014) | 68.36% (0.0018) | 71.18% (0.0016) | 65.54% (0.0023) | 1.0095 (0.2029) | 5.1338 (0.1310)
Online | ELM | 76.88% (0.0036) | 55.62% (0.0138) | 89.27% (0.0027) | 30.42% (0.0023) | 0.1596 (0.0055) | 1.9880 (0.2191)
Online | AdaBoost | 78.71% (0.0057) | 63.15% (0.0104) | 82.10% (0.0097) | 40.20% (0.0258) | 0.1251 (0.0701) | 2.2569 (0.1936)

Figure 2: A comparison of the different algorithms from the perspectives of computational accuracy and training efficiency. (a) Comparison of testing accuracy; (b) Comparison of training time.

The algorithms are evaluated with the criterion of achieving maximum predictive accuracy while remaining efficient in training. Although KNN, SVM, and RF achieve a testing accuracy greater than 81%, the training times of KNN and SVM are more than 100 times longer than those of the other algorithms. RF performs well, being both effective and time-efficient. The online AdaBoost algorithm, although 3.25% less accurate than RF, clearly outperforms RF in computational efficiency, with a 99.4% reduction in training time relative to RF. From the online learning perspective, both online ELM and online AdaBoost maintain the accuracy level of the offline algorithms while significantly reducing the training time: the average training times of online AdaBoost and online ELM show a 99% improvement over their offline versions. It is concluded that online AdaBoost has the best computational efficiency and offline RF has the best predictive accuracy; online AdaBoost strikes a better balance between effectiveness and computational speed than offline RF.


5. Conclusion and Future Work

This research studies the credit card default identification problem by proposing a time-efficient real-time learning model that deploys online learning in the default detection area. The Online ELM and Online AdaBoost methods are applied to the default of credit card clients dataset. With experimental designs for both algorithms, and by comparing the online techniques to other techniques on a set of performance measures, experimental results are obtained and analyzed. The results reveal that online AdaBoost and online ELM maintain the accuracy of their offline counterparts while significantly improving the computation speed. Based on the results in this paper, online AdaBoost, which shows outstanding computational effectiveness and efficiency among the compared methods, is recommended for real-time default detection. This paper further demonstrates that online learning has great potential for real-time identification problems.

Future extensions of this work are discussed in two directions. The first is the stability of online learning under potential concept drift conditions. The distribution of the incoming data may change over time and influence the effectiveness of the online learning model, a phenomenon known as "concept drift". To maintain stability, parameter-free online algorithms that relax the assumptions on the data distribution are needed to allow real-time model updates. The second direction is to sustain the robustness of online learning for datasets with noise or missing records. Combining online learning with other techniques, such as feature reduction and data integration, promises to strengthen the model. A wider range of online learning techniques, such as Adaptive Bagging, could be applied and compared in terms not only of speed and accuracy, but also of stability and robustness. By pursuing the above directions, the potential of online learning in real-time system applications will be further revealed.

References
1. Huang, G., Qin, Z., and Siew, C., 2006, "Extreme learning machine: theory and applications," Neurocomputing, 70(1), 489-501.
2. Lan, Y., Soh, Y.C., and Huang, G.B., 2009, "Ensemble of online sequential extreme learning machine," Neurocomputing, 72(13), 3391-3395.
3. Liang, N., Huang, G.B., and Saratchandran, P., 2006, "A fast and accurate online sequential learning algorithm for feedforward networks," IEEE Transactions on Neural Networks, 17(6), 1411-1423.
4. Huang, G., Huang, G.B., Song, S., and You, K., 2015, "Trends in extreme learning machines: a review," Neural Networks, 61, 32-48.
5. Yeh, I.C., and Lien, C.H., 2009, "The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients," Expert Systems with Applications, 36(2), 2473-2480.
6. Oza, N., 2005, "Online bagging and boosting," Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 2340-2345.
7. Rosenberg, E., and Gleit, A., 1994, "Quantitative methods in credit management: a survey," Operations Research, 42(4), 589-613.
8. Hand, D.J., and Henley, W.E., 1997, "Statistical classification methods in consumer credit scoring: a review," Journal of the Royal Statistical Society: Series A (Statistics in Society), 160(3), 523-541.
9. Giudici, P., 2001, "Bayesian data mining, with application to benchmarking and credit scoring," Applied Stochastic Models in Business and Industry, 17(1), 69-81.
10. Baesens, B., Setiono, R., Mues, C., and Vanthienen, J., 2003, "Using neural network rule extraction and decision tables for credit-risk evaluation," Management Science, 49(3), 312-329.
11. Lee, T.S., Chiu, C.C., Lu, C.J., and Chen, I.F., 2002, "Credit scoring using the hybrid neural discriminant technique," Expert Systems with Applications, 23(3), 245-254.
12. Bellotti, T., and Crook, J., 2009, "Support vector machines for credit scoring and discovery of significant features," Expert Systems with Applications, 36(2), 3302-3308.
13. Wang, G., Hao, J., Ma, J., and Jiang, H., 2011, "A comparative assessment of ensemble learning for credit scoring," Expert Systems with Applications, 38(1), 223-230.
14. Brown, I., and Mues, C., 2012, "An experimental comparison of classification algorithms for imbalanced credit scoring data sets," Expert Systems with Applications, 39(3), 3446-3453.
15. Ade, R.R., and Deshmukh, P.R., "Methods for incremental learning: a survey," International Journal of Data Mining and Knowledge Management Process, 3(4), 119.
16. Hoi, S.C., Wang, J., and Zhao, P., 2014, "LIBOL: A library for online learning algorithms," The Journal of Machine Learning Research, 15(1), 495-499.
17. Freund, Y., and Mason, L., 1999, "The alternating decision tree learning algorithm," International Conference on Machine Learning, 99-109.
