Pattern Recognition, Part 1
Churn Prediction in Customer Relationship Management via GMDH-Based Multiple Classifiers Ensemble

Jin Xiao, Sichuan University
Xiaoyi Jiang, University of Münster
Changzheng He and Geer Teng, Sichuan University
Customer churn prediction is an important problem in customer relationship management. Experimental results from a novel multiple classifiers ensemble selection model based on the group method of data handling (GMDH) are encouraging.

In recent years, companies have turned to customer relationship management (CRM) to handle interactions with current and future customers. CRM helps managers zero in on those customers who most contribute to their companies' profits. This issue is of particular interest because it's both difficult and costly to steer customers away from competitors. To prevent customer churn, that is, the loss of a company's customers, churn prediction has become increasingly important. This task of predicting whether customers will change their business relationships and move to a competitor in the future is clearly crucial to a company's bottom line, especially in the fields of telecommunication and finance.1 From a pattern recognition perspective, churn prediction can be defined as a supervised problem: given a predefined forecast horizon, predict future churners.2 Classification can help solve this problem, with logistic regressions, naive Bayes, decision trees, artificial neural networks, and
support vector machines being the most widely used methods for classifying customers as churn or nonchurn. However, individual classifiers usually can't obtain accurate results. Recently, the multiple classifiers ensemble (MCE) has become a powerful tool for improving classification performance in numerous fields.3 MCE works by combining the results of base classifiers through a votelike procedure. One way to improve MCE performance would be to select and combine a subset of base classification results for a final decision, but determining how to find the optimal subset is still a challenge. In the context of churn prediction, existing MCE approaches have two limitations. On one hand, classes are imbalanced, especially in the field of marketing: for example, the current churn rate is about 2 percent per month for banks4 and 2.2 percent per month for mobile telecommunication providers,5 meaning the number of nonchurn customers is far larger than that of churners. On the other hand, data collected from companies usually contain some noise, such as incorrect inputs and data transfer errors, which can reduce MCE performance.

In this article, we consider MCE with a built-in model-selection function for customer churn prediction. In particular, we present a multiple classifiers ensemble selection model based on the group method of data handling (MCES-GMDH). GMDH was first proposed by A.G. Ivakhnenko in 1970.6 It starts from a set of initial solutions and then pairwise combines them to form multiple candidate solutions. A selection mechanism determines a subset of these candidate solutions, which in turn build the input for the next combination-and-selection iteration (layer). The algorithm stops when it meets a certain criterion, resulting in an optimized model. GMDH is particularly appropriate here because it implicitly contains a selection mechanism: not all the base classification results are involved in the final decision process. In addition, the involved base classification results and the way in which they're combined in a multilayer network are automatically determined through a modeling (training) procedure. Recently, GMDH has been applied in many areas, such as CRM, finance, and engineering.

IEEE Intelligent Systems, March/April 2016. 1541-1672/16/$33.00 © 2016 IEEE. Published by the IEEE Computer Society.
Approach Description

This article introduces GMDH to MCE, forming MCES-GMDH, a model that can select an appropriate ensemble solution from the classifiers pool adaptively, determine the combination weights among selected base
classifiers, and complete the ensemble selection process automatically. Let Train and Test be a training set with m1 samples and a test set with m2 samples, respectively, where Train is used for training a prediction model and Test for evaluating the final trained MCES-GMDH model's performance. The class label has two different values: 0 (nonchurn customers) and 1 (churn customers). The MCES-GMDH model's training contains two phases: base classifiers training and classifiers ensemble selection. The MCE model's classification performance is directly influenced by both the base classifiers' accuracy and the diversity of their results: the higher the classification accuracy and the diversity among results, the better the ensemble model's performance will be. The two phases complement each other, and both are indispensable.
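As the first training step below details, each base classifier is trained on a bootstrap subset of Train that has been rebalanced by random oversampling of the minority (churn) class. A minimal, illustrative sketch of that subset construction follows; it uses only the standard library, and all names here are ours, not the authors':

```python
import random

def balanced_bootstrap(samples, labels, rng):
    """Draw a bootstrap subset of (sample, label) pairs, then randomly
    oversample the minority class until both classes are equal in size."""
    n = len(samples)
    idx = [rng.randrange(n) for _ in range(n)]        # bootstrap with replacement
    subset = [(samples[i], labels[i]) for i in idx]
    minority = 1 if sum(l for _, l in subset) * 2 < len(subset) else 0
    minor = [p for p in subset if p[1] == minority]
    major = [p for p in subset if p[1] != minority]
    if not minor:                 # degenerate draw: no minority samples at all
        return major
    while len(minor) < len(major):                    # random oversampling
        minor.append(rng.choice(minor))
    return minor + major

rng = random.Random(0)
# 10 churn (label 1) vs. 90 nonchurn (label 0), mimicking class imbalance
data = [[float(i)] for i in range(100)]
labels = [1] * 10 + [0] * 90
balanced = balanced_bootstrap(data, labels, rng)
ones = sum(l for _, l in balanced)
```

Repeating this N times and training one classifier per subset yields the pool of N base classifiers described below.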
Training Step 1: Base Classifiers Training
To deal with the class imbalance issue in customer churn prediction, most existing approaches adopt resampling techniques. The most commonly used methods are random oversampling and undersampling. Oversampling samples the churn customers with replacement until the minority and majority classes are (approximately) equal in size, while undersampling samples the nonchurn customers with replacement. A. Marqués and colleagues compared logistic regressions and support vector machines with and without resampling on five real customer classification datasets and demonstrated superior performance for oversampling compared with undersampling.7 In this work, we adopt random oversampling to balance the class distribution.

To train the base classifiers, the MCES-GMDH model first uses a bootstrap technique to select a training subset, randomly drawing samples with replacement from the original training set Train. Random oversampling then balances the training subset's class distribution, and a base classifier is trained on the subset via a classification algorithm (in this work, we use neural networks as the base classification algorithm). Repeating this process N times yields N base classifiers.

Training Step 2: Classifiers Ensemble Selection
The MCES-GMDH method first classifies the training set Train by the N base classifiers and obtains the classification results X = (X_1, X_2, …, X_N). Here, the vector X_i, i = 1, 2, …, N, of m1 rows and one column holds the estimated probabilities of the decision class (churn or nonchurn) in Train by the ith base classifier. Therefore, X is a matrix with m1 rows and N columns. Let the actual class label vector in Train be Y; then W = (Y, X), of dimension m1 × (N + 1), composes the dataset for applying the GMDH algorithm. This dataset W is equally divided at random into a model learning set A and a model selecting set B. Given a churn prediction problem, GMDH builds the general relationship between the output and input variables in the form of a mathematical description, which is also called a reference function. Generally, the
Related Work in Multiple Classifiers Ensemble (MCE) Strategies
According to the abstract level of the ensemble function, MCE can be classified into three types: data-based ensemble, feature-based ensemble, and decision-based ensemble. In the most widely used approach, decision-based ensemble, each base classifier first makes decisions on the test samples, and an ensemble function then combines these decisions. In general, we can divide the MCE process into two parts. First, base classifier results are generated. Clearly, good classification performance at this stage is essential to improve the final ensemble model's performance. It's also important to obtain sufficient result diversity by using different data and base classifiers. Second, these results are combined to form the final ensemble, either by combining all base classification results (which can include redundant information and reduce performance) or by selecting a subset of the results (which has proven to be effective). Successful applications of MCE include C. Ding and colleagues'1 work, which proposed a multidirectional, multilevel, dual-cross patterns (MDML-DCPs) face representation scheme to comprehensively incorporate both holistic- and component-level DCP features. In addition, Y. Luo and colleagues2 proposed novel ensemble algorithms for multilabel image classification, including a manifold regularized multitask learning algorithm that exploits the common structure of multiple classification tasks.3

Customer churn prediction is a key step in maximizing customer value for an enterprise, but traditional classification methods tend to yield unsatisfactory performance on class-imbalanced churn prediction data. This is mainly because they always generalize from the training set on the simplest hypothesis that best fits the data.4 This hypothesis seldom concerns the minority samples, so the classification rules that predict the minority class tend to be fewer and weaker than those of the majority class.
Consequently, test samples belonging to the minority class are misclassified more often than those belonging to the majority class. In general, two types of approaches deal with this class imbalance issue in customer churn prediction: data-level and algorithm-level solutions. Data-level solutions adopt resampling techniques, such as random undersampling and oversampling, to balance the class distribution5; they then construct classification models on the balanced training set. Algorithm-level solutions attempt to adapt existing classification algorithms to strengthen learning with regard to the minority class, typically by introducing a cost-sensitive learning technique.6 MCE has also been introduced to class-imbalanced customer churn prediction in the literature. W. Buckinx and colleagues7 introduced random forests to customer churn prediction, comparing them to other methods to show the advantages of an ensemble learning algorithm. A. Lemmens and C. Croux8 applied bagging
and boosting to customer churn prediction, finding that both methods increased prediction accuracy. K.W. De Bock and D. Van den Poel9 proposed GAMensPlus, an ensemble model based on generalized additive models (GAMs); K. Coussement and K.W. De Bock10 furthered this work by applying GAMens (the ensemble version of GAMs) to churn prediction. Other works combine cost-sensitive learning with ensemble learning. For instance, C. Chen and colleagues11 presented a weighted random forest (WRF), and Y.Y. Xie and colleagues12 proposed improved balanced random forests for customer churn prediction that build on WRFs. These propositions have made important contributions to customer classification with imbalanced class distributions, but they're based on a multiple classifier combination that uses all the base classifiers to form the final decision. In our work, we explore the potential of an MCE selection model.
References
1. C. Ding et al., "Multi-Directional Multi-Level Dual-Cross Patterns for Robust Face Recognition," to be published in IEEE Trans. Pattern Analysis and Machine Intelligence, DOI: 10.1109/TPAMI.2015.2462338.
2. Y. Luo et al., "Multiview Matrix Completion for Multilabel Image Classification," IEEE Trans. Image Processing, vol. 24, no. 8, 2015, pp. 2355–2368.
3. Y. Luo et al., "Manifold Regularized Multitask Learning for Semi-Supervised Multilabel Image Classification," IEEE Trans. Image Processing, vol. 22, no. 2, 2013, pp. 523–536.
4. Y. Sun et al., "Cost-Sensitive Boosting for Classification of Imbalanced Data," Pattern Recognition, vol. 40, no. 12, 2007, pp. 3358–3378.
5. B. He et al., "Prediction of Customer Attrition of Commercial Banks Based on SVM Model," Procedia Computer Science, vol. 31, 2014, pp. 423–430.
6. K.M. Ting, "An Instance Weighting Method to Induce Cost-Sensitive Trees," IEEE Trans. Knowledge and Data Eng., vol. 14, no. 3, 2002, pp. 659–665.
7. W. Buckinx and D. Van den Poel, "Customer Base Analysis: Partial Defection of Behaviourally Loyal Clients in a Non-Contractual FMCG Retail Setting," European J. Operational Research, vol. 164, no. 1, 2005, pp. 252–268.
8. A. Lemmens and C. Croux, "Bagging and Boosting Classification Trees to Predict Churn," J. Marketing Research, vol. 43, no. 2, 2006, pp. 276–286.
9. K.W. De Bock and D. Van den Poel, "Reconciling Performance and Interpretability in Customer Churn Prediction Using Ensemble Learning Based on Generalized Additive Models," Expert Systems with Applications, vol. 39, no. 8, 2012, pp. 6816–6826.
10. K. Coussement and K.W. De Bock, "Customer Churn Prediction in the Online Gambling Industry: The Beneficial Effect of Ensemble Learning," J. Business Research, vol. 66, no. 9, 2013, pp. 1629–1636.
11. C. Chen, A. Liaw, and L. Breiman, Using Random Forest to Learn Imbalanced Data, tech. report 666, Dept. Statistics, Univ. California, Berkeley, 2004.
12. Y.Y. Xie et al., "Customer Churn Prediction Using Improved Balanced Random Forests," Expert Systems with Applications, vol. 36, no. 3, 2009, pp. 5445–5449.
description can be considered as a discrete form of the Volterra functional series or Kolmogorov-Gabor (K-G) polynomial:

Y = f(X_1, X_2, …, X_N)
  = a_0 + \sum_{i=1}^{N} a_i X_i + \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij} X_i X_j + \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{k=1}^{N} a_{ijk} X_i X_j X_k + …,   (1)

where Y is the output (the actual class label in training set Train); X = (X_1, X_2, …, X_N) is the input vector, that is, the classification results of Train by the N base classifiers; a_0, a_1, and so on are coefficients or weights; and X_i X_j is the Hadamard product, which takes two matrices X_i and X_j of the same dimension and produces another matrix in which each element is the product of the corresponding elements of the original two matrices. In particular, the first-order (linear) K-G polynomial in N variables (neurons) can be expressed as follows:

f(X_1, X_2, …, X_N) = a_0 + a_1 X_1 + a_2 X_2 + … + a_N X_N.   (2)
The MCES-GMDH model chooses the linear reference function of the form in Equation 2. In this case, its terms are regarded as the N initial models of the GMDH algorithm: v_1 = a_1 X_1, …, v_N = a_N X_N. GMDH has a multilayer structure: it starts from the initial model set v_1, v_2, …, v_N; carries out parameter estimation via an inner criterion (least squares); obtains the next layer's candidate models (inherit, mutation) based on model learning set A; evaluates those candidate models by an external criterion based on model selecting set B; and finally chooses the best ones (ignoring the others) as inputs for the next layer.8 In MCE selection modeling, the most commonly used evaluation criterion is the classification error of the ensemble. Therefore, in this study, we select the asymmetric regularity (AR) criterion as the external criterion for building the MCES-GMDH model, which is defined as follows:
d^2(B) = \sum (Y_B − X_B \hat{a}_A)^2,   (3)
where Y_B is the actual output of model selecting set B, X_B is the input matrix of B, and \hat{a}_A is the coefficient vector of the model constructed on model learning set A. The AR criterion thus measures the forecasting error on B of the model constructed on A. The detailed modeling process of classifiers ensemble selection by GMDH is as follows.
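Equation 3 itself is cheap to compute: fit coefficients on set A by least squares, then score the squared forecasting error on set B. A sketch (numpy assumed; the variable names are ours):

```python
import numpy as np

def ar_criterion(XA, YA, XB, YB):
    """Asymmetric regularity criterion: least-squares fit on learning set A,
    summed squared prediction error on selecting set B (Equation 3)."""
    a_hat, *_ = np.linalg.lstsq(XA, YA, rcond=None)   # coefficients from set A
    residual = YB - XB @ a_hat
    return float(residual @ residual)

# Toy check: if B follows the exact linear rule learned from A (Y = 1 + 2X),
# the criterion is (numerically) zero.
XA = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
YA = np.array([1.0, 3.0, 5.0])
XB = np.array([[1.0, 3.0], [1.0, 4.0]])
YB = np.array([7.0, 9.0])
```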
First, the N initial models are fed pairwise into each unit, and the reference function is first order. Thus, n_1 = C_N^2 = N(N − 1)/2 candidate models of the following form are generated at the first layer:

w_t = a_{0,t} + a_{1,t} v_i + a_{2,t} v_j,   i, j = 1, 2, …, N; i ≠ j,   (4)
where w_t is the estimated output, and the coefficients a_{0,t}, a_{1,t}, and a_{2,t} are estimated on model learning set A by least squares. We calculate the external criterion values of the n_1 candidate models on set B according to Equation 3, find the smallest one, and assign it to the first-layer external criterion value S_1. The N candidate models with the smallest external criterion values are selected as the inputs of the second layer, where the number of candidate models is again n_1:

z_t = b_{0,t} + b_{1,t} w_i + b_{2,t} w_j,   i, j = 1, 2, …, N; i ≠ j.   (5)
Similarly, we calculate the external criterion values of the n_1 candidate models on set B according to Equation 3 and assign the smallest one to the second-layer external criterion value S_2. Then, the N candidate models with the smallest external criterion values are selected as the inputs of the third layer, where the process continues, yielding the third-layer external criterion value S_3. The process terminates when it finds the optimal complexity ensemble model y* according to the following termination rule, motivated by optimal complexity theory: as model complexity increases, the value of the external criterion first decreases and then increases, and the global minimum corresponds to the optimal complexity model.8 Taking the case in Figure 1 as an example, we find that S_1 > S_2 < S_3; that is, the optimal complexity model y* is in the second layer. It also turns out that candidate model z_6 has the smallest external criterion value among the six candidates in the second layer, so y* = z_6. After finding y*, we only need to trace back from the layer containing the optimal complexity model to the input layer to find the features contained in y*. As shown in Figure 1, v_2, v_3, and v_4 are selected, and v_1 is eliminated. In other words, the classification results of the second (X_2), third (X_3), and fourth (X_4) base classifiers are selected in the final ensemble. Finally, we obtain the optimal complexity ensemble model, y = a_0 + a_1 X_2 + a_2 X_3 + a_3 X_4, where the coefficients are again estimated in the model learning set A by least squares.
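Putting Equations 3 through 5 and the termination rule together, the selection loop can be sketched as follows. This is an illustrative reconstruction (numpy assumed), not the authors' implementation; each candidate model is represented simply by its fitted outputs on sets A and B, which makes layer-to-layer composition trivial:

```python
import numpy as np
from itertools import combinations

def fit_pair(uA, vA, yA, uB, vB):
    """Fit w = a0 + a1*u + a2*v on learning set A by least squares
    (Equations 4 and 5); return the model's outputs on A and on B."""
    MA = np.column_stack([np.ones_like(uA), uA, vA])
    a, *_ = np.linalg.lstsq(MA, yA, rcond=None)
    MB = np.column_stack([np.ones_like(uB), uB, vB])
    return MA @ a, MB @ a

def gmdh_select(XA, yA, XB, yB, max_layers=10):
    """Layer-by-layer GMDH selection. XA/XB hold the base classifiers'
    outputs (one column each) on learning set A and selecting set B.
    Returns the optimal model's outputs on B, its external criterion
    value, and the indices of the base classifiers it involves."""
    # Initial models: one per base classifier, tagged with the classifiers used.
    layer = [(XA[:, i], XB[:, i], frozenset([i])) for i in range(XA.shape[1])]
    n_keep = len(layer)
    best, S_prev = None, np.inf
    for _ in range(max_layers):
        cands = []
        for (uA, uB, su), (vA, vB, sv) in combinations(layer, 2):
            outA, outB = fit_pair(uA, vA, yA, uB, vB)
            S = float(np.sum((yB - outB) ** 2))   # AR criterion (Equation 3)
            cands.append((S, outA, outB, su | sv))
        cands.sort(key=lambda c: c[0])
        if cands[0][0] >= S_prev:                 # criterion rose: y* is found
            break
        best, S_prev = cands[0], cands[0][0]
        # The n_keep best candidates become the next layer's inputs.
        layer = [(oA, oB, s) for _, oA, oB, s in cands[:n_keep]]
    S, _, outB, used = best
    return outB, S, sorted(used)

# Toy usage: four "base classifiers"; the target depends on three of them.
rng = np.random.default_rng(0)
X = rng.random((60, 4))
y = 0.5 * X[:, 1] + 0.3 * X[:, 2] + 0.2 * X[:, 3]   # column 0 is irrelevant
XA, XB, yA, yB = X[:30], X[30:], y[:30], y[30:]
outB, S, used = gmdh_select(XA, yA, XB, yB)
```

On the toy data, the selected model's external criterion value falls well below that of a constant predictor, and tracing `used` back to the input layer reveals which base classifiers survive, mirroring the traceback described above.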
Figure 1. The process of multiple classifiers ensemble selection based on GMDH. Here, v_2, v_3, and v_4 are selected, and v_1 is eliminated. In other words, the classification results of the second (X_2), third (X_3), and fourth (X_4) base classifiers are selected in the final ensemble.

Working with the Trained Model

In this phase, the MCES-GMDH model first classifies the test set Test by the N base classifiers, respectively, and obtains the classification results x = (x_1, x_2, …, x_N). Here, the vector x_i, i = 1, 2, …, N, of m2 rows and one column holds the estimated probabilities of the decision class (churn or nonchurn) in test set Test by the ith base classifier. Then, we feed the classification results of the selected base classifiers into the optimal complexity ensemble model y* and obtain the final estimated probabilities of the decision class in Test. Taking the problem in Figure 1 as an example, there are four base classifiers, and the optimal complexity ensemble model is y = a_0 + a_1 X_2 + a_2 X_3 + a_3 X_4. Therefore, we only need to feed x_2, x_3, and x_4 into the model. Finally, we need a threshold (such as 0.5) to obtain the forecasted class label of each sample in Test. For each sample, if its final estimated probability is smaller than the threshold, it is assigned the class label 0 (nonchurn customer); otherwise, the label is 1 (churn customer).

Experimental Results and Evaluation

We conducted experiments on two datasets. The first is a real customer churn prediction dataset for the credit card business of a commercial bank in Chongqing, China (the "China-churn" dataset). The data collection interval is from May 2012 to December 2012. Table 1 shows the 25 churn variables we selected. For the classification output, after discussion with experts at the bank, we defined churn customers as those who canceled their cards between May and December or didn't spend anything in three consecutive months. We obtained
3,250 samples, among which 2,800 were nonchurn customers and 450 were churn customers. The customer churn rate was 10.77 percent, resulting in highly class-imbalanced data. The second dataset is from the machine learning database at the University of California, Irvine (UCI; http://archive.ics.uci.edu/ml). It deals with cellular service providers' customers and the data pertinent to the voice calls they make. We had 3,333 samples, among which 2,850 were nonchurn customers and 483 were churn customers (meaning they changed service providers). The ratio was 5.9006, and the class distribution was highly imbalanced. The dataset includes 20 features, which, due to space limitations, we don't present here.

We compared the classification performance of our MCES-GMDH model with several resampling-based and cost-sensitive models. For the resampling-based methods, we first adopted random oversampling to balance the class distribution of the training set and then implemented the classifiers ensemble methods bagging and boosting,9 GAMensPlus,10 and GAMens.11 For the cost-sensitive methods, we selected weighted random forest (WRF)12 and improved balanced random forest (IBRF)13 as the benchmarking methods. Because adaptive ensemble selection is performed in the MCES-GMDH model, and we care only about the performance difference between the MCES-GMDH model and the other ensemble models, we don't conduct feature selection; to ensure a fair comparison, we compare each model's classification performance on the original dataset. We divided each dataset into three equal parts at random, with one-third for testing and the remainder for training. In the WRF and IBRF models, we used the Classification
Table 1. Feature description of the "China-churn" dataset.

X1: Total consumption times
X2: Total consumption amount
X3: Total times of cash withdrawal
X4: Customer survival time
X5: Total contributions
X6: Valid survival time
X7: Average amount ratio
X8: Whether associated charge was incurred
X9: Consumption times in the last 1 month
X10: Consumption times in the last 2 months
X11: Consumption times in the last 3 months
X12: Consumption times in the last 4 months
X13: Consumption times in the last 5 months
X14: No. cash transactions in the last 1 month
X15: No. cash transactions in the last 2 months
X16: No. cash transactions in the last 3 months
X17: No. cash transactions in the last 6 months
X18: Months of transaction times reducing continuously
X19: If overdue in the last 1 month
X20: If overdue in the last 2 months
X21: Amount usage ratio in the last 1 month / historical average usage ratio
X22: Sex
X23: Annual income
X24: Nature of work industry
X25: Education
Table 2. Customer churn prediction performance of seven models on two datasets.

China-churn dataset:
                  MCES-GMDH  Bagging  Boosting  GAMensPlus  GAMens  WRF     IBRF
Type I accuracy   0.9120     0.8641   0.8409    0.8904      0.8553  0.8532  0.8832
Type II accuracy  0.9403     0.9012   0.9340    0.9364      0.9438  0.9343  0.9381
AUC               0.9693     0.9362   0.9392    0.9587      0.9481  0.9408  0.9558

UCI churn dataset:
                  MCES-GMDH  Bagging  Boosting  GAMensPlus  GAMens  WRF     IBRF
Type I accuracy   0.8803     0.8462   0.8281    0.8404      0.8353  0.8492  0.8532
Type II accuracy  0.9103     0.8912   0.9240    0.9036      0.9138  0.9093  0.9014
AUC               0.8862     0.8562   0.8505    0.8612      0.8658  0.8655  0.8668
and Regression Tree (CART) algorithm to generate the base decision tree classifiers. For the GAMensPlus and GAMens models, we selected generalized additive models (GAMs) as the basic classification algorithm, and for the MCES-GMDH, bagging, and boosting models, we selected a back-propagation (BP) neural network to generate base classifiers. The size of the base classifier pool was fixed at 50; the final classification result was the average over 10 experiments in each case.
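The three evaluation criteria, defined formally below, can be computed directly from predicted labels and scores. A pure-Python sketch (the AUC uses the Mann-Whitney rank formulation, which equals the area under the ROC curve; the example data is ours):

```python
def type_accuracies(y_true, y_pred):
    """Type I (true positive rate) and type II (true negative rate) accuracy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (fp + tn)

def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count half): the Mann-Whitney formulation."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: 3 churners (1) and 4 nonchurners (0) with predicted scores,
# thresholded at 0.5 to obtain class labels.
y = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1 if s >= 0.5 else 0 for s in scores]
```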
To evaluate the performance of the strategies referred to in this study, we adopted the following three commonly used evaluation criteria:10

• Type I accuracy (true positive rate) = TP/(TP + FN), where TP is the number of samples of the actual positive class predicted as positive, and FN is the number of samples of the actual positive class predicted as negative.
• Type II accuracy (true negative rate) = TN/(FP + TN), where TN is the number of samples of the actual negative class predicted as negative, and FP is the number of samples of the actual negative class predicted as positive.
• The area under the receiver operating characteristic curve (AUC). The receiver operating characteristic (ROC) curve is an important evaluation criterion for classification models on data with an imbalanced class distribution. In the case of two classes,
the ROC graph plots the true positive rate (type I accuracy) on the y-axis against the false positive rate (FP/(FP + TN)) on the x-axis. For the MCES-GMDH model, we can plot the ROC curve by varying its threshold value. Because it's difficult to compare the ROC curves of different models directly, the AUC is more convenient and popular.

The top half of Table 2 shows the classification performance comparison of the seven models on the China-churn dataset. The MCES-GMDH model proposed here yields the best performance in terms of type I accuracy and AUC, and it's comparable with GAMens in terms of type II accuracy. In fact, when studying customer churn prediction, type I accuracy (the classification accuracy of churn customers) is generally more important than type II accuracy (the classification accuracy of nonchurn customers). Overall, AUC is a more comprehensive index than type I and type II accuracy; in particular, it better evaluates total prediction performance on class-imbalanced datasets. In terms of AUC as well, we can conclude that the MCES-GMDH model achieves better customer churn prediction performance than the compared methods.

The lower half of Table 2 presents the classification performance comparison of the seven models on the UCI churn dataset. The MCES-GMDH model turns out to be the best in terms of both the type I accuracy and AUC criteria, and it's only slightly worse than the boosting and GAMens models on the type II accuracy criterion. This demonstrates that the MCES-GMDH model has the best customer churn prediction performance on this dataset compared with the six other classifier ensemble models. Furthermore, Table 2 shows that the MCES-GMDH model yields the smallest difference between the two types of accuracy (type II minus type I) on both datasets. In general, the smaller this difference, the more consistent the classification performance across the two classes. Thus, the MCES-GMDH model can better handle classification with an imbalanced class distribution.

The Authors

Jin Xiao is an associate professor in the Business School at Sichuan University and a postdoctoral research fellow at the Chinese Academy of Sciences. His research interests include business intelligence, data mining, customer relationship management, and economic forecasting. Xiao received a PhD in management science and engineering from the Business School at Sichuan University. Contact him at [email protected].

Xiaoyi Jiang is a full professor of computer science at the University of Münster. His research interests include biomedical imaging, 3D image analysis, pattern recognition, and multimedia. Jiang received a PhD and Venia Docendi (Habilitation) degree in computer science from the University of Bern, Switzerland. Contact him at xjiang@uni-muenster.de.

Changzheng He is a professor in the Business School at Sichuan University. His research interests include data mining, forecasting, and knowledge discovery in customer relationship management. He received an M.Sc. in mathematics from Southwest China Normal University, Chongqing, China. Contact him at [email protected].

Geer Teng, corresponding author, is a lecturer in the Faculty of Social Development and Western China Development Studies at Sichuan University. His research interests include pattern recognition, business intelligence, and customer relationship management. Teng received a PhD in management science and engineering from the Business School at Sichuan University, China. Contact him at [email protected].
The market changes continuously, and classification techniques are widely utilized in CRM for customer segmentation, customer loyalty analysis, customer churn prediction, customer credit scoring, and so on. The MCES-GMDH model can therefore potentially be applied in these additional areas of CRM and thus raise the level of enterprises' CRM.
Acknowledgments
This work is partly supported by the National Natural Science Foundation of China (NSFC) under grant numbers 71471124 and 71501136, Sichuan Province Youth Foundation under grant number 2015RZ0056, Sichuan Province Social Science Planning Project under grant number SC14C019, Excellent Youth Fund of Sichuan University under grant number 2013SCU04A08, Research Start-up Project of Sichuan University under grant number 2014SCU11053, and Soft Science Foundation of Sichuan Province under grant number 2015ZR0145.
References

1. A. Ghorbani, F. Taghiyareh, and C. Lucas, "The Application of the Locally Linear Model Tree on Customer Churn Prediction," Proc. Int'l Conf. Soft Computing and Pattern Recognition, 2009, pp. 472–477.
2. Y. Richter, E. Yom-Tov, and N. Slonim, "Predicting Customer Churn in Mobile Networks through Analysis of Social Groups," Proc. SIAM Int'l Conf. Data Mining, 2010, pp. 732–741.
3. L. Rokach, "Ensemble-Based Classifiers," Artificial Intelligence Rev., vol. 33, nos. 1 and 2, 2010, pp. 1–39.
4. J. Zhao and X.H. Dang, "Bank Customer Churn Prediction Based on Support Vector Machine: Taking a Commercial Bank's VIP Customer Churn as the Example," Proc. 4th Int'l Conf. Wireless Communications, Networking and Mobile Computing, 2008, pp. 1–4.
5. A. Berson, S. Smith, and K. Thearling, Customer Retention: Building Data Mining Applications for CRM, McGraw-Hill, 2000.
6. A.G. Ivakhnenko, "Heuristic Self-Organization in Problems of Engineering Cybernetics," Automatica, vol. 6, no. 2, 1970, pp. 207–219.
7. A. Marqués, V. García, and J. Sánchez, "On the Suitability of Resampling Techniques for the Class Imbalance Problem in Credit Scoring," J. Operational Research Soc., vol. 64, no. 7, 2013, pp. 1060–1070.
8. J.A. Muller and F. Lemke, Self-Organising Data Mining: An Intelligent Approach to Extract Knowledge from Data, Libri, 2000.
9. A. Lemmens and C. Croux, "Bagging and Boosting Classification Trees to Predict Churn," J. Marketing Research, vol. 43, no. 2, 2006, pp. 276–286.
10. K.W. De Bock and D. Van den Poel, "Reconciling Performance and Interpretability in Customer Churn Prediction Using Ensemble Learning Based on Generalized Additive Models," Expert Systems with Applications, vol. 39, no. 8, 2012, pp. 6816–6826.
11. K. Coussement and K.W. De Bock, "Customer Churn Prediction in the Online Gambling Industry: The Beneficial Effect of Ensemble Learning," J. Business Research, vol. 66, no. 9, 2013, pp. 1629–1636.
12. C. Chen, A. Liaw, and L. Breiman, Using Random Forest to Learn Imbalanced Data, tech. report 666, Dept. Statistics, Univ. California, Berkeley, 2004.
13. Y.Y. Xie et al., "Customer Churn Prediction Using Improved Balanced Random Forests," Expert Systems with Applications, vol. 36, no. 3, 2009, pp. 5445–5449.