A Genetic Programming Based Framework for Churn Prediction in Telecommunication Industry

Hossam Faris, Bashar Al-Shboul, and Nazeeh Ghatasheh

The University of Jordan, Amman, Jordan
{hossam.faris,b.shboul,n.ghatasheh}@ju.edu.jo

Abstract. Customer defection is critically important since it leads to serious business loss. Therefore, investigating methods to identify defecting customers (i.e., churners) has become a priority for telecommunication operators. In this paper, a churn prediction framework is proposed that aims at enhancing the ability to forecast customer churn. The framework combines two heuristic approaches: Self Organizing Maps (SOM) and Genetic Programming (GP). First, SOM is used to cluster the customers in the dataset and to remove outliers representing abnormal customer behavior. Then, GP is used to build an enhanced classification tree. The dataset used for this study contains anonymized real customer information provided by a major local telecom operator in Jordan. Our work shows that the proposed method surpasses several state-of-the-art classification methods on this particular dataset.

Keywords: Churn prediction, Genetic Programming, Self Organizing Maps, Telecommunication.

1 Introduction

Nowadays, the presence of multiple service providers in the mobile telecommunication industry creates an intensely competitive environment. It is possible for any customer to hold several subscriptions with the same service provider or to switch completely to another provider for cost or quality reasons. The case in which a customer cancels his subscription is called "customer churn". Since customers are considered the most important asset of the company, many researchers and practitioners have tackled the customer churn problem in the telecommunication sector [1–5]. Service providers are concerned with predicting when churn might happen. Customer Relationship Management (CRM) activities, which include acquisition, follow-up, and retention, aim to maximize the lifetime of a customer. It is therefore important to target primarily those customers expected to churn. An

King Abdullah II School for Information Technology, The University of Jordan, Amman, Jordan.

D. Hwang et al. (Eds.): ICCCI 2014, LNAI 8733, pp. 353–362, 2014. © Springer International Publishing Switzerland 2014


illustrative example in [2] shows how a CRM contact might extend the lifetime of a customer who intends to churn. Due to the huge number of subscriptions handled by a service provider, CRM personnel need to focus on those customers expected to churn soon.

In the literature, researchers and practitioners have developed a wide range of data mining and heuristic-based models for industrial and business applications [6–8]. For churn prediction in particular, applied data mining approaches include traditional classification methods such as decision tree algorithms, Naive Bayes, and Logistic Regression [2, 9, 10], as well as artificial intelligence based approaches such as Artificial Neural Networks, Genetic Programming, and Support Vector Machines [11–13]. Among the many data mining approaches applied in the literature for predicting customer churn, we noticed that Genetic Programming (GP), an evolutionary heuristic technique, is much less investigated. We believe that GP has some powerful features that can contribute to this problem domain.

In this paper we propose a framework for predicting customer churn in telecommunication companies. The framework combines Self Organizing Maps (SOM) and GP in a hybrid churn prediction technique; to the best of our knowledge, no previous work has proposed a similar hybridization of these two methods. In a nutshell, SOM is used to perform data reduction and eliminate outliers, and GP is then applied to develop the final prediction model that classifies a set of testing data. The main goal of the proposed hybrid technique is to enhance the ability to identify which customers are expected to churn. The hybrid technique is tested using real data obtained from a major Jordanian telecommunication operator, evaluated using different criteria, and compared to other classical classification techniques.

This paper is structured as follows.
Section 2 presents the overall framework proposed in this work. An overview of the dataset used in this work is given in Section 3. The evaluation criteria used to assess the proposed hybrid approach are listed in Section 4. Experiments and results are discussed in Section 5. Finally, the conclusion of this work is provided in Section 6.

2 Proposed Churn Management Framework

The model development method proposed in this research uses SOM clustering with GP in two stages. In the first stage, data reduction is performed by applying the SOM clustering algorithm to the selected training dataset. This process splits the training dataset into a number of smaller sets (clusters). Clusters that contain only churners or only non-churners are selected to form a new training dataset, while the remaining clusters, which could not separate churners from non-churners, are excluded. In the second stage, GP is applied to the union of the clusters resulting from stage one: GP starts its evolutionary cycle and develops the final classification model. Finally, the developed GP model is assessed using the data set aside for testing. We denote this approach as SOM+GP. The proposed framework is illustrated in Figure 1, and SOM and GP are described in the following two subsections.
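The stage-one data reduction can be sketched as follows. This is an illustrative reconstruction, not the authors' code; `cluster_ids` stands in for the per-record cluster assignment that a trained SOM would produce.

```python
import numpy as np

def reduce_training_set(X, y, cluster_ids):
    """Keep only records belonging to 'pure' clusters, i.e. clusters whose
    members are exclusively churners (y == 1) or non-churners (y == 0).
    Mixed clusters are treated as outliers and dropped."""
    keep = np.zeros(len(y), dtype=bool)
    for c in np.unique(cluster_ids):
        members = cluster_ids == c
        if len(np.unique(y[members])) == 1:  # pure cluster: keep its members
            keep |= members
    return X[keep], y[keep]

# Toy data: cluster 0 is purely non-churn, cluster 1 is mixed, cluster 2 purely churn
X = np.arange(12).reshape(6, 2)
y = np.array([0, 0, 0, 1, 1, 1])
cluster_ids = np.array([0, 0, 1, 1, 2, 2])
X_red, y_red = reduce_training_set(X, y, cluster_ids)
print(len(y_red))  # 4 -- the two records of the mixed cluster are excluded
```

The reduced set `(X_red, y_red)` is what stage two would then hand to GP as training data.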


Fig. 1. Customer churn prediction framework

2.1 SOM Clustering

SOM is a special type of unsupervised neural network [14–16]. Its basic idea is to map input-layer patterns onto an n-dimensional set of output-layer nodes while preserving the topological organization of the inputs in the output layer [15, 16]; clustering is therefore one among many applications of SOM. Vector Quantization [17], a data compression technique, is used in SOM to simplify the output space, which makes SOM powerful in representing huge and complex input data in a relatively simple output space [18]. A simple way to picture a SOM is as an input layer mapped to an output matrix (a map of nodes), where each node in the output map is linked to every input node [15, 18]. Figure 2 shows how "Customer" nodes are connected to all the nodes in the output layer. If W and H denote the width and the height of the output map, respectively, for N inputs, the number of links in the SOM is the product of W, H, and N. There are no interconnections between the map nodes; each node in the map is identified by its (i, j) width and height coordinates, with the center node serving as a reference anchor point [16, 18]. The SOM algorithm seeks to reduce the neighborhood distance in the output clusters. Normally a hexagonal form represents the neighborhood area: the algorithm starts with a large radius and shrinks it over the runs, treating the nodes falling within the radius as neighbors with varying weights, where the closer a node is to the reference node, the larger its weight [16, 18]. Over time the output layer (the output space) becomes smoother, with representative clusters of nodes.
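The training procedure described above can be sketched in a few lines of NumPy. This is a minimal illustrative implementation under simplifying assumptions (a Gaussian neighborhood on a rectangular grid rather than the hexagonal one mentioned above), not the implementation used in the paper.

```python
import numpy as np

def train_som(data, width=3, height=3, iters=1000, lr0=0.5, radius0=1.5, seed=0):
    """Train a width x height map: each step finds the best-matching unit (BMU)
    for a random sample and pulls the BMU and its grid neighbors towards the
    sample, with learning rate and neighborhood radius shrinking over time."""
    rng = np.random.default_rng(seed)
    weights = rng.random((height * width, data.shape[1]))
    # Grid coordinate (i, j) of every map node, used for neighborhood distances
    coords = np.array([(i, j) for i in range(height) for j in range(width)], dtype=float)
    for t in range(iters):
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        lr = lr0 * np.exp(-t / iters)                      # decaying learning rate
        radius = radius0 * np.exp(-t / iters)              # shrinking neighborhood radius
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        influence = np.exp(-d2 / (2 * radius ** 2))        # closer nodes get more weight
        weights += lr * influence[:, None] * (x - weights)
    return weights

def winner(weights, x):
    """Cluster id of input x = index of its closest map node."""
    return int(np.argmin(((weights - x) ** 2).sum(axis=1)))
```

A 3×3 map gives nine candidate clusters; `winner` then assigns each customer record to one of them, which is the assignment the stage-one filtering consumes.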


Fig. 2. SOM structure

2.2 Genetic Programming

Genetic Programming (GP) is a domain-independent evolutionary algorithm that automatically creates computer programs. It was originally inspired by theories of biological evolution [19, 20]. GP has some advantages when used to model complex problems, such as flexibility and interpretability [21, 22]. The GP algorithm performs an evolutionary cycle that can be summarized in the following five basic steps:

1. Initialization: The GP algorithm starts by creating a predetermined number of individuals, set by the user, which form a population. Each individual represents a computer program, in our case a classification model for customer churn prediction; the output of the model is 1 for a churner or 0 for an active customer. GP individuals can be viewed as symbolic tree structures, which are graphical representations of their equivalent S-expressions in the LISP programming language [23]. Figure 3 gives an example of a simple GP symbolic tree where X1 and X2 are input variables multiplied by random coefficients. The next three processes are performed while the termination conditions in step 5 are not met.

2. Fitness Evaluation: In this process, each individual is evaluated using a specific measurement. In this work, we use the mean squared error (MSE) for evaluating all individuals:

$MSE = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$  (1)

where $y_i$ is the actual value, $\hat{y}_i$ is the estimated target value, and $n$ is the total number of measurements [23]. The fittest individual is therefore the one with the minimal MSE value, and the goal of GP is to minimize the MSE of the evolved individuals.

A Genetic Programming Based Framework

357

Fig. 3. Simple genetic program represented as a structure tree

3. Selection: In this process a number of individuals are selected to reproduce new individuals and form the next generation. There are different techniques to perform this selection [24]; in this work, tournament selection, one of the most common and effective selection techniques, is applied.

4. Reproduction: This evolutionary process creates new individuals, which replace others, using reproduction operators. In general, this process makes small random changes to the structure of the individuals. The reproduction operators are the following:

– Crossover: produces two new individuals (children) by selecting a random subtree in each of the two parents and swapping the selected subtrees. The new individuals form the new generation, or offspring.

– Mutation: is applied to a single GP individual. A random node is selected in the individual's tree, and the subtree under this node is replaced by a newly generated random subtree. Usually, the mutation rate is set much smaller than the crossover rate.

– Elitism: selects one or more individuals with high fitness values and copies them to the next generation without any modification.

5. Termination: The GP evolutionary cycle stops iterating when it finds an individual with the required fitness value or when the maximum number of iterations set by the user is reached. Through this process the individual programs evolve and attain better fitness values over time. In this work, GP develops a classification model that fits the given dataset and minimizes the error with respect to the actual values.
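Steps 2 and 3 can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation; the "trees" here are stand-in strings whose predictions are fixed by hand.

```python
import random

def mse_fitness(y_true, y_pred):
    """Eq. (1): mean squared error; the fittest individual minimizes this."""
    return sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred)) / len(y_true)

def tournament_select(population, fitness, k=3, rng=random):
    """Draw k individuals at random and return the one with the lowest MSE."""
    contenders = rng.sample(range(len(population)), k)
    return population[min(contenders, key=lambda i: fitness[i])]

# Toy population: four stand-in 'trees' whose 0/1 predictions have known errors
y_true = [1, 0, 0, 1]
preds = {"tree_a": [1, 1, 0, 0], "tree_b": [1, 0, 0, 1],
         "tree_c": [1, 0, 1, 1], "tree_d": [0, 1, 1, 0]}
pop = list(preds)
fit = [mse_fitness(y_true, preds[t]) for t in pop]  # [0.5, 0.0, 0.25, 1.0]
rng = random.Random(7)
winners = [tournament_select(pop, fit, rng=rng) for _ in range(100)]
# With k=3 out of 4, the worst tree can never win a tournament
print("tree_d" in winners)  # False
```

Tournament selection thus applies selection pressure without requiring a global ranking of the whole population, which is one reason it is a common default in GP systems.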

3 Dataset Description

The data used in this research was provided by a major cellular telecommunication company in Jordan. The dataset contains 11 attributes for 5000 randomly selected customers subscribed to a prepaid service over a time interval of three months. The attributes cover outgoing/incoming call statistics. The data were provided with an indicator for each customer of whether the customer churned (left the company) or is still active. The total number of churners is 381 (7.6% of total customers).

4 Model Evaluation Criteria

In order to evaluate the developed churn prediction model, we refer to the confusion matrix shown in Table 1, which is the primary source for accuracy estimation in classification problems. Based on this confusion matrix, the following four criteria are used for evaluation:

Table 1. Confusion matrix

                         Actual non-churners   Actual churners
Predicted non-churners            A                   B
Predicted churners                C                   D

1. Accuracy: the percentage of the total number of predictions that were correctly classified:

$Accuracy = \frac{A+D}{A+B+C+D}$  (2)

2. Actual churners rate (coverage rate): the percentage of actual churners that are predicted as churners:

$Actual\ churners\ rate = \frac{D}{B+D}$  (3)

3. Hit rate (HR): the percentage of predicted churners that are actual churners rather than non-churners:

$Hit\ rate = \frac{D}{C+D}$  (4)

4. Lift coefficient (LC): shows the precision of the model relative to the baseline churn rate:

$Lift\ coefficient = \frac{D}{(C+D) \cdot CP}$  (5)

where the parameter CP represents the real churn percentage in the dataset. The higher the lift, the more accurate the model [13].

Since the churn dataset is highly imbalanced, using the accuracy rate alone is insufficient. We therefore also refer to more appropriate evaluation criteria: the churn rate, hit rate, and lift [25]. These criteria give more attention to the rare class, which in our case is the churn class. Our goal is to obtain a prediction model with high churn and hit rates.

5 Experiments and Results

Before applying SOM to cluster the training subsets, the number of clusters has to be determined. To find a suitable number, different SOM sizes were applied empirically to the whole dataset (i.e., 2×2, 3×3, 4×4 and 5×5). We found that the 3×3 SOM is the best at identifying two clusters with the highest rates of churners and non-churners; this approach was adopted in [26]. Thus, a 3×3 SOM is used in the first stage of our experiments to reduce the training sets, as described earlier in Section 2.

Table 2. Largest two clusters identifying churners and non-churners using different SOM sizes

SOM size   Churners       non-Churners
2×2        10 (1.2%)      12329 (89.73%)
3×3        793 (95.08%)   6774 (49.03%)
4×4        821 (98.44%)   5099 (37.11%)
5×5        784 (94.01%)   4626 (33.67%)

In order to give a better indication of how well the developed classification model performs when asked to classify new data, five-fold cross validation is applied. The dataset described in the previous section is split into 5 random subsets of equal size. Four subsets are used for training and the remaining one for testing, and all the criteria described earlier are computed on the testing subset. The same process is then repeated with a different subset held out for testing, five times in total, and the average values over the five tests are calculated.

The training and testing subsets were loaded into the HeuristicLab framework¹, and symbolic regression via GP was applied with the parameters shown in Table 3. The best generated GP model tree was evaluated using all the criteria mentioned in Section 4 and compared with basic GP without SOM, k-Nearest Neighbour (IBK), Naive Bayes (NB), and Random Forest (RF). For IBK, linear search was used to find the best number of neighbours, which was found to be 1. For RF, the number of trees was set empirically to 10. Comparison results are shown in Figure 4.

Figure 4-a shows that the SOM+GP approach is better than the basic GP and NB algorithms. Using SOM to eliminate the outliers enhanced the accuracy of GP by approximately 8%. Although IBK and RF outperformed SOM+GP in accuracy, they failed in predicting churners, with very poor results of 1% and 0.5% respectively, as shown in Figure 4-c. On the other hand, SOM+GP

¹ HeuristicLab is a framework for heuristic and evolutionary algorithms developed by members of the Heuristic and Evolutionary Algorithms Laboratory (HEAL), http://dev.heuristiclab.com

Table 3. GP parameters

Parameter             Value
Mutation probability  15%
Population size       1000
Maximum generations   100
Selection mechanism   Tournament selector
Maximum tree depth    10
Maximum tree length   50
Elites                1
Operators             {+, -, *, /, AND, OR, NOT, IF-THEN-ELSE}

Fig. 4. Evaluation results for IBK, NB, RF, basic GP, and SOM+GP: (a) Accuracy, (b) Hit rate, (c) Churn rate, (d) Lift

achieved the best performance in terms of hit rate and lift coefficient, as shown in Figures 4-b and 4-d respectively. This shows that taking only the accuracy as a performance measurement does not give the whole picture. According to these results, we can conclude that the SOM+GP model performs significantly better than the basic GP model and the other classical approaches selected from the literature.
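The five-fold splitting protocol used in these experiments can be sketched as follows; this is an illustrative reconstruction, not the authors' tooling.

```python
import numpy as np

def five_fold_indices(n, seed=0):
    """Shuffle record indices and split them into 5 roughly equal folds;
    each fold serves once as the test set, the other four as training data."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), 5)
    for k in range(5):
        train = np.concatenate([folds[j] for j in range(5) if j != k])
        yield train, folds[k]

# With the 5000-customer dataset, each test fold holds 1000 records
splits = list(five_fold_indices(5000))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 5 4000 1000
```

Each of the five (train, test) pairs would feed one round of SOM filtering, GP training, and metric computation, with the reported figures being averages over the five rounds.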

6 Conclusions

In this paper, we investigated a churn prediction approach based on Self Organizing Maps (SOM) and Genetic Programming (GP) for predicting possible churners in a Jordanian cellular telecommunication network. SOM was used to eliminate outliers, which correspond to unrepresentative customer data, while GP was used to develop the final churn prediction model. The performance of the proposed approach was evaluated using different criteria and compared with other common classification approaches. It was found that the SOM with GP approach has a promising capability for predicting possible churners and that it outperformed the common classifiers.

References

1. Blattberg, R.C., Kim, B.D., Neslin, S.A.: Database Marketing: Analyzing and Managing Customers. International Series in Quantitative Marketing, vol. 18, pp. 607–633. Springer, New York (2008)
2. Owczarczuk, M.: Churn models for prepaid customers in the cellular telecommunication industry using large data marts. Expert Systems with Applications 37, 4710–4712 (2010)
3. Kim, N., Lee, J., Jung, K.H., Kim, Y.S.: A new ensemble model for efficient churn prediction in mobile telecommunication. In: 2013 46th Hawaii International Conference on System Sciences, pp. 1023–1029 (2013)
4. Miguéis, V.L., Van den Poel, D., Camanho, A.S., Cunha, J.F.: Modeling partial customer churn: On the value of first product-category purchase sequences. Expert Systems with Applications 39, 11250–11256 (2012)
5. Miguéis, V.L., Van den Poel, D., Camanho, A.S., Cunha, J.F.: Predicting partial customer churn using Markov for discrimination for modeling first purchase sequences. Advances in Data Analysis and Classification 6, 337–353 (2012)
6. Saha, S.K., Ghoshal, S.P., Kar, R., Mandal, D.: Cat swarm optimization algorithm for optimal linear phase FIR filter design. ISA Transactions 52, 781–794 (2013)
7. Yazdani, D., Nasiri, B., Azizi, R., Sepas-Moghaddam, A., Meybodi, M.R.: Optimization in dynamic environments utilizing a novel method based on particle swarm optimization. International Journal of Artificial Intelligence 11, 170–192 (2013)
8. Sheta, A., Faris, H., Alkasassbeh, M.: A genetic programming model for S&P 500 stock market prediction. International Journal of Control and Automation 6, 303–314 (2013)
9. Huang, B., Kechadi, M.T., Buckley, B.: Customer churn prediction in telecommunications. Expert Systems with Applications 39, 1414–1425 (2012)
10. Li, G., Deng, X.: Customer churn prediction of China Telecom based on cluster analysis and decision tree algorithm. In: Lei, J., Wang, F.L., Deng, H., Miao, D. (eds.) AICI 2012. CCIS, vol. 315, pp. 319–327. Springer, Heidelberg (2012)
11. Adwan, O., Faris, H., Jaradat, K., Harfoushi, O., Ghatasheh, N.: Predicting customer churn in telecom industry using multilayer perceptron neural networks: Modeling and analysis. Life Science Journal 11 (2014)
12. Obiedat, R., Alkasassbeh, M., Faris, H., Harfoushi, O.: Customer churn prediction using a hybrid genetic programming approach. Scientific Research and Essays 8, 1289–1295 (2013)
13. Yu, X., Guo, S., Guo, J., Huang, X.: An extended support vector machine forecasting framework for customer churn in e-commerce. Expert Systems with Applications 38, 1425–1430 (2011)
14. Kohonen, T.: Clustering, taxonomy, and topological maps of patterns. In: Proceedings of the Sixth International Conference on Pattern Recognition, Silver Spring, MD, pp. 114–128. IEEE Computer Society Press (1982)
15. Kohonen, T.: The self-organizing map. Proceedings of the IEEE 78, 1464–1480 (1990)
16. Bação, F., Lobo, V., Painho, M.: Self-organizing maps as substitutes for K-means clustering. In: Sunderam, V.S., van Albada, G.D., Sloot, P.M.A., Dongarra, J. (eds.) ICCS 2005. LNCS, vol. 3516, pp. 476–483. Springer, Heidelberg (2005)
17. Gray, R.: Vector quantization. IEEE ASSP Magazine 1, 4–29 (1984)
18. Kohonen, T.: Data management by self-organizing maps. In: Zurada, J.M., Yen, G.G., Wang, J. (eds.) Computational Intelligence: Research Frontiers. LNCS, vol. 5050, pp. 309–332. Springer, Heidelberg (2008)
19. Koza, J.: Evolving a computer program to generate random numbers using the genetic programming paradigm. In: Proceedings of the Fourth International Conference on Genetic Algorithms. Morgan Kaufmann, La Jolla (1991)
20. Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection, vol. 1. MIT Press (1992)
21. Espejo, P.G., Ventura, S., Herrera, F.: A survey on the application of genetic programming to classification. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 40, 121–144 (2010)
22. Kotanchek, M., Smits, G., Kordon, A.: Industrial strength genetic programming. In: Riolo, R.L., Worzel, B. (eds.) Genetic Programming Theory and Practice, pp. 239–256. Kluwer (2003)
23. Affenzeller, M., Winkler, S., Wagner, S., Beham, A.: Genetic Algorithms and Genetic Programming: Modern Concepts and Practical Applications. CRC Press (2009)
24. Miller, W., Sutton, R., Werbos, P.: Neural Networks for Control. MIT Press, Cambridge (1995)
25. Burez, J., Van den Poel, D.: Handling class imbalance in customer churn prediction. Expert Systems with Applications 36, 4626–4636 (2009)
26. Tsai, C.F., Lu, Y.H.: Customer churn prediction by hybrid neural networks. Expert Systems with Applications 36, 12547–12553 (2009)
