Journal of Emerging Trends in Engineering and Applied Sciences (JETEAS) 2 (4): 601-605 © Scholarlink Research Institute Journals, 2011 (ISSN: 2141-7016) jeteas.scholarlinkresearch.org

Computational Intelligence in Data Mining and Prospects in Telecommunication Industry

Isinkaye O. Folasade
Department of Computer Science and Information Technology, University of Science and Technology, Ifaki-Ekiti, Ekiti State
___________________________________________________________________________
Abstract
The development of mobile phone networks, video and internet technologies has created enormous pressure on the telecommunication industry. These technologies generate and store huge amounts of data, which require intelligent tools to analyze. Data mining techniques are powerful mechanisms with features suited to analyzing large amounts of data, because they allow the selection, exploration and modeling of large datasets to uncover previously unknown patterns for business advantage. Computational intelligence in data mining provides complementary reasoning and searching methods to solve complex real-world problems. The aim of this paper is to explore different data mining tools and applications and how they can be used to detect telecommunication fraud and faults and to improve market effectiveness. It also describes how data mining can be used to uncover useful information embedded within large datasets.
___________________________________________________________________________
Keywords: computational intelligence techniques, data mining, telecommunication industry, fraud detection, fuzzy-logic
___________________________________________________________________________

INTRODUCTION
Data mining refers to the process of extracting or mining knowledge from large amounts of data (Jiawei and Micheline, 2000), or to the application of specific algorithms for extracting patterns from data (Fayyad et al., 1996). It can learn from past successes and failures and predict what will happen in the future. Data mining is therefore useful in any field where there are large quantities of data from which to extract meaningful patterns and rules (Berry and Linoff, 2000). Data mining has been used by statisticians and data analysts, in management information systems and in the telecommunication industry.

Data mining in telecommunication is an important application because the industry routinely generates and stores a tremendous amount of high-quality data. The quantity of data is so large that manual analysis is practically impossible. The advent of data mining technology promised a solution to this problem, which is the main reason the telecommunication industry was an early adopter of data mining technology.

Telecommunication is a service-oriented business; hence data mining can be viewed as an extension of the use of expert systems (Liebowitz, 1998), which were designed mainly to address the complexity of maintaining a huge network infrastructure and the need to maximize network reliability while minimizing labour cost. Also, within huge amounts of data there usually lies hidden knowledge of strategic importance, which cannot be analyzed by unaided human ability and requires powerful tools (Hills et al., 2006). Hence, a deep understanding of the knowledge hidden in telecommunication data is vital to the industry's competitive position and organizational decision making. This paper discusses how data mining can be used to discover and extract useful patterns from large databases.

There is a growing quantity of data in the world today, especially in the telecommunication industry. These data include call detail data, network data and customer data (Chang, 2009). The need to handle such large volumes of data has paved the way for the development of computational techniques for extracting knowledge from them. In the telecommunication industry, call detail data is useful for marketing and fraud detection applications. All telecommunication companies maintain data about the phone calls that traverse their networks in the form of call detail records. These records are kept on-line for many months; hence billions of call detail records are usually available for data mining (Cortes and Pregibon, 2001).

According to Thearling (1999), data mining is the extraction of hidden predictive information from large databases, and it is a technology with great potential to help companies focus on the most important information in those databases. The SAS Institute (2000) defines data mining as the process of selecting, exploring and modeling large amounts of data to uncover previously unknown patterns for business advantage. Hence, data mining has to do with applying data analysis and discovery algorithms to enumerate patterns over data for prediction and description (Ranjan, 2007; Costea, 2006).

Some data mining techniques allow telecommunication companies to mine historical data in order to predict when a customer is likely to churn. These techniques use billing data, call detail data, subscription information and customer information. The company can then take action, if desired, based on the induced model. One such model uses a neural network to estimate the probability of cancellation at a given time in the future (Mani et al., 1999). Wang et al. (2005) recognized direct marketing as a classification problem, and the importance of cost sensitivity in the direct-marketing and customer retention domain has also been noted (Domingos, 1999).

Liebowitz (1998) observed several applications of data mining techniques in telecommunication: identifying telecommunication patterns, predicting which customers are likely to default on payments, improving service quality and resource utilization, and facilitating multidimensional data analysis to improve the understanding of customer behavior.

Fayyad et al. (1996) explain that Fuzzy Logic [FL], Probabilistic Reasoning [PR], Neural Networks [NNs] and Evolutionary Algorithms [EAs] are the main components of computational intelligence, which provide complementary reasoning and searching methods to solve complex real-world problems. The principal constituent methodologies in computational intelligence are complementary rather than competitive (Abraham, 1998).

Different Types of Telecommunication Data
Data mining has attracted world-wide attention in recent years because of the huge amount of electronic data available and the pressing need to turn such data into useful knowledge (Mani et al., 1999).
The first step in the data mining process is to understand the data clearly so that appropriate applications can be developed. Telecommunication data include call detail data, network data and customer data.

Call Detail Data: Whenever a call is made on a telecommunication network, descriptive information about the call is saved as a call detail record. The record describes the significant features of the call and usually consists of the originating and terminating phone numbers, the date and time of the call and its duration. Before useful knowledge can be extracted, the call detail records of each customer must be summarized into a single record that describes the customer's calling behavior. This helps in generating customer profiles, which can be mined for marketing purposes.
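The summarization step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields follow the paper's description (originating number, terminating number, date and time, duration), but the class name `CallRecord`, the function `summarize_calls`, the chosen profile features and the 08:00 to 18:00 "daytime" window are all hypothetical choices, not something the paper specifies.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CallRecord:
    origin: str        # originating phone number
    destination: str   # terminating phone number
    start: datetime    # date and time of the call
    duration_s: int    # call duration in seconds


def summarize_calls(records):
    """Collapse call detail records into one profile per originating number."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec.origin].append(rec)

    profiles = {}
    for number, calls in grouped.items():
        total = sum(c.duration_s for c in calls)
        # "Daytime" is assumed here to mean 08:00-17:59; this is illustrative.
        daytime = sum(1 for c in calls if 8 <= c.start.hour < 18)
        profiles[number] = {
            "num_calls": len(calls),
            "avg_duration_s": total / len(calls),
            "distinct_destinations": len({c.destination for c in calls}),
            "daytime_share": daytime / len(calls),
        }
    return profiles


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    sample = [
        CallRecord("0801", "0802", datetime(2011, 3, 1, 9, 15), 120),
        CallRecord("0801", "0803", datetime(2011, 3, 1, 21, 40), 60),
        CallRecord("0802", "0801", datetime(2011, 3, 2, 10, 5), 300),
    ]
    for number, profile in summarize_calls(sample).items():
        print(number, profile)
```

Each resulting profile is a single record per customer, which is the form that clustering or classification techniques would typically take as input.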


Network Data: Telecommunication networks consist of different configurations of equipment made up of many interconnected components. Each component is capable of generating error and status messages, which can lead to a huge quantity of network data. The data is normally stored and analyzed to support network management functions such as fault detection and isolation. Data mining technology helps to perform these functions by automatically extracting knowledge from network data.

Customer Data: Telecommunication companies maintain very large databases of information because of their large numbers of customers. The information consists of names and addresses as well as service plan, contract information, credit score, family income and payment history. These data are often used alongside other data; for example, customer data is used in conjunction with call detail data to identify phone fraud.

DATA MINING TECHNIQUES
Data mining techniques have different features that make them suitable for analyzing large quantities of data, and they are the result of a long process of research and product development. Data mining tools can analyze massive databases and deliver answers to questions quickly. OLAP (online analytical processing) is an analytical tool that provides multidimensional data analysis based on verification, where the system is limited to verifying the user's hypotheses (Zadeh, 1998). OLAP tools are mainly used for simplifying and supporting interactive data analysis. The main goal of data mining, by contrast, is to discover new patterns in data for the purposes of prediction and description. Data mining tools also check the statistical significance of the predicted patterns and report on them.
Although the boundaries between prediction and description are not sharp, the distinction is useful for understanding the overall discovery goal, which is achieved through the following data mining techniques.

Clustering: This involves identifying a finite set of clusters to describe a set of data items (Abraham, 1998); it can also be described as a method by which similar records are grouped together. The clusters may have a richer representation, such as hierarchical or overlapping clusters. Using clustering, telecommunication customer data can be grouped based on customer name, customer address, service plan, credit score and payment history.

Regression: This takes a numerical dataset and develops a mathematical formula that fits the data. That is, given a set of data, regression predicts the value of an attribute for a new record from the attributes on which it depends. This technique works best with continuous numeric data; it is less suited to categorical data where order is not important.

Summarization: This technique involves finding a compact description for a subset of data. Summarization is often applied to interactive exploratory data analysis and automatic report generation.

Classification: This is used to predict group membership for data instances, that is, to determine which of a predefined set of classes a data item belongs to. Classification is very efficient when working with categorical data or a mixture of continuous numeric and categorical data. It can also process a wider variety of data than regression, with output that is much easier to interpret. A telecommunication company that knows its customers' credit histories can classify each customer record as Good, Medium or Poor.

Dependency Modeling: This technique finds a model that describes significant dependencies between variables. A dependency model can be either a structural-level model, which specifies which variables are locally dependent on each other, or a quantitative-level model, which specifies the strengths of the dependencies on some numeric scale.

Change and Deviation Detection: This focuses on discovering the most significant changes in data from previously measured or normative values.

DATA MINING ALGORITHM COMPONENTS
Specific algorithms must be constructed to implement the general techniques discussed above. This involves choosing an actual algorithm for searching for patterns in a dataset, which entails selecting an appropriate model and parameters and matching a particular algorithm to the knowledge discovery task. Any data mining algorithm has three major components: model representation, model evaluation and model search.
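The three components named above can be illustrated with a deliberately small example. The sketch below is not taken from the paper: it assumes a one-parameter threshold rule as the model representation, the misclassification rate as the evaluation criterion, and an exhaustive scan over candidate thresholds as the search method. The function names and the sample data are hypothetical.

```python
def predict(threshold, value):
    """Model representation: a one-parameter rule, 'flag if value > threshold'."""
    return value > threshold


def error_rate(threshold, samples):
    """Model evaluation: fraction of labelled samples the rule gets wrong."""
    wrong = sum(1 for value, label in samples if predict(threshold, value) != label)
    return wrong / len(samples)


def search(samples):
    """Model search: try each observed value as the threshold and keep the best."""
    candidates = sorted({value for value, _ in samples})
    return min(candidates, key=lambda t: error_rate(t, samples))


if __name__ == "__main__":
    # (unpaid balance, defaulted?) pairs; invented for illustration only.
    history = [(5, False), (12, False), (20, False),
               (35, True), (50, True), (80, True)]
    best = search(history)
    print("threshold:", best, "error:", error_rate(best, history))
```

Richer representations (fuzzy rules, neural networks) and smarter search methods (genetic algorithms) replace the individual pieces, but the division of labour between the three components stays the same.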
Model Representation: This is the language used to represent discoverable patterns. The representation must not be too limited, otherwise no number of training examples will produce an accurate model of the data. Conversely, if the representation is too powerful, it increases the danger of overfitting the training data, which reduces prediction accuracy on unseen data. Fuzzy logic is a computational intelligence technique that can be employed here. Using fuzzy logic involves identifying a classifier system, that is, designing a model that can predict the class to which a given pattern should be assigned. The classic approach to this problem is based on Bayes' rule. Another aspect of model representation is interpretability in fuzzy systems. Fuzzy systems can be evaluated according to their performance or accuracy, but a full evaluation also needs a way of assessing their interpretability, simplicity or user-friendliness.

Model Evaluation: Earlier algorithms focused on either accuracy or interpretability, but recent algorithms try to combine the two. A model evaluation criterion is therefore a quantitative statement of how well a specific pattern meets the goals of the knowledge discovery process. Different rules may be applied to remove redundancy, which usually appears as overlapping fuzzy sets (Setnes et al., 1998). Similarity-driven rule base simplification uses a similarity measure to quantify the redundancy among the fuzzy sets in the rule base. This method is useful for reducing the number of fuzzy sets in the model, making it simpler yet still robust. Multi-objective function for genetic algorithm (GA) based identification is another rule-based method; it improves classification capability by applying GA optimization with a cost function based on model accuracy, measured in terms of the number of misclassifications (Setnes and Roubos, 1999). Finally, orthogonal transforms can be used to reduce the number of rules. They evaluate the output contribution of each rule to obtain an importance ordering. For modeling purposes, orthogonal least squares (OLS) is a suitable tool (Yen and Wang, 2001).

Search Method: This consists of parameter search and model search. Once the model representation and evaluation criteria are fixed, the problem is reduced to an optimization task: finding the models and parameters that optimize the evaluation criteria. Model search occurs as a loop over the parameter search method. Computational intelligence tools for the initialization step of the identification procedure include fuzzy logic (FL), probabilistic reasoning (PR), neural networks (NNs) and genetic algorithms (GAs).
Computational intelligence based search methods for the identification of fuzzy classifiers can start from fixed membership functions that partition the feature space and then move to membership functions derived from the data, which better explain the data patterns. The automatic determination of fuzzy classification rules from data can be approached using any of the following techniques: neuro-fuzzy methods, genetic algorithm based rule selection, and fuzzy clustering in combination with GA optimization. Data mining algorithm tools that can be used with these techniques include the Microsoft Decision Trees Algorithm, Microsoft Time Series Algorithm, Microsoft Clustering Algorithm and Microsoft Sequence Clustering Algorithm. They help to build a data mining model. To create a model, an algorithm first analyzes a dataset and looks for specific patterns and trends (in this case in call detail data, customer data and network data). The algorithm then uses the results of this analysis to define the parameters of the mining model. Finally, these parameters are applied across the entire dataset to extract actionable patterns and detailed statistics.
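The fuzzy-set ideas used in this section (membership functions that partition a feature, and a similarity measure that exposes redundant, overlapping sets) can be made concrete with a short sketch. The paper does not say which membership shapes or similarity measure are used; the triangular membership functions and the Jaccard-style measure below (sum of minimum memberships divided by sum of maximum memberships over a sampling grid) are assumptions chosen for illustration, as are all names and numbers.

```python
def triangular(a, b, c):
    """Return a triangular membership function with feet at a and c, peak at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


def similarity(mu_a, mu_b, grid):
    """Jaccard-style similarity of two fuzzy sets sampled on a grid:
    sum of pointwise minima divided by sum of pointwise maxima."""
    inter = sum(min(mu_a(x), mu_b(x)) for x in grid)
    union = sum(max(mu_a(x), mu_b(x)) for x in grid)
    return inter / union if union else 0.0


if __name__ == "__main__":
    grid = [i / 10 for i in range(81)]          # sample points 0.0 .. 8.0
    low = triangular(0, 2, 4)                   # hypothetical "low usage" set
    low_variant = triangular(0, 2.5, 4)         # nearly the same set
    medium = triangular(4, 6, 8)                # hypothetical "medium usage" set
    print("low vs low_variant:", round(similarity(low, low_variant, grid), 2))
    print("low vs medium:     ", round(similarity(low, medium, grid), 2))
```

In a similarity-driven simplification, pairs of sets whose similarity exceeds some chosen threshold would be candidates for merging into a single set; the threshold itself is a design parameter that the paper does not fix.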

Data Mining Prospects in Telecommunication Industry
The prospects of data mining techniques in telecommunication are many. They help in predicting which customers are likely to default on payment, identifying telecommunication patterns, catching fraudulent activities, improving service quality and resource utilization, and facilitating multi-dimensional data analysis to improve the understanding of customer behavior (Berry and Linoff, 2004). Information gained from data mining can be used for applications ranging from market analysis, fraud detection and customer retention to production control and science exploration (Han and Kamber, 2001). Data mining helps to reduce company inefficiencies by making good predictions about business outcomes.

Fraud Detection: In order to identify patterns of fraud, a data mining application analyzed large amounts of cellular call data (Fawcett and Provost, 1997); the patterns found are used to generate monitors. Each monitor watches a customer's behavior with respect to one pattern of fraud. The monitor outputs are then fed into a neural network that determines when there is sufficient evidence of fraud to raise an alert. Data mining also assists in detecting fraud by identifying and storing the phone numbers called when a phone is known to be used fraudulently. Hence, data mining can be used generally to protect telecommunication operators' revenues against losses due to fraud (Estevez et al., 2006) or customer insolvency.

Marketing and Customer Profiling: Telecommunication is one of the most data-intensive industries. Telecommunication companies maintain very large amounts of information about their customers. As such, they are leaders in the use of data mining to identify and retain customers and to maximize the profit obtained from each customer. Data mining also helps in generating customer profiles from call detail records and then mining these profiles for marketing purposes. This method is now used to identify whether a phone line is used for voice or fax and to classify a phone line as belonging to a business or a residential customer. A variety of data mining methods are now being used to model customer lifetime value for telecommunication customers (Rosset et al., 2003) because it is much more expensive to acquire new telecommunication customers than to retain existing ones. In order to improve customer relationships and to combat the high cost of churn, increasingly sophisticated data mining techniques are being employed to analyze why customers churn and which customers are most likely to churn in the near future. This kind of information can be used by marketing departments to better target recruitment campaigns, and in active monitoring of the customer call base to highlight customers who, judging by a signature in their usage pattern, may be thinking of switching to another network provider.
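One very simple form of usage-pattern signature for churn monitoring can be sketched as follows. This is a hypothetical illustration rather than a method from the paper: it assumes weekly call minutes per customer are available, compares the average of the most recent weeks with the earlier baseline, and flags customers whose usage has fallen to half the baseline or less. The function names, the four-week window and the 0.5 ratio are all assumptions.

```python
def usage_drop(weekly_minutes, recent_weeks=4):
    """Ratio of recent average usage to the earlier baseline average."""
    if len(weekly_minutes) <= recent_weeks:
        raise ValueError("need more history than the recent window")
    baseline = weekly_minutes[:-recent_weeks]
    recent = weekly_minutes[-recent_weeks:]
    base_avg = sum(baseline) / len(baseline)
    if base_avg == 0:
        return 1.0  # no earlier usage to compare against
    return (sum(recent) / len(recent)) / base_avg


def churn_candidates(usage_by_customer, max_ratio=0.5, recent_weeks=4):
    """Return customers whose recent usage fell to max_ratio of baseline or less."""
    return sorted(
        customer
        for customer, minutes in usage_by_customer.items()
        if usage_drop(minutes, recent_weeks) <= max_ratio
    )


if __name__ == "__main__":
    # Invented weekly minutes for two customers, for illustration only.
    usage = {
        "A": [100] * 8 + [40, 30, 20, 10],   # steadily declining usage
        "B": [100] * 12,                     # stable usage
    }
    print(churn_candidates(usage))
```

A production churn model would combine many such signals (billing, subscription and customer data, as noted earlier) in a trained classifier; the ratio test here only shows how a single behavioral signature can be derived from call data.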

Network Fault Prediction and Isolation: Data mining applications have been developed to identify and predict network faults. The Telecommunication Alarm Sequence Analyzer (TASA) is a data mining tool that helps in fault identification by automatically discovering recurrent patterns of alarms within the network data. The patterns discovered by the tool are used to construct a rule-based alarm correlation system. TASA is also capable of finding episode rules that depend on temporal relationships between the alarms. Another technique used in predicting telecommunication switch failures is the genetic algorithm (GA), which is used to mine historical alarm logs (Weiss and Hirsh, 1998) in search of predictive sequential and temporal patterns.

CONCLUSION
This paper has described how data mining tools and techniques can be used in telecommunication companies to discover and extract useful patterns from very large datasets. These patterns can help in identifying telecommunication patterns, catching fraudulent activities, improving service quality and resource utilization, and facilitating multi-dimensional data analysis to improve the understanding of customer behavior.

REFERENCES
Abraham A. 1998. Intelligent Systems: Architecture and Perspectives, Recent Advances in Intelligent Paradigms and Applications, Abraham A., Jain L. and Kacprzyk J. (Eds), Studies in Fuzziness and Soft Computing, Springer Verlag, Germany, ISBN 3790815381, Chapter 1, pp 1-35.
Berry M. and Linoff G. 2000. Mastering Data Mining: The Art and Science of Customer Relationship Management, John Wiley and Sons Inc, New York.
Berry M. and Linoff G. 2004. Data Mining Techniques for Marketing, Sales, and Customer Relationship Management, 2nd Edition, Wiley Publishing Inc, Indianapolis.
Chang Y. T. 2009. Applying Data Mining to Telecom Churn Management, International Journal of Reviews in Computing, 1(10): 67-77.
Cortes C. and Pregibon D. 2001. Signature-based Methods for Data Streams, Data Mining and Knowledge Discovery, 5(3): 167-182.

Costea A. 2006. The Analysis of the Telecommunications Sector by the Means of Data Mining Techniques, Journal of Applied Quantitative Methods, 1(2): 144-150.
Domingos P. 1999. MetaCost: A General Method for Making Classifiers Cost-Sensitive. Proceedings 5th ACM SIGKDD Conf. Knowledge Discovery and Data Mining (KDD '99). ACM Press, pp 155-164.
Estevez P. A., Held C. M. and Perez C. A. 2006. Subscription Fraud Prevention in Telecommunications using Fuzzy Rules and Neural Networks, Expert Systems with Applications, 31(2): 337-344.
Fawcett T. and Provost F. 1997. Adaptive Fraud Detection, Data Mining and Knowledge Discovery, 1(3): 291-316.
Fayyad U., Piatetsky-Shapiro G. and Smyth P. 1996. Knowledge Discovery and Data Mining: Toward a Unifying Framework. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD'96). E. Simoudis, J. Han and U. Fayyad (Eds), AAAI Press, Portland, Oregon, August 2-4, pp 82-88.
Han J. and Kamber M. 2001. Data Mining: Concepts and Techniques, Academic Press.
Hills S., Agarwal D., Bell R. and Volinsky C. 2006. Building an Effective Representation for Dynamic Networks, Journal of Computational and Graphical Statistics, 15(3): 584-608.
Jiawei H. and Micheline K. 2000. Data Mining: Concepts and Techniques, 1st Edition, Morgan Kaufmann.
Liebowitz J. 1998. Expert System Applications to Telecommunications, John Wiley and Sons Inc, New York.
Mani D. R., Drew J., Betz A. and Datta P. 1999. Statistics and Data Mining Techniques for Lifetime Value Modeling. Proceedings 5th ACM SIGKDD Conf. Knowledge Discovery and Data Mining (KDD '99). ACM Press, pp 94-103.
Ranjan J. 2007. Application of Data Mining Techniques in Pharmaceutical Industry, Journal of Theoretical and Applied Information Technology, 3(3): 61-67.
Rosset S., Neumann E., Eick U. and Vatnik M. 2003. Customer Lifetime Value Models for Decision Support. Data Mining and Knowledge Discovery, 7(3): 321-339.


SAS Institute 2000. Best Price in Churn Prediction, A SAS Institute White Paper.
Setnes M. and Roubos J. 1999. Transparent Fuzzy Modeling using Fuzzy Clustering and GA's, In NAFIPS, New York, USA, pp 198-202.
Setnes M., Babuska R. and Verbruggen H. B. 1998. Complexity Reduction in Fuzzy Modeling, Mathematics and Computers in Simulation.
Thearling K. 1999. An Introduction to Data Mining, Direct Marketing Magazine, 28-31.
Wang K. et al. 2005. Mining Customer Value: From Association Rules to Direct Marketing, Int'l J. Data Mining and Knowledge Discovery (DMKDJ), 11(1): 57-80.
Weiss G. and Hirsh H. 1998. Learning to Predict Rare Events in Event Sequences, In: R. Agrawal and P. Stolorz (Eds.). Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining. Menlo Park, CA: AAAI Press, pp 359-369.
Yen J. and Wang L. 2001. Simplifying Fuzzy Rule-based Models using Orthogonal Transformation Methods, IEEE Trans. on Systems, Man and Cybernetics, 2(31) pp 199-206.
Zadeh L. 1998. Roles of Soft Computing and Fuzzy Logic in the Conception, Design and Deployment of Information/Intelligent Systems, in: Computational Intelligence: Soft Computing and Fuzzy-Neuro Integration with Applications. O. Kaynak et al. (Eds), Springer Verlag, Germany, pp 1-9.
