International Journal of Applied Engineering Research, ISSN 0973-4562 Vol. 10 No.29 (2015) © Research India Publications; http://www.ripublication.com/ijaer.htm
CUSTOMER SEGMENTATION FRAMEWORK USING REDEFINED CENTROIDS
R. Sabitha¹, Asst. Professor, Dept. of IT, Info Institute of Engineering, Coimbatore, India
Dr. S. Karthik², Professor & Dean, Dept. of CSE, SNS College of Technology, Coimbatore, India
Abstract—In today's businesses, achieving customer satisfaction plays a critical role in an organization's goals. Not all customers contribute equally to the prosperity of an organization. Hence, identifying key customers is vital for a company that wants to establish relationships with its consumers. It has also been observed that retaining existing customers generates more profit than attracting new ones, so customer retention has to be maintained. There is always a relationship between customer benefits and business costs, and managers have to optimize it. This paper proposes to calculate customer loyalty based on the LRFM customer relationship model, which consists of four dimensions: Length (L), Recency (R), Frequency (F) and Monetary (M). Clustering analysis based on K-Means is used to cluster customers in order to identify high-profile customers and promote marketing strategies. To improve the efficiency of the K-Means algorithm, the initial cluster centroid values are modified and assigned using various methods: Random, Hartigan and Binary search. The validity of the clustering obtained with each initialization method is analyzed with the R-Squared index, and binary search is found to outperform the others. The LRFM values are also codified to obtain the customer loyalty status.
Keywords—Customer Segmentation; Clustering; K-Means; initial centroids.
I. INTRODUCTION
Data mining techniques are processes designed to identify and interpret data in order to understand and deduce actionable trends, and to design strategies based on those trends. These techniques extract raw data, transform it, and then find meaningful patterns in the transformed data. As businesses evaluate their investments in marketing activities, they tend to focus on their data mining techniques and capability. The questions are how to learn more about customers and their inclination towards particular products, how to use that information to offer appropriate choices to customers, and which marketing strategies can lead to long-term customer satisfaction and retention. Managers can understand their customers by evaluating customer behavior, customer segmentation, customer profiles, loyalty and profitability. Data mining helps managers identify valuable patterns in raw data, and the relations among them, to support major decisions. [1]
The basic structure of the CRM model lifecycle is shown in Fig.1. The model has two starting points. First, the customer makes a purchase, and the purchase data is measured and evaluated. The company then mines the evaluated data to understand the patterns the customer shows while purchasing. With the help of that understanding, the organization can plan its steps to maximize or optimize its business plans. Second, the organization takes some action to improve customer satisfaction, such as making a good, informative offer, and then studies the actions taken by the customer. The actions of the customer are again evaluated and an understanding of the customer is achieved.
Nowadays, how to formulate marketing strategies has become an essential issue for industry. The emergence of business-to-customer (B2C) markets has resulted in various studies on developing and improving customer retention and profit enhancement. This is mainly because retail business is becoming increasingly competitive, with costs being driven down by new and existing competitors. In general, consumer markets have several characteristics, such as repeat buying over the relevant time interval, a large number of customers, and a wealth of information detailing past customer purchases. In those markets, the goal of CRM is to identify a customer, understand and predict the customer's buying pattern, identify an appropriate offer, and deliver it in a personalized format directly to the customer. A typical example of the CRM model is an online retail shop that sells various products and performs transactions directly with customers through the Internet. [1]
Fig.1. CRM Life cycle
A retail shop defines a customer as a person who has already bought products or performed a transaction with the shop. The increased availability of individual consumer data makes it possible to target individual customers directly. That is, the abundance of customer information enables marketers to use individual-level purchase models for direct marketing and targeting decisions. However, such an enormous amount of data can become clutter, and it can be cumbersome to draw meaningful conclusions from the raw data. This is where customer behavior prediction using data mining techniques becomes useful. At its core, segmentation is about grouping database records based on some characteristic, such as demographic information or behavior. The ability to segment customers comes down to how well you know your customers and how they behave. Customers can be segmented in an almost infinite number of ways, and the best way to segment them depends on the business goals. However, there are four common segmentation approaches you can start with:
Demographic
Behavioral
Psychographic
Benefit
One or more of these approaches are used to group the target market. It is important to fully leverage tools such as your customer relationship management system to capture the information needed for customer segmentation and targeting.
A large amount of data is collected every day in many business and science areas. This data needs to be analyzed to find interesting information, and one of the most important analysis methods is data clustering. Clustering is one of the most important data mining tools, because it helps analysts understand the natural grouping of attributes in the data. Cluster analysis is used in many fields, such as data mining, pattern recognition and classification, data compression, machine learning, image analysis and bioinformatics. Data clustering groups objects so that objects in the same cluster are similar in their characteristics. The criterion for measuring similarity is implementation dependent. Existing clustering algorithms can be classified into two major classes: hierarchical and partitioning algorithms. A hierarchical method uses a nested sequence of partitions. It can treat all the data as one cluster and then divide it into smaller ones, which is called divisive clustering. Alternatively, it can treat each data point as a cluster and then merge clusters into bigger ones, which is called agglomerative clustering.
This paper proposes to calculate customer loyalty based on the LRFM customer relationship model, which consists of four dimensions: Length (L), Recency (R), Frequency (F) and Monetary (M). Clustering analysis based on K-Means is used to cluster customers in order to identify high-profile customers and promote marketing strategies. To improve the efficiency of the K-Means algorithm, the initial cluster centroid values are modified and assigned using various methods: Random, Hartigan and Binary search.
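The following sketch shows how an initialization method plugs into K-Means and how the R-Squared (RS) index scores the result, using RS = (SST − SSW)/SST. It assumes plain Euclidean distance on numeric vectors and standard Lloyd-style iterations. It is not the authors' implementation. Random initialization picks k distinct data points. hartigan_init follows one common reading of Hartigan's method: order the points by distance to the overall mean and take every ⌊n/k⌋-th point. That reading is an assumption here. The binary search initialization is not reproduced, because its details are not given in this section. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: Sequence[Point]) -> List[float]:
    """Component-wise mean of a non-empty collection of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def random_init(data: List[Point], k: int, seed: int = 0) -> List[List[float]]:
    """Random initialization: k distinct data points drawn at random."""
    rng = random.Random(seed)
    return [list(p) for p in rng.sample(data, k)]


def hartigan_init(data: List[Point], k: int) -> List[List[float]]:
    """Assumed Hartigan-style initialization: order points by distance to the
    overall mean and take the point at 0-based position l * (n // k)."""
    centre = mean(data)
    ordered = sorted(data, key=lambda p: dist2(p, centre))
    step = len(data) // k
    return [list(ordered[l * step]) for l in range(k)]


def kmeans(data: List[Point], centroids: List[List[float]],
           max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd-style K-Means starting from the given initial centroids."""
    centroids = [list(c) for c in centroids]
    labels: List[int] = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(len(centroids)),
                          key=lambda j: dist2(p, centroids[j]))
                      for p in data]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return labels, centroids


def rs_index(data: List[Point], labels: List[int],
             centroids: List[List[float]]) -> float:
    """R-Squared index (SST - SSW) / SST; closer to 1 means better separation."""
    overall = mean(data)
    sst = sum(dist2(p, overall) for p in data)
    ssw = sum(dist2(p, centroids[lab]) for p, lab in zip(data, labels))
    return (sst - ssw) / sst


if __name__ == "__main__":
    # Toy 2-D points standing in for (normalized) LRFM vectors.
    data = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
    for name, init in (("random", random_init(data, 2)),
                       ("hartigan", hartigan_init(data, 2))):
        labels, cents = kmeans(data, init)
        print(name, labels, round(rs_index(data, labels, cents), 3))
```

Keeping the initializer separate from kmeans means each initialization method can be compared on the same data by its RS value, which mirrors the comparison the paper describes.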
The validity of the clustering obtained with each initialization method is analyzed with the R-Squared index, and binary search is found to outperform the other methods. The LRFM values are also codified to obtain the customer loyalty status. This study focuses on the initialization problem of the K-Means algorithm, which has two parts: first, how many clusters the clustering task requires, and second, how to choose the initial cluster centers. This paper addresses the second issue. To resolve it, a binary search based initialization method and Hartigan's method are used to initialize the cluster centers for the K-Means algorithm. In the model, the initial cluster centers are first obtained with the binary search or Hartigan based method, and the K-Means algorithm is then applied. The performance of the methods is evaluated on the Ta-Feng dataset, and the methods are compared with each other. The binary search method performs better than all the other methods: the execution time is reduced and the clusters are well separated.
II. LITERATURE REVIEW
A. RFM and LRFM Models
The RFM model is a well-known customer value analysis method that is widely applied to segment customers. It is a behavior-based model: it analyzes the behavior of a customer and then makes predictions from the behavior recorded in the database. It consists of three measures, recency, frequency and monetary, which are combined into a three-digit RFM cell code. Recency measures the number of periods since the last purchase. Frequency measures the number of purchases made in a given time period. Monetary measures the total amount of money spent during a given period of time, the average dollar amount per purchase, or the amount spent on all purchases to date. The general way to use the RFM model in customer behavior analysis is to sort the customer data on each RFM dimension and then divide it into five equal quintiles. For recency, the customer database is sorted by purchase date in descending order. The top segment (the most recent purchasers) is given a value of 5, and the others are assigned 4, 3, 2 and 1. For frequency and monetary, the customer visiting frequency data and the amount of money spent are each sorted in descending order. The top 20% is again assigned a value of 5, the next 20% a value of 4, and so on. Some studies have developed new RFM models that take additional variables into account, and tested whether they perform better than the traditional RFM model. Customer loyalty depends on the relationship between a firm and its customers, and the key to customer loyalty is built through long-term customer relationship management. Accordingly, this paper extends the RFM model to the LRFM model by taking length (L) into account. [10,13]
B. Customer Relationship Management
The objective of CRM is to keep the customers that contribute to the enterprise, and CRM is a continuous improvement process. CRM should really be called Contact Management, because it represents the specific collection of all information on the interaction between the customer and the company. CRM is a behavior in which an enterprise tries to understand and reach customers through full interaction. Moreover, it is a business strategy that enhances customer loyalty and profit.
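The quintile scoring of the RFM model in subsection A can be sketched as follows. Each dimension is ranked, split into five near-equal groups scored 5 down to 1, and the R, F and M scores are concatenated into a three-digit cell code. The sketch makes several assumptions. Recency is supplied as days since the last purchase, so fewer days score higher. Ties are broken by input order. The function names are hypothetical. The same quintile_scores helper would also fit the later discretization of L, R, F and M into the codes {1, 2, 3, 4, 5}.

```python
from typing import Dict


def quintile_scores(values: Dict[str, float],
                    higher_is_better: bool = True) -> Dict[str, int]:
    """Rank customers and split them into five (near-)equal groups.

    The best 20% get 5, the next 20% get 4, ..., the last 20% get 1.
    Ties are broken by input order, so equal values may land in different groups.
    """
    ranked = sorted(values, key=values.get, reverse=higher_is_better)
    n = len(ranked)
    return {cust: 5 - (i * 5) // n for i, cust in enumerate(ranked)}


def rfm_cell_codes(days_since_last: Dict[str, float],
                   frequency: Dict[str, float],
                   monetary: Dict[str, float]) -> Dict[str, str]:
    """Combine R, F and M quintile scores into a three-digit RFM cell code."""
    # Recency: fewer days since the last purchase (more recent) scores higher.
    r = quintile_scores(days_since_last, higher_is_better=False)
    f = quintile_scores(frequency)
    m = quintile_scores(monetary)
    return {c: f"{r[c]}{f[c]}{m[c]}" for c in days_since_last}
```

With five customers ranked identically on all three dimensions, the best customer receives the code "555" and the worst receives "111".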
Maximization of customer equity is a core objective of customer-company relationship management. Marketing capability has a larger influence than research and development ability on enterprise performance, and customer relationship strategy and maintenance are the main abilities of marketing.
Drawbacks
The existing system keeps the monetary value constant, which makes customer segmentation less efficient. In the K-Means algorithm, the initial centers cannot be explicitly selected. The execution time of the original K-Means clustering algorithm is higher. Random selection of the initial cluster centers is less efficient than the proposed method and does not produce high-quality clusters. [5]
III. SEGMENTATION FRAMEWORK
This paper proposes to calculate customer loyalty based on the LRFM customer relationship model, which consists of four dimensions: relation length (L), recent transaction time (R), buying frequency (F) and monetary (M). Clustering analysis based on K-Means is used to classify customers in order to set marketing strategies. Two issues related to the K-Means algorithm are how many clusters exist in a dataset and how to initialize the cluster centers. The convergence result of K-Means is highly dependent on the initial cluster centers: if they are not chosen properly, K-Means can get stuck in a local optimum. Good initial cluster centers lead to good convergence results. Hence, the proposed method addresses both the initialization and the local optimum issues of K-Means. To improve the efficiency of the K-Means algorithm, the initial values for the cluster centroids are assigned using various methods: Random, Hartigan and Binary search. The validity of the clustering process is analyzed with the R-Squared index. Finally, the methods are compared based on the validity results.
This study constructs a model for clustering customer value based on the LRFM attributes and the K-Means algorithm. The LRFM values are used as quantitative input attributes for K-Means clustering. The proposed procedure for classifying customer value is depicted in Fig.2 and can be divided into four processes: (1) Select the dataset and preprocess the data; (2) Use length, recency, frequency and monetary to yield quantitative values as input attributes for cluster analysis, and cluster customer value as output using the K-Means clustering algorithm. [2]
(3) Use Hartigan's and Binary Search methods, in addition to the random method, to find the initial cluster centers; and
(4) Finally, evaluate the results of the experiment using the RS index, compare the different methods, and list the comparisons of the experimental results at different levels.
Fig.2. Segmentation Framework
A. Business and Data Understanding
The data understanding phase involves taking a closer look at the data available for mining. This phase includes collecting initial data, describing the data, exploring it, and verifying data quality. The research analysis process first collects and sorts the data based on the LRFM model parameters. Sorting is an important step, because the accuracy of the result is strongly correlated with how it is done. The data related to the LRFM index values is then extracted from the customer database, and pre-processing and data preparation are performed according to the code allocated to each customer. During pre-processing, inadequate and incomplete data are eliminated.
Dataset Description
This paper uses the Ta-Feng dataset, which is a grocery supermarket shopping dataset. It contains 817741 transactions belonging to 32266 users and 23812 items. [12]
Column definitions:
1: Transaction date and time
2: Customer ID
3: Age (10 possible values)
4: Residence Area
5: Product subclass
6: Product ID
7: Amount
8: Asset
9: Sales price
B. Data Preprocessing
Computing LRFM
Processing starts by creating a summary of each customer's purchase history in the following manner:
Algorithm to Calculate LRFM:
Step 1: The records from the raw transaction data file are grouped by customer and sorted by transaction date.
Step 2: Aggregation. For all transactions, aggregate the records of multiple transactions made on the same day: identify all-but-last transactions, aggregate the transactions, remove the remaining records, and restore the data. In the resulting dataset, no customer has more than one transaction on a given day.
Step 3: Computing Length values. Find the date of the customer's first purchase and calculate the sum of that purchase. Identify the date of the customer's last purchase and calculate the sum of that purchase. Calculate the length T for each customer, which is the time difference between the customer's first ever purchase and the end of each calibration period.
Step 4: Computing Recency values. Using the same first and last purchase dates and sums, compute recency, which is the length of time between the first ever purchase and the last observed purchase.
Step 5: Computing Frequency values. Split the dataset into a calibration period and a validation period. Classify each transaction (first ever purchase, calibration period or validation period). Create the frequency summary table using the pivoting concept.
Step 6: Computing Monetary values. Compute the average spend per transaction for the calibration and validation periods.
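The LRFM steps above can be sketched for the calibration period only. The sketch assumes simplified records of the form (customer_id, date, amount). It ignores the time of day and the validation-period metrics. It also counts F as the number of distinct purchase days in the calibration period, first purchase included. The Ta-Feng column mapping, the function name lrfm and this F convention are assumptions rather than the paper's code. Customers whose only purchases fall after calibration_end are simply left out.

```python
from collections import defaultdict
from datetime import date
from typing import DefaultDict, Dict, Iterable, Tuple

# (customer_id, transaction date, amount) -- an assumed, simplified record layout
Txn = Tuple[str, date, float]


def lrfm(transactions: Iterable[Txn],
         calibration_end: date) -> Dict[str, Tuple[int, int, int, float]]:
    """Return {customer: (L, R, F, M)} from calibration-period transactions.

    L: days from the first purchase to the end of the calibration period.
    R: days from the first purchase to the last purchase in the period.
    F: number of purchase days in the period (after same-day aggregation).
    M: average spend per aggregated transaction in the period.
    """
    # Steps 1-2: group by customer and merge same-day transactions.
    daily: DefaultDict[str, DefaultDict[date, float]] = defaultdict(
        lambda: defaultdict(float))
    for cust, day, amount in transactions:
        if day <= calibration_end:  # Step 5: drop validation-period records
            daily[cust][day] += amount

    result: Dict[str, Tuple[int, int, int, float]] = {}
    for cust, by_day in daily.items():
        days = sorted(by_day)                        # chronological order
        first, last = days[0], days[-1]
        length = (calibration_end - first).days      # Step 3
        recency = (last - first).days                # Step 4
        frequency = len(days)                        # Step 5
        monetary = sum(by_day.values()) / frequency  # Step 6
        result[cust] = (length, recency, frequency, monetary)
    return result
```

Merging same-day records before counting keeps several receipts from one shopping trip from inflating F.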
Discretization of LRFM:
Converting LRFM variables into discrete codes: for each of the dimensions L, R, F and M, sort the customers on that dimension, divide them into five equal customer groups, and assign each customer one of the five discrete codes {1, 2, 3, 4, 5}.
C. Clustering
To improve the efficiency of the K-Means algorithm, the initial values for the cluster centroids are assigned using various methodologies like Random, Hartigan and Binary search.
Output: A set of k clusters
Steps:
3) Then calculate the distance between each data object di (1