International Journal of Electronic Business Management, Vol. 1, No. 1, pp. 36-45 (2003)
WEB MINING FOR CRM - AN EMPIRICAL STUDY OF COMPUTER GAME SERVICE COMPANY
Shi-Ming Huang(1,*), Irene Kwan(2) and Shing-Han Li(1)
(1) Department of Information Management, National Chung Cheng University, Chiayi, Taiwan
(2) Department of Information Systems, Lingnan University, Hong Kong
*Corresponding author

ABSTRACT
The rapid growth of the Internet in the past decade has expanded the number of Web domains to over 43 million worldwide. Personalizing relationships with e-customers has become important for maintaining business online, and establishing an effective customer relationship management scheme using web mining techniques seems to be the only viable way forward. The number of visitors and page views alone cannot capture the differences in, and the effectiveness of, an Internet medium. This paper presents an empirical study of a web site, ICP-game point, which publishes electronic news related to computer games. Using web mining, we aim to exploit the association rules underlying members' behavior and to apply differentiation and one-to-one marketing strategies. Our ultimate goal is to personalize the electronic news through direct marketing actions aimed at target customers.

Keywords: CRM (Customer Relationship Management), web mining, association rules, one-to-one marketing, ICP (Internet Content Provider)

1. INTRODUCTION
The rapid development of electronic commerce has drawn heavy business investment into the Internet. As the economy evolves from product-oriented to customer-oriented, product and service promotion focuses on customization and personalization, and Customer Relationship Management (CRM) becomes essential. Because of the advantages of Internet technology, the Internet is the best platform on which to implement CRM. One-to-one marketing contributes directly to improving customer loyalty and has become one of the most important goals of CRM. Data mining is employed to help a web site identify its customers' profiles and analyze their behavior and online feedback in order to publish personalized electronic news and, eventually, to manage customer relationships.

The development of network games has drawn more business players into the computer game market, and competition has become more intense. DFC Intelligence forecasts worldwide industry growth for video games and PC games of 37% to 45% from 2002 to 2007 [1]. It estimates that by 2007 the worldwide market for interactive entertainment software and hardware will be between $28.4 billion and $30.1 billion. Note that this figure does not include consumer spending on accessories, game rentals, and used games; these areas are estimated to account for another $5 billion or more in annual spending [2]. DFC Intelligence also forecasts that 114 million people worldwide will be playing online games by 2006. However, the game market did not grow in 2002; instead, a significant loss of investor value occurred
in the interactive entertainment industry. Overall, the leading game publishers lost 35% of their market value from January 2002 to January 2003 because of the poor economy and weak consumer spending. For this reason, it is important for game publishers to capture accurate information on customers' response behavior, so that marketing and promotions can be targeted, guided by customer profitability, to sustain profitability. In this paper, we present a prototype that shows how to capture useful information from the web and analyze customers' behavior. We first record the ICP members' viewing information; once enough data have been captured from user login data and browsed pages, we translate and reformat these data into ready-to-mine data by statistical methods and OLAP, so that marketing staff can follow up with the data mining process.

2 RELEVANT RESEARCH
2.1 Knowledge Discovery in Databases
Knowledge discovery is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data [3]. With the explosive use of databases, data mining has been deployed for almost a decade to discover knowledge that enhances companies' competitive advantages [4,5,6]. The steps of data mining are similar to those of web mining: data collection, data aggregation, data selection and transformation, data excavation, and data evolution and learning [7, 8].

2.2 Web Mining
Web mining is the use of data mining techniques to automatically discover and extract information from Web documents and services [9]. It can be broadly defined as the discovery and analysis of useful information from the WWW. In web mining, data can be collected at the server side, the client side, or proxy servers, or obtained from an organization's database [10]. Web mining can be viewed as consisting of four tasks: information retrieval, information selection/extraction, generalization, and analysis [9]. Based on which part of the web is mined, we categorize web mining into three areas of interest [11]:
Web content mining: discovery of useful information from web contents, data, and documents.
Web structure mining: discovery of the model underlying the link structures of the web.
Web usage mining: making sense of the data generated by web surfers' sessions or behavior.
Unlike ordinary data mining, which focuses on pattern recognition and machine learning from a data warehouse or data mart, web mining attempts to analyze relevant internal web-based data and discover insights for relational promotion. Some web mining software combines the network server and database to record transactional information on web site viewers and then presents these details in statistical form on spreadsheets. Such spreadsheets aggregate historical data to help the miner understand the web site's operation [9]. Web mining methods can be divided into five types: description/summarization, classification, estimation/prediction, association rules, and clustering.

Description/Summarization (Characterization)
Data can be associated with classes or concepts, and it can be useful to describe individual classes and concepts in summarized, concise, yet precise terms. Such descriptions of a class or concept are called class/concept descriptions; a characterization summarizes the general characteristics of a target class of data. In web mining, characterization can produce a description summarizing the characteristics of customers who spend a certain amount of money in a year [7].

Classification
Classification is one of the most studied tasks in data mining. In essence, the problem consists of assigning records to one of a small set of predefined classes by discovering relationships between attributes [12]. In web mining, classification can be used to develop a profile of clients who access particular server files, based on the demographic information available on those clients or on their access patterns.

Estimation and Prediction
Estimation is used to guess an unknown value; prediction is used to guess a future value. The same algorithms can be used to compute both. Prediction can be applied in combination with OLAP techniques to generalize properties of groups of people visiting a web site. "This can help a marketer to slice and dice the data to find which item attributes or site characteristics appeal to the most valuable customers" [13].

Association Rule
Association analysis is the discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data. It is widely used for market basket or transaction data analysis [14, 15, 16]. In web mining, association analysis identifies the items that have a higher probability of being purchased or viewed together in any one session. If these items are not placed
together on the same page in a web catalog, chances are that the customer may forget to buy them. Association analysis is most frequently used for market-basket types of application.

Clustering
Clustering groups together clients or data items that have similar characteristics; it is also referred to as segmentation. Clustering systems allow one to "specify how many clusters to identify within a group of profiles, and then try to find the set of clusters that best represents the most profiles" [13]. Clustering client information in web transaction logs can facilitate the development and execution of future marketing strategies, both online and offline, for example by automatically sending return mails to clients who fall within a certain cluster, or by dynamically changing a particular site for a client on a return visit.

2.3 Customer Relationship Management
ERP systems are losing strategic value and becoming basic equipment for e-business; by contrast, customer relationship management has the potential to grow (Market Intelligence Center of the Institute for Information Industry, Taiwan). Three top companies in the information industry, Oracle, HP, and Intel, have declared that they will cooperate with each other to seize the CRM market in Taiwan. Surveys have shown that 20% of a company's customers account for 80% of its profit; the average company loses [...] of its customers every five years; it costs 5-10 times more to win a new customer than to keep an existing one [17]; and by increasing the customer retention rate from 90% to 95%, customer net present value increases by 75%. Therefore, keeping customers loyal and satisfied is important. Managing customer relationships means that an enterprise uses all its resources to fully understand every individual customer and interacts with customers by all means to improve their lifetime value. As the economic model has evolved from product-oriented to customer-oriented, businesses are often drawn into price competition. The way to avoid it is to offer customers competitive advantages: an effectively differentiated product or service can offer customers additional value [18]. CRM can be summarized by three key points:
Improve the interaction and the relationship between the company and its customers.
Increase customer loyalty.
Increase customer profitability.
Traditionally, expensive advertising and marketing exercises were adopted to promote sales, yet recent marketing research has shown that it is not effective to treat all customers as the same; important customers bring more profitability than casual customers. CRM demands high differentiation, customization, and personalization to maintain good relationships with important customers [19].

2.4 One-to-One Marketing
Focused on the individual customer, one-to-one marketing is based on the idea of an enterprise knowing its customer. Through interactions with that customer, the enterprise can learn how he or she wants to be treated, and it is then able to treat this customer differently from other customers. However, one-to-one marketing does not mean that every single customer needs to be treated uniquely; rather, it means that each customer has direct input into the way the enterprise behaves toward him or her [20]. As marketing transforms into one-to-one marketing, building a long-term learning model with every customer has become a necessary trend. When offering customer service, a company must hold the idea that "no two things are totally the same": each customer must receive the most suitable personalized service [21].

In practice, one-to-one marketing is not easy to accomplish. When the number of customers increases tremendously, a company cannot increase its number of salespeople and maintain its service standard in time. Applying technology solutions on the network is the remedy: first find the "target customers", and then use point casting, such as electronic news, to deliver the most suitable information to them [22]. Push technology has been touted as an alternative to how the World Wide Web currently operates, where users go online to search for information [23], and e-mail plays an important role in push technology [21]. A study by the Kelsey Group shows that the e-mail marketing expenses of American small enterprises will increase to 2.2 billion in 2005, and about 42% of these enterprises consider carrying out their marketing strategy through e-mail [24]. These companies consider e-mail more effective than flickering web site advertisements. Analysts point out that interactivity, privacy, and personalization are the most valuable advantages of e-mail marketing [25]. Research that Net Value conducted for Asia in March 2001 shows that Taiwan had the highest e-mail usage rate in Asia in 2000 [26].
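One-to-one marketing above starts by finding the "target customers", and the clustering (segmentation) method listed among the web mining methods is one way to form such groups, for example to send return mails per cluster. The sketch below is plain k-means; this choice is our assumption, since the paper does not name a clustering algorithm and its own system relies on association rules. Each member is represented by a hypothetical vector of view counts per game category.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: group members' viewing vectors into k segments.

    points: list of equal-length numeric tuples (e.g. views per game category).
    Returns (assignments, centroids); assignments[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centroids[j])),
            )
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids
```

For instance, members with viewing vectors (10, 0) and (9, 1) end up in one segment and (0, 10) and (1, 9) in the other, and a differently worded e-mail could then be sent to each segment.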
3. RESEARCH METHOD
The research method can be divided into three steps:
(1) Collect and clean the information, and store it in the data warehouse.
(2) Run an iterative discovery process in which the data mining tools extract patterns and analysts review them to generate new sets of questions that refine the search. After the search has been refined, the results of the mining process are translated into association rules and stored in the knowledge base.
(3) If the patterns are good predictors of purchasing behavior, the CRM process uses the scores generated by the data mining process to sharpen its focus on targeted customers or prospects, thereby increasing response rates and campaign effectiveness.
Figure 1 presents the research method of web mining for CRM.
Figure 1: The research method of web mining for CRM (Data Collecting Process: data source such as databases, flat files, and web catalogs; preprocessing by data collection and data cleaning; data stored in the data warehouse. Data Mining Process: discover, review, and refine patterns; store association rules in the knowledge base. CRM Process: crossing search module, graph browser, and personalized electronic news module serving the customer.)

3.1 Data Collection Process
The first step is to decide which data source to mine. Web sites contain far more data than are needed, so the foremost step is to identify the scope of the study, determine the relevant and essential data (e.g., members' basic files, games' basic information, web page information, and other relevant browsing records) for investigation, study the original database and data access programs, and then write the programs needed to capture the selected data sets. After the data are collected, every attribute in the relevant data sets needs to be analyzed and given a suitable representation that the mining algorithm in the next step can read. This process ensures that the data set is ready for the specific mining technique and purpose. In this process, generalization can be done through attribute removal or attribute generalization, and aggregation can be done by combining matching records and adding up their total amounts.
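The generalization and aggregation steps just described can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' program: the `ViewRecord` fields, the `ARTICLE_CATEGORY` mapping, and the `preprocess` helper are all hypothetical. The source IP is dropped (attribute removal), article ids and timestamps are generalized to categories and days, and identical generalized records are combined into counts.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ViewRecord:
    member_id: str
    article_id: int
    viewed_at: datetime
    source_ip: str  # not carried forward (attribute removal)


# Hypothetical concept hierarchy used for attribute generalization.
ARTICLE_CATEGORY = {1: "strategy", 7: "strategy", 10: "rpg", 36: "rpg", 89: "news"}


def preprocess(records):
    """Generalize each record to (member, category, day) and aggregate view counts."""
    counts = Counter()
    for r in records:
        # Attribute generalization: article id -> category, timestamp -> calendar day.
        category = ARTICLE_CATEGORY.get(r.article_id, "other")
        day = r.viewed_at.date()
        # Aggregation: matching generalized records are combined into one count.
        counts[(r.member_id, category, day)] += 1
    return counts
```

For example, two views by member "m1" of articles 1 and 7 on 2000/12/20 collapse into a single record ("m1", "strategy", 2000-12-20) with a count of 2.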
3.2 Data Mining Process
The Apriori algorithm holds a primary position among methods for finding frequent itemsets for association rules. The algorithm uses prior knowledge of frequent itemsets: it searches level-wise, using the frequent k-itemsets to explore the (k+1)-itemsets, and applies the Apriori property (every nonempty subset of a frequent itemset must also be frequent) to improve search efficiency [14, 27, 28]. The frequent itemsets are then translated into strong association rules and stored in the knowledge base. Strong association rules are those that satisfy both the minimum support and the minimum confidence. The confidence of a rule A ⇒ B, which is compared against the minimum confidence, is the percentage of cases containing A that also contain B:

$$\mathrm{Confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\mathrm{Support\_count}(A \cap B)}{\mathrm{Support\_count}(A)}$$

3.3 CRM Process
Data warehouses and data mining serve to support the selection of (real or potential) customers for acquisition, cross-selling, and retention [Inmon 1992, 1996]. By running data mining algorithms on customer web logs, data mining can uncover important associations about which products are often purchased together. This knowledge can then be used for product recommendations and product bundling, for example to make a recommendation to a future customer. In this step, the CRM process uses the knowledge captured in step 2 through the personalized electronic news module, which lets users communicate with current and potential clients through e-mail campaigns. It helps web visitors make better buying decisions more quickly, and it helps marketers learn more about how to meet the needs of the visitors who come to their web site.
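The support and confidence tests that decide whether a rule is strong, as described under the Data Mining Process, can be sketched as below. This is a minimal Python illustration, not the authors' Oracle stored-procedure implementation; it assumes each transaction is the set of article ids one member has read, and `support_count`, `confidence`, and `strong_rules` are hypothetical helper names. Because transactions are sets of items, "contains both A and B" is computed as containment of the itemset union A ∪ B, which corresponds to Support_count(A ∩ B) in the event notation of the formula.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions (sets of items) containing every item of itemset."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= t)


def confidence(antecedent, consequent, transactions):
    """Confidence(A => B) = Support_count(A and B together) / Support_count(A)."""
    a = support_count(antecedent, transactions)
    if a == 0:
        return 0.0
    both = support_count(set(antecedent) | set(consequent), transactions)
    return both / a


def strong_rules(frequent_itemset, transactions, min_sup, min_conf):
    """Yield (A, B, confidence) for rules A => B from one itemset meeting both thresholds.

    min_sup is a fraction of all transactions; min_conf is a fraction in [0, 1].
    """
    itemset = frozenset(frequent_itemset)
    if support_count(itemset, transactions) / len(transactions) < min_sup:
        return  # the itemset itself is not frequent, so no rule can be strong
    for k in range(1, len(itemset)):
        for a in combinations(sorted(itemset), k):
            antecedent = frozenset(a)
            consequent = itemset - antecedent
            c = confidence(antecedent, consequent, transactions)
            if c >= min_conf:
                yield antecedent, consequent, c
```

With the member reading sets [{1, 7, 89}, {1, 7}, {1, 7, 89}, {7, 10}], three members read articles 1 and 7 and two of them also read 89, so `confidence({1, 7}, {89}, ...)` is 2/3.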
4. SYSTEM IMPLEMENTATION
4.1 Case Background
ACERTWP owned several print and network media, and one of its magazines was among the top three game magazines in the country. Our case, "Gamepoint", went online in 2000. It is a typical ICP and plays a neutral role in the field of professional game media. Its content consists mainly of various kinds of game information; it has over 20,000 members and delivers more than 50,000 pieces of e-news every week. In September 2000, the daily visit count was 9,197 and the daily page view count was 57,169.

Figure 2: System architecture (components include remote users, a web server running PHP3 programs, a content database, formatted text files of the viewing log, a viewing log database, an Oracle DB server with stored procedures and a local Oracle client, the crossing search model, association rules with MinSup and MinConf, the personalized electronic newspaper model, and an evaluation model for personalized content accuracy)

4.2 System Framework
Web mining is divided into three types: web content mining, web linkage structure mining, and web usage mining; web usage mining is used in this case. As shown in Figure 2 above, the system structure is divided into two parts, front and rear. The rear part includes the web server and the database server. The web server keeps the web log: when a web page is accessed, the requested URL, source IP, and time stamp are recorded. The server web log alone is not sufficient for this case analysis, because the same source IP does not necessarily represent the same user. With dial-up or other ISP services, a user may not use the same IP address each time; conversely, many people may access the Internet through one computer. Therefore, instead of using the server web log as the data source, we use Gamepoint's original association rules and write programs to acquire each member's web log entries in real time, which increases the accuracy of the user web log. The content is classified along two aspects: article classification and game attribute. The front part includes the Oracle database and stored procedures, the Crossing Search Module, the Association Rule Module, and the Personalized Electronic News Module.

4.3 Implementing the Crossing Search Module
The crossing search offers 15 search methods, which can be classified into "game analysis", "article analysis", and "member analysis". Since games and articles can be classified, there are two different models: classified and single. Depending on requirements, the module displays the aggregate amount, the total accumulation, or the daily accumulation. Besides these tables, most functions include statistical graphs to help the web site's staff understand the analysis. Table 1 shows each function model:
Table 1: Crossing search module
Game analysis (classified and single models; aggregate, total accumulation, and daily accumulation):
1. Viewed times of each classified game
2. Daily viewed times of each classified game
3. Total games amount (presently)
4. The billboard of viewed times of single games
5. Daily viewed times of single games
Article analysis (classified and single models):
6. Total viewed times of classified articles
7. Daily viewed times of each classified article
8. Total games (presently)
9. The billboard of total viewed times of single pages
10. Total amount of viewed articles in one day
11. The billboard of daily viewed times of single pages
Member analysis:
12. Total viewers
13. Total viewed times of single member
14. Total viewers
15. The billboard of total viewed times of single members
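A few of the Table 1 functions can be sketched over a member-keyed viewing log, i.e. entries identified by login rather than source IP, as motivated in the System Framework section. The tuple layout `(member_id, game_id, view_date)` and the function names are hypothetical simplifications; the actual module runs on Oracle with stored procedures.

```python
from collections import Counter, defaultdict
from datetime import date


def daily_views_per_game(log):
    """In the spirit of function 5: viewed times of each single game, per day."""
    per_day = defaultdict(Counter)
    for _member_id, game_id, view_date in log:
        per_day[view_date][game_id] += 1
    return per_day


def billboard(log, top_n=10):
    """In the spirit of function 4: games ranked by total viewed times."""
    return Counter(game_id for _, game_id, _ in log).most_common(top_n)


def total_viewers(log):
    """In the spirit of function 12: number of distinct logged-in members."""
    return len({member_id for member_id, _, _ in log})
```

Keying on `member_id` instead of IP means two members sharing one computer are still counted as two viewers, which is the accuracy gain argued for above.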
4.4 Implementing the Association Rule Module
Compared with traditional flow analysis, the crossing search module offers more information and focuses on integral information. The personalized knowledge comes from the association rule module, which is regarded as the core engine of personalization. This module needs to retrieve all of the data from the database and translate them into a simple operational form; because of the join step in the Apriori algorithm, the amount of data and the operational requirements increase rapidly. The process of the association rule module is shown in Figure 3.
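The level-wise Apriori search, with the join step whose candidate sets drive the growth in data and operations noted above, can be sketched as follows. This is a minimal in-memory illustration assuming each transaction is the set of article ids read by one member and support is given as a fraction; it is not the authors' database implementation. The join here takes unions of pairs of frequent (k-1)-itemsets rather than the sorted-prefix join of the textbook formulation; after the prune step both yield the same candidates.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Level-wise frequent itemset mining: join, prune, then count support.

    transactions: list of sets of items.
    min_sup: minimum support as a fraction of all transactions.
    Returns {frozenset(itemset): support_count} for every frequent itemset.
    """
    min_count = min_sup * len(transactions)
    # L1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: unions of two frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune step (Apriori property): all (k-1)-subsets must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # One scan of the data to count the surviving candidates.
        current = {}
        for c in candidates:
            cnt = sum(1 for t in transactions if c <= t)
            if cnt >= min_count:
                current[c] = cnt
        frequent.update(current)
        k += 1
    return frequent
```

With reading sets [{1, 7, 89}, {1, 7}, {1, 7, 89}, {7, 10}] and a minimum support of 0.5, the frequent itemsets are {1}, {7}, {89}, {1, 7}, {1, 89}, {7, 89}, and {1, 7, 89}; article 10 is dropped at the first level.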
Figure 3: The process of association rule module

4.5 Personalized Electronic News Module
The personalized electronic news module and the association rule module are closely related: the association rule module finds members' viewing rules, and the personalized electronic news module makes efficient use of them. To translate the rules into valuable content that improves the service's value, the personalized electronic news module includes three steps; Figure 4 shows the module.

Step 1: Select appropriate viewing rules
According to the minimum confidence level input by the user, select the viewing rules, use them to project each member's preferences, and finally obtain a personalized list of recommended games. For instance, suppose the user inputs a minimum confidence level of 65%, and one rule is "70% of the members who have read the 1st, 7th, 10th, and 36th relevant articles have also read the 89th relevant article". Because 70% exceeds 65%, the process accepts this rule and finds all the members who have read the 1st, 7th, 10th, and 36th relevant articles but have not read the 89th. The 89th relevant article then becomes part of the preparatory content of these members' personalized electronic news, since these remaining 30% of members are likely to want to read it as the other 70% did.

Step 2: Produce the linkages for the personalized electronic news
After step 1, each member has his or her own preferred content for the electronic news, but not all of the content that will be posted in it. Step 2 therefore finds the link addresses of all relevant articles and pictures and stores this knowledge in the database. The user can also preview the content through the list shown on screen.

Step 3: Send the personalized electronic news
Clicking a button starts the submission process. An automatic search then retrieves all of the relevant content of each personalized electronic news item from the database, including nickname, e-mail address, member ID, game title, game path, relevant article path, and more, and formats it as HTML before packaging and delivering it. This procedure repeats until all of the mails are sent.
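The rule-selection logic of Step 1 can be sketched as below, using the article numbers from the example. The input shapes (rules as `(antecedent, consequent, confidence)` triples with a single consequent article, and a dict of articles read per member) and the name `preparatory_content` are hypothetical; the sketch only shows the filter-then-project idea.

```python
def preparatory_content(rules, member_reads, min_conf):
    """For each accepted rule A => b, recommend b to members who read all of A but not b.

    rules: iterable of (antecedent frozenset, consequent article id, confidence).
    member_reads: {member_id: set of article ids already read}.
    Returns {member_id: set of recommended article ids}.
    """
    recs = {}
    for antecedent, consequent, conf in rules:
        if conf < min_conf:
            continue  # rule rejected by the user-supplied minimum confidence
        for member, read in member_reads.items():
            if antecedent <= read and consequent not in read:
                recs.setdefault(member, set()).add(consequent)
    return recs
```

With the rule ({1, 7, 10, 36} ⇒ 89, 70%) and a 65% threshold, only members who read all four antecedent articles but not article 89 receive it; raising the threshold to 75% rejects the rule entirely.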
Figure 4: Personalized electronic newspaper module (flow: viewer's behavior and rules database, filter rules by Min_Conf, produce the preparatory content, produce the linkages between personalized electronic news and articles and pictures using the web page (articles) database, web manager confirms, send personalized electronic news automatically, personal electronic news)
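Steps 2 and 3 end with each member's content formatted as HTML before delivery. A minimal sketch of that formatting, using Python's standard `html.escape`, follows; the `base_url` parameter and the `nickname`, `title`, and `path` fields are hypothetical stand-ins for the nickname, article paths, and titles the module retrieves from its database, and actual mail delivery is omitted.

```python
from html import escape


def render_enews(member, articles, base_url):
    """Format one member's recommended articles as an HTML e-news body.

    member: dict with a "nickname" key.
    articles: list of dicts with "title" and "path" keys.
    """
    # Escape every value so titles or nicknames cannot break the HTML markup.
    items = "\n".join(
        f'<li><a href="{escape(base_url + a["path"])}">{escape(a["title"])}</a></li>'
        for a in articles
    )
    return (
        f"<html><body><p>Dear {escape(member['nickname'])},</p>\n"
        f"<ul>\n{items}\n</ul></body></html>"
    )
```

Escaping at render time keeps characters such as `<` or `&` in game titles from corrupting the mail body.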
4.6 System Evaluation
The view log from 2000/12/20 to 2001/2/23 is used as the main data of the functional model, and the view log from 2001/04/01 to 2001/5/31 is used as the main data source of the functional model. The details are shown in Table 2. The crossing search model was completed first. Compared with flow analysis, the crossing search model analyzes data without contradiction, and the information it provides is used to refine the web site's content and design. The personalized electronic newspaper is produced by transforming the viewing rules of the association model into a one-to-one newspaper. Delivering the electronic paper requires close cooperation with the web site, so editing the web site, fixed content, and the load on the mail server all need to be considered. All these details require the web site managers' approval; therefore, we use a fixed mailbox to receive this electronic newspaper so as to compare those viewing rules. The evaluation model aims to test whether the predictions of the association model are effective. The evaluation model will collect rules selected by step 1,
evaluation. The vertical axis represents different minimum supports, the first column on the horizontal axis represents different minimum confidences, and the second column shows the parameter used to determine lost members. The parameter is designed to avoid the deviation that comes from lost members; for example, when the parameter is “