
Detecting Credit Card Fraud by Using Questionnaire-Responded Transaction Model Based on Support Vector Machines

Rong-Chang Chen1, Ming-Li Chiu2, Ya-Li Huang2, and Lin-Ti Chen2

1 Department of Logistics Engineering and Management
[email protected]
2 Department of Information Management
National Taichung Institute of Technology
No. 129, Sec. 3, Sanmin Rd., Taichung, Taiwan 404, R.O.C.

Abstract. This work proposes a new method for detecting credit card fraud. Traditionally, systems built on previous transaction data are used to classify a new transaction. This approach works well in some situations. However, several problems remain, such as skewed data distributions, heavily overlapping data, and changeable consumer behavior. To address these problems, we propose a personalized system that can prevent fraud from the first use of a credit card. First, the questionnaire-responded transaction (QRT) data of users are collected with an online questionnaire based on consumer behavior surveys. Support vector machines (SVMs) are then trained on these data to build QRT models, which are used to classify a new transaction. Results from this study show that the proposed method can effectively detect credit card fraud.

1 Introduction

Fraudulent transactions cost cardholders and issuing banks hundreds of millions of dollars each year, so an effective method for preventing credit card fraud is important. Conventionally, actual transaction data are used to build a system that predicts whether a new case is fraudulent [1–5]. This approach works well in some situations. However, it still has several problems [1–3]:

– The system should be able to handle skewed distributions, since only a very small percentage of all credit card transactions is fraudulent.
– The data overlap heavily. Many genuine transactions resemble fraudulent ones, and a fraudulent transaction can also appear normal.
– The system should be able to adapt to new consumer behavior.
– There are few or no transaction data for new users.

Z.R. Yang et al. (Eds.): IDEAL 2004, LNCS 3177, pp. 800–806, 2004. © Springer-Verlag Berlin Heidelberg 2004


To address the problems mentioned above, a personalized system is proposed in this study. We first collect personal transaction data from users with a self-completion online questionnaire. The questionnaire-responded transaction (QRT) data, including both the normal part (normal QRT) and the abnormal part (abnormal QRT), are treated as transaction records and used to construct a personalized QRT model, which in turn is used to decide whether a new, actual transaction is fraudulent. Since the consumer behavior of an unauthorized user usually differs from that of the legitimate cardholder, this kind of fraud can be detected. The method is particularly suitable for new users, for whom few or no transaction data exist. Thus, fraud can be prevented even if a thief uses a credit card applied for with someone else's personal information, or if a card is illegally intercepted. In addition, the collected abnormal QRT data can be used to alleviate the skewed distribution, in which abnormal data make up only a very small percentage.

The rest of this paper is organized as follows. Section 2 gives a brief overview of SVMs. The proposed approach is described in Section 3. Section 4 presents the experimental results. Finally, conclusions are presented in Section 5.

2 Support Vector Machines (SVMs)

Several artificial intelligence techniques and data mining tools have been successfully applied to fraud detection, such as case-based reasoning (CBR), neural networks, and learning machines. Among them, support vector machines (SVMs) are an emerging and powerful machine learning technique for classification and regression. SVMs were developed by V.N. Vapnik [6] and come with sound theoretical justification for their good generalization performance compared to other algorithms [7]. The SVM has already been used successfully for a wide variety of problems, such as pattern recognition, bioinformatics, natural language learning, and text mining [8,9]. The SVM has several desirable properties that make it a very powerful technique: only a small subset of the training set is needed to separate the classes of the problem, computational complexity is reduced by the kernel trick, and over-fitting is avoided by classifying with a maximum-margin hyper-surface.

SVMs are a method for creating classification or general regression functions from a set of training data. For classification, an SVM finds a hyper-surface in the space of possible inputs that splits the negative examples from the positive examples. The split is chosen so that the distance from the hyper-surface to the nearest positive and negative examples is as large as possible. This makes the classification correct for testing data that are near, but not identical, to the training data. In this study, we use SVMs to classify the transactions.
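As a hedged illustration of the maximum-margin idea described above, the standard hard-margin formulation for linearly separable data can be written as follows. The symbols are not taken from this paper and are assumptions introduced here: x_i is the feature vector of the i-th training transaction, y_i is its label (+1 for normal, -1 for abnormal), n is the number of training samples, and w and b are the parameters of a linear separating hyperplane.

```latex
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, n,
\qquad \text{with decision rule} \qquad
f(x) = \operatorname{sign}\!\left( w^{\top} x + b \right).
```

Under these constraints the distance from the hyperplane to the nearest training example is 1/‖w‖, so minimizing ‖w‖ maximizes the margin. Kernel and soft-margin variants, which practical SVM packages typically use, generalize this basic form.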

3

Approach

The proposed method is illustrated in Fig. 1. First, the transaction data of users are collected with an online, self-completion questionnaire system. Then, SVMs are trained on the data to build the QRT models. Finally, these QRT models are used to classify a new transaction as fraudulent or genuine.

Fig. 1. The procedure for predicting fraud.

Data-Collection Method. There are many methods to collect data. Among them, interviews and questionnaires are two of the most common survey methods. The biggest advantages of self-completion questionnaires over structured interviews are that they cost less, take less time, and allow larger samples to be acquired. These benefits increase considerably when the questionnaires are delivered over the Internet.

Online Questionnaire System. A successful questionnaire must fulfill three requirements [10]: (1) The questions must be clear and unambiguous. (2) The presentation of questions must be standardized. (3) An efficient, reliable, and preferably cost-effective way of translating the data for subsequent analysis must be used. Following these suggestions, an online questionnaire system was developed. The system runs on Windows 2000 with IIS 5.0, is written in ASP, and uses a SQL 2000 database. Pilot work was carried out to ensure a good final questionnaire. Questions are designed based on an individual's consumption preferences [11,12]. Participants receive a batch of 15-20 questions at a time and can choose to answer more batches by clicking the continue button. Consumer behavior varies considerably between individuals. It is therefore reasonable to classify behavior according to several main properties. The gathered personal QRT data are composed of several parts: gender, age, transaction time, transaction amount, and transaction items.
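The five parts of a QRT record listed above can be turned into a numeric feature vector before training. The following is a minimal sketch under stated assumptions, not the encoding used in this study: the class name QRTRecord, the function encode, the item vocabulary, the hour-of-day representation of transaction time, and the scaling constants are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical item vocabulary; the paper does not list its actual items.
ITEMS = ["books", "computer accessories", "clothing"]


@dataclass
class QRTRecord:
    gender: str      # "F" or "M"
    age: int         # years
    hour: int        # transaction time as hour of day, 0-23 (assumed encoding)
    amount: float    # transaction amount
    item: str        # transaction item
    normal: bool     # True if the respondent marked the transaction as normal


def encode(record, items=ITEMS, max_amount=10000.0):
    """Turn one QRT record into (features, label) for a binary classifier."""
    features = [
        1.0 if record.gender == "F" else 0.0,
        record.age / 100.0,
        record.hour / 23.0,
        min(record.amount, max_amount) / max_amount,  # clip, then scale to [0, 1]
    ]
    # One-hot encoding of the transaction item.
    features += [1.0 if record.item == name else 0.0 for name in items]
    label = 1 if record.normal else -1   # +1 normal QRT, -1 abnormal QRT
    return features, label
```

Each questionnaire answer then yields one labeled sample, so the normal and abnormal QRT data together form the training set for a personalized or group-based model.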



Personalized Model vs. Group-Based Model. In general, the members of a consumer group behave similarly. Thus, it is reasonable to use a group-based model to predict an individual's behavior. However, for a given transaction item, individuals within the same group can differ like the two sides of a coin: one person will buy the item and another will not. In this situation, putting every user's data into the same model lowers the accuracy. Therefore, we propose to develop personalized models to improve the prediction accuracy.

Detailed Database (Ddb) vs. Broader Database (Bdb). The questions in the online questionnaire are based on two different databases, the broader database (Bdb) and the detailed database (Ddb). The major difference between them is the definition of the transaction items. For instance, in the Bdb, keyboard, mouse, and similar items are all classified as computer accessories. By contrast, in the Ddb, each of them is classified as an individual transaction item.
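The relation between the two item definitions can be sketched as a lookup from detailed items to broader categories. This is an illustrative assumption about how such a mapping might be stored: only the keyboard and mouse example comes from the text, and the names DDB_TO_BDB, to_bdb, and vocabulary, as well as the other entries, are hypothetical.

```python
# Ddb item -> Bdb category. Only the keyboard/mouse example comes from the
# text; the remaining entries are hypothetical.
DDB_TO_BDB = {
    "keyboard": "computer accessories",
    "mouse": "computer accessories",
    "novel": "books",
    "textbook": "books",
}


def to_bdb(item):
    """Map a detailed (Ddb) item to its broader (Bdb) category."""
    return DDB_TO_BDB.get(item, item)  # unknown items are kept unchanged


def vocabulary(items, broader=False):
    """Return the sorted distinct item labels under the Ddb or Bdb definition."""
    return sorted({to_bdb(i) if broader else i for i in items})
```

For the purchases keyboard, mouse, and novel, the Ddb definition gives three distinct items while the Bdb definition gives two categories, so a Bdb-based model sees fewer but coarser item features.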

4 Results and Discussion

In this paper, we used mySVM [8], run on a Pentium III 667 PC with 256K of RAM and the Windows 2000 Professional operating system, to train all QRT data. More than 100 junior college students, all in their early twenties, were invited to answer the questionnaire. Most students did not have a credit card; a few had only one. Most participants were willing to answer approximately 105-120 questions, which took about 15 minutes. In total, about 12,000 QRT data were collected. One interesting finding is that the QRT data are not very skewed: the ratio of positive (normal) samples to total samples ranges from 0.35 to 0.67.

For convenience, let Ntrain be the number of training data, Ntest be the number of test data, and R be the ratio of positive (+, normal) samples to negative (-, abnormal) samples. The prediction accuracy is denoted as P. P(+/+) represents the percentage of normal samples predicted to be normal, P(+/-) represents the percentage of normal samples predicted to be abnormal, and so on. Finally, in model names, let P denote the personalized model and G denote the group-based model.

The effect of the testing data distribution is shown in Fig. 2(a). The ratios Ntest/(Ntest + Ntrain) are 0.33, 0.42, 0.5, and 0.75, with R = 1. As Fig. 2(a) indicates, the SVMs are not very sensitive to the testing distribution.

Figure 2(b) shows the effect of Ntest on P(-/-). Ntest is varied from 20 to 70 and is set to 50% of Ntrain, with R = 1. The database used is the Ddb. A personalized QRT model (P Ddb) is set up for a middle-aged man who provided three years of real transaction data from his credit card. Note that all of his real data are normal. Therefore, to correctly predict fraudulent behavior, abnormal data must be generated. This is a limiting case of skewed distribution. To solve this problem, we collect abnormal QRT data, which can be regarded as fraudulent transaction data, to set up the P Ddb model. The testing data are also fed into a youth-group model (youth G Ddb) based on over 100 youths.
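The accuracy measures defined above can be computed from the actual and predicted labels of a test set. The following is a minimal sketch, assuming labels are coded as +1 (normal) and -1 (abnormal); the function name prediction_rates and the dictionary keys are hypothetical, and the first symbol of each key is the actual class while the second is the predicted class.

```python
def prediction_rates(actual, predicted):
    """Return P(a/p) for a, p in {'+', '-'} plus the overall accuracy P.

    Labels are +1 (normal) and -1 (abnormal). P(a/p) is the fraction of
    samples whose actual class is a that were predicted as class p.
    """
    sign = {1: "+", -1: "-"}
    counts = {"+/+": 0, "+/-": 0, "-/+": 0, "-/-": 0}
    for a, p in zip(actual, predicted):
        counts[f"{sign[a]}/{sign[p]}"] += 1

    rates = {}
    for a in "+-":
        total = counts[f"{a}/+"] + counts[f"{a}/-"]   # samples of actual class a
        for p in "+-":
            key = f"{a}/{p}"
            rates[key] = counts[key] / total if total else 0.0
    correct = counts["+/+"] + counts["-/-"]
    rates["P"] = correct / len(actual) if actual else 0.0
    return rates
```

With R = 1, the test set contains equally many normal and abnormal samples, so the overall P is the mean of P(+/+) and P(-/-).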



Fig. 2. The effect of testing data and the variation of P(-/-).

Figure 2(b) shows that the P Ddb model can correctly predict the abnormal data even with a low Ntest. To obtain a high P(-/-), Ntest = 40, where the total number of data is 120, is enough. This is important because people are usually not willing to spend much time answering questions. As mentioned above, most participants were willing to answer about 105-120 questions. As one might expect, the P Ddb result is much better than the youth G Ddb result, indicating that different consumer groups behave differently. Thus, stratification or discrimination is needed to predict consumer behavior accurately.

The accuracy of the personalized models and the group-based model is compared in Table 1, with R = 1. The results show that the overall P of P Ddb is a little higher than that of G Ddb.

Table 1. Summary of overall P.

           P Ddb   P Bdb   G Ddb   avg.
youth 1    83%     81%     87%     83%
youth 2    88%     90%     83%     87%
youth 3    83%     76%     73%     77%
avg.       84%     82%     81%     82%

For fraud detection, P(-/-) is the most important measure, since a high P(-/-) lets only a small percentage of fraudulent transactions pass undetected. Therefore, multiple models are employed to improve P(-/-). The testing data are tested in three different models, i.e., P Ddb, P Bdb, and G Ddb. There are three different rules. Under the “1-” rule, the predicted result is considered abnormal if one or more of the predictions are abnormal. Similarly, under the “2-” rule, the behavior is regarded as abnormal if two or more of the predictions are abnormal. Figure 3 shows that the “1-” rule can enhance P(-/-) considerably, compared to the average P(-/-) of the three models (avg. 3 models) without using any rules.

Fig. 3. The change of fraud detection accuracy P(-/-) with different rules.

Tables 2 and 3 summarize the costs and ∆P. There are mainly two kinds of costs: the fraud loss cost Cf and the reconfirmation cost Cr. In general, Cf >> Cr. Thus, improving P(-/-) can save considerable cost. With the “1-” rule, the average ∆P(-/-) over the three youths is about 16%. Multiplying this percentage by the fraud amount shows that the benefit of improving the detection accuracy is considerable.

Table 2. Summary of costs and ∆P.

                     Predicted behavior
Actual transaction   Genuine                    Fraudulent
Genuine              0; ∆P(+/+)                 Reconfirmation, Cr; ∆P(+/-)
Fraudulent           Fraudulent, Cf; ∆P(-/+)    Reconfirmation, Cr; ∆P(-/-)

Table 3. Summary of accuracy difference ∆P. The difference is defined as the accuracy of the “1-” rule minus the accuracy of P Ddb, P Bdb, and G Ddb, respectively.

           P Ddb                          P Bdb
           ∆P(+/-)  ∆P(-/+)  ∆P(-/-)     ∆P(+/-)  ∆P(-/+)  ∆P(-/-)
youth 1    18%      -18%     18%         14%      -16%     16%
youth 2    8%       -8%      8%          18%      -10%     10%
youth 3    14%      -22%     22%         8%       -28%     28%
avg.       13%      -16%     16%         13%      -18%     18%

           G Ddb                          Average ∆P
           ∆P(+/-)  ∆P(-/+)  ∆P(-/-)     ∆P(+/-)  ∆P(-/+)  ∆P(-/-)
youth 1    6%       -8%      8%          13%      -14%     14%
youth 2    -8%      -2%      2%          6%       -7%      7%
youth 3    -4%      -28%     28%         6%       -26%     26%
avg.       -2%      -13%     13%         8%       -16%     16%
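The voting rules and the cost structure of Table 2 can be sketched in a few lines. This is an illustration under stated assumptions rather than the implementation used in the study: the function names combine and transaction_cost are hypothetical, predictions are coded as +1 (normal) and -1 (abnormal), and the vote pattern and cost figures are made up.

```python
def combine(predictions, k):
    """Apply the "k-" rule to per-model predictions (+1 normal, -1 abnormal).

    The transaction is flagged abnormal if at least k models predict abnormal.
    """
    abnormal_votes = sum(1 for p in predictions if p == -1)
    return -1 if abnormal_votes >= k else 1


def transaction_cost(actual, predicted, cf, cr):
    """Cost of one decision following Table 2.

    cf is the fraud loss cost Cf and cr is the reconfirmation cost Cr.
    """
    if predicted == -1:
        return cr                        # flagged transactions are reconfirmed
    return cf if actual == -1 else 0.0   # missed fraud costs Cf, otherwise 0


# Predictions of (P Ddb, P Bdb, G Ddb) for one fraudulent transaction.
# The votes and the cost figures are hypothetical.
votes = [1, -1, 1]
cost_rule_1 = transaction_cost(-1, combine(votes, 1), cf=1000.0, cr=10.0)
cost_rule_2 = transaction_cost(-1, combine(votes, 2), cf=1000.0, cr=10.0)
```

In this example the “1-” rule flags the transaction and incurs only the reconfirmation cost, while the “2-” rule lets it pass and incurs the fraud loss. The price of the “1-” rule is more reconfirmations of genuine transactions, which matches the mostly positive ∆P(+/-) values in Table 3.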


5 Conclusions

We have presented a new method for detecting credit card fraud. The method collects the questionnaire-responded transaction (QRT) data of users with an online questionnaire and then trains SVMs on these data to build QRT models, which are used to decide whether a new transaction is fraudulent. The results from this study show that the proposed method can effectively detect fraud. Even with little or no actual transaction data, fraud can be detected with high accuracy by the QRT model, especially with a personalized model.

References

1. S. Maes, K. Tuyls, B. Vanschoenwinkel, and B. Manderick, “Credit Card Fraud Detection Using Bayesian and Neural Networks,” Proc. NEURO Fuzzy, Havana, Cuba (2002).
2. P.K. Chan, W. Fan, A.L. Prodromidis, and S.J. Stolfo, “Distributed Data Mining in Credit Card Fraud Detection,” IEEE Intel. Sys., Nov.-Dec. (1999) 67-74.
3. R. Brause, T. Langsdorf, and M. Hepp, “Neural Data Mining for Credit Card Fraud Detection,” IEEE Int. Conf. Tools with Artif. Intel. (1999).
4. P. Chan and S. Stolfo, “Toward Scalable Learning with Nonuniform Class and Cost Distributions: A Case Study in Credit Card Fraud Detection,” Proc. 4th Int. Conf. Knowl. Disco. and D. Min., Menlo Park, Calif. (1997) 164-168.
5. R. Wheeler and S. Aitken, “Multiple Algorithm for Fraud Detection,” Knowl. Based Sys., (13) (2000) 93-99.
6. V.N. Vapnik, “The Nature of Statistical Learning Theory,” Springer (1995).
7. C.J.C. Burges, “A Tutorial on Support Vector Machines for Pattern Recognition,” Da. Min. and Knowl. Disco., Vol. 2, No. 2 (1998) 955-974.
8. S. Ruping, “mySVM-Manual,” Computer Science Department, AI Unit, University of Dortmund, Oct. 30 (2000).
9. M. Berry and G. Linoff, “Data Mining Techniques for Marketing, Sales, and Customer Support,” Wiley Computer Pub. (1997).
10. R. Sapsford, “Survey Research,” SAGE Pub. (1999).
11. http://www.104pool.com/.
12. http://survey.yam.com/life2000.