International Journal of Pattern Recognition and Artificial Intelligence
Vol. 20, No. 2 (2006) 227–239
© World Scientific Publishing Company

A NEW BINARY SUPPORT VECTOR SYSTEM FOR INCREASING DETECTION RATE OF CREDIT CARD FRAUD

RONG-CHANG CHEN
Department of Logistics Engineering and Management
National Taichung Institute of Technology
Taichung, Taiwan 404, R.O.C.
[email protected]

TUNG-SHOU CHEN* and CHIH-CHIANG LIN
Graduate School of Computer Science and Information Technology
National Taichung Institute of Technology
Taichung, Taiwan 404, R.O.C.
*[email protected]

Recently, a new personalized model has been developed to prevent credit card fraud. This model is promising; however, some problems remain. Existing approaches cannot identify credit card fraud well from few data with skewed distributions. This paper proposes to address the problem with a binary support vector system (BSVS). The proposed BSVS is based on the support vectors in support vector machines (SVM), and a genetic algorithm (GA) is employed to select the support vectors. To obtain a high true negative rate, self-organizing mapping (SOM) is first employed to estimate the distribution model of the input data. BSVS then trains on the data in accordance with this distribution to obtain a high detection rate. Experimental results show that the proposed BSVS is effective, especially for predicting a high true negative rate.

Keywords: Fraud detection; credit card fraud; support vector.

1. Introduction

Credit card fraud costs cardholders and issuers hundreds of millions of dollars annually, so preventing it has become a top priority. Existing approaches to identifying credit card fraud can be roughly classified into two types. One is the multiple-user approach (MUA), which is based on data from multiple credit card users.1,4,5,9,13 The other is the personalized approach (PA), which is based on personal data.6,7 In the MUA, research on credit card fraud involves taking many users' transaction data to create models for predicting new cases. This type of approach has produced some good results. However, there are still several problems


in using this kind of approach:

• Consumer behavior varies significantly from one individual to another. When the behavior of a particular user is predicted with a model built from many users' transaction data, the prediction accuracy can be very low. In particular, new users have few or no transaction data. The MUA also cannot address cases where fraud involves illegally using a cardholder's personal information to open a credit card account, or where a newly-issued card is stolen; in such situations, fraud occurs because the transactions are executed before the legitimate cardholder has ever had access to the card.

• There are many overlapping data, where legitimate transactions are similar to fraudulent transactions, and vice versa. Faced with the same transaction item in a similar situation, one user will proceed while another will not, like the two sides of a coin. For this reason, overlapping or contradictory data are unavoidable when using the MUA.

• Consumer behavior changes over time. Accordingly, it is generally very difficult to capture this time-varying behavior with the MUA.

To address the problems mentioned above, Chen et al.6 proposed a novel type of approach, the personalized approach, to prevent fraud. They first collect a user's personal transaction data through a self-completion online questionnaire system. The collected questionnaire-responded transaction (QRT) data, comprising both normal and abnormal parts, are regarded as transaction records and used to build a personalized QRT model, which is in turn used to predict whether a new, actual transaction is fraudulent.6 Since an unauthorized user's consumer behavior usually differs from that of the authorized cardholder, the fraud can be prevented.

The PA is promising, but some issues must still be addressed. One of these is predicting accurately from few data, especially data with a very skewed distribution, i.e. where the number of negative (fraudulent) samples is much smaller than the number of positive (genuine) samples. Users are usually unwilling to answer many questions, so the data collected for a specific user may be few. In this case, it is very difficult to predict the negative samples accurately.

Two basic methods have been proposed to cope with the rare-class problem by changing the class distribution20,21: over-sampling, which replicates the data in the minority class, and under-sampling, which removes part of the data in the majority class (a minimal sketch of both is given below). Both methods moderate the imbalance in the dataset, but each has pros and cons. Over-sampling enlarges the training set and increases the training time; under-sampling, in contrast, may discard useful data. Both methods help increase the detection rate, but the processes are laborious and time-consuming. Moreover, the pattern of fraudulent transactions varies over time, requiring relatively frequent and rapid generation of new models. Consequently, approaches that can detect fraud effectively are needed.
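As an illustration of the two resampling strategies just described, the following is a minimal sketch; the feature matrix X, the label convention (+1 genuine, −1 fraudulent), and the sampling factors are our assumptions for illustration, not specifics from Refs. 20 and 21.

```python
import numpy as np

def over_sample(X, y, minority=-1, factor=3, seed=0):
    """Replicate minority-class rows until the class is 'factor' times larger."""
    rng = np.random.default_rng(seed)
    idx = np.flatnonzero(y == minority)
    extra = rng.choice(idx, size=(factor - 1) * idx.size, replace=True)
    keep = np.concatenate([np.arange(y.size), extra])
    return X[keep], y[keep]

def under_sample(X, y, majority=+1, keep_ratio=0.5, seed=0):
    """Randomly drop a fraction of majority-class rows."""
    rng = np.random.default_rng(seed)
    maj = np.flatnonzero(y == majority)
    kept_maj = rng.choice(maj, size=int(keep_ratio * maj.size), replace=False)
    keep = np.concatenate([np.flatnonzero(y != majority), kept_maj])
    return X[keep], y[keep]
```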


In this paper, a new and more effective tool, the binary support vector system (BSVS), is used to increase the true negative (TN) rate in detecting credit card fraud. The proposed BSVS is based on the concept of support vectors, which are selected by a genetic algorithm (GA) to optimize prediction accuracy. In addition, self-organizing mapping (SOM) is first used to estimate the distribution model of the input data. BSVS then trains on the data in accordance with this distribution to obtain a high detection rate. Compared to the two previous approaches in Ref. 7, BSVS is simpler and more effective at obtaining a high TN rate.

The rest of this paper is organized as follows. Section 2 gives a brief overview of BSVS. The experimental results are analyzed and discussed in Sec. 3. Finally, conclusions are presented in Sec. 4.

2. Binary Support Vector Approach (BSVS)

The credit card fraud problem can be considered a pattern recognition problem. Artificial intelligence techniques such as neural networks10–12,15,17 and learning machines3,14,19 have been successfully applied to pattern recognition problems. Among them, the support vector machine (SVM), developed by Vapnik,19 is an emerging and powerful machine learning technique. Compared to other algorithms, SVM rests on sound theoretical foundations that provide good generalization performance,3 and it has already been successfully applied to a wide variety of problems, such as signal de-noising,18 bioinformatics,2 natural language learning, and text mining.3

SVM is derived from the generalized portrait algorithm. This supervised technique maps nonseparable data into a higher-dimensional feature space and defines a separating hyperplane, the maximum margin hyperplane, which maximizes the minimum distance from the hyperplane to the closest training points.19 The maximum margin hyperplane can be represented as a linear combination of training points, which depends on the selected kernel function. The location of the separating hyperplane in the feature space is specified by real-valued weights on the training points. Training points situated far from the hyperplane play no part in its specification, so their weights are zero. Only the training points that lie close to the hyperplane between the two classes receive nonzero weights. These training examples are called support vectors, since removing them would change the separating hyperplane.3 An example of support vectors in a two-dimensional feature space is illustrated in Fig. 1, where the gray circles and squares are support vectors.

SVM focuses on the support vectors because they are critical to separating the two classes. However, the selection of support vectors in SVM may be ineffective. Note that, for many datasets, SVM may not be able to find any separating hyperplane at all, either because the kernel function is inappropriate or because the dataset contains mislabeled points. In addition, finding the maximum margin hyperplane based on the support vectors is itself an optimization problem. A conventional SVM baseline of this kind is sketched below.
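Such a conventional SVM baseline might look as follows in outline; scikit-learn's SVC is used here as a stand-in for the mySVM package16 actually used in Sec. 3, and the data are synthetic placeholders rather than the QRT data.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(180, 6))               # 180 QRT-like records, 6 features (illustrative)
y = np.where(rng.random(180) < 0.5, 1, -1)  # +1 genuine, -1 fraudulent (balanced, Rn = 0.5)

clf = SVC(kernel="rbf")                     # the "radial" kernel of Table 1
clf.fit(X, y)
print(clf.support_vectors_.shape)           # the training points with nonzero weights
```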

Fig. 1. The concept of support vectors. (Two classes, −1 and +1, are shown, with points v_i, v_j near the boundary and the mislabeled points v'_i, v'_j.)

In this paper, we provide another viewpoint on support vectors. Based on this new concept, we can find support vectors effectively and apply them to the separation of the two classes.

The creation of a classifier usually involves two stages: a training stage and a testing stage. In the training stage, candidate support vectors are first selected and tested; the GA is then employed to choose the support vectors. Consider a training set R = {(v_i, t_i) | i = 1, 2, 3, ..., n}, where v_i is a training vector of dimension k and t_i is the class of the instance. In this paper, only two-class classification is considered, so t_i is either −1 or +1: t_i = −1 for fraudulent behavior, and t_i = +1 otherwise. The training set R is divided into two subsets, a negative subset R^- = {(v_i, t_i) | t_i = −1, i = 1, 2, 3, ..., n^-} and a positive subset R^+ = {(v_i, t_i) | t_i = +1, i = 1, 2, 3, ..., n^+}, with R^- ∪ R^+ = R. Similarly, we define V^- and V^+ for R^- and R^+, respectively, such that V^- = {v_i | v_i ∈ V and t_i = −1} and V^+ = {v_i | v_i ∈ V and t_i = +1}.

Given two vectors v_i and v_j, let sim(v_i, v_j) denote the similarity value between v_i and v_j. Any similarity definition can be applied in BSVS, such as the squared Euclidean distance (SED) or the Pearson correlation coefficient (PCC). Note that a higher similarity value indicates that two vectors are more alike. A sketch of these two similarity functions is given below.
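The two similarity measures named above might look as follows; turning the SED into a similarity by negating the distance is our own convention here, chosen so that larger values always mean "more similar", as the text requires.

```python
import numpy as np

def sim_sed(vi, vj):
    """Squared-Euclidean-distance similarity: negated so larger = more similar."""
    d = np.asarray(vi, float) - np.asarray(vj, float)
    return -np.dot(d, d)

def sim_pcc(vi, vj):
    """Pearson correlation coefficient between two k-dimensional vectors."""
    return np.corrcoef(np.asarray(vi, float), np.asarray(vj, float))[0, 1]
```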

2.1. Selecting the candidates

After inputting the training set R and defining the similarity, BSVS locates the candidate support vectors in V. Consider a vector v_i^+ ∈ V^+. Define the corresponding candidate v_{c(i)}^- of v_i^+ such that v_{c(i)}^- is the candidate of v_i^+ if and only if v_{c(i)}^- belongs to V^- and sim(v_i^+, v_{c(i)}^-) is the maximum (or minimum) of sim(v_i^+, v_j^-) over all v_j^- ∈ V^-. The choice of maximum or minimum is an option in BSVS for different purposes, like the choice between single linkage and complete linkage in cluster analysis, and depends on the data distribution. In general, credit card transaction data are high-dimensional, so it is difficult to understand their distribution. However, self-organizing mapping (SOM) can be employed to estimate the distribution model of the input data. Figure 2 shows a typical overview of QRT credit card data: the data in the white regions are genuine, while those in the black regions are fraudulent, and a darker color indicates a stronger tendency toward fraudulent behavior. As can be seen, there are some overlapping data, i.e. some legitimate data are similar to fraudulent data. Thus, to obtain a high detection rate, it is important to choose an appropriate classifier and a suitable option (maximum or minimum similarity) to classify the genuine and fraudulent behaviors.

Fig. 2. A typical overview of QRT credit card data (SOM map: white regions genuine, black regions fraudulent).

Let V_c^- be the candidate set in V^-. Then V_c^- = {v_{c(1)}^-, v_{c(2)}^-, v_{c(3)}^-, ..., v_{c(n^+)}^-}, where n^+ is the size of the positive training set. If there are two vectors v_i^+ and v_p^+ whose candidates are the same, i.e. v_{c(i)}^- is identical to v_{c(p)}^-, there must be some overlap in {v_{c(1)}^-, v_{c(2)}^-, v_{c(3)}^-, ..., v_{c(n^+)}^-}. Based on the axioms of set theory, after removing the overlap, V_c^- = {v_{c1}^-, v_{c2}^-, v_{c3}^-, ..., v_{c,cn^-}^-}, where cn^- is the number of candidates in V^- and cn^- ≤ n^+. In a similar way, BSVS can find the candidate set V_c^+ in V^+, with V_c^+ ⊂ V^+ and V_c^+ = {v_{c1}^+, v_{c2}^+, v_{c3}^+, ..., v_{c,cn^+}^+}, where cn^+ denotes the number of candidates in V^+.

Suppose the two training sets R^- and R^+ are well separated (without mislabeled data; see Fig. 1). Then the results V_c^- and V_c^+ are acceptable. Usually, however, there are some mislabeled training vectors. Consider v'_i and v'_j in Fig. 1: if they are mislabeled and have maximum similarity values, they will be selected as candidates even though they do not lie close to the boundary between the two classes. Some unsuitable candidates will therefore be selected into V_c^- and V_c^+. On the other hand, if the mislabeled v'_i and v'_j are selected as candidates, some "real candidates" may fail to be selected into V_c^- and V_c^+. To remedy these over-selection and mis-selection problems, BSVS repeats the above selection process, trying to collect as many candidates lying close to the boundary as possible; the unsuitable candidates are filtered out later, when the candidates are tested and the support vectors are selected.

For the convenience of discussion, let the candidates collected in the first search level from V^- be denoted V_{c[1]}^-, and let the candidate-selection process applied to a set S be written C(S). That is, V_{c[1]}^- = C(V^-), which is identical to the result V_c^- above. Next, BSVS deletes V_{c[1]}^- from V^-, since candidates already collected are not collected repeatedly. Let V_{c[2]}^- be the candidates collected in the first two levels; then V_{c[2]}^- = V_{c[1]}^- ∪ C(V^- − V_{c[1]}^-). Suppose L is the search level of BSVS. The final candidate set of V^- is then V_{c[L]}^-, with

    V_{c[L]}^- = V_{c[L-1]}^- ∪ C(V^- − V_{c[L-1]}^-).

Similarly, the final candidate set of V^+ is V_{c[L]}^+, with

    V_{c[L]}^+ = V_{c[L-1]}^+ ∪ C(V^+ − V_{c[L-1]}^+).

Let V_{c[L]}^- = {v_{cL1}^-, v_{cL2}^-, v_{cL3}^-, ..., v_{cLn^-}^-} and V_{c[L]}^+ = {v_{cL1}^+, v_{cL2}^+, v_{cL3}^+, ..., v_{cLn^+}^+}, where cLn^- and cLn^+ denote the sizes of V_{c[L]}^- and V_{c[L]}^+, respectively.

2.2. Testing the candidates

To evaluate the performance of the candidate sets V_{c[L]}^- and V_{c[L]}^+, consider a training vector v_i. Let max_sim^-(v_i) denote the maximum similarity value of sim(v_i, v_{cLi}^-) over all v_{cLi}^- in V_{c[L]}^-, and max_sim^+(v_i) the maximum similarity value of sim(v_i, v_{cLi}^+) over all v_{cLi}^+ in V_{c[L]}^+. If max_sim^-(v_i) > max_sim^+(v_i) and the corresponding t_i of v_i is −1, or max_sim^-(v_i) ≤ max_sim^+(v_i) and t_i is +1, then v_i is separated "correctly" by the current candidate sets. This is the definition of a correct separation. Equation (1) defines the accuracy rate of the candidate sets V_{c[L]}^- and V_{c[L]}^+:

    accuracy rate (%) = (number of correct separations in V) / |R| × 100.   (1)

2.3. Selection of support vectors

In BSVS, the number of search levels is used to address the mis-selection of support vectors. However, some over-selected candidates will remain in V_{c[L]}^- and V_{c[L]}^+. To ensure high prediction accuracy as well as high efficiency in selecting support vectors, BSVS employs the genetic algorithm of Ref. 8 to eliminate the over-selected candidates.

The student-project assignment problem can be adapted for this purpose.8 All candidates in V_{c[L]}^- and V_{c[L]}^+ are translated into the students of the student-project problem, with the project types limited to two: if a student is assigned to a project, i.e. the candidate is chosen, its project type is set to "1"; otherwise it is set to "0". Moreover, the definition of the chromosome in Ref. 8 is modified to (b_1, b_2, b_3, ..., b_CH), where CH is the total number of candidates and each b_i is either 0 or 1. The ith candidate is removed from its candidate set if b_i = 0; otherwise it is kept.

Besides the above modifications, we define a fitness function for the credit card fraud problem. The fitness f_g of chromosome g is defined in Eq. (2):

    f_g = Σ_{i=1}^{n} F(v_i).   (2)


The smaller the value of f_g, the better the chromosome g. Given g, F(v_i) has four cases. If v_i ∈ V^+ and max_sim^-(v_i) < max_sim^+(v_i), a true positive, then F(v_i) = 0, since this case incurs no cost. If v_i ∈ V^+ and max_sim^-(v_i) > max_sim^+(v_i), a false negative, then F(v_i) equals the confirmation cost W, since some cost is wasted confirming this transaction. If v_i ∈ V^- and max_sim^-(v_i) > max_sim^+(v_i), a true negative, then F(v_i) also equals the confirmation cost W, since this transaction has to be confirmed. Finally, if v_i ∈ V^- and max_sim^-(v_i) < max_sim^+(v_i), a false positive, then F(v_i) equals the transaction price P(v_i), which is much larger than W in most cases. A sketch of this fitness evaluation is given below.

The GA of Ref. 8 is then used to eliminate the over-selected candidates. The results after elimination of V_{c[L]}^- and V_{c[L]}^+ are V_s^- and V_s^+, respectively. Let V_s^- = {sv_1^-, sv_2^-, sv_3^-, ..., sv_{sn^-}^-} and V_s^+ = {sv_1^+, sv_2^+, sv_3^+, ..., sv_{sn^+}^+}, where sn^- is the size of V_s^- and sn^+ is the size of V_s^+. We call the sv_i^- and sv_i^+ the support vectors, and V_s^- and V_s^+ the support vector sets. Note that sn^- < cLn^-, V_s^- ⊆ V_{c[L]}^-, sn^+ < cLn^+, and V_s^+ ⊆ V_{c[L]}^+. The sets V_s^- and V_s^+ are the final result of the training stage in BSVS.
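The fitness evaluation of Eq. (2) might be sketched as follows; the chromosome is a 0/1 mask over the concatenated candidate sets, and W and the prices P(v_i) are illustrative inputs. The GA parameters quoted in Sec. 3 (population 50, crossover rate 0.8, mutation rate 0.05, 100 generations) would drive the surrounding evolutionary loop, which is omitted here.

```python
def fitness(chromosome, cand_neg, cand_pos, V, labels, prices, W):
    """Eq. (2): f_g = sum_i F(v_i), the total cost incurred by the kept candidates.
    chromosome: 0/1 mask over cand_neg + cand_pos (a candidate is kept if 1)."""
    n_neg = len(cand_neg)
    kept_neg = [c for b, c in zip(chromosome[:n_neg], cand_neg) if b]
    kept_pos = [c for b, c in zip(chromosome[n_neg:], cand_pos) if b]
    if not kept_neg or not kept_pos:
        return float("inf")                       # degenerate chromosome: no classifier
    f_g = 0.0
    for v, t, p in zip(V, labels, prices):
        ms_neg = max(sim_sed(v, c) for c in kept_neg)   # sim_sed from the sketch above
        ms_pos = max(sim_sed(v, c) for c in kept_pos)
        if t == +1:
            f_g += 0.0 if ms_pos > ms_neg else W        # TP is free; FN costs W
        else:
            f_g += W if ms_neg > ms_pos else p          # TN costs W; FP costs P(v_i)
    return f_g
```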

The procedure for the training stage in BSVS is shown in Fig. 3.

Fig. 3. The procedure for the training stage in BSVS. The training set, the similarity definition sim, and the search level L feed the candidate search; the closest candidates are tested through the accuracy rate of the meta training result; the GA then selects the support vectors, yielding the trained result (i.e. the trained model).

Fig. 4. The procedure for the testing stage in BSVS. The testing set, the similarity definition, and the trained result (i.e. the trained model) feed the search for the closest support vector, which produces the predicted result (i.e. the accuracy rate).


2.4. Testing stage in BSVS

Consider an input vector u. Let max_sim^-(u) denote the maximum similarity value of sim(u, sv_i^-) over all sv_i^- in V_s^-, and max_sim^+(u) the maximum similarity value of sim(u, sv_i^+) over all sv_i^+ in V_s^+. If max_sim^-(u) > max_sim^+(u), u is assigned to class −1; otherwise, u is assigned to class +1. A sketch of this decision rule is given below.
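A minimal sketch of the testing-stage rule, reusing sim_sed from above; Vs_neg and Vs_pos stand for the support vector sets V_s^- and V_s^+ produced by the training stage.

```python
def classify(u, Vs_neg, Vs_pos):
    """Assign u to class -1 (fraud) or +1 (genuine) by the nearest support vector."""
    ms_neg = max(sim_sed(u, sv) for sv in Vs_neg)
    ms_pos = max(sim_sed(u, sv) for sv in Vs_pos)
    return -1 if ms_neg > ms_pos else +1
```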

3. Results and Discussion

For convenience of discussion, let N denote the number of data, Rn the ratio of negative samples to total samples, TN the true negative rate, TP the true positive rate, and AVG the average accuracy. In this study, we used the GA to select support vectors, with the following parameter settings: population size 50, crossover rate 0.8, mutation rate 0.05, and 100 generations; two-point crossover and two-point mutation were used to find a solution. As for SVM, we used mySVM16 to train and test the data.

Table 1 compares the results of mySVM and BSVS for N = 180 with Rn = 0.5, i.e. a balanced distribution of positive and negative samples. The results are based on "ten times ten-fold" cross validation. There are three options for BSVS: BSVS 0 classifies based on maximum similarity, BSVS 1 classifies based on minimum similarity, and BSVS 2 selects candidates based on both maximum and minimum similarities.

Table 1. Comparison of average accuracy AVG between mySVM and BSVS with Rn = 0.5.

    Classifier   Kernel / option          AVG
    mySVM        dot                      0.60
    mySVM        polynomial degree 2      0.66
    mySVM        polynomial degree 3      0.70
    mySVM        polynomial degree 4      0.68
    mySVM        radial                   0.74
    BSVS 0       maximum similarity       0.65
    BSVS 1       minimum similarity       0.71
    BSVS 2       both                     0.68

The results from BSVS are comparable to those from mySVM (see Table 1). Among the BSVS options, BSVS 1 has a higher average accuracy than BSVS 0 and BSVS 2, indicating that BSVS 1 matches the QRT data better; we therefore use BSVS 1 as the base classifier of BSVS in the following experiments. Note also that the radial kernel gives mySVM its best detection rate. This is consistent with the characteristics of the data distribution obtained with SOM and shown in Fig. 5: the distribution model of the input data indicates that a higher detection rate can be obtained with a radial classifier. An outline of how such a SOM overview can be produced is sketched below.
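The SOM overview of Fig. 5 can be reproduced in outline as follows; the MiniSom package, the 10×10 grid, and the reuse of X and y from the earlier sketch are our assumptions, not the authors' stated tooling.

```python
import numpy as np
from minisom import MiniSom  # assumed third-party SOM implementation

# X: the 180 QRT-like records, y: labels (+1 genuine, -1 fraud), as sketched earlier
som = MiniSom(10, 10, X.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(X, 1000)

# darkness of a cell ~ fraction of fraudulent records mapped onto it
grid = np.zeros((10, 10))
hits = np.zeros((10, 10))
for v, t in zip(X, y):
    i, j = som.winner(v)
    hits[i, j] += 1
    grid[i, j] += (t == -1)
fraud_tendency = np.divide(grid, hits, out=np.zeros_like(grid), where=hits > 0)
```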

Fig. 5. Overview of the data distribution for N = 180 and Rn = 0.5.

When Rn is low, it is generally difficult to obtain a high TN rate. To improve the TN rate, Rn can be increased, as described in Ref. 7, by using the over-sampling method20,21 to reproduce the data in the minority class.

The replicating process is repeated several times, each time putting the same data into the training set. Instead of replicating the data, the personalized QRT approach7 improves accuracy by adding data: the abnormal QRT data collected from the online system are introduced into the training set, and the SVM is then trained to produce a better classifier with a higher TN rate. Both results show that the TN rate increases with Rn. However, both methods are troublesome, because they must collect or replicate negative samples, retrain, and retest many times to improve the fraud detection rate. In contrast, BSVS is simpler and needs no replicating or adding process.

Table 2 compares the TN rates of mySVM and BSVS. The transaction items were divided into six subclasses, and the ratio of the subclass sizes can influence the prediction accuracy; 5:4:3:2:2:2 denotes one such number ratio of the six subclasses. As can be seen, BSVS 1 achieves a higher TN rate without any replicating or adding process for the 5:4:3:2:2:2 case. For the random case, if the adding process is employed, the TN rate can be as high as 0.8 at repeat times = 4. Note, however, that the TN rate depends considerably on the distribution of the data; the results in Table 1 also indicate this phenomenon.

Table 2. Comparison of TN rate between mySVM and BSVS with Rn = 0.1.

    mySVM
    Repeat Times   5:4:3:2:2:2 (Adding)   Random (Adding)   5:4:3:2:2:2 (Replicating)
    10             0.77                   0.74              0.70
    9              0.76                   0.74              0.70
    8              0.73                   0.68              0.70
    6              0.72                   0.58              0.70
    4              0.66                   0.52              0.70
    Base           0.50                   0.50              0.40

    BSVS 1 (without adding or replicating process)
    5:4:3:2:2:2    0.89
    Random         0.6



A good credit card fraud detection system should take into consideration both the cost of the fraudulent behavior it detects and the cost of preventing it. One of the most useful ways to reduce the cost of fraud is to increase the TN rate (i.e. to lower the false positive rate). However, an increase in TN rate is usually accompanied by a decrease in TP rate, which in turn incurs additional confirmation cost and even customer defection cost. There is therefore a compromise between these costs. The variation of total cost with confirmation cost is shown in Fig. 6; the confirmation cost varies from company to company. When the confirmation cost is low, a high TN rate leads to a lower total cost: the total cost of BSVS 1 is lower than that of BSVS 0, since the TN rate of BSVS 1 is higher. When the confirmation cost is significant, however, the total cost of BSVS 1 exceeds that of BSVS 0.

Figure 6(a) shows the variation of the total cost with a fixed confirmation cost. It must be specially noted, however, that the confirmation cost depends on the response of the customer and is not necessarily fixed. For example, if a consumer's behavior is genuine but is wrongly predicted as fraudulent, the customer is likely to be receptive to the first confirmation made by the credit card issuer. As the number of wrong predictions and confirmation processes increases, the customer loses patience and becomes very dissatisfied, and these negative responses grow stronger with each incident. Strong negative responses from the customer mean a large confirmation cost, which causes an escalating increase in the total cost, as shown in Fig. 6(b). The straight line labeled "Normal" in Fig. 6(b) corresponds to the fixed confirmation cost of BSVS 1 in Fig. 6(a). The other curves, with 2.5%, 5.0%, 7.5%, and 10% power increments to the confirmation cost, show that the confirmation cost can no longer be treated as fixed as the number of wrong predictions increases, and that the total cost then rises super-linearly with the confirmation cost. A small sketch of this cost model is given below.
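A sketch of the total-cost model discussed above; the escalation rule (each successive wrong confirmation of the same customer costing a fixed percentage more) is our reading of the "power increment" curves, and all quantities are illustrative.

```python
def total_cost(outcomes, prices, W, increment=0.0):
    """outcomes: per-transaction 'TP' | 'TN' | 'FN' | 'FP', in time order
    (positive = genuine, negative = fraudulent, as in Sec. 2.3).
    TN: fraud flagged, confirmation costs W.
    FN: genuine flagged, confirmation wasted; repeated wrong confirmations
        escalate by 'increment' as the customer loses patience.
    FP: fraud passed as genuine, costing the transaction price P(v_i)."""
    cost, wrong_confirm = 0.0, W
    for o, p in zip(outcomes, prices):
        if o == "TN":
            cost += W
        elif o == "FN":
            cost += wrong_confirm
            wrong_confirm *= 1.0 + increment   # escalating confirmation cost
        elif o == "FP":
            cost += p                          # undetected fraud
    return cost
```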

Fig. 6. The variation of total cost versus confirmation cost: (a) fixed cost, comparing BSVS_0 and BSVS_1; (b) varying cost, with a "Normal" (fixed-cost) line and curves for 2.5%, 5.0%, 7.5%, and 10.0% increments to the confirmation cost.


4. Conclusions

We have developed a new system, based on the concept of support vectors and the genetic algorithm, to increase the detection rate of credit card fraud under a personalized approach with a finite data set. It is generally difficult to obtain a high true negative (TN) rate when the ratio of negative samples to total samples is low; with the proposed system, a high TN rate can be obtained. Moreover, compared with the two previous approaches, over-sampling (replicating the data in the minority class) and adding-sampling (adding data to the minority class), BSVS is simpler and more effective at achieving a higher TN rate.

Acknowledgments

The authors wish to express their appreciation to Jeanne Chen at Hungkuang University and Jui-Chen Wu at Providence University for their help during the course of this work. This work was supported in part by the National Science Council under grant No. NSC 94-2213-E-025-010.

References

1. R. Brause, T. Langsdorf and M. Hepp, Neural data mining for credit card fraud detection, Proc. 11th IEEE Int. Conf. Tools with Artificial Intelligence (1999), pp. 103–106.
2. M. P. S. Brown, W. N. Grundy, D. Lin, N. Cristianini, C. W. Sugnet, T. S. Furey, M. Ares Jr. and D. Haussler, Knowledge-based analysis of microarray gene expression data by using support vector machines, Proc. Natl. Acad. Sci. USA 97 (2000) 262–267.
3. C. J. C. Burges, A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Discov. 2 (1998) 121–167.
4. P. K. Chan, W. Fan, A. L. Prodromidis and S. J. Stolfo, Distributed data mining in credit card fraud detection, IEEE Intell. Syst. (1999), pp. 67–74.
5. P. K. Chan and S. J. Stolfo, Toward scalable learning with nonuniform class and cost distributions: a case study in credit card fraud detection, Proc. 4th Int. Conf. Knowledge Discovery and Data Mining (Menlo Park, CA, 1997), pp. 164–168.
6. R. C. Chen, M. L. Chiu, Y. L. Huang and L. T. Chen, Detecting credit card fraud by using questionnaire-responded transaction model based on support vector machines, Lect. Notes Comput. Sci. 3177 (2004) 800–806.
7. R. C. Chen, T. S. Chen, Y. E. Chien and Y. R. Yang, Novel questionnaire-responded transaction approach with SVM for credit card fraud detection, Lect. Notes Comput. Sci. 3497 (2005) 916–921.
8. P. R. Harper, V. Senna, I. T. Vieira and A. K. Shahani, A genetic algorithm for the project assignment problem, Comput. Oper. Res. 32 (2005) 1255–1265.
9. S. J. Hong and S. Weiss, Advances in predictive models for data mining, Pattern Recogn. Lett. 22 (2001) 55–61.
10. D. S. Huang, Systematic Theory of Neural Networks for Pattern Recognition (Electronic Industry, Beijing, China, 1996).
11. D. S. Huang and S. D. Ma, Linear and nonlinear feedforward neural network classifiers: a comprehensive understanding, J. Intell. Syst. 9 (1999) 1–38.
12. D. S. Huang, Radial basis probabilistic neural networks: model and application, Int. J. Patt. Recogn. Artif. Intell. 13 (1999) 1083–1101.


13. F. S. Maes, K. Tuyls, B. Vanschoenwinkel and B. Manderick, Credit card fraud detection using Bayesian and neural networks, Proc. NEURO Fuzzy (Havana, Cuba, 2002).
14. T. M. Mitchell, Machine Learning (McGraw-Hill, 1997).
15. R. W. Parks, D. S. Levine and D. L. Long, Fundamentals of Neural Network Modeling: Neuropsychology and Cognitive Neuroscience (MIT Press, 1998).
16. S. Rüping, mySVM-Manual, AI Unit, University of Dortmund (2000).
17. L. K. Saul, Y. Weiss and L. Bottou (eds.), Advances in Neural Information Processing Systems 17: Proc. 2004 Conf. (MIT Press, Cambridge, MA).
18. B. Y. Sun, D. S. Huang and H. T. Fang, Lidar signal de-noising using least squares support vector machine, IEEE Signal Process. Lett. 12 (2005) 101–104.
19. V. N. Vapnik, The Nature of Statistical Learning Theory (Springer, 1995).
20. G. M. Weiss and F. Provost, The effect of class distribution on classifier learning, Technical Report, Department of Computer Science, Rutgers University (2001).
21. R. Yan, Y. Liu, R. Jin and A. Hauptmann, On predicting rare classes with ensembles in scene classification, Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing (April 2003), pp. 21–24.

Rong-Chang Chen is an Assistant Professor in the Department of Logistics Engineering and Management, National Taichung Institute of Technology. He received his doctorate from National Chiao Tung University, Hsinchu, Taiwan, in 1994, and has six years of working experience in management and R&D. Dr. Chen teaches supply chain management (SCM) and electronic commerce courses. His research has been published in Lecture Notes in Computer Science (LNCS), IEE Proc. Communications, Communications and Computer, IEEE Communications Letters, and elsewhere. His research interests are decision support systems (DSS), electronic commerce (EC), data mining, and financial engineering.


A New BSVS for Increasing Detection Rate of Credit Card Fraud Tung-Shou Chen received the B.S. and Ph.D. degrees from National Chiao Tung University in 1986 and 1992, respectively, both in computer science and information engineering. He served at the computer center, Chinese Army Infantry School, Taiwan, from 1992 to 1994. During the academic years 1994–97, he was with the Department of Information Management at National Chin-Yi Institute of Technology, Taichung, Taiwan. From August 1998 to July 2000, he was the Dean of Student Affairs and a Professor at the Department of Computer Science and Information Management at Providence University. Since August 2000, he has been a Professor in the Department of Information Management and the Graduate School of Computer Science and Information Technology at National Taichung Institute Technology, Taichung, Taiwan. His current research interests include data mining, machine learning, and data compression. His research has been published in IEEE Trans. Image, J. Electronic Imaging, Image Science J., J. Systems and Software, Lecture Notes in Computer Science (LNCS), Communications and Computer, and more.


Chih-Chiang Lin is a graduate student at the Graduate School of Computer Science and Information Technology, National Taichung Institute of Technology. He is currently working for Rising Star Technology Company. He is an experienced programmer with eight years of R&D experience in the footwear industry, covering pattern design and grading, advanced planning and scheduling, and software for garment design.
