2013 IEEE 13th International Conference on Data Mining Workshops
A Novel and Successful Credit Card Fraud Detection System Implemented in a Turkish Bank

Ekrem Duman
Industrial Engineering Department, Ozyegin University, Cekmekoy, Istanbul, Turkey
[email protected]

Ayse Buyukkaya
Analytical CRM Department, DenizBank, Istanbul, Turkey
[email protected]

Ilker Elikucuk
IT Data Mining Department, DenizBank, Istanbul, Turkey
[email protected]
Abstract— We developed a credit card fraud detection solution for a major bank in Turkey. The study was completed in about three years and the developed system has been in use since February 2013. It has had a great impact on the rule-based fraud detection process used by the bank: while eighty percent of the rules have been eliminated and the number of alerts has been cut in half, a significant increase in fraud detection has been recorded. As of now the system can catch ninety-seven percent of fraud attempts online or nearly online.
such as e-mail [5]. Fraudsters may also resort to generating credit card numbers using Bank Identification Numbers (BIN). A recent scheme, triangulation, takes fraud fighters several days to recognize and investigate [6]. In this method, the fraudster operates through an authentic-looking website, where he advertises and sells goods at highly discounted prices. The unaware buyer submits his card information and buys goods. The fraudster then places an order with a genuine merchant using the stolen card information. He then uses the stolen card to purchase other goods or route funds into untraceable accounts. It is only after several days that the merchant and card owners realize the fraud has occurred. The initial confusion this type of fraud causes provides camouflage for the fraudster to carry out his operations.
The study is interesting in both the formulation of the problem and the algorithms implemented. In fact, we noticed that standard classification algorithms are not fully suitable for the fraud detection problem (as the cost of each individual false negative can differ from the others), and we looked for alternative methods, especially meta-heuristics. Among them, the newly introduced migrating birds optimization (MBO) algorithm turned out to be superior and was implemented. In addition, during the study a cost-sensitive decision tree algorithm was developed and introduced to the literature.
Card frauds can be classified as offline and online. In online fraud, the transaction is made remotely and only the card's details are needed; a manual signature, a PIN or a card imprint is not required at the time of purchase. Though prevention mechanisms like CHIP&PIN reduce offline fraudulent activities such as simple theft, counterfeit cards and NRI, online frauds (internet and mail order frauds) are still increasing in both amount and number of transactions.
Keywords—fraud; meta-heuristics; credit cards; migrating birds optimization
I. INTRODUCTION
In the last two decades, there has been a significant growth in financial losses due to credit card fraud as the use of credit cards has become more and more common. While many papers have reported huge losses in different countries (see e.g. [1-6]), a considerable share is due to online fraud (e.g., according to Visa reports on European countries, about 50% of all credit card fraud losses in 2008 were due to online fraud [7]).
Credit card frauds not only result in financial losses but also pose reputational risks. Customers who have experienced a fraud case tell their stories to many others and, except in simple theft cases, they hold the bank responsible. Accordingly, banks are expected to take the necessary precautions to prevent fraud. In this regard, if customers observe security measures taken by their bank, such as receiving an SMS when they make a transaction in an unusual place, they may appreciate the effort. However, if such precautions are exaggerated, customers start feeling uncomfortable and may switch to competitors.
Credit card frauds can be perpetrated in many different ways such as simple theft, application fraud, counterfeit cards, never received issue (NRI), replication of card information through skimmers or gathering sensitive information through phishing (cloned websites) or from unethical employees of credit card companies.
Credit card fraud detection is a popular but extremely difficult problem because (1) only a limited amount of data comes with the transaction being committed (such as transaction amount, date and time, address, Merchant Category Code (MCC) and acquirer number of the merchant), (2) there are millions of possible places and e-commerce sites
Phishing involves acquiring sensitive information like card numbers and passwords by masquerading as a trustworthy person or business in an electronic communication
978-0-7695-5109-8/13 $31.00 © 2013 IEEE DOI 10.1109/ICDMW.2013.168
where there is a high imbalance between the classes, the cost of mislabeling a minor-class record (FN) can be larger than the cost of mislabeling a major-class record (FP). However, this does not cover cases where the costs of mislabeling records vary individually (i.e., change from record to record).
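As a rough illustration of the constant-multiple cost setting described above, the sketch below scores predictions with a false-negative cost taken as a fixed multiple of the false-positive cost. The function name, the multiple of 10 and the label encoding are our own illustrative assumptions, not the bank's actual figures.

```python
# Sketch: total misclassification cost when a false negative (missed fraud)
# is charged a constant multiple of a false positive's cost.
# The names and the weight value are illustrative assumptions.

def weighted_cost(y_true, y_pred, fn_multiple=10.0, fp_cost=1.0):
    """Cost = fp_cost * (#FP) + fn_multiple * fp_cost * (#FN)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp_cost * fp + fn_multiple * fp_cost * fn

# Example with one false positive and one false negative:
y_true = [1, 0, 0, 1]
y_pred = [0, 1, 0, 1]
print(weighted_cost(y_true, y_pred))  # 1*1 + 10*1 = 11.0
```

Note that this still treats every false negative identically, which is exactly the limitation the text points out for record-varying costs.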
to use a credit card, which makes it extremely difficult to match a pattern, and (3) there may be past transactions made by fraudsters that also fit a pattern of normal (legitimate) behavior [8]. In addition, the problem comes with many constraints. First of all, the profiles of normal and fraudulent behavior change constantly. Furthermore, the development of new fraud detection methods is made more difficult by the fact that the exchange of ideas in fraud detection, especially in credit card fraud detection, is severely limited by security and privacy concerns. Moreover, data sets are not made available and the results are often censored, making them difficult to assess; consequently, there is no way to benchmark the models built. While some studies use synthetically generated data [9-10], none of the previous studies with real data in the literature give details about the data and the variables used in their classifier models. In addition, credit card fraud data sets are highly skewed, with usually a ratio of about 20,000 to 30,000 legitimate transactions to one fraudulent transaction. Lastly, the data sets are constantly evolving, so the profiles of normal and fraudulent behavior are always changing [1-4].
Most past studies work with constant misclassification cost matrices or cost matrices composed of a number of constant heterogeneous misclassification costs. However, a fraudulent transaction with a larger transaction amount or a larger available card limit, for example, should be more important to detect than one with a smaller amount or available limit. A constant cost matrix, or a combination of constant cost matrices, cannot capture this. Accordingly, such cases should be taken into account when working on classification problems with variable misclassification costs. This is one deficiency of the known solution methods, and as will be detailed below, we developed a better solution method in this project. In an earlier study for another bank in Turkey we implemented a hybrid algorithm based on a novel combination of genetic and scatter search algorithms [26], and during this study a cost-sensitive decision tree was developed as a PhD thesis and introduced to the literature [27]. Although the decision tree algorithm was successful in detecting frauds, especially when the number of alerts is large, it had problems when very small numbers of alerts are required. This is because decision trees produce few distinct fraud scores (determined by the number of leaf nodes), so there can be many transactions with the same (highest) fraud score, and the tree cannot distinguish which of these are more suspicious when fewer alerts are desired. Thus, we returned to meta-heuristic algorithms and found that the migrating birds optimization (MBO) algorithm [28] performed best among our alternatives, including the classical DM algorithms, the hybridized genetic algorithm and scatter search, and the artificial bee colony (ABC) algorithm [29]. MBO is quite a new algorithm and was previously implemented for the solution of a combinatorial optimization problem [28].
Techniques used in fraud detection can be divided into two groups: supervised techniques, where past known legitimate/fraud cases are used to build a model that produces a suspicion score for new transactions; and unsupervised techniques, where there are no prior sets in which the states of the transactions are known to be fraudulent or legitimate. A brief discussion of supervised and unsupervised techniques can be found in [2]. Many of these techniques, like artificial neural networks, can be used in either manner. There is a large body of literature on credit card fraud detection. The general background of credit card systems and non-technical knowledge about this type of fraud can be found in [11] and [12]. The most commonly used fraud detection methods are rule-induction techniques, decision trees, neural networks, support vector machines (SVM), logistic regression, and meta-heuristics such as genetic algorithms. These techniques can be used alone or in combination, using ensemble or meta-learning techniques to build classifiers. Most credit card fraud detection systems are based on supervised algorithms such as neural networks [6, 9-10, 13-19]. Among the well-known decision tree techniques, ID3, C4.5 and C&RT are used for credit card fraud detection in [20-23]. SVM is used for the same purpose in [24-25].
The rest of this paper is organized as follows: Section II details the problem definition and explains the previous solution used in our partner bank. Section III summarizes the main tasks accomplished, together with some notes from the story of the project. A brief description of the implemented MBO algorithm is given in Section IV. The implementation phase is discussed in Section V, which depicts how the existing and new systems were merged and what benefits were obtained. The paper is summarized and concluded in Section VI.
Classification problems, and fraud detection problems in particular, have traditionally been tackled by data mining (DM) algorithms. In traditional DM settings, the costs of both types of errors are taken to be the same. However, in some software packages the algorithms are implemented so that the cost of one type of error can be taken as a constant multiple of the cost of the other. This facility is useful because, especially
II. DESCRIPTION OF THE CASE
In our partner bank, a rule based fraud detection system was in place. These rules were generated from a list provided by the credit cards software vendor (who has many
transactions, MCC statistics, daily amount statistics, channel usage statistics, and daily number of transactions statistics.
international customers) and from the bank's own experience in its Customer Security and Fraud Department. The rules were basically of two types: online and offline. Online rules produce alerts to be inspected by inspection officers who work around the clock, seven days a week. Suspicious transactions appear on their screens as a list, with the most suspicious ones at the top. The inspector can access the other details and previous transactions of the cardholder and then either decide that the transaction is legitimate or check it with the customer (by sending an SMS or by calling). Offline rules, also known as hidden rules, automatically send SMSs to customers with a message of the form "such and such transaction has been made with your card; if it was made without your consent, please call your bank." Most fraud cases are identified when customers reply to these SMSs. For offline rules, and also for most online rules, the bank first authorizes the transaction; then, if a fraud is caught, the card is blocked to prevent any further fraud.
The existing fraud detection solution was quite successful compared to other banks in Turkey (this we know from our experience with other banks). We would like to note that Turkey is well known in Europe for its advanced banking IT infrastructures and software systems (as was stated many times during the bank acquisitions and mergers of the last decade). The fraud solution was composed of expert rules based on the experiences of two credit card software vendors and on the bank's own experience. With these rules, groups of transactions that are more likely to include suspicious ones are determined (according to expert opinion or information obtained from the official organization, the Interbank Card Center). For example, since internet transactions are statistically (and logically) riskier, the bank was sending SMSs for all internet transactions. Later, in an improvement study, it wanted to reduce the number of SMSs: although one individual SMS is very cheap (about two dollar cents), millions of SMSs add up to considerable amounts. In this regard, the bank decided not to send SMSs for transactions of less than 50 dollars.
The partner bank in this project had been using data analysis/mining solutions for some years, and a lot of historical information regarding customers and their credit card usage had been collected. Unfortunately, however, these data marts had been developed mostly for marketing and risk purposes, and they provided only limited information for fraud detection. We thus had to invent and introduce new variables that would be directly beneficial for the fraud detection problem.
From our perspective such a reduction in SMSs can be quite risky in that (1) the real risk the bank faces is the remaining available limit on the card rather than the amount of the current transaction, and (2) fraudsters sometimes test counterfeit cards with smaller transactions and, if everything goes through, continue with larger ones. Thus, one needs to track the remaining card limits to prevent as much risk as possible.
The number of transactions differs from one card to another; however, each transaction record has the same fixed length and includes the same fields. These fields include transaction date and hour, transaction amount, transaction type, transaction channel, MCC code, merchant address, and the currency used in the transaction [15]. The transaction type shows whether the transaction is a purchase or a cash advance, the transaction channel depicts the medium of the transaction (POS, web, ATM, etc.), and the MCC code identifies the type of merchant store where the transaction takes place. Each merchant type is assigned a specific four-digit number; for example, all pharmacies share one common MCC code. Larger merchants can sometimes be assigned a unique code: while all small hospitals are assigned the same code, a large and well-known hospital can have its own.
We now state our problem definition more explicitly. As in [26] we define:

TP = the number of correctly classified frauds
TN = the number of correctly classified legitimates
FP = the number of false fraud flags
FN = the number of transactions classified as legitimate but are in fact fraudulent
c = the cost of monitoring an alert
As we anticipated, the information associated with a transaction is quite limited, and thus more information (variables) is needed to build a better model. As such, the variables that form the card usage profile and the methods used to build the model are what differentiate fraud detection systems. Our aim in defining the variables used to form the data mart is to discriminate the profile of fraudulent card usage by fraudsters from the profile of legitimate (normal) card usage by the actual card holders. Each variable belongs to one of six main types: statistics on all transactions of cardholders, statistics on regions of
TFL = the total amount of losses due to fraudulent transactions
S = savings in TFL with the use of the fraud detection system
ρ = the savings ratio

More precisely, we set:

TFL = sum of the available limits of the cards whose
Many of the MCC codes form natural groups. So, instead of working with hundreds of codes, we collected them into 25 groups according to their nature and how readily they lend themselves to fraud. The goods or services bought from merchant stores in some MCC codes, such as jewelry stores or stores selling prepaid minutes for cellular phones, can easily be converted to cash. As a result, transactions belonging to these MCC codes are more open to fraud, and riskier, than others. The grouping of the MCC codes was done according to both the number of fraudulent transactions belonging to each MCC code and interviews with bank personnel with domain expertise on the subject.
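A grouping of this kind can be held in a simple lookup table. The sketch below is a minimal illustration only; the specific MCC codes, the group labels and the fallback group are our own assumptions, not the bank's actual 25-group scheme.

```python
# Sketch: collapsing many raw MCC codes into a small number of risk
# groups via a lookup table. Codes and labels here are illustrative.

MCC_GROUP = {
    "5944": "cash-like",   # jewelry stores (easily converted to cash)
    "4814": "cash-like",   # prepaid phone minutes
    "5912": "retail",      # pharmacies
    "8062": "services",    # hospitals
}

def mcc_group(mcc_code, default="other"):
    """Map a raw four-digit MCC code to its risk group."""
    return MCC_GROUP.get(mcc_code, default)

print(mcc_group("5944"))  # cash-like
print(mcc_group("9999"))  # other
```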
transactions are labeled as TP or FN
c = the cost of monitoring an alert, including staff wages, SMSs and phone calls (on average a small figure, usually less than a dollar)
S = (sum of the available limits of the cards of TP transactions) − c(FP + TP)
ρ = S / TFL

The maximum value S can take is TFL, and the maximum value ρ can take is 1. Clearly, a good predictor is one with a high ρ.
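A minimal sketch of computing S and ρ from these definitions follows. The record layout (fraud flag, alert flag, available limit) and the monitoring cost value are illustrative assumptions, not the bank's data format.

```python
# Sketch of the savings ratio defined in the text:
#   TFL = sum of available limits of cards labeled TP or FN
#   S   = (sum of available limits of TP cards) - c * (FP + TP)
#   rho = S / TFL
# The (is_fraud, flagged, available_limit) layout is an assumption.

def savings_ratio(records, c=0.5):
    """records: iterable of (is_fraud, flagged, available_limit) tuples."""
    tp_limits = sum(lim for f, a, lim in records if f and a)
    tfl = sum(lim for f, _, lim in records if f)          # TP + FN cards
    fp = sum(1 for f, a, _ in records if not f and a)
    tp = sum(1 for f, a, _ in records if f and a)
    s = tp_limits - c * (fp + tp)
    return s / tfl if tfl else 0.0

records = [
    (True,  True,  1000.0),  # caught fraud (TP)
    (True,  False,  500.0),  # missed fraud (FN)
    (False, True,     0.0),  # false alert (FP)
]
print(savings_ratio(records))  # (1000 - 0.5*2) / 1500 = 0.666
```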
Though some customers may have more than one credit card, each card is taken as a unique profile, because customers with more than one card generally use each card with a different profile and purpose. Turkey has passed through a very long high-inflation period, during which people developed the habit of holding and using two credit cards (typically from two different banks) with different statement and payment due dates, for example one with a mid-month and the other with an end-of-month payment date. Each time they needed to use a credit card, they preferred the one with the later payment date. Because of this, and also since the ratio of customers holding two cards in our bank is quite small, we contented ourselves with working at the card level only. In our data mart every card profile consists of a set of variables, each of which discloses a behavioral characteristic of the card usage.
Another study that indicated the need for an alternative performance metric in credit card fraud detection is due to Hand et al. [30]. In that study, the authors define two measures, one based on minimizing the overall cost to the card company, and the other based on minimizing the amount of fraud given the maximum number of investigations the card company can afford to make.

III. FLOW OF THE PROJECT
A project team of about 10 people was formed, including members of the Analytical CRM Department (developer of most customer analytics projects), the data mining section of IT, the Customer Security and Risk Department, and a faculty member and a PhD student. This team held weekly project meetings throughout the three-year course of the study. The work can be categorized as Data Preparation, Data Analysis, Modeling and Testing, and Implementation. Some notes on the first three categories are given next; notes on implementation are discussed in Section V.
The past data in the credit card data warehouse are used to form a data mart representing the card usage profiles of the customers. The data in the data mart is used to form the training set of the modeling phase and the test set for the trained models.
A. Data Preparation

The amount, the transaction channel and the MCC code are the most important pieces of information carried by a transaction. Using these data we produced 28 different variables that can indicate whether an incoming transaction is in compliance with the characteristics of the customer (some of the variables are duplicated to show the figures at different time intervals, such as the average card usage at night, in the morning hours, in the afternoon and so on; this way the number of variables in the data mart reached into the hundreds). These variables are derived not only from the transaction itself but also from the past transaction history of the card. They may show the spending habits of the customers with respect to geographical locations, days or weeks of the month, hours of the day, channel usage or MCC codes. Later on, these variables are used to build models for the fraud detection systems to identify fraudulent activities that display significant deviations from the normal card usage profile stored in the data mart.
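One plausible shape for such a derived variable is a deviation score of the incoming amount against the card's past spending in the same time-of-day slot. This is only a sketch under our own assumptions; the field names, slot boundaries and z-score form are not from the paper.

```python
# Sketch: a "deviation from profile" feature comparing an incoming
# transaction amount to the card's history in the same time-of-day slot.
# All names and thresholds here are illustrative assumptions.
from statistics import mean, pstdev

def time_slot(hour):
    """Coarse time-of-day slot: night/morning/afternoon/evening."""
    return ("night", "morning", "afternoon", "evening")[hour // 6]

def amount_deviation(history, amount, hour):
    """z-score of the amount vs. the card's past spend in the same slot."""
    slot = time_slot(hour)
    past = [a for a, h in history if time_slot(h) == slot]
    if len(past) < 2:
        return 0.0  # not enough history to profile this slot
    mu, sigma = mean(past), pstdev(past)
    return (amount - mu) / sigma if sigma else 0.0

history = [(20.0, 10), (30.0, 11), (25.0, 9)]   # (amount, hour) pairs
print(amount_deviation(history, 500.0, 10))     # large positive deviation
```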
B. Data Analysis

The original data for the time period used to form the training set contained about 22 million transactions. The distribution of this data with respect to being normal or fraudulent is highly skewed: the period used to build our sample included 978 fraudulent records and 22 million normal ones, a ratio of about 1:22,500. So, to enable the models to learn both types of profiles, some under-sampling or oversampling technique needed to be used. Instead of oversampling the fraudulent records by making multiple copies, we used stratified sampling to under-sample the legitimate records to a meaningful number. First, we identified the variables that show the most different distributions with respect to being fraudulent or normal. Then, we used these variables as the key variables in stratified sampling so that the characteristics of their distributions with respect to being fraudulent or not remained the same. For stratified sampling, we used the top five such variables to form a stratified sample with a ratio of one, nine or ninety-nine legitimate
transactions to one fraudulent transaction (see the next section). The stratification was done using the stratified sampling node in IBM SPSS Modeler. Before moving on to the modeling phase, we conducted a univariate data analysis in which the correlation between the target (is the transaction fraudulent or not?) and the input variables was investigated; the variables with a high correlation were taken as inputs to the fraud prediction models.

C. Modeling and Testing

We first tried to find the best composition (balance of positive (fraud) and negative (legitimate) transactions) of the training set. In the literature a 1-to-1 balance is preferred for most data mining applications. However, as credit card data is highly skewed, we believed a higher ratio of negatives could be more helpful. For this purpose, we tried 1-to-1, 1-to-9 and 1-to-99 sets (i.e., half, 10% and 1% of the samples are frauds). For each sample, 70% of the data (both 70% of the normal transactions and 70% of the fraudulent transactions) was taken as the training set of the models, and 30% as the test set for evaluating the performance of the models developed. Each training set had 685 fraudulent records (978 * 0.70), the same records in all training sets, and each test set had 293 (978 * 0.30) fraudulent records, the same records in all test sets.

Table 1. Results of Algorithms
Algorithm      TPR    Saved Limit Ratio (ρ)
NN             91.74  89.43
Classical DT   90.91  86.44
SVM            83.06  78.27
Novel DT       90.78  94.15
GASS           82.78  91.45
ABC            90.7   92.5
MBO            91.45  93.9
We compared a number of classical DM algorithms, including four types of decision trees (C5.0, C&RT, QUEST, CHAID), six types of neural networks (as provided by IBM Modeler as a package), four types of support vector machines and logistic regression, with four cost-sensitive methods developed by ourselves: a novel decision tree (NDT), GASS (a combination of genetic algorithms and scatter search), the ABC (artificial bee colony) algorithm and the MBO (migrating birds optimization) algorithm. Results of the best algorithm of each type are shown in Table 1, where out-of-time sample results are tabulated at the 8% cut-off. Most algorithms performed more or less comparably in terms of TPR, but the cost-sensitive methods performed better in terms of ρ. In the top 8%, MBO and NDT were best, followed by ABC and GASS. However, at the lower cut-offs the performance of NDT decreased, leaving second place to GASS. Consequently, MBO was chosen for implementation in this project. Admittedly, the superiority of MBO over its closest followers GASS and ABC was not strong enough to be verified via t-tests. However, as we needed to choose one of them, we preferred MBO since it obtained the best results when all 22 million transactions were scored. In the next section the basic ideas of MBO are explained.
Based on the results on the test sets we concluded that the 1-to-9 set had the best performance (taking the performance comparisons on the 1-to-99 set as the reference, as it is closest to the real distribution of frauds and legitimates), and we decided to continue with it. Later, we decided to change the test set from the "out-of-sample" test set described above to an "out-of-time" test set. For this purpose, we took the set of all transactions in the following couple of months, in which 484 fraud cases had been identified. We scored all transactions, sorted them in non-increasing order of fraud score, and checked the performance of the algorithms at the top 8%, 4%, 2% and 1% of the list, as if that many alerts had been produced. The 8% cut-off corresponded to the number of SMSs sent by the existing system due to the offline rules, and 1% corresponded to the number of transactions appearing on the inspectors' screens.
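The cut-off evaluation just described can be sketched as follows. The record layout and the toy scores are illustrative assumptions; the function simply ranks by score and measures the fraction of frauds caught in the top p% of the list.

```python
# Sketch of the out-of-time evaluation: sort transactions by fraud score
# (non-increasing) and measure the true positive rate within the top p%.
# The (score, is_fraud) layout is an illustrative assumption.

def tpr_at_cutoff(scored, cutoff):
    """scored: list of (score, is_fraud); cutoff: fraction, e.g. 0.08."""
    ranked = sorted(scored, key=lambda r: r[0], reverse=True)
    k = max(1, int(len(ranked) * cutoff))
    caught = sum(1 for _, f in ranked[:k] if f)
    total = sum(1 for _, f in scored if f)
    return caught / total if total else 0.0

scored = [(0.9, True), (0.8, False), (0.7, True), (0.2, False),
          (0.1, True)] + [(0.05, False)] * 95
print(tpr_at_cutoff(scored, 0.02))  # top 2 records catch 1 of 3 frauds
```

The same ranking could equally be evaluated with the savings ratio instead of TPR by summing available limits rather than counting frauds.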
IV. MBO ALGORITHM
Birds spend energy for two purposes while flying: to stay in the air (induced power) and to move forward (profile power). The induced power requirement can be reduced if more than one bird flies together. As a bird moves through the air, air flows above and below its wings. The air flowing over the upper surface has to travel farther than the air under the lower surface (because the upper surface is larger), and this causes the air on the upper side to have a lower pressure than the air moving along the lower side (Fig. 1). This pressure difference is what makes lift by the wing possible.
The performance criterion we used for evaluating the algorithms is the savings ratio ρ described above. However, we also tabulated the TPR (true positive rate) values, which show the fraction of all fraud cases identified. Note that a classical DM algorithm tries to maximize TPR, so here we wanted to see how our cost-sensitive algorithms perform in that sense as well. As a quick observation we can say that at the top 8% cut-off, while the TPR and ρ values are about the same for the classical DM algorithms, for our cost-sensitive methods the ρ values were 5 to 7 percent higher than the TPR values.
The regions of upwash may contribute to the lift of a following bird (see Fig. 2), thus reducing its requirement for
166
induced power [31]. This explains why birds, especially migrating birds, which have to fly long distances, fly together in specific formations.
(corresponding to the leader bird) and progressing along the lines towards the tails, an attempt is made to improve each solution using its neighbor solutions (in the implementation for the QAP (quadratic assignment problem), a neighbor solution is obtained by a pairwise exchange of any two locations). If the best neighbor solution brings an improvement, the current solution is replaced by it. There is also a benefit mechanism by which the solutions (birds) profit from the solutions in front of them. In the first implementation of MBO [28] this benefit mechanism is defined as sharing the best unused neighbors with the solutions that follow (here "unused" means a neighbor solution that was not used to replace the existing solution). In other words, a solution evaluates a number of its own neighbors together with a number of the best neighbors of the previous solution, and is replaced by the best of them if that is an improvement. Later in this study we tried another benefit mechanism, in which some parts of the solutions are shared with the following solutions. Once all solutions have been improved (or attempts have been made to improve them) using neighbor solutions, this procedure is repeated a number of times (tours), after which the first solution becomes the last, one of the second solutions becomes the leader, and another loop starts. The algorithm is stopped after a specified number of iterations.
The V formation is the most famous formation that migrating birds use to fly long distances. As can easily be noticed, in this formation the birds align themselves with the upwash regions.

Fig. 1. Wing of a bird, with the air flow above and below it (taken from Duman et al. [28]).
The pioneering study that gave a mathematical explanation of the energy savings in the V formation is that of Lissaman and Schollenberger [31]. That study stated that as birds approach each other (a smaller wing tip spacing, WTS) and as the number of birds increases, more energy is saved; a group of 25 birds, for example, would have approximately 71 percent more flight range than a single bird. These results were obtained from aerodynamic theory in which birds the size of a plane were assumed, and only positive values of WTS were considered (i.e., overlapping was not considered). Later it was shown that some overlap is useful [32]:
WTS_opt = −0.05b

where b is the wing span.

Below, first the notation used and then the pseudo-code of the MBO algorithm as used in [28] are given. Let,

n = the number of initial solutions (birds)
k = the number of neighbor solutions to be considered
x = the number of neighbor solutions to be shared with the next solution
m = the number of tours
K = the iteration limit
Fig. 2. Rear view of a flying bird and its tip trailing vortices.
Pseudocode of MBO:

1. Generate n initial solutions in a random manner and place them on a hypothetical V formation arbitrarily.
2. i = 0
3. while (i < K)
4.    for (j = 0; j < m; j++)
5.       Try to improve the leading solution by evaluating its k neighbors (i = i + k).
6.       Try to improve each remaining solution, from front to back, by evaluating (k − x) of its own neighbors together with the x best unused neighbors passed from the solution in front (i = i + (k − x)).
7.    Move the leading solution to the end of a line and one of the solutions immediately behind it to the leader position.
8. Return the best solution in the flock.
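For concreteness, a simplified, generic MBO is sketched below in runnable form, following the mechanism described in the text: a flock of solutions, a pairwise-swap neighborhood, and the sharing of best unused neighbors down the lines. All parameter names, the toy objective and the details of the sharing are our own choices for illustration; this is not the authors' production implementation.

```python
# Sketch of a simplified migrating birds optimization (MBO) for a
# permutation problem. Parameter names and the toy objective are
# illustrative assumptions.
import random

def mbo(objective, dim, n=5, k=3, x=1, m=2, K=500, seed=0):
    """Minimize objective(permutation). n birds, k neighbors for the
    leader, x best unused neighbors shared with the next solution,
    m tours per leader change, K total neighbor evaluations."""
    rng = random.Random(seed)

    def neighbor(sol):
        # pairwise exchange of two positions, as in the QAP implementation
        a, b = rng.sample(range(dim), 2)
        s = list(sol)
        s[a], s[b] = s[b], s[a]
        return s

    flock = [rng.sample(range(dim), dim) for _ in range(n)]
    i = 0
    while i < K:
        for _ in range(m):
            shared = []  # best unused neighbors passed down the line
            for j in range(n):
                own = k if j == 0 else k - x
                cand = [neighbor(flock[j]) for _ in range(own)] + shared
                cand.sort(key=objective)
                if objective(cand[0]) < objective(flock[j]):
                    flock[j] = cand[0]
                    cand = cand[1:]       # the used neighbor is not shared
                shared = cand[:x]
                i += own
        flock.append(flock.pop(0))        # leader moves to the tail
    return min(flock, key=objective)

# Toy objective: total displacement of a permutation from identity order.
best = mbo(lambda s: sum(abs(v - i) for i, v in enumerate(s)), dim=6)
print(best)  # a permutation with low displacement
```

In the fraud setting the "solution" would instead encode model parameters and the objective would be the savings ratio ρ, but the flock mechanics are the same.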