credit card fraud detection based on behavior mining

CREDIT CARD FRAUD DETECTION BASED ON BEHAVIOR MINING

NIMISHA PHILIP1, SHERLY K.K2
Department of Computer Science & Engineering, Toc H Institute of Science & Technology, email: [email protected]
Department of Information Technology, Toc H Institute of Science & Technology, Ernakulam, Kerala, India, [email protected]

Abstract: Globalization and the increased use of the Internet for online shopping have resulted in a considerable proliferation of credit card transactions throughout the world. The higher acceptability and convenience of credit cards for purchases have not only given personal comfort to customers but also attracted a large number of attackers. As a result, credit card payment systems must be supported by an efficient fraud detection capability to minimize unwanted activities by adversaries. Most of the well-known algorithms for fraud detection are based on supervised training. Every cardholder has a certain shopping behavior, which establishes an activity profile for him. Existing fraud detection systems (FDS) try to capture behavioral patterns as static rules, which become ineffective when the cardholder develops new patterns of behavior. Here, we propose an unsupervised method to dynamically profile the behavior pattern of a customer. Incoming transactions are then compared against the user profile to indicate anomalies, based on which the corresponding warnings are output. An FP-tree based pattern matching algorithm is used to evaluate how unusual the new transactions are.

Key words - Fraud detection, adaptive profiling, FP tree, Behavior mining

Fraud detection based on the analysis of a cardholder's existing purchase data is a promising way to reduce the rate of successful credit card fraud. Deviation from the cardholder's usual spending patterns is a potential threat to the system.

I. INTRODUCTION
Due to rapid advancement in electronic commerce technology, the use of credit cards has dramatically increased. As the credit card becomes the most popular mode of payment for both online and regular purchases, cases of fraud associated with it are also rising. A few of the many ways in which money can be stolen from a credit card are phishing, pharming, skimming and dumpster diving. Hackers and fraudsters are becoming more sophisticated and skillful at manipulating Internet protocols, web languages and tools to discover any weakness that they can exploit. As a result, Internet transaction fraud is 12 times higher than in-store fraud.

II. PROBLEMS WITH CREDIT CARD FRAUD DETECTION
One of the biggest problems associated with credit card fraud detection is the lack of both literature providing experimental results and real-world data for researchers to perform experiments on. This is because fraud detection is often associated with sensitive financial data that is kept confidential for reasons of customer privacy.

Credit-card-based purchases can be categorized into two types: 1) physical card and 2) virtual card. In a physical-card-based purchase, the cardholder presents his card physically to a merchant to make a payment. To carry out fraudulent transactions in this kind of purchase, an attacker has to steal the credit card. If the cardholder does not realize the loss of the card, it can lead to a substantial financial loss to the credit card company. In the second kind of purchase, only some important information about a card (card number, expiration date, secure code) is required to make the payment. Such purchases are normally made on the Internet or over the telephone. To commit fraud in these types of purchases, a fraudster simply needs to know the card details. Most of the time, the genuine cardholder is not aware that someone else has seen or stolen his card information. The only way to detect this kind of fraud is to analyze the spending patterns on every card and to identify any inconsistency with respect to the “usual” spending patterns.

Some of the properties a fraud detection system should have in order to produce good results:
• The system should be able to handle skewed distributions.
• The ability to handle noise.
• The ability to handle overlapping data.
• The system should be able to adapt itself to new kinds of fraud.
• There is a need for good metrics to evaluate the classifier system.
• The system should take into account the cost of the fraudulent behavior detected and the cost associated with stopping it.


TIST.Int.J.Sci.Tech.Res.,Vol.1 (2012), 7-12

III. RELATED WORK
Most previous work on fraud detection or anomaly detection was classification based. Well-known algorithms used for fraud detection are NB (naïve Bayesian), proposed by Elkan et al. [10], C4.5, by Quinlan [16], and BP (back-propagation) [2]. The research of Domingos [8], Elkan [9][10] and Witten [8] indicates that the NB algorithm is very effective on many real-world data sets and is extremely efficient in that it learns directly from the data being modeled in a linear fashion. However, when attributes are redundant and not normally distributed, its predictive accuracy is reduced. C4.5 can not only output accurate predictions but also explain the patterns behind them as a decision tree and rule set. However, scalability and efficiency problems, such as a substantial decrease in performance, can occur when C4.5 is applied to large data sets. Back-propagation neural networks can process a very large number of instances and have a high tolerance to noisy data, but the BP algorithm requires long training times and extensive testing and retraining of parameters. The common disadvantage of all these algorithms is that they rely on supervised training, which requires human involvement to prepare training cases and test cases and to optimize parameters. To solve this problem, a fraud detection system is proposed that detects fraudulent transactions in an online system and dynamically profiles the user's behavior pattern. Since most online systems contain non-stationary data, the system is able to adjust its detector to keep up with changes in user behavior.

IV. BASIC IDEA
The association rule was first introduced by Agrawal et al. [1][17]. The following is a formal statement of the problem: Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of literals, called items. Let $D$ be a set of transactions, where each transaction $T$ is a set of items such that $T \subseteq I$. Associated with each transaction is a unique identifier, called its TID. A transaction $T$ contains $X$, a set of some items in $I$, if $X \subseteq T$. An association rule is an implication of the form $X \Rightarrow Y$, where $X \subseteq I$, $Y \subseteq I$ and $X \cap Y = \emptyset$. The rule $X \Rightarrow Y$ holds in the transaction set $D$ with confidence $c$ if $c\%$ of the transactions in $D$ that contain $X$ also contain $Y$. The rule $X \Rightarrow Y$ has support $s$ in the transaction set $D$ if $s\%$ of the transactions in $D$ contain $X \cup Y$.

Two important measures for association rules, support and confidence, are defined as follows. Support is the statistical significance of an association rule; a high support is often desirable. The confidence of a rule indicates the degree of correlation in the data set between $X$ and $Y$ and is used as a measure of the rule's strength. Often a large confidence is required for association rules. An example of an association rule:

day(“Saturday”) $\wedge$ time(“8pm-10pm”) $\Rightarrow$ play(“Xbox contest”) [support = 19%, confidence = 70%].

The rule indicates that, of all the transactions of a customer recorded in a time window, 19% (the support) are plays of “Xbox contest” on “Saturday” during “8pm-10pm”. There is a 70% probability (the confidence) that a transaction occurring on “Saturday” during “8pm-10pm” is an Xbox contest.

Brin et al. [4] introduced a dynamic itemset counting technique to reduce the number of database scans. Ozden et al. [15] presented cyclic and interesting association rule mining. Ng et al. [14] introduced a constraint-based rule mining technique. Cheung et al. [6] presented an incremental updating technique to discover association rules in a large-scale database. Some popular rule generators, for example RL proposed by Clearwater [7] and C4.5 by Quinlan [16], are based on supervised learning. An FP-tree (frequent pattern tree) structure and the FP-tree growth algorithm, proposed by Han [12], are utilized to uncover these hidden association rules from the recent transactions for this user.

V. FRAUD DETECTION SYSTEM
An online transaction system includes several web applications and services that provide OLTP (OnLine Transaction Processing), an FDS (Fraud Detection System), a database storing transaction data, and a database replica, so that back-end data processing or analysis degrades OLTP performance as little as possible. The FDS is a back-end process whose impact on the front end of the online system is minimized, since it only talks to the replication database.

FIG 1. ARCHITECTURE OF FDS

The Fraud Detection System consists of three major modules:
(1) The data engine serves as an interface between the replication database and the FDS. It collects and pre-formats the recent transactions of all individual customers in the online system.
(2) The rule engine mines the recent transactions to generate a profile, an association rule set stored in an FP-tree, for each user.
(3) The rule monitor monitors the new transactions of every user. Any new transaction of a particular user is compared against the FP-tree for that user to indicate the anomaly.
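The support and confidence measures defined in Section IV can be made concrete with a small sketch. The toy transactions and the attribute=value item strings below are hypothetical and are not data from the paper; the functions only illustrate the two definitions.

```python
from typing import List, Set


def support_count(transactions: List[Set[str]], itemset: Set[str]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(transactions: List[Set[str]], x: Set[str], y: Set[str]) -> float:
    """Fraction of all transactions that contain X union Y."""
    if not transactions:
        return 0.0
    return support_count(transactions, x | y) / len(transactions)


def confidence(transactions: List[Set[str]], x: Set[str], y: Set[str]) -> float:
    """Fraction of the transactions containing X that also contain Y."""
    x_count = support_count(transactions, x)
    if x_count == 0:
        return 0.0
    return support_count(transactions, x | y) / x_count


# Hypothetical toy data: five transactions of one customer.
DEMO = [
    {"day=Saturday", "time=8pm-10pm", "play=Xbox contest"},
    {"day=Saturday", "time=8pm-10pm", "play=Xbox contest"},
    {"day=Saturday", "time=8pm-10pm", "play=Chess"},
    {"day=Sunday", "time=8pm-10pm", "play=Xbox contest"},
    {"day=Monday", "time=8am-10am", "play=Chess"},
]
X = {"day=Saturday", "time=8pm-10pm"}
Y = {"play=Xbox contest"}

if __name__ == "__main__":
    print("support:", support(DEMO, X, Y))
    print("confidence:", confidence(DEMO, X, Y))
```

In this toy set, two of the five transactions contain both sides of the rule and three contain the left-hand side, so the support is 2/5 and the confidence is 2/3.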


5. Until no change

A. Data Engine
The data engine collects and pre-formats the most recent transactions of an individual user from a transaction database; these are analyzed to profile his current behavior. “Recent” is defined by a sliding window, which can be a time window or a transaction-count window. For example, the recent transactions could be all the transactions in the past two months, or the most recent 500 transactions.
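As a minimal sketch of such a sliding window, the function below keeps either the transactions newer than a given age, the last N transactions, or both. The `Txn` record and the parameter names `max_age` and `max_count` are illustrative assumptions, not part of the described system.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional


class Txn(NamedTuple):
    """Hypothetical minimal transaction record."""
    tid: int
    timestamp: datetime


def recent_transactions(
    txns: List[Txn],
    now: datetime,
    max_age: Optional[timedelta] = None,
    max_count: Optional[int] = None,
) -> List[Txn]:
    """Return the transactions inside the sliding window, oldest first."""
    ordered = sorted(txns, key=lambda t: t.timestamp)
    if max_age is not None:
        # Time window: keep transactions no older than `max_age`.
        cutoff = now - max_age
        ordered = [t for t in ordered if t.timestamp >= cutoff]
    if max_count is not None:
        # Count window: keep only the most recent `max_count` transactions.
        ordered = ordered[-max_count:] if max_count > 0 else []
    return ordered
```

For example, `recent_transactions(txns, now, max_age=timedelta(days=60))` approximates the "past two months" window, and `recent_transactions(txns, now, max_count=500)` gives the count-based window.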

The monthly spending sequence is considered, where a month is divided into four weeks and a week into seven days: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday and Sunday. A day is divided into two parts, morning and evening. The IP address of the merchant and the name of the city where he is located are also considered when capturing the user profile, as are the items purchased by the user, which are classified into electronic items, clothes, books, groceries, miscellaneous, etc.

Marketing-related statistical studies show that most people exhibit a consistent spending nature, although the consistency may be disturbed to a certain extent by emergencies or accidental causes. Each transaction is an itemset, that is, a set of items or attribute-value pairs. Normally, the transactions stored in the issuing bank’s database contain many attributes. Since humans tend to exhibit specific behavioristic profiles, every cardholder can be represented by a set of patterns containing information about the typical purchase category, where they shop, in which country, with what merchants, the time since the last purchase, the amount of money spent and customer demographic features such as IP address, city name, etc. The values of the attributes are then converted into categorical variables.
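The conversion of a raw transaction into categorical items can be sketched as follows, using the weekday, day-part, purchase-category and area-of-purchase categories described in this section. Several details are assumptions made only for illustration: the morning/evening boundary at noon, grouping an IP address by its first two octets, and the function and parameter names.

```python
from datetime import datetime
from typing import List

# Codes follow the paper's abbreviations; index 0 is Monday.
WEEKDAY_CODES = ["MO", "TUE", "WED", "THUR", "FRI", "ST", "SU"]
CATEGORY_CODES = {"electronics": "ET", "books": "BK", "clothes": "CL", "groceries": "GR"}


def day_part(ts: datetime, noon_hour: int = 12) -> str:
    """Morning (MR) before noon, evening (EV) otherwise; the boundary is an assumption."""
    return "MR" if ts.hour < noon_hour else "EV"


def ip_group(ip: str) -> str:
    """Group an IPv4 address by its first two octets (assumed grouping)."""
    return ".".join(ip.split(".")[:2])


def area_of_purchase(user_city: str, user_country: str,
                     merchant_city: str, merchant_country: str) -> str:
    """Local if the cities match, national if the countries match, else international."""
    if user_city == merchant_city:
        return "local"
    if user_country == merchant_country:
        return "national"
    return "international"


def encode_transaction(category: str, ts: datetime, merchant_ip: str,
                       user_city: str, user_country: str,
                       merchant_city: str, merchant_country: str,
                       amount_cluster: str) -> List[str]:
    """Turn one transaction into a list of categorical items."""
    return [
        CATEGORY_CODES.get(category, "Misc"),
        WEEKDAY_CODES[ts.weekday()],
        day_part(ts),
        ip_group(merchant_ip),
        area_of_purchase(user_city, user_country, merchant_city, merchant_country),
        amount_cluster,  # cl, cm or ch, assigned separately by amount clustering
    ]
```

The amount cluster is passed in rather than computed here, since it depends on the clustering of the cardholder's transaction amounts.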

In credit card transaction processing, every card has a certain credit limit, and any transaction within the available credit limit is a syntactically valid transaction. Cardholders, however, normally carry out transactions for values much lower than the credit limit of the card. Spending behavior related to the transaction amount is therefore captured by clustering the amounts into three clusters, high amount (ch), medium amount (cm) and low amount (cl), using the k-means clustering algorithm. K-means is an unsupervised learning algorithm that groups a given set of data based on similarity in their attribute values.

If the user's city matches the merchant's city, the purchase is local. If the user's country matches the merchant's country, the purchase is national; otherwise it is international. An example of recent transactions of a customer in a typical online transaction system is shown in Table I. Each transaction includes attributes such as the purchased product category, time (weekday), time (day part), grouped IP address, area of purchase and grouped purchase amount.

TABLE I: Recent Transactions of a Customer
Tid   Transaction data
1     ET, ST, EV, 139.168, international, cm
2     ET, ST, MR, 202.55, national, cl
3     ET, SU, MR, 139.168, international, ch
4     BK, ST, EV, 139.168, international, cl
5     CL, ST, EV, 139.168, international, cl

The abbreviations: ET - electronic items, BK - books, CL - clothes, GR - groceries, Misc - miscellaneous, MO - Monday, TUE - Tuesday, WED - Wednesday, THUR - Thursday, FRI - Friday, ST - Saturday, SU - Sunday, MR - morning, EV - evening, ch - high amount, cm - medium amount, cl - low amount.

B. Rule Engine
The major responsibility of the rule engine is to adaptively generate association rule sets that profile user behavior.

Set min_sup to 60% and min_conf to 65%. The occurrence frequency (support count) of an itemset is the number of transactions that contain it. If the frequency of an itemset is greater than or equal to the product of min_sup and the total number of transactions, it is a frequent itemset. Since rarely occurring items are filtered out when the frequent itemsets are generated, rarely occurring behavior such as fraudulent actions is filtered out as well.

We use the FP-tree (frequent pattern tree) growth algorithm proposed by Han [12] to extract the associations among features from the transactions during a certain period in order to profile the user’s behavior; changes in these patterns may signal fraudulent use. The FP-tree is thus used to look for behavior changes that may imply fraud.

K-means clustering algorithm
Algorithm: k-means. The k-means algorithm for partitioning, where each cluster center is represented by the mean value of the objects in the cluster.
Input:
  k: the number of clusters
  D: a data set containing n objects
Output: a set of k clusters
Method:
  1. Arbitrarily choose k objects from D as the initial cluster centers
  2. Repeat
  3.   (Re)assign each object to the cluster to which the object is most similar, based on the mean value of the objects in the cluster
  4.   Update the cluster means, i.e., calculate the mean value of the objects for each cluster
  5. Until no change
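The k-means steps can be sketched for the one-dimensional case of transaction amounts, which is how the paper uses the algorithm to form the cl, cm and ch clusters. This is a minimal illustration, not the authors' implementation: the initial centers are picked evenly over the sorted distinct amounts (one possible "arbitrary" choice, used here so the result is reproducible), and the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def kmeans_1d(values: Sequence[float], k: int = 3,
              max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Cluster scalar values into k groups; returns (centers, assignment)."""
    distinct = sorted(set(values))
    if k < 2 or len(distinct) < k:
        raise ValueError("need k >= 2 and at least k distinct values")
    # Step 1: choose k objects as initial centers (evenly spaced; an assumption).
    centers: List[float] = [distinct[i * (len(distinct) - 1) // (k - 1)] for i in range(k)]
    assignment: Optional[List[int]] = None
    for _ in range(max_iter):  # Step 2: repeat
        # Step 3: (re)assign each value to the nearest cluster mean.
        new_assignment = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        if new_assignment == assignment:  # Step 5: until no change
            break
        assignment = new_assignment
        # Step 4: update each cluster mean.
        for j in range(k):
            members = [v for v, a in zip(values, assignment) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
    return centers, assignment or []


def label_amounts(values: Sequence[float]) -> List[str]:
    """Label each amount as low (cl), medium (cm) or high (ch)."""
    centers, assignment = kmeans_1d(values, k=3)
    order = sorted(range(3), key=lambda j: centers[j])
    names = {order[0]: "cl", order[1]: "cm", order[2]: "ch"}
    return [names[a] for a in assignment]
```

A production system would more likely rely on a library implementation with several random restarts; the sketch only mirrors the numbered steps.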

The FP-tree structure is used to store compressed, crucial information about frequent patterns. It consists of a linked table and a prefix tree, which store quantitative information about frequent patterns.
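A header table plus prefix tree of this kind could be sketched as below. The sketch follows the standard two-scan FP-tree construction (count items, drop those below min_sup, then insert each transaction in descending-support order); the class and function names are illustrative, and ties in support are broken alphabetically as an assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


class FPNode:
    """One node of the prefix tree: an item, its count and its children."""

    def __init__(self, item: Optional[str], parent: Optional["FPNode"]) -> None:
        self.item = item
        self.parent = parent
        self.count = 0
        self.children: Dict[str, "FPNode"] = {}


def build_fp_tree(
    transactions: Sequence[Sequence[str]], min_sup: float
) -> Tuple[FPNode, Dict[str, List[FPNode]], Dict[str, int]]:
    """Return (root, header table of node links, support counts of frequent items)."""
    txs = [list(dict.fromkeys(t)) for t in transactions]  # drop duplicate items
    # First scan: count items and keep those meeting the minimum support.
    counts = Counter(item for t in txs for item in t)
    threshold = min_sup * len(txs)
    frequent = {item: c for item, c in counts.items() if c >= threshold}
    root = FPNode(None, None)
    header: Dict[str, List[FPNode]] = defaultdict(list)
    # Second scan: insert each transaction as a branch, sharing common prefixes.
    for t in txs:
        items = sorted((i for i in t if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)  # node link from the header table
            child.count += 1
            node = child
    return root, dict(header), frequent


# The five transactions of Table I.
TABLE_I = [
    ["ET", "ST", "EV", "139.168", "international", "cm"],
    ["ET", "ST", "MR", "202.55", "national", "cl"],
    ["ET", "SU", "MR", "139.168", "international", "ch"],
    ["BK", "ST", "EV", "139.168", "international", "cl"],
    ["CL", "ST", "EV", "139.168", "international", "cl"],
]
```

With `min_sup = 0.6`, calling `build_fp_tree(TABLE_I, 0.6)` keeps only the items that occur in at least three of the five transactions, so one-off items such as `ch` or `national` never enter the tree.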

An FP-tree is constructed from an empty root and a header table. After a first scan of the transactions has determined the frequent items, the branches of the FP-tree are inserted by scanning the transactions a second time. The items in each transaction are processed in order of descending support, and a branch is created for each transaction. For two branches sharing a common prefix, the shared path is merged and the count of each node on that path is increased by one. To make tree traversal easy, a header table is built so that each item points to its occurrences in the tree via a chain of node links.

C. Rule Monitor
The customer’s profile is used to monitor each new transaction and to indicate how unusual it is. At the same time, the FP-tree is updated adaptively, without human involvement, by accumulating the occurrences of the attributes in the new transactions of an individual customer.

Two techniques are used to build the rule monitor. To indicate the anomaly of a new transaction, an FP-tree based pattern matching algorithm is designed. In addition, an alert accumulating algorithm is used to lower the false alarm rate and to detect a set of fraudulent transactions with low suspicious values.

1. FP-tree based pattern matching algorithm
Suppose $T = \{t_1, t_2, \ldots, t_n\}$ is an incoming transaction. For each frequent item $t_i$, calculate a similarity credit sim_credit(ti) by the following steps:

    Double SimMatch(T) {
        sim = 0.0;
        For each item ti in T {
            If (found headtablelink, in the head table) {
                sim_credit = 0.0;
                for each tree node Nij in headtablelink
                    if (Pj(ti) ⊆ T)
                        sim_credit += G(Nij.s, Nij.c) * weight(ti);
                sim += sim_credit;
            }
        }
        return sim;
    }

For implementation, $G(s, c) = -s \times \log_2(1 + \varepsilon - c)$, where $s$ is the support, $c$ is the confidence, $s \le c \le 1.0$, and $\varepsilon$ is a real number used to specify the upper boundary of the function $G$. The function is chosen by the intuition that if a new transaction matches a rule having larger probability, it is more likely to match the user’s behavior; the confidence value is emphasized.

Different kinds of frequent items are of different importance when profiling a user or a system. For example, a matched IP pattern could be more important than a matched time pattern. A weight function, weight(ti), is therefore used to give different stress to the different item types, so that sim_credit(ti) is increased by G(s, c) × weight(ti) instead of G(s, c). The weight function is a fixed lookup table, which maps the different item types to different weights. A neural network is trained on real data to obtain the optimized lookup table.

sim(T) represents the extent to which a new transaction is comparable to the customer’s normal behavior patterns. It is compared against a set of thresholds to determine the corresponding fraud likelihood.

2. Alert Accumulating Algorithm
A set of thresholds is set up to fire the corresponding fraud warnings. Technically, it is possible to misclassify some legal transactions that do not follow the customer’s normal behavior: since the frequent items are filtered by the minimum support before the FP-tree is created, the tree is not complete and very unusual patterns are not collected in it. Moreover, a customer could suddenly change his or her behavior. Another important issue is that, in order to minimize the fraud detection cost, the purchase amount is a factor in firing an alarm. For a very small purchase amount, for example $0.50, firing an alarm is not economical even if the transaction is highly suspicious, since the objective of fraud detection is to minimize the total cost. On the other hand, when a suspicion threshold is applied to single transactions only, a sequence of fraudulent transactions with low purchase amounts could be missed.

To solve these problems, the warnings from a set of new transactions are accumulated, and the alert value for the set is calculated by accumulating the suspicious values of its transactions. The transactions to be processed could include all the new transactions since the last FP-tree update, or an expiring time window can be used to specify the transactions to accumulate. Then, by comparing the alert sum instead of a single alert against a set of thresholds, the corresponding fraud alarm is fired. The threshold set is decided by the detection sensitivity specified by the user.

All transactions within the specified time window are accumulated. A simple step function is the most straightforward expiring function: if the time $t$ of a transaction is larger than $T_2$, $f(t)$ equals 1; otherwise $f(t)$ equals 0.

The fraud alert value is calculated by:

$\mathrm{AlertValue} = \sum_{i=1}^{n-1} \bigl( s(T_i) \times f(T_i) \times \mathrm{amount}(T_i) \bigr)$,

where $T_i$ is a transaction in the accumulation window, $s(T_i)$ is the suspicious value of $T_i$, $f(T_i)$ is the expiring weight of $T_i$ specified by the expiring function, and $\mathrm{amount}(T_i)$ is the money spent in transaction $T_i$. It is reasonable that several highly suspicious transactions with very low amounts will not fire an alert.
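The scoring function G and the accumulated alert value can be sketched together as follows. The record type, the default value of ε and the reading of T2 as the start of the accumulation window are assumptions made for illustration; the suspicious value of each transaction is taken as a given input, since the text does not spell out how it is derived from sim(T).

```python
import math
from typing import Iterable, NamedTuple


class ScoredTxn(NamedTuple):
    """Hypothetical record for one transaction in the accumulation window."""
    time: float       # transaction time, e.g. seconds since some epoch
    suspicion: float  # suspicious value s(Ti), assumed to be precomputed
    amount: float     # money spent in the transaction


def g(s: float, c: float, eps: float = 0.01) -> float:
    """G(s, c) = -s * log2(1 + eps - c), with 0 <= s <= c <= 1 and eps > 0."""
    if eps <= 0.0 or not (0.0 <= s <= c <= 1.0):
        raise ValueError("require eps > 0 and 0 <= s <= c <= 1")
    return -s * math.log2(1.0 + eps - c)


def step_expiry(t: float, t2: float) -> float:
    """Step expiring function: 1 if the transaction time is later than T2, else 0."""
    return 1.0 if t > t2 else 0.0


def alert_value(txns: Iterable[ScoredTxn], t2: float) -> float:
    """Sum of suspicion * expiring weight * amount over the given transactions."""
    return sum(x.suspicion * step_expiry(x.time, t2) * x.amount for x in txns)
```

Because each term is scaled by the amount, a handful of suspicious but tiny purchases contributes little to the sum, while an expired transaction (time not later than T2) contributes nothing.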


By contrast, just one highly suspicious transaction with a very high amount could fire an alert.

3. Output
The output of the FDS is an alert value indicating the suspicion level of the analyzed transactions. This value is sent back to a web application. By comparing the alert value against a set of thresholds, the corresponding reaction can be determined. For example, if the alert value for a transaction or a set of transactions is below 0.45, no warning is given. If the alert value is in the range 0.45 to 0.85, an email is sent to the user as a fraud warning. If the alert value is higher than 0.85, the account is temporarily locked to prevent further fraud or loss, and the user is informed by an email or a message asking him or her to verify the transaction and unlock the account. If the user confirms that the transaction is fraudulent, further protection should be applied, for example changing the password. If the user confirms that the transaction is legal, the account is unlocked and the transaction is added to the recent transaction set, from which the customer’s FP-tree is updated. The user profile, i.e., the FP-tree, can therefore change adaptively to keep up with changes in the user’s behavior.

4. Data Preparation
In real life, any actual credit card database contains fraudulent transactions interspersed with genuine transactions, and these two types of transactions are generated by two different parties, namely genuine cardholders and fraudsters. Hence they are independent events with separate arrival rates. The transaction arrival rate is also an important parameter for evaluating a fraud detection system. In the credit card fraud detection domain, the occurrence of fraudulent transactions is very low compared with the number of genuine transactions. Hence, the mixing of genuine and fraudulent transactions needs to be controlled properly. However, the existing synthetic data generation models are not capable of controlling transaction arrival rates or the mixing of the two types of transactions. This real-life scenario is captured more accurately using a Markov Modulated Poisson Process (MMPP). We have therefore constructed an MMPP-based transaction simulator consisting of three main components:

Markov Modulated Poisson Process Module (MMPPM): MMPP is a doubly stochastic Poisson process in which the arrival rate is determined by the state of a Markov chain. The MMPPM has two states: the genuine state (G) and the fraud state (F). The Poisson arrival rate takes discrete values corresponding to each state. Frequent items are captured using the Poisson process, and the arrival rate vector represents the arrival rates for different weeks. In state G the Poisson arrival rate vector is λG, while in state F it is λF. pGF is the transition probability from G to F, whereas pFG is the transition probability from F to G.

Genuine Markov Chain Module (GMCM): This block consists of a finite Markov chain with an associated transition probability matrix (TPM) and initial probability distribution vector (IPD). Here, the number of states is three, representing the three clusters. This block generates synthetic transaction amounts for the genuine cardholder. The values associated with the TPM and IPD can be changed to capture the spending behavior of a cardholder properly.

Fraud Markov Chain Module (FMCM): Similar to the GMCM, this block generates synthetic fraud transaction amounts. The values associated with its TPM and IPD can also be changed in order to capture the changing behavior of the fraudster.

We use standard performance metrics to analyze the different test cases. True positive (TP) is the percentage of fraudulent transactions identified as fraudulent. False positive (FP) is the percentage of genuine transactions identified as fraudulent. It is important to achieve a high TP along with a low FP in a fraud detection system. However, the design constraints are such that any attempt to improve TP results in a higher FP. We therefore determine the design parameters of the system empirically and then simulate by varying the inputs.

VI. CONCLUSION
A novel fraud detection framework is proposed. An individual user’s behavior pattern is dynamically profiled from the transactions by using a set of association rules. An FP-tree (frequent pattern tree) structure and the FP-tree growth algorithm are utilized to uncover these hidden association rules from the recent transactions of the user. The incoming transactions for that user are then compared against the profile in order to discover anomalies, based on which the corresponding warnings are output. The FP-tree growth algorithm has been adapted and used for pattern matching. The real-world scenario is captured using a Markov modulated Poisson process. Unsupervised training and self-adjustment to changing user behavior make the proposed system effective for monitoring online transaction systems and for providing fraud detection and protection.

VII. REFERENCES
[1] Agrawal, R. and Srikant, R. (1993): Fast algorithms for mining association rules. In Proc. of the 20th Intl. Conf. on Very Large Data Bases, pp. 478–499. Santiago, Chile.
[2] Agrawal, R. and Srikant, R. (1995): Mining sequential patterns. In Proc. of the International Conference on Data Engineering, pp. 3–14. Taipei, Taiwan.
[3] Xu, J., Sung, A. H. and Liu, Q. (2005): Online fraud detection system based on non-stationery anomaly detection. The International Conference on Security and Management.
[4] Brin, S., Motwani, R. and Silverstein, C. (1997): Beyond market basket: generalizing association rules to correlations. In Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, pp. 265–276. Tucson, Arizona, USA.
[5] Chan, P. and Stolfo, S. (1998): Toward scalable learning with nonuniform class and cost distributions: A case study in credit card fraud detection. In Proc. of the Fourth International Conference on Knowledge Discovery and Data Mining, pp. 164–168.


[6] Cheung, D., Han, J., Ng, V. and Wang, C. (1996): Maintenance of discovered association rules in large databases: an incremental updating technique. In Proc. of the Intl. Conf. on Data Engineering, pp. 106–114. New Orleans, Louisiana, USA.
[7] Clearwater, S. and Provost, F. (1993): RL4: A tool for knowledge-based induction. In Proc. of the Second International IEEE Conference on Tools for Artificial Intelligence, pp. 24–30.
[8] Domingos, P. and Pazzani, M. (1996): Beyond independence: conditions for the optimality of the simple Bayesian classifier. In Proc. of the 13th Conference on Machine Learning, pp. 105–112. Bari, Italy.
[9] Elkan, C. (2000): Magical thinking in data mining: lessons from CoIL challenge 2000. Department of Computer Science and Engineering, University of California, San Diego, USA.
[10] Elkan, C. (1997): Naïve Bayesian Learning. Technical Report CS97–557, Department of Computer Science and Engineering, University of California, San Diego, USA.
[11] Han, J. and Kamber, M. (2000): Data Mining: Concepts and Technique. Morgan Kaufmann, 1st edition.
[12] Han, J., Pei, J. and Yin, Y. (2000): Mining frequent patterns without candidate generation. SIGMOD’00, pp. 1–12. Dallas, TX, USA.
[13] Kerr, K. and Litan, A. (2002): Online transaction fraud and prevention get more sophisticated. Gartner. http://www.gartnerg2.com/research/rpt-0102-0013.asp
[14] Ng, R., Lakshmanan, L., Han, J. and Pang, A. (1998): Exploratory mining and pruning optimizations of constrained association rules. In Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, pp. 13–24. Seattle, Washington, USA.
[15] Ozden, B., Ramaswamy, A. and Silberchatz, A. (1998): Cyclic association rules. In Proc. of the Intl. Conf. on Data Engineering, pp. 412–421.
[16] Quinlan, J. R. (1993): C4.5: Program for machine learning. Morgan Kaufmann, San Mateo, CA, USA.
[17] Agrawal, R., Imielinski, T. and Swami, A. (1993): Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Conference on Management of Data, Washington D.C., May 1993.
[18] Houtsma, M. and Swami, A. (1993): Set-oriented mining of association rules. Research Report RJ 9567, IBM Almaden Research Center, San Jose, California, October 1993.
