Testing the fraud detection ability of different user profiles by means of FF-NN classifiers

Constantinos S. Hilas¹ and John N. Sahalos²

¹ Dept. of Informatics and Communications, Technological Educational Institute of Serres, Serres GR-621 24, Greece, [email protected]
² Radiocommunications Laboratory, Aristotle University of Thessaloniki, Thessaloniki GR-541 24, Greece, [email protected]

Abstract. Telecommunications fraud has drawn research attention both because of the huge economic burden it places on companies and because of the interesting problem of characterizing users' behavior. In the present paper we deal with the issue of user characterization. Several real cases of defrauded user accounts were studied under different user profiles. Each profile's ability to characterize user behavior, so as to discriminate normal from fraudulent activity, was tested. Feed-forward neural networks were used as classifiers. We find that summary characteristics of a user's behavior perform better than detailed ones for this task.

1 Introduction

Telecommunications fraud can be simply described as any activity by which telecommunications service is obtained without intention of paying [1]. Under this definition, fraud can only be detected once it has occurred, so it is useful to distinguish between fraud prevention and fraud detection [2]. Fraud prevention comprises all the measures that can be used to stop fraud from occurring in the first place. In telecommunication systems these include Subscriber Identity Module (SIM) cards or other Personal Identification Numbers (PINs), like the ones used in Private Branch Exchanges (PBXs). No prevention method is perfect; each is usually a compromise between effectiveness and convenience of use. Fraud detection, on the other hand, is the identification of fraud as quickly as possible once it has happened. The problem is that fraud techniques constantly evolve: whenever a detection method becomes known, fraudsters adapt their strategies and try others.

Reference [1] classifies telecommunications fraud into four groups: contractual fraud, hacking fraud, technical fraud and procedural fraud. In [3], twelve distinct fraud types are identified. The authors of the present article have also witnessed fraudulent behavior that combines the above types [4].

Telecommunications fraud has drawn the attention of many researchers in recent years, not only because of the huge economic burden it places on companies' accounts, but also because of the interesting problem of characterizing user behavior. Fraud detection techniques monitor users' behavior in order to identify deviations from some expected or normal norm. Research in telecommunications fraud detection is mainly motivated by fraudulent activities in mobile technologies [1, 3, 5, 6]. The techniques used come from the area of statistical modeling and include rule discovery [5, 7, 8, 9], clustering [10], Bayesian rules [6], visualization methods [11], and neural network classification [5, 12, 13]. Combinations of more than one method have also been proposed [14, 15]. In [16] one can find a bibliography on the use of data mining and machine learning methods for automatic fraud detection.

Most of the aforementioned approaches use a combination of legitimate user behavior examples and some fraud examples; the aim is to detect changes relative to the legitimate user's history. In the present paper we are interested in evaluating different user representations and their effect on the proper discrimination between legitimate and fraudulent activity.

The paper proceeds as follows. In the next section the data that were used are described, along with the

different profile representations of the users. In Section 3 the experimental procedure is presented. The experimental results are given in Section 4. In the last section conclusions are drawn.

2 Profile Building

The main idea behind a user's profile is that the user's past behavior can be accumulated, so that a profile, or a "user dictionary", of the expected values of his behavior can be constructed. This profile contains single numerical summaries of some aspect of behavior or some kind of multivariate behavioral pattern. Future behavior of the user can then be compared with his profile: consistency with the profile suggests normal activity, while deviation from it may imply fraud. An important issue is that we can never be certain that fraud has been perpetrated. Any analysis should only be treated as a method that provides us with an alert or a "suspicion score", i.e., a measure that some observation is anomalous or more likely to be fraudulent than another. Special investigative attention should then be focused on those observations.

Traditionally, in computer security, user profiles are constructed from basic usage characteristics such as resources consumed, login location, typing rate and counts of particular commands. In telecommunications, user profiles can be constructed from appropriate usage characteristics. The aim is to distinguish a normal user from a fraudster, the latter being, in most cases, a user of the system who knows and mimics normal user behavior.

All the data that can be used to monitor the usage of a telecommunications network are contained in the Call Detail Record (CDR) of any Private Branch Exchange (PBX). The CDR contains data such as the caller ID, the chargeable duration of the call, the called party ID, and the date and time of the call [17]. In mobile telephone systems, such as GSM, the data records that contain details of every mobile phone call attempt are the Toll Tickets.

Our experiments are based on real data extracted from a database that holds the CDR of an organization's PBX for a period of eight years. According to the organization's charging policy, only calls to national, international and mobile destinations are charged. Calls to local destinations are free, so they are not included in the examples. In order to properly charge users for the calls they place, a system of Personal Identification Numbers (PINs) is used. Each user owns a unique PIN which "unlocks" the organization's telephone sets in order to place costly outgoing calls. If anyone (e.g., a fraudster) finds a PIN, he can use it to place his own calls from any telephone set within the organization.

Several user accounts which have been defrauded have been identified; three of them are presented in this paper. All three contain examples of both legitimate and fraudulent activity. The first example (User1) is a case where the fraudster reveals a greedy behavior: after having acquired the user's PIN, he places a large number of high-cost calls to satellite services. The second example (User2) is a case where the fraudster did not place considerably costly calls but used the user's PIN during non-working hours and days. In the third example (User3) the fraudster seems to be more cautious: he did not place costly calls and mainly used the account during working hours and days. An interesting observation in the third case is that the stolen PIN was used from different telephone sets, and the call destinations are never associated with the legitimate user's telephone set.

The detailed daily accounts of the three users were examined and each day was marked as either normal or defrauded.
If no fraudulent activity was present during a day, the day was marked as normal; if at least one call from the fraudster was present, the whole day was marked as fraud. We call this first division "characterized". Each day was also marked as normal or not according to the first day of fraudulent activity; that is, each user's account was split into two sets, one pre- and one post-fraud. This second division is referred to as "uncharacterized".

For each user, three different profile types were constructed. The first profile (Profile1) is built up from the accumulated weekly behavior of the user. It consists of seven fields: the mean and the standard deviation of the number of calls per week (calls), the mean and the standard deviation of the duration (dur) of calls per week, the maximum number of calls, the maximum duration of one call, and the maximum cost of one call (Fig. 1). All maxima are computed within a week's period.

Fig. 1. Profile1 of telephone calls
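As a concrete illustration, the sketch below shows how Profile1 might be assembled from raw CDR data. The column names ('user', 'date', 'duration', 'cost') are assumed, since the paper does not spell out the CDR schema, and the "mean/std per week" fields are read here as statistics of the daily call counts within each week; other readings are possible.

# A minimal sketch of Profile1 construction under the assumptions above.
import pandas as pd

def build_profile1(cdr: pd.DataFrame) -> pd.DataFrame:
    """Return one row of seven Profile1 fields per (user, week)."""
    cdr = cdr.assign(week=cdr['date'].dt.to_period('W'),
                     day=cdr['date'].dt.date)
    # Daily call counts, then their mean/std/max inside each week.
    daily = (cdr.groupby(['user', 'week', 'day'])
                .size().rename('daily_calls').reset_index())
    counts = (daily.groupby(['user', 'week'])['daily_calls']
                   .agg(calls_mean='mean', calls_std='std', calls_max='max'))
    # Duration statistics and single-call maxima inside each week.
    calls = (cdr.groupby(['user', 'week'])
                .agg(dur_mean=('duration', 'mean'),
                     dur_std=('duration', 'std'),
                     dur_max=('duration', 'max'),    # longest single call
                     cost_max=('cost', 'max')))      # costliest single call
    return counts.join(calls)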

The second profile (Profile2) is a detailed daily description of a user's behavior. It is constructed by separating the number of calls per day and their corresponding duration according to the called destination, i.e., national (nat), international (int), and mobile (mob) calls, and the time of day, i.e., working hours (w), afternoon hours (a), and night (n) (Fig. 2). Lastly, the third profile (Profile3) is an accumulated per-day behavior (Fig. 3). It consists of the number of calls and their corresponding duration, separated only according to the called destination, that is, national, international and mobile calls.

Fig. 2. Profile2 of telephone calls

Fig. 3. Profile3 of telephone calls
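In the same spirit, a minimal sketch of the Profile2 feature extraction is given below. The 'dest' column and the hour boundaries chosen for the three parts of the day are illustrative assumptions, as the paper does not state them; Profile3 would follow by grouping on destination only.

import pandas as pd

def daypart(hour: int) -> str:
    if 8 <= hour < 16:        # working hours (assumed boundaries)
        return 'w'
    if 16 <= hour < 22:       # afternoon hours (assumed boundaries)
        return 'a'
    return 'n'                # night

def build_profile2(cdr: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (user, day) with 18 fields:
    {calls, duration} x {nat, int, mob} x {w, a, n}."""
    cdr = cdr.assign(day=cdr['date'].dt.date,
                     part=cdr['date'].dt.hour.map(daypart))
    wide = (cdr.groupby(['user', 'day', 'dest', 'part'])
               .agg(calls=('duration', 'size'),
                    dur=('duration', 'sum'))
               .unstack(['dest', 'part'], fill_value=0))
    wide.columns = ['_'.join(col) for col in wide.columns]  # e.g. 'calls_nat_w'
    return wide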

The last two profiles were also accumulated per week to give Profile2w and Profile3w. So, we have 3 users, 5 profile representations, and 2 different ways to characterize the user accounts as normal or fraudulent, which give 30 different data sets.

3 Experimental Procedure

All data were standardized so that, for each characteristic, the average equals zero and the variance equals one. Principal Component Analysis (PCA) was then performed in order to transform the input vectors into uncorrelated ones. PCA transforms the $k$ original variables, $X_1, X_2, \ldots, X_k$, into $p$ new variables, $Z_1, Z_2, \ldots, Z_p$, which are linear combinations of the original ones:

$$Z_i = \sum_{j=1}^{k} a_{ij} X_j . \qquad (1)$$

This transformation has the properties $\mathrm{Var}[Z_1] \ge \mathrm{Var}[Z_2] \ge \cdots \ge \mathrm{Var}[Z_p]$, which means that $Z_1$ contains the most information and $Z_p$ the least, and $\mathrm{Cov}[Z_i, Z_j] = 0$ for $i \ne j$, which means that there is no information overlap between the principal components [18].
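As an illustration, the standardization and PCA steps might look as follows. The use of scikit-learn and the retained-variance threshold are assumptions of this sketch; the paper does not state how $p$ was chosen.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def preprocess(X: np.ndarray, var_kept: float = 0.95):
    Xs = StandardScaler().fit_transform(X)   # mean 0, variance 1 per feature
    pca = PCA(n_components=var_kept)         # keep enough components for 95% variance (assumed threshold)
    Z = pca.fit_transform(Xs)                # uncorrelated scores Z_1, ..., Z_p
    return Z, pca.n_components_              # p, the retained dimensionality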

In order to test the ability of each profile to discriminate between legitimate usage and fraud, feed-forward neural networks (FF-NN) were used as classifiers. The feed-forward neural network is defined [12] by:

$$y = \sum_{j=0}^{M} w_j \, g\left( \sum_{i=0}^{d} w_{ji} Z_i \right) . \qquad (2)$$

Here $g$ is a non-linear function (e.g., $g(x) = \tanh(x)$), the $w_j$ are the weights between the hidden layer and the output $y$, and the $w_{ji}$ are the weights from the $i$-th input, $Z_i$, to the $j$-th neuron of the hidden layer. Linear outputs were used. The problem is a supervised learning one: the task is to adapt the weights so that the input-output mapping corresponds to the input-output pairs the teacher has provided. For each input data set, one feed-forward neural network was built, consisting of one hidden layer and one linear output. The hidden layer consisted of $p$ units, where $p$ is the number of principal components obtained from the PCA procedure.
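Written out in code, the mapping of equation (2) is a single tanh hidden layer followed by a linear output. The following sketch makes the indices concrete; the weight shapes and the convention that index 0 carries the bias terms are illustrative assumptions.

import numpy as np

def ffnn_forward(Z: np.ndarray, W: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Z: (n, d) inputs; W: (M, d + 1) hidden weights w_ji (column 0 = bias);
    w: (M + 1,) output weights w_j (entry 0 = bias). Returns (n,) outputs y."""
    Zb = np.hstack([np.ones((Z.shape[0], 1)), Z])   # Z_0 = 1 is the bias input
    H = np.tanh(Zb @ W.T)                           # g(sum_i w_ji Z_i), shape (n, M)
    Hb = np.hstack([np.ones((H.shape[0], 1)), H])   # bias unit for the output sum
    return Hb @ w                                   # y = sum_j w_j g(...)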

The network was trained using the resilient backpropagation (Rprop) method [19]. Each input data set was divided into training, validation and test sets: one half of the data was used for training, one fourth for validation and one fourth for testing. The validation error was used as an early stopping criterion. The evaluation of each classifier's performance was made by means of the corresponding Receiver Operating Characteristic (ROC) curve [20]. A ROC curve is a graphical representation of the trade-off between the true positive and the false positive rates for every possible cut-off point that separates two overlapping distributions; equivalently, it represents the trade-off between the sensitivity and the specificity of a classifier. The cases were compared in pairs in the following sense: for each profile type and for each user, the first case was the one where the user account was split into two parts, before and after the first day on which fraudulent activity appeared, and the second case was the one where the same account was split into normal and fraudulent activity using the detailed day-by-day characterization. The area under the curve was used as the statistic that summarizes the classification performance of each case. A sketch of this evaluation procedure is given below.
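In this sketch, scikit-learn's MLPClassifier is a stand-in for the paper's network, since it does not offer the Rprop optimizer; the 1/2, 1/4, 1/4 split, the validation-based early stopping, and the ROC scoring, however, follow the text.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_curve, roc_auc_score

def evaluate(Z: np.ndarray, y: np.ndarray, p: int, seed: int = 0):
    # Hold out 1/4 of the data for testing.
    Z_fit, Z_te, y_fit, y_te = train_test_split(
        Z, y, test_size=0.25, stratify=y, random_state=seed)
    clf = MLPClassifier(hidden_layer_sizes=(p,),   # p hidden units, as in the paper
                        activation='tanh',
                        early_stopping=True,       # stop on the validation error
                        validation_fraction=1/3,   # 1/3 of 3/4 = 1/4 of all data
                        random_state=seed)         # note: optimizer differs from Rprop
    clf.fit(Z_fit, y_fit)
    scores = clf.predict_proba(Z_te)[:, 1]         # fraud "suspicion scores"
    fpr, tpr, _ = roc_curve(y_te, scores)          # points of the ROC curve
    return roc_auc_score(y_te, scores), fpr, tpr   # area under the curve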

4 Experimental Results

In the following figures (Figs. 4-8) the ROC curves for User1 are given; each corresponds to one of the five user profiling approaches. In each figure two lines are plotted. The solid one (uncharacterized) is the ROC curve obtained when each case in the user's profile was characterized only relative to its position before or after the first case of fraud. The dashed line (characterized) corresponds to the profiling approach which used a thorough characterization of each case as fraudulent or not. The plots for the other users are similar.

In ROC curve analysis the area under the curve is usually used as a statistic that indicates how well a classifier performs. An area of 1 indicates a perfect classifier, while an area of 0.5 indicates performance equal to a random selection of cases. The areas under all the ROC curves for each user-profile combination are given in Tables 1 and 2. However, ROC curves should be judged with reservation. An example of a misleading conclusion may be exhibited by comparing Fig. 4 and Fig. 5. According to Table 1, Profile2 seems to give better separation, as the area under the plotted curve (the "characterized" case) is larger than the corresponding area in Fig. 4. Close examination of the two figures, however, reveals that by using Profile1 one gets more than 70% correct classification without any false alarms, and more than 80% true positives with only 2% false alarms. For such a small percentage of false alarms, Profile2 gives less than 25% true positive hits.

Fig. 4. ROC curves, using Profile1, showing the trade-off between the true-positive and false-positive rates for User1. His behavior is divided as fraudulent or not using the first day on which fraudulent activity appeared (uncharacterized) or using a detailed day-by-day characterization (characterized). The diagonal line is the theoretical ROC curve for a random separation of the cases, while the dotted one is an instance of an actual random separation.

The aforementioned comment is of great importance when working with large data sets. The telecommunications network from which we drew our examples has more than 6000 users, which means that even a false positive rate of only 1% may produce up to 60 false alarms, and the close examination of every alarm may become particularly costly to the organization in terms of lost workforce. One should also bear in mind that the ROC curves plotted in the figures belong to families of curves, due to the random initialization of the FF-NN classifiers each time they are used. So, each of the lines depicted here is actually one characteristic instance of its family. Some additional considerations when comparing ROC curves are given in [21].

Fig. 5. ROC curves, using Profile2, showing the trade-off between the true-positive and false-positive rates for User1 when his behavior is thoroughly characterized as fraudulent or not.

Fig. 6. ROC curves, using Profile3, showing the trade-off between the true-positive and false-positive rates for User1 when his behavior is thoroughly characterized as fraudulent or not.

Profile1 works better than all the others, as it exhibits the highest true positive rate with the smallest false positive rate. Among all the examined user accounts, there were cases where Profile1 gave 90% true positive hits without any false positives. Our next choice for a fraud detection technique would be Profile2 (or its weekly counterpart), which is actually a detailed profile of the user's actions; in fact, this profile could also be used in a rule-based approach. Fraudsters tend to be greedy, which in a telecommunications environment means that they tend to talk much or to costly destinations. They are also aware of velocity traps; that is, they try to avoid using an account concurrently with the legitimate user, which concentrates their activity during the non-working hours. Separating a user's activity both by destination and by time of day helps identify such actions.

Fig. 7. ROC curves, using Profile2w, showing the trade-off between the true-positive and false-positive rates for User1 when his behavior is thoroughly characterized as fraudulent or not.

Fig. 8. ROC curves, using Profile3w, showing the trade-off between the true-positive and false-positive rates for User1 when his behavior is thoroughly characterized as fraudulent or not.

The prevalent observation in all figures is that the "characterized" case gives better separation between the normal and the defrauded part of the user's account. However, a case-by-case characterization of each call is particularly expensive, which is the main reason why only a few examples are examined in the present study. Another observation is that profiles whose characteristics are aggregated in time outperform the daily ones. As expected, individual characterization of each case gives better results. There are cases, like User3, where a cautious fraudster can hide his activity beneath the legitimate user's. Better discrimination in this last case might be accomplished by means of some other method, like rule discovery.

Table 1. Area under ROC curves for the three basic profiles

                  User1                            User2                            User3
         Profile1  Profile2  Profile3     Profile1  Profile2  Profile3     Profile1  Profile2  Profile3
unchar   0.7555    0.8069    0.8238       0.6765    0.7213    0.6490       0.9047    0.7971    0.7853
char     0.9090    0.9134    0.8839       0.8345    0.8811    0.7790       0.9297    0.7789    0.7931

Table 2. Area under ROC curves for the weekly aggregated behavior

                 User1                     User2                     User3
         Profile2w  Profile3w     Profile2w  Profile3w     Profile2w  Profile3w
unchar   0.8181     0.8430        0.7241     0.6615        0.8741     0.8536
char     0.9120     0.9267        0.8760     0.8252        0.7298     0.8687

5 Conclusions

In the present paper, tests were performed to evaluate the fraud detection ability of different user profile structures. The profiles are used as a user characterization method in order to discriminate legitimate from fraudulent usage in a telecommunications environment. Feed-forward neural networks were used as classifiers, and the input data consisted of real user accounts which had been defrauded.

From the analysis it is concluded that accumulated characteristics of a user's behavior yield better discrimination results. Here, only weekly aggregation of the user's behavior was tested against detailed daily profiles; aggregating behavior over longer periods was avoided in order to preserve some level of on-line detection ability. The user profiling approaches presented here have the benefit of respecting users' privacy: apart from some coarse behavioral characteristics, all private data (e.g., called number, calling location, etc.) are hidden from the analyst.

Feed-forward neural networks with more hidden layers were also tried; however, there was no significant rise in the performance of the classifiers. The experiments presented here use supervised learning and may serve as a guide to the type of user profiles (or user representations) that can better characterize a user's behavior. The task of detecting fraud in an unsupervised manner is a more difficult one, given the dynamic appearance of new fraud types. The application of an appropriate clustering method, like classic cluster analysis or the more sophisticated self-organizing map (SOM), is considered the next step of the present work. Moreover, any accurate fraud detection technique is bound to be proprietary to the environment in which it works.

References

1. Gosset, P. and Hyland, M.: Classification, detection and prosecution of fraud in mobile networks. Proc. of ACTS Mobile Summit, Sorrento, Italy (1999)
2. Bolton, R. J. and Hand, D. J.: Statistical fraud detection: a review. Statistical Science, vol. 17, no. 3 (2002) 235-255
3. ACTS AC095, project ASPeCT: Legal aspects of fraud detection. AC095/KUL/W26/DS/P/25/2 (1998)
4. Hilas, C. S. and Sahalos, J. N.: User profiling for fraud detection in telecommunications networks. Proc. of the 5th Int. Conf. on Technology and Automation (ICTA '05), Thessaloniki, Greece (2005) 382-387
5. Moreau, Y., Preneel, B., Burge, P., Shawe-Taylor, J., Stoermann, C. and Cooke, C.: Novel techniques for fraud detection in mobile telecommunication networks. Proc. of ACTS Mobile Summit, Granada, Spain (1997)
6. Büschkes, R., Kesdogan, D. and Reichl, P.: How to increase security in mobile networks by anomaly detection. Proc. of the 14th Annual Computer Security Applications Conference (ACSAC '98) (1998)
7. Rosset, S., Murad, U., Neumann, E., Idan, Y. and Pinkas, G.: Discovery of fraud rules for telecommunications: challenges and solutions. Proc. of ACM SIGKDD-99, ACM Press, San Diego, CA, USA (1999) 409-413
8. Adomavicius, G. and Tuzhilin, A.: User profiling in personalization applications through rule discovery and validation. Proc. of ACM SIGKDD-99, ACM Press, San Diego, CA, USA (1999) 377-381
9. Fawcett, T. and Provost, F.: Adaptive fraud detection. Data Mining and Knowledge Discovery, vol. 1, Kluwer (1997) 291-316
10. Oh, S. H. and Lee, W. S.: An anomaly intrusion detection method by clustering normal user behavior. Computers & Security, vol. 22, no. 7 (2003) 596-612
11. Cox, K. C., Eick, S. G., Wills, G. J. and Brachman, R. J.: Visual data mining: recognizing telephone calling fraud. Data Mining and Knowledge Discovery, vol. 1, no. 2 (1997) 225-231
12. Taniguchi, M., Haft, M., Hollmén, J. and Tresp, V.: Fraud detection in communication networks using neural and probabilistic methods. Proc. of the 1998 IEEE Int. Conf. on Acoustics, Speech and Signal Processing, vol. 2 (1998) 1241-1244
13. Manikopoulos, C. and Papavassiliou, S.: Network intrusion and fault detection: a statistical anomaly approach. IEEE Communications Magazine, vol. 40, no. 10 (2002) 76-82
14. Hollmén, J. and Tresp, V.: Call-based fraud detection in mobile communication networks using a hierarchical regime-switching model. Proc. of the 1998 Conf. on Advances in Neural Information Processing Systems II, MIT Press (1999) 889-895
15. Phua, C., Alahakoon, D. and Lee, V.: Minority report in fraud detection: classification of skewed data. ACM SIGKDD Explorations, Special Issue on Imbalanced Data Sets, vol. 6, no. 1 (2004) 50-58
16. Fawcett, T. and Phua, C.: Fraud Detection Bibliography. http://liinwww.ira.uka.de/bibliography/Ai/fraud.detection.html (2005)

17. Hinde, S. F.: Call record analysis. IEE Colloquium on Making Life Easier: Network Design and Management Tools (Digest No. 1996/217) (1996) 8/1-8/4
18. Manly, B. F. J.: Multivariate Statistical Methods: A Primer. 2nd ed., Chapman & Hall (1994)
19. Riedmiller, M. and Braun, H.: A direct adaptive method for faster backpropagation learning: the RPROP algorithm. Proc. of the IEEE Int. Conf. on Neural Networks, vol. 1 (1993) 586-591
20. Downey, T. J., Meyer, D. J., Price, R. K. and Spitznagel, E. L.: Using the receiver operating characteristic to assess the performance of neural classifiers. Proc. of the Int. Joint Conf. on Neural Networks, vol. 5 (1999) 3642-3646
21. Provost, F., Fawcett, T. and Kohavi, R.: The case against accuracy estimation for comparing induction algorithms. Proc. of the Fifteenth Int. Conf. on Machine Learning, Morgan Kaufmann, San Francisco (1998) 43-48