Designing an expert system for fraud detection in a private telecommunications network

DRAFT – PLEASE DO NOT REDISTRIBUTE

Constantinos S. Hilas Dept. of Informatics and Communications, Technological Educational Institute of Serres, Terma Magnisias, Serres, GR-621 24, Greece [email protected]

Abstract. Telecommunications fraud burdens not only telecom providers’ accounts but individual users as well. The latter are particularly affected in the case of superimposed fraud, where the fraudster uses a legitimate user’s account in parallel with the user. These cases are usually identified after user complaints about excess billing. However, inside the network of a large firm or organization, superimposed fraud may go undetected for some time. The present paper deals with the detection of fraudulent telecom activity inside large organizations’ premises, with a focus on superimposed fraud detection. The problem is attacked via the construction of an expert system which incorporates both the network administrator’s expert knowledge and knowledge derived from the application of data mining techniques on real-world data.

Keywords: Fraud detection, user modeling, expert systems, telecommunications, data mining applications.

1. Introduction

Telecommunications fraud can be simply described as any activity by which telecommunications service is obtained without intention of paying [6]. This kind of fraud has certain characteristics that make it particularly attractive to fraudsters. The main one is that the risk of being located is small. This is because all actions are performed from a distance, which, in conjunction with the mesh topology and the size of networks, makes the process of locating the fraudster time-consuming and expensive. Additionally, no particularly sophisticated equipment is needed, if any is needed at all. The simple knowledge of an access code, which can be acquired even by means of social engineering, makes fraud feasible. Finally, the product of telecommunications fraud, a phone call, is directly convertible to money [13].

Several categories of telecommunications fraud have been reported. The main ones are technical fraud, contractual fraud, hacking fraud, and procedural fraud [6]. Technical, contractual and procedural fraud usually burden the telecom service provider, while hacking fraud also harms the subscriber. The latter may take the form of superimposed fraud, where the fraudster (hacker) uses the service in parallel with the subscriber and burdens his account.

All fraud cases can actually be viewed as fraud scenarios, which are related to the way access to the network was acquired. Detection techniques tailored to one case may fail to detect other types of fraud. For example, velocity traps, which can identify the use of a cloned cell phone, will fail to detect a case of contractual fraud. So, fraud detection focuses on the analysis of users’ activity. The related approaches are divided into two main subcategories, absolute analysis and differential analysis. The first searches for limits between legal and fraudulent behavior, while the second tries to detect extreme changes in the user’s behavior.
In both cases, analysis is achieved by means of statistical and probabilistic methods, neural networks and rule-based systems. In [17] the use of indicators of excessive usage is criticized, as they may not only imply fraud but may also point to the best customers. A comparison of probabilistic methods with those that use rules is given in [21]. In 1999, Fawcett and Provost [5] proposed a combination of rules and profile extraction in order to detect fraud. The outputs of their system are combined via a trained linear model in order to produce alarms. Rosset et al. report encouraging results from the use of rules that are extracted with a variant of the C4.5 algorithm [18]. Alves et al. propose two anomaly detection methods based on the concept of signatures for the detection of superimposed fraud [2]. The appropriate feature extraction procedure is dealt with in [22]. In a previous work [10], the author of the present paper arrived at a user behavior characterization model that gives good results in superimposed fraud detection.

The use of expert systems for fraud detection has either not been published or is referred to under different names [16], the most common one being “data mining”. There is, however, a limited literature on related subjects such as intrusion detection in computer systems [19], [14], user profiling for credit card fraud detection [15], auto insurance fraud [3], or consumer behavior analysis [1]. Some recent publications combine data mining or expert system approaches for telecom churn prediction [23], [20] and subscription fraud detection [4].

In the present paper a rule-based expert system is presented which aims at the detection of superimposed fraud cases in the telecommunications network of a large organization. Rules are induced both by using the network administrator’s expert knowledge and by applying data mining methods on real-world data.

The paper proceeds as follows. In the next chapter the telecommunications environment in which the expert system will operate is presented. In the third chapter the expert system’s operating characteristics and specifications are outlined. In Chapter 4 a brief account of the prior data mining analysis of the data at hand is given, while the structure of the expert system is presented with the use of flow charts in Chapter 5. Experimental results are given in Chapter 6. In the last chapter conclusions are drawn.

2. Operating Environment

The present paper describes the construction of an expert system that aims at the detection of superimposed fraud cases in the telecommunication network of a large organization. It is a tailor-made application, which can also be applied to similar networks after the incorporation of any proprietary network policy and expert knowledge. Here the organization under study is a large University with more than 5,000 employees (administrative, teaching and research staff). Each employee who holds a permanent post is supplied with a telephone set (terminal) and a unique Personal Authorization Code (PAC) that overrides the terminal’s class of restrictions (COR) and authorizes him to place costly outgoing calls. The PAC is also used in order to properly charge users for the calls they place. According to the organization’s charging policy, only calls to national, international and mobile destinations are charged. Calls to local destinations are not charged, so they are not included in the study.

If anyone (e.g., a fraudster) finds a valid PAC he can use it to place his own calls from any telephone set within the organization and charge the calls to the legitimate PAC owner. Although a user’s PAC can unlock any telephone set in the organization, one expects its use to be highly correlated with the owner’s telephone set. This could be used as a powerful rule to imply legitimate use. In addition, the user may also be associated with a fax machine (e.g. in a Department’s secretariat) and/or a third telephone set placed in a laboratory.

The PAC can be used concurrently from two telephone sets. This observation may be used as a clue in velocity traps. However, after the analysis of the real-world data it was found that there is a case where the concurrent use of a PAC from two telephone sets is legitimate. This is the case when a user uses her PAC to send a multipage fax message and at the same time uses it from another terminal to place a voice call.

The University’s personnel may be divided into two main categories, namely the Administrative – Technical Staff and the Teaching – Research Staff. The majority of the Administrative – Technical Staff works from 07:00 to 15:00, five days a week (Monday to Friday). Two exceptions are the cleaning staff (06:00 – 14:00) and the security staff (who work in three shifts, 24 hours a day, 7 days a week). Personnel belonging to these two categories have access to offices, laboratories, classrooms, etc., so one may expect their PAC to be loosely correlated with a single terminal. Teaching and Research staff do not have fixed office hours and they usually work after hours. It is not surprising for research to be conducted after midnight or during weekends.

There are also people who have a temporary employment relationship with the University.
These are graduate and post-graduate students, part-time support personnel, visitors, personnel on detachment, etc. Due to their temporary status these people do not have a PAC. Teaching staff (e.g. Professors, Lecturers, etc.) are very likely to give their PAC to secretaries or students so that they can assist them with several administrative jobs. The sharing of a PAC, even with people one trusts, makes the PAC prone to fraud. The analysis of the fraud cases that were studied supports this remark. The most defrauded PACs are those that belong to the Teaching staff.

The experience of the network administrators along with the analysis of real fraud cases revealed that the fraudsters share some common characteristics. First of all, they are greedy and tend to make many expensive calls. Common destinations are premium services such as phone auctions, “party” lines, matchmaking lines, etc. Long-duration or long-distance calls with friends and close relatives are also common. Another important attribute is that fraudsters exhibit great mobility within the organization. They constantly change terminals, probably for fear of being located. In one particular case, the fraudster revealed a hacked PAC to more people, which started a “hail” of phone calls from the organization’s premises to premium phone services.
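
The concurrent-use observation made earlier in this section can be illustrated with a short sketch. The Python fragment below is only an illustration under stated assumptions, not the system's actual implementation: the `Call` record, its field names and the function `velocity_trap_alarms` are hypothetical. It reports two overlapping calls placed with the same PAC from different terminals, unless both terminals are related to the PAC owner (the fax plus voice call case).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from itertools import combinations


@dataclass(frozen=True)
class Call:
    """Hypothetical, simplified call record (not the PBX's CDR layout)."""
    pac: str
    ext: str              # terminal the call was placed from
    start: datetime
    duration_sec: int

    @property
    def end(self) -> datetime:
        return self.start + timedelta(seconds=self.duration_sec)


def velocity_trap_alarms(calls, related_exts):
    """Return pairs of overlapping calls made with one PAC from two terminals.

    related_exts maps a PAC to the set of terminals related to its owner
    (office set, fax, third set). A pair is ignored when both terminals
    are related to the owner.
    """
    alarms = []
    for a, b in combinations(sorted(calls, key=lambda c: c.start), 2):
        if a.pac != b.pac or a.ext == b.ext:
            continue
        if not (a.start < b.end and b.start < a.end):
            continue  # the two calls do not overlap in time
        owner_exts = related_exts.get(a.pac, set())
        if a.ext in owner_exts and b.ext in owner_exts:
            continue  # e.g. a multipage fax plus a concurrent voice call
        alarms.append((a, b))
    return alarms
```

Comparing every pair of calls is quadratic in the number of records, which is acceptable for a single user's account but would need an interval-based approach for bulk processing.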

3. Expert System Operating Characteristics and Specifications

The development of the expert system (ES) was based on the network administrator’s expert knowledge and the application of data mining techniques on several detailed user accounts. The analysis yielded appropriate tests that must be performed in order to identify new fraud cases. These tests are expressed by means of IF … THEN … ELSE rules.

User demographic data were also incorporated into the ES in order to enhance its accuracy. These are the user’s employment relationship with the organization (e.g. administrative, teaching staff, etc.), his telephone number, his fax number and his home telephone number. All these data are in the public domain and their use cannot be considered an intrusion into the user’s privacy. Moreover, according to Greek legislation [7] one may analyze private data as long as the analysis is conducted for administrative, security or research reasons, within an organization’s premises, and the raw data are protected from unauthorized access. It is also stressed that during the expert system’s design and implementation process there is no need to have access to the raw data. One only needs to know their structure. In addition, during the data analysis process all data may be anonymous, and the system’s output may only give warnings and alarms that are forwarded to the authorized personnel for visual inspection.

An additional specification for the expert system is its ability to perform both real-time tests and batch tests on historical data. It should be pointed out that in a Private Branch Exchange (PBX) phone call analysis cannot actually happen in real time. This is because all the details of the call are available to the system only after the call has been completed. These details are written in the Call Detail Record (CDR), which is output by the PBX to a peripheral logging system.
This is a very different procedure from a credit card validation scheme [15], where the transaction is completed only after the credit card is first checked for its validity and available credit limit. Additionally, during the credit card validation check one may also compare the current purchase with the owner’s profile in order to diagnose probable fraud. This crosschecking cannot be made in a PBX. A PBX will only check the validity of the PAC prior to unlocking the outgoing trunk. The probability of a call being fraudulent may only be computed after the completion of the call. This gives the fraudster the opportunity to perpetrate at least one fraudulent action. For the sake of completeness it should be noted that there are telecommunication networks where call-related data are available to the system before the completion of a call. This can be done by means of the ISUP function available in Signaling System 7 (SS7). However, SS7 signaling is not common in PBXs.

An additional specification is the ability of the expert system to batch process past user accounts. This feature is available in order to perform sample checks on historical data during periods of low system usage. These checks are important, as the ES is constantly learning new fraud cases and can identify fraudulent activity that may have passed unidentified in the past. This implies that the system must have the ability to adapt to new cases. Once a new fraud case is identified, the relevant data are fed back to the system in order to adjust old rules or identify and incorporate new ones.

4. Prior Data Mining Analysis

The expert system is integrated with the organization’s telecommunications system. Prior to the design of the expert system, research was conducted in order to identify user behavior characteristics when using the system. Time series analysis revealed seasonal characteristics and trends [9]. Experimentation with several user profiles gave clues about user behavior and, most importantly, revealed those features that best separate normal from fraud cases [8], [10]. The most appropriate profiles were combined with decision trees, which yielded rules and thresholds to separate fraudulent from normal behavior. In particular, several cases of normal and fraudulent use were given as input to the C4.5 algorithm and the output was translated into the rules at hand [11]. The outcome of the aforementioned research was combined with the expert knowledge of the telecommunications network administrator. The latter was also expressed as a set of rules.

4.1. Rule Examples

Examples of the daily and weekly rules that were extracted during the analysis prior to the expert system’s design [11] are discussed below. Several user profile representations were tested and the most promising (in terms of user behavior characterization) is shown in Fig. 1. It consists of seven fields: the mean and the standard deviation of the number of calls per week (Calls), the mean and the standard deviation of the duration (Dur) of calls per week, the maximum number of calls, the maximum duration of one call and the maximum cost of one call. All maxima are computed within a week’s period. The daily data that were used for the construction of the aforementioned profile are the number of calls per day (Calls), the duration of these calls (Dur), the corresponding charging units (Units), the maximum duration of one call (MaxDur), and the maximum units for one call in this day (MaxUnits). These features were also combined in one profile (Fig. 2) to test the daily characteristics of fraud.
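
As an illustration of how the seven-field weekly profile could be derived from the daily features, a minimal Python sketch follows. It rests on assumptions that the text does not settle: the means and standard deviations are taken over the active days of the week, the population standard deviation is used (so that a week with a single active day is still defined), and the output field names other than MeanCalls and StdDur are hypothetical.

```python
from statistics import mean, pstdev


def weekly_profile(days):
    """Build a seven-field weekly profile from one user's active days in a week.

    Each element of `days` is a dict holding the daily features named in the
    text: Calls, Dur, Units, MaxDur, MaxUnits. Days without calls are absent.
    """
    if not days:
        raise ValueError("a profile needs at least one active day")
    calls = [d["Calls"] for d in days]
    durs = [d["Dur"] for d in days]
    return {
        "MeanCalls": mean(calls),
        "StdCalls": pstdev(calls),   # assumption: population standard deviation
        "MeanDur": mean(durs),
        "StdDur": pstdev(durs),
        "MaxCalls": max(calls),
        "MaxDur": max(d["MaxDur"] for d in days),
        "MaxUnits": max(d["MaxUnits"] for d in days),  # stands in for maximum cost
    }
```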

Fig 1 should be placed about here

Fig 2 should be placed about here

Typical or legal use was marked as class 1 while fraud was marked as class 2. The WEKA implementation of the C4.5 algorithm [24] was used to classify calls into two classes. The task was to identify the values of the appropriate variables that best separate the classes. The main weekly rules are:

IF StdDur94.3 THEN class=2 (confidence: 99.2%, support: 63.7% of class 2)
IF MeanCalls>0.86 AND StdDur 302 sec THEN class = 2 (confidence: 72%, support: 52% of class 2).
IF calls = 1 AND Dur < 84 sec THEN class = 1 (confidence: 68%, support: 46.7% of class 1).

The first rule says that if during a day a PAC is used in order to place more than three costly calls (e.g. to mobile or international destinations) and the duration of each of these calls exceeds two and a half minutes then we are 72% confident that we are dealing with a fraud case. This outcome is supported by 52% of all fraudulent cases. The second rule implies that if the user places no more than one costly call per day with a duration of less than one and a half minutes then we can be 68% confident that he is a legitimate user. The tree that yielded the aforementioned rules is shown in Fig. 4.
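
A rule of this kind can be represented as a small data structure. The sketch below is hypothetical in its names (`Rule`, `classify_day`, `SHORT_SINGLE_CALL`) and encodes only the one daily rule whose thresholds are fully legible above (one call, duration below 84 sec, class 1 with 68% confidence); no other thresholds are assumed.

```python
from dataclasses import dataclass
from typing import Callable, Optional

NORMAL, FRAUD = 1, 2   # class labels as used in the text


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[dict], bool]   # tests one day's features
    label: int
    confidence: float


# Daily rule from the text: IF calls = 1 AND Dur < 84 sec THEN class = 1.
SHORT_SINGLE_CALL = Rule(
    name="single short call",
    condition=lambda day: day["Calls"] == 1 and day["Dur"] < 84,
    label=NORMAL,
    confidence=0.68,
)


def classify_day(day: dict, rules) -> Optional[Rule]:
    """Return the first rule that fires for a day's features, or None."""
    for rule in rules:
        if rule.condition(day):
            return rule
    return None
```

Keeping the confidence next to the label lets a later stage decide whether a fired rule is strong enough to raise a warning or an alarm.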

Figure 4 should be placed about here

5. Expert System Structure

As was stated earlier, the expert system’s structure is expressed by means of IF…THEN…ELSE rules. The rules that were derived from the administrator’s expert knowledge are combined with those that have been identified from the data mining analysis of both normal and fraudulent cases. These rules are visualized by means of the flow charts that follow (Fig. 6 to Fig. 11). The flow charts visualize the procedure of call testing and may actually be combined to form a single large one. They are presented here in fragments due to paper size limitations. The fragmentation was made in such a way as to properly describe the separate functions that are performed on the data. A comprehensive representation of the expert system is given in the following figure, Fig. 5:

Fig. 5 should be placed about here

The data (i.e. a new call record) enter the graph from the top-left arrow and are checked within each one of the system’s building blocks. If a warning or alarm is raised then the process stops and the network administrator is informed. If the incoming data pass all tests without raising any alarm then the call is attributed to the legitimate owner of the PAC. The legitimate usage of the network is then incorporated into the historical user profile (behavior), which will later be used to check new calls. Both the new legitimate and the new fraudulent activity are used to retrain the models that yielded the validation rules. Hence, the system is constantly trained and adapted to new incoming data (i.e. new outgoing call traffic).

5.1. On-line tests

Each block of the above graph is analyzed in the form of a flow chart in Fig. 6 to Fig. 9. Here, the set of rules that are applied sequentially to the data of a new outgoing call, in order to decide whether it can be considered normal or fraudulent, is presented. After the detailed record of the call is written in the CDR database, a query is sent to the personnel database (i.e. a database that contains information about the users) in order to retrieve the PAC owner’s personal data. These are the owner’s working position, his office telephone number (User_EXT), his fax number (User_FAX), and any third telephone set associated with the user (User_EXT2). The data of the current call record being tested are the caller ID (EXT), the called party ID (CalledID), the date and the time. The working position is used to derive the caller’s working hours.

If the call has originated from one of the telephone sets related to the PAC owner during typical working hours then the call is considered normal and the procedure is stopped (Fig. 6). If this is not the case, tests are performed in order to detect any extreme behavior (Fig. 7), i.e. calls after midnight or during public holidays, calls to destinations closely related with fraud, e.g. premium rate calls or calls to “party” lines, etc. The term Zones in Fig. 7 refers to the system that is used in Greece to classify international calls. International destinations are classified in six tariff bands or Zones according to their distance or existing bilateral agreements. Hence, calls to the USA, EU and Balkan countries are characterized as Zone 1, while most of the southern hemisphere countries lie in Zones 5 and 6. The higher the Zone number, the higher the charge per second. Many of the countries in Zones 4, 5 and 6 have been correlated to telecom fraud, mainly due to existing party line and erotic line operators.
After a new outgoing call has passed all the aforementioned tests, it is tested against the rules that were derived from the data mining analysis of known normal and fraudulent cases (Fig. 8).
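
The first two on-line tests described above (origin during working hours, then extreme behavior) might be sketched as follows. This is an illustrative sketch only: the table `WORKING_HOURS`, the function names, the treatment of staff without fixed hours, the weekday restriction for cleaning staff and the 06:00 end of the "after midnight" window are all assumptions; only the administrative, cleaning and security hours and the Zone 4 to 6 observation come from the text.

```python
from datetime import datetime, time

# Hypothetical working-hours table; None means "no fixed hours".
WORKING_HOURS = {
    "administrative": (time(7, 0), time(15, 0)),
    "cleaning": (time(6, 0), time(14, 0)),
    "security": (time(0, 0), time(23, 59, 59)),
    "teaching": None,
}
WEEKDAYS_ONLY = {"administrative", "cleaning"}  # assumption for cleaning staff


def is_normal_by_origin(position, ext, when, owner_exts):
    """First test: owner-related terminal during typical working hours."""
    if ext not in owner_exts:
        return False
    hours = WORKING_HOURS.get(position)
    if hours is None:
        return False      # assumption: no fixed hours, so later tests decide
    if position in WEEKDAYS_ONLY and when.weekday() >= 5:
        return False
    start, end = hours
    return start <= when.time() <= end


def extreme_behavior_flags(when, zone=None, premium=False, holidays=frozenset()):
    """Second test: collect the reasons why a call looks extreme."""
    flags = []
    if time(0, 0) <= when.time() < time(6, 0):   # assumed night window
        flags.append("after midnight")
    if when.date() in holidays:
        flags.append("public holiday")
    if premium:
        flags.append("premium-rate destination")
    if zone is not None and zone >= 4:           # Zones 4 to 6, as in the text
        flags.append("high tariff zone")
    return flags
```

A call for which `is_normal_by_origin` returns True would leave the procedure at once; any non-empty list from `extreme_behavior_flags` would be passed on as a warning or alarm.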

Last, the call is checked for consistency with the user’s historical behavior (user profile) (Fig. 9). The focus of this last test is to identify whether the called number has ever been called from one of the owner’s terminals. This is a critical test because it may identify a destination that has never before been called from the user’s office or fax. If the new destination is called from a terminal unrelated to the legitimate user there is a high probability of fraud. Moreover, if a destination has been repeatedly called from unrelated terminals but never from the legitimate user’s terminal then the presence of fraud is definite.

More tests also examine the usage of the PAC in order to identify PACs that have never been used from the legitimate user’s terminal. This may imply a case of subscription fraud within the organization or some error in the user database. One may have applied for a PAC using a fake ID. In the mid 90’s, when the PAC system was first introduced, there was such a hurry to deliver the PACs to the users, such a need to advertise and test the system, and such a small number of administrative personnel (actually only one) to serve the users, that the procedure of PAC applications was more liberal and relied on the employees’ personal ethics. This approach made contractual fraud feasible and is an example of the security problems that may appear during the first stages of a new system’s introduction. Nowadays, physical presence and an identification card are required to receive the PAC.
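
The consistency test against the user's history can be sketched in a few lines. The function `history_check`, its return labels and the constant `REPEAT_THRESHOLD` are hypothetical; in particular, the text does not state how many calls count as "repeatedly", so the value 3 is an assumption.

```python
REPEAT_THRESHOLD = 3   # assumption: how many past calls count as "repeatedly"


def history_check(called_id, ext, owner_exts, history):
    """Compare a new call with the PAC's calling history.

    `history` maps a called number to a dict {terminal: number_of_past_calls}.
    Returns "consistent", "suspicious" or "fraud".
    """
    past = history.get(called_id, {})
    from_owner = sum(n for t, n in past.items() if t in owner_exts)
    from_others = sum(n for t, n in past.items() if t not in owner_exts)
    if ext in owner_exts or from_owner > 0:
        return "consistent"   # destination is tied to the owner's terminals
    if from_others >= REPEAT_THRESHOLD:
        return "fraud"        # repeatedly called, never from the owner's sets
    return "suspicious"       # destination seen only from unrelated terminals
```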

Fig 6 should be placed about here

Fig 7 should be placed about here

Fig 8 should be placed about here

5.2. Batch tests

The expert system may also be set to perform off-line batch tests during periods of low call traffic. These periods are the summer vacations and the Christmas and Easter holidays, and have been identified through seasonality analysis of the call volume [9]. The expert system may randomly select a PAC, extract the associated detailed account and perform tests on it (Fig. 10). Hence, historical data are analyzed after newer knowledge has been incorporated into the system. During the batch process call data may also be aggregated over longer periods, e.g. a day or a week, and more knowledge on the PBX’s performance or the users’ behavior may be derived.
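
The daily aggregation mentioned above might look like the following sketch. The record keys (`pac`, `date`, `dur`, `units`) are hypothetical and do not reflect the actual CDR layout; Dur is assumed to be the total duration of the day's calls.

```python
from collections import defaultdict


def aggregate_daily(calls):
    """Aggregate call records into per-PAC, per-day features.

    Each call is a dict with the hypothetical keys pac, date, dur (seconds)
    and units (charging units).
    """
    groups = defaultdict(list)
    for c in calls:
        groups[(c["pac"], c["date"])].append(c)
    return {
        key: {
            "Calls": len(group),
            "Dur": sum(c["dur"] for c in group),
            "Units": sum(c["units"] for c in group),
            "MaxDur": max(c["dur"] for c in group),
            "MaxUnits": max(c["units"] for c in group),
        }
        for key, group in groups.items()
    }
```

Days on which a PAC placed no chargeable call simply do not appear in the result.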

Fig 9 should be placed about here

Fig 10 should be placed about here

5.3. The user mobility issue

An additional test checks the user’s mobility within the organization (Fig. 11). As was mentioned earlier, a PAC can unlock any telephone set. There are users, e.g. security or cleaning personnel, who may move anywhere on the Campus. Others, e.g. Department secretaries, are expected to work in their office. Hence, the relation of a user’s working position with his mobility may supply the analyst with interesting clues about the use of the associated PAC. According to the administrator’s expert knowledge, a PAC’s high mobility within the Campus is highly correlated with fraud.

The parameters in Fig. 11 are: CalledNumbers, a vector where the individual called IDs are stored; NewCalledNumber, the called ID in the record that is currently being processed; TimesCalled, a vector where the number of calls to each one of the CalledNumbers is stored; CallsFromExt, the number of calls from the User_EXT; CallsFromFAX, the number of calls from the User_FAX; CallsFromExt2, the number of calls from the User_EXT2; OtherExts, a vector of all the terminals from which calls have been placed, except the user-related terminals; and CallsFromOtherExts, which holds the number of calls from each one of those unrelated terminals.
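
One possible way to maintain the Fig. 11 parameters while a user's account is processed is sketched below. The field names follow the parameters listed above; the class `MobilityCounters` and its methods `record` and `unrelated_share` are hypothetical, and the flow chart's actual decision logic is not reproduced.

```python
from dataclasses import dataclass, field


@dataclass
class MobilityCounters:
    """Counters named after the parameters of Fig. 11."""
    User_EXT: str
    User_FAX: str
    User_EXT2: str
    CalledNumbers: list = field(default_factory=list)
    TimesCalled: list = field(default_factory=list)
    CallsFromExt: int = 0
    CallsFromFAX: int = 0
    CallsFromExt2: int = 0
    OtherExts: list = field(default_factory=list)
    CallsFromOtherExts: list = field(default_factory=list)

    def record(self, ext, NewCalledNumber):
        """Update the counters with one call placed from terminal `ext`."""
        if NewCalledNumber in self.CalledNumbers:
            self.TimesCalled[self.CalledNumbers.index(NewCalledNumber)] += 1
        else:
            self.CalledNumbers.append(NewCalledNumber)
            self.TimesCalled.append(1)

        if ext == self.User_EXT:
            self.CallsFromExt += 1
        elif ext == self.User_FAX:
            self.CallsFromFAX += 1
        elif ext == self.User_EXT2:
            self.CallsFromExt2 += 1
        elif ext in self.OtherExts:
            self.CallsFromOtherExts[self.OtherExts.index(ext)] += 1
        else:
            self.OtherExts.append(ext)
            self.CallsFromOtherExts.append(1)

    def unrelated_share(self):
        """Fraction of all recorded calls placed from unrelated terminals."""
        related = self.CallsFromExt + self.CallsFromFAX + self.CallsFromExt2
        unrelated = sum(self.CallsFromOtherExts)
        total = related + unrelated
        return unrelated / total if total else 0.0
```

The parallel vectors mirror the description in the text; a dictionary keyed by called number or terminal would be the more idiomatic choice in Python.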

Fig 11 should be placed about here

6. Experimental Results

In telecommunications systems, user transactions, and implicitly user behavior, are contained in the Call Detail Record (CDR) of any Private Branch Exchange (PBX). The CDR contains data such as the caller ID, the chargeable duration of the call, the called party ID, the date and the time of the call, etc. [12].

Our experiments are based on real data extracted from a database that holds the CDR records from an organization’s PBX. The data span a period of eight years. Several defrauded user accounts have been identified. All contain examples of both legitimate and fraudulent activity. Fraudulent activity was identified after user complaints that followed high charging.

According to the organization’s charging policy, only calls to national, international and mobile destinations are charged. Since telephone sets are primarily supplied to all personnel in order to help them carry out their everyday work, each employee is given some free call units according to his working position. Any excess calls are charged to the employee.

A field expert examined the detailed accounts and each phone call was marked as either normal or fraudulent. If during a day no fraudulent activity was present then the whole day was marked as normal. If at least one fraudulent call was present then the whole day was marked as fraud. The expert was also interviewed and he described the rules he used in order to classify a phone call as fraudulent or not. His statements were expressed in the form of the IF … THEN rules we use.

In total, 22,000 phone calls were examined. The calls correspond to 5,541 days (2,702 days of legal behavior and 2,839 of fraudulent behavior). The weekly aggregation of these calls for each user is represented by means of the user profile (Fig. 1). This yielded 2,014 vectors (weeks) with 7 variables each. Typical or legal calls constitute 51.1% of the whole data set (class 1) while fraud is the remaining 48.9% (class 2). It is pointed out that the ratio of weeks to days is not 1:7, because the weekly aggregation was performed per user and days with zero activity were omitted. The aforementioned data were analyzed by means of data mining techniques, as in Section 4.1.
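
The day-level labelling described above (a day is marked as fraud as soon as it contains one fraudulent call) can be written down directly. The function name and the input format are hypothetical; the class labels 1 and 2 follow the text.

```python
NORMAL, FRAUD = 1, 2   # class labels as used in the text


def label_days(calls):
    """Label each day from the expert's per-call marks.

    `calls` is an iterable of (day, is_fraudulent) pairs. A day becomes
    FRAUD as soon as one of its calls is fraudulent, otherwise NORMAL.
    """
    labels = {}
    for day, is_fraudulent in calls:
        if is_fraudulent:
            labels[day] = FRAUD
        else:
            labels.setdefault(day, NORMAL)
    return labels
```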
A PAC’s mobility within the organization has been identified as a critical element in fraud identification. Two real cases of user behavior are presented as examples in order to clarify the aforementioned comments. Let A be a user’s phone number, B his fax number and C a third terminal related to him (e.g. a lab phone). From the user’s detailed account one can find all the terminals from which his PAC has been used and express them as a vector, e.g. ext1={A, B, C, D, E, F, G}. A related vector may be the number of calls that were placed from each one of them, e.g. calls1={1216, 250, 0, 3, 15, 1, 1}. If the numbers in the last vector are expressed as percentages, i.e. calls1_perc={82%, 17%, 0%, 0.2%, 1%, 0.07%, 0.07%}, one can see that most of the calls (~99%) in the example originate from user-related terminals, while only a small percentage originate from unrelated ones. The aforementioned example is actually a real case of a PAC’s usage over eight years. This small unrelated usage was identified as calls from colleagues’ offices.

The following example is based on a defrauded user account. The derived vectors are ext2={A1, B1, C1, D1, E1, F1, G1}, calls2={422, 54, 0, 17, 1125, 31, 129} and calls2_perc={23.73%, 3.03%, 0%, 0.95%, 63.27%, 1.74%, 7.25%}. It is obvious that only 27% of the “user’s” calls originated from user-related terminals. In the absence of additional fraud evidence the NOC (Network Operations Center) may examine the probability that terminal E1 is somehow related to the user or, even better, authorized NOC personnel may contact the user and ask him if he can identify the called destinations and E1.

The output of mobility tests on user accounts is presented in Table 1. Both fraud and normal cases are present in the table. The column Identified as Fraud shows how each case was characterized by the expert system, while the column Fraud shows the human expert’s characterization. YES denotes a fraud case and NO a normal one. The Velocity Trap Alarm column shows the number of concurrent call identifications. Observe that within the specific network a positive velocity trap is not a firm clue about fraud (rows 12 and 16). As described earlier, there are cases where a user sends a multipage fax message and concurrently places a telephone call. Hence, the expert system was programmed to ignore velocity trap alarms when both calls come from user-related terminals. There is always a chance for something to go wrong, but there are other rules to cross-check such cases.

Regarding false positive alarms of the system, one can also comment on case 13 (row 13, Table 1). The expert system identified a PAC which was almost never used from the legitimate owner’s terminal. After on-site investigation it was concluded that the user had moved to a nearby office without notifying the NOC.
So, he started using the telephone set placed in the new office, which triggered the alarm.

The column Different destinations called from unrelated terminals shows the number of distinct destinations called from terminals that are completely unrelated to the legitimate user. Among them, case 4 stands out: the user seems to have an amazingly broad circle of colleagues and friends. No one ever complained about excess billing and the fraudster went undetected for a very long period.

Table 1 should be placed about here

After close examination of the elements in Table 1 one can conclude that fraud cases are closely correlated to high user mobility. As an example one can see cases 2 and 4 where more than 75% of the calls were made from terminals unrelated to the legitimate user. The exact opposite is case 20 where all calls were placed from just one terminal. User 16 shows some mobility, which can be attributed to visits to a colleague’s office for co-operation. This was also confirmed by a human expert analysis of the user’s detailed account.
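
The share of calls placed from user-related terminals, as used in the two worked examples earlier in this section, can be checked with a few lines of Python. The function name is hypothetical, and the convention that the first three entries of each vector are the user-related terminals (A, B, C) is taken from those examples.

```python
def related_share(calls_per_terminal, n_related=3):
    """Fraction of calls placed from the first n_related (user-related) terminals."""
    total = sum(calls_per_terminal)
    if total == 0:
        return 0.0
    return sum(calls_per_terminal[:n_related]) / total


calls1 = [1216, 250, 0, 3, 15, 1, 1]        # legitimate account from the text
calls2 = [422, 54, 0, 17, 1125, 31, 129]    # defrauded account from the text

share1 = related_share(calls1)   # about 0.99, as stated in the text
share2 = related_share(calls2)   # about 0.27, as stated in the text
```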

7. Conclusions and Discussion

In the present paper an expert system is presented which was built in order to detect fraudulent activity in the telecommunications network of a large organization. Prior to the expert system’s integration with the organization’s CDR databases, calls were examined manually and only after a user’s request. Users’ requests usually followed excess billing. The expert system incorporates the network administrator’s knowledge along with common-sense observations and knowledge derived from the application of data mining techniques on historical call data. The knowledge is expressed in the form of rules that are described in the paper.

When this work was initiated there were already many historical data that had never been examined thoroughly. Several user accounts were selected for examination. Some of them were known defrauded accounts, others were known examples of normal use, while some were selected randomly. The analysis of these user accounts gave interesting clues on how telecommunications fraud is perpetrated within the organization’s network.

The expert system was designed to adapt to new data and is programmed to incorporate new rules. When a new fraud case is detected or reported, all the data related to it are analyzed and the outcome is expressed in the form of new rules that are fed back to the system. Appropriate adjustment of existing rules may also be performed.

Due to the organization’s profile (a university) there used to be a liberal approach to how telecom services were provided to the personnel. After the analysis of the problem, the first measure against fraud was the adoption of a stricter policy. Now, an employee is given a PAC only if he applies for one.

The PAC has limited capabilities, e.g. it can call international destinations only after the explicit request of its owner. Premium rate destinations cannot be reached from the organization’s intranet, especially destinations related to auctions, party lines, erotic lines, etc.

In some cases one must have access to private data in order to analyze a user’s detailed account thoroughly. A representative example is a professor who gives his PAC to a graduate student so that the student can help him with the administrative tasks of a forthcoming conference. The terminal from which the calls originate will probably be in the professor’s laboratory, and in this sense it is correlated with him. Even in the extreme case where the student takes advantage of the professor’s trust, it is difficult to diagnose fraud. A more thorough analysis would need details about the called numbers (e.g. the called party’s relation to the PAC’s owner), which is a direct violation of personal privacy. Under these circumstances only the PAC’s owner may authorize the analysis, and it is desirable that he assist throughout the data analysis process.

The human factor is often reported as the weak link in many security installations, and the telecommunications network of the organization under study is no exception. The main reason that fraud can be perpetrated in the closed environment of the organization is the end users’ inattention. Users tend to write their personal authorization codes (PACs) on notes that they leave where anyone can reach them; notes stuck under the telephone set were common. Others reveal their PAC to colleagues, students or friends in order to help them carry out a job. Hence, besides fraud detection techniques and a strict network policy, it is of great importance to educate users on security issues. This will at least protect them against their own carelessness and a fraudster’s social engineering approach.

Efficient detection and limitation of fraud require a centralized fraud management system. The system should administer the maintenance and integrity of all security infrastructures and should be equipped with the ability to act not only against fraudulent actions but against fraudsters as well. There is also a need for a sophisticated data collection system whose maintenance and configuration are easy to implement. The easy and economical integration of new technologies and products is a main concern, while the ability to interchange extracted knowledge with similar systems is a desirable feature. Finally, all incidents should be examined and presented to the system’s administrator in near real-time in order to activate counter-measures and limit the effect of fraud.
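
The restricted PAC policy described earlier in this section (international destinations only after the owner's explicit request, premium rate destinations never) can be written as a simple check. This is a hedged sketch only: the `Pac` record, the `Destination` classes and the function `is_call_allowed` are hypothetical names introduced here, and the paper does not describe how the organization's exchange actually enforces the policy.

```python
from dataclasses import dataclass
from enum import Enum


class Destination(Enum):
    """Coarse destination classes (assumed for illustration)."""
    INTERNAL = "internal"
    NATIONAL = "national"
    INTERNATIONAL = "international"
    PREMIUM_RATE = "premium_rate"


@dataclass(frozen=True)
class Pac:
    """Hypothetical record for a personal authorization code (PAC)."""
    owner: str
    # Switched on only after the owner's explicit request.
    international_enabled: bool = False


def is_call_allowed(pac: Pac, destination: Destination) -> bool:
    """Apply the two restrictions stated in the text."""
    if destination is Destination.PREMIUM_RATE:
        return False  # premium rate destinations are never reachable
    if destination is Destination.INTERNATIONAL:
        return pac.international_enabled  # opt-in per PAC
    return True  # other destination classes are not restricted in this sketch


if __name__ == "__main__":
    default_pac = Pac(owner="employee_a")
    opted_in_pac = Pac(owner="employee_b", international_enabled=True)
    print(is_call_allowed(default_pac, Destination.INTERNATIONAL))
    print(is_call_allowed(opted_in_pac, Destination.INTERNATIONAL))
    print(is_call_allowed(opted_in_pac, Destination.PREMIUM_RATE))
```

Keeping the restriction as a per-PAC flag mirrors the opt-in nature of the policy: the default is the most restrictive setting, and a capability is added only when its owner asks for it.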

Future research will focus on the examination of user mobility profiles by means of machine learning techniques. This analysis can be easily extended to study how different login locations may imply fraud in any information system. It is also interesting to study the correlation of user mobility and location with fraud in cellular systems. Social network analysis may also be an interesting approach to fraud detection problems as long as the user privacy issue is solved.

Acknowledgements

The author would like to thank the staff of the Telecommunications Center of the Aristotle University of Thessaloniki, Greece, for their contribution of data. The author would also like to thank Dr. Sotirios Goudos, Administrator of the AUTH Telecommunications Network, for his invaluable comments on a draft of the paper. This work was supported in part by the Research Committee of the Technological Educational Institute of Serres, Greece.

References

1. Adomavicius, G., and Tuzhilin, A. “User profiling in personalization applications through rule discovery and validation”. ACM KDD-99, San Diego, CA, USA. 1999. pp 377–381.
2. Alves, R. et al.: Discovering telecom fraud situations through mining anomalous behavior patterns. In KDD 2006 Workshop on Data Mining for Business Applications, Philadelphia, USA (2006).
3. Belhadji El Bachir, and Dionne G. “Development of an Expert System for the Automatic Detection of Automobile Insurance Fraud”. Working Paper 97-06 ISSN: 1206-3304, August 1997.
4. Estevez, Pablo A., Claudio M. Held, and Claudio A. Perez. Subscription fraud prevention in telecommunications using fuzzy rules and neural networks. Expert Systems with Applications, 31 (2006) 337–344.
5. Fawcett, T. and F. Provost: Adaptive fraud detection. Journal of Data Mining and Knowledge Discovery, vol. 1, no. 3, (1997), 291–316.
6. Gosset P. and M. Hyland: Classification, detection and prosecution of fraud in mobile networks, In Proc. of ACTS Mobile Summit, Sorrento, Italy, (1999).
7. Greek Law 2472/1997. Protection of individuals with regard to the processing of personal data. Hellenic Data Protection Authority site, www.dpa.gr. [Last access May 25, 2008].
8. Hilas, C. S., and John N. Sahalos, “User profiling for fraud detection in telecommunication networks”, 5th International Conference on Technology and Automation, Thessaloniki, Greece, October 2005. pp 382–387.
9. Hilas, C. S., S. K. Goudos, and J. N. Sahalos, “Seasonal decomposition and forecasting of telecommunication data: A comparative case study.” Technological Forecasting & Social Change, vol. 73, 5, June 2006, pp 495–509.
10. Hilas, C. S., and Sahalos, J.: Testing the fraud detection ability of different user profiles by means of FF-NN classifiers. In S. Kollias et al. (Eds.): ICANN 2006, Part II, LNCS 4132, Springer-Verlag Berlin Heidelberg, (2006), 872–883.
11. Hilas, C. S. and J. N. Sahalos. “An application of decision trees for rule extraction towards telecommunications fraud detection”. In B. Apolloni et al. (Eds.): KES 2007/WIRN 2007, Lecture Notes in Artificial Intelligence, vol. 4693, Part II, Berlin-Heidelberg: Springer-Verlag, 2007, pp. 1112–1121.
12. Hinde, S. F.: Call Record Analysis. Making Life Easier - Network Design and Management Tools (Digest No: 1996/217), IEE Colloquium on, (1996) 8/1–8/4.
13. Hoath P: Telecom fraud, gory details. Computer Fraud & Security, (1998), 10–14.
14. Jackson, K. A., DuBois, D. H., and Stallings, C. A. “An Expert System Application for Network Intrusion Detection”. 14th National Computer Security Conference, National Institute of Standards and Technology/National Computer Security Center, Washington, DC, October 1991. pp. 215–225.
15. Kokinnaki, A. I. “On atypical database transactions: Identification of probable frauds using machine learning for user profiling”. Proceedings of the IEEE Knowledge & Data Engineering Exchange Workshop, KDEX, 1997, pp 107–113.
16. Liao S. H. “Expert system methodologies and applications – a decade review from 1995 to 2004”. Expert Systems with Applications, vol. 28. 2005. pp 93–103.
17. Moreau, Y., and J. Vandewalle: Detection of mobile phone fraud using supervised neural networks: A first prototype. In Proc. Int. Conf. on Artificial Neural Networks - ICANN’97, (1997), 1065–1070.
18. Rosset, S., U. Murad, E. Neumann, Y. Idan, and G. Pinkas: Discovery of fraud rules in telecommunications – Challenges and solutions. In S. Chaudhuri and D. Madigan (Eds.), Proc. 5th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, San Diego, CA, USA, (1999), 409–413.
19. Sebring, M. M., Shellhouse, E., Hanna, M. E., and Whitehurst, R. A. “Expert systems in intrusion detection: A case study”. 11th National Computer Security Conference, National Institute of Standards and Technology/National Computer Security Center, Baltimore, MD, October 1988. pp. 74–81.
20. Shin-Yuan Hung, David C. Yen, Hsiu-Yu Wang. Applying data mining to telecom churn management. Expert Systems with Applications, 31 (2006) 515–524.
21. Taniguchi, M., M. Haft, J. Hollmen, and V. Tresp: Fraud detection in communication networks using neural and probabilistic methods. In Proc. of the 1998 IEEE Int. Conf. in Acoustics, Speech and Signal Processing ICASSP’98, Volume II, (1998), 1241–1244.
22. Wang Dong et al.: A feature extraction method for fraud detection in mobile communication networks. In Proc. 5th World Cong. On Intelligent Control and Automation, Hangzhou, China, (2004), 1853–1856.
23. Wei Chih-Ping and Chiu I-Tang, Turning telecommunications call details to churn prediction: a data mining approach. Expert Systems with Applications 23 (2002) 103–112.
24. Witten Ian H., and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques (Second Edition), Morgan Kaufmann, June 2005.

Figure Captions

Fig. 1. The basic vector for the weekly user behavior representation
Fig. 2. The basic vector for the daily user behavior representation
Fig. 3. Decision tree for the daily representation of the users
Fig. 4. Decision tree for the weekly representation of the users
Fig. 5. Expert System Structure
Fig. 6. Use of user demographic data for new call checking
Fig. 7. Extreme behavior tests
Fig. 8. Compare calls with the previously learned rules
Fig. 9. Compare new call with user’s historical profile
Fig. 10. Batch processing of historical user data
Fig. 11. User mobility check (number of different terminals and frequency of use)

Fig. 1. The basic vector for the weekly user behavior representation

Fig. 2. The basic vector for the daily user behavior representation

Fig. 3. Decision tree for the weekly representation of the users

Fig. 4. Decision tree for the daily representation of the users

Fig. 5. Expert System Structure

Fig. 6. Use of user demographic data for new call checking

Fig. 7. Extreme behavior tests

Fig. 8. Compare calls with the previously learned rules

Fig. 9. Compare new call with user’s historical profile

Fig. 10. Batch processing of historical user data

Fig. 11. User mobility check (number of different terminals and frequency of use)

Table 1. A part of the expert system’s output that is related to user mobility (22 users for eight years)

| Case | Number of calls | No. of calls from user’s Ext and Fax | No. of other terminals used | No. of calls from other terminals | Different destinations called from unrelated terminals | Percentage of calls from Ext & Fax (%) | Percentage of calls from unrelated terminals (%) | Identified as fraud by the system | Fraud | Velocity Trap Alarms |
|------|------|------|------|------|------|------|------|------|------|------|
| 1 | 331 | 172 | 10 | 159 | 33 | 51.96 | 48.04 | YES | YES | 1 |
| 2 | 616 | 145 | 3 | 471 | 64 | 23.54 | 76.46 | YES | YES | 1 |
| 3 | 345 | 172 | 11 | 173 | 87 | 49.86 | 50.14 | YES | YES | 0 |
| 4 | 7482 | 1647 | 16 | 5835 | 1280 | 22.01 | 77.99 | YES | YES | 22 |
| 5 | 204 | 0 | 10 | 204 | 46 | 0.00 | 100.00 | YES | YES | 1 |
| 6 | 3313 | 2099 | 29 | 1214 | 702 | 63.36 | 36.64 | YES | YES | 0 |
| 7 | 1303 | 257 | 23 | 1046 | 203 | 19.72 | 80.28 | YES | YES | 0 |
| 8 | 2466 | 76 | 19 | 2390 | 433 | 3.08 | 96.92 | YES | YES | 0 |
| 9 | 2921 | 143 | 7 | 2778 | 290 | 4.90 | 95.10 | YES | YES | 0 |
| 10 | 386 | 206 | 9 | 180 | 64 | 53.37 | 46.63 | YES | YES | 0 |
| 11 | 1599 | 1099 | 6 | 500 | 48 | 68.73 | 31.27 | NO | NO | 0 |
| 12 | 374 | 364 | 6 | 10 | 21 | 97.33 | 2.67 | YES | NO | 2 |
| 13 | 34 | 3 | 9 | 31 | 16 | 8.82 | 91.18 | YES | NO | 0 |
| 14 | 348 | 344 | 3 | 4 | 1 | 98.85 | 1.15 | NO | NO | 0 |
| 15 | 161 | 160 | 1 | 1 | 0 | 99.38 | 0.62 | NO | NO | 0 |
| 16 | 1502 | 1466 | 10 | 36 | 142 | 97.60 | 2.40 | YES | NO | 2 |
| 17 | 758 | 637 | 12 | 121 | 11 | 84.04 | 15.96 | NO | NO | 0 |
| 18 | 483 | 463 | 2 | 20 | 12 | 95.86 | 4.14 | NO | NO | 0 |
| 19 | 2025 | 1893 | 3 | 132 | 706 | 93.48 | 6.52 | NO | NO | 0 |
| 20 | 533 | 533 | 0 | 0 | 0 | 100.00 | 0.00 | NO | NO | 0 |
| 21 | 55 | 55 | 1 | 0 | 30 | 100.00 | 0.00 | NO | NO | 0 |
| 22 | 52 | 52 | 0 | 0 | 0 | 100.00 | 0.00 | NO | NO | 0 |
