Establishing Fraud Detection Patterns Based on Signatures

Pedro Ferreira¹, Ronnie Alves¹, Orlando Belo¹ and Luís Cortesão²

¹ University of Minho, Department of Informatics, Campus of Gualtar, 4710-057 Braga, Portugal
{pedrogabriel,ronnie,obelo}@di.uminho.pt
² Portugal Telecom Inovação, SA, Rua Eng. José Ferreira Pinto Basto, 3810-106 Aveiro, Portugal
[email protected]

Abstract. All over the world, the usage of telecommunication systems has been increasing significantly. Day after day, people face strong marketing campaigns seeking their attention for new telecommunication products and services, as telecommunication companies struggle in a highly competitive business arena. These efforts appear to have paid off: customers are strongly adopting the new trends and systematically use (and abuse) communication services in their daily life. Although fraud situations are rare, they are increasing, and they correspond to a large amount of money that telecommunication companies lose every year. In this work, we studied the problem of fraud detection in telecommunication systems, especially cases of superimposed fraud, providing an anomaly detection technique supported by a signature schema. Our main goal is to detect deviant behaviors in a timely manner, giving fraud analysts a better basis to be more accurate in their decisions when establishing potential fraud situations.

1 Introduction

Today, communication is a common act of living. Recent telecommunications market analyses show that companies have been performing very well, especially in the area of new products and services. Telecommunications companies have been continuously and significantly improving their business incomes and extending their influence in the market. However, some studies show that telecommunication companies lose large amounts of money every year due to a large diversity of fraudulent cases. Because fraud is continuously evolving and telecommunications networks generate huge amounts of data (sometimes of the order of several gigabytes per day), the detection and identification of fraud cases is extremely hard and costly, demanding huge amounts of resources (human and material) to fight it. Essentially, two main types of fraud can be distinguished [19]: subscription and superimposition fraud. In the former, the fraudsters, using fake identification, create a new account without any intention of paying for the used services. Typically, these cases reveal intensive high usage


right from the beginning. In the latter, the fraudsters make illegitimate use of a legitimate account by different means. In this case, some abnormal usage is blurred into the characteristic usage of the account. This type of fraud is usually more difficult to detect and poses a bigger challenge to the telecommunications companies. Since the 1990s, some telecommunications companies have used several kinds of approaches based on statistical analysis and heuristic methods to assist them in the detection and categorization of fraud situations. Additionally, some of them have adopted data mining and knowledge discovery techniques. Telecommunications scenarios pose big challenges to traditional data mining techniques. Here we emphasize three of these challenges.

1) The abstraction level of the analysis. Fraud analysts are typically interested in the customer behavior and not in the call details. For each call, telecommunication systems generate a record - the call detail record (CDR) - that has enough information to completely describe the call. However, a CDR is not by itself enough to detect a fraud situation: we are interested in studying the customer behavior and not individual phone calls. Thus, based on CDRs, we must use some kind of profiling technique in order to reveal, with a certain accuracy, the customer behavior over time. Signature records that include a large diversity of features, such as the number of calls, the average call duration, the average number of calls received, and so on, can be used to establish customer profiles. Additionally, customer data (age, job, location, price plan, and so on), which is of critical importance in this analysis, can also be used in the profile construction. Therefore, we can distinguish three levels of data [18]: call, behavior, and client.

2) Inappropriateness of data for supervised techniques. Data mining techniques are more suitable to work only on the last two levels of data, and, typically, they can be divided into two categories: supervised and unsupervised learning. In supervised techniques there is feedback to the system, since the inputs and the respective outputs are known; in this case every instance in the data has a predefined class assigned. In unsupervised techniques the system has no hints on how to find the correct answer, since no a priori discrimination of the data exists. From the fraud detection point of view, where the goal is to discriminate between normal and fraudulent users, supervised techniques seem more appropriate to the problem. Nevertheless, for several reasons, like the inexistence of previously known fraud cases or the imbalance of the data (fraud occurs in a relatively small number of cases) [18], the direct application of supervised techniques is not always possible.

3) The need for real-time, or almost real-time, update of the detection system's information, due to the high costs associated with fraud.

In order to capture the characteristics of a user's behavior, the concept of signature can be applied. This concept has already been used successfully for anomaly detection in many areas, like credit card usage [11], network intrusion [13, 11] and, in particular, telecommunications fraud [3, 21, 1, 5]. A signature corresponds to a set of information that captures the typical behavior of a user, for example, the average number of calls, the time of the calls, the area where the calls are made, and so on. Thus, if at a given moment a user deviates from what is


its typical behavior, as expressed by its signature, that can be a motive to trigger an alarm for further analysis of that user. In fraud and intrusion detection systems, signatures can be used in two distinct ways:

– Detection based on User Profiles - The signature of the user is compared against a database of cases of known illegitimate use. This kind of method falls under the class of supervised learning techniques.

– Detection based on Signatures - The user's signature is used as a comparison basis. A significant difference between the current behavior of the user and its signature may reveal an anomalous situation.

In this paper we tackle the problem of superimposed fraud detection in telecommunication systems. We propose an anomaly detection technique based on the concept of signature. Our goal is to detect deviant behaviors in a timely manner, giving analysts a better basis to be more accurate in their decisions when establishing potential fraud situations. In the following sections, we describe the signature-based detection models and algorithms developed, as well as the current functional architecture of the proposed system.

2 Detecting Fraud Situations Based on Signatures

The core concept of our technique is the notion of a signature. We emphasize the work of Cortes and Pregibon [5], since it was the main inspiration for our use of signatures. In this section, we start by presenting our own definition of a signature. Next, we present all its relevant elements and the theoretical background that allows us to compute statistically based distances between signatures. Finally, we explain how signatures are managed (initialized and updated).

2.1 Definition of Signature

A signature of a user corresponds to a vector of feature variables whose values are determined during a certain period of time. A variable can be simple, if it consists of a single atomic value (e.g., an integer or a real number), or complex, if it consists of two co-dependent statistical values, typically the average and the standard deviation of a given feature. A signature S is then obtained from a function ϕ for a given temporal window w, where S = ϕ(w). We define a time unit as the amount of time during which CDRs are accumulated, being processed at the end of this period. The value of w is proportional to the time unit, w = α × ∆t. For example, if we consider a ∆t of one day, we will have α = 7 for a temporal window of one week. In figure 1 we illustrate the evolution of a signature through time. S corresponds to the initial value of the user signature. After a shift of one unit of time, the signature S is updated to S′, according to the new usage information (the CDRs that occur between the end of S and S′).

Fig. 1. Evolution of a signature through time.

For a given set of CDRs (the shaded area in figure 1) observed during a time unit ∆t, a comparison against the most recent value of the signature can be made in order to detect deviating behaviors. Since this information is processed to summarize the user behavior in a certain time period, we denote it as a summary; the reason for this denomination will become clearer in the next sections. The described type of processing is time oriented, since the set of user actions is accumulated, kept, and processed during the time unit for posterior analysis. Alternatively, action-oriented processing makes a direct comparison of each new action (CDR) against the signature. A signature can be updated according to either of these two modes. In [5] it is argued that the most adequate model for updating is the action-oriented one, mainly due to the high costs associated with fraud, which require a constant (per-call) update of the signature. In this work we chose the time-oriented mode for signature updating, because of the high processing cost of the per-call operation: as we will see in the following sections, signature processing requires the analysis of massive volumes of data. Since the time unit can be kept fairly small (typically one day or less), a reasonable trade-off between processing cost and up-to-date information is achieved.
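As a minimal sketch of the time-oriented mode, the function below accumulates one time unit of CDRs into per-user aggregates that can later be compared against the signatures; all names and CDR fields here are our own illustration, not the paper's implementation.

    from collections import defaultdict

    def build_summaries(cdrs):
        """Aggregate one time unit (e.g., one day) of CDRs per user.

        Each CDR is assumed to be a dict carrying at least 'user',
        'duration' and 'international' fields; real CDRs have many
        more coded fields (see table 1 for the features actually used).
        """
        acc = defaultdict(lambda: {"n_calls": 0, "total_duration": 0.0, "intl_calls": 0})
        for cdr in cdrs:
            a = acc[cdr["user"]]
            a["n_calls"] += 1
            a["total_duration"] += cdr["duration"]
            a["intl_calls"] += 1 if cdr["international"] else 0
        return acc  # one summary per user for this time unit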

2.2 Elements of a Signature

Each of the signature feature variables is obtained directly from coded fields of one or more CDRs. These feature variables correspond to statistical values that describe a certain aspect of the user behavior. Both a signature and a summary correspond to the set of all the variables; the main difference resides in the time window that they summarize. In order to capture the user behavior in different situations, a signature reflects a longer time window, for example a week, a month, or even a half-year period. On the other hand, for the reasons already pointed out, a summary reflects a much smaller time period, for example an hour, half a day, or a complete day. Our proposed model contemplates simple and complex variables: a simple variable corresponds to an average value, and a complex variable to the average and standard deviation of a certain feature. In table 1 we list the feature variables and their respective types.

Description                                           Type
Duration of Calls                                     Complex
Number of Calls - Working Days                        Complex
Number of Calls - Weekends and Holidays               Complex
Number of Calls - Working Time (8h-20h)               Complex
Number of Calls - Night Time (20h-8h)                 Complex
Number of Calls to the Different National Networks*   Simple
Number of Calls as Caller (Origin)                    Simple
Number of Calls as Called (Destination)               Simple
Number of International Calls                         Simple
Number of Calls as Caller in Roaming                  Simple
Number of Calls as Called in Roaming                  Simple

Table 1. Description of the feature variables used in signatures and summaries, and their respective types. *Currently in Portugal there are three wireless telecommunications companies and one major fixed-line telecommunications company.

The choice of the type of each variable depends on several factors, such as the complexity of the feature described or the data available to perform the calculation. A feature like the duration of calls shows significant variability, which is much better expressed through an average/standard-deviation pair. A feature like the number of international calls is typically much less frequent, and thus an average value is sufficient to describe it.
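A minimal sketch of how the two kinds of variables could be represented; the class and field names are hypothetical, not taken from the paper.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class FeatureVariable:
        """One entry of a signature or summary vector (see table 1).

        A simple variable stores only an average; a complex variable
        also stores the standard deviation of the feature.
        """
        name: str
        mean: float
        std: Optional[float] = None  # None marks a simple variable

        @property
        def is_complex(self) -> bool:
            return self.std is not None

    # Example: call duration is complex, international calls are simple.
    signature = [
        FeatureVariable("call_duration", mean=120.0, std=45.0),
        FeatureVariable("intl_calls", mean=0.3),
    ]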

2.3 Anomaly Detection

Given a set of CDRs, C, we would like to know if, during the corresponding period of time, the user deviated from its typical behavior. First, this information needs to be processed. The processing of C, PC, basically consists in extracting from C the set of feature variables described in table 1. Once this step is performed, we have two vectors of feature variables available for comparison: S (the signature) and PC. For the determination of the distance between these two vectors, the usual distance functions, like the Euclidean distance, cannot be applied, since the vectors contain complex variables. Besides, we would like to look at the problem from a probabilistic point of view, i.e., the distance measure corresponds to some probabilistic value of PC being different from S. Since the features in the signature have different types, each variable has to be evaluated by a distinct sub-function. Thus, the dist function is composed of several sub-functions: dist = φ(f1, f2, ..., fn). Next, we present through a semi-formal example the details of our distance function. Consider a simplification of a signature S = {(µa, σa); µb; µc; (µd, σd)}, where the first and the last feature variables are complex and the second and the third are simple. Let PC = {(µ′a, σ′a); µ′b; µ′c; (µ′d, σ′d)} be a vector of variables from a period ∆t already processed. The proposed distance function can be written as:

dist(S, C) = α1 · f1(S.a, C.a) + α2 · f2(S.b, C.b) + α3 · f3(S.c, C.c) + α4 · f4(S.d, C.d)   (1)


Formula 1 is a linear combination of the distances observed for each of the feature variables. The constants αi are weighting factors for the variables, and they express the importance given to each feature when determining an anomalous deviation. These values are provided by the fraud analyst, since he or she may want to observe different fraud situations. Different distance functions can be obtained by setting the weighting factors αi to different values. This way, the overall distance function is defined as in formula 2.

Dist(S, C) = max{dist1(S, C), dist2(S, C), ..., distm(S, C)}   (2)

The main point of using a distance function is that if the distance between S and C exceeds a certain threshold ξ defined by the analyst, i.e., Dist(S, C) > ξ, then an alarm is raised for further inspection. Otherwise, the user is considered to be within its expected behavior.
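A sketch of formulas 1 and 2 together with the threshold test; the per-feature sub-functions fi are the ones defined in section 2.4, and all other names are illustrative.

    def dist(signature, summary, weights, sub_fns):
        # Formula 1: weighted linear combination of per-feature distances.
        return sum(a * f(s, c)
                   for a, f, s, c in zip(weights, sub_fns, signature, summary))

    def overall_dist(signature, summary, weight_profiles, sub_fns):
        # Formula 2: maximum over the analyst-defined weighting profiles.
        return max(dist(signature, summary, w, sub_fns) for w in weight_profiles)

    def is_anomalous(signature, summary, weight_profiles, sub_fns, xi):
        # An alarm is raised when Dist(S, C) exceeds the analyst's threshold.
        return overall_dist(signature, summary, weight_profiles, sub_fns) > xi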

2.4 Distance Between Feature Variables

From a statistical point of view, it is frequently acceptable that many random variables follow distributions that can be appropriately described by a normal distribution, once µ and σ are specified [16]. The normal distribution gives a reasonable approximation to many variables that occur in real-world situations. Accordingly, we suggest an adaptation of the normal distribution function to measure the distance between complex feature variables. For a given variable X, where X ∼ N(µ, σ), the Z-score function provides the likelihood of X taking the value x, P(X = x) = P(Z = (x − µ)/σ). In our particular case, we want to measure, for a feature variable X taking a value x, the distance from the typical behavior, i.e., the average value. The Z-score function provides a larger likelihood as the value of X tends to µ, being maximal when X = µ. Since we are measuring a distance, we want our distance function to return a value that is inversely proportional to the likelihood of X taking the value µ. For that, we only need to subtract our likelihood value P from the total probability mass, which is one: fNormal = 1 − P. With this formula, values of X distant from µ yield a larger value of fNormal. Considering the example of the last section, f1 and f4 correspond to fNormal, where µa and σa are the parameters that describe the normal distribution of feature a, and µ′a is the value being evaluated.³ To measure the distance between simple feature variables, we could use a simple difference or any other distribution-based measure. We propose the use of the non-cumulative Poisson distribution [16, 17, 22]. This distribution has its most important application in counting the number of events that occur in a certain time interval or spatial region, when the events are independent of each other. The probability mass function of a Poisson variable is given by formula 3. The constant e is Napier's number, λ is the expected value, which in our case corresponds to the average value described by the signature, and k is the observed value.

³ The value of σa will only be considered for the updating of the signature.

P(N = k) = (e^{−λ} λ^k) / k!   (3)

The probabilistic distance between the observed value k and the expected value λ of a variable X is then given by fPoisson = dist(λ, k) = |P(X = λ) − P(X = k)| / N, where N is a normalizing factor. Since the Poisson function is non-symmetric and only defined for values greater than or equal to zero, N = P(X = λ) − P(X = ∞) ≈ P(X = λ) if X > λ, and N = P(X = λ) − P(X = 0) if 0 ≤ X < λ.
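A sketch of the two sub-functions under stated assumptions: the paper leaves the exact normal likelihood unspecified, so we read P as the two-sided tail probability of the Z-score, and we round λ to the nearest integer to evaluate P(X = λ); both readings are our own, treat the details as illustrative.

    import math

    def f_normal(mu, sigma, x):
        """Distance for a complex variable: 1 minus the likelihood of x.

        With the two-sided tail reading, the result is 0 at x = mu and
        approaches 1 as x moves away from the mean.
        """
        if sigma == 0:
            return 0.0 if x == mu else 1.0
        z = abs(x - mu) / sigma
        return 1.0 - math.erfc(z / math.sqrt(2))  # erfc gives the tail mass

    def poisson_pmf(lam, k):
        # Stable evaluation of formula 3 via logarithms.
        if lam <= 0:
            return 1.0 if k == 0 else 0.0
        return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

    def f_poisson(lam, k):
        """Normalized probabilistic distance for a simple variable."""
        p_lam = poisson_pmf(lam, round(lam))  # assumption: evaluate at round(lam)
        p_k = poisson_pmf(lam, k)
        # The normalization depends on which side of lambda the observation falls.
        n = p_lam if k > lam else p_lam - poisson_pmf(lam, 0)
        return abs(p_lam - p_k) / n if n > 0 else 0.0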

2.5 Signature Updating

Before describing how the update of a signature is performed, we note that the initialization of a signature is a straightforward process: the initial signature S0 corresponds to a summary for the period of the initial time window w0. As already mentioned, the update can be performed in a time-oriented or an action-oriented mode; the mode chosen for this work was the former. In either case, it is necessary to weight the impact of the new action, or set of actions, on the new signature values. Following the ideas of [5, 2], the update of a signature St at instant t + 1, St+1, through a set of processed CDRs PC is given by formula 4.

St+1 = β · St + (1 − β) · PC   (4)

The constant β indicates the weight of the new actions C in the values of the new signature. Depending on the size of the time window w, this constant can be adjusted. In [5] it is noted that a daily update with β = 0.85 accounts for the information of the last 30 days, while with β = 0.5 only the last 7 days are considered in the signature values. This constant can always be tuned by the fraud analyst. In our system, in contrast to the system in [5], the value of the signature is always updated. If Dist(St, C) ≤ ξ, the user is considered to have a normal behavior. If Dist(St, C) > ξ, an alarm is triggered, but the signature continues to be constantly updated. The reason for this is that the alarm still needs to pass through the analysis of the company's fraud expert: the analyst may consider it a false alarm, with the user behavior falling within some expected behavior. The continuous update of that user's signature avoids the loss of the information gathered between the moment the alarm was triggered and the moment the analyst gives a verdict.
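A direct transcription of formula 4, applied element-wise to the feature vectors; the function name is ours.

    def update_signature(signature, summary, beta=0.85):
        """Formula 4: S_{t+1} = beta * S_t + (1 - beta) * P_C, element-wise.

        A beta of 0.85 with daily updates keeps roughly 30 days of memory;
        beta = 0.5 keeps roughly the last 7 days [5].
        """
        return [beta * s + (1.0 - beta) * p for s, p in zip(signature, summary)]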

3 Model Behavior

In the next two subsections we describe how the signature and summary information is managed throughout the entire system.

input: SummList (list of new summaries)

/* Compare each summary against the respective signature and detect anomalous behaviors */
 1  foreach Summ in SummList do
 2      userId = getUserId(Summ);
 3      signId = getSignId(userId);
 4      if signatureIsActive(signId) == TRUE then
 5          w = createWindowTimeFrame();
 6          Sign = loadSignature(signId);
 7          if Dist(Sign, Summ) ≤ ξ then
 8              updateSignature(Sign, Summ, w);
 9          else
10              updateSignature(Sign, Summ, w);
11              triggerAlarm(userId);
12              clientToQuarantine(userId);
13          end
14      end
15  end

Algorithm 1: Pseudo-algorithm that performs anomaly detection by comparing the new incoming summaries with the respective signatures.

3.1 Pseudo-Algorithm

The system works in "batch" mode, i.e., whenever new summaries are available, for instance at the end of the day, the list of summaries is traversed and a comparison against the respective signatures is made. In algorithm 1, the foreach cycle between lines 1 and 15 processes all the incoming summaries. Lines 2 and 3 get the respective user and signature identification. Next, line 4 verifies whether the signature is in an active state, which corresponds to an up-to-date signature. Line 5 creates a referential for the window frame being analyzed, and in line 6 all the information on the user's signature is fetched from the database. Lines 7 to 13 test the distance between the user signature and the summary. If an alarm is raised, the user becomes part of a "black list", which we call quarantine. In either case the signature is always updated (lines 8 or 10).
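For concreteness, a runnable sketch of algorithm 1 in Python, reusing overall_dist from the sketch in section 2.3; the storage object and its method names are hypothetical stand-ins for the database calls.

    def process_summaries(summ_list, xi, storage, weight_profiles, sub_fns):
        """Batch anomaly detection: one pass over the new summaries."""
        for summ in summ_list:
            user_id = summ["user_id"]
            sign_id = storage.get_sign_id(user_id)
            if not storage.signature_is_active(sign_id):
                continue
            window = storage.create_window_time_frame()
            sign = storage.load_signature(sign_id)
            anomalous = overall_dist(sign, summ["features"],
                                     weight_profiles, sub_fns) > xi
            # The signature is updated in both branches (lines 8 and 10).
            storage.update_signature(sign, summ["features"], window)
            if anomalous:
                storage.trigger_alarm(user_id)
                storage.client_to_quarantine(user_id)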

3.2 Detecting Anomalies

The anomaly detection procedure consists of several steps, represented in figure 2. The process starts with the loading step, which imports the information into the system's database. This information refers to the signature and summary data of each user. The signatures are imported once, when the system is started. All the signatures of a user are kept through time; such information will be used for posterior analysis. A signature may be in one of two states, Active or Expired. For each client only one signature can be in the Active state, and it is the most up-to-date one. The processing


step corresponds to the algorithm described in section 3.1, where the Active signatures are used for the anomaly detection. In this step, the active signature is updated (see section 2.5) and kept marked as active. If an alarm is raised, the client is put on the quarantine list; this corresponds to the alarm-triggering step, described in sections 2.3 and 2.4. Finally, all the raised alarms have to pass through the analyst's verification in order to determine whether each alarm corresponds to a fraud scenario or not.

Fig. 2. State chart of the signature flow.
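A compact sketch of the states shown in figure 2; the enum and event names are our own.

    from enum import Enum

    class SignatureState(Enum):
        ACTIVE = "active"    # the single up-to-date signature of a client
        EXPIRED = "expired"  # kept in the database for posterior analysis

    class ClientState(Enum):
        NORMAL = "normal"
        QUARANTINE = "quarantine"  # an alarm was raised, awaiting the analyst

    def analyst_verdict(state, is_fraud):
        """The analyst's verdict closes a quarantine episode (figure 2)."""
        if state is ClientState.QUARANTINE and not is_fraud:
            return ClientState.NORMAL  # false alarm: back to normal behavior
        return state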

4 Evaluating Alarms

The system interface is inspired by the idea of a dashboard, showing a complete set of information to facilitate the evaluation process. The analyst has several tools to investigate the alarms; here, we give a brief overview of three of the proposed tools. An alarm corresponds to a situation where the distance between the signature and a summary has exceeded the threshold. It is interesting to analyze which feature variables had the greatest impact in causing an alarm. This impact can be calculated simply as the ratio of each feature variable (fv) in the overall distance (formula 1). Figure 3 (a) shows a pie chart of the distribution of the impact of seven feature variables. In order to have a more general overview of the impact of each feature variable, the TOP-K alarms associated with a given user are grouped, and the aggregation ([sum(fv1), sum(fv2), ..., sum(fvn)]) of these impacts is calculated. Figure 3 (b) shows the aggregation of the impacts for the TOP 5 alarms of user A. The type of information presented in figures 3 (a) and (b) is very important to the analyst, because it supports the understanding of the user behavior and points toward the threshold values that should be used to capture the alarms.


Fig. 3. (a) Impact of each feature variable; (b) aggregation of the feature impacts for a given client.
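A sketch of the impact computation behind figure 3: each feature's impact is its weighted term's share of the overall distance in formula 1, and the TOP-K view sums these shares over a user's strongest alarms (names illustrative).

    def feature_impacts(signature, summary, weights, sub_fns):
        """Fraction of the overall distance contributed by each feature."""
        terms = [a * f(s, c)
                 for a, f, s, c in zip(weights, sub_fns, signature, summary)]
        total = sum(terms)
        return [t / total if total > 0 else 0.0 for t in terms]

    def aggregate_top_k_impacts(alarms, k):
        """Element-wise sum of the impact vectors of a user's TOP-K alarms."""
        top = sorted(alarms, key=lambda a: a["distance"], reverse=True)[:k]
        return [sum(vals) for vals in zip(*(a["impacts"] for a in top))]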

In order to observe the behavior of a given client during a certain period of time, the analyst can make use of a time series chart. In this chart, all the calculated distances for the selected time window can be used to study whether the client shows any particular trend in its behavior. Figures 4 (a) and (b) show two examples, for two different users, during the period of one month. Note that at the points where the distance (also called score, for output purposes) exceeded the threshold (dashed line), an alarm was raised. Two different threshold values were used for illustration purposes.

Fig. 4. Graph of the distance values (score) of two users over a time interval of one month.

5 Related Work

Fraud detection can be done at two levels, call or behavior, and with two different approaches, user-profile based or signature based. Most techniques use the CDR data to create a user profile and to detect anomalies based on these profiles.


The work in [9, 8] is an example of this. The authors mined large amounts of CDRs in order to find patterns and scenarios of normal usage and typical fraud situations. These scenarios were then used to configure monitors that "observe" the user behavior with respect to that type of fraud. The monitors are then combined in a neural network, which raises an alarm when sufficient support for fraud exists. This type of system can be classified as a rule-based approach, since it relies on the triggering of certain rules due to abnormal usage. The system presented in [18] is also an example of a rule-based system that works at the behavior level. But, as stated in [11], rule-based systems have the drawback of requiring expensive management of rules. Rules need to be precise (to avoid false positive alarms) and constantly evolving (to detect new scenarios), which results in very time-consuming programming. The most common and most successful methods [21] for fraud analysis are signature based. These methods detect fraud through deviation detection, comparing the recent activity with the user behavior data expressed through the user signature. In this context, our work adapts and extends the work of [5] by reformulating the notion of signature and by introducing statistically based distances to detect anomalies. Furthermore, we reduce the computational cost by using simple statistical functions, avoiding the processing of costly histograms. A clear problem with a histogram approach is that discretization intervals or buckets must be chosen, and what is right for one customer may be wrong for another. Other approaches have also been widely applied to fraud analysis, for example neural networks [15, 19]. In [20] the authors describe neural network, mixture model, and Bayesian network approaches to telecommunication fraud detection, derived from call records stored for billing. Another applied technique is link analysis, where the clients' links (called numbers) are updated over time, establishing a call graph of so-called "communities of interest" [4] that can easily reveal networks of fraudsters. These methods are based on the observation that fraudsters seldom change their calling habits, but are often closely linked to other fraudsters [14]. In [10] several methodologies for outlier detection are presented. Lately, there have also been efforts exploring anomaly metadata [12], pre-defined stream selections under concept drift [7], and belief-state approaches based on alarms [6].

6 Conclusions and Future Work

Fraud detection in mobile telecommunications is a relatively recent area of research. Due to its characteristics, this type of fraud requires (nearly) real-time and individualized customer analysis. The literature in this area points out that customer signatures provide a means to describe the current customer behavior, and that they can be used to efficiently detect fraud situations. In this work, we propose an anomaly detection system to support mobile telecommunications fraud detection. Signatures form the basis of the anomaly detection mechanism. We have adapted and extended the concept of signature in order to accurately capture the statistical information that describes the user behavior and to increase the


precision of the anomaly detection. Thus, we provide a new definition of signature, along with the respective statistical tools for its analysis. We also provide the computational details for the management of the signatures. It is expected that the proposed system will have a critical impact on the fraud prevention and detection procedures of mobile telecommunications providers. The system constantly adapts to the user behavior patterns; deviations from these patterns result in an indication to the fraud analyst that an anomalous, and possibly fraudulent, situation has occurred. At the time of writing, the system implementation is finished, and this work is now in its experimental stage. Currently, we are studying parameter tuning, scalability issues, and the analyst's interaction with the system. We have also been investigating the application of signatures to user segmentation: we have applied clustering techniques in order to find groups of related users, and we believe that the analysis of cluster migrations could also shed light on fraud situations.

7 Acknowledgments

This work was financed by Portugal Telecom Inovação, S.A. under a service acquisition and knowledge transfer protocol signed with the University of Minho. The authors gratefully acknowledge Francisco Paz, João Lopes, Filipe Martins, Eduardo Taborda, and João Pias for their fruitful support of this work.

References

1. Richard J. Bolton and David J. Hand. Statistical fraud detection: A review. Statistical Science, 17(3):235–255, January 2002.
2. P. Burge, J. Shawe-Taylor, Y. Moreau, H. Verrelst, C. Stoermann, and P. Gosset. Fraud detection and management in mobile telecommunications networks. In Proceedings of the 2nd IEEE European Conference on Security and Detection, volume 437, pages 91–96, London, April 1997. IEEE.
3. M. Cahill, D. Lambert, J. Pinheiro, and D. Sun. Detecting fraud in the real world. In Handbook of Massive Data Sets, pages 911–929. Kluwer Academic Publishers, Norwell, MA, USA, 2002.
4. C. Cortes, D. Pregibon, and C. Volinsky. Communities of interest. Intelligent Data Analysis, 6(3):211–219, 2002.
5. Corinna Cortes and Daryl Pregibon. Signature-based methods for data streams. Data Mining and Knowledge Discovery, 5(3):167–182, 2001.
6. Kaustav Das, Andrew Moore, and Jeff Schneider. Belief state approaches to signaling alarms in surveillance systems. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 539–544, New York, NY, USA, 2004. ACM Press.
7. Wei Fan. Systematic data selection to mine concept-drifting data streams. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 128–137, New York, NY, USA, 2004. ACM Press.
8. Tom Fawcett and Foster Provost. Combining data mining and machine learning for effective user profiling. In Simoudis, Han, and Fayyad, editors, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pages 8–13, Menlo Park, CA, 1996. AAAI Press.
9. Tom Fawcett and Foster Provost. Adaptive fraud detection. Data Mining and Knowledge Discovery, pages 1–28, 1997.
10. Victoria Hodge and Jim Austin. A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2):85–126, 2004.
11. Y. Kou, C.-T. Lu, S. Sirwongwattana, and Y.-P. Huang. Survey of fraud detection techniques. In Proceedings of the 2004 IEEE International Conference on Networking, Sensing and Control, Taipei, Taiwan, March 2004. IEEE.
12. Tysen Leckie and Alec Yasinsac. Metadata for anomaly-based security protocol attack deduction. IEEE Transactions on Knowledge and Data Engineering, 16(9):1157–1168, 2004.
13. T. F. Lunt. A survey of intrusion detection techniques. Computers and Security, (53):405–418, 1999.
14. John McCarthy. Phenomenal data mining. Communications of the ACM, 43(8):75–79, 2000.
15. Yves Moreau, Herman Verrelst, and Joos Vandewalle. Detection of mobile phone fraud using supervised neural networks: A first prototype. In ICANN '97: Proceedings of the 7th International Conference on Artificial Neural Networks, pages 1065–1070, London, UK, 1997. Springer-Verlag.
16. Myers and Myers. Probability and Statistics for Engineers and Scientists. Prentice Hall, 6th edition.
17. António Pedrosa and Sílvio Gama. Introdução Computacional à Probabilidade e Estatística. Porto Editora, 2004.
18. Saharon Rosset, Uzi Murad, Einat Neumann, Yizhak Idan, and Gadi Pinkas. Discovery of fraud rules for telecommunications: Challenges and solutions. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 409–413, New York, NY, USA, 1999. ACM Press.
19. J. Shawe-Taylor, K. Howker, P. Gosset, M. Hyland, H. Verrelst, Y. Moreau, C. Stoermann, and P. Burge. Novel techniques for profiling and fraud detection in mobile telecommunications. In Business Applications of Neural Networks, pages 113–139. World Scientific, Singapore, 2000.
20. Michiaki Taniguchi, Michael Haft, Jaakko Hollmén, and Volker Tresp. Fraud detection in communications networks using neural and probabilistic methods. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'98), volume 2, pages 1241–1244, 1998.
21. Gary M. Weiss. Data Mining in Telecommunications. Kluwer, 2004.
22. Eric W. Weisstein. Poisson distribution. From MathWorld, a Wolfram Web Resource. http://mathworld.wolfram.com/PoissonDistribution.html, 2006.
