A Customized Classification Algorithm for Credit-Card Fraud Detection

Alex G. C. de Sá∗, Adriano C. M. Pereira, Gisele L. Pappa

Computer Science Department, Universidade Federal de Minas Gerais (UFMG), 31270-010, Belo Horizonte, Minas Gerais, Brazil
Abstract

This paper presents Fraud-BNC, a customized Bayesian Network Classifier (BNC) algorithm for a real credit card fraud detection problem. The task of creating Fraud-BNC was automatically performed by a Hyper-Heuristic Evolutionary Algorithm (HHEA), which organizes the knowledge about BNC algorithms into a taxonomy and searches for the best combination of these components for a given dataset. Fraud-BNC was automatically generated using a dataset from PagSeguro, the most popular Brazilian online payment service, and tested together with two strategies for dealing with cost-sensitive classification. The results obtained were compared to those of seven other algorithms, and analyzed considering both the data classification problem and the economic efficiency of the method. Fraud-BNC presented itself as the best algorithm to provide a good trade-off between both perspectives, improving the company's current economic efficiency by up to 72.64%.

Keywords: credit card fraud, Bayesian network classifiers, hyper-heuristic.
∗Corresponding author. Tel: +55 31 3409-7536
Email addresses: [email protected] (Alex G. C. de Sá), [email protected] (Adriano C. M. Pereira), [email protected] (Gisele L. Pappa)
Elsevier
July 15, 2018
1. Introduction

In 2016, a report by CyberSource (CyberSource, 2016) pointed out that the volume of fraudulent e-commerce credit card transactions (chargeback) in Latin America corresponds to 1.4% of the sector's total net revenue. Automatically identifying these transactions poses many open challenges. Among them are the high volume of transactions that need to be processed in almost real time and the fact that frauds do not occur frequently, generating very imbalanced datasets. Furthermore, accepting a fraud as a legitimate transaction has a much higher cost than identifying a legitimate transaction as a fraud, as the seller's economic losses are much higher in the first case, which generates chargeback.

There are different ways of modeling the credit card fraud detection problem; among the most common approaches are those created to identify anomalies (Halvaiee and Akbari, 2014) and those based on classical methods for data classification (Hens and Tiwari, 2012). This paper focuses on the latter, and models the problem as a classification task, where a classifier is conceived to distinguish fraudulent from legitimate transactions.

In particular, we are interested in algorithms that generate interpretable models (classifiers), such as decision trees, classification rules or Bayesian network classifiers (Kotsiantis, 2007). This is because decision makers are more comfortable accepting automatic decisions they can understand (Freitas, 2014). Although it is well known that in some domains these methods present lower accuracy than black-box models such as Support Vector Machines, sacrificing accuracy to gain interpretability is a worthwhile trade-off in alert systems.
There is a variety of classification algorithms in the literature that can generate interpretable models. According to the No Free Lunch Theorem (Wolpert and Macready, 1997), choosing which of these algorithms is the best for a given dataset is still an open problem. The areas of meta-learning and hyper-heuristics have offered different solutions for automatically testing different types of algorithms (Pappa et al., 2014). While the meta-learning literature has focused on selecting the best algorithm according to the characteristics of the target problem (Brazdil et al., 2008), hyper-heuristic methods have proposed different ways of generating customized algorithms for different datasets, which we considered more interesting for this work.

A hyper-heuristic is a high-level approach that, given a particular problem instance and a number of low-level heuristics, can select and apply an appropriate low-level heuristic at each decision point. Hyper-heuristic methods have already been conceived for building algorithms to solve specific classification problems (Pappa and Freitas, 2009; Sá and Pappa, 2014). These methods help experts and practitioners in the following task: given a new classification dataset, which is the most suitable combination of the learning algorithms' components to solve this new problem? In this paper, we take advantage of one of these methods, the Hyper-Heuristic Evolutionary Algorithm (HHEA) (Sá and Pappa, 2014), to create a customized Bayesian Network Classifier (BNC) algorithm, named Fraud-BNC, specifically for detecting frauds in a dataset of interest.
We chose to work with BNC algorithms for fraud detection because they are robust statistical methods for classifying data. They are based on the theoretical foundations of Bayesian networks (Bielza and Larrañaga, 2014) and produce a classification model that assumes cause-effect relations among all data attributes (including the class) (Cheng and Greiner, 1999). These relationships can be used to gain understanding about a problem domain, as the output BNC model is represented by a directed acyclic graph (DAG). In the DAG, each node maps an attribute and edges define probabilistic dependencies among them. Each node is also associated with a conditional probability table, which represents the network parameters.
The literature presents several BNC algorithms (Bielza and Larrañaga, 2014; Sacha, 1999; Witten et al., 2011). Instead of choosing one of them, HHEA builds a customized BNC algorithm with the combination of the essential modules (components) of the aforementioned algorithms that is best for the dataset at hand. It is important to emphasize that HHEA produces a complete, general BNC algorithm, even though it is specialized for a particular dataset.
Fraud-BNC was conceived by HHEA for a real-world credit card fraud detection problem. This problem is associated with a classification dataset provided by UOL PagSeguro¹, a popular online payment service in Brazil. The performances of Fraud-BNC and other baselines were evaluated using a classification metric (F1) and a measure of the company's economic loss, named economic efficiency. Besides, given the challenges of learning from class-imbalanced data (Sundarkumar and Ravi, 2015; Haixiang et al., 2016), we considered two strategies for dealing with cost-sensitive classification: instance reweighing and analysis of the class probability threshold.

The results showed that the algorithm that is best in terms of F1 is usually not the one that obtains the best values of economic efficiency. This happens because the latter is highly influenced by the monetary value of the transactions. Our analysis also showed that using Fraud-BNC with the class probability threshold obtains the best results. Furthermore, as Fraud-BNC returns the probability of a transaction being a fraud, this probability can be used together with the monetary value of the transaction to help in the decision-making process.

¹ http://pagseguro.uol.com.br
The main contributions of this paper are: (i) the generation of a customized BNC algorithm for a real-world credit-card fraud detection dataset; (ii) the evaluation of how the algorithm performs in terms of both classification metrics and those used by finance specialists to evaluate fraud levels; (iii) the complete analysis of the customized BNC algorithm in terms of strategies to deal with imbalanced data; (iv) the improvement of the techniques currently used by the company to quantify fraud detection in PagSeguro by up to 72.64%; (v) the concept of how to use the produced BNC model in the auditing system to verify inconsistent classifications.
The remainder of this paper is organized as follows. Section 2 presents related work on fraud detection modeled as a classification problem. Section 3 describes HHEA, the method used to automatically generate a customized BNC algorithm for the PagSeguro dataset, which is described in Section 4. The produced algorithm, Fraud-BNC, is presented in Section 5, followed by the definition of the metrics used to evaluate the algorithms, introduced in Section 6. Finally, Section 7 presents the experimental results, while conclusions and directions for future work are described in Section 8.
2. Related Work

The problem of fraud detection has been extensively studied in the literature. This section reviews works that follow a classification approach to solve the problem. Among the methods already explored are artificial neural networks, decision trees, logistic regression, random forests, artificial immune systems, support vector machines (SVM) and hybrid methods (Chandola et al., 2009; Adewumi and Akinyelu, 2016; Alvarez and Petrovic, 2003; Lindqvist and Jonsson, 1997; Ngai et al., 2011; West and Bhattacharya, 2016), among others. Note that all these techniques follow a supervised learning approach, as they assume the existence of labeled data to generate their models.
Table 1 presents a comparison between a set of previously proposed methods and Fraud-BNC. Six main characteristics were analyzed: (i) whether the method works with real-world data (if not, the work uses artificial data), (ii) whether the data reflects the real-world severe class imbalance, (iii) whether feature selection is performed, (iv) whether cost-sensitive methods are used to address the class imbalance problem or (v) whether (under-)sampling techniques are used with this intention, and (vi) whether a financial analysis was taken into account when looking at the results. These six characteristics label the columns of Table 1. The table indicates whether the method in the row presents the characteristic defined in the column: 'Y' indicates that the method has that characteristic, and 'N' the opposite.
Note that most works deal with real-world unbalanced data, and use at least one strategy to deal with it. About half of the methods look beyond the results of classification, and the majority disregards any type of feature selection, although the features describing the data may differ significantly.

Concerning the learning techniques used by these methods, they encompass a large variety of algorithms. Ravisankar et al. (2011), for instance, employed six machine learning techniques, including SVM and logistic regression. Guo and Li (2008) proposed to combine confidence values, artificial neural network algorithms and receiver operating characteristic (ROC) curves for detecting credit card frauds. They performed undersampling to deal with class imbalance, resulting in a distribution of 100 legitimate transactions for each fraudulent one. This is the only work that uses synthetic data.

Table 1: Summary of the six main characteristics of related works when compared to the customized algorithm Fraud-BNC.

Methods                        (i)  (ii)  (iii)  (iv)  (v)  (vi)
Fraud-BNC                       Y    Y     Y      Y     Y    Y
Halvaiee and Akbari (2014)      Y    Y     N      Y     N    Y
Ravisankar et al. (2011)        Y    N     Y      N     N    N
Caldeira et al. (2012)          Y    Y     Y      N     N    Y
Sahin et al. (2013)             Y    Y     N      Y     Y    Y
Guo and Li (2008)               N    Y     N      N     Y    N
Fu et al. (2016)                Y    Y     Y      N     Y    N
Duman and Ozcelik (2011)        Y    Y     N      Y     Y    Y
Gadi et al. (2008)              Y    Y     N      Y     Y    Y
Vlasselaer et al. (2015)        Y    Y     N      N     N    N
Caldeira et al. (2012) also applied artificial neural networks and random forests to identify frauds in online transactions coming from the same data source we work with. Apart from other standard classification measures, they looked at the economic efficiency of the model, improving the results of the then-current company policy by 43%. However, they did not account for class imbalance or different classification costs for different classes. Fu et al. (2016), in turn, solved the problem with a convolutional neural network (CNN). The CNN was applied to bank data to find a set of latent patterns for each transaction and identify frauds. The issue of data imbalance was tackled by a cost-based sampling method, which involved creating synthetic fraudulent samples from the real frauds.
144
proach based on genetic algorithm (GA) and the scatter search (SS), named 7
145
GASS, to take into consideration a classification cost function when deal-
146
ing with fraud detection. GASS was applied to data from a major bank
147
in Turkey, and used 20% of randomly chosen legitimate transactions for
148
training due to time complexity.
149
Looking at works focusing on interpretable models, Sahin et al. (2013)
150
is the only one i in this category, and developed a cost-sensitive decision
151
tree algorithm. The authors self-referred their work as the pioneer at taking
152
the misclassification costs into account while performing fraud classification.
153
The authors used stratified sampling - i.e., they kept the class imbalance
154
during the sampling process to help learning concepts from both legitimate
155
and fraudulent classes. Although different class distributions were tried
156
during training, the test dealt with the real distribution.
157
Another popular method used to flag fraudulent transactions is Artifi-
158
cial Immune Systems (AIS), as these methods were primarily conceived to
159
identify anomalies. Gadi et al. (2008) explored the clonal selection algo-
160
rithm for credit card fraud detection in a dataset provided by a Brazilian
161
bank. They used a random sampling technique to select 10% of the legit-
162
imate transactions (against 100% of the fraudulent) to reduce the effects
163
of class imbalance. Halvaiee and Akbari (2014) proposed a similar model,
164
called AIS-based Fraud Detection Model (AFDM), based on both the clonal
165
and negative selection algorithms. The main difference of the latter to the
166
algorithm proposed by Gadi et al. is that the authors focused essentially
167
in reducing the training time to build the model. Apart from using classi-
168
fication metrics, both works assessed a cost function used when fraudulent
169
transactions were not detected.
170
Following a different approach, Vlasselaer et al. (2015) proposed APATE (Anomaly Prevention using Advanced Transaction Exploration). APATE was designed by combining recency-frequency-monetary variables and social network analysis. They tested APATE on a Belgian credit card issuer dataset and estimated a linear regression, an artificial neural network and a random forest model. Results showed that APATE led to a very high area under the ROC curve and high accuracy, especially for the random forest model.
In this work, we deal with both the problem of class imbalance and the different costs of fraudulent and legitimate transactions under the classification framework. We use an undersampling technique and two classification approaches to deal with class imbalance. The problem of finding the most suitable algorithm for a dataset of interest is also considered. For this, we use a Hyper-Heuristic Evolutionary Algorithm (HHEA) to automatically generate a customized Bayesian Network Classifier (BNC) algorithm, Fraud-BNC, for the given input credit card transaction data. We compared the customized algorithm to seven other classification methods, showing that, alongside its good classification performance, Fraud-BNC is very effective in creating interpretable models.
3. Evolving Algorithms for Learning Bayesian Network Classifiers

The use of hyper-heuristics to construct algorithms customized to datasets is a growing research field, where methods for generating Bayesian network algorithms for classification (Sá and Pappa, 2013; Sá and Pappa, 2014), decision tree inducers (Barros et al., 2014) and rule induction algorithms (Pappa and Freitas, 2009) have been previously proposed. According to Pappa et al. (2014), these approaches are commonly based on evolutionary algorithms (Eiben and Smith, 2003), in this paper referred to as Hyper-Heuristic Evolutionary Algorithms (HHEAs).

An HHEA works by simulating Darwin's evolutionary process and relies on the ideas of natural selection, mutation and survival of the fittest. It works with a population of candidate solutions (individuals) to the problem at hand (in our case, BNC algorithms), and uses an iterative process (namely evolution) to find approximate solutions to the problem through selection, crossover and mutation operators.
The HHEA method used in this work builds BNC algorithms. The process followed by BNC algorithms when learning a model is divided into two phases (Cheng and Greiner, 1999): structure and parameter learning. In the structure learning phase, the method learns the causal relationships among the attributes of the input dataset, i.e., which nodes (attributes) in the graph should be connected to each other. Different types of BNC algorithms have been proposed for creating the network structure, including score-based, constraint-based and hybrid approaches (Daly et al., 2011).

The parameter learning phase, in turn, learns Conditional Probability Tables (CPTs) for each node of the BNC. These tables are used to make estimations about the data. However, learning the parameters of a BNC is a relatively straightforward procedure once the network structure is defined (Salama and Freitas, 2013). For this reason, HHEA focuses on the structure learning phase.

The HHEA method is illustrated in Figure 1(a). It receives as input the dataset and a set of components identified from previously proposed BNC algorithms. It then combines these components, outputting a BNC algorithm tailored to the domain of the input data.
HHEA uses a real-coded evolutionary algorithm to search and explore the space of BNC algorithms.

[Figure 1: The HHEA process and its individual representation. (a) HHEA for generating BNC algorithms: it receives a dataset and BNC algorithm components as input, and outputs a tailored BNC algorithm whose classification model is generated for that dataset. (b) A BNC individual's representation: a real-coded array (e.g., 0.7, 0.1, 0.6, 0.5, 0.7) is mapped to defined components such as search method = Hill Climbing, scoring metric = Bayesian, reversed edges = false, parents = 6, CPT α = 8.7.]

In Figure 1(b), each individual represents a BNC algorithm, randomly generated from a combination of the available components given as input. In total, 4,960,000 component combinations are possible (Sá and Pappa, 2014); these components include scoring metrics, independence tests, the maximum number of parents a node can have, and search methods for building the BNC structure, plus the parameter α of the parameter (CPT) estimation method.

After individuals are created, they undergo a fitness evaluation process. During the evaluation, a mapping between the real-coded individual and a BNC algorithm, created according to the components combined, is performed (see Figure 1(b)). Each position of the real-coded array represents a BNC component. Depending on the range the real value falls into and the size of each component space, a position is mapped to a different component. The search method in Figure 1(b) is defined as Hill Climbing because the value at its position in the array lies between 0.65 and 0.75. The same rule is applied to the other positions of the array (for more details on the BNC components, see Sá and Pappa (2014)). After that, the algorithms built from the individuals are run on a training set to induce BNC models, which are then evaluated on a validation set. The F1-measure (Witten et al., 2011) computed on the validation set is the fitness function.
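The range-based decoding described above can be sketched as follows. This is a minimal illustration assuming equal-width ranges per position; the component lists and the three-gene layout below are hypothetical placeholders, not the actual HHEA taxonomy (which also covers independence tests and the CPT α).

```python
# Sketch of HHEA's genotype-to-component mapping: each real-valued gene
# selects one component from its component space by equal-width ranges.
# The component lists and the 3-gene layout are hypothetical examples.

SEARCH_METHODS = ["K2 Search", "Simulated Annealing", "Tabu Search",
                  "Hill Climbing", "Greedy Search"]   # placeholder list
SCORING_METRICS = ["Bayesian", "BDeu", "MDL", "AIC"]  # placeholder list
MAX_PARENTS = [1, 2, 3, 4, 5, 6]

def map_position(value, options):
    """Map a real value in [0, 1) to one option: the unit interval is
    split into len(options) equal-width ranges, one per component."""
    return options[min(int(value * len(options)), len(options) - 1)]

def decode_individual(genotype):
    """Decode a real-coded array into a (partial) BNC algorithm spec."""
    return {
        "search_method": map_position(genotype[0], SEARCH_METHODS),
        "scoring_metric": map_position(genotype[1], SCORING_METRICS),
        "max_parents": map_position(genotype[2], MAX_PARENTS),
    }
```

With five search methods, any value in [0.6, 0.8) selects Hill Climbing, so the value 0.7 from Figure 1(b) decodes as in the text's example.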
Next, the individuals undergo a tournament selection process, where individuals with higher values of the F1-measure have a higher probability of being selected for the uniform crossover and one-point mutation operations used to generate a new population. An elitist process also copies the best individuals to the next population. After a predefined number of generations, the best BNC algorithm generated is returned, and its associated model is tested on a new set of data coming from the same domain.
In order to produce the customized BNC algorithm, the parameters of HHEA were set in preliminary experiments using a wider range of datasets (Sá and Pappa, 2014). The best configuration resulted in the following parameters: 35 individuals evolved for 35 generations, tournament size of two individuals, elitism of one individual, and crossover and mutation probabilities of 0.9 and 0.1, respectively. The relatively low number of individuals and generations is due to the complexity of the solutions generated. Recall that each individual represents a BNC algorithm, which has to be trained and validated on a given dataset.
4. PagSeguro Transactions Data

PagSeguro is currently one of the most popular online payment services in Brazil, and is owned by Universo Online Inc. (UOL)². In PagSeguro, each transaction is described by hundreds of different attributes, including the transaction status, which can be legitimate or fraudulent.

² http://www.uol.com.br/
The dataset was obtained from a month of transactions ordered according to the time they occurred. In total, we have 903,801 transactions, of which 16,639 (1.8%) are frauds and the remaining 887,162 (98.2%) were classified as legitimate, i.e., non-fraud. Although the absolute number of frauds is low, the economic impact they generate is huge. Initially, each transaction was described by a set of 424 attributes. To identify the most relevant attributes for classification, a feature selection step was performed using information gain (Quinlan, 1986). These 424 attributes were reduced to 24, including the transaction id and the classification of the transactions as fraud or legitimate.
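The information gain criterion used in this feature selection step measures the reduction in class entropy obtained by splitting on a feature. The sketch below is a generic implementation for discrete attributes, not the actual selection pipeline used for the PagSeguro data.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a class distribution."""
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Information gain of a discrete feature: class entropy minus the
    entropy remaining after partitioning by the feature's values."""
    total = len(labels)
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    remainder = sum(len(part) / total * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder
```

A feature that perfectly separates the classes has gain equal to the class entropy, while an uninformative one has gain zero; ranking features by this value and keeping the top ones yields the reduced attribute set.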
Due to a confidentiality agreement with the company, we cannot detail the attributes used to describe the data. To give an idea about them, Figure 2 shows the Pearson's correlation matrix of the whole dataset considering the 22 predictive attributes {A1, ..., A22} and the class C. The transaction id was omitted. Observe in the matrix that all correlations are greater than or equal to zero. Regarding direct correlations with the class attribute, A1 presents the highest value. Attribute pairs including A4, A5, A8, A11, A13, A20 and A21 present the highest correlations. This does not necessarily mean causality between the attributes, but it indicates a relationship that can be explored by the learning algorithm.
[Figure 2: Pearson's correlation matrix of the dataset for the 22 attributes and the class.]

In order to tackle the problem of class imbalance, we performed random under-sampling on the training dataset. Under-sampling techniques reduce the size of the dataset by increasing the ratio between the positive and negative classes (He and Garcia, 2009). For each fraudulent transaction, six non-fraudulent ones were kept in the dataset, as a preliminary experiment indicated this to be the best distribution. We opted for a random selection because, according to the experiments performed in (Thai-Nghe et al., 2010), there is no significant difference in the results produced by more sophisticated (and hence more computationally expensive) methods.
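The 6:1 random under-sampling described above can be sketched as follows; the `(features, is_fraud)` pair representation and the fixed seed are assumptions made for illustration.

```python
import random

def undersample(transactions, ratio=6, seed=0):
    """Random under-sampling sketch: keep every fraud and `ratio`
    randomly chosen legitimate transactions per fraud (6:1 here).
    Each transaction is assumed to be a (features, is_fraud) pair."""
    frauds = [t for t in transactions if t[1]]
    legitimate = [t for t in transactions if not t[1]]
    rng = random.Random(seed)  # fixed seed for reproducibility
    kept = rng.sample(legitimate, min(len(legitimate), ratio * len(frauds)))
    sample = frauds + kept
    rng.shuffle(sample)
    return sample
```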
Even with this new ratio of positive and negative examples, generating and evaluating different algorithms with more than five thousand transactions during the HHEA evolution is still a very expensive task (Sá and Pappa, 2014). For this reason, HHEA was trained with a smaller sample of the dataset containing 4,833 transactions (approximately 0.5%), with 693 representing frauds and 4,140 legitimate transactions. 70% of this data was used to train the BNC algorithms and build their models, and the remaining 30% was used to compute the F1 fitness function on an unseen set of examples. Additionally, the training and validation sets were re-sampled every five generations in order to avoid overfitting in HHEA.
5. Fraud-BNC: a classification algorithm for fraud detection

This section introduces Fraud-BNC, the customized algorithm generated by HHEA for the PagSeguro dataset. Fraud-BNC is presented in Algorithm 1. It receives the PagSeguro dataset as input in line 1 and then defines two parameters (optimized by HHEA) in lines 2-3, i.e., the maximum number of parents of a node and the value of α for the conditional probability tables (CPTs). It also defines an initial Bayesian network from which the search for the best structure will start. This initial network has as many nodes as there are attributes in the dataset, and all nodes are connected only to the class node. The algorithm sets this initial network as the current best, and evaluates it using the Heckerman-Geiger-Chickering (HGC) scoring metric with prior (lines 4-5) (Sacha, 1999). HGC determines the posterior probability of the BNC given the training data, and its main idea is to reward BNCs that offer a good approximation of the joint probability distribution of the predictive attributes and the class.
After these initial steps, the algorithm performs a hill climbing search to build a BNC (lines 6-23). Line 7 initializes the stopping criterion as false. This variable will be updated when no BNC better than the current best is found. The hill climbing method (Hesar et al., 2012) starts the search by trying to find the best (local) operation to perform on the current best BNC (lines 10-17). These operations include adding, deleting or reversing an edge in the Bayesian network. The method applies the respective operation to the current best BNC only if the operation generates a valid BNC, i.e., if it respects the directed acyclic graph properties and the maximum number of parents a node can have. The best network generated in the previous step is found by applying the HGC metric over the dataset to the variations created from the best network (line 12). If any of the BNC variations has a higher HGC value than the best BNC, it becomes the new best (lines 18-22). This process goes on until the scores of the new candidate networks are no better than the score of the current best network.

Algorithm 1: Fraud-BNC algorithm.
 1: Input: dataset
 2: parents = 5                // maximum number of parents a node can have
 3: α = 8.031                  // α for the CPT estimator
 4: bestBNC = graph with all attributes from dataset as nodes and edges only to the class
 5: bestScore = evaluate bestBNC using HGC with prior
 6: do
 7:   stop = false             // stopping criterion
 8:   BNC = bestBNC
 9:   score = bestScore
10:   BNCVariations = set of valid BNCs generated by adding, removing or reversing an edge
11:   for candBNC in BNCVariations do
12:     candScore = evaluate candBNC using HGC with prior
13:     if candScore > score then
14:       score = candScore
15:       BNC = candBNC
16:     end if
17:   end for
18:   if score > bestScore then
19:     bestScore = score
20:     bestBNC = BNC
21:   else stop = true
22:   end if
23: while not stop
24: LearnParameters(bestBNC, α)
25: return bestBNC
After the structure is defined, the parameter learning starts, and the conditional probability tables (CPTs) are estimated (line 24) using the value of α set in line 3 (8.031). Finally, the best BNC is returned (line 25).
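The structure search loop can be sketched generically as follows. The HGC scoring metric is not reproduced here: the `score` argument is a caller-supplied stand-in, and the usage example below uses a toy score that simply penalizes edges. Edges are directed `(parent, child)` pairs.

```python
def is_valid(edges, nodes, max_parents):
    """A structure is valid if it is a DAG and no node exceeds the
    maximum number of parents (Kahn's algorithm checks acyclicity)."""
    parents = {n: [a for (a, b) in edges if b == n] for n in nodes}
    if any(len(p) > max_parents for p in parents.values()):
        return False
    indegree = {n: len(parents[n]) for n in nodes}
    queue = [n for n in nodes if indegree[n] == 0]
    removed = 0
    while queue:
        node = queue.pop()
        removed += 1
        for (a, b) in edges:
            if a == node:
                indegree[b] -= 1
                if indegree[b] == 0:
                    queue.append(b)
    return removed == len(nodes)

def candidate_moves(edges, nodes):
    """All structures one edge-operation away: add, remove, or reverse."""
    for a in nodes:
        for b in nodes:
            if a == b:
                continue
            if (a, b) in edges:
                yield edges - {(a, b)}                 # remove edge
                yield (edges - {(a, b)}) | {(b, a)}    # reverse edge
            else:
                yield edges | {(a, b)}                 # add edge

def hill_climb(nodes, class_node, score, max_parents=5):
    """Greedy structure search in the spirit of Algorithm 1: start from
    the naive structure (all attributes linked to the class) and keep
    applying the best-scoring valid one-edge change until no candidate
    improves the current best score."""
    best = frozenset((class_node, n) for n in nodes if n != class_node)
    best_score = score(best)
    improved = True
    while improved:
        improved = False
        for cand in candidate_moves(best, nodes):
            if is_valid(cand, nodes, max_parents):
                cand_score = score(cand)
                if cand_score > best_score:
                    best, best_score = frozenset(cand), cand_score
                    improved = True
    return best, best_score
```

With `score = lambda e: -len(e)`, the search starts from the naive structure and greedily removes edges until the empty graph is reached, illustrating the termination condition.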
When compared to other existing BNC algorithms in the literature, including Naïve Bayes (NB) (Witten et al., 2011), Tree Augmented Naïve Bayes (TAN) (Friedman et al., 1997) and K2 (Cooper and Herskovits, 1992), the algorithm produced by HHEA presents the following differences. While NB assumes that, given the class attribute, all predictive attributes are independent (the maximum number of parents is equal to one), TAN builds a tree to represent the relationships between predictive attributes (maximum number of parents equal to two). K2, in turn, represents these relations using a less restrictive directed acyclic graph. In K2, it is also necessary to define the maximum number of parents, but the method can be considered more restrictive because it fixes a causal order of the variables in the BNC. Additionally, neither TAN nor K2 deletes edges from the structure of the BNC, an operation that can be performed by the Fraud-BNC algorithm. Fraud-BNC also allows the class to have causal precedents in its produced model, yielding a more general model, called a General Bayesian Network Classifier (GBN) (Cheng and Greiner, 1999). This type of model cannot be generated by NB, TAN or K2.
Fraud-BNC is investigated under two scenarios for dealing with cost-sensitive classification: instance reweighing and probability threshold analysis. The first approach takes the misclassification costs into consideration by reweighing the training instances according to a set of predefined weights (Witten et al., 2011). The goal of this type of algorithm is to minimize the total misclassification cost, usually aiming to improve the classification output on the false negatives (Weiss, 2004). As a consequence, this method leads to better responses to the class imbalance problem (Liu and Zhou, 2006).
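Instance reweighing can be sketched as assigning each training example a weight proportional to the cost of misclassifying its class, so that frauds (whose false negatives are expensive) dominate the weighted training signal. The cost values below are illustrative placeholders, not the weights actually used with Fraud-BNC.

```python
def reweigh_instances(is_fraud_labels, cost_fn=5.0, cost_fp=1.0):
    """Cost-sensitive reweighing sketch: weight each instance by the
    cost of misclassifying its class (cost_fn for frauds, cost_fp for
    legitimate transactions; both values are illustrative).
    Weights are rescaled so their sum equals the number of instances,
    keeping the effective training-set size unchanged."""
    raw = [cost_fn if is_fraud else cost_fp for is_fraud in is_fraud_labels]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]
```

A weighted learner (or one trained on instances duplicated in proportion to these weights) then minimizes the total misclassification cost rather than the plain error rate.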
In the second scenario, we analyzed the probabilities associated with the predictions of the classifiers on a validation set to modify the probability threshold that separates fraudulent from legitimate transactions. Fraud-BNC outputs the probability of a transaction being legitimate or fraudulent. Usually, when the fraud probability of a transaction is lower than 0.5, the transaction is classified as legitimate; otherwise, it is considered a fraud. However, in real-world systems, where the cost of missing a fraud is much higher, increasing the propensity to classify a transaction as a fraud might be a good option. We analyze the probability values output for the transactions and change the class threshold to increase the value of economic efficiency, although the threshold could also be used to increase the value of F1.
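The threshold analysis can be sketched as a search over candidate thresholds on a validation set, scoring each with a gain function built from the cost structure described in Section 6 (gains on correctly accepted legitimate transactions, losses on missed frauds). The retention rate `k = 0.05` is an illustrative assumption, not the company's actual rate.

```python
def best_threshold(probs, labels, values, k=0.05):
    """Choose the fraud-probability threshold that maximizes a gain
    function on a validation set: k% of the value is gained for each
    correctly accepted legitimate transaction, and (1 - k) of the value
    is lost for each missed fraud. k = 0.05 is an illustrative rate."""
    def gain(threshold):
        total = 0.0
        for p, is_fraud, value in zip(probs, labels, values):
            flagged = p >= threshold
            if not is_fraud and not flagged:
                total += k * value        # accepted legitimate: profit
            elif is_fraud and not flagged:
                total -= (1 - k) * value  # missed fraud: chargeback
        return total
    # Candidate thresholds: every observed probability, plus one value
    # above 1 that flags nothing.
    candidates = sorted(set(probs)) + [1.01]
    return max(candidates, key=gain)
```

Because the gain function is dominated by the value of the transactions, the selected threshold is generally not the one that maximizes F1.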
6. Evaluation Metrics: Classification versus Financial Returns

This paper deals with a real-world problem where, apart from the results obtained by classification algorithms, the financial costs of missing a fraud are paramount. Therefore, the results obtained should not be evaluated considering only classification metrics, but also what we call economic efficiency.

Before defining the metrics of interest, let us illustrate all the scenarios that might happen during classification. In Table 2, the rows represent the true (real) classes and the columns the predicted ones. True positives (TP) correspond to transactions correctly predicted as frauds and true negatives (TN) correspond to transactions correctly predicted as legitimate. False positives (FP) and false negatives (FN) describe, respectively, the number of incorrectly classified legitimate and incorrectly classified fraudulent transactions. In financial terms, a TP prevents the company from losing money, as the transaction is not authorized, while a TN represents a profit of k% of the transaction value. A FN, in turn, represents a loss of (100-k)% of the transaction value, as the company gains k% but pays 100% of the cost of the fraud (chargeback). Finally, a FP means that the company misses profit by rejecting a legitimate transaction.
Table 2: Confusion matrix for the fraud detection problem.
aa aa Predicted aa Fraud aa Real a
392
Legitimate
Fraud
TP
FN
Legitimate
FP
TN
6.1. Classification metrics

The performance of the classification algorithms is measured using conventional classification metrics, F1 being the main one (Witten et al., 2011). F1 corresponds to the harmonic mean of precision and recall and is defined in Equation 1. It is an appropriate measure as it accounts for different levels of class imbalance and considers both precision and recall, which are defined in Equations 2 and 3, respectively.

F1 = 2 · (Precision · Recall) / (Precision + Recall)    (1)

Precision = TP / (TP + FP)    (2)

Recall = TP / (TP + FN)    (3)
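As a concrete illustration, Equations 1-3 translate directly into code (a minimal sketch; the confusion-matrix counts in the example are invented):

```python
# Precision, recall and F1 (Equations 1-3) computed from the
# confusion-matrix counts of Table 2.

def precision(tp, fp):
    # Equation 2: fraction of predicted frauds that are real frauds.
    return tp / (tp + fp)

def recall(tp, fn):
    # Equation 3: fraction of real frauds that were caught.
    return tp / (tp + fn)

def f1(tp, fp, fn):
    # Equation 1: harmonic mean of precision and recall.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * (p * r) / (p + r)

# Example: 80 frauds caught, 20 legitimate transactions flagged,
# 20 frauds missed: precision = recall = 0.8, so F1 = 0.8.
score = f1(tp=80, fp=20, fn=20)
```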
6.2. Economic Efficiency

The term economic efficiency (EE) accounts for the economic returns a company receives when it correctly classifies legitimate transactions, subtracted by the losses accumulated for not identifying frauds, as defined in Equation 4. In Equation 5, k represents the percentage the company retains per transaction, v_i the monetary value of transaction i, and x and y the number of legitimate and fraudulent transactions, respectively. Note that losses are penalized much more heavily than gains.

EE = Σ_{i=1}^{#Transactions} Returns    (4)

Returns = Σ_{i=1}^{x} (v_i · k),           if i is legitimate (TN);
          Σ_{i=1}^{y} (−v_i · (1 − k)),    if i is fraudulent (FN).    (5)

In UOL's case, k assumes the value 0.03, i.e., the company profits 3% on every transaction and loses 97% of the value of a missed fraud.
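Equations 4 and 5, together with k = 0.03, can be expressed as a short function (a sketch under the paper's definitions; the transaction list below is invented for illustration):

```python
# Economic efficiency (Equations 4 and 5): a correctly accepted
# legitimate transaction (TN) earns v * k; an accepted fraud (FN)
# loses v * (1 - k); TPs and FPs contribute nothing under the
# company's current model.

def economic_efficiency(transactions, k=0.03):
    """`transactions` is a list of (value, outcome) pairs, where the
    outcome is one of 'TP', 'TN', 'FP', 'FN' as in Table 2."""
    ee = 0.0
    for value, outcome in transactions:
        if outcome == "TN":
            ee += value * k           # profit of k% of the value
        elif outcome == "FN":
            ee -= value * (1 - k)     # chargeback: loss of (1 - k) of the value
    return ee

txs = [(1000.0, "TN"), (500.0, "TN"), (200.0, "FN"), (300.0, "TP")]
ee = economic_efficiency(txs)   # approx. 30 + 15 - 194 = -149
```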
In order to understand how economically efficient a system is, we need to introduce three other concepts: real economic efficiency (EE_Real), maximum economic efficiency (EE_Max) and minimum economic efficiency (EE_Min). EE_Real corresponds to the value of the metric for the current model used by the company. EE_Max corresponds to the maximum value the company can profit in an ideal scenario, where all frauds are identified and no legitimate transactions are denied (only the first case in Equation 5 holds). EE_Min, in contrast, reflects the opposite scenario, where all transactions would be misclassified, with all fraudulent transactions authorized and all legitimate transactions denied (only the second case in Equation 5 holds).

As we cannot present the actual values of the transactions due to our confidentiality agreement, we present values for the relative economic efficiency with respect to the current model used by PagSeguro, which is measured by Equation 6.

EE_relative = (EE − EE_Real) / (EE_Max − EE_Real)    (6)
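A sketch of Equation 6 and of the two bounds follows (transaction values are invented; k = 0.03 as in the paper):

```python
# EE_Max: every fraud caught and no legitimate transaction denied, so
# only the first case of Equation 5 applies. EE_Min: the opposite, so
# only the second case applies. Equation 6 then rescales a given EE
# against the company's current model (EE_Real), which is an input here.

def ee_bounds(values, is_fraud, k=0.03):
    ee_max = sum(v * k for v, f in zip(values, is_fraud) if not f)
    ee_min = -sum(v * (1 - k) for v, f in zip(values, is_fraud) if f)
    return ee_max, ee_min

def relative_ee(ee, ee_real, ee_max):
    # Equation 6.
    return (ee - ee_real) / (ee_max - ee_real)

values = [1000.0, 500.0, 200.0]
is_fraud = [False, False, True]
ee_max, ee_min = ee_bounds(values, is_fraud)   # approx. 45.0 and -194.0
rel = relative_ee(30.0, ee_real=20.0, ee_max=ee_max)
```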
7. Experimental Results

This section shows the results obtained by Fraud-BNC in terms of F1 and economic efficiency considering the two strategies previously introduced to deal with cost-sensitive classification. The results obtained by Fraud-BNC are compared with those obtained by three traditional BNC algorithms: Naïve Bayes (NB), Tree Augmented Naïve Bayes (TAN) and K2. For all algorithms, the value of the parameter α was set to 0.5. K2 uses the Bayesian score and a maximum number of parent nodes equal to three. The frameworks jBNC (Sacha, 1999) and WEKA (Witten et al., 2011) were used to execute the algorithms.

We also compare Fraud-BNC results with those obtained by other types of learning algorithms (Witten et al., 2011): Logistic Regression, Support Vector Machine (SVM), Random Forest and J48 (an implementation of C4.5). Apart from SVM, all these algorithms were also executed using WEKA. For Random Forest, three configurations for the number of trees were tested: 10, 20 and 30. The value 30 presented the best trade-off between F1 and economic efficiency and was consequently chosen. For SVM, the package LibSVM (Chang and Lin, 2011) was used together with a grid search (the easy tool (Hsu et al., 2010)) to optimize the cost (C) and γ parameters of a radial basis function kernel. The values chosen were 8192.0 for C and 0.03125 for γ.
7.1. Fraud-BNC with instance reweighing

In this section, we present the results of both Fraud-BNC and the other baselines considering different weights for false negatives (fraudulent transactions misclassified as legitimate). As previously explained, the company loses significantly more money when missing frauds than the other way round. For this reason, the results of F1 presented start with equal weights for false positive and false negative examples, which is the standard classification approach. Then, the weight of false negatives is increased from 2 to 4, making false negatives more relevant.

The results of F1 and economic efficiency are shown in Tables 3 and 4, respectively, and were obtained with a five-fold cross-validation procedure. For cross-validation, the partitions were created following the same distribution used for training HHEA (one fraudulent transaction for every six legitimate ones). However, the test set of each partition was enhanced with the remaining legitimate transactions (not used for training) to simulate the real scenario.

All results for F1 in Table 3 are compared against Fraud-BNC using 90% confidence intervals (Student's t-distribution) on the difference between means (Jain, 1991). The symbols ▲ (▼) indicate whether the method in the line is statistically significantly better (worse) than Fraud-BNC. Lines with no symbol are those where the differences are not statistically significant.

Table 3: Results of F1 obtained by the proposed algorithm and other baselines using cost-sensitive learning.

                      FN cost variation (FP cost equals 1)
Algorithms            1               2               3               4
Fraud-BNC             0.827 (0.026)   0.768 (0.033)   0.732 (0.006)   0.686 (0.021)
NB                    0.573 (0.014)▼  0.563 (0.016)▼  0.561 (0.017)▼  0.558 (0.018)▼
TAN                   0.724 (0.015)▼  0.692 (0.017)▼  0.671 (0.017)▼  0.657 (0.016)▼
K2                    0.755 (0.010)▼  0.723 (0.011)▼  0.697 (0.012)▼  0.680 (0.013)▼
Logistic Regression   0.705 (0.011)▼  0.671 (0.013)▼  0.645 (0.012)▼  0.623 (0.011)▼
SVM                   0.854 (0.008)   0.847 (0.010)▲  0.844 (0.012)▲  0.833 (0.016)▲
Random Forest         0.784 (0.013)▼  0.699 (0.016)▼  0.645 (0.014)▼  0.621 (0.008)▼
J48                   0.774 (0.013)▼  0.772 (0.017)   0.772 (0.017)▲  0.772 (0.017)▲
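The instance-reweighing strategy evaluated above can be sketched as follows (a minimal illustration; real implementations, such as cost-sensitive meta-classifiers in WEKA, attach such weights to the instances during training):

```python
# Cost-sensitive instance reweighing: each training instance receives a
# weight, and fraudulent instances are up-weighted by the false-negative
# cost, so the learner pays a higher price for missing them.

def reweigh(is_fraud, fn_cost, fp_cost=1.0):
    """Return one weight per instance: frauds weigh fn_cost,
    legitimate transactions weigh fp_cost."""
    return [fn_cost if f else fp_cost for f in is_fraud]

is_fraud = [False, True, False, False, True]
weights = reweigh(is_fraud, fn_cost=4.0)   # frauds weigh 4, legitimates 1
```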
Notice that the results obtained by Fraud-BNC are statistically significantly better than those obtained by all the other methods except SVM when no classification cost is considered, where the results present no statistical difference. Although both methods obtain similar results, the models generated by Fraud-BNC are interpretable, while those returned by SVM are not. As the weight given to the false negatives increases, the results of SVM and J48 improve over those of Fraud-BNC. This is not surprising, as the method was conceived based on a problem where no weight differences between false positives and false negatives were considered.

The overall best results of F1 are those where FP and FN receive the same weight. Hence, either SVM or Fraud-BNC would be the recommended algorithm. However, the latter is preferred, as the interpretability of the model is crucial in applications where specialists need to understand the decision-making process.

Regarding the results of economic efficiency reported in Table 4, again the symbols ▲ (▼) indicate whether the method in the line is statistically significantly better (worse) than Fraud-BNC, and lines with no symbol are those where the results did not show statistical difference.

Table 4: Results of economic efficiency obtained by the proposed algorithm and other baselines using cost-sensitive learning.

                      FN cost variation (FP cost equals 1)
Algorithms            1               2               3               4
Fraud-BNC             0.693 (0.005)   0.722 (0.008)   0.734 (0.010)   0.742 (0.007)
NB                    0.591 (0.013)▼  0.597 (0.008)▼  0.605 (0.011)▼  0.617 (0.018)▼
TAN                   0.728 (0.010)▲  0.742 (0.011)▲  0.750 (0.011)▲  0.753 (0.011)▲
K2                    0.718 (0.007)▲  0.737 (0.012)▲  0.745 (0.012)▲  0.749 (0.013)▲
Logistic Regression   0.552 (0.017)▼  0.592 (0.015)▼  0.618 (0.015)▼  0.634 (0.016)▼
SVM                   0.626 (0.010)▼  0.638 (0.009)▼  0.638 (0.015)▼  0.650 (0.010)▼
Random Forest         0.706 (0.012)   0.745 (0.018)▲  0.747 (0.018)   0.747 (0.014)
J48                   0.677 (0.026)   0.677 (0.026)▼  0.677 (0.026)▼  0.677 (0.026)▼

Note that the results in this table are not consistent with those presented in Table 3. Here, TAN and K2 present statistically significantly better results than Fraud-BNC, while SVM presents statistically significantly worse results. This can be explained by the fact that economic efficiency depends highly on the value of the transaction: it might be better for the system to miss low-value frauds but never ignore high-value ones.

In sum, these results show that the best methods in terms of F1 differ from those that provide the best EE, and finding a trade-off between these two metrics is crucial. However, this can only be done if the values of the missed frauds are accounted for. The main reasons for seeking this trade-off are that we want BNC algorithms to: (i) be able to generalize to future data and (ii) generate profit for the company. The first is only assessed by classification measures, like F1; the second is strictly associated with economic measures, like EE.
7.2. Fraud-BNC with probability threshold analysis

This section presents the results obtained when analyzing the values of the probability threshold that defines a fraud. We present the results for a subset of classifiers, namely those that obtained the best results in terms of F1 or economic efficiency in the previous section: SVM, TAN, K2 and Fraud-BNC. Recall that although Fraud-BNC and SVM were statistically better than the other two algorithms (TAN and K2) in terms of F1, their results were worse in terms of economic efficiency.

Figure 3 shows the results of F1 and EE in the validation and test sets for the four aforementioned algorithms. We can observe a great variation of the algorithms' curves in both metrics and in both sets. This happens mainly due to the variation of the class probability threshold (x-axis) used to predict fraudulent transactions. Usually, this threshold is equal to 0.5 when solving standard classification problems. Nevertheless, we want to check here whether changing this threshold could result in more profit or smaller losses for PagSeguro.

Figure 3: Results for F1 and Economic Efficiency considering different class probability thresholds for fraudulent transactions. (a) F1 in the validation set; (b) F1 in the test set; (c) Economic Efficiency in the validation set; (d) Economic Efficiency in the test set.

In addition, these four learning algorithms may behave differently when performing classification because they make different assumptions to create the classification model. For instance, Fraud-BNC may consider the class node to have causal predecessors, something TAN and K2 would not. Moreover, the idea of finding the most suitable hyperplane (i.e., the one that maximizes the margin) to separate the examples into classes, used by the SVM algorithm, is entirely dissimilar to the idea of finding the joint probability distribution that best fits the variables, used by the Bayesian algorithms. These assumptions can also result in different values of the considered metrics at distinct threshold levels.

One interesting thing to notice in the graphs is that the curves of TAN and K2, in both the validation and test sets, are more constant than the ones obtained by SVM and Fraud-BNC. This might happen because the probability values these algorithms assign to fraud examples are, in most cases, very low. Note that, for economic efficiency, Fraud-BNC is better than all other methods when the threshold considered in the validation set is smaller than 0.3. In the test set, these values present no difference from those obtained by TAN or K2. When the threshold value is equal to 0.3, Fraud-BNC achieves 72.64% of relative economic efficiency, which represents the highest economic performance with respect to the current company's scenario. Moreover, the values of F1 for threshold 0.3 with Fraud-BNC are always superior to those presented by TAN and K2, being worse only than those obtained by SVM. Considering that Fraud-BNC produces interpretable models, Fraud-BNC with a modified threshold of 0.3 can be considered the best choice among the tested algorithms.
Comparing these results with those obtained when reweighing instances for Fraud-BNC, they present no statistical difference in terms of F1 and better results in terms of economic efficiency. Hence, we recommend using the algorithm and performing a threshold analysis in the validation set, or using class threshold values equal to or lower than 0.3.

Figure 4 shows a graph of the values of the transactions (omitted due to a Non-Disclosure Agreement, NDA) and the probability that Fraud-BNC returns for each transaction, ordered by the probability of fraud. If we use a threshold of 0.3, anything above this value is considered a fraud, which corresponds to identifying 71.87% of the fraudulent transactions correctly. If the default classification threshold (0.5) were used, then 64.97% of the fraudulent transactions would be correctly classified. This difference corresponds to missing approximately 1,000 fraudulent transactions, which has a highly negative economic effect for the company.

On the other hand, for threshold 0.3, we have 98.31% of non-fraudulent transactions correctly classified, against 99.07% when the threshold is set to 0.5. As there are many more legitimate transactions, this difference corresponds to around 7,000 transactions, which, in the current model used by the company, have no economic effect. Note that the values of the transactions do not follow a pattern but, in general, fraudulent transactions do not have very high values. In the cases where they do, Fraud-BNC mostly classifies them with very high probabilities of being a fraud.
Figure 4: Classification probabilities returned by Fraud-BNC versus the transaction value.
7.3. When looking at the results and probabilities is not enough

PagSeguro has an auditing system to verify inconsistent classifications and, consequently, improve the company's results. By showing the decision maker the graph produced by Fraud-BNC together with Figure 4, she or he can focus on analyzing the transactions with a high monetary value and a probability of fraud close to the threshold being considered, and understand how these transactions were identified.

Figure 5 presents the basis of the auditing system, i.e., the directed acyclic graph (DAG) representing the Bayesian Network Classifier (BNC) produced by Fraud-BNC. It consists of a set of 22 nodes ({A1, ..., A22}) representing the predictive attributes of the dataset and a class node C (fraud or not fraud). The edges define the causal-effect relationships among the attributes, considering the class variable. For the sake of simplicity, we will not show the conditional probability tables (CPTs) of each node in the graph, as the resulting BNC is quite sophisticated. Note that some of the relationships previously shown in the correlation matrix of Figure 2 appear in the graph: for instance, the high correlation between attribute A1 and class C, and the influences of A4 on many attributes, including A5 (not directly, but via A8 or A21), A8, A11, A13 (if A13 influences A4, A4 also influences A13), A17, A20 and A21.
Figure 5: The BNC model generated by Fraud-BNC.
Consider that our intention is to classify a new transaction as fraudulent or legitimate. We first apply the model in Figure 5 to the attributes of the new transaction. Let us assume the model produced two probabilities: 0.305 for class fraud and 0.695 for class legitimate. If our threshold for classifying an example as a fraud is 0.3 (and 0.305 is very close to the threshold), the specialist should analyze the DAG to reach a more appropriate classification. A straightforward approach is to look at the causal-effect relationships (edges) that affect the class node in the first order (direct links). In the case of Figure 5, a practitioner should check the values and CPTs of attributes A1, A2, A3, A14 and A16 before making a final decision, using his or her expert knowledge, on whether the transaction is really a fraud. Higher-order relationships between the class and the other attributes can also be explored to create a more robust auditing system, even though they are more complex to analyze.
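This first-order audit can be sketched over the DAG's edge list (the edges below are a partial, illustrative subset of Figure 5, with directions assumed; only the direct links to C are grounded in the text):

```python
# Given the edges of the learned DAG, list the attributes directly
# linked to the class node C, i.e., its first-order neighbours, which
# are the ones a practitioner should inspect first.

EDGES = [("A1", "C"), ("A2", "C"), ("A3", "C"), ("A14", "C"),
         ("A16", "C"), ("A4", "A8"), ("A8", "A5"), ("A4", "A21")]

def direct_neighbors(node, edges):
    """Attributes connected to `node` by a single edge, either direction."""
    out = set()
    for a, b in edges:
        if a == node:
            out.add(b)
        elif b == node:
            out.add(a)
    return sorted(out)

first_order = direct_neighbors("C", EDGES)   # A1, A2, A3, A14 and A16
```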
8. Conclusions and Future Work

This work presented Fraud-BNC, a customized Bayesian Network Classifier (BNC) algorithm to solve a real-world credit card fraud detection problem. The Fraud-BNC algorithm was automatically generated by a Hyper-Heuristic Evolutionary Algorithm (HHEA), which creates customized solutions for classification datasets. Fraud-BNC was evaluated on a dataset from PagSeguro. Nevertheless, the algorithm is general enough to solve other classification problems from the literature.

We tested different approaches to deal with two problems inherent to fraud transaction data: class imbalance and the fact that misclassifying a fraud has a different cost from misclassifying a legitimate transaction. The produced algorithm was analyzed considering two strategies to address these problems: instance reweighing and class probability threshold analysis.

The results obtained by Fraud-BNC were compared to methods within the Bayesian framework and to other state-of-the-art classification methods. Two different types of metrics were considered: a classification measure (F1) and a metric that assesses the economic impact of the model on the company (economic efficiency). The results show that the best algorithm in terms of F1 is usually not the one that obtains the best values of economic efficiency. This happens because the latter is highly impacted by the value of the transaction. Based on our experiments, we believe it is most beneficial for the decision maker to use Fraud-BNC following the probability threshold approach. The results of the method can be used together with the values of the transactions to help in the decision-making process.

One thing worth investigating in the future is whether the strategies related to cost-sensitive classification could be added to the components given to the hyper-heuristic. In this way, HHEA would be able to test in which scenarios different approaches for dealing with cost-sensitive classification are more beneficial. Additionally, a multi-objective optimization framework could be implemented, making the hyper-heuristic optimize both accuracy and economic efficiency simultaneously.

Acknowledgments

This work was partially supported by the following Brazilian research support agencies: CNPq (481204/2013-0, 573871/2008-6, 459301/2014-4), CAPES and FAPEMIG (PPM-00650-15, APQ-01400-14).
References

Adewumi, A. O., Akinyelu, A. A., 2016. A survey of machine-learning and nature-inspired based credit card fraud detection techniques. International Journal of System Assurance Engineering and Management, 1–17.

Alvarez, G., Petrovic, S., 2003. A new taxonomy of web attacks suitable for efficient encoding. Computers & Security 22 (5), 435–449.

Barros, R. C., Basgalupp, M. P., Freitas, A. A., de Carvalho, A. C. P. L. F., Dec 2014. Evolutionary design of decision-tree algorithms tailored to microarray gene expression data sets. IEEE Transactions on Evolutionary Computation 18 (6), 873–892.

Bielza, C., Larrañaga, P., 2014. Discrete Bayesian network classifiers: A survey. ACM Computing Surveys 47 (1), 5:1–5:43.

Brazdil, P., Giraud-Carrier, C., Soares, C., Vilalta, R., 2008. Metalearning: Applications to Data Mining. Springer.

Caldeira, E., Brandao, G., Campos, H., Pereira, A., 2012. Characterizing and evaluating fraud in electronic transactions. In: Proceedings of the Latin American Web Congress. pp. 115–122.

Chandola, V., Banerjee, A., Kumar, V., Jul. 2009. Anomaly detection: A survey. ACM Computing Surveys 41 (3), 15:1–15:58.

Chang, C.-C., Lin, C.-J., 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27:1–27:27, available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Cheng, J., Greiner, R., 1999. Comparing Bayesian network classifiers. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., pp. 101–108.

Cooper, G. F., Herskovits, E., 1992. A Bayesian method for the induction of probabilistic networks from data. Machine Learning 9, 309–347.

CyberSource, 2016. Online fraud report (Latin America edition). Tech. rep., CyberSource Corporation, a Visa Company.

Daly, R., Shen, Q., Aitken, S., 2011. Learning Bayesian networks: approaches and issues. The Knowledge Engineering Review 26, 99–157.

Duman, E., Ozcelik, M. H., 2011. Detecting credit card fraud by genetic algorithm and scatter search. Expert Systems with Applications 38 (10), 13057–13063.

Eiben, A. E., Smith, J. E., 2003. Introduction to Evolutionary Computing. Springer.

Freitas, A. A., 2014. Comprehensible classification models: A position paper. ACM SIGKDD Explorations Newsletter 15 (1), 1–10.

Friedman, N., Geiger, D., Goldszmidt, M., 1997. Bayesian network classifiers. Machine Learning 29, 131–163.

Fu, K., Cheng, D., Tu, Y., Zhang, L., 2016. Credit card fraud detection using convolutional neural networks. In: Proceedings of the International Conference on Neural Information Processing. Springer, pp. 483–490.

Gadi, M. F., Wang, X., Lago, A. P., 2008. Credit card fraud detection with artificial immune system. In: Proceedings of the International Conference on Artificial Immune Systems. Springer, pp. 119–131.

Guo, T., Li, G. Y., July 2008. Neural data mining for credit card fraud detection. In: Proceedings of the International Conference on Machine Learning and Cybernetics. Vol. 7. pp. 3630–3634.

Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue, H., Bing, G., 2016. Learning from class-imbalanced data: Review of methods and applications. Expert Systems with Applications 73, 220–239.

Halvaiee, N. S., Akbari, M. K., 2014. A novel model for credit card fraud detection using artificial immune systems. Applied Soft Computing 24, 40–49.

He, H., Garcia, E. A., 2009. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering 21 (9), 1263–1284.

Hens, A. B., Tiwari, M. K., Jun. 2012. Computational time reduction for credit scoring: An integrated approach based on support vector machine and stratified sampling method. Expert Systems with Applications 39 (8), 6774–6781.

Hesar, A. S., Tabatabaee, H., Jalali, M., 2012. Structure learning of Bayesian networks using heuristic methods. In: Proceedings of the International Conference on Information and Knowledge Management.

Hsu, C.-W., Chang, C.-C., Lin, C.-J., 2010. A practical guide to support vector classification. Tech. rep., National Taiwan University.

Jain, R., 1991. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. John Wiley & Sons.

Kotsiantis, S. B., 2007. Supervised machine learning: A review of classification techniques. In: Proceedings of the Conference on Emerging Artificial Intelligence Applications in Computer Engineering. IOS Press, pp. 3–24.

Lindqvist, U., Jonsson, E., 1997. How to systematically classify computer security intrusions. In: Proceedings of the IEEE Symposium on Security and Privacy. pp. 154–163.

Liu, X. Y., Zhou, Z. H., Dec 2006. The influence of class imbalance on cost-sensitive learning: An empirical study. In: Proceedings of the International Conference on Data Mining. pp. 970–974.

Ngai, E. W. T., Hu, Y., Wong, Y. H., Chen, Y., Sun, X., Feb. 2011. The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature. Decision Support Systems 50 (3), 559–569.

Pappa, G., Ochoa, G., Hyde, M., Freitas, A., Woodward, J., Swan, J., 2014. Contrasting meta-learning and hyper-heuristic research: the role of evolutionary algorithms. Genetic Programming and Evolvable Machines 15 (1), 3–35.

Pappa, G. L., Freitas, A. A., 2009. Automating the Design of Data Mining Algorithms: An Evolutionary Computation Approach. Springer.

Quinlan, J. R., Mar. 1986. Induction of decision trees. Machine Learning 1 (1), 81–106.

Ravisankar, P., Ravi, V., Raghava Rao, G., Bose, I., 2011. Detection of financial statement fraud and feature selection using data mining techniques. Decision Support Systems 50 (2), 491–500.

Sá, A. G. C., Pappa, G. L., 2013. Towards a method for automatically evolving Bayesian network classifiers. In: Proceedings of the Annual Conference Companion on Genetic and Evolutionary Computation. pp. 1505–1512.

Sá, A. G. C., Pappa, G. L., 2014. A hyper-heuristic evolutionary algorithm for learning Bayesian network classifiers. In: Proceedings of the Ibero-American Conference on Artificial Intelligence. pp. 430–442.

Sacha, J. P., 1999. New synthesis of Bayesian network classifiers and cardiac SPECT image interpretation. Ph.D. thesis, The University of Toledo.

Sahin, Y., Bulkan, S., Duman, E., 2013. A cost-sensitive decision tree approach for fraud detection. Expert Systems with Applications 40 (15), 5916–5923.

Salama, K. M., Freitas, A. A., 2013. Extending the ABC-Miner Bayesian classification algorithm. In: Proceedings of the Workshop on Nature Inspired Cooperative Strategies for Optimization. pp. 1–12.

Sundarkumar, G. G., Ravi, V., 2015. A novel hybrid undersampling method for mining unbalanced datasets in banking and insurance. Engineering Applications of Artificial Intelligence 37, 368–377.

Thai-Nghe, N., Gantner, Z., Schmidt-Thieme, L., July 2010. Cost-sensitive learning methods for imbalanced data. In: Proceedings of the International Joint Conference on Neural Networks. pp. 1–8.

Vlasselaer, V. V., Bravo, C., Caelen, O., Eliassi-Rad, T., Akoglu, L., Snoeck, M., Baesens, B., 2015. APATE: A novel approach for automated credit card transaction fraud detection using network-based extensions. Decision Support Systems 75, 38–48.

Weiss, G. M., Jun. 2004. Mining with rarity: A unifying framework. ACM SIGKDD Explorations Newsletter 6 (1), 7–19.

West, J., Bhattacharya, M., 2016. Intelligent financial fraud detection: A comprehensive review. Computers & Security 57, 47–66.

Witten, I. H., Frank, E., Hall, M. A., 2011. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers Inc.

Wolpert, D. H., Macready, W. G., 1997. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation 1 (1), 67–82.