
A Customized Classification Algorithm for Credit-Card Fraud Detection

Alex G. C. de Sá, Adriano C. M. Pereira, Gisele L. Pappa
Computer Science Department, Universidade Federal de Minas Gerais (UFMG), 31270-010, Belo Horizonte, Minas Gerais, Brazil

Abstract

This paper presents Fraud-BNC, a customized Bayesian Network Classifier (BNC) algorithm for a real credit card fraud detection problem. The task of creating Fraud-BNC was automatically performed by a Hyper-Heuristic Evolutionary Algorithm (HHEA), which organizes the knowledge about BNC algorithms into a taxonomy and searches for the best combination of these components for a given dataset. Fraud-BNC was automatically generated using a dataset from PagSeguro, the most popular Brazilian online payment service, and tested together with two strategies for dealing with cost-sensitive classification. The results obtained were compared to those of seven other algorithms and analyzed considering both the data classification problem and the economic efficiency of the method. Fraud-BNC provided the best trade-off between both perspectives, improving the company's current economic efficiency by up to 72.64%.

Keywords: credit card fraud, Bayesian network classifiers, hyper-heuristic.



Corresponding author. Tel: +55 31 3409-7536
Email addresses: [email protected] (Alex G. C. de Sá), [email protected] (Adriano C. M. Pereira), [email protected] (Gisele L. Pappa)

Elsevier, July 15, 2018

1. Introduction

In 2016, a report by CyberSource (CyberSource, 2016) pointed out that the volume of fraudulent e-commerce credit card transactions (chargeback) in Latin America corresponds to 1.4% of the total net of the sector. Automatically identifying these transactions has many open challenges. Among them are the high volume of transactions that needs to be processed in almost real-time and the fact that frauds do not occur frequently, generating very imbalanced datasets. Furthermore, accepting a fraud as a legitimate transaction has a much higher cost than identifying a legitimate transaction as a fraud, as the seller's economic losses are much higher in the first case, which generates chargeback.

There are different ways of modeling the credit card fraud detection problem; among the most common approaches are those created to identify anomalies (Halvaiee and Akbari, 2014) and those based on classical methods for data classification (Hens and Tiwari, 2012). This paper focuses on the latter and models the problem as a classification task, where a classifier is conceived to distinguish fraudulent from legitimate transactions.

In particular, we are interested in algorithms that generate interpretable models (classifiers), such as decision trees, classification rules or Bayesian network classifiers (Kotsiantis, 2007). This is because decision makers are more comfortable accepting automatic decisions they can understand (Freitas, 2014). Although it is well known that in some domains these methods present lower accuracy than black-box models such as Support Vector Machines, sacrificing accuracy to gain interpretability is a worthwhile trade-off in alert systems.

There is a variety of classification algorithms in the literature that can generate interpretable models. According to the No Free Lunch Theorem (Wolpert and Macready, 1997), the choice of which of these algorithms is the best for a given dataset is still an open problem. The areas of meta-learning and hyper-heuristics have offered different solutions for automatically testing different types of algorithms (Pappa et al., 2014). While the meta-learning literature has focused on selecting the best algorithm according to the characteristics of the target problem (Brazdil et al., 2008), hyper-heuristic methods have proposed different ways of generating customized algorithms for different datasets, which we consider more interesting for this work.

A hyper-heuristic is a high-level approach that, given a particular problem instance and a number of low-level heuristics, can select and apply an appropriate low-level heuristic at each decision point. Hyper-heuristic methods have already been conceived for building algorithms to solve specific classification problems (Pappa and Freitas, 2009; Sá and Pappa, 2014). These methods help experts and practitioners with the following task: given a new classification dataset, which is the most suitable combination of the learning algorithms' components to solve this new problem? In this paper, we take advantage of one of these methods and use the Hyper-Heuristic Evolutionary Algorithm (HHEA) (Sá and Pappa, 2014) to create a customized Bayesian Network Classifier (BNC) algorithm, named Fraud-BNC, specifically for detecting frauds in a dataset of interest.

We chose to work with BNC algorithms for fraud detection because they are robust statistical methods to classify data. They are based on the theoretical foundations of Bayesian networks (Bielza and Larrañaga, 2014) and produce a classification model that assumes cause-effect relations among all data attributes (including the class) (Cheng and Greiner, 1999). These relationships can be used to gain understanding about a problem domain, as the output BNC model is represented by a directed acyclic graph (DAG). In the DAG, each node maps an attribute and edges define probabilistic dependencies among them. Each node is also associated with a conditional probability table, which represents the network parameters.
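As a toy illustration of such a model, the sketch below hard-codes a two-attribute network (C is a parent of A1 and A2) with made-up CPT entries. The attribute names and probabilities are illustrative, not values learned from the PagSeguro data; classification normalizes the product of the class prior and the CPT entries of the observed values:

```python
# Toy Bayesian network classifier with DAG C -> A1, C -> A2.
# All probabilities below are illustrative, not from the paper.
prior = {"fraud": 0.018, "legit": 0.982}           # P(C)
cpt_a1 = {("fraud", 1): 0.70, ("fraud", 0): 0.30,  # P(A1 | C)
          ("legit", 1): 0.05, ("legit", 0): 0.95}
cpt_a2 = {("fraud", 1): 0.60, ("fraud", 0): 0.40,  # P(A2 | C)
          ("legit", 1): 0.20, ("legit", 0): 0.80}

def posterior(a1, a2):
    """Return P(C | A1=a1, A2=a2) by normalizing the joint scores."""
    score = {c: prior[c] * cpt_a1[(c, a1)] * cpt_a2[(c, a2)]
             for c in prior}
    z = sum(score.values())
    return {c: s / z for c, s in score.items()}

p = posterior(1, 1)  # both suspicious attribute values observed
```

In a real BNC the DAG and CPTs are learned from data; here they are fixed so that the inference step is visible in isolation.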

The literature presents several BNC algorithms (Bielza and Larrañaga, 2014; Sacha, 1999; Witten et al., 2011). Instead of choosing one of them, HHEA builds a customized BNC algorithm, which has the best combination of the essential modules (components) of the aforementioned algorithms for the dataset at hand. It is important to emphasize that HHEA produces a general BNC algorithm, even though it is specialized for a particular dataset.

Fraud-BNC was conceived by HHEA for a real-world credit card fraud detection problem. This problem is associated with a classification dataset provided by UOL PagSeguro (http://pagseguro.uol.com.br), a popular online payment service in Brazil. The performances of Fraud-BNC and other baselines were evaluated using a classification metric (F1) and a measure of the company's economic loss, named economic efficiency. Besides, given the challenges of learning from class-imbalanced data (Sundarkumar and Ravi, 2015; Haixiang et al., 2016), we considered two strategies for dealing with cost-sensitive classification: instance reweighing and analysis of the class probability threshold.

The results showed that the best algorithm in terms of F1 is usually not the one that obtains the best values of economic efficiency. This happens because the latter is highly influenced by the monetary value of the transaction. Our analysis also showed that using Fraud-BNC with a class probability threshold obtains the best results. Furthermore, as Fraud-BNC returns the probability of a transaction being a fraud, this probability can be used together with the monetary value of the transaction to help in the decision-making process.

The main contributions of this paper are: (i) the generation of a customized BNC algorithm for a real-world credit-card fraud detection dataset; (ii) the evaluation of how the algorithm performs in terms of both classification metrics and those used by finance specialists to evaluate fraud levels; (iii) the complete analysis of the customized BNC algorithm in terms of strategies to deal with imbalanced data; (iv) the improvement of the techniques currently used by the company to quantify fraud detection in PagSeguro by up to 72.64%; (v) the concept of how to use the produced BNC model in the auditing system to verify inconsistent classifications.

The remainder of this paper is organized as follows. Section 2 presents related work on fraud detection modeled as a classification problem. Section 3 describes HHEA, the method used to automatically generate a customized BNC algorithm for the PagSeguro dataset, which is described in Section 4. The produced algorithm, Fraud-BNC, is presented in Section 5, followed by the definition of the metrics used to evaluate the algorithms, introduced in Section 6. Finally, Section 7 presents the experimental results, while conclusions and directions for future work are described in Section 8.

2. Related Work

The problem of fraud detection has been extensively studied in the literature. This section reviews works that follow a classification approach to solve the problem. Among the methods already explored are artificial neural networks, decision trees, logistic regression, random forests, artificial immune systems, support vector machines (SVM) and hybrid methods (Chandola et al., 2009; Adewumi and Akinyelu, 2016; Alvarez and Petrovic, 2003; Lindqvist and Jonsson, 1997; Ngai et al., 2011; West and Bhattacharya, 2016), among others. Note that all these techniques follow a supervised learning approach, as they assume the existence of labeled data to generate these models.

Table 1 presents a comparison between a set of previously proposed methods and Fraud-BNC. Six main characteristics were analyzed: (i) whether the method works with real-world data (if not, the work uses artificial data), (ii) whether the data reflects the real-world severe class imbalance, (iii) whether feature selection is performed, (iv) whether cost-sensitive methods are used to address the class imbalance problem or (v) whether (under-)sampling techniques are used with this intention, and (vi) whether a financial analysis was taken into account when looking at the results. Table 1 indicates whether the method in the row presents the characteristic defined in the column: 'Y' indicates that the method has that characteristic, and 'N' the opposite.

Table 1: Summary of the six main characteristics of related works when compared to the customized algorithm Fraud-BNC.

Methods                     | (i) | (ii) | (iii) | (iv) | (v) | (vi)
----------------------------|-----|------|-------|------|-----|-----
Fraud-BNC                   |  Y  |  Y   |  Y    |  Y   |  Y  |  Y
Halvaiee and Akbari (2014)  |  Y  |  Y   |  N    |  Y   |  N  |  Y
Ravisankar et al. (2011)    |  Y  |  N   |  Y    |  N   |  N  |  N
Caldeira et al. (2012)      |  Y  |  Y   |  Y    |  N   |  N  |  Y
Sahin et al. (2013)         |  Y  |  Y   |  N    |  Y   |  Y  |  Y
Guo and Li (2008)           |  N  |  Y   |  N    |  N   |  Y  |  N
Fu et al. (2016)            |  Y  |  Y   |  Y    |  N   |  Y  |  N
Duman and Ozcelik (2011)    |  Y  |  Y   |  N    |  Y   |  Y  |  Y
Gadi et al. (2008)          |  Y  |  Y   |  N    |  Y   |  Y  |  Y
Vlasselaer et al. (2015)    |  Y  |  Y   |  N    |  N   |  N  |  N

Note that most works deal with real-world unbalanced data and use at least one strategy to deal with it. About half of the methods look beyond the results of classification, and the majority disregards any type of feature selection, although the features describing the data may differ significantly.

Concerning the learning techniques used by these methods, they encompass a large variety of algorithms. Ravisankar et al. (2011), for instance, employed six machine learning techniques, including SVM and logistic regression. Guo and Li (2008) proposed to combine confidence values, artificial neural network algorithms and receiver operating characteristic (ROC) curves for detecting credit card frauds. They performed undersampling to deal with class imbalance, resulting in a distribution of 100 legitimate transactions for each fraudulent one. This is the only work that uses synthetic data.

Caldeira et al. (2012) also applied artificial neural networks and random forests to identify frauds in online transactions coming from the same data source we work with. Apart from other standard classification measures, they looked at the economic efficiency of the model, improving the results of the current company policy by 43%. However, they did not account for class imbalance or different classification costs for different classes. Fu et al. (2016), in turn, solved the problem with a convolutional neural network (CNN). The CNN was applied to bank data to find a set of latent patterns for each transaction and identify frauds. The issue of data imbalance was tackled by a cost-based sampling method, which involved creating synthetic fraudulent samples from the real frauds.

Duman and Ozcelik (2011), on the other hand, developed a hybrid approach based on a genetic algorithm (GA) and scatter search (SS), named GASS, to take a classification cost function into consideration when dealing with fraud detection. GASS was applied to data from a major bank in Turkey, and used 20% of randomly chosen legitimate transactions for training due to time complexity.

Looking at works focusing on interpretable models, Sahin et al. (2013) is the only one in this category, having developed a cost-sensitive decision tree algorithm. The authors described their work as the pioneer in taking misclassification costs into account while performing fraud classification. They used stratified sampling, i.e., they kept the class imbalance during the sampling process to help learn concepts from both legitimate and fraudulent classes. Although different class distributions were tried during training, the test dealt with the real distribution.

Another popular method used to flag fraudulent transactions is Artificial Immune Systems (AIS), as these methods were primarily conceived to identify anomalies. Gadi et al. (2008) explored the clonal selection algorithm for credit card fraud detection in a dataset provided by a Brazilian bank. They used a random sampling technique to select 10% of the legitimate transactions (against 100% of the fraudulent ones) to reduce the effects of class imbalance. Halvaiee and Akbari (2014) proposed a similar model, called the AIS-based Fraud Detection Model (AFDM), based on both the clonal and negative selection algorithms. The main difference of the latter from the algorithm proposed by Gadi et al. is that the authors focused essentially on reducing the training time to build the model. Apart from using classification metrics, both works assessed a cost function applied when fraudulent transactions were not detected.

Following a different approach, Vlasselaer et al. (2015) proposed APATE (Anomaly Prevention using Advanced Transaction Exploration). APATE was designed by combining recency-frequency-monetary variables and social network analysis. They tested APATE on a Belgian credit card issuer dataset and estimated a linear regression, an artificial neural network and a random forest model. Results showed that APATE led to a very high area under the ROC curve and high accuracy, especially for the random forest model.

In this work, we deal with both the problem of class imbalance and the different costs of fraudulent and legitimate transactions under the classification framework. We use an undersampling technique and two classification approaches to deal with class imbalance. The problem of finding the most suitable algorithm for a dataset of interest is also considered. For this, we use a Hyper-Heuristic Evolutionary Algorithm (HHEA) to automatically generate a customized Bayesian Network Classifier (BNC) algorithm, Fraud-BNC, for the given input credit card transaction data. We compared the customized algorithm to seven other classification methods, showing that, alongside its good classification performance, Fraud-BNC is very effective in creating interpretable models.

3. Evolving Algorithms for Learning Bayesian Network Classifiers

The use of hyper-heuristics to construct algorithms customized to datasets is a growing research field, where methods for generating Bayesian network algorithms for classification (Sá and Pappa, 2013; Sá and Pappa, 2014), decision tree inducers (Barros et al., 2014) and rule induction algorithms (Pappa and Freitas, 2009) have been previously proposed. According to Pappa et al. (2014), these approaches are commonly based on evolutionary algorithms (Eiben and Smith, 2003), referred to in this paper as Hyper-Heuristic Evolutionary Algorithms (HHEAs).

HHEA works by simulating Darwin's evolutionary process and relies on the ideas of natural selection, mutation and survival of the fittest. It works with a population of candidate solutions (individuals) to the problem at hand (in our case, BNC algorithms), and uses an iterative process (namely evolution) to find approximate solutions to the problem through selection, crossover and mutation operators.

The HHEA method used in this work builds BNC algorithms. The process followed by BNC algorithms when learning a model is divided into two phases (Cheng and Greiner, 1999): structure and parameter learning. In the structure learning phase, the method learns the causal relationships among the attributes of the input dataset, i.e., which nodes (attributes) in the graph should be connected to each other. Different types of BNC algorithms have already been proposed for creating the network structure, including score-based, constraint-based and hybrid approaches (Daly et al., 2011).

The parameter learning phase, in turn, learns Conditional Probability Tables (CPTs) for each node of the BNC. These tables are used to make estimations about the data. However, learning the parameters of a BNC is a relatively straightforward procedure when the network structure is defined (Salama and Freitas, 2013). For this reason, HHEA focuses on the structure learning phase.

The HHEA method is illustrated in Figure 1(a). It receives as input the dataset and a set of components identified from previously proposed BNC algorithms. It then combines these components, outputting a BNC algorithm tailored to the domain of the input data.

[Figure 1: The HHEA process and its individual representation. (a) HHEA for generating BNC algorithms: the dataset and the BNC algorithm components are the input, and a tailored BNC algorithm is the output. (b) The BNC individual's representation: a real-coded array (e.g., 0.7, 0.1, 0.6, 0.5, 0.7) is mapped to defined components (search method = Hill Climbing, scoring metric = Bayesian, reversed edges = false, parents = 6, CPT α = 8.7); the resulting algorithm performs the search, estimates the CPTs and returns the BNC model.]

HHEA uses a real-coded evolutionary algorithm to search and explore the space of BNC algorithms. In Figure 1(b), each individual represents a BNC algorithm, randomly generated from a combination of the available components given as input. In total, 4,960,000 component combinations are possible (Sá and Pappa, 2014); these components include scoring metrics, independence tests, the maximum number of parents a node can have, and search methods for building the BNC structure, plus the parameter α of the parameter (CPT) estimation method.

After individuals are created, they undergo a fitness evaluation process. During the evaluation, a mapping between the real-coded individual and a BNC algorithm, created according to the components combined, is performed (see Figure 1(b)). Each position of the real-coded array represents a BNC component. For different ranges of real values and different component space sizes, the position takes different components during the mapping. The search method in Figure 1(b) is defined as Hill Climbing because the value of its position in the array is between 0.65 and 0.75. The same rule is applied to the other positions of the array (for more details on the BNC components, see Sá and Pappa (2014)). After that, the individuals (algorithms) built are run on a training set to induce BNC models, which are then evaluated using a validation set. The F1-measure (Witten et al., 2011) is the fitness function, calculated on the validation set.
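The range-to-component mapping can be illustrated with a small decoder. The component list below is illustrative, not the actual HHEA taxonomy, and the real range boundaries depend on the component space; the paper only states that, for the search-method position, values in [0.65, 0.75) select Hill Climbing (which the ten equal-width slots below happen to reproduce):

```python
# Illustrative option list for the "search method" position; the real
# taxonomy in HHEA has its own options and orderings.
SEARCH_METHODS = ["K2 search", "Greedy", "Simulated Annealing",
                  "Tabu Search", "Repeated Hill Climbing",
                  "LAGD Hill Climbing", "Hill Climbing", "TAN",
                  "Naive Bayes", "SAN"]

def decode(gene, options):
    """Map a real value in [0, 1] to one of `options` by equal-width ranges."""
    index = min(int(gene * len(options)), len(options) - 1)
    return options[index]
```

Decoding every position of the array this way turns one real-coded individual into one concrete BNC algorithm configuration.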

Next, the individuals undergo a tournament selection process, where individuals with higher values of F1-measure have a higher probability of being selected for the uniform crossover and one-point mutation operations used to generate a new population. An elitist process also copies the best individuals to the next population. After a predefined number of generations, the best BNC algorithm generated is returned, and its associated model is tested using a new set of data coming from the same domain.

In order to produce the customized BNC algorithm, the parameters of HHEA were set in preliminary experiments using a wider range of datasets (Sá and Pappa, 2014). The best configuration resulted in the following parameters: 35 individuals evolved for 35 generations, tournament size of two individuals, elitism of one individual, and crossover and mutation probabilities of 0.9 and 0.1, respectively. The relatively low number of individuals and generations is due to the complexity of the solutions generated. Recall that each individual represents a BNC algorithm, which will be trained and validated on a given dataset.
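The evolutionary cycle with these parameter settings can be sketched as follows. The sketch is schematic: the real fitness evaluation decodes each individual into a BNC algorithm, trains it and measures F1 on the validation set, whereas here a dummy fitness over a 5-gene array stands in:

```python
import random

POP, GENS, TOUR, ELITE = 35, 35, 2, 1  # settings reported in the paper
P_CROSS, P_MUT = 0.9, 0.1
GENES = 5                              # one position per BNC component

def fitness(ind):
    # Stand-in for: decode ind into a BNC algorithm, train it,
    # and return F1 on the validation set.
    return sum(ind) / len(ind)

def tournament(pop, rng):
    return max(rng.sample(pop, TOUR), key=fitness)

def evolve(seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(GENES)] for _ in range(POP)]
    for _ in range(GENS):
        nxt = sorted(pop, key=fitness, reverse=True)[:ELITE]  # elitism
        while len(nxt) < POP:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            child = list(p1)
            if rng.random() < P_CROSS:                # uniform crossover
                child = [a if rng.random() < 0.5 else b
                         for a, b in zip(p1, p2)]
            if rng.random() < P_MUT:                  # one-point mutation
                child[rng.randrange(GENES)] = rng.random()
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = evolve()
```

Elitism guarantees the best individual is never lost between generations, which matters here because each fitness evaluation (a full BNC training run) is expensive.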

4. PagSeguro Transactions Data

PagSeguro is currently one of the most popular online payment services in Brazil, and is owned by Universo Online Inc. (UOL, http://www.uol.com.br/). In PagSeguro, each transaction is described by hundreds of different attributes, including the transaction status, which can be legitimate or fraud.

The dataset was obtained from a month of transactions, ordered according to the time they occurred. In total, we have 903,801 transactions, where 16,639 (1.8%) are frauds and the remaining 887,162 (98.2%) were classified as legitimate, i.e., non-fraud. Although the absolute number of frauds is low, the economic impact they generate is huge. Initially, each transaction was described by a set of 424 attributes. To identify the most relevant attributes for classification, a feature selection step was performed using information gain (Quinlan, 1986). These 424 attributes were reduced to 24, including the transaction id and the classification of the transaction as fraud or legitimate.
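Information gain for a single attribute is the class entropy minus the expected class entropy after splitting on that attribute's values. A minimal sketch, using hypothetical data rather than the PagSeguro attributes:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a non-empty list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(pairs):
    """IG(class; attribute) for a list of (attribute_value, class) pairs."""
    labels = [c for _, c in pairs]
    gain = entropy(labels)
    n = len(pairs)
    for v in set(a for a, _ in pairs):
        subset = [c for a, c in pairs if a == v]
        gain -= len(subset) / n * entropy(subset)  # expected remaining entropy
    return gain

# Hypothetical transactions: the attribute perfectly separates the classes,
# so the gain equals the class entropy (1 bit for a 50/50 split).
data = [(1, "fraud"), (1, "fraud"), (0, "legit"), (0, "legit")]
```

Ranking the 424 original attributes by this score and keeping the top ones is the usual filter-style use of information gain.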

Due to a confidentiality agreement with the company, we cannot detail the attributes used to describe the data. To give an idea about them, Figure 2 shows the Pearson's correlation matrix of the whole dataset considering the 22 predictive attributes {A1, ..., A22} and the class C. The transaction id was omitted. Observe in the matrix that all correlations are greater than or equal to zero. Regarding direct correlations with the class attribute, A1 presents the highest value. Attribute pairs including A4, A5, A8, A11, A13, A20 and A21 present the highest correlations. This does not necessarily mean causality between the attributes, but indicates a relationship that can be explored by the learning algorithm.

[Figure 2: Pearson's correlation matrix of the dataset for the 22 attributes and the class.]

In order to tackle the problem of class imbalance, we performed random under-sampling on the training dataset. Under-sampling techniques reduce the size of the dataset by increasing the ratio of positive to negative classes (He and Garcia, 2009). For each fraudulent transaction, six non-fraudulent ones were kept in the dataset, as a preliminary experiment indicated this to be the best distribution. We opted for a random selection because, according to the experiments performed in (Thai-Nghe et al., 2010), there is not a significant difference in the results produced by more sophisticated (and hence more computationally expensive) methods.
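The 6:1 under-sampling step can be sketched as follows, with hypothetical transaction records (the real selection was applied to the training portion of the PagSeguro data):

```python
import random

def undersample(transactions, ratio=6, seed=42):
    """Keep every fraud plus `ratio` randomly chosen legitimate
    transactions per fraud (the 6:1 distribution used for training)."""
    rng = random.Random(seed)
    frauds = [t for t in transactions if t["label"] == "fraud"]
    legit = [t for t in transactions if t["label"] == "legit"]
    kept = rng.sample(legit, min(len(legit), ratio * len(frauds)))
    sample = frauds + kept
    rng.shuffle(sample)
    return sample

# Hypothetical data: 3 frauds among 100 transactions.
data = [{"id": i, "label": "fraud" if i < 3 else "legit"}
        for i in range(100)]
balanced = undersample(data)
```

Note that only the legitimate class is sampled; every fraud is kept, since frauds are the scarce signal.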

Even with this new ratio of positive and negative examples, generating and evaluating different algorithms with more than five thousand transactions during the HHEA evolution is still a very expensive task (Sá and Pappa, 2014). For this reason, HHEA was trained with a smaller sample of the dataset containing 4,833 transactions (approximately 0.5%), with 693 representing frauds and 4,140 legitimate transactions. 70% of this data was used to train the BNC algorithms and build their models, and the remaining 30% was used to compute the F1 fitness function on an unseen set of examples. Additionally, the training and validation sets were re-sampled every five generations in order to avoid overfitting in HHEA.

5. Fraud-BNC: a classification algorithm for fraud detection

This section introduces Fraud-BNC, the customized algorithm generated by HHEA for the PagSeguro dataset. Fraud-BNC is presented in Algorithm 1. It receives the PagSeguro dataset as input in line 1 and then defines two parameters (optimized by HHEA) in lines 2-3, i.e., the maximum number of parents of a node and the value of α for the conditional probability tables (CPTs). It also defines an initial Bayesian network from which the search for the best structure will start. This initial network has as many nodes as there are attributes in the dataset, and all nodes are connected only to the class node. The algorithm sets this initial network as the current best, and evaluates it using the Heckerman-Geiger-Chickering (HGC) scoring metric with prior (lines 4-5) (Sacha, 1999). HGC determines the posterior probability of the BNC given the training data, and its main idea is to reward BNCs that have a good approximation of the joint probability distribution of the predictive attributes and the class.

After these initial steps, the algorithm performs a hill climbing search to build a BNC (lines 6-23). Line 7 resets the control flag at the start of each iteration; the flag is set when an improvement is found, and the search stops once an iteration produces no BNC better than the current best. The hill climbing method (Hesar et al., 2012) looks for the best (local) operation to perform on the current best BNC (lines 10-17). These operations include adding, deleting or reversing an edge in the Bayesian network. An operation is applied to the current best BNC only if it generates a valid BNC, i.e., one that respects the directed acyclic graph properties and the maximum number of parents a node can have. The best network generated in this step is found by computing the HGC metric over the dataset for the variations created from the best network (line 12). If any of the BNC variations has an HGC value higher than that of the best BNC, it becomes the new best (lines 18-22). This process goes on until the scores of the new candidate networks are no better than the score of the current best network.

Algorithm 1: Fraud-BNC algorithm.

 1: Input: dataset
 2: parents = 5       // maximum number of parents a node can have
 3: α = 8.031         // α for the CPT estimator
 4: bestBNC = graph with all attributes from dataset as nodes and edges only to the class
 5: bestScore = evaluate bestBNC using HGC with prior
 6: do
 7:     stop = false  // set to true when an improvement is found
 8:     BNC = bestBNC
 9:     score = bestScore
10:     BNCVariations = set of valid BNCs generated by adding, removing or reversing an edge
11:     for candBNC in BNCVariations do
12:         candScore = evaluate candBNC using HGC with prior
13:         if candScore > score then
14:             score = candScore
15:             BNC = candBNC
16:         end if
17:     end for
18:     if score > bestScore then
19:         bestScore = score
20:         bestBNC = BNC
21:         stop = true
22:     end if
23: while (stop)      // repeat while improvements are found
24: LearnParameters(bestBNC, α)
25: return bestBNC

After the structure is defined, the parameter learning starts, and the conditional probability tables (CPTs) are estimated (line 24) using the value of α set in line 3 (8.031). Finally, the best BNC is returned (line 25).
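One common way to turn counts into CPT entries, assumed here since the paper does not spell out its estimator, is a symmetric Dirichlet (Laplace-style) smoother driven by α:

```python
from collections import Counter

ALPHA = 8.031  # the α value optimized by HHEA for Fraud-BNC

def cpt_entry(counts, value, n_values, alpha=ALPHA):
    """Smoothed estimate of P(X = value | parent configuration) from raw
    counts, assuming a symmetric Dirichlet prior with parameter alpha:
    (N_value + alpha) / (N_total + alpha * n_values)."""
    total = sum(counts.values())
    return (counts[value] + alpha) / (total + alpha * n_values)

# Hypothetical counts of a binary attribute under one parent configuration.
counts = Counter({1: 7, 0: 3})
p1 = cpt_entry(counts, 1, n_values=2)
p0 = cpt_entry(counts, 0, n_values=2)
```

A large α (like 8.031) pulls sparse cells toward the uniform distribution, which guards the CPTs against configurations that occur rarely in the under-sampled training data.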

When compared to other existing BNC algorithms in the literature, including Naïve Bayes (NB) (Witten et al., 2011), Tree Augmented Naïve Bayes (TAN) (Friedman et al., 1997) and K2 (Cooper and Herskovits, 1992), the algorithm produced by HHEA presents the following differences. While NB assumes that, given the class attribute, all predictive attributes are independent (the maximum number of parents equals one), TAN builds a tree to represent the relationships between predictive attributes (maximum number of parents equals two). K2, in turn, represents these relations using a less restrictive directed acyclic graph. In K2, it is also necessary to define the maximum number of parents, but the algorithm can be considered more restrictive because it sets a causal order on the variables of the BNC. Additionally, neither TAN nor K2 deletes edges from the structure of the BNC, an operation that can be performed by the Fraud-BNC algorithm. Fraud-BNC also allows the class to have causal precedents in its produced model, representing a more general model, called a General Bayesian Network Classifier (GBN) (Cheng and Greiner, 1999). This type of model cannot be generated by NB, TAN or K2.

Fraud-BNC is investigated under two scenarios to deal with cost-sensitive classification: instance reweighing and probability threshold analysis. The first approach takes the misclassification costs into consideration by reweighing the training instances according to a set of predefined weights (Witten et al., 2011). The goal of this type of algorithm is to minimize the total misclassification cost, usually aiming to improve the classification of false negatives (Weiss, 2004). As a consequence, this method leads to better responses to the class imbalance problem (Liu and Zhou, 2006).
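The exact weights used with Fraud-BNC are not given in this section; as an illustration, one common reweighing scheme makes each class contribute equally to the training objective by weighting instances inversely to their class frequency:

```python
from collections import Counter

def class_balance_weights(labels):
    """Per-instance weights so each class contributes equally overall:
    w(c) = n_total / (n_classes * n_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]

# Hypothetical imbalanced labels: 1 fraud, 4 legitimate.
labels = ["fraud", "legit", "legit", "legit", "legit"]
weights = class_balance_weights(labels)
```

With these weights the single fraud carries weight 2.5 while each legitimate instance carries 0.625, so both classes sum to the same total weight during training.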

In the second scenario, we analyzed the probabilities associated with the predictions of the classifiers on a validation set in order to modify the probability threshold that separates fraudulent from legitimate transactions. Fraud-BNC outputs the probability of a transaction being legitimate or fraudulent. Usually, when the fraud probability of a transaction is lower than 0.5, the transaction is classified as legitimate; otherwise, it is considered a fraud. However, in real-world systems, where the cost of missing a fraud is much higher, lowering the threshold for classifying a transaction as a fraud might be a good option. We analyze the probability values output for the transactions and change the class threshold to increase economic efficiency, although the threshold could also be used to increase F1.
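The threshold adjustment amounts to replacing the default 0.5 cut-off on the fraud probability; a minimal sketch with illustrative probabilities:

```python
# Classify transactions by comparing the model's fraud probability with an
# adjustable threshold instead of the default 0.5. Lowering the threshold
# flags more transactions as fraud, trading false positives for fewer
# missed frauds.
def classify(fraud_probs, threshold=0.5):
    return ["fraud" if p >= threshold else "legitimate" for p in fraud_probs]

probs = [0.10, 0.35, 0.60, 0.90]
print(classify(probs))                 # default threshold: two flagged
print(classify(probs, threshold=0.3))  # lower threshold: three flagged
```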

6. Evaluation Metrics: Classification versus Financial Returns

This paper deals with a real-world problem where, apart from the results obtained by classification algorithms, the financial costs of missing a fraud are paramount. Therefore, the results obtained should not be evaluated considering only classification metrics, but also what we call economic efficiency.

Before defining the metrics of interest, let us illustrate all scenarios that might happen during classification. In Table 2, the lines represent the true (real) classes and the columns the predicted ones. True positives (TP) correspond to transactions correctly predicted as frauds, and true negatives (TN) correspond to transactions correctly predicted as legitimate. False positives (FP) and false negatives (FN) describe, respectively, the number of incorrectly classified legitimate and incorrectly classified fraudulent transactions. In financial terms, a TP avoids the company losing money, as the transaction is not authorized, while a TN represents a profit of k% of the transaction value. An FN, in turn, represents a loss of (100 - k)% of the transaction value, as the company gains k% but pays 100% of the cost of the fraud (chargeback). Finally, an FP indicates that the company missed profit by rejecting a legitimate transaction.

Table 2: Confusion matrix for the fraud detection problem.

Real \ Predicted    Fraud    Legitimate
Fraud               TP       FN
Legitimate          FP       TN

6.1. Classification metrics

The performance of the classification algorithms is measured using conventional classification metrics, F1 being the main one (Witten et al., 2011). F1 corresponds to the harmonic mean of precision and recall and is defined in Equation 1. It is an appropriate measure as it accounts for different levels of class imbalance and considers both precision and recall, which are defined in Equations 2 and 3, respectively.

F1 = 2 · (Precision · Recall) / (Precision + Recall)    (1)

Precision = TP / (TP + FP)    (2)

Recall = TP / (TP + FN)    (3)
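Equations 1-3 translate directly into code; the confusion-matrix counts below are illustrative:

```python
# Precision, recall and F1 computed from confusion-matrix counts
# (Equations 1-3); the counts are illustrative.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(tp, fp, fn):
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# With 80 frauds caught, 20 false alarms and 20 missed frauds,
# precision and recall are both 0.8, so F1 is also 0.8.
print(round(f1(tp=80, fp=20, fn=20), 6))  # 0.8
```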

6.2. Economic Efficiency

The term economic efficiency (EE) accounts for the economic returns a company receives when it correctly classifies legitimate transactions, subtracted by the losses accumulated for not identifying frauds, as defined in Equation 4. In Equation 5, k represents the percentage that the company retains per transaction, v_i the monetary value of transaction i, and x and y the number of legitimate and fraudulent transactions, respectively. Note that the losses are penalized much more heavily than the gains.

EE = Σ_{i=1}^{#Transactions} Returns    (4)

Returns = { Σ_{i=1}^{x} (v_i · k),        if i is legitimate (TN)
          { Σ_{i=1}^{y} (−v_i · (1 − k)), if i is fraudulent (FN)    (5)

In UOL's case, k assumes the value 0.03, i.e., the company profits 3% on every transaction and loses 97% of the value of a missed fraud.
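Equations 4 and 5, with k = 0.03, can be sketched as follows; the transaction values are illustrative:

```python
# Economic efficiency (Equations 4-5): the company keeps a fraction k of
# each correctly accepted legitimate transaction (TN) and loses a fraction
# (1 - k) of each missed fraud (FN).
def economic_efficiency(tn_values, fn_values, k=0.03):
    gains = sum(v * k for v in tn_values)         # first case of Equation 5
    losses = sum(v * (1 - k) for v in fn_values)  # second case of Equation 5
    return gains - losses

# Two accepted legitimate transactions and one missed fraud: the single
# missed fraud wipes out the profit from much larger legitimate volume.
print(round(economic_efficiency(tn_values=[100.0, 200.0], fn_values=[50.0]), 2))  # -39.5
```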

In order to understand how economically efficient a system is, we need to introduce three other concepts: real economic efficiency (EE_Real), maximum economic efficiency (EE_Max) and minimum economic efficiency (EE_Min). EE_Real corresponds to the value of the metric for the current model used by the company. EE_Max corresponds to the maximum value the company can profit in an ideal scenario, where all frauds are identified and no legitimate transactions are denied (only the first case in Equation 5 holds). EE_Min, in contrast, reflects the opposite scenario, where all transactions are misclassified: all fraudulent transactions are authorized and all legitimate transactions are denied (only the second case in Equation 5 holds).

As we cannot present the actual values of the transactions due to our confidentiality agreement, we present values of the relative economic efficiency with respect to the current model used by PagSeguro, which is measured by Equation 6.

EE_relative = (EE − EE_Real) / (EE_Max − EE_Real)    (6)
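Equation 6 normalizes a model's EE by the company's current and ideal results; the three EE values below are illustrative:

```python
# Relative economic efficiency (Equation 6): how far a model's EE moves
# from the company's current result (EE_Real) toward the ideal (EE_Max).
def relative_ee(ee, ee_real, ee_max):
    return (ee - ee_real) / (ee_max - ee_real)

# A model recovering 25 of the 40 units separating the current model
# from the ideal one achieves 62.5% relative economic efficiency.
print(relative_ee(ee=85.0, ee_real=60.0, ee_max=100.0))  # 0.625
```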

7. Experimental Results

This section shows the results obtained by Fraud-BNC in terms of F1 and economic efficiency considering the two strategies previously introduced to deal with cost-sensitive classification. The results obtained by Fraud-BNC are compared with those obtained by three traditional BNC algorithms: Naïve Bayes (NB), Tree Augmented Naïve Bayes (TAN) and K2. For all algorithms, the value of the parameter α was set to 0.5. K2 uses the Bayesian score and a maximum number of parent nodes equal to three. The frameworks jBNC (Sacha, 1999) and WEKA (Witten et al., 2011) were used to execute the algorithms.

We also compare the Fraud-BNC results with those obtained by other types of learning algorithms (Witten et al., 2011): Logistic Regression, Support Vector Machine (SVM), Random Forest and J48 (an implementation of C4.5). Apart from SVM, all these algorithms were also executed using WEKA. For Random Forest, three configurations for the number of generated trees were tested: 10, 20 and 30. The value 30 presented the best trade-off between F1 and economic efficiency, and was consequently chosen for Random Forest. For SVM, the package LibSVM (Chang and Lin, 2011) was used together with a grid search (the easy tool (Hsu et al., 2010)) to optimize the values of the parameters cost (C) and γ of a radial basis function kernel. The chosen parameter values were 8192.0 for C and 0.03125 for γ.

7.1. Fraud-BNC with instance reweighing

In this section, we present the results of both Fraud-BNC and the other baselines considering different weights for false negatives (fraudulent transactions misclassified as legitimate). As previously explained, the company loses significantly more money when missing frauds than the other way round. For this reason, the results of F1 presented start with equal weights for both false positive and false negative examples, which is the standard classification approach. Then, the weight of false negatives is increased from 2 to 4, making false negatives more relevant.

The results of F1 and economic efficiency are shown in Tables 3 and 4, respectively, and were obtained with a five-fold cross-validation procedure. For cross-validation, the partitions were created following the same distribution used for training HHEA (one fraudulent for every six legitimate transactions). However, the test set of each partition was enhanced with the remaining legitimate transactions (not used for training) to simulate the real scenario.

All results for F1 in Table 3 are compared against Fraud-BNC using 90% confidence intervals (in Student's t-distribution) on the difference between means (Jain, 1991). The symbols ▲ (▼) indicate whether the method in the line is statistically significantly better (worse) than Fraud-BNC. Lines with no symbol are those where the differences are not statistically significant.

Table 3: Results of F1 obtained by the proposed algorithm and other baselines using cost-sensitive learning.

Algorithms            FN cost variation (FP equals to 1)
                      1               2               3               4
Fraud-BNC             0.827 (0.026)   0.768 (0.033)   0.732 (0.006)   0.686 (0.021)
NB                    0.573 (0.014)▼  0.563 (0.016)▼  0.561 (0.017)▼  0.558 (0.018)▼
TAN                   0.724 (0.015)▼  0.692 (0.017)▼  0.671 (0.017)▼  0.657 (0.016)▼
K2                    0.755 (0.010)▼  0.723 (0.011)▼  0.697 (0.012)▼  0.680 (0.013)▼
Logistic Regression   0.705 (0.011)▼  0.671 (0.013)▼  0.645 (0.012)▼  0.623 (0.011)▼
SVM                   0.854 (0.008)   0.847 (0.010)▲  0.844 (0.012)▲  0.833 (0.016)▲
Random Forest         0.784 (0.013)▼  0.699 (0.016)▼  0.645 (0.014)▼  0.621 (0.008)▼
J48                   0.774 (0.013)▼  0.772 (0.017)   0.772 (0.017)▲  0.772 (0.017)▲
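The significance test used in Tables 3 and 4 (a 90% confidence interval on the per-fold difference between means) can be sketched as follows; the per-fold scores and the Student-t critical value are illustrative:

```python
import math

# 90% confidence interval for the mean difference between two algorithms'
# per-fold F1 scores. If the interval excludes zero, the difference is
# considered statistically significant.
def ci_difference(a, b, t_crit):
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    half = t_crit * math.sqrt(var / n)
    return mean - half, mean + half

# Five paired folds; 2.132 is the two-sided 90% Student-t critical value
# for 4 degrees of freedom.
lo, hi = ci_difference([0.83, 0.81, 0.84, 0.82, 0.83],
                       [0.75, 0.76, 0.74, 0.77, 0.75], t_crit=2.132)
print(lo > 0)  # True: the interval excludes zero, a significant difference
```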

Notice that the results obtained by Fraud-BNC are statistically significantly better than those obtained by all the other methods except SVM when no classification cost is considered, in which case the results present no statistical difference. Although the two methods obtain similar results, the models generated by Fraud-BNC are interpretable, while those returned by SVM are not. As the weight given to the false negatives increases, the results of SVM and J48 improve over the results of Fraud-BNC. This is not surprising, as Fraud-BNC was conceived based on a problem where no weight differences in terms of false positives or negatives were considered.

in terms of false positives or negatives were considered.

476

The overall best results of F1 are those where FP and FN receive the

477

same weight. Hence, either SVM or Fraud-BNC would be the recommended

478

algorithms. However, the latter is preferred as the interpretability of the

479

model is crucial in application where specialists need to understand the

480

decision making process.

481

Regarding the results of economic efficiency reported in Table 4, again

482

the symbols N (H) indicate whether the method in the line is statistically

483

significantly better (worse) than Fraud-BNC. Lines with no indication are

484

those where the results obtained did not show statistical difference.

485

Note that the results in this table are not consistent with those presented

486

in Table 3. Here TAN and K2 present statistically significant better results

487

than Fraud-BNC, and SVM has statistically significant worse results than

488

Fraud-BNC. This can be explained by the fact that the economic efficiency

489

depends highly on the value of the transaction, and hence it might be better

490

for the system to miss low value frauds but never ignore high value ones.

491

In sum, these results show that the best methods in terms of F1 differ

492

from those that provide the best EE, and finding a trade-off between the

493

values of these two metrics is crucial. However, this can only be done if the 23

Table 4: Results of economic efficiency obtained by the proposed algorithm and other baselines using cost-sensitive learning.

Algorithms            FN cost variation (FP equals to 1)
                      1               2               3               4
Fraud-BNC             0.693 (0.005)   0.722 (0.008)   0.734 (0.010)   0.742 (0.007)
NB                    0.591 (0.013)▼  0.597 (0.008)▼  0.605 (0.011)▼  0.617 (0.018)▼
TAN                   0.728 (0.010)▲  0.742 (0.011)▲  0.750 (0.011)▲  0.753 (0.011)▲
K2                    0.718 (0.007)▲  0.737 (0.012)▲  0.745 (0.012)▲  0.749 (0.013)▲
Logistic Regression   0.552 (0.017)▼  0.592 (0.015)▼  0.618 (0.015)▼  0.634 (0.016)▼
SVM                   0.626 (0.010)▼  0.638 (0.009)▼  0.638 (0.015)▼  0.650 (0.010)▼
Random Forest         0.706 (0.012)   0.745 (0.018)▲  0.747 (0.018)   0.747 (0.014)
J48                   0.677 (0.026)   0.677 (0.026)▼  0.677 (0.026)▼  0.677 (0.026)▼

values of the missed frauds are accounted for. The main reasons for seeking this trade-off are that we want BNC algorithms to (i) generalize to future data and (ii) generate profit for the company. The first goal is only assessed by looking at classification measures, like F1; the second is strictly associated with economic measures, like EE.

7.2. Fraud-BNC with probability threshold analysis

This section presents the results obtained when analyzing the values of the probability threshold that defines a fraud. We present the results for a subset of classifiers, namely the ones that obtained the best results in terms of F1 or economic efficiency in the previous section: SVM, TAN, K2 and Fraud-BNC. Recall that although Fraud-BNC and SVM were statistically better than the other two algorithms (TAN and K2) in terms of F1, their results were worse in terms of economic efficiency.

Figure 3 shows the results of F1 and EE in the validation and test sets for the four aforementioned algorithms. We can observe great variation in the algorithms' curves, in both metrics and in both sets. This happens mainly due to the variation of the class probability threshold (x-axis) used to predict fraudulent transactions. Usually, this threshold is equal to 0.5 when solving standard classification problems. Nevertheless, we want here to check whether changing this threshold could result in more profit or smaller losses for PagSeguro.

In addition, these four learning algorithms may behave differently when performing classification because they make different assumptions to create the classification model. For instance, Fraud-BNC may consider the class node to have causal predecessors, something that TAN and K2 do not. Moreover, the idea of finding the most suitable hyperplane (i.e., the one that maximizes the margin) to separate the examples into classes, used by the SVM algorithm, is totally dissimilar to the idea of finding the joint probability distribution that best fits the variables, used by the Bayesian algorithms. These assumptions can therefore also lead to different values of the considered metrics at distinct threshold levels.

One interesting thing to notice in the graphs is that the curves of TAN and K2, in both the validation and test sets, are more constant than the ones obtained by SVM and Fraud-BNC. This might happen because the probability values these algorithms return for fraud examples are, in most cases, very low. Note that, for economic efficiency, Fraud-BNC is better than all other methods when the threshold considered in the validation set is smaller than 0.3. In the test set, these values present no difference from those obtained by TAN or K2. When the threshold value is equal to 0.3, Fraud-BNC achieves 72.64% of relative economic efficiency, which represents the highest economic performance with respect to the company's current scenario. Moreover, the values of F1 for threshold 0.3 with Fraud-BNC are always superior to those presented by TAN and K2, being worse only than those obtained by SVM. Considering that Fraud-BNC produces

Figure 3: Results for F1 and Economic Efficiency considering different class probability thresholds for fraudulent transactions. (a) F1 in the validation set; (b) F1 in the test set; (c) Economic Efficiency in the validation set; (d) Economic Efficiency in the test set. Each panel plots the metric against thresholds from 0.05 to 0.95 for Fraud-BNC, SVM, TAN and K2.

interpretable models, Fraud-BNC with a modified threshold of 0.3 can be considered the best choice among the tested algorithms.

Comparing these results with those obtained when reweighing instances for Fraud-BNC, they present no statistical difference in terms of F1 and better results in terms of economic efficiency. Hence, we recommend that the user run the algorithm and perform a threshold analysis on the validation set, or use values of the class threshold equal to or lower than 0.3.
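The recommended validation-set threshold analysis can be sketched as a sweep that keeps the threshold maximizing economic efficiency; the validation data below are illustrative placeholders:

```python
# Sweep the fraud-probability threshold on a validation set and keep the
# value that maximizes economic efficiency (flagged transactions are
# rejected; accepted legitimate ones earn k, missed frauds cost 1 - k).
def best_threshold(fraud_probs, values, is_fraud, k=0.03):
    def ee(threshold):
        total = 0.0
        for p, v, fraud in zip(fraud_probs, values, is_fraud):
            flagged = p >= threshold
            if fraud and not flagged:        # missed fraud (FN)
                total -= v * (1 - k)
            elif not fraud and not flagged:  # accepted legitimate (TN)
                total += v * k
        return total
    thresholds = [t / 20 for t in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    return max(thresholds, key=ee)

# One legitimate and two fraudulent transactions with their fraud
# probabilities and monetary values.
t = best_threshold(fraud_probs=[0.2, 0.4, 0.7],
                   values=[100.0, 500.0, 50.0],
                   is_fraud=[False, True, True])
print(t)  # 0.25: low enough to flag both frauds, high enough to accept the rest
```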

Figure 4 shows a graph of the values of the transactions (omitted due to a Non-Disclosure Agreement - NDA) and the probability that Fraud-BNC returns for each transaction, ordered by the probability of fraud. If we use a threshold of 0.3, anything above this value is considered a fraud, which corresponds to identifying 71.87% of the fraudulent transactions correctly. If the default classification threshold (0.5) were used, then 64.97% of the fraudulent transactions would be correctly classified. This difference corresponds to missing approximately 1,000 fraudulent transactions, which has a highly negative economic effect for the company.

On the other hand, for threshold 0.3, we have 98.31% of non-fraudulent transactions correctly classified, against 99.07% when the threshold is set to 0.5. As we have many more legitimate transactions, this difference corresponds to around 7,000 transactions, but, in the current model used by the company, it does not have any economic effect. Note that the values of the transactions do not follow a pattern but, in general, fraudulent transactions do not have very high values. In the cases where they do, Fraud-BNC mostly classifies them with very high probabilities of being a fraud.

Figure 4: Classification probabilities returned by Fraud-BNC versus the transaction value.

7.3. When looking at the results and probabilities is not enough

PagSeguro has an auditing system to verify inconsistent classifications and, consequently, improve the company's results. By showing the decision maker the graph produced by Fraud-BNC together with Figure 4, she or he can focus on analyzing the transactions with high monetary value and a fraud probability close to the threshold being considered, and understand how these transactions were identified.

Figure 5 presents the basis of the auditing system, i.e., the directed acyclic graph (DAG) representing a Bayesian Network Classifier (BNC) produced by Fraud-BNC. It consists of a set of 22 nodes ({A1, ..., A22}) representing the predictive attributes of the dataset and a class C (fraud or not fraud). The edges define the causal-effect relationships among the attributes, considering the class variable. We do not show the conditional probability tables (CPTs) of each node in the graph for the sake of simplicity, as the resulting BNC is quite sophisticated. Note that some of the relationships previously shown in the correlation matrix of Figure 2 appear in the graph: for instance, the high correlation between attribute A1 and class C, and the influence of A4 on many attributes, including A5 (not directly, but via A8 or A21), A8, A11, A13 (if A13 influences A4, A4 also influences A13), A17, A20 and A21.


Figure 5: The BNC model generated by Fraud-BNC.

Consider that our intention is to classify a new transaction as fraudulent or legitimate. We first apply the model in Figure 5 to the attributes of the new transaction. Let us assume the model produced two probabilities: 0.305 for class fraud and 0.695 for class legitimate. If our threshold for classifying an example as a fraud is 0.3 (and 0.305 is very close to the threshold), the specialist should analyze the DAG to reach a more appropriate classification. A straightforward way to do this is to look at the causal-effect relationships (edges) that affect the class node in the first order (direct links). In the case of Figure 5, a practitioner should check the values and CPTs of attributes A1, A2, A3, A14 and A16 before making a final decision, using his or her expert knowledge, on whether the transaction is really a fraud. Higher-order relationships between the class and the other attributes can also be explored to create a more robust auditing system, even though they are more complex to analyze.
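The first-order inspection described above amounts to collecting the parents and children of the class node in the DAG. A sketch, with only the class-adjacent attributes of Figure 5 encoded and the edge directions chosen purely for illustration:

```python
# Direct (first-order) neighbors of the class node in a directed edge
# list: the attributes a specialist would inspect before overriding a
# borderline classification. Edge directions here are illustrative.
edges = [("A1", "C"), ("C", "A2"), ("C", "A3"), ("A14", "C"), ("A16", "C")]

def direct_neighbors(edges, node):
    """Parents and children of `node` in the DAG."""
    parents = {a for a, b in edges if b == node}
    children = {b for a, b in edges if a == node}
    return sorted(parents | children)

print(direct_neighbors(edges, "C"))  # ['A1', 'A14', 'A16', 'A2', 'A3']
```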

8. Conclusions and Future Work

This work presented Fraud-BNC, a customized Bayesian Network Classifier (BNC) algorithm to solve a real-world credit card fraud-detection problem. The Fraud-BNC algorithm was automatically generated by a Hyper-Heuristic Evolutionary Algorithm (HHEA), which creates customized solutions for classification datasets. Fraud-BNC was evaluated on a dataset from PagSeguro. Nevertheless, the algorithm is general enough to solve other classification problems from the literature.

We tested different approaches to deal with two problems inherent to fraud transaction data: class imbalance and the fact that misclassified frauds have a different cost from misclassified legitimate transactions. The produced algorithm was analyzed considering two strategies to address these problems: instance reweighing and class probability threshold analysis.

The results obtained by Fraud-BNC were compared to methods within the Bayesian framework and to other state-of-the-art classification methods. Two different types of metrics were considered: a classification measure (F1) and a metric that assesses the economic impact of the model on the company (economic efficiency). The results show that the best algorithm in terms of F1 is usually not the same as the one that obtains the best values of economic efficiency. This happens because the latter is highly impacted by the value of the transaction. Based on our experiments, we believe it is most beneficial for the decision maker to use Fraud-BNC following a probability threshold approach. The results of the method can be used together with the values of the transactions to help in the decision-making process.

One thing worth investigating in the future is whether the strategies related to cost-sensitive classification could be added to the components given to the hyper-heuristic. In this way, HHEA would be able to test in which scenarios different approaches for dealing with cost-sensitive classification are more beneficial. Additionally, a multi-objective optimization framework could be implemented, making the hyper-heuristic optimize both accuracy and economic efficiency simultaneously.

Acknowledgments

This work was partially supported by the following Brazilian research support agencies: CNPq (481204/2013-0, 573871/2008-6, 459301/2014-4), CAPES and FAPEMIG (PPM-00650-15, APQ-01400-14).

References

Adewumi, A. O., Akinyelu, A. A., 2016. A survey of machine-learning and nature-inspired based credit card fraud detection techniques. International Journal of System Assurance Engineering and Management, 1–17.

Alvarez, G., Petrovic, S., 2003. A new taxonomy of web attacks suitable for efficient encoding. Computers & Security 22 (5), 435–449.

Barros, R. C., Basgalupp, M. P., Freitas, A. A., de Carvalho, A. C. P. L. F., Dec. 2014. Evolutionary design of decision-tree algorithms tailored to microarray gene expression data sets. IEEE Transactions on Evolutionary Computation 18 (6), 873–892.

Bielza, C., Larrañaga, P., 2014. Discrete Bayesian network classifiers: A survey. ACM Computing Surveys 47 (1), 5:1–5:43.

Brazdil, P., Giraud-Carrier, C., Soares, C., Vilalta, R., 2008. Metalearning: Applications to Data Mining. Springer.

Caldeira, E., Brandao, G., Campos, H., Pereira, A., 2012. Characterizing and evaluating fraud in electronic transactions. In: Proceedings of the Latin American Web Congress. pp. 115–122.

Chandola, V., Banerjee, A., Kumar, V., Jul. 2009. Anomaly detection: A survey. ACM Computing Surveys 41 (3), 15:1–15:58.

Chang, C.-C., Lin, C.-J., 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2, 27:1–27:27. Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Cheng, J., Greiner, R., 1999. Comparing Bayesian network classifiers. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., pp. 101–108.

Cooper, G. F., Herskovits, E., 1992. A Bayesian method for the induction of probabilistic networks from data. Machine Learning 9, 309–347.

CyberSource, 2016. Online fraud report (Latin America edition). Tech. rep., CyberSource Corporation, a Visa Company.

Daly, R., Shen, Q., Aitken, S., 2011. Learning Bayesian networks: Approaches and issues. The Knowledge Engineering Review 26, 99–157.

Duman, E., Ozcelik, M. H., 2011. Detecting credit card fraud by genetic algorithm and scatter search. Expert Systems with Applications 38 (10), 13057–13063.

Eiben, A. E., Smith, J. E., 2003. Introduction to Evolutionary Computing. Springer.

Freitas, A. A., 2014. Comprehensible classification models: A position paper. ACM SIGKDD Explorations Newsletter 15 (1), 1–10.

Friedman, N., Geiger, D., Goldszmidt, M., 1997. Bayesian network classifiers. Machine Learning 29, 131–163.

Fu, K., Cheng, D., Tu, Y., Zhang, L., 2016. Credit card fraud detection using convolutional neural networks. In: Proceedings of the International Conference on Neural Information Processing. Springer, pp. 483–490.

Gadi, M. F., Wang, X., Lago, A. P., 2008. Credit card fraud detection with artificial immune system. In: Proceedings of the International Conference on Artificial Immune Systems. Springer, pp. 119–131.

Guo, T., Li, G. Y., Jul. 2008. Neural data mining for credit card fraud detection. In: Proceedings of the International Conference on Machine Learning and Cybernetics. Vol. 7. pp. 3630–3634.

Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue, H., Bing, G., 2016. Learning from class-imbalanced data: Review of methods and applications. Expert Systems with Applications 73, 220–239.

Halvaiee, N. S., Akbari, M. K., 2014. A novel model for credit card fraud detection using artificial immune systems. Applied Soft Computing 24, 40–49.

He, H., Garcia, E. A., 2009. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering 21 (9), 1263–1284.

Hens, A. B., Tiwari, M. K., Jun. 2012. Computational time reduction for credit scoring: An integrated approach based on support vector machine and stratified sampling method. Expert Systems with Applications 39 (8), 6774–6781.

Hesar, A. S., Tabatabaee, H., Jalali, M., 2012. Structure learning of Bayesian networks using heuristic methods. In: Proceedings of the International Conference on Information and Knowledge Management.

Hsu, C.-W., Chang, C.-C., Lin, C.-J., 2010. A practical guide to support vector classification. Tech. rep., National Taiwan University.

Jain, R., 1991. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. John Wiley & Sons.

Kotsiantis, S. B., 2007. Supervised machine learning: A review of classification techniques. In: Proceedings of the Conference on Emerging Artificial Intelligence Applications in Computer Engineering. IOS Press, pp. 3–24.

Lindqvist, U., Jonsson, E., 1997. How to systematically classify computer security intrusions. In: Proceedings of the IEEE Symposium on Security and Privacy. pp. 154–163.

Liu, X. Y., Zhou, Z. H., Dec. 2006. The influence of class imbalance on cost-sensitive learning: An empirical study. In: Proceedings of the International Conference on Data Mining. pp. 970–974.

Ngai, E. W. T., Hu, Y., Wong, Y. H., Chen, Y., Sun, X., Feb. 2011. The application of data mining techniques in financial fraud detection: A classification framework and an academic review of literature. Decision Support Systems 50 (3), 559–569.

Pappa, G., Ochoa, G., Hyde, M., Freitas, A., Woodward, J., Swan, J., 2014. Contrasting meta-learning and hyper-heuristic research: The role of evolutionary algorithms. Genetic Programming and Evolvable Machines 15 (1), 3–35.

Pappa, G. L., Freitas, A. A., 2009. Automating the Design of Data Mining Algorithms: An Evolutionary Computation Approach. Springer.

Quinlan, J. R., Mar. 1986. Induction of decision trees. Machine Learning 1 (1), 81–106.

Ravisankar, P., Ravi, V., Raghava Rao, G., Bose, I., 2011. Detection of financial statement fraud and feature selection using data mining techniques. Decision Support Systems 50 (2), 491–500.

Sá, A. G. C., Pappa, G. L., 2013. Towards a method for automatically evolving Bayesian network classifiers. In: Proceedings of the Annual Conference Companion on Genetic and Evolutionary Computation. pp. 1505–1512.

Sá, A. G. C., Pappa, G. L., 2014. A hyper-heuristic evolutionary algorithm for learning Bayesian network classifiers. In: Proceedings of the Ibero-American Conference on Artificial Intelligence. pp. 430–442.

Sacha, J. P., 1999. New synthesis of Bayesian network classifiers and cardiac SPECT image interpretation. Ph.D. thesis, The University of Toledo.

Sahin, Y., Bulkan, S., Duman, E., 2013. A cost-sensitive decision tree approach for fraud detection. Expert Systems with Applications 40 (15), 5916–5923.

Salama, K. M., Freitas, A. A., 2013. Extending the ABC-Miner Bayesian classification algorithm. In: Proceedings of the Workshop on Nature Inspired Cooperative Strategies for Optimization. pp. 1–12.

Sundarkumar, G. G., Ravi, V., 2015. A novel hybrid undersampling method for mining unbalanced datasets in banking and insurance. Engineering Applications of Artificial Intelligence 37, 368–377.

Thai-Nghe, N., Gantner, Z., Schmidt-Thieme, L., Jul. 2010. Cost-sensitive learning methods for imbalanced data. In: Proceedings of the International Joint Conference on Neural Networks. pp. 1–8.

Vlasselaer, V. V., Bravo, C., Caelen, O., Eliassi-Rad, T., Akoglu, L., Snoeck, M., Baesens, B., 2015. APATE: A novel approach for automated credit card transaction fraud detection using network-based extensions. Decision Support Systems 75, 38–48.

Weiss, G. M., Jun. 2004. Mining with rarity: A unifying framework. ACM SIGKDD Explorations Newsletter 6 (1), 7–19.

West, J., Bhattacharya, M., 2016. Intelligent financial fraud detection: A comprehensive review. Computers & Security 57, 47–66.

Witten, I. H., Frank, E., Hall, M. A., 2011. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers Inc.

Wolpert, D. H., Macready, W. G., 1997. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation 1 (1), 67–82.
