An Expert Spam Detection System Based on Extreme Learning Machine

Comput. Sci. Appl. Volume 1, Number 2, 2014, pp. 132-137 Received: July 8, 2014; Published: August 25, 2014

Computer Science and Applications www.ethanpublishing.com

Yılmaz Kaya1, Ömer Faruk Ertuğrul2 and Ramazan Tekin3

1. Department of Computer Engineering, Siirt University, Siirt 56100, Turkey
2. Department of Electrical and Electronics Engineering, Batman University, Batman 72060, Turkey
3. Department of Computer Engineering, Batman University, Batman 72060, Turkey

Corresponding author: Yılmaz Kaya (yilmazkaya1977@gmail.com)

Abstract: Electronic communication has become one of the most important communication tools as internet technologies have spread, and this has heightened the need for a solution to the circulation of unsolicited bulk messages on the internet, referred to as spam. In this study, ELM (extreme learning machine), a training method for single hidden layer feed-forward artificial neural networks, was employed as a spam filter. The experimental results showed that the proposed method achieved higher performance in terms of detection speed and classification accuracy (91.655%) than other machine learning methods such as SVM (support vector machines), NB (naive Bayes) and MLP (multilayer perceptron).

Key words: Extreme learning machine, spam detection, machine learning, text categorization.

1. Introduction

Advancing internet technologies increase the importance of e-mail communication. E-mail, which is used by millions of people and is still spreading, has become central to activities such as trade, but also to spam and virus attacks. Guzella and Caminhas reported that spam, which makes up a major part of e-mail traffic and has become part of daily life, is a serious problem for both users and internet traffic [1]. It wastes users' time in deciding whether incoming e-mails are spam or legitimate, and it creates a serious bandwidth problem for communication companies. Technical and legal measures are used against this problem; nevertheless, spam continues to be sent to promote money-making schemes, weight-loss products, business opportunities and friend-finding services. Informatics specialists have employed various techniques to detect or filter spam: some methods check the address of an incoming e-mail against a blacklist, while others inspect the e-mail content for specific key words. Although keyword-based methods are reported to perform satisfactorily, Wu showed that they are not sufficient on their own [2]. As another solution, machine learning methods, which have filtered spam with satisfactory success, have been widely used in recent years, such as decision trees [3], rough sets [4], SVM (support vector machine) [5, 6], artificial immune systems [7, 8], artificial neural networks [2, 9, 10], and NB (naive Bayes) [11]. The high accuracy of machine-learning-based methods stems mainly from the fact that they do not need any prior hypothesis: they learn from the training dataset and classify texts according to the content of the e-mails [12].

The aim of this study was to evaluate and validate the applicability of ELM (extreme learning machine) to spam detection, since the literature survey shows that the accuracy of spam detection depends on the employed machine learning method, and Huang et al. reported that ELM has high generalization capacity with a fast training stage [13]. In ELM, the input-layer weights and hidden-layer biases are determined randomly, and the output weights are calculated analytically. The dataset, which was taken from the UCI machine-learning repository and is composed of 4,601 e-mails, is used for validation [14, 15]. Comparing ELM with other machine learning methods, i.e., ANN (artificial neural network), NB, SVM and decision tree methods, showed that ELM achieved better results in terms of speed and classification performance.

The rest of the paper is organized as follows: the material and the extreme learning machine method are explained in Section 2, where the procedure for employing the proposed method is also described briefly; results and discussion are provided in Section 3; and Section 4 concludes the paper.


Table 1 The distribution of the words and characters in the data set.

| Feature | Feature | Feature | Feature |
|---|---|---|---|
| 1-make | 15-addresses | 29-lab | 43-original |
| 2-address | 16-free | 30-labs | 44-project |
| 3-all | 17-business | 31-telnet | 45-re |
| 4-3d | 18-email | 32-857 | 46-edu |
| 5-our | 19-you | 33-data | 47-table |
| 6-over | 20-credit | 34-415 | 48-conference |
| 7-remove | 21-your | 35-85 | 49-; |
| 8-internet | 22-font | 36-technology | 50-( |
| 9-order | 23-0 | 37-1999 | 51-[ |
| 10-mail | 24-money | 38-parts | 52-! |
| 11-receive | 25-hp | 39-pm | 53-$ |
| 12-will | 26-hpl | 40-direct | 54-# |
| 13-people | 27-george | 41-cs | |
| 14-report | 28-650 | 42-meeting | |

2. Material and Method

2.1 Dataset

The dataset consists of 4,601 e-mails, of which 1,813 are spam and the rest are not. It was collected at the Hewlett-Packard labs and taken from the UCI machine-learning repository [14, 15]. Fifty-seven features were extracted from the e-mails. The first 48 features are the frequencies of words found in the e-mails. The 6 features from 49 to 54 are the frequencies of the characters ‘;’, ‘(’, ‘[’, ‘!’, ‘\$’ and ‘\#’, as listed in Table 1. Features 55 to 57 give the total and average number of letters written in capitals and the length of the longest run of capital letters. A spam example is presented in Fig. 1.

Fig. 1 A spam example.

2.2 Extreme Learning Machine

In this section, ELM, a machine learning method developed by Huang et al., is described in detail [13]. ELM is a training method for single hidden layer feed-forward artificial neural networks (SLFN) in which the input weights are assigned randomly and the output weights are obtained analytically. ELM uses activation functions such as sigmoid, sine, Gaussian and hard-limiting in the hidden layer, and linear functions in the output layer. Additionally, ELM can use non-differentiable or discontinuous activation functions in the hidden layer [16].

In conventional feed-forward neural networks, all parameters usually need to be tuned by gradient-based learning algorithms. However, in pursuit of better learning performance, training takes so long that learning is extremely slow, and it can easily become trapped in local optima. Although adding a momentum parameter to the weight adjustment can lower the risk of the network being trapped in local optima, it does not reduce the time spent on training [13, 16]. The research of Huang et al. indicates that the input and output weights and the bias values in an SLFN have an impact on the performance of the network [13]. ELM selects the input weights and hidden-layer biases randomly and determines the output weights analytically. Therefore, ELM learns much faster than conventional learning algorithms and performs well in some real-world applications [13].

The SLFN structure is illustrated in Fig. 2. According to Fig. 2, with X = (X1, X2, X3, …, XN) as the input and Y = (Y1, Y2, Y3, …, YP) as the output, the mathematical model with M hidden neurons can be defined as [17]:

$$\sum_{i=1}^{M} \beta_i \, g(W_i \cdot X_k + b_i) = O_k, \quad k = 1, 2, \ldots, N \tag{1}$$

where Wi = (Wi1, Wi2, …, Win) and βi = (βi1, βi2, …, βim) are the input and output weights, bi is the bias of the i-th hidden neuron, and Ok is the output of the network. g(.) denotes the activation function [18]. For a network trained on N samples, the aim is zero error, $\sum_{k=1}^{N} (O_k - Y_k) = 0$, or minimum error, $\min \left\| \sum_{k=1}^{N} (O_k - Y_k) \right\|$. Therefore, Eq. (1) can be written as below [13]:

$$\sum_{i=1}^{M} \beta_i \, g(W_i \cdot X_k + b_i) = Y_k, \quad k = 1, 2, \ldots, N \tag{2}$$

Because g(Wi · Xk + bi) in the equation above gives the entries of the hidden layer output matrix, Eq. (2) can be written compactly as [13]:

$$H \beta = Y \tag{3}$$

where

$$H(W_1, \ldots, W_M; b_1, \ldots, b_M; X_1, \ldots, X_N) = \begin{bmatrix} g(W_1 \cdot X_1 + b_1) & \cdots & g(W_M \cdot X_1 + b_M) \\ \vdots & \ddots & \vdots \\ g(W_1 \cdot X_N + b_1) & \cdots & g(W_M \cdot X_N + b_M) \end{bmatrix} \tag{4}$$

$$\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_M^T \end{bmatrix}, \quad Y = \begin{bmatrix} Y_1^T \\ \vdots \\ Y_N^T \end{bmatrix} \tag{5}$$

H denotes the hidden layer output matrix [17, 19]. In ELM, training the SLFN is equivalent to seeking the least-squares solution of the linear system Hβ = Y, as in linear regression [16]:

$$\left\| H(W_1, \ldots, W_M; b_1, \ldots, b_M)\, \hat{\beta} - Y \right\| = \min_{\beta} \left\| H(W_1, \ldots, W_M; b_1, \ldots, b_M)\, \beta - Y \right\| \tag{6}$$

where $\hat{\beta} = H^{\dagger} Y$ is the smallest-norm least-squares solution of Hβ = Y. Here, $H^{\dagger}$ denotes the Moore-Penrose generalized inverse of the hidden layer output matrix H. The norm of $\hat{\beta}$ is the smallest among all the least-squares solutions of the equation Hβ = Y [13]. Thus ELM can achieve the smallest training error, and it can also achieve better classification performance than conventional gradient-based learning algorithms [16]. The ELM algorithm can be summarized in 3 steps as below [13, 16].

(1) The input weights Wi = (Wi1, Wi2, …, Win) and the hidden-layer bias values bi are generated randomly;
(2) The hidden layer output matrix H is calculated;
(3) The output weights are calculated in accordance with $\hat{\beta} = H^{\dagger} Y$, where Y is the target feature.

2.3 Spam Detection Method

The procedure of the proposed spam detection method is shown in Fig. 3.

Fig. 2 The structure of a single hidden layer feed-forward artificial neural network.

Fig. 3 Diagram of the classification process.


The proposed method is organized in four blocks, as seen in Fig. 3. The dataset is collected in block 1, the features are extracted from the dataset in block 2, and the features are classified by ELM in block 3. The last block contains the results of the classification.
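The three ELM steps of Section 2.2, as used in the classification block, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `elm_train`, `elm_predict` and `elm_classify`, the uniform range [-1, 1] for the random weights, the 0.5 decision threshold, and the synthetic toy data are all assumptions introduced here.

```python
import numpy as np


def sigmoid(z):
    """Logistic sigmoid, used here as the hidden-layer activation g(.)."""
    return 1.0 / (1.0 + np.exp(-z))


def elm_train(X, Y, n_hidden, rng):
    """Fit an ELM: random input weights and biases, analytic output weights."""
    n_features = X.shape[1]
    # Step 1: draw input weights W and hidden biases b at random
    # (the uniform range [-1, 1] is an assumption, not taken from the paper).
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step 2: hidden-layer output matrix H, one row per sample, one column per neuron.
    H = sigmoid(X @ W + b)
    # Step 3: output weights from the Moore-Penrose pseudoinverse, beta = pinv(H) @ Y.
    beta = np.linalg.pinv(H) @ Y
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Raw network output O = H @ beta for the given samples."""
    return sigmoid(X @ W + b) @ beta


def elm_classify(X, W, b, beta, threshold=0.5):
    """Binary labels (1.0 = spam, 0.0 = not spam) using an assumed 0.5 threshold."""
    return (elm_predict(X, W, b, beta) >= threshold).astype(float)


# Toy usage on synthetic data (NOT the spam dataset): 57 features, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 57))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)
W, b, beta = elm_train(X[:100], y[:100], n_hidden=100, rng=rng)
labels = elm_classify(X[100:], W, b, beta)
print(f"toy test accuracy: {float(np.mean(labels == y[100:])):.3f}")
```

Because the only fitted quantity is `beta`, training reduces to a single pseudoinverse computation, which is consistent with the short training times reported for ELM.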

3. Results and Discussion

Classification accuracies and process durations obtained with ELM, using 100 hidden-layer neurons with the sigmoid activation function, are shown in Tables 2 and 3, respectively, for various training-test partitions of the dataset. Table 2 shows that high classification rates were obtained with ELM for all training-test partitions. Among the six partitions, the highest mean classification accuracy, 91.87%, was obtained at the 60-40% training-test partition, and the lowest, 91.21%, at the 30-70% partition. Additionally, Table 3 shows that the classification process was completed in a very short time. Training and test accuracy rates as a function of the number of hidden-layer neurons for the 50-50% partition are presented in Figs. 4 and 5, and the corresponding training and classification durations are shown in Figs. 6 and 7. Classification accuracies obtained with respect to the number of hidden-layer neurons for the 50-50% training-test ratio are presented in Figs. 8 and 9.

Table 2 Classification accuracies for different train/test data ratios.

| Train/test datasets | Training dataset (%) | Test dataset (%) | Mean accuracy (%) |
|---|---|---|---|
| 30-70% | 92.32 | 90.10 | 91.210 |
| 40-60% | 92.01 | 90.50 | 91.255 |
| 50-50% | 93.05 | 90.26 | 91.655 |
| 60-40% | 92.54 | 91.20 | 91.870 |
| 70-30% | 92.11 | 91.23 | 91.670 |
| 80-20% | 91.99 | 91.41 | 91.700 |

Table 3 Duration for different train/test data ratios (s).

| Train/test | Training time | Test time |
|---|---|---|
| 30-70% | 0.0624 | 0.0156 |
| 40-60% | 0.0780 | 0.0312 |
| 50-50% | 0.0936 | 0.0156 |
| 60-40% | 0.1092 | 0.0156 |
| 70-30% | 0.1092 | 0.0156 |
| 80-20% | 0.156 | 0.0001 |

Fig. 4 Training set accuracy rates (Training Efficiency vs. Hidden Neuron).

Fig. 5 Test set accuracy rates (Testing Efficiency vs. Hidden Neuron).

Fig. 6 Training duration of the training data set (Training Time (second) vs. Hidden Neuron).

Fig. 7 Classification durations of test data set (Testing Time (second) vs. Hidden Neuron).
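The "Mean accuracy" column of Table 2 appears to be the arithmetic mean of the training and test accuracies; the paper does not state this explicitly, so it is treated as an assumption here. The short Python check below recomputes the column from the training and test values in Table 2 (the names `table2` and `mean_accuracy` are illustrative only).

```python
# Training accuracy, test accuracy and reported mean accuracy (%) from Table 2,
# keyed by train/test ratio.
table2 = {
    "30-70%": (92.32, 90.10, 91.210),
    "40-60%": (92.01, 90.50, 91.255),
    "50-50%": (93.05, 90.26, 91.655),
    "60-40%": (92.54, 91.20, 91.870),
    "70-30%": (92.11, 91.23, 91.670),
    "80-20%": (91.99, 91.41, 91.700),
}


def mean_accuracy(train_acc, test_acc):
    """Arithmetic mean of training and test accuracy (assumed definition)."""
    return (train_acc + test_acc) / 2.0


for ratio, (train_acc, test_acc, reported) in table2.items():
    computed = mean_accuracy(train_acc, test_acc)
    print(f"{ratio}: computed {computed:.3f}, reported {reported:.3f}")

# Partition with the highest mean accuracy under this definition.
best_ratio = max(table2, key=lambda r: mean_accuracy(table2[r][0], table2[r][1]))
print("best partition:", best_ratio)
```

Under this reading, the 91.655% figure quoted for the 50-50% partition is the average of 93.05% (training) and 90.26% (test), not the test accuracy alone.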



Table 4 Classification accuracy rates in accordance with various activation functions.

| Activation function | Training accuracy (%) | Test accuracy (%) | Training duration (s) | Test duration (s) |
|---|---|---|---|---|
| Sigmoid | 91.69 | 90.38 | 0.0905 | 0.025 |
| Sine | 69.78 | 64.76 | 0.6978 | 0.0140 |
| Hardlim | 88.63 | 87.69 | 0.1076 | 0.0296 |
| Triangular Basis | 68.61 | 65.26 | 0.0998 | 0.0125 |
| Radial Basis | 71.68 | 68.8 | 0.1186 | 0.0265 |

Fig. 8 Training accuracy variation with respect to hidden neurons (Training Efficiency vs. Run).

Table 5 Classification accuracy rates acquired by various machine learning methods.

| Model | Accuracy (%) | Classification time (s) |
|---|---|---|
| MLP | 89.96 | 190.14 |
| SVM | 80.65 | 6.07 |
| NB | 78.26 | 0.28 |
| J48 | 90.82 | 2.46 |
| PART | 91.46 | 4.95 |
| ELM | 91.66 | 0.05 |

Fig. 9 Testing accuracy variation with respect to hidden neurons (x-axis: Hidden Neuron x5).

As can be seen in Figs. 8 and 9, the accuracy rates increase with the number of neurons in the hidden layer. The highest accuracy rate is obtained when the number of neurons is between 90 and 100. The accuracy results and process durations obtained with various hidden-layer activation functions for the 50-50% training-test partition are listed in Table 4. As Table 4 shows, the best classification results were obtained with the sigmoid activation function, $g(x) = 1 / (1 + e^{-x})$. Additionally, both the classification accuracy and the speed of ELM were compared with other methods: MLP (multilayer perceptron) [20-22], SVM [6, 20], NB [22, 23], decision trees (J48) and decision rules (PART) [6, 20, 21] for the 50-50% training-test partition. The classification accuracies obtained by the various machine learning methods are presented in Table 5. As seen in Table 5, the highest performance in terms of both classification accuracy and classification time was obtained with ELM. These results agree with the literature, as Huang et al. demonstrated that ELM has high generalization capacity with an extremely fast training stage [13]. According to previous studies, with an 80-20% training-test ratio, the classification accuracy is 90.88% with an artificial neural network [22], 75.22% with the Bayesian method [22], and 91.23% with the rough set method [4]. In these experiments, ELM outperformed the other machine learning methods considered for spam detection.
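For reference, the five hidden-layer activation functions compared in Table 4 can be written as short NumPy functions. Only the sigmoid formula is given in the text; the definitions of the sine, hard-limit, triangular basis and radial basis functions below follow commonly used conventions and are assumptions, since the paper does not spell them out. The function names are illustrative.

```python
import numpy as np


def sigmoid(x):
    """Sigmoid, as given in the text: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))


def sine(x):
    """Sine activation: sin(x)."""
    return np.sin(np.asarray(x, dtype=float))


def hardlim(x):
    """Hard-limit activation (assumed convention): 1 if x >= 0, else 0."""
    return np.where(np.asarray(x, dtype=float) >= 0.0, 1.0, 0.0)


def tribas(x):
    """Triangular basis (assumed convention): 1 - |x| on [-1, 1], else 0."""
    return np.maximum(1.0 - np.abs(np.asarray(x, dtype=float)), 0.0)


def radbas(x):
    """Radial basis (assumed convention): exp(-x^2)."""
    return np.exp(-np.square(np.asarray(x, dtype=float)))


# Evaluate each function on a few sample points.
points = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for name, fn in [("sigmoid", sigmoid), ("sine", sine), ("hardlim", hardlim),
                 ("tribas", tribas), ("radbas", radbas)]:
    print(name, np.round(fn(points), 4))
```

Note that `hardlim` is discontinuous and `tribas` is not differentiable everywhere; ELM can still use them because the hidden layer is never trained by gradient descent.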

4. Conclusions

Electronic communication is one of the most widely used services of the internet. Yet, as it has improved and spread, it has brought some problems with it. One of the most important is the circulation of spam, i.e., unsolicited bulk messages. In this research, spam was detected automatically using the ELM method. The results obtained with ELM were better than those of earlier studies on the same dataset. Spam filtering is realized largely through content inspection, so a method that filters quickly and accurately is needed. In this respect, ELM shows better performance in terms of both classification accuracy and speed than the other methods tested and the results reported in the literature.

References

[1] T.S. Guzella, W.M. Caminhas, A review of machine learning approaches to spam filtering, Expert Systems with Applications 36 (2009) 10206-10222.
[2] C.-H. Wu, Behavior-based spam detection using a hybrid method of rule-based techniques and neural networks, Expert Systems with Applications 36 (2009) 4321-4330.
[3] E. Crawford, J. Kay, E. McCreath, Automatic induction of rules for e-mail classification, in: Proceedings of the 6th Australasian Document Computing Symposium, Coffs Harbour, Australia, 2001, pp. 13-20.
[4] Y. Kaya, A. Yeşilova, R. Tekin, Filtering unwanted electronic messages (spam) by using rough set approach, in: Electronic-Electric and Computer Conference, Elazig, Turkey, October 5-7, 2011.
[5] H.B. Wang, Y. Yu, Z. Liu, SVM classifier incorporating feature selection using GA for spam detection, in: Proceedings of the 2005 International Conference on Embedded and Ubiquitous Computing, 2005, pp. 1147-1154.
[6] H. Drucker, S. Wu, V.N. Vapnik, Support vector machines for spam categorization, IEEE Transactions on Neural Networks 10 (5) (1999) 1048-1054.
[7] F. Wang, Z. You, L. Man, Immune-based peer-to-peer model for antispam, in: Proceeding of the International Conference on Intelligent Computing, 2006, pp. 660-671.
[8] X. Yue, A. Abraham, Z.X. Chi, Y.Y. Hao, H. Mo, Artificial immune system inspired behavior-based anti-spam filter, Soft Computing 11 (2007) 729-740.
[9] C.-H. Wu, C.-H. Tsai, Robust classification for spam filtering by back-propagation neural networks using behavior-based features, Applied Intelligence 31 (2008) 107-121.
[10] E.-S.M. El-Alfy, R.E. Abdel-Aal, Using GMDH-based networks for improved spam detection and email feature analysis, Applied Soft Computing 11 (2011) 477-488.
[11] M.N. Marsono, M.W. El-Kharashi, F. Gebali, Targeting spam control on middleboxes: Spam detection based on layer-3 e-mail content classification, Computer Networks 53 (2009) 835-848.
[12] C.-C. Lai, An empirical study of three machine learning methods for spam filtering, Knowledge-Based Systems 20 (2007) 249-254.
[13] G.B. Huang, Q.Y. Zhu, C.K. Siew, Extreme learning machine: Theory and applications, Neurocomputing 70 (1-3) (2006) 489-501.
[14] A. Frank, A. Asuncion, UCI Machine Learning Repository, School of Information and Computer Science, University of California, Irvine, CA, http://archive.ics.uci.edu/ml.
[15] M. Hopkins, E. Reeber, G. Forman, J. Suermondt, Spam email database from UCI machine learning repository, 2005, http://www.ics.uci.edu/~mlearn/MLRepository.html (accessed Dec. 22, 2011).
[16] Q. Yuan, W. Zhou, S. Li, D. Cai, Epileptic EEG classification based on Extreme learning machine and nonlinear features, Epilepsy Research 96 (2011) 29-38.
[17] S. Suresh, S. Saraswathi, N. Sundararajan, Performance enhancement of extreme learning machine for multi-category sparse data classification problems, Engineering Applications of Artificial Intelligence 23 (2010) 1149-1157.
[18] H.-J. Rong, Y.-S. Ong, A.-H. Tan, Z. Zhu, A fast pruned-extreme learning machine for classification problem, Neurocomputing 72 (2008) 359-366.
[19] X.-G. Zhao, G. Wang, X. Bi, P. Gong, Y. Zhao, XML document classification based on ELM, Neurocomputing 74 (2011) 2444-2451.
[20] K.-C. Ying, S.-W. Lin, Z.-J. Lee, Y.-T. Lin, An ensemble approach applied to classify spam e-mails, Expert Systems with Applications 37 (2010) 2197-2201.
[21] M.-C. Su, H.-H. Lo, F.-H. Hsu, A neural tree and its application to spam e-mail detection, Expert Systems with Applications 37 (2010) 7976-7985.
[22] Y. Yang, S. Elfayoumy, Anti-spam filtering using neural networks and Bayesian classifiers, in: Proceedings of the 2007 IEEE International Symposium on Computational Intelligence in Robotics and Automation, Jacksonville, FL, USA, Jun. 2007, pp. 20-23.
[23] P. Graham, Better Bayesian filtering, in: Proceedings of the 2003 Spam Conference, Jan. 2003, http://www.paulgraham.com/better.html (accessed Dec. 28, 2011).