Web Spoofing Detection Systems Using Machine Learning Techniques A thesis Submitted to the Council of the College of Science at the University of Sulaimani in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer

By
Shaida Juma Sayda
B.Sc. Computer Science (2006), University of Kirkuk

Supervised by
Dr. Sozan A. Mahmood, Assistant Professor
Dr. Noor Ghazi M. Jameel, Lecturer

May 2017
Jozardan 2717

Supervisor Certification

I certify that this thesis, entitled "Web Spoofing Detection System Using Machine Learning Techniques" and accomplished by (Shaida Juma Sayda), was prepared under my supervision in the College of Science at the University of Sulaimani, in partial fulfillment of the requirements for the degree of Master of Science in (Computer).

Signature:
Name: Dr. Sozan A. Mahmood
Title: Assistant Professor
Date: / /2017

Signature:
Name: Dr. Noor Ghazi M. Jameel
Title: Lecturer
Date: / /2017

In view of the available recommendation, I forward this thesis for debate by the examining committee.

Signature:
Name: Dr. Aree Ali Mohammed
Title: Professor
Date: / /2017

Linguistic Evaluation Certification

I hereby certify that this thesis, titled "Web Spoofing Detection System Using Machine Learning Techniques" and prepared by (Shaida Juma Sayda), has been read and checked. After all the grammatical and spelling mistakes were indicated, the thesis was returned to the candidate to make the adequate corrections. After the second reading, I found that the candidate had corrected the indicated mistakes. I therefore certify that this thesis is free from mistakes.

Signature:
Name: Soma Nawzad Abubakr
Position: English Department, College of Languages, University of Sulaimani
Date: / /2017

Examining Committee Certification

We certify that we have read this thesis, entitled "Web Spoofing Detection System Using Machine Learning Techniques" and prepared by (Shaida Juma Sayda), and that, as the Examining Committee, we examined the student in its content and in what is connected with it, and that in our opinion it meets the basic requirements for the degree of Master of Science in Computer.

Signature:
Name: Dr. Soran A. Saeed
Title: Assistant Professor
Date: / /2017
(Chairman)

Signature:
Name: Dr. Aysar A. Abdulrahman
Title: Lecturer
Date: / /2017
(Member)

Signature:
Name: Dr. Akar H. Taher
Title: Lecturer
Date: / /2017
(Member)

Signature:
Name: Dr. Sozan A. Mahmood
Title: Assistant Professor
Date: / /2017
(Member-Supervisor)

Signature:
Name: Dr. Noor Ghazi Mohammed
Title: Lecturer
Date: / /2017
(Member-Co-Supervisor)

Approved by the Dean of the College of Science.
Signature:
Name: Dr. Bakhtiar Q. Aziz
Title: Professor
Date: / /2017

Acknowledgements

First of all, my great thanks to Allah, who helped me and gave me the ability to fulfill this work. I thank everybody who helped me complete this work, especially my supervisors, Assistant Professor Dr. Sozan Abdullah and Lecturer Dr. Noor Ghazi, for their help. Special thanks to my husband, Mr. Ribwar, for his encouragement, scientific notes, and support during my study and in finalizing this work. Special thanks to my father, my mother, and all my family for their endless support, understanding, and encouragement; they have taken their share of suffering and sacrifice during the entire phase of my research. Special thanks also go to those who helped me during this work. I am glad to have this work done. Thanks to my colleagues and the entire faculty members.

Dedication

This thesis is dedicated to:

my mother and father,
my son,
our families,
our friends,
the Computer Department,
and all who offered any support.

Shaida

Abstract

With the appearance of the internet, various online attacks have increased, and among them the most well-known is the spoofing attack. Web spoofing is a type of spoofing in which fraudsters build fake websites that copy real ones. Spoofing websites imitate legitimate websites to lure users into visiting them, in order to steal users' sensitive personal information or to install malware on their devices. The stolen information is then used by scammers for illegal purposes. The specific goal of this thesis is to build an intelligent system that detects spoofing websites and distinguishes them from the trusted sites they try to mimic, since it is very difficult to recognize visually whether a site is spoofing or legitimate. The thesis deals with the detection of spoofing websites using a Neural Network (NN) trained with the Particle Swarm Optimization (PSO) algorithm. The information gain algorithm is used for feature selection, a useful step for removing unnecessary features; it improved the classification accuracy by reducing the number of extracted features used as input for training the NN with PSO. Training the neural network using PSO required less training time and achieved a good accuracy of 99%, compared with the NN trained with the back propagation algorithm, which took more time for training and achieved a lower accuracy of 98.1%. The proposed technique is evaluated with a

src="includes/jscripts/jquery.min.js">: this loads a file to make the page appearance good. When there are tags like this with any code (not links) between them, they are suspicious tags, because this script code is JavaScript or another language [51].

33. Mouse Over: using JavaScript to show a fake URL in the status bar to the user [45].
Rule: if the 'OnMouseOver' feature is used, Spoofing; otherwise Legitimate.


Chapter Two

Theoretical Background

34. DNS Record: This feature can be extracted from the WHOIS database. For spoofing sites, either the claimed identity is not recognized by the WHOIS database or no record of the hostname is found. If there is no DNS record for the given domain name in the WHOIS database, the website is classified as spoofing; otherwise it is considered a genuine website [45].
Rule: if there is no DNS record, Spoofing; otherwise Legitimate.

35. Protocol: Legitimate websites use the HTTPS protocol for transferring any sensitive information on the internet. HTTPS requires a certificate, but there is no assurance that a website using this protocol is legitimate, so some other parameters must be considered. For example, if the age of the given website is more than two years and its certificate is from a trusted issuer, it can be considered legitimate [45].

36. Number of Domains in the URL: This technique exploits the assumption that users do not know the exact domain location or the number of domains in the URL; as long as they see the brand's domain they relate to, they will not bother about the rest. In these scenarios, the URL will have more than one domain. A genuine site's URL is expected to have only one domain, placed before the URL's path. For instance, "http://2iphoto.cn/https://www.paypal.com/cgi-bin/webscr?cmd=_login-run" is a spoofing URL which spoofs PayPal: the URL has two domains, 2iphoto.cn (the actual domain of the URL) and paypal.com (placed in the path to fool the user into thinking the PayPal site is being accessed) [22].
Rule: if the number of domains in the URL > 1, Spoofing; otherwise Legitimate.
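As an illustration, rule 36 could be implemented as a small sketch like the following. This is not the thesis's actual code: the regular expression, the short TLD list, and the function names are all assumptions made for the example.

```python
import re

# Hypothetical sketch of the "number of domains in the URL" rule: count
# substrings that look like registered domains anywhere in the URL
# (hostname or path) and flag the URL when more than one appears.
# The TLD list here is deliberately tiny and illustrative.
DOMAIN_RE = re.compile(r'\b(?:[a-z0-9-]+\.)+(?:com|net|org|cn|info|io)\b', re.I)

def domains_in_url(url: str) -> int:
    """Count domain-like tokens appearing anywhere in the URL."""
    return len(DOMAIN_RE.findall(url))

def rule_number_of_domains(url: str) -> str:
    """Rule 36: more than one domain in the URL -> Spoofing."""
    return "Spoofing" if domains_in_url(url) > 1 else "Legitimate"
```

Applied to the PayPal example above, the pattern finds both 2iphoto.cn and www.paypal.com, so the URL is flagged as spoofing.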


2.3 Feature Selection

Feature selection, also known as variable selection, attribute selection, or feature subset selection, is the process of selecting relevant features in terms of a target learning problem. The purpose of feature selection is to remove redundant and irrelevant features. Irrelevant features provide no useful information about the data, and redundant features provide no more information than the currently selected features. In other words, redundant features do provide useful information about the data set, but that information has already been provided by the currently selected features; for example, the date of birth and the age contain the same information about a person. Redundant and irrelevant features can reduce the learning accuracy and the quality of the model built by the learning algorithm [23].

Feature selection is among the most promising fields of research in data mining, in which impressive achievements have been reported, and it influences the predictive accuracy of any data set [24]. Figure (2.2) shows the steps of the feature selection process.

Figure (2.2) Flow of feature selection [23].


In concept, a feature selection algorithm searches over all possible subsets of features to find the optimal subset. This process can be computationally intensive, so some sort of stopping criterion is needed. The stopping criterion is usually based on conditions involving the number of iterations and the evaluation threshold, for example forcing the feature selection process to stop after it reaches a certain number of iterations [23].

Feature selection is a process that selects a subset of the original features, where the optimality of a feature subset is measured by an evaluation criterion. The feature selection process consists of four basic steps, as shown in Figure (2.2): subset generation, subset evaluation, stopping criterion, and result validation. Subset generation is a search procedure that produces candidate feature subsets for evaluation based on a certain search strategy. Each candidate subset is evaluated and compared with the previous best one according to a certain evaluation criterion; if the new subset turns out to be better, it replaces the previous best subset. The process of subset generation and evaluation is repeated until a given stopping criterion is satisfied. Finally, the selected best subset is validated by domain experts or by other tests, and may be given as input to any data mining task [24].

2.3.1 Feature Selection Approaches Feature selection algorithms can be classified into three categories: embedded approaches, wrapper approaches, and filter approaches. The filter model relies on the general characteristics of data and evaluates features without involving any learning algorithm. The wrapper model requires having a predetermined learning algorithm and uses its performance as evaluation criterion to select features. The embedded model incorporates variable selection as a part of


the training process, and feature relevance is obtained analytically from the objective of the learning model [23].

Filter Approaches
Filter approaches perform feature selection before the learning algorithm is run. They can be used as a preprocessing step to reduce the dimensionality and avoid overfitting. They are independent of the learning algorithm and require no information from the learning task, and they usually require less computation than wrapper approaches. Filter approaches provide a general feature selection strategy for most learning algorithms. Some examples of filter-based feature selection approaches are Relief, gain ratio, and information gain [23]. In this thesis, information gain was used as the feature selection algorithm to select the best feature subset for the learning algorithm.

2.3.2 Information Gain (IG)

Information gain is a supervised, univariate feature selection algorithm of the filter model, which measures the dependence between a feature and the class label. It is one of the most powerful feature selection techniques, easy to compute and simple to interpret. The information gain of a feature X with respect to the class labels Y is calculated as in equation (2.1) [24]:

    IG(X, Y) = H(X) - H(X|Y)        (2.1)

Entropy (H) is a measure of the uncertainty associated with a random variable. H(X) and H(X|Y) are the entropy of X and the entropy of X after observing Y, respectively. Entropy is calculated as in equation (2.2):

    H(X) = - Σi P(xi) log2 P(xi)        (2.2)
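Equations (2.1) and (2.2) can be sketched in a few lines of Python. This is an illustrative implementation over discrete feature values; the function names are ours, not the thesis's.

```python
import math
from collections import Counter

def entropy(values):
    """Equation (2.2): H over the empirical distribution of a discrete variable."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(xs, ys):
    """Equation (2.1): IG(X, Y) = H(X) - H(X|Y) for feature values xs and labels ys."""
    h_x = entropy(xs)
    # H(X|Y): entropy of X within each class, weighted by class frequency.
    h_x_given_y = 0.0
    for y, cy in Counter(ys).items():
        subset = [x for x, yy in zip(xs, ys) if yy == y]
        h_x_given_y += (cy / len(ys)) * entropy(subset)
    return h_x - h_x_given_y
```

A binary feature that perfectly predicts the class gets IG = 1 (maximum information), while a feature independent of the class gets IG = 0, matching the range described in section 4.4.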


The maximum value of information gain is 1. A feature with a high information gain is relevant. Information gain is evaluated independently for each feature, and the features with the top-k values are selected as the relevant features [24].

2.4 Artificial Neural Network (ANN)

An Artificial Neural Network (ANN) is an information processing paradigm inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system: it is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. Traditionally, the term neural network referred to a network or circuit of biological neurons, but modern usage often refers to ANNs. An ANN is a mathematical or computational model made up of interconnected artificial neurons programmed to mimic the properties of biological neurons; these neurons work in unison to solve specific problems. An ANN is configured to solve artificial intelligence problems without creating a model of a real biological system.

Once a network has been structured for a particular application, it is ready to be trained. To start this process, the initial weights are chosen randomly; then the training, or learning, begins. There are two approaches to training: supervised and unsupervised. Supervised training involves providing the network with the desired output, either by manually "grading" the network's performance or by providing the desired outputs along with the inputs. In unsupervised training, the network has to make sense of the inputs without outside help [34]. Figure (2.3) shows a simple artificial neural network model.


Figure (2.3) A simple neural network, with an input layer, a hidden layer, and an output layer connected by weights W1-W7 [34].

The year 1943 stands out as a prominent milestone in the history of ANNs, when Warren McCulloch and Walter Pitts introduced models of neurological networks: they recreated threshold switches based on neurons and showed that even simple networks of this kind are able to calculate nearly any logic or arithmetic function [26].

An ANN is built on a few essential components: the inputs, the activation functions of the units, the network architecture, and the weight of each input connection. Given that the first two aspects are fixed, the behavior of the ANN is defined by the current values of the weights. The weights of the net to be trained are initially set to random values, and then instances of the training set are repeatedly exposed to the net. The values for the input of an instance are placed on the input units, and the output of the net is compared with the desired output for this instance. Then all the weights in the net are adjusted slightly in the direction that would bring the output values of the net closer to the desired output [27].

Activation Functions


The activation function (also called a transfer function) defines the output of a neuron. There are three basic types of activation functions, as shown in figure (2.4).

Figure (2.4) Activation Functions [33]

1. Linear Function: The linear case is not very powerful: multiple layers of linear units can be collapsed into a single layer with the same functionality [33].

2. Threshold Function: In order to construct nonlinear functions, a network requires nonlinear units. The simplest form of nonlinearity is the threshold unit, which provides much more power than a linear function, as a multilayered network of threshold units can theoretically compute any Boolean function; however, the discontinuities in the function imply that finding the desired set of weights may require an exponential search. The threshold function is described by equation (2.3) [33]:

    y = 1 if x > 0;  y = 0 otherwise        (2.3)

3. Sigmoid Function: There are many applications where continuous outputs are preferable to binary outputs. Consequently, the most common activation function is now the sigmoid. Sigmoid functions have the advantage of nonlinearity while supporting a practical training algorithm, back-propagation based on gradient descent, which enables a multilayered network to be trained effectively. The sigmoid function is given in equation (2.4) [33]:

    y = 1 / (1 + e^(-x))        (2.4)
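The three activation functions above can be sketched as one-liners (illustrative only; the parameter `a` of the linear unit is an assumption):

```python
import math

def linear(x, a=1.0):
    # Linear unit: output proportional to the input.
    return a * x

def threshold(x):
    # Equation (2.3): output 1 when the input is positive, else 0.
    return 1 if x > 0 else 0

def sigmoid(x):
    # Equation (2.4): smooth, nonlinear, and differentiable everywhere,
    # which is what makes gradient-descent training practical.
    return 1.0 / (1.0 + math.exp(-x))
```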

Since the accumulated knowledge is distributed over all of the weights, the weights must be modified very gently so as not to destroy all the previous learning. A small constant called the learning rate (η) is used to control the magnitude of weight modifications. Finding a good value for the learning rate is very important: if the value is too small, learning takes forever, but if the value is too large, learning disrupts all the previous knowledge. Unfortunately, there is no analytical method for finding the optimal learning rate; it is usually optimized empirically, by trying different values [33]. The momentum (µ) determines the fraction of the previous weight adjustment that is added to the current weight adjustment; it accelerates the network's convergence process [33].

2.5 Particle Swarm Optimization (PSO)

Particle Swarm Optimization (PSO) is a population-based global search technique proposed by Kennedy and Eberhart in 1995. It is a swarm intelligence algorithm inspired by the social behavior of bird flocking and fish schooling. In PSO, each candidate solution of the problem is represented as a particle, encoded by a vector or an array. Particles move in the search space to search for the optimal solutions. During the movement, each particle remembers its best experience. The whole swarm searches for the optimal solution by updating the position of each particle based on the best experience of its own and of its neighboring


particles. PSO is a simple but powerful search technique, which has been successfully applied to solve problems in a variety of areas [31]. In PSO, each candidate solution of the problem is encoded as a particle moving in the search space. The whole swarm searches for the optimal solution by updating the position of each particle based on the experience of its own and of its neighboring particles. Generally, a vector xi = (xi1, xi2, ..., xiD) is used in PSO to represent the position of particle i, where D is the dimensionality of the search space, and a vector vi = (vi1, vi2, ..., viD) represents the velocity of particle i. During the search process, the best previous position of each particle is recorded as the personal best, called pbest, and the best position obtained by the swarm thus far is called gbest. The swarm is initialized with a population of random solutions and searches for the best solution by updating the velocity and the position of each particle according to equations (2.5) and (2.6):

    x_id^(t+1) = x_id^t + v_id^(t+1)        (2.5)

    v_id^(t+1) = w * v_id^t + c1 * r1i * (p_id - x_id^t) + c2 * r2i * (p_gd - x_id^t)        (2.6)

where t denotes the t-th iteration in the search process, d denotes the d-th dimension in the search space, c1 and c2 are acceleration constants, r1i and r2i are random values uniformly distributed in [0, 1], p_id and p_gd represent the elements of pbest and gbest in the d-th dimension, respectively, and w is the inertia weight. The velocity v_id^t is limited by a predefined maximum velocity vmax, with v_id^t ∈ [-vmax, vmax] [30][31]. Figure (2.5) illustrates the process of PSO.
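One velocity/position update per equations (2.5) and (2.6) might be sketched as follows; the default values for w, c1, c2, and vmax are illustrative assumptions, not values prescribed by the thesis.

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=2.0, c2=2.0, vmax=1.0):
    """Apply equations (2.5)-(2.6) dimension-wise to one particle.

    x, v:   current position and velocity vectors of the particle
    pbest:  the particle's personal best position
    gbest:  the swarm's global best position
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        r1, r2 = random.random(), random.random()
        vid = w * v[d] + c1 * r1 * (pbest[d] - x[d]) + c2 * r2 * (gbest[d] - x[d])
        vid = max(-vmax, min(vmax, vid))   # clamp velocity to [-vmax, vmax]
        new_v.append(vid)
        new_x.append(x[d] + vid)           # equation (2.5)
    return new_x, new_v
```

Note that a particle sitting exactly at both its personal best and the global best, with zero velocity, does not move: both attraction terms vanish.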


Figure (2.5) Flowchart of particle swarm optimization (PSO) [25].

Elements Used in PSO

This section illustrates the concepts of the PSO elements.

Particle: a particle is defined as Pi ∈ [a, b], where i = 1, 2, 3, ..., D and a, b ∈ R. Here D stands for the dimension and R for the real numbers.

Fitness function: the function used to find the optimal solution; usually it is an objective function.


Local best: the best position of the particle among all the positions it has visited so far.

Global best: the position where the best fitness is achieved among all the particles so far.

Velocity update: velocity is a vector that determines the speed and direction of the particle; it is updated by equation (2.6).

Position update: all the particles try to move toward the best position for optimal fitness. Each particle in PSO updates its position to find the global optimum; position is updated by equation (2.5) [29].

2.6 Training Neural Networks with PSO

The PSO algorithm is vastly different from any of the traditional training methods. PSO does not train just one network; rather, it trains a network of networks. PSO builds a set number of ANNs, initializes all network weights to random values, and starts training each one. On each pass through a data set, PSO compares each network's fitness. The network with the highest fitness is considered the global best, and the other networks are updated based on the global best network rather than on their personal error or fitness. Each neuron contains a position and a velocity: the position corresponds to the weight of a neuron, and the velocity is used to control how much the position is updated. If a neuron is further away (its position is further from the global best position), it will adjust its weight more than a neuron that is closer to the global best [28].

Generated weights are assigned to each link of the neural network. In the particle swarm optimization algorithm, pBest is the location of the best solution a particle has achieved so far, and gBest is the location of the best solution that any neighbour of a particle has achieved so far. Initially, random numbers are generated for each particle, and these random values are taken as the pBest and present weights.


Velocity is calculated using equation (2.7) and added to the present weight on each link of the neural network. For each particle, the newly calculated weights are compared with the pBest weights, and the weights producing the minimum error are stored in pBest. The initial velocity V is assumed to be 1, and gBest holds the weights of the particle producing the minimum error. The new weight is calculated as in equation (2.8):

    V1 = W × V + C1 × Rand1 × (pBest[w1] − present[w1]) + C2 × Rand2 × (gBest[w1] − present[w1])        (2.7)

    new weight w1 = current weight w1 + new V1        (2.8)

where C1 and C2 are two positive constants called learning factors (usually C1 = C2 = 2), and Rand1 and Rand2 are two random functions in the range [0, 1]. W is an inertia weight that controls the impact of the previous history of velocities on the current velocity. W plays the role of balancing the global search and the local search; it was proposed to decrease linearly with time from a value of 1.4 to 0.5, so that the search starts with a large weight favoring global search and then decreases with time to favor local search. When the number of iterations equals the total number of particles, the goal is compared with the error produced by the gBest weights. If the error produced by the gBest weights is less than or equal to the goal, the gBest weights are used for testing and prediction; otherwise, the weights of minimum error are stored in gBest and the iterations are repeated until the goal is reached. Figure (2.6) illustrates the steps of training the neural network using the PSO algorithm [25].


Figure (2.6) Training the neural network using PSO: random values are generated and stored as pBest and the present weights; for each particle, the output of the neural network and its error are calculated; the weights of the minimum-error particle are stored in gBest; the velocity of each particle is found and added to the present weights to obtain new weights; the process repeats until the error threshold is reached.

Chapter Four The Experiments and The Results

4.1 Introduction

Obtaining accurate results is regarded as one of the most important phases of any research work. In this chapter, the experiments and test results are explained, and in the last section a comparison is described to explain the differences in accuracy and training time and to find out which method achieved higher accuracy and less training time. The following methods were used for training and testing the proposed system:

First, the proposed system used a neural network (NN) and the particle swarm optimization (PSO) algorithm. PSO was used to train the NN to compute the optimal weights, which were then used in the testing phase with a feed forward NN. Second, the proposed system used a neural network trained with the back propagation algorithm and tested using a feed forward NN.

The input for the training and testing phases was based on the features extracted from legitimate and spoofing websites. The features were evaluated and selected using the information gain algorithm, so as to select the best features with which the system achieves high accuracy and low training and testing time. The experimental results cover the effect of the features, the training and testing phases of the classifier with their parameters, the results of the conducted experimental tests, and the various cases studied to choose the most suitable models and assess the performance of the proposed system. All the above points are described in the following sections via tables and figures.


The tests were performed on a Lenovo computer with 4 GB of RAM, a 2.4 GHz CPU, a 300 GB hard disk, and a 32-bit Windows operating system.

4.2 Involved System Parameters

In this work, the system parameters involved in the training and testing phases of the neural network are as follows:

1. The parameters used in training the NN with PSO comprise the input nodes, the maximum number of iterations, the number of particles, and the number of hidden nodes. The features extracted from the URL, HTML source code, Whois records, and ranker values were fed into the NN as input data. These parameters generated the optimal weights used in the testing phase.

2. The parameters used in training the NN with the back propagation algorithm are the input nodes, the learning rate, and the hidden nodes. The features extracted from the URL, HTML source code, Whois records, and website rankers were fed into the NN as input data. These parameters generated the weights used in the testing phase.

4.3 Training and Testing Phases

In this thesis, two collected datasets of legitimate and spoofing websites were used. The data were divided into two parts, namely training and testing datasets. The training dataset is used for learning, directing the system toward making decisions in the testing phase, whereas the testing dataset is used to evaluate the performance of the proposed system.


A dataset with 3000 URLs, consisting of 1500 legitimate URLs and 1500 spoofing URLs, was used for training the neural network (NN) with particle swarm optimization (PSO). The same dataset was used for training the NN with the back propagation algorithm for the comparison test. Each website in the training dataset contains the attributes or features together with the class of the website. The features describe the URL-based features, the HTML source code features, and the page ranker, and the class determines whether the website is legitimate or spoofing (1 or 0); depending on them, the proposed system can learn and make decisions. A dataset with 2000 URLs was used for testing with the neural network. In each subsystem, a number of experiments were performed to determine the best result by using different values of each parameter. In the NN-with-PSO subsystem, the best result was determined by calculating the accuracy percentage for the testing samples using the formulas in table (4.1).

Table (4.1) Performance calculation formulas

Performance Measure                        Formula
Classification Accuracy (%)                (TP + TN) / (TP + TN + FP + FN) × 100
Recall / True Positive Rate (TPR)          TP / (TP + FN) × 100
True Negative Rate (TNR)                   TN / (TN + FP) × 100
Error Percentage (%):
  False Positive Rate (FPR)                FP / (FP + TN) × 100
  False Negative Rate (FNR)                FN / (FN + TP) × 100

In order to better understand the classification, the notation is as follows:
True positive (TP): the number of websites correctly classified as legitimate.
True negative (TN): the number of websites correctly classified as spoofing.
False positive (FP): the number of legitimate websites classified as spoofing.
False negative (FN): the number of websites classified as legitimate when they were actually spoofing.
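The formulas in table (4.1) translate directly into code (a trivial sketch; the function names are ours):

```python
def accuracy(tp, tn, fp, fn):
    """Classification accuracy (%) over all four confusion-matrix counts."""
    return (tp + tn) / (tp + tn + fp + fn) * 100

def true_positive_rate(tp, fn):
    """Recall / TPR (%): legitimate sites correctly identified."""
    return tp / (tp + fn) * 100

def true_negative_rate(tn, fp):
    """TNR (%): spoofing sites correctly identified."""
    return tn / (tn + fp) * 100

def false_positive_rate(fp, tn):
    """FPR (%): legitimate sites wrongly flagged as spoofing."""
    return fp / (fp + tn) * 100

def false_negative_rate(fn, tp):
    """FNR (%): spoofing sites wrongly accepted as legitimate."""
    return fn / (fn + tp) * 100
```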

In this work, a list of 36 binary features was extracted. The binary feature vector was entered into the NN trained using PSO, and the results are shown in table (4.2). Figure (4.1) shows the effect of different numbers of hidden nodes on the training accuracy when the NN was trained with PSO using 36 input nodes.

Table (4.2) Training NN using PSO with 36 features

Number of Hidden Neurons    Training Accuracy
8                           0.97301 %
9                           0.96555 %
10                          0.95995 %
11                          0.97423 %
12                          0.97651 %
13                          0.97543 %
14                          0.97725 %


Figure (4.1) The effect of different numbers of hidden nodes on the training accuracy.

The information gain algorithm was used for feature selection, to select from the 36 features those with a high ranking, i.e., values near 1.

4.4 Information Gain Results for Feature Selection

Two datasets were collected, each consisting of 5000 websites. Each website was stored as 36 features. The information gain algorithm was used to calculate the information gain value for each feature, and the features were sorted in descending order from the maximum value to the minimum value, as described in table (4.3). Then 21 of the 36 features were selected, because those 21 features had values between 1 and 0.005; the other 15 features were ignored because their values were close to 0. The information gain value was calculated for each attribute with respect to the output variable; values vary from 0 (no information) to 1 (maximum information). Attributes that contributed more information had a higher information gain value and can be selected, whereas those that did not add much information had a lower score and can be removed.
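The selection step described here (rank features by information gain, keep those scoring at least 0.005, drop the rest) can be sketched as follows; the function name and dictionary layout are assumptions for the example.

```python
def select_features(ig_scores, threshold=0.005):
    """Return feature names sorted by descending information gain,
    keeping only those whose score meets the threshold."""
    ranked = sorted(ig_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, score in ranked if score >= threshold]
```

With a handful of the scores from table (4.3), `Script` (0.003) and `Port` (0.00017) fall below the 0.005 cutoff and are dropped, while `PageRank` and `length_of_url` are kept.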

Table (4.3) Information gain values in descending order

Rank  Feature No.  Feature Name                    Information Gain Value
1     25           PageRank                        0.997453860815804
2     26           Alexarank                       0.997453860815804
3     36           Number_of_Domains_in_the_URL    0.379503879033444
4     3            length_of_url                   0.118868030103551
5     13           Length_of_host                  0.095602382587933
6     21           age_of_domain                   0.0911605230728393
7     27           LDigit_0_9_in_Host              0.0799538796808619
8     9            _dash in url_                   0.0689143043156831
9     28           Keyword_based                   0.0304603650945631
10    24           _?_                             0.016392436470402
11    4            dot_in_path                     0.0152678508576796
12    12           dot_in_Host                     0.0134565511025651
13    5            _slash_                         0.0133273726596062
14    22           Domain_token_count              0.0104788423444315
15    7            _at_                            0.00966712482687615
16    11           _;_in_Path                      0.00926162265998198
17    6            _equal_                         0.00885635783479266
18    17           dash_in_domain                  0.00743979584539134
19    16           Length_of_subdomain             0.00723766617975408
20    15           Suspicious_//                   0.005824407833669
21    8            _modol_                         0.00542114880056099
22    33           Script                          0.00300651167249832
23    23           Path_token_count                0.0015297480122658
24    10           Farza                           0.000600259841027628
25    31           Redirect_page                   0.000600259841027628
26    14           Length_of_path                  0.000400115461790862
27    1            Protocol                        0.000200028859672963
28    2            Port                            0.000167312443627221
29    34           mouse_over                      0
30    32           RequestURL                      0
31    35           DNS_record                      0
32    19           google_index_page               0
33    18           sub_domain                      0
34    30           Hex_Based_Host                  0
35    20           disable_right_click             0
36    29           IP_Based_Host                   0

The effect of different numbers of features for training the NN using PSO, with 9 nodes in the hidden layer, is shown in table (4.4). Figure (4.2) shows the training accuracy achieved with different numbers of features when the NN was trained with PSO.


Table (4.4) Training NN using PSO with different numbers of features and 9 nodes in the hidden layer (9 nodes gave the maximum training accuracy)

Number of features | Training accuracy
21 | 99.12%
20 | 97.04%
19 | 96.59%
18 | 97.60%
17 | 97.39%
16 | 97.95%
15 | 96.99%

Figure (4.2) Training Accuracy of NN Using PSO with Different Numbers of Features

4.5 The System Results for Training NN Using PSO and Testing Using the Feed Forward Neural Network

For the training phase, 3000 samples were used to train the NN using PSO; as a result, the optimal set of weights was stored in a text file. To obtain the test results, 2000 samples were fed to the feed forward neural network using the optimal weights.

4.5.1 Training NN Results Using PSO

In this step, PSO was used to obtain the optimal weights and biases that provide a minimum error for the feed forward neural network. The legitimate and spoofing datasets with 21 features and a class label with values (1, 0) were used to indicate whether each website is spoofing or legitimate.

4.5.1.1 The Effect of Different Numbers of Hidden Neurons

The hidden layer is the layer that connects the input layer to the output layer. The number of hidden-layer nodes is one of the most important parameters in the NN and has a direct effect on the learning process of the network. The numbers of nodes in the input layer and the output layer must be predefined and fixed according to the nature of the problem, while the number of nodes in the hidden layer was varied during the training process to find the value that gave the best result in the training phase, to be used later in the testing phase. The training accuracy and training time for the training phase with different numbers of neurons in the hidden layer, using 21 features as input nodes for the input layer, are presented in table (4.5).


Table (4.5) Training phase of the NN with PSO with 21 features

No. of nodes in hidden layer | Training accuracy | Training time for 1000 iterations (seconds)
 8 | 98.14% |  6
 9 | 99.12% |  9
10 | 96.97% | 11
11 | 97.61% | 14
12 | 97.89% | 15

4.5.1.2 The Effect of Different Numbers of Particles

The number of particles is one of the most significant parameters in PSO and has a direct effect on the learning process of the network. Table (4.6) shows the training accuracy for different numbers of particles; 10 particles achieved the highest training accuracy and were used to obtain the optimal weights for the system.

Table (4.6) The effect of different numbers of particles with 9 nodes in the hidden layer

Number of particles | Training accuracy
10 | 99.12%
12 | 98.23%
14 | 97.89%
20 | 97.13%
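The PSO training procedure described in this section can be sketched as follows. This is a minimal illustration, not the thesis implementation: the layer sizes, the swarm parameters (10 particles, inertia 0.7, acceleration constants 1.5), and the toy XOR data are assumptions for demonstration. Each particle is a flattened vector of all weights and biases, and its fitness is the network's mean squared error on the training set.

```python
# Minimal sketch of training a one-hidden-layer feed-forward network with
# PSO: particles are flattened weight vectors, fitness is training error.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 2, 4, 1                      # illustrative sizes
dim = (n_in + 1) * n_hid + (n_hid + 1) * n_out    # weights plus biases

def forward(w, X):
    """Feed-forward pass with sigmoid activations for a flat weight vector w."""
    i = (n_in + 1) * n_hid
    W1 = w[:i].reshape(n_in + 1, n_hid)
    W2 = w[i:].reshape(n_hid + 1, n_out)
    h = 1 / (1 + np.exp(-(np.c_[X, np.ones(len(X))] @ W1)))
    return 1 / (1 + np.exp(-(np.c_[h, np.ones(len(h))] @ W2)))

def fitness(w, X, y):
    return np.mean((forward(w, X).ravel() - y) ** 2)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([0, 1, 1, 0], float)                 # XOR as a toy dataset

n_particles, w_inertia, c1, c2 = 10, 0.7, 1.5, 1.5
pos = rng.uniform(-1, 1, (n_particles, dim))
vel = np.zeros_like(pos)
pbest, pbest_fit = pos.copy(), np.array([fitness(p, X, y) for p in pos])
gbest = pbest[pbest_fit.argmin()].copy()

for _ in range(500):
    r1, r2 = rng.random((2, n_particles, dim))
    vel = w_inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos += vel
    fit = np.array([fitness(p, X, y) for p in pos])
    improved = fit < pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmin()].copy()      # best weights found so far

pred = (forward(gbest, X).ravel() > 0.5).astype(int)
print(pred)  # predicted classes for the four toy inputs
```

In the thesis pipeline, `gbest` plays the role of the optimal weight set stored in the text file and later loaded by the feed forward network for testing.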


Figure (4.3) The Effect of Different Numbers of Particles

4.5.2 Testing Using the Feed Forward Neural Network

The system was tested using the feed forward neural network. The confusion matrix for the testing phase is presented in table (4.7), which gives the results for the testing dataset of 2000 samples. According to the table, the majority of instances were classified correctly, with only a small number of cases misclassified in either class during the test phase. The system achieved the highest accuracy with 9 neurons in the hidden layer. Figure (4.4) shows the test accuracy when the NN was trained with PSO.
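The rate columns of the confusion matrix can be derived from the raw counts as shown below; the counts used here are illustrative, not the thesis results.

```python
# How TN/FN/TP/FP rates and accuracy follow from raw confusion-matrix counts.
def rates(tp, fp, tn, fn):
    """Return (TP rate, FP rate, TN rate, FN rate, accuracy)."""
    tp_rate = tp / (tp + fn)              # sensitivity / recall
    fn_rate = fn / (tp + fn)
    tn_rate = tn / (tn + fp)              # specificity
    fp_rate = fp / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return tp_rate, fp_rate, tn_rate, fn_rate, accuracy

# Illustrative counts for 2000 test samples (1000 per class):
print(rates(tp=990, fp=3, tn=997, fn=10))
```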


Table (4.7) Confusion Matrix of the Testing Phase

Number of nodes in hidden layer | TN rate | FN rate | TP rate | FP rate | Test accuracy
 8 | 0.9852 | 0.0369 | 0.9631 | 0.0148 | 98.14%
 9 | 0.9966 | 0.0105 | 0.9895 | 0.0034 | 99.18%
10 | 0.9638 | 0.0389 | 0.9613 | 0.0362 | 96.97%
11 | 0.9700 | 0.0373 | 0.9627 | 0.0300 | 97.61%
12 | 0.9343 | 0.0082 | 0.9918 | 0.0657 | 97.89%

Figure (4.4) Testing Using Feed Forward Neural Network


4.6 The Effect of the Number of Hidden Neurons

The input layer is fed by the independent discriminating variables (i.e., the feature vector). This layer is connected to the hidden layer, which has a variable number of processing elements. The nodes in the first and last layers of the network are fixed by the nature of the problem itself; however, the number of hidden layers and the number of nodes in each of these layers are not known a priori. Table (4.8) shows the effect of different numbers of hidden nodes on the training time and training accuracy with a learning rate of 0.5. The maximum training accuracy was achieved with 9 nodes in the hidden layer, which was then used in the testing phase.

Table (4.8) The Effect of Different Numbers of Hidden Nodes

Number of hidden nodes | Training accuracy | Training time for 1000 iterations (seconds)
 8 | 88.19% | 34
 9 | 98.34% | 36
10 | 97.11% | 40
11 | 85.97% | 45
12 | 68.73% | 47

4.6.1 Testing Using the Feed Forward Neural Network

In the test phase of the neural network, each test URL was represented by its binary feature vector (consisting of 21 features). The binary feature vector was entered into the feed forward neural network to classify the URL as spoofing or legitimate. Table (4.9) shows the testing accuracy; the best accuracy was achieved with 9 nodes in the hidden layer.
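As an illustration of how a URL can be turned into such a feature vector, the following sketch extracts a few of the lexical features listed in table (4.3). The helper name, thresholds, and example URL are hypothetical, and only a small subset of the 21 selected features is shown.

```python
# Hypothetical sketch: extracting a few of the lexical URL features from
# Table (4.3). Only a subset is shown; the example URL is made up.
from urllib.parse import urlparse

def extract_features(url):
    p = urlparse(url)
    host = p.hostname or ""
    return {
        "length_of_url":   len(url),
        "length_of_host":  len(host),
        "dot_in_Host":     host.count("."),
        "dash_in_url":     url.count("-"),
        "at_in_url":       1 if "@" in url else 0,
        "digits_in_host":  sum(ch.isdigit() for ch in host),
    }

print(extract_features("http://paypa1-secure.example.com/login"))
```

Counts like these would then be binarized against per-feature thresholds before being fed to the network.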


Table (4.9) Testing Using Feed Forward Neural Network

Number of nodes in hidden layer | Test accuracy
 8 | 88.14%
 9 | 98.20%
10 | 97.05%
11 | 85.58%
12 | 67.98%

4.7 Comparison between Training NN with PSO and Training NN with Back Propagation

A neural network is a complex function that accepts some numeric inputs and generates some numeric outputs. A neural network has weight and bias values that, along with the inputs, determine the outputs. There are several approaches to training a neural network; by far the most common technique is back propagation. Although mathematically elegant, back propagation is not perfect, and particle swarm optimization (PSO) offers an alternative training technique. Training a feed forward neural network is an optimization problem: finding the set of network parameters (weights) that provides the best classification performance. The two algorithms were compared with regard to training time and test accuracy for training feed forward neural networks.
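For contrast with the PSO approach, a minimal back-propagation training loop for the same one-hidden-layer architecture might look like the following. The layer sizes, the toy XOR data, and the iteration count are illustrative assumptions, though the learning rate of 0.5 matches the one reported in section 4.6.

```python
# Minimal back-propagation sketch for a one-hidden-layer sigmoid network.
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
y = np.array([[0], [1], [1], [0]], float)   # XOR as a toy dataset

W1 = rng.uniform(-1, 1, (2, 4)); b1 = np.zeros(4)
W2 = rng.uniform(-1, 1, (4, 1)); b2 = np.zeros(1)
lr = 0.5                                     # learning rate from section 4.6

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for _ in range(5000):
    h = sigmoid(X @ W1 + b1)                 # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)      # output-layer delta
    d_h = (d_out @ W2.T) * h * (1 - h)       # hidden-layer delta
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(0)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(0)

print((out > 0.5).astype(int).ravel())       # predicted classes after training
```

Whereas PSO evaluates a whole swarm of candidate weight vectors per iteration, back propagation follows the error gradient of a single weight vector, which explains the different training-time profiles compared in table (4.10).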


4.7.1 Training Time

Table (4.10) and figure (4.5) illustrate a comparison of the training times for training the NN with PSO and with the back propagation algorithm. As shown in the table, the training time with PSO is less than the training time with back propagation.

Table (4.10) Training Time Comparison

Number of nodes in hidden layer | NN with PSO (seconds) | NN with back propagation (seconds)
 8 |  6 | 34
 9 |  9 | 36
10 | 11 | 40
11 | 14 | 45
12 | 16 | 47

Figure (4.5) Training Time Comparison


4.7.2 Test Accuracy

In this stage, the proposed system was implemented using the NN with PSO to obtain better weights and biases for the FNN; PSO attempted to reduce the error rate as far as possible. For this purpose, the testing dataset with 2000 instances was used in the testing phase, with the 21 input features and 9 nodes in the hidden layer. The neural network with PSO achieved a test accuracy of 99.18%, the best performance compared to the neural network trained with back propagation, whose test accuracy was only 98.20%, as shown in table (4.11).

Table (4.11) Comparison between NN trained with PSO and NN trained with back propagation in testing accuracy

Classifier | Test accuracy
Neural network trained with PSO | 99.18%
Neural network trained with back propagation | 98.20%


Chapter Five Conclusions and Suggestions for Future Work

5.1 Conclusions

Detecting spoofing websites is one of the crucial problems facing the internet community because of its high impact on daily online transactions. This work investigated the features that are effective in detecting spoofing websites. These features were extracted automatically, without any intervention from the users, using developed computerized tools. We collected and analyzed 36 different features that distinguish spoofing websites from legitimate ones, and evaluated our approach on real-world datasets with more than 2500 spoofing and 2500 non-spoofing URLs. From the experiments and the results, the following conclusions were deduced:

A. Information gain for feature selection is a useful step to eliminate unnecessary features. The results of the conducted tests indicate that a significant reduction in the network size could be achieved. Information gain was found to improve the classification accuracy and led to a lower computational cost; it is therefore especially recommended in applications where time is critical.

B. One of the most important conclusions of this thesis is that reducing the number of features extracted and used for training the NN using PSO helps reduce the time the NN with PSO needs to perform the classification and ranking.


C. Training the neural network using PSO provides good accuracy, 99.12% for training and 99.18% for testing, compared to back propagation with 98.34% for training and 98.20% for testing.

D. Training the NN using PSO also showed better accuracy performance, almost 1% higher than the neural network trained with back propagation on the same datasets.

E. The time required to train the system with the NN using PSO is less than the time required with back propagation.

5.2 Future Work Suggestions

For future study, the following points are suggested:

A. Implementing this system using Hadoop (an open source project for cloud computing).

B. Using the system to develop a tool for spoofing URL detection.

C. Developing a system for online spoofing URL detection.

D. Building a tool integrated with a web browser (for example, an executable for Internet Explorer, or plug-ins such as for Google Chrome) to detect spoofing websites in real time and warn the users of any possible attack.

E. A spoofing detection toolbar or desktop application that can run as a background process, so that it serves as an independent spoofing detection tool.


