Design of Anomaly Detection System for Outlier ... - Semantic Scholar

1 downloads 0 Views 177KB Size Report
Sep 9, 2012 - ted byte in use. Availab le. Mbytes. Cache faults/se c. Page faults/s ec. Page writes/s ec. Page ..... Jolliffe, “Principa suk Song, Hiroki tion,” In ...
Hari Om et al. / International Journal on Computer Science and Engineering (IJCSE)

Design of Anomaly Detection System for Outlier Detection in Hardware Profile Using PCA Hari Om, Tanmoy Hazra Department of Computer Science & Engineering Indian School of Mines, Dhanbad 826004, India 1 [email protected], [email protected] Abstract In this paper, we design an Anomaly Detection System for Outlier Detection in Hardware Profile by using Principal Component Analysis (PCA) that helps reduce the dimension of data. Anomaly detection methods can detect new intrusions, but they suffer from false alarms. Another approach is misuse detection that identifies only known attacks by matching with the previous patterns. Host based Intrusion Detection Systems (HIDSs) use anomaly detection approach to identify malicious attacks i.e. intrusion. Data being of large dimensional generates features in terms of large set of dimensions and hence the system takes considerable time for processing the huge amount of data. The PCA is used to reduce the dimensionality of the host based data without any loss of useful information such as non-redundant data. We experimentally show that the proposed intrusion detection system has detection rate in the range of 90% - 97.5% and false alarm rate in the range of 2.5% - 7.5% depending upon the major and minor principal components. Keywords: Anomaly Detection, Outlier Detection, PCA, Mahalanobis Distance, False alarm rate 1. Introduction With the explosive rapid expansion of computers in last decade and so, their security has become an important issue. The process of monitoring the events occurring in a computer system and analyzing them for identifying intrusions is known as intrusion detection technique and the system is known as intrusion detection system (IDS). An intrusion is defined as an attack in a network or system by an intruder that compromises the security parameters such as integrity, confidentiality, and authentication of the system. The attacks can be external attacks, internal penetrations, and misfeasors. An intruder tries to get access into a system for which he/she is not authorized. An Intrusion Detection System (IDS) is a program that analyzes the events that have taken place or those happen during an execution and it tries to find indications of misuse of the computer. Host based Intrusion Detection Systems (HIDSs) monitor suspicious activities that take place in the system. The HIDSs can be either anomaly detection that is based on statistical measure or misuse detection that is based on signature. Anomaly detection is used to capture the changes in behavior that are not normal. These methods use as input the training data to build normal system behavior models that signal alarms when there is any abnormal activity which deviates from the normal model. These models may be generated using different approaches such as statistical analysis, data mining algorithms, genetic algorithms, artificial neural network, fuzzy logic, rough set. Anomaly detection methods have problems of false positive and false negative. Since the numbers of new attacks are increasing and the variations of known attacks cannot be recognized by misuse detection. Therefore, we develop an intrusion detection system using Principal Component Analysis (PCA) that detects the outlier data. 2. Related Work There are several works related to intrusion detection in literature [1-6]. The principal component analysis (PCA) is one of the important approaches that are used to reduce the data size and also detect errors in multivariate data [3]. The Chi-square distribution is also very useful statistical approach in detecting anomalies. Shyu discusses an intrusion predictive model that uses PCA and Chi-square distribution for KDD1999 dataset [1]. For detecting anomaly in a system, monitoring of its behaviour is required. If there is abnormal behaviour in the system, one can suspect some security violation. In [2], an intrusion detection model based on security violations that is capable of detecting break-ins, penetrations and other types of computer attacks is discussed. Ye uses Chi–square statistic to develop an anomaly detection technique that has 0% false alarm rate and 100% detection rate [4]. Puketza provides a comparative study of detection rate and false alarm rate by using Hotelling’s T2 test and Chi-squared distance test. He has reported experimentally that the Chi-square distribution has better performance than the Hotelling’s T2 test [14]. Chen et. al discuss an efficient filtering scheme that requires only 0.3% of the original traffic volume for anomaly [17]. Casas et. al discuss an unsupervised network intrusion detection system that can detect unknown network attacks without using any kind of signatures,

ISSN : 0975-3397

Vol. 4 No. 09 Sep 2012

1623

Hari Om et al. / International Journal on Computer Science and Engineering (IJCSE)

labeled traffic, or training [18]. In this paper, we use PCA methodology to detect intrusion and our proposed system has detection rate in the range of 90% - 97.5% and false alarm rate in the range of 2.5% - 7.5% depending upon the major and minor principal components. The rest of the paper is organized as follows: section 3 discusses the proposed work. Experimental methodology has been discussed in section 4, Results and Discussions are given in section 5. Finally, the conclusion is given in section 6. 3. Proposed Work The PCA is a common technique to find the patterns in the data of high dimension. It basically reduces the number of dimensions in an input data set without losing its useful information. In PCA technique, a set of principal components are obtained that constitute an orthogonal set of eigenvalue and eigenvector pairs. The set of principal components, also called axes, best suits the data. In our proposed scheme, these set of axes represent features’ normal data. Outlier detection occurs by mapping the used data to these normal axes in order to find the distance from the axes. If the distance is greater than a certain threshold, it is assumed that there is an attack i.e. outlier detection. The principal components are linear combinations of m random variables (features of used data), denoting them as X1, X2, ... ,Xm, that have two important properties:  They are uncorrelated, and sorted in descending order.  Their total variance, denoted by R, is the summation of variances of each variable X1 , X2 , ... , Xm, i.e., m

R=

R i 1

i

, where Ri is variance of Xi

Assume that the original data is represented in matrix form with n observations, each observation has m attributes i.e. Xnxm. Let ρmxm and ∑mxm be the symmetric correlation and variance-covariance matrices of X1, X2,...,Xm, respectively. X= [X1, X2, … , Xm]T denotes the observation data matrix. Let the correlation matrix be the mxm symmetric matrix as given below:

ρ=

ρ11 ρ12 … ρ12 ρ22 … . . . . ρ1m ρ2m

ρ1m ρ2m . . ρmm

where the correlation coefficient ρik measures the amount of linear association between Xi and Xk that is defined in terms of covariance σik and variances σii and σkk as follows: ρik = σik /(√σii √σkk) Thus, the correlation matrix ρ can be defined as follows:

ρ=

σ11 / (√σ11 √ σ11) σ12 / (√σ11 √ σ22) . . σ1m / (√σ11 √ σmm)

σ12 / (√σ11 √ σ22) … σ1m / (√σ11 √ σmm) σ22 / (√σ22 √ σ22) … σ2m / (√σ22 √ σmm) . . . . σ2m / (√σ22 √ σmm) … σmm / (√σmm √ σmm)

∑=

σ11 σ12 . . σ1m

σ12 … σ1m σ22 … σ2m . . . . σ2m … σmm

where σik represents the covariance between ith and kth attributes defined below: n

σik = 1/(n-1)

(X i 1

i

- X i )(Xk - X k )

and σii represents the variance of ith attribute. and let Vmxm be standard deviation matrix that is defined below:

ISSN : 0975-3397

Vol. 4 No. 09 Sep 2012

1624

Hari Om et al. / International Journal on Computer Science and Engineering (IJCSE)

√σ11 0 … 0 0 √σ22 … 0 . . . . . . 0 0 … √σmm

V1/2 =

Then, it can be easily verified that V1/2ρV1/2 = Σ We can also write ρ = (V1/2)-1Σ(V1/2)-1 The principal components may also be obtained for the standardized variables: Z1, Z2,...,Zm using the following equation: Zi = (Xi - X i )/√σii

for i=1, 2, …, m.

X = [ X 1 , X 2 , ..., X m ]T is the mean vector of X which is having m attributes/components i.e. X= [X1, X2 , …, Xm]. We can also represent it in a matrix form of dimension mxm: Z = (V1/2)-1(X- X ), where Z = [Z1, Z2, … , Zm]T the column vector of the standardized observation data X. The principal components of Z are obtained from the eigenvectors of the correlation matrix ρ. Let Yi be the ith principal component of Z and (λi, ei) represent the ith eigenvalue/eigenvector pairs among m eigenvalues from ρ. If (λ1, e1), (λ2, e2), ..., (λm, em) are m eigenvalue-eigenvector pairs, where λ1 ≥ λ2 ≥ ... ≥ λm ≥ 0, the ith principal component is given by Yi = eiT Z = ei1Z1+ ei2Z2 + ... + eimZm,

i =1 , 2 , ..., m,

where ei is given by

ei1 ei2 ei =

. . eim

Each eigenvalue of a principal component corresponds to the amount of variation it has. The larger eigenvalues are more significant and correspond to their projected eigenvectors. The points which lie at a far distance from these axes would exhibit abnormal behavior that can easily be identified. Using a suitable threshold value, the normal system generated data with Mahalanobis distance greater than the threshold is considered as an outlier and it is an attack. If the data is in the threshold boundary, sometimes it alerts as intrusion. The sum of squares of the partial principal component values equals to the principal component value that is given as follows: m



Yi2/λi = Y12/λ1 + Y22/λ2 + …. + Ym2/λm

i 1

This sum is nothing but Mahalanobis distance of the dataset X from the mean of the normal sample dataset [9]. In general, Mahalanobis distance between two vectors x and y is calculated by d2 (x, y) = (x − y)T ρ (x − y), where ρ is the sample correlation matrix. Here, the major principal components value is used to detect extreme deviations with large values and minor principal components value is used to detect slight deviations on the normal dataset. Thus, two thresholds are needed to detect attacks. Let q & r be the most significant principal components and least significant principal components and Tq & Tr be the thresholds for the major principal component and minor principal component. We say that an attack occurs for any observation of X if any one of the following condition is satisfied: q

 i 1

m

Yi2/λi > Tq

or



Yi2/λi > Tr

i  m  r 1

These inequalities contain square of projections on the axes normalized by corresponding eigen values. The first inequality contains the sum of squares of first q principal component values (projections on first q axes) and the second one contains sum of squares of last r principal component values (projections on last r axes). If the first sum is greater than the threshold value Tq or the second sum is greater than the threshold value Tr, then there is

ISSN : 0975-3397

Vol. 4 No. 09 Sep 2012

1625

Hari Om et al. / International Journal on Computer Science and Engineering (IJCSE)

large deviation and such deviations are termed as abnormal behaviour of the system, i.e., an attack. Now we discuss confusion matrix that helps computing recall, precision, detection rates and false alarm rates. Confusion matrix: False alarm rate and detection rate can be calculated using the confusion matrix that is given below. Predicted Class C Actual

Class

NC

TN FN

C NC

FP TP

Fig. 1 Confusion matrix

C – Anomaly class NC – Normal class TN – True Negative FN – False Negative

Recall (R) = TP / (TP+ FN) Precision (P) = TP / (TP+FP) F-measure = 2*R*P/(R+P) = ((1+  2).R.P) / (  2.R+P)

TP – True Positive where  is the relative importance of precision vs recall and FP – False Positive it is usually set to 1. First, we calculate the mean vector for all the attributes that have been used for our experimental datasets. Then, we calculate the correlation matrix followed by the eigenvalues and eigenvectors from the correlation matrix. In order to calculate the principal components – major or minor- we sort the eigenvalues and their corresponding eigenvectors. We compute the summation of major and minor principal components and determine corresponding suitable threshold values from the normal dataset and compare with each observation of the mixed dataset. In order to evaluate the detection rate and false alarm rate accurately, we have used confusion matrix. The flow chart of the entire process is shown in the following Fig. 2.

Normal/Tested data  Mean Vector calculation

Correlation Matrix computation   Eigen Analysis i.e. eigenvalues and their corresponding eigenvectors( λ , e)

Sorted(in descending) eigenvalues and their corresponding eigenvectors i.e. Principal Components(No. of PC’s m) Minor and Major Principal Components i.e. r & q

Threshold evaluation for minor and major PC’s i.e. Tr & Tq

q

 i 1

m

Yi2/λi > Tq

OR



Yi2/λi > Tr

i  m  r 1

(Intrusion Detection) Intruded data verification using confusion matrix Detection rate

False alarm rate

Fig 2. Various steps for verification of intrusion detection using PCA

ISSN : 0975-3397

Vol. 4 No. 09 Sep 2012

1626

Hari Om et al. / International Journal on Computer Science and Engineering (IJCSE)

4. Performance Log Analysis We generate a log file of patterns with errors and without errors and then use PCA to analyze the results. a.

Performance Log

As PCA has wide area of applications, one area of application is HIDS. The analysis of the paper uses a host-based anomaly detection scheme to identify abnormal system behavior. Normal behavior of the system is created based on the processes running in the system. Then abnormal behavior is generated by creating problems in the system. Performance log are generated by taking some of the process attributes for the normal and abnormal behavior of the system. The performance of the personal computer can be measured by using the performance log. The hardware profile of the system that has been used for the experiment is as follows:  Intel(R) Core 2 Duo CPU 1.60 GHz  1.99 GB RAM  Microsoft Windows XP Professional Service Pack 2 b. Attributes used in performance log Different attributes considered for Performance Log analysis are as follows: 













Committed byte in use (%): This is the ratio of memory committed bytes to memory commit limit. Here committed memory is physical memory in use for which space has been reserved in the paging file and should be written to the disk. The commit limit is determined by the size of the paging file. If the paging file is enlarged, the commit limit increases, and the ratio is reduced. This counter displays the current percentage value only (not an average). Available Mbytes: This is the amount of physical memory in Megabytes available to processes running in the computer. It is calculated by summing up the space of the Zeroed, Free, and Standby memory lists. Free memory is ready for use. Zeroed memory is pages of memory filled with zeros to prevent later processes from seeing data used by a previous process. Standby memory is memory removed from a process’ working set (physical memory) on route to disk, but is still available to be recalled. This counter displays the last observed value only (not an average). Cache faults/sec: It is the rate at which faults occur when a page sought in the file system cache is not found and must be retrieved from elsewhere in memory ( a soft fault) or from the disk (a hard fault). The file system cache is an area of physical memory that stores recently used pages of data for applications. Cache activity is a reliable indicator of most application I/O operations. This counter shows the number of faults, without regard for the number of pages faulted in each operation. Page faults/sec: It is the average number of pages faulted per second. It is measured in number of pages faulted per second because only one page is faulted in each fault operation; hence this is also equal to the number of page fault operations. This counter includes both hard faults (those that require disk access) and soft faults (where the faulted page is found elsewhere in physical memory.) Most processors can handle large numbers of soft faults without significant consequence. However, hard faults, which require disk access, can cause significant delays. Page writes/sec: It is the rate at which pages are written to disk to free up space in physical memory. Pages are written to disk only if they are changed while in physical memory, so they are likely to hold data, not code. This counter shows write operations, without regard to the number of pages written in each operation. This counter displays the difference between the values observed in the last two samples, divided by the duration of the sample interval. Page op/sec: It is the rate at which pages are written to disk to free up space in physical memory. Pages are written back to disk only if they are changed in physical memory, so they are likely to hold data, not code. A high rate of pages output might indicate a memory shortage. Windows writes more pages back to disk to free up space when physical memory is in short supply. This counter shows the number of pages, and can be compared to other counts of pages, without conversion. Pool non-paged allocs: is the number of calls to allocate space in the non-paged pool. The nonpaged pool is an area of system memory for objects that cannot be written to disk, and must remain in the physical memory as long as they are allocated. It is measured in numbers of calls to allocate space, regardless of the amount of space allocated in each call. This counter displays the last observed value only; it is not an average.

ISSN : 0975-3397

Vol. 4 No. 09 Sep 2012

1627

Hari Om et al. / International Journal on Computer Science and Engineering (IJCSE)



Pool paged allocs: is the number of calls to allocate space in the paged pool. The paged pool is an area of system memory (physical memory used by the operating system) for objects that can be written to disk when they are not being used. It is measured in numbers of calls to allocate space, regardless of the amount of space allocated in each call. This counter displays the last observed value only; it is not an average.  System driver total byte: It is the size, in bytes, of the pageable virtual memory currently being used by device drivers. Pageable memory can be written to disk when it is not being used. It includes physical memory (Memory\\System Driver Resident Bytes) and code and data paged to disk. It is a component of Memory\\System Code Total Bytes. This counter displays the last observed value only; it is not an average.  Write copies/sec: It is the rate at which page faults are caused by attempts to write that have been satisfied by coping of the page from elsewhere in the physical memory. This is an economical way of sharing data since pages are only copied when they are written to; otherwise, the page is shared. This counter shows the number of copies, without regard for the number of pages copied in each operation. 5. Experiment Methodology To carry out the experiment, the performance logs are generated. The steps for generating the performance logs are as follows [16]: 

On the start menu, point to settings, point to Control Panel, double click Administrative Tools, and double click Computer Management.  Explore performance Logs and Alerts, right click Counter Logs, and then click New Log Settings.  Type a name for the counter log and then click OK.  Click Add Counters.  In the Performance object box, select a performance object that need to be monitored.  Counters added for experiment.  On the General tab under Sample data, every sampling interval of 15 seconds is configured.  On the Log Files tab, log files properties are configured as Comma delimited files that can be viewed later in reporting tools such as Microsoft Excel. After the performance log has been generated each day, the log is divided into 4 groups, and the average values for each column of the table are calculated. These values are used as our normal data set. In the meanwhile, for one day the system is left to work when the graphics driver, audio driver, and USB driver have been disabled. This generates the logs for system performance that have been considered as intruded data. We have taken the same number and same type of attributes in our experiment. For our experiment, we have taken the normal dataset and the testing dataset i.e. mixture dataset (normal and intrusion), which are given Tables 1 and 2, respectively. We have also shown how our proposed methodology detects and verifies the true intrusion in data flow diagram (DFD) (ref. Fig. 2). Table 1 : Normal dataset with some selective attributes

Commit Availab ted byte le in use Mbytes

Cache faults/se c

Page faults/s ec

Page writes/s ec

Page op/sec

3.82441 8508 3.64145 3572 5.11114 4298 5.63854 6946 4.57840 4615 4.77965 7979 6.23580 037 5.61765

101.612 4671 57.1504 0864 79.4770 8971 36.1708 794 51.1199 8798 49.9333 1807 94.4721 2975 46.0269

295.36 30973 256.09 14499 334.48 68715 162.96 19512 280.35 27608 128.97 45175 308.61 62728 219.40

0.06857 4005 0.24534 2241 0.14372 3968 0.03686 9153 0.15858 5618 0.16840 0988 0.13230 9061 0.01555

1.09718 4087 3.92547 5853 2.29958 3495 0.58990 6451 2.53736 9884 2.69441 5815 2.11694 497 0.24880

1724.57 2519 1736.84 8485 1680.81 9718 1654.26 1628 1702.97 1429 1696 1640.77 8065 1648.30

ISSN : 0975-3397

Vol. 4 No. 09 Sep 2012

Pool Nonpaged Allocs 26682 23133.4 2424 37059.2 0563 32614.1 1047 27955.8 33171.6 1811 54970.7 5613 33495.5

Pool Paged Allocs 42251.7 6336 32497.9 697 52290.0 2817 45698.5 5233 37290.3 7143 48146.6 0236 71052.6 1742 43982.1

System driver total byte 750315 2.855 750387 2 760777 1.944 750387 2 750387 2 750387 2 750387 2 750368

Write copies/s ec 3.33582 1308 4.44542 9107 1.91861 6972 2.13336 7554 5.54100 7434 1.32170 4214 3.68972 7265 7.98484

1628

Hari Om et al. / International Journal on Computer Science and Engineering (IJCSE)

1849 4.11134 8119 9.08474 6823 10.7049 4185 10.8421 5109 8.92087 0039 9.01944 5732 8.15211 776 9.78973 4979 8.82137 0882 9.22601 2337 13.1381 8528 10.2376 1708

0971 1720.17 0732 1619.29 0476 1580.08 8785 1563.67 4455 1650.93 6255 1629.35 9551 1669.16 8831 1564.33 3333 1635.53 8462 1605.61 8421 1567.82 7759 1621.77 0221

8896 132.121 0622 68.3286 9399 76.9729 0274 102.997 176 35.1183 0951 123.152 1235 46.8869 2477 108.419 3887 70.4130 3081 82.6517 3496 60.9771 7436 107.500 139

03424 525.63 92525 458.02 7867 512.52 4161 469.88 37442 189.85 69296 660.77 78565 185.93 97726 688.67 77162 401.79 80469 478.15 3206 335.72 65847 639.69 73908

0526 0.06567 2177 0.60135 9321 1.16544 1987 0.46537 3406 0.29024 2311 1.23764 2668 0.32143 1061 1.01201 838 0.52689 9445 0.81245 8579 0.80176 4537 0.61802 9752

8412 1.05075 484 9.62174 913 18.6470 7179 7.44597 4504 4.64387 6977 19.8022 8269 5.14289 6979 16.1922 9409 8.43039 1119 12.9993 3727 12.8282 326 9.88847 6038

1262 26849.5 3659 42900.9 6667 49948.0 1402 86950.0 7321 36718.6 8127 37276.3 9326 36697.4 8312 39289.6 8571 36366.8 5 44724.5 49766.0 2174 49423.0 7537

0097 37876.3 3537 47518.9 2857 62041.1 2617 100841. 3692 45366.9 5219 44230.8 5393 42494.6 1299 42273.7 4286 41617.4 6154 48669.6 7763 58386.9 9164 56112.7 8125

9.072 750017 5.61 747065 5.39 747066 3.776 747094 4.498 747091 6.335 747004 5.483 747085 9.304 746930 9.562 747037 9.323 747048 4.211 747094 6.462 757714 0.706

3091 7.74730 1195 4.44356 2404 10.8575 9362 8.17171 7948 7.52211 2596 9.02677 0954 2.60437 2686 11.2262 4646 7.55144 6472 5.78276 1903 6.01478 3825 3.40091 2746

Table 2 : Testing dataset with some selective attributes

32166. 55556

43517. 9321

750358 1.235

7.6550 12219

63023. 91566 48472. 48454 35413. 93651 39974. 58491 49530. 98042 35642. 40625 43191. 15556 53580. 36842

72219. 42935 57402. 49828 38624. 19048 44790. 88183 54235. 6993 39465. 27344 52279. 08889 61298. 03383

749744 3.119 747078 0.261 747035 6.317 747101 0.447 747084 0.481 747036 8 747075 5.081 747098 5.945

8.4206 12447 4.5066 49959 5.4890 3716 5.5179 03309 3.0219 82506 5.8815 60678 10.645 23952 4.5726 03869

Pool Paged Allocs

1.4017 67647

0.2828 49872

4.5255 97951

0.3761 7982 0.4248 27439 0.9445 6778 0.3255 04227 0.4299 25858 0.9420 87728 0.7074 88741 0.6753 52729

6.0188 77122 6.7972 39025 15.113 08448 5.2080 67639 6.8788 1372 15.073 40364 11.319 81986 10.805 64366

Availab le Mbytes

Cache faults/s ec

Page faults/s ec

Page writes/s ec

Page op/sec

1700.0 22388

26.040 43396

132.99 67699

0.0876 10478

4.9466 0698

1690.1 79012

85.354 08783

371.75 05326

9.2519 20893 9.8730 66812 7.5678 23945 8.8310 54122 9.1107 5792 8.0382 68296 9.8113 94241 13.241 4429

1632.7 54655 1584.6 42612 1692.2 06349 1646.6 37537 1635.1 58042 1653.0 70313 1581.9 2963 1585.8 45865

220.42 84527 64.198 29632 68.284 57374 54.316 47998 51.967 54351 71.370 37658 87.210 2098 64.641 90334

905.32 87534 766.80 79994 324.15 01421 264.23 89394 234.46 9843 399.82 83644 427.90 725 316.88 36163

ISSN : 0975-3397

Write copies/s ec

39788. 48507

System driver total byte 753300 2.507

Pool Nonpag ed Allocs 29212. 34328

Commi tted byte in use 4.6013 87552

Vol. 4 No. 09 Sep 2012

2.9401 84286

1629

Hari Om et al. / International Journal on Computer Science and Engineering (IJCSE)

8.7429 37931 8.2136 84021 12.961 23187 9.5993 38354 12.423 02175 8.3011 44161 11.656 48745 13.843 20911 9.7392 66809 11.234 44288 3.1962 9803 7.0980 92873 7.6333 7616 7.6484 79754 8.2629 57542 3.3246 78577 6.1918 77248 5.8753 97339 7.6461 94342 7.8930 18861 7.8465 1569 7.9633 6981 5.9363 08543 7.6332 76794 7.8436 34084 5.7472 15523 7.6562 30282

1644.1 83962 1634.3 24786 1484.2 99335 1608.6 84211 1585.5 87703 1657.9 4197 1522.1 72414 1499.3 41053 1581.8 27451 1540.4 98941 1740 1613 1600 1603 1590 1738 1643 1651 1600 1602 1601 1600 1649 1598 1601 1653 1602

ISSN : 0975-3397

32.311 56115 62.828 07225 62.837 06428 83.176 42964 73.917 82982 225.52 67105 189.69 87689 64.766 22069 68.704 86069 99.735 92101 1602.5 92741 3708.6 29397 2531.4 66836 1006.7 69784 461.14 78738 1762.7 64486 3732.7 99556 2049.1 89776 2791.5 75869 295.94 33757 524.72 7077 1156.7 44234 1982.8 52701 2817.9 41408 480.20 46148 1971.0 68567 2656.1 99819

222.82 07108 336.10 20724 446.22 32714 498.38 45806 383.00 54702 470.62 44796 1189.3 93533 314.71 77163 415.76 79794 500.43 37987 4193.3 97458 48843. 19072 29764. 09948 1956.8 7085 2469.6 11203 4682.1 15926 49552. 13253 21404. 14929 31989. 23505 2465.4 16994 1533.8 89331 2285.1 54868 20715. 51759 32296. 54016 1513.8 14548 20561. 84666 31600. 57928

0.2964 11962 0.5346 03814 0.5594 29696 0.9669 3393 1.1148 07342 0.3058 79715 2.6241 74933 0.8764 85557 0.5480 92356 0.3282 73963 1.8655 28638 4.6006 24828 6.0829 82197 6.0002 17242 5.2001 63966 1.6780 77508 4.6660 82767 3.5229 62366 2.5337 95515 6.0002 03608 5.9938 36392 6.0000 56544 3.3923 45214 2.5307 06249 6.0000 57662 3.3933 32869 2.5328 88158

4.7425 91388 8.5536 61019 8.9508 75138 15.470 94288 17.836 91747 4.8940 75434 41.986 79892 14.023 76892 8.7694 77698 5.2523 83416 29.848 45821 73.609 99725 97.327 71516 96.003 47587 83.202 62346 26.849 24012 74.657 32428 56.367 39785 40.540 72824 96.003 25773 95.901 38227 96.000 9047 54.277 52342 40.491 29998 96.000 92258 54.293 32591 40.526 21053

Vol. 4 No. 09 Sep 2012

36625. 95755 38311. 66239 39367. 74279 40983. 42915 52191. 46293 40272. 80297 48923. 3908 58342. 58105 40609. 1098 58790. 27542 22480

41421. 07547 44574. 31197 50919. 15965 46584. 31579 57598. 28571 51679. 34143 55990. 70115 66071. 59158 47562. 47059 63676. 78072 28030

26335

31848

29131

37788

29534

40935

31214

45784

22296

28074

25054

31251

31232

31145

34482

35446

31029

44443

36112

36112

36439

42058

31382

31160

34623

35585

36190

37699

31316

31052

34369

35563

747088 1.811 747029 8.803 747089 5.113 747034 1.182 747093 3.642 747084 9.727 746893 8.299 747070 7.335 747073 4.557 750942 1.559 727859 2 727859 2 737280 0 737280 0 737280 0 727859 2 727859 2 737689 6 737689 6 737280 0 747110 4 747110 4 737689 6 737689 6 747110 4 737689 6 737689 6

4.2557 07607 3.9071 10265 5.6460 72064 9.0750 43219 4.4846 65132 3.6557 37869 14.632 83051 4.6228 2402 7.1302 39303 3.9933 5076 123.22 85306 504.42 56508 287.25 19371 27.801 00656 200.47 29878 141.01 44466 492.67 16819 297.96 99201 211.70 52831 34.801 18093 48.550 07478 44.600 42031 292.81 86234 196.99 55022 48.600 46706 284.20 50934 220.16 13049

1630

Hari Om et al. / International Journal on Computer Science and Engineering (IJCSE)

7.8977 88417 12.938 51347 13.700 15193

1599 1429 1405

866.33 54261 233.86 71284 282.06 85421

2194.6 2 76453 7 1094.2 68827 6 2013.0 2 80051 8

5.99940 18 17 0

95.904 2907 71 0

36073

37570

55611

63459

0

0

58759

68319

747110 4 747110 4 747110 4

37.895 51487 67.800 13385 76.800 51064

Table 3: Dettection rate and false f alarm rate ussing different num mber of major and d minor principall components

Used Dataseet Testing Dataaset (Table 2))

q (Major) 4 4 3 5 3

r (Minor) 3 4 3 2 4

Detection Rate 92.5% 97.5% 87.5% 97.5% 90%

False A Alarm Rate 5% 5% 7.5% 2.5% 7.5%

Fig 3. Detecction rate and false alarm rate

6. Resultts and Discusssion In our exxperimental results, we havee evaluated thhe detection raate and false alarm rate of thhe used tested d data (ref. Table 2) for different number of major m and minnor principal components. We observe from our exp perimental he false alarm m rate from 2.55% to 7.5% (rref. Fig. 1 results thhat the detectioon rate varies from 90% too 97.5% and th & Table 3) by taking five f different numbers of m major and min nor principal components c foor the used daataset. The d false alarm rate have also been displlayed in the bbar chart (reff. Fig. 3). above reesults of detecction rate and Furtherm more, the detecction rate and d false alarm rrate depend on o the numberr of major prinncipal compo onents and minor prrincipal compoonents. It has been observeed that the maajor principal components aare more effective than the minoor principal coomponents beecause the maajor principall component specifies the sharp deviatiion of the value andd the minor principal p com mponent speciffies the slightt deviation off the value. W We have calcu ulated the thresholdd value for the major princcipal componeent and minorr principal component for tthe normal daataset and have beenn compared with w all the data of the testeed dataset. Wh hen the numbeer of major prrincipal compo onents are greater thhan or equal too the number of minor princcipal compon nents, the detecction rate is hiigh and false alarm rate is low (reef. Fig. 1 & Taable 3). 7. Conclu usions In this paaper, we have developed an n intrusion dettection system m using princip pal component nt analysis (PC CA) which has beenn implementedd for HIDS on o the basis of performan nce log. Our experimental results show w that our proposedd system proviides detection n rate in the raange of 90% t0 97.5% and false alarm raate in the rang ge of 2.5% to 7.5%, which are suppposed to fairlly good detecttion and false alarm rates. 8. Acknoowledgement The authors express thheir sincere thaanks to Prof. S S. Chand for his h invaluable comments annd suggestionss.

ISSN : 0975-3397

Vol. 4 No. 09 Sep 2012

1631

Hari Om et al. / International Journal on Computer Science and Engineering (IJCSE)

Referencces [1] [2] [3] [4] [5]

[6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18]

M.L . Shyu, S.C. Chhen, K. Sarinnap pakorn, and L. C Chang, “A noveel anomaly detec ction scheme bassed on principall component classiifier,” IEEE Founndations and New w Directions of D Data Mining Work kshop, Nov. 2003, pp. 172–179. D.E. D Denning, “An Inttrusion Detection n Model,” IEEE T Trans. on Softwarre Engineering, SE E-13, No. 2, 19877, pp. 222-232. D.M. Hawkins, “The Detection of Errrors in Multivarriate Data Using Principal Components,” Journall of the Americaan Statistical Assocciation, Vol. 69, No. N 346, 1974, pp p. 340-344. N.Ye and Q. Chen, “A An Anomaly Detection Techniquue Based on a Ch hi –Square Statisstic for Detectingg Intrusions into Information System ms,” Quality andd Reliability Eng. Int’l, Vol. 17, Noo. 2, 2001, pp. 10 05-112. J. MccHugh, “Testingg Intrusion Detection Systems: A Critique of th he 1998 and 19 999 DARPA IIntrusion Detection System Evaluuations as Perform med by Lincoln Laboratory,” L ACM M Transactions on o Information and a System Securrity, Vol. 3, No. 4, 2000, pp. 262-2294. N.J. P Puketza, K. Zhanng, B. Mukherjeee and R.A. Olssoon, “Testing Intrrusion Detection Systems: Designn Methodologies and Results from an Early Protottype,” Proc. 17thh Natrional Compputer Security Co onference, Vol. 1,, Oct. 1994, pp.1--10. M. Sabri, Md. Norwawi, and K. Seman, “Identify fying False Alarm m Rates for Intru usion Detection System with Daata Mining,” F.N.M Internnational Journal of o Computer Scien nce and Networkk Security, Vol.11 1, No.4, Apr. 2011. Z.K. Baker and V.K. Prasanna, “Efficcient Hardware D Data Mining witth the Apriori Algorithm on FPG GAs,” In Proceed dings of the EE Sym. on Field d Programmable C Custom Computin ng Machines 2005. Thirteeenth Annual IEE J.D. JJobson, “Applied Multivariate Datta Analysis,” Voluume II: Categoriccal and Multivariate Methods. Sprringer- Verlag, N Y, 1992. I. T. JJolliffe, “Principaal Component An nalysis,” Springerr- Verlag, N Y, 20 002. Jungssuk Song, Hiroki Takakura and Yasuo Y Okabe, “A A proposal of new w benchmark data a to evaluate minning algorithms for f intrusion detecttion,” In 23rd Asia Pacific Advancced Networking M Meeting, 2007. Riley, “Intrusion N. Atthanasiades, R. Abler, A J. Levine, H. H Owen and G. R n detection testing g and benchmarkiing methodologiees,” In IEEE Internnational Informatiion Assurance Workshop, W 2003. H. Soong and J.W. Loockwood, “Efficieent packet classiification for netw work intrusion de etection using FPPGA,” Intl. Sym mp. On Field Progrrammable Gate Arrays, A Feb. 2005.. N. Yee, S.M. Emran, Q. Q Chen, and S. Vilbert, V “Multivaariate Statistical Analysis A of Audiit Trails for Hostt-Based Intrusion n Detection,” IEEE Trans. on Compuuters, Vol-51, No o.7, 2002. r R.A. Johnson, D.W. Wichern, W “Applieed Multivariate D Data Analysis,” 3rd Edition, Prentic ce-Hall, Inc., Engglewood Cliffs, N.J., N U.S.A., 1992.. H. Om m and T.K. Sarkkar, “Neural netw work based intruusion detection sy ystem for detectiing changes in hhardware profile,”” Journal of Discreete Mathematicall Sciences & Cryp ptography, Vol. 112, No. 4, 2009, pp. p 451-466. C.M. Chen, Y.L. Chenn, H.C. Lin, “An efficient e networkk intrusion detectiion,” Computer Communications, C Vol. 33, 2010, pp p 477-484. P. Caasas, J. Mazel, P. Owezarski, “Unssupervised Netwoork Intrusion Dettection Systems: Detecting D the Unnknown without Knowledge,” K Compputer Communicaations, Vol. 35, 20 012, pp. 772-7833.

Authors: Hari Om is presently w working as Assistant Professor in the deepartment of Computer nes, Dhanbad,, India. He earned e his Science & Engineeringg at Indian Scchool of Min Ph.D. in Computer C Scieence from Jaw waharlal Nehrru University,, New Delhi, India. He has publish hed more thaan 50 researcch papers in International and Nationall Journals including various v Transs. of IEEE, Springer, Elsev vier etc., Inteernational and d National conferencees of high reppute. His reseearch areas are Video-on-D Demand, Cryp ptography, Network seecurity, Data M Mining, and Image I Processsing.

Tanmoy Hazra compleeted his M.T Tech in Com mputer Appliccations, deparrtment of Computer Science S and E Engineering from fr Indian School S of Minnes, Dhanbad d in May, 2012. He is presently woorking as Asssistant Professor in the deepartment of Computer School of Tecchnology, Pun ne. His researcch interest is mainly m on Engineering at ISB & M S ms. Intrusion Deetection System

ISSN : 0975-3397

Vol. 4 No. 09 Sep 2012

1632

Suggest Documents