
Very Fast Intrusion Detection by Multivariate Linear Regression

Jonathan P. Bernick
Department of Computer Science, Coastal Carolina University
Conway, SC 29526
(843) 349-2098

[email protected]

ABSTRACT

Multivariate regression is used in a wide variety of fields as a modeling and classification tool. In this paper we investigate its potential as a means of intrusion detection. We demonstrate that intrusion detectors constructed by multivariate linear regression can achieve high accuracy with very short training and testing times, and conclude with a discussion of future research directions.

Categories and Subject Descriptors
G.m [Mathematics of Computing]: Miscellaneous; I.2.6 [Computing Methodologies]: Artificial Intelligence - Learning

General Terms
Algorithms, Security.

Keywords
Computer security, machine learning, network security, neural network, support-vector machine.

1. INTRODUCTION

Multivariate regression has been used as a modeling and classification tool in a wide variety of fields, including but not limited to statistics [14], biology [11], chemistry [3], medicine [5], and business [2]. Recent work [1][16] shows that multivariate regressors can be used as learning machines to construct extremely accurate pattern classifiers. Given the success of pattern classification-based intrusion detection implemented with learning machines such as neural networks and support-vector machines [12], it is reasonable to investigate the capabilities of multivariate regressors as intrusion detectors.

2. THEORY

Multivariate regression has shown considerable potential as a pattern classification tool due to its accuracy, very fast training and testing times, and robustness in the face of very large sets of noisy training data and repeated training vectors [1]. For the purposes of this paper we will concern ourselves exclusively with binary pattern classifiers constructed by multivariate linear regression.

2.1 Definition
A binary pattern classifier constructed with multivariate linear regression is a mapping ℜ^M → ℜ, consisting of a vector a = [a0, …, aM] such that, given an input M-vector x = [x1, …, xM] and two classes C1 and C2,

    x ∈ C1 if a·u ≥ p,
    x ∈ C2 if a·u < p,                              (1)

where p is a real threshold value, usually set to 0.5, and u = [1, x1, …, xM].

2.2 Construction

The multivariate linear regressor is constructed using the method of the pseudoinverse [15]. Consider the case of training a multivariate linear regressor on a set of K training pairs of the form (x, y), where each training vector x = [x1, …, xM] is contained in exactly one class, V0 or V1, y is the index of the class of x, and each pair has a positive scalar weight wi indicating its relative importance to the solution. Let xj denote the jth training vector and yj the class of xj. If M < K, we may define

    uj = [1, x1, …, xM]                             (2)

and

    U = [u1, …, uK]^T;                              (3)

we may then form the linear system

    WUa = Wb,                                       (4)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Computer Security Conference 2007, April 11-13, 2007, Myrtle Beach, SC, USA. Copyright 2007 CSC.

Table 1. Attributes of the KDD Cup 1999 data, and their domains.

 1. duration                       continuous
 2. protocol_type                  discrete: 3 values
 3. service                        discrete: 70 values
 4. flag                           discrete: 11 values
 5. src_bytes                      continuous
 6. dst_bytes                      continuous
 7. land                           binary
 8. wrong_fragment                 continuous
 9. urgent                         continuous
10. hot                            continuous
11. num_failed_logins              continuous
12. logged_in                      binary
13. num_compromised                continuous
14. root_shell                     continuous
15. su_attempted                   continuous
16. num_root                       continuous
17. num_file_creations             continuous
18. num_shells                     continuous
19. num_access_files               continuous
20. num_outbound_cmds              continuous
21. is_host_login                  binary
22. is_guest_login                 binary
23. count                          continuous
24. srv_count                      continuous
25. serror_rate                    continuous
26. srv_serror_rate                continuous
27. rerror_rate                    continuous
28. srv_rerror_rate                continuous
29. same_srv_rate                  continuous
30. diff_srv_rate                  continuous
31. srv_diff_host_rate             continuous
32. dst_host_count                 continuous
33. dst_host_srv_count             continuous
34. dst_host_same_srv_rate         continuous
35. dst_host_diff_srv_rate         continuous
36. dst_host_same_src_port_rate    continuous
37. dst_host_srv_diff_host_rate    continuous
38. dst_host_serror_rate           continuous
39. dst_host_srv_serror_rate       continuous
40. dst_host_rerror_rate           continuous
41. dst_host_srv_rerror_rate       continuous

where W is a K-by-K diagonal matrix of weights and

    b = [b1, …, bK]^T,                              (5)

with bi = 1 if ui ∈ V0 and bi = 0 if ui ∈ V1, where V0 and V1 denote the smaller and larger classes, respectively. This system has the solution [15]

    a = (U^T W U)^(-1) U^T W b.                     (6)

If we make the simplifying assumption that all training vectors in the same class have the same weight, and that the weights of all the training vectors in the class with the most training vectors are equal to 1, we may define matrices U1 and U2 to be those rows of U contained in the larger and smaller classes, respectively. If we then define

    H1 = U1^T U1                                    (7)

and

    H2 = U2^T U2,                                   (8)

then simple algebra shows that

    U^T W U = H1 + w H2 ≡ M,                        (9)

where w is the scalar weight of the smaller class, and (5) is replaced by

    c = [c1, …, cK],                                (10)

with ci = w if ui ∈ V0 and ci = 0 if ui ∈ V1, allowing (6) to be rewritten as

    a = M^(-1) U^T c.                               (11)

Since H1 and H2 are invariant with respect to w, (11) may be used in place of (6) for multiple weightings, greatly increasing the speed with which the value of a is computed.
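Equations (7)-(11) can be sketched in code. The fragment below is an illustrative pure-Python reconstruction, not the author's C++ implementation; the function names (`train`, `solve`, etc.) and the tiny Gauss-Jordan solver are our own assumptions, with the smaller class V0 given target 1 and the larger class V1 given target 0 and weight 1.

```python
def matmul(A, B):
    # (rows of A) x (columns of B) matrix product
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def solve(M, v):
    # Gauss-Jordan elimination with partial pivoting: returns x with M x = v
    n = len(M)
    aug = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        div = aug[col][col]
        aug[col] = [x / div for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def train(X0, X1, w):
    # X0: smaller-class vectors (V0, target 1); X1: larger-class vectors
    # (V1, target 0, weight 1); w: scalar weight of the smaller class.
    U2 = [[1.0] + list(x) for x in X0]   # rows of U in the smaller class
    U1 = [[1.0] + list(x) for x in X1]   # rows of U in the larger class
    H1 = matmul(transpose(U1), U1)       # Eq. (7)
    H2 = matmul(transpose(U2), U2)       # Eq. (8)
    M = [[h1 + w * h2 for h1, h2 in zip(r1, r2)]
         for r1, r2 in zip(H1, H2)]      # Eq. (9): M = H1 + w*H2
    UTc = [w * sum(col) for col in zip(*U2)]  # U^T c: c_i = w on V0 rows only
    return solve(M, UTc)                 # Eq. (11): a = M^{-1} U^T c
```

In a sweep over candidate values of w, H1 and H2 would be computed once and cached, so each new weighting costs only one matrix assembly and one solve; the sketch recomputes them for brevity.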

3. EXPERIMENTAL CONDITIONS

3.1 Data
Our experiments were performed using the KDD Cup 1999 data [6]. Since the full set of training data (4,898,432 vectors) was too large for our computational resources, a representative subset of 494,022 training vectors was used [8], while the 311,029-vector testing data set [7] was used unabridged. Each training/testing vector consisted of 41 attributes [9]; these attributes and their domains may be found in Table 1. The data was normalized as follows:

•	Attributes 20 and 21 were found to have the value 0 for all training vectors, and were thus removed from the data, leaving 39 retained attributes.

•	Attributes 5 and 6 were rescaled by the formula xi_new = (log10[1 + xi_old]) / (log10[1 + max(xi)]).

•	All numerical attributes were rescaled by the formula xi_new = (xi_old - min(xi)) / (max(xi) - min(xi)).

•	All discrete attributes had their symbols replaced by the probabilities P(vector is an attack | symbol appears in vector as value for that attribute).
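The rescalings above can be sketched in a few lines of pure Python. This is an illustrative fragment, not the author's software; the function names are our own, and all statistics are taken from the training data only, as the paper specifies.

```python
from math import log10

def minmax_scaler(train_col):
    # x_new = (x_old - min(x)) / (max(x) - min(x)), fit on training values
    lo, hi = min(train_col), max(train_col)
    return lambda x: (x - lo) / (hi - lo) if hi > lo else 0.0

def log_scaler(train_col):
    # x_new = log10(1 + x_old) / log10(1 + max(x)), used for attributes 5 and 6
    denom = log10(1 + max(train_col))
    return lambda x: log10(1 + x) / denom

def symbol_probabilities(train_col, is_attack):
    # P(vector is an attack | symbol appears as this attribute's value)
    totals, attacks = {}, {}
    for sym, atk in zip(train_col, is_attack):
        totals[sym] = totals.get(sym, 0) + 1
        attacks[sym] = attacks.get(sym, 0) + (1 if atk else 0)
    return {sym: attacks[sym] / totals[sym] for sym in totals}
```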

In all cases, formulae were derived from the training data only. Each vector represented a process request for a local area network (LAN), which could be either a normal request or an attack. Attacks fell into one of five categories [6][10]:

•	DOS: denial-of-service.

•	R2L: unauthorized access from a remote machine.

•	U2R: unauthorized access to local root privileges.

•	probing: surveillance and other probing.

•	Unclassified: any attack not documented as being in one of the above categories.

The distribution of the training and testing vectors among these categories is given in Table 2.

Table 2. Distribution of training and testing vectors by category

Class of Vector    Training    Test
Normal:            100229      60593
DOS:               387685      223298
probing:           4701        2377
R2L:               1149        5993
U2R:               52          39
Unclassified:      206         18729
Total:             494022      311029

In addition, experiments were run using only subsets of the attributes mentioned above, which were reported in the literature as offering performance similar to the original attribute set. These subsets consisted of 12 [4], 13 [12], and 17 [4] attributes respectively, and are detailed in Table 3.

Table 3. Attributes used in subsets. Attribute numbers refer to Table 1

# of Attributes    Attribute Numbers
12                 3,5,6,12,23,24,25,27,31,32,33,35
13                 1,2,3,5,6,9,23,24,29,32,33,34,36
17                 1,2,3,5,7,8,11,12,14,17,22,23,24,25,26,30,32

3.2 Hardware and Software
All experiments were run on a Hewlett-Packard Pavilion dv4000 laptop computer with a 1.50 GHz Intel Celeron CPU and 512 Mbytes of RAM. The experimental software was written in C++ by the author, and compiled with Microsoft Visual C++ Version 6.0.

4. RESULTS AND DISCUSSION

4.1 Results
In our first set of experiments we used the full training data set (including duplicated training vectors) to train our intrusion detectors. Representative results from these experiments are given in Table 4.

Table 4. Intrusion detector performance, full training data set

Attributes:              12         13         17         39
Training Time (sec.):    23.062     27.656     32.187     103.046
Testing Time (sec.):     0.031      0.062      0.063      0.140
w:                       0.0326     0.0647     0.0640     0.0109
Overall Accuracy:        93.455%    95.093%    95.284%    95.156%
False Negatives:         0.251%     4.807%     0.250%     3.408%
False Positives:         6.339%     0.100%     4.576%     1.446%
Accuracy by class
  Normal:                67.46%     93.49%     76.51%     93.58%
  DOS:                   100.0%     99.96%     100.0%     99.995%
  probing:               96.76%     99.83%     95.29%     99.87%
  R2L:                   98.88%     99.58%     97.18%     88.92%
  U2R:                   79.49%     89.74%     82.05%     97.44%
  Unclassified:          97.39%     40.27%     99.23%     43.95%

It is clear from the results given in Table 4 that multivariate linear regressors, although not performing quite as well as some learning machines [12], are a potentially useful tool for intrusion detection. Of special interest are the variations displayed in the false positive, false negative, and by-class detection rates; these values suggest that it should be possible to tailor a multivariate regressor to detect a specific class of intrusions, or to minimize either false positives or false negatives, allowing the construction of ensemble detectors.

In a second set of experiments we trained our detectors with the duplicated training vectors removed from the training data. Representative results from these experiments are given in Table 5.

Table 5. Intrusion detector performance, duplicate training vectors removed

Attributes:              12         13         17         39
Training Vectors:        108175     107687     91748      104822
Training Time (sec.):    12.125     14.125     13.75      42.735
Testing Time (sec.):     0.046      0.062      0.062      0.125
w:                       0.0119     0.0126     0.0176     0.00587
Overall Accuracy:        92.038%    93.810%    93.021%    92.948%
False Negatives:         1.171%     4.623%     1.242%     1.182%
False Positives:         6.891%     1.567%     5.737%     5.870%

In contrast to the behavior of many other learning machines, removing the duplicated vectors clearly degraded the overall performance of the multivariate linear regressors. We believe this effect to be inherent in the nature of the multivariate linear regressor, where duplicating a vector has the same effect as including it in the training data once with a doubled weight. Removing the duplicates rendered the training set unrepresentative, and the classifications performed by the multivariate linear regressors trained on it were thus skewed.
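The equivalence claimed above is easy to check on the normal-equation sums: a duplicated row contributes to U^T W U exactly like a single copy with doubled weight. The following tiny fragment (function name our own) demonstrates this on a toy example.

```python
# Weighted Gram matrix: G = sum_i w_i * u_i u_i^T, the building block of
# U^T W U in Eqs. (6)-(9).
def weighted_gram(rows, weights):
    n = len(rows[0])
    G = [[0.0] * n for _ in range(n)]
    for row, w in zip(rows, weights):
        for i in range(n):
            for j in range(n):
                G[i][j] += w * row[i] * row[j]
    return G

# One vector duplicated with unit weights...
dup = weighted_gram([[1, 2], [1, 2], [1, 5]], [1, 1, 1])
# ...equals a single copy carrying weight 2.
wtd = weighted_gram([[1, 2], [1, 5]], [2, 1])
```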

4.2 Discussion
The speed of testing displayed by multivariate linear regression-based intrusion detectors is noteworthy. Given that examining a vector requires fewer than two floating-point arithmetic operations per attribute, a multivariate linear regression-based intrusion detector would be ideally suited to a high-speed, high-volume network, perhaps as an initial screen that refers suspect vectors to other, slower (but more detailed) intrusion-detection processes.

Increasing classification accuracy is of primary interest if multivariate linear regression-based intrusion detectors are to be practical. Based on previous pattern-classification experiments [1], we hypothesize that using nonlinear, rather than linear, regression would considerably increase detection accuracy with only a small increase in testing time, and we recommend further investigation in this direction.
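The per-vector test-time cost noted above (one multiply and one add per attribute, plus a final comparison) can be sketched as follows; this is an illustrative fragment with names of our own choosing, where `a` is a trained coefficient vector applied as in Eq. (1).

```python
def classify(a, x, p=0.5):
    # Decision rule of Eq. (1): x is in C1 iff a . u >= p, with u = [1, x]
    score = a[0]                       # bias term a0 (u0 = 1)
    for coeff, attr in zip(a[1:], x):  # one multiply-add per attribute
        score += coeff * attr
    return 1 if score >= p else 0      # 1: x in C1, 0: x in C2
```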

5. CONCLUSIONS
In this paper we have examined the performance of intrusion detectors constructed by multivariate linear regression. We have demonstrated that such intrusion detectors can attain high accuracy and very high testing speeds, and we have suggested ways in which their accuracy may be further improved. We conclude that intrusion detectors created by multivariate regression are well-suited for applications where high-speed detection is a necessity, and recommend further investigation and study.

6. REFERENCES

[1] Bernick, J., "Pattern Classification with Polynomial Learning Machines," Proceedings of the 2006 International Conference on Machine Learning; Models, Technologies and Applications, June 26-29, Las Vegas, Nevada, USA, 2006.
[2] Binder, J., "Measuring the Effects of Regulation with Stock Price Data," The RAND Journal of Economics, Vol. 16, No. 2, pp. 167-183, 1985.
[3] Burnham, A. J., Viveros, R., and MacGregor, J. F., "Frameworks for latent variable multivariate regression," Journal of Chemometrics, Vol. 10, No. 1, pp. 31-45, 1996.
[4] Chebrolu, S., Abraham, A., and Thomas, J., "Feature Deduction and Ensemble Design of Intrusion Detection Systems," Computers and Security, Vol. 24, No. 4, pp. 295-307, 2005.
[5] Fallowfield, L., Gagnon, D., et al., "Multivariate regression analyses of data from a randomised, double-blind, placebo-controlled study confirm quality of life benefit of epoetin alfa in patients receiving non-platinum chemotherapy," British Journal of Cancer, Vol. 87, No. 12, pp. 1341-1353, 2002.
[6] http://kdd.org/kddcup/index.php?section=1999&method=data
[7] http://kdd.org/kddcup/site/1999/files/corrected.zip
[8] http://kdd.org/kddcup/site/1999/files/kddcup.data_10_percent.zip
[9] http://kdd.org/kddcup/site/1999/files/kddcup.names
[10] http://kdd.org/kddcup/site/1999/files/training_attack_types
[11] Monteiro, L. M., "Multivariate Regression Models and Geometric Morphometrics: The Search for Causal Factors in the Analysis of Shape," Systematic Biology, Vol. 48, No. 1, pp. 192-199, 1999.
[12] Mukkamala, S., and Sung, A., "Intrusion Detection: Support Vector Machines and Neural Networks," Proc. IEEE IJCNN, 2002.
[13] Rawat, S., Pujari, A., and Gulati, V., "On the Use of Singular Value Decomposition for a Fast Intrusion Detection System," Electronic Notes in Theoretical Computer Science, Vol. 142, pp. 215-228, 2006.
[14] Robins, J. M., and Rotnitzky, A., "Semiparametric Efficiency in Multivariate Regression Models with Missing Data," Journal of the American Statistical Association, Vol. 90, 1995.
[15] Strang, G., Linear Algebra and Its Applications, Academic Press, Inc., New York, 1980.
[16] Toh, K., Tran, Q., and Srinivasan, D., "Benchmarking a Reduced Multivariate Polynomial Pattern Classifier," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26, No. 6, June 2004.