
Feature Selection for Robust Detection of Distributed Denial-of-Service Attacks Using Genetic Algorithms

Gavrilis Dimitris¹, Tsoulos Ioannis², and Dermatas Evangelos¹

¹ Department of Electrical Engineering and Computer Technology, University of Patras, Patras, Hellas
² Department of Computer Science, University of Ioannina, Hellas

Abstract. In this paper we present a robust neural network detector for Distributed Denial-of-Service (DDoS) attacks against computers providing Internet services. A genetic algorithm is used to select a small number of efficient features from an extended set of 44 statistical features, which are estimated only from the packet headers. The genetic evaluation produces an error-free neural network DDoS detector using only 14 features. Moreover, the experimental results showed that the features that best qualify for DDoS detection are the SYN and URG flags, the probability of distinct source ports in each timeframe, the number of packets that use certain port ranges, the TTL and the window size in each timeframe.

Keywords: Genetic Algorithms, Neural Networks, Denial of Service.

1 Introduction

In recent years there has been a sudden increase in DDoS attacks against computers providing Internet services [1,2,8,10,13]. Since the year 2000 in particular, the losses caused by DDoS attacks have reached billions of US dollars. Major commercial web sites have been disabled for several hours due to such attacks. A DDoS attack uses network flooding, but is harder to defend against because the attack is launched from hundreds or even thousands of hosts simultaneously. Rather than appearing as an excess of traffic coming from a single host, a DDoS attack appears as normal traffic coming from a large number of hosts. This makes it harder to identify and control [21]. Furthermore, continuous monitoring of a network domain for preventing DDoS attacks poses several challenges [17-20]. In high-speed networks, real-time monitoring and detection of DDoS attacks cannot rely on huge amounts of data or on complex pattern recognition methods. Extended studies of specific tools [2,7,8] have been published, and neural networks [3,4,5,9,12] have already been used to detect intrusions and DDoS attacks.

G.A. Vouros and T. Panayiotopoulos (Eds.): SETN 2004, LNAI 3025, pp. 276–281, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 Neural Network DDoS Detector and Features Selection

Taking into account that the introduction of network encryption technologies such as IPSec renders traditional Network Intrusion Detection Systems useless, we present a robust neural network based DDoS detector, in which the statistical features are estimated from non-encrypted data such as the network packet header. Moreover, in order to find the most efficient features, a genetic solution to the features selection problem is implemented. The proposed DDoS detector consists of three sequentially connected modules:

• The Data Collector: A sniffer captures the appropriate data fields for each packet. The timestamp of each packet is also recorded in order to group the packets into timeframes. Sequential timeframes overlap with each other.
• The Features Estimator: The frequency of occurrence of various data encoded in the captured packet headers is estimated.
• The Detector: The features vector is passed to a two-layer feed-forward neural network that determines whether an attack is in progress.

The complete set of 44 statistical features estimated in each timeframe consists of statistical probabilities or distinct values normalized by the total number of packets transferred in the timeframe:

• Features 1-5. The probabilities of the SYN, ACK, FIN, URG and RST flags being raised.
• Feature 6. The distinct SEQ values.
• Features 7-8. The distinct values of the source and destination port.
• Feature 9. The probability of the source port being within the first 1024 values.
• Features 10-25. The sixteen probabilities of the source port value in the range 1024-65535, divided into groups of 4032 ports.
• Feature 26. The probability of the destination port being within the first 1024 values.
• Features 27-42. The sixteen probabilities of the destination port value in the range 1024-65535, divided into groups of 4032 ports.
• Feature 43. The distinct values of the window size.
• Feature 44. The distinct TTL values.

Experiments have established that the nature of the features plays an important role in the DDoS detection efficiency. In general, finding the optimum set of features remains an unsolved problem, but a sub-optimum solution can be obtained by a natural selection mechanism known as a genetic algorithm [6]. A general description of genetic algorithm theory and application details can be found elsewhere [14-16]. In this paper the main variant of the genetic algorithm is implemented [16], where the chromosomes are selected using a tournament method. In the experiments, the mutation probability varies from 0.05 to 0.1 and the selection probability varies from 0.25 to 0.9. The mean square error between the DDoS detector output and the desired values is used as the genetic algorithm's evaluation function.
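The feature list of this section can be illustrated with a short Python sketch. It is not the authors' implementation: the `Packet` record, its field names and the function names are assumptions introduced here, and "distinct values" is read as the number of distinct values divided by the number of packets in the timeframe.

```python
from typing import List, NamedTuple


class Packet(NamedTuple):
    """Header fields assumed to be captured by the sniffer (hypothetical names)."""
    timestamp: float
    syn: bool
    ack: bool
    fin: bool
    urg: bool
    rst: bool
    seq: int
    src_port: int
    dst_port: int
    window: int
    ttl: int


PORT_GROUPS = 16
GROUP_SIZE = 4032  # (65535 - 1024 + 1) / 16 ports per group


def port_probabilities(ports: List[int]) -> List[float]:
    """Return 17 values: P(port < 1024), then the 16 group probabilities."""
    counts = [0] * (1 + PORT_GROUPS)
    for port in ports:
        if port < 1024:
            counts[0] += 1
        else:
            counts[1 + (port - 1024) // GROUP_SIZE] += 1
    return [c / len(ports) for c in counts]


def estimate_features(packets: List[Packet]) -> List[float]:
    """Estimate the 44 features for the packets of one timeframe."""
    n = len(packets)
    if n == 0:
        return [0.0] * 44  # assumption: an empty timeframe maps to all zeros

    def distinct(values: List[int]) -> float:
        # Number of distinct values, normalized by the packet count.
        return len(set(values)) / n

    src = [p.src_port for p in packets]
    dst = [p.dst_port for p in packets]
    features = [
        sum(p.syn for p in packets) / n,  # feature 1: SYN
        sum(p.ack for p in packets) / n,  # feature 2: ACK
        sum(p.fin for p in packets) / n,  # feature 3: FIN
        sum(p.urg for p in packets) / n,  # feature 4: URG
        sum(p.rst for p in packets) / n,  # feature 5: RST
    ]
    features.append(distinct([p.seq for p in packets]))     # feature 6
    features.append(distinct(src))                          # feature 7
    features.append(distinct(dst))                          # feature 8
    features.extend(port_probabilities(src))                # features 9-25
    features.extend(port_probabilities(dst))                # features 26-42
    features.append(distinct([p.window for p in packets]))  # feature 43
    features.append(distinct([p.ttl for p in packets]))     # feature 44
    return features
```

Every value lies in [0, 1], so the vector could be fed to the detector without further scaling. Whether the original system applied any additional normalization is not stated in the text.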

3 Simulation Environment and Data Collection

A computer network was used to gather information. The attacks were launched from an attack host using the Tribe Flood Network 2000 (TFN2k). The clients were simulated from a single host using the Web Application Stress Tool from Microsoft Corp., which sends HTTP requests to a web server using actual user profiles. The profiles were recorded from an actual user who browsed through the web server's contents. Each request is an actual session that takes time delays into consideration and follows links on the server's contents. The mean rate of HTTP requests is about 3169 in a time frame of 30 secs.

The traffic was recorded using a sniffer placed on a monitoring host. The sniffer may miss some packets, but it can be implemented easily, without special hardware and without reducing the network's efficiency. Furthermore, it is a passive monitoring device that can reside in any system on a network. Different scenarios were created using normal traffic only, traffic produced only by TFN2k, and a combination of the two. More specifically, three types of traffic were recorded:

• Normal traffic of 2400 connections for 5 minutes.
• Pure DDoS traffic in TCP flooding mode for 5 minutes.
• Combined traffic for 5 minutes from multiple clients; the DDoS attack is started after the first second and lasts for about 3 minutes. Then, for another minute, the traffic is normal.

A Linux based sniffer (developed using the popular libpcap library) was used to gather the data. In the collected data, the client's SEQ number was replaced with a random one, one for each distinct connection, because the Web Application Stress Tool uses only two real clients to simulate all the other clients. Therefore the original SEQ numbers produced by the tool were completely unreliable. This modification was verified by a large number of experiments carried out in the same network configuration.

The maximum number of neurons in the input layer was 44. The number of neurons in the hidden layer varies from 1 to 3, and for each network configuration the features were estimated over timeframe windows of 4, 16 and 32 seconds. The well-known BFGS optimization method is used to estimate the neural network weights.
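As an illustration of how recorded packets might be grouped into overlapping timeframes of 4, 16 or 32 seconds, consider the following Python sketch. The text does not state how much consecutive timeframes overlap, so the `step` parameter and the function name are assumptions made here.

```python
from typing import List, Sequence


def overlapping_timeframes(timestamps: Sequence[float], window: float,
                           step: float) -> List[List[int]]:
    """Group packet indices into timeframes of `window` seconds.

    A new timeframe starts every `step` seconds; choosing step < window
    makes consecutive timeframes overlap.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if not timestamps:
        return []
    first, last = min(timestamps), max(timestamps)
    frames = []
    k = 0
    while first + k * step <= last:
        start = first + k * step
        # Half-open interval [start, start + window).
        frames.append([i for i, t in enumerate(timestamps)
                       if start <= t < start + window])
        k += 1
    return frames
```

With a 4 second window and a hypothetical 2 second step, each packet away from the ends of the recording falls into two timeframes, and the feature vector of each timeframe would then be computed from the packets at the returned indices.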

4 Genetic Optimization

The genetic algorithm is implemented in GNU C++ and the experiments were carried out on a Linux cluster consisting of 48 computers. The genetic algorithm for the features selection is implemented as follows:

1. A population of 100 randomly defined chromosomes forms the initial generation. Each chromosome is 44 bits long. The selection (ps) and mutation (pm) probabilities are set.
2. The Genetic fitness of each chromosome is evaluated using the neural DDoS detector, after proper training. The 44-bit chromosome controls the configuration of the feature vector used in the neural DDoS detector: only the features whose bit is activated are used as inputs to the detector. The neural network weights are estimated by minimizing the least-square error over the set of training data using the BFGS optimization method. The Genetic fitness is estimated by the mean-square-error between the neural network output and the expected data in the testing set. The data in the test and the training set are mutually exclusive.
3. A selection procedure is applied to the population. The population is sorted according to the fitness of each chromosome. The worst fitting individuals ((1-ps)*number of individuals) are removed from the generation pool. The eliminated chromosomes are replaced in the crossover procedure. The result of this phase is a reduced set of chromosomes called the mating pool.
4. The crossover procedure is applied to the mating pool, producing the new generation: two chromosomes are selected from the mating pool with tournament selection, which is the fastest algorithm for selecting parents. Two offspring are produced from the selected chromosomes with one-point crossover. The crossover is repeated until the chromosome pool is complete.
5. The mutation procedure is applied to the chromosomes of the new generation: for every bit in the generation pool, if a random number in the range (0,1) is lower than the mutation probability, the corresponding bit is inverted.

Steps 2-5 are repeated 1000 times.

Table 1. The best Genetic fitness and the number of activated features for various neural network configurations, timeframe sizes, selection and mutation probabilities

                                            Best fitness              Active features
     Hidden  Selection    Mutation          (timeframe size, s)       (timeframe size, s)
No.  nodes   probability  probability       4        16       32      4     16    32
1    1       0.50         0.10              0.0494   0.0002   0.01    21    27    25
2    1       0.90         0.10              0.0354   0.0007   0.01    30    26    19
3    2       0.50         0.05              0.000    0.0000   0.0     22    23    27
4    2       0.50         0.10              0.0000   0.0000   0.0     22    23    20
5    2       0.90         0.10              0.0000   0.0000   0.0     23    25    17
6    3       0.25         0.05              0.0000   0.0000   0.0     14    17    23
7    3       0.25         0.10              0.0000   0.0000   0.0     18    17    23
8    3       0.50         0.05              0.0000   0.0000   0.0     20    17    23
9    3       0.90         0.05              ~0       0.0000   0.0     23    17    23
10   3       0.90         0.10              0.0000   0.0000   0.0     22    17    23
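The five steps of the genetic algorithm in Section 4 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' GNU C++ code: the tournament size of 2, the `evaluate` callback (which stands in for training the neural detector and returning its test-set mean square error) and all function names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[int]  # bit i == 1 means feature i + 1 is active


def tournament(pool: List[Chromosome], errors: List[float],
               rng: random.Random, k: int = 2) -> Chromosome:
    """Return the lowest-error chromosome among k randomly drawn ones."""
    contenders = rng.sample(range(len(pool)), k)
    return pool[min(contenders, key=lambda i: errors[i])]


def evolve(evaluate: Callable[[Chromosome], float], n_bits: int = 44,
           pop_size: int = 100, ps: float = 0.5, pm: float = 0.05,
           generations: int = 1000, seed: int = 0
           ) -> Tuple[Chromosome, float]:
    rng = random.Random(seed)
    # Step 1: random initial population.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best, best_err = population[0][:], float("inf")
    for _ in range(generations):
        # Step 2: fitness of every chromosome (lower error is better).
        errors = [evaluate(c) for c in population]
        # Step 3: sort and keep the best ps fraction as the mating pool.
        order = sorted(range(pop_size), key=lambda i: errors[i])
        if errors[order[0]] < best_err:
            best, best_err = population[order[0]][:], errors[order[0]]
        keep = max(2, int(ps * pop_size))
        pool = [population[i] for i in order[:keep]]
        pool_err = [errors[i] for i in order[:keep]]
        # Step 4: refill the population with one-point crossover offspring.
        children = [c[:] for c in pool]
        while len(children) < pop_size:
            a = tournament(pool, pool_err, rng)
            b = tournament(pool, pool_err, rng)
            cut = rng.randint(1, n_bits - 1)
            children.append(a[:cut] + b[cut:])
            if len(children) < pop_size:
                children.append(b[:cut] + a[cut:])
        # Step 5: bit-flip mutation over the whole generation pool.
        for c in children:
            for j in range(n_bits):
                if rng.random() < pm:
                    c[j] ^= 1
        population = children
    return best, best_err
```

A call such as `evolve(lambda c: sum(c) / len(c), n_bits=10, pop_size=20, generations=30)` runs the loop with a toy error function that simply rewards fewer active bits. In the setting of the paper, `evaluate` would instead train the detector on the features selected by the chromosome, which is the expensive part of each generation.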

5 Experimental Results

Table 1 shows the best Genetic fitness and the number of activated features for various selection and mutation probabilities. In all experiments where two or more neurons are used in the hidden layer, the genetic algorithm and the neural network training process produce a suitable features vector and an error-free DDoS neural detector. The minimum number of 14 active features was obtained with three hidden neurons, selection and mutation probabilities set to 0.25 and 0.05 respectively, and features estimated in a 4 seconds timeframe. It is also shown that the selection and mutation probabilities do not influence the classification rate of the DDoS detector but lead to different feature vectors.

An objective definition of the best feature set is a difficult task. To this end, the number of times each feature was set active in the ten best-fitting chromosomes for the 4, 16 and 32 seconds timeframes is shown in Table 2.

Table 2. Number of times a feature is active in the ten best chromosomes for timeframe (window) sizes of 4, 16 and 32 seconds.

Feature            1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Window size 4      8   8   7   3   8   5   6   5   8   1   2   5   5   5   6
Window size 16     5   6   3  10   4   3   9   5   1   2   6   4   4   5   6
Window size 32     7   1   2   9   6   2   7   2   1   1   0   7   6   7   8

Feature           16  17  18  19  20  21  22  23  24  25  26  27  28  29  30
Window size 4      7   6   5   1   3   1   5   2   7   5   7   4   7   4   6
Window size 16     9   7   2   4   7   4   4  10  10   4   3   1   3  10   3
Window size 32     8   9   4   6   5   6   9   7   7   9  10   2   3   9   2

Feature           31  32  33  34  35  36  37  38  39  40  41  42  43  44
Window size 4      6   7   5   5   6   1   8   5   5   5   5   4   3   3
Window size 16     7   6   6   1   4   8   1   3   1   4   3   8   2   1
Window size 32     3   3   0   3   8   1   4   8   3   9   7   7   2   3
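The counts reported in Table 2 amount to a column-wise sum over the bit strings of the ten best chromosomes. A minimal Python sketch of that tally follows; the function names are hypothetical and the ranking helper is an addition for illustration only.

```python
from typing import List, Sequence


def activation_counts(chromosomes: Sequence[Sequence[int]]) -> List[int]:
    """counts[i] is the number of chromosomes in which feature i + 1 is active."""
    if not chromosomes:
        return []
    n_bits = len(chromosomes[0])
    if any(len(c) != n_bits for c in chromosomes):
        raise ValueError("all chromosomes must have the same length")
    return [sum(c[i] for c in chromosomes) for i in range(n_bits)]


def rank_features(counts: Sequence[int]) -> List[int]:
    """1-based feature numbers, most frequently active first (ties by number)."""
    return sorted(range(1, len(counts) + 1),
                  key=lambda f: (-counts[f - 1], f))
```

Applied to the ten best 44-bit chromosomes of one timeframe size, `activation_counts` would yield one row of Table 2, with every entry between 0 and 10.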

In general, and for all timeframe sizes, the experimental results produced by the genetic algorithm showed that the SYN and URG flags, the distinct values of the source and destination port, four probabilities of the groups from the upper set of source ports (features 16, 17, 23, 24), and two probabilities of the groups for the destination ports (features 29 and 42) were used very frequently by the ten best chromosomes. On the other hand, the probability of the source port being within the first 1024 values (feature 9), two probabilities of the groups from the upper set of source ports (features 10 and 18), eight probabilities of the groups for the destination ports (features 27, 28, 30, 33, 34, 36, 37, 39), the distinct values of the window size, and the distinct TTL values are the least frequently used features. Additional experiments verified that the SYN and URG flags do play a significant role in the identification of these kinds of attacks, and also that the TTL and window size provide almost no information. The role of the source port classes was significantly reduced, because the Web Application Stress Tool did not correctly simulate the clients' source port assignment. This fact was confirmed by further experiments with real clients.

References

1. Mell, P., Marks, D., McLarnon: A denial-of-service. Computer Networks, 34 (2000) 641.
2. Ditrich, S.: Analyzing Distributed Denial of Service Tools: The Shaft Case. Proc. of the 14th Systems Administration Conference - LISA 2000, New Orleans, USA, (2000) 329-339.
3. Ryan, J., Lin, M.J., Miikkulainen, R.: Intrusion Detection with Neural Networks. In: Jordan, M., et al. (Eds.): Advances in Neural Information Processing Systems 10. MIT Press, Cambridge, MA, (1998) 943-949.
4. Mukkamala, S., Janoski, G., Sung, A.: Intrusion Detection using Neural Networks and Support Vector Machines. Proc. IJCNN, 2 (2002) 1702-1707.
5. Bonifacio, J., Casian, A., CPLF de Carvalho, A., Moreira, E.: Neural Networks Applied in Intrusion Detection Systems. Proc. World Congress on Computational Intelligence - WCCI, Anchorage, USA, (1998) 205-210.
6. Helmer, G., Wong, J., Honavar, V., Miller, L.: Feature Selection Using a Genetic Algorithm for Intrusion Detection. Proceedings of the Genetic and Evolutionary Computation Conference, 2, (1999) 1781.
7. Chen, Y.W.: Study on the prevention of SYN flooding by using traffic policing. IEEE Symposium on Network Operations and Management (2000) 593-604.
8. Schuba, C., Krsul, I., Kuhn, M., Spafford, E., Sundaram, A., Zamboni, D.: Analysis of a denial-of-service attack on TCP. Proc. IEEE Computer Society Symposium on Research in Security and Privacy, USA, (1997) 208-223.
9. Lippmann, R., Cunningham, R.: Improving intrusion detection performance using keyword selection and neural networks. Computer Networks, 34 (2000) 596-603.
10. Lau, F., Rubin, S., Smith, M., Trajkovic, L.: Distributed denial-of-service attacks. Proc. IEEE Inter. Conference on Systems, Man and Cybernetics, 3 (2000) 2275-2280.
11. Cabrera, J., Ravichandran, B., Mehra, R.: Statistical Traffic Modeling for network intrusion detection. IEEE Inter. Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (2000) 466-473.
12. Bivens, A., Palagiri, C., Smith, R., Szymanski, B., Embrechts, M.: Network-Based Intrusion Detection using Neural Networks. Artificial Neural Networks In Engineering, Nov. 10-13, St. Louis, Missouri, (2002).
13. Narayanaswamy, K., Ross, T., Spinney, B., Paquette, M., Wright, C.: System and process for defending against denial of service attacks on network nodes. Patent WO0219661, Top Layer Networks Inc. (USA), (2002).
14. Fletcher, R.: Practical methods of optimization. John Wiley & Sons (1980) 38-45.
15. Back, T., Schwefel, H.: An overview of evolutionary algorithms for parameter optimization. Evolutionary Computation, 1 (1993) 1-23.
16. Goldberg, D.: Genetic algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading, Massachusetts, (1989).
17. Branch, J., Bivens, A., Chan, C., Lee, T., Szymanski, B.: Denial of Service Intrusion Detection Using Time-Dependent Finite Automata, http://www.cs.rpi.edu/~brancj/research.htm.
18. Cox, D., McClanahan, K.: Method for Blocking Denial of Service and Address spoofing attacks on a private network. Patent WO9948303, Cisco Tech Ind (USA), (1999).
19. Belissent, J.: Method and apparatus for preventing a denial of service (DOS) attack by selectively throttling TCP/IP requests. Patent WO0201834, Sun Microsystems Inc (USA), (2002).
20. Maher, R., Bennett, V.: Method for preventing denial of service attacks. Patent WO0203084, Netrake Corp (USA), (2002).
21. Schwartau, W.: Surviving denial-of-service. Computers & Security, 18, (1999) 124-133.
