Estimating Strength of a DDoS Attack Using Multiple Regression ...

Estimating Strength of a DDoS Attack Using Multiple Regression Analysis B.B. Gupta1, P.K. Agrawal2, R.C. Joshi1, and Manoj Misra1 1

Department of Electronics and Computer Engineering, Indian Institute of Technology Roorkee, Roorkee, India {brijgdec,rcjoshfec,manojfec}@iitr.ernet.in 2 Department of computer Engineering, Netaji Subhas Institute of Technology, New Delhi, India [email protected]

Abstract. Anomaly based DDoS detection systems construct profile of the traffic normally seen in the network, and identify anomalies whenever traffic deviate from normal profile beyond a threshold. This extend of deviation is normally not utilized. This paper reports the evaluation results of proposed approach that utilizes this extend of deviation from detection threshold, to estimate strength of DDoS attack using multiple regression model. A relationship is established between strength of DDoS attacks and observed deviation in sample entropy. Various statistical performance measures, such as Coefficient of determination (R2), Coefficient of Correlation (CC), Sum of Square Error (SSE), Mean Square Error (MSE), Root Mean Square Error (RMSE), Normalized Mean square Error (NMSE), Nash–Sutcliffe Efficiency Index ( η ) and Mean Absolute Error (MAE) are used to measure the performance of the regression model. Internet type topologies used for simulation are generated using Transit-Stub model of GT-ITM topology generator. NS-2 network simulator on Linux platform is used as simulation test bed for launching DDoS attacks with varied attack strengths. The simulation results are promising as we are able to estimate strength of DDoS attack efficiently with very less error rate using multiple regression model. Keywords: DDoS attack, Intrusion detection, Multiple regression, Zombies, Entropy.

1 Introduction Denial of service (DoS) attacks and more particularly the distributed ones (DDoS) are one of the latest threat and pose a grave danger to users, organizations and infrastructures of the Internet. A DDoS attacker attempts to disrupt a target, in most cases a web server, by flooding it with illegitimate packets generated from a large number of zombies, usurping its bandwidth and overtaxing it to prevent legitimate inquiries from getting through [1,2]. Anomaly based DDoS detection systems construct profile of the traffic normally seen in the network, and identify anomalies whenever traffic deviate from normal profile beyond a threshold [3]. This extend of deviation is normally not utilized. Therefore, we use multiple regression [4,5] based approach that utilizes this extend of deviation from detection threshold to estimate strength of DDoS N. Meghanathan et al. (Eds.): CCSIT 2011, Part III, CCIS 133, pp. 280–289, 2011. © Springer-Verlag Berlin Heidelberg 2011

Estimating Strength of a DDoS Attack Using Multiple Regression Analysis

281

attack. A real time estimation of strength of DDoS attack is helpful to suppress the effect of attack by choosing predicted strength of DDoS attack of most suspicious attack sources for either filtering or rate limiting. We have assumed that header information of out going packets is not spoofed. Moore et. al [6] have already made a similar kind of attempt, in which they have used backscatter analysis to estimate number of spoofed addresses involved in DDoS attack. This is an offline analysis based on unsolicited responses. Our objective is to find the relationship between strength of DDoS attack and deviation in sample entropy. In order to estimate strength of DDoS attack, multiple regression model is used. To measure the performance of the proposed approach, we have calculated various statistical performance measures i.e. R2, CC, SSE, MSE, RMSE, NMSE, η , MAE and residual error. Internet type topologies used for simulation are generated using Transit-Stub model of GT-ITM topology generator [7]. NS-2 network simulator [8] on Linux platform is used as simulation test bed for launching DDoS attacks with varied attack strength. In our simulation experiments, total number of zombies is fixed to 100 and attack traffic rate range between 10Mbps and 100Mbps; therefore, mean attack rate per zombie is varied from 0.1Mbps to 1Mbps. The remainder of the paper is organized as follows. Section 2 contains overview of multiple regression model. Section 3 presents various statistical performance measures. Intended detection scheme are described in section 4. Section 5 describes experimental setup and performance analysis in details. Model development is presented in section 6. Section 7 contains simulation results and discussion. Finally, Section 8 concludes the paper.

2 Multiple Regression Model Regression analysis [9,10] is a statistical tool for the investigation of relationships between variables. Usually, the investigator seeks to ascertain the causal effect of one variable upon another. More specifically, regression analysis helps us to understand how the typical value of the dependent variable changes when any one of the independent variables is varied, while the other independent variables are held constant. Variables which are used to ‘explain’, other variables are called explanatory variables. Variable which is explained is called response variable. A response variable is also called a dependent variable, and an explanatory variable is sometime called an independent variable, or a predictor, or repressor. When there is only one explanatory variable the regression model is called a simple regression, whereas if there are more than one explanatory variable the regression model is called multiple regression. The general purpose of multiple regression [4,5] (the term was first used by Pearson, 1908) is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable. In the multivariate case, when there is more than one independent variable, the regression line cannot be visualized in the two dimensional space, but can be computed just as easily. In general form of multiple regression given in eq. 1, there are p independent variables:

Yi = Yˆi + ε i Yˆi = β0 + β1 X1i + β2 X 2i + ...... + β p X pi

(1)

282

B.B. Gupta et al.

where -Y is dependent variable -X1, X2,……, Xp are p independent variables - β 0 is intercept -

β1 , β 2 ,…., β p are the coefficients of p independent variables ε is regression residual

Input and Output In multiple regression model, a relationship is developed between strength of DDoS attack Y (output) and observed deviation in sample volume X1 and flow X2 as input. Here X1 is equal to ( X in (t ) − X n* (t ) ) and X2 is equal to ( Fin (t ) − Fn* (t ) ). Our proposed regression based approach utilizes these deviations in volume X1 and flow X2 to estimate strength of DDoS attack.

3 Statistic Performance Measures The different statistical parameters are adjusted during calibration to get the best statistical agreement between observed and simulated variables. For this purpose, various performance measures, such as Coefficient of Determination (R2), Coefficient of Correlation (CC), Standard Error of Estimate (SSE), Mean Square Error (MSE), Root Mean Square Error (RMSE), Normalized Mean square Error (NMSE), Nash–Sutcliffe Efficiency Index ( η ) and Mean Absolute Error (MAE) are used to measure the performance of the proposed regression model. These measures are defined below: i). Coefficient of Determination (R2): Coefficient of determination (R2) is a descriptive measure of the strength of the regression relationship, a measure how well the regression line fit to the data. R2 is the proportion of variance in dependent variable which can be predicted from independent variable.

R2 =

⎛ N ⎞ ⎜ ∑ (Yo − Yo )(Yc − Yc ) ⎟ ⎝ i =1 ⎠

2

N ⎡ N 2 2⎤ ⎢ ∑ (Y o − Y o ) ⋅ ∑ ( Y c − Y c ) ⎥ i =1 ⎣ i =1 ⎦

(2)

ii). Coefficient of Correlation (CC): The Coefficient of Correlation (CC) can be defined as: N

CC =

∑ (Y i =1

o

− Yo )(Yc − Yc )

N ⎡ N 2 2⎤ ⎢ ∑ ( Yo − Yo ) ⋅ ∑ ( Yc − Yc ) ⎥ i =1 ⎣ i =1 ⎦

1/ 2

(3)


283

iii). Sum of Squared Errors (SSE): The Sum of Squared Errors (SSE) can be defined as: N

SSE = ∑ (Yo − Yc ) 2

(4)

i =1

iv). Mean Square Error (MSE): The Mean Square Error (MSE) between observed and computed outputs can be defined as: N

MSE =

∑ (Y

− Yo ) 2

c

i =1

(5)

N

v). Root Mean Square Error (RMSE): The Root Mean Square Error (RMSE) between observed and computed outputs can be defined as: N

∑ (Y

RMSE =

− Yo ) 2

c

i =1

(6)

N

vi). Normalized Mean Square Error (NMSE): The Normalized Mean Square Error (NMSE) between observed and computed outputs can be defined as:

NMSE =

N

1 N

∑ (Y i =1

− Yo ) 2

c

σ

(7)

2 obs

vii). Nash–Sutcliffe efficiency index ( η ): The Nash–Sutcliffe efficiency index (η) can be defined as: N

η = 1−

∑ (Y

− Yo ) 2

∑ (Y

− Yo )

c

i =1 N

o

i =1

(8) 2

viii). Mean absolute error (MAE): Mean absolute error (MAE) can be defined as follows: N

MAE = 1 −

∑ |Y i =1 N

c

∑ |Y i =1

o

− Yo | (9)

− Yo |

284

B.B. Gupta et al.

where N represents the number of feature vectors prepared,

Yo and Yc denote the

observed and the computed/simulated values of dependent variable respectively, and

2 σ obs

Yo

are the mean and the standard deviation of the observed dependent variable

respectively.

4 Detection of Attack Here, we will discuss proposed detection system that is part of access router or can belong to separate unit that interact with access router to detect attack traffic. A newly designed flow-volume based approach (FVBA) [11] is used to construct profile of the traffic normally seen in the network, and identify anomalies whenever traffic goes out of profile. In FVBA, two statistical measures namely volume and flow are used for profile construction. Fig. 1 depicts the FVBA architecture, where X in (t ) represents total traffic arriving at the target machine in Δ time duration, when system is under attacks. X in (t ) can be expressed as follows:

X in (t ) = X n* (t ) + Xˆ& (t ) , where,

(10)

X n* (t ) and Xˆ& (t ) are the components of the normal and attack traffic respec-

X in (t ) - X n* (t ) using above equation can be used for detection purpose. To set normal profile, consider a random process { X (t ) , t = ω Δ , ω ∈ N }, where

tively.

Δ is a constant time interval, N is the set of positive integers, and for each t, X (t) is a random variable. 1 ≤ ω ≤ l , l is the number of time intervals. Here X (t) represents the total traffic volume in {t − Δ , t} time interval. X (t) is calculated during time interval {t − Δ , t} as follows: Nf

X (t ) = ∑ ni , i = 1, 2....N f

(11)

i =1

where ni represents total number of bytes arrivals for a flow i in {t − Δ , t} time duration and N f represents total number of flows. We take average of X (t) and designate *

that as X n (t ) normal traffic Volume. Similarly value of flow metric is calculated and *

designates that as Fn (t ) . Here total bytes, not packets, are used to calculate volume metric, because it provides more accuracy, as different flows can contain packets of different sizes.


285

Volume Measure

X in ( t )

Distance Detector

ξ

Threshold Detector

ξth ,ξ thL

X n* ( t )

DDo S Attacks De cision

Flow Measure

Fin ( t ) Distance Detector

ς

Attacks Alert

Threshold Detector

ς th

Fn* ( t )

Fig. 1. FVBA architecture

To detect the attack, the value of volume metric X in (t ) and flow metric

Fin (t ) is

calculated in time window Δ continuously; whenever there is appreciable deviation *

*

from X n (t ) and Fn (t ) , various types of attacks are detected. *

X in (t ) , Fin (t )

*

and X n (t ) , Fn (t ) gives values of volume and flow at the time of detection of attack and volume and flow values for the normal profile respectively.

5 Experimental Setup and Performance Analysis In this section, we evaluate our proposed scheme using simulations. The simulations are carried out using NS2 network simulator [8]. We show that false positives and false negatives (or various error rates) triggered by our scheme are very less. This implies that profiles built are reasonably stable and are able to estimate strength of DDoS attack correctly. Simulation Environment Real-world Internet type topologies generated using Transit-Stub model of GT-ITM topology generator [7] are used to test our proposed scheme, where transit domains are treated as different Internet Service Provider (ISP) networks i.e. Autonomous Systems (AS). For simulations, we use ISP level topology, which contains four transit domains with each domain containing twelve transit nodes i.e. transit routers. All the four transit domains have two peer links at transit nodes with adjacent transit domains. Remaining ten transit nodes are connected to ten stub domain, one stub domain per transit node. Stub domains are used to connect transit domains with customer domains, as each stub domain contains a customer domain with ten legitimate client machines. So total of four hundred legitimate client machines are used to generate background traffic. Total zombie machines are fixed to 100 to generate attack traffic. Transit domain four contains the server machine to be attacked by zombie machines. The legitimate clients are TCP agents that request files of size 1 Mbps with request inter-arrival times drawn from a Poisson distribution. The attackers are modeled by UDP agents. A UDP connection is used instead of a TCP one because in a practical

286

B.B. Gupta et al.

attack flow, the attacker would normally never follow the basic rules of TCP, i.e. waiting for ACK packets before the next window of outstanding packets can be sent, etc. The attack traffic rate range between 10Mbps and 100Mbps; therefore, mean attack rate per zombie is varied from 0.1Mbps to 1Mbps. In our experiments, the monitoring time window was set to 200 ms, as the typical domestic Internet RTT is around 100 ms and the average global Internet RTT is 140 ms [12]. Total false positive alarms are minimum with high detection rate using this value of monitoring window. The simulations are repeated and different attack scenarios are compared by varying total attack strengths using fixed number of zombies. Table 1. Deviation in volume and flow with DDoS attack strength

Attack Strength (Y)

Deviation in volume (X1) ( X in (t ) − X n* (t ) )

Deviation in Flow (X2) ( Fin (t ) − Fn* (t ) )

10M 15M 20M 25M 30M 35M 40M 45M 50M 55M 60M 65M 70M 75M 80M 85M 90M 95M 100M

90855.56 109515.24 133721.59 143495.87 146886.67 144870.16 156592.38 160320.63 213209.52 178804.44 181885.71 187367.94 199750.48 209413.33 219707.62 227447.30 227771.75 249654.60 269721.59

59.96 59.13 59.37 58.81 59.10 58.28 57.23 58.67 56.64 57.58 57.61 56.24 57.48 57.25 56.82 53.72 55.23 54.71 53.09

6 Model Development In order to estimate strength of DDoS attack ( Yˆ ) from deviation ( X in (t ) − X n* (t ) ) in volume and ( Fin (t ) − Fn* (t ) ) in flow values, simulation experiments are done at varied attack strengths which are range between 10Mbps and 100Mbps and total number of zombies is fixed to 100; therefore, mean attack rate per zombie is varied from


287

0.1Mbps to 1Mbps. Table 1 represents deviation in volume and flow with actual strength of DDoS attack. Multiple regression model is developed using DDoS attack strength (Y) and deviation ( X in (t ) − X n* (t ) ) in volume and deviation ( Fin (t ) − Fn* (t ) ) in flow values as discussed in Table 1 to fit the regression equation. Regression equation and coefficient of correlation for multiple regression based model are as follows: Y = X 1 * 0.00050 + X 2 * (−1.80) + 66.76 R2=0.94

(12)

7 Results and Discussion We have developed multiple regression model as discussed in section 6. Various performance measures are used to check the accuracy of this model. 120 100 80 60 40 20 0 1

2

3

4

5

6

7

8

9

10 11 12 13 14 15 16 17 18 19

Observed DDoS attack strength

Predicted DDoS attack strength

Fig. 2. Comparison between actual DDoS attack strength and predicted DDoS attack strength using multiple regression model

Constants β 0 , β1 and β 2 of various regression equations are network environment dependent. DDoS attack strength can be computed and compared with actual DDoS attack strength using multiple regression model. The comparison between actual DDoS attack strength and predicted DDoS attack strength using multiple regression model is depicted in fig. 2. To represent false positive and false negative, we plot residual error. Positive cycle of residual error curve represents false positive, while negative cycle represents false negative. Fig. 3 represents residual error for multiple regression model.

288

B.B. Gupta et al.

25 Residual error

20 15 10 5 0 -5

0

20

40

60

80

100

120

-10

Fig. 3. Residual error in multiple regression model

Table 2 shows values of various performance measures. It can be inferred from table 2 that for multiple regression model Values of R2, CC, SSE, MSE, RMSE, NMSE, η , MAE are 0.94, 0.97, 941.02, 49.53, 7.04, 1.76, 0.93 and 0.78 respectively. Hence strength of DDoS attack predicted by this model is closest to the observed strength of DDoS attack. Table 2. Values of various performance measures

R2 CC SSE MSE RMSE NMSE

η

MAE

0.94 0.97 941.02 49.53 7.04 1.76 0.93 0.78

8 Conclusion and Future Work This paper investigates suitability of multiple regression model to predict strength of DDoS attack from deviation ( X in (t ) − X n* (t ) ) in volume and deviation ( Fin (t ) − Fn* (t ) ) in flow values. We have calculated various statistical performance measures i.e. R2, CC, SSE, MSE, RMSE, NMSE, η , MAE and residual error and their values are 0.94, 0.97, 941.02, 49.53, 7.04, 1.76, 0.93 and 0.78 respectively. Therefore, predicted strength of DDoS attack using multiple regression model is close to actual strength of DDoS attack. However, simulation results are promising as we are able to predict strength of DDoS attack efficiently with very less error rate, experimental using a real time test bed can strongly validate our claim.


289

References 1. Gupta, B.B., Misra, M., Joshi, R.C.: An ISP level Solution to Combat DDoS attacks using Combined Statistical Based Approach. International Journal of Information Assurance and Security (JIAS) 3(2), 102–110 (2008) 2. Gupta, B.B., Joshi, R.C., Misra, M.: Defending against Distributed Denial of Service Attacks: Issues and Challenges. Information Security Journal: A Global Perspective 18(5), 224–247 (2009) 3. Gupta, B.B., Joshi, R.C., Misra, M.: Dynamic and Auto Responsive Solution for Distributed Denial-of-Service Attacks Detection in ISP Network. International Journal of Computer Theory and Engineering (IJCTE) 1(1), 71–80 (2009) 4. Hill, T., Lewicki, P.: STATISTICS Methods and Applications. StatSoft, Tulsa (2007) 5. Foster, D., George, E.: The risk inflation criterion for multiple regression. Annals of Statistics 22, 1947–1975 (1994) 6. Moore, D., Shannon, C., Brown, D.J., Voelker, G., Savage, S.: Inferring Internet Denialof-Service Activity. ACM Transactions on Computer Systems 24(2), 115–139 (2006) 7. GT-ITM Traffic Generator Documentation and tool, http://www.cc.gatech.edu/fac/EllenLegura/graphs.html 8. NS Documentation, http://www.isi.edu/nsnam/ns 9. Lindley, D.V.: Regression and correlation analysis. New Palgrave: A Dictionary of Economics 4, 120–123 (1987) 10. Freedman, D.A.: Statistical Models: Theory and Practice. Cambridge University Press, Cambridge (2005) 11. Gupta, B.B., Joshi, R.C., Misra, M.: FVBA: A Combined Statistical Approach for Low Rate Degrading and High Bandwidth Disruptive DDoS Attacks Detection in ISP Domain. In: The Proceedings of 16th IEEE International Conference on Networks (ICON 2008), India (2008), doi: 10.1109/ICON.2008.4772654 12. Gibson, B.: TCP Limitations on File Transfer Performance Hamper the Global Internet. White paper (September 2006), http://www.niwotnetworks.com/gbx/ TCPLimitsFastFileTransfer.htm

Estimating Strength of a DDoS Attack Using Multiple Regression ...

Estimating Strength of a DDoS Attack Using Multiple Regression ...

Suggest Documents

Estimating strength of DDoS attack using various regression ... - arXiv

On Estimating Strength of a DDoS Attack Using Polynomial ...

A Taxonomy of DDoS Attack and DDoS Defense ... - CiteSeerX

A Taxonomy of DDoS Attack and DDoS Defense ... - CiteSeerX

DDoS Attack Detection System

A Proactive DDoS Attack Detection Approach Using ...

Fuzzy Multiple Regression Model for Estimating Software ...

DDos attack protection.pdf - Google Drive

DDos attack protection.pdf - Google Drive

An Approach of DDOS Attack Detection Using Classifiers - Springer Link

Real-Time Detection of Application-Layer DDoS Attack Using Time ...

Real-Time Detection of Application-Layer DDoS Attack Using Time ...

The Detection of DDOS Flooding Attack using Hybrid

Mitigation of Flood based DDoS Attack using Captcha

Cyberspace Administration of China DDoS Attack Forensics.pdf ...

A New DDoS Detection Model Using Multiple SVMs and TRA

An Intelligent DDoS Attack Detection System Using Packet Analysis ...

DDoS Attack Detection Using Fast Entropy Approach ...

Lightweight DDoS Flooding Attack Detection Using ... - Semantic Scholar

DDoS Attack Analyzer: Using JPCAP and WinCap - ScienceDirect.com

DDoS Attack's Simulation Using Legitimate and Attack Real ... - Ijser

Lightweight DDoS Flooding Attack Detection Using ... - Semantic Scholar

DDoS Attack Analyzer: Using JPCAP and WinCap - Science Direct

A New DDoS Detection Model Using Multiple SVMs and TRA