TCP SYN Attack - ICSY

Time Series Models and their Relevance to Modeling TCP SYN based DoS Attacks
Cyriac James, Hema A. Murthy
Department of Computer Science & Engineering
Indian Institute of Technology Madras, India

June 23, 2011 Cyriac James, Hema A. Murthy (IITM)


Background: TCP SYN Attack

A common DoS attack: the TCP SYN attack
Limited backlog queue (sysctl -a | grep ipv4.tcp_max_syn_backlog)

Figure: SYN Attack

Time out: 3, 6, 12, 24 and 48 seconds [1]

[1] V. Paxson and M. Allman, "RFC 2988 - Computing TCP's Retransmission Timer," http://www.ietf.org/rfc/rfc2988.txt, Nov. 2000

Outline

Related Work
Motivation for the Work
Network Trace
Representing Network Traffic as a Discrete Time Signal
Time Series Models
Analysis and Results
Conclusion


Related Work

Popular statistical work based on the CUSUM algorithm by H. Wang et al., for edge routers
  SYN - FIN [1]
  SYN - SYN/ACK [2]

Drawbacks
  Series assumed to be i.i.d.
  Traffic burstiness and non-stationarity: Local-Area Network [3] and Wide-Area Network [4]

[1] H. Wang, D. Zhang, and K. G. Shin, "Detecting SYN flooding attacks," Proceedings of IEEE INFOCOM, 2002
[2] H. Wang, D. Zhang, and K. Shin, "SYN-dog: Sniffing SYN flooding sources," ICDCS, 2002
[3] W. E. Leland, M. S. Taqqu, W. Willinger, and D. V. Wilson, "On the self-similar nature of Ethernet traffic," IEEE/ACM Transactions on Networking, 1994
[4] V. Paxson and S. Floyd, "Wide-Area Traffic: The Failure of Poisson Modeling," IEEE/ACM Transactions on Networking, 1995

Related Work

Later, several works were based on Box-Jenkins time series models, as a solution at the victim server
  Modeling the outstanding TCP requests [1]
  Modeling the service rate [2]
  Modeling flow-level features [3]
  Modeling the web traffic [4]

[1] D. M. Divakaran, H. A. Murthy, and T. A. Gonsalves, "Detection of SYN flooding attacks using linear prediction analysis," ICON, 2006
[2] G. Zhang, S. Jiang, G. Wei, and Q. Guan, "A prediction-based detection algorithm against distributed denial-of-service attacks," in Proceedings of IWCMC, 2009
[3] J. Cheng, J. Yin, C. Wu, B. Zhang, and Y. Liu, "DDoS attack detection method based on linear prediction model," in ICIC, 2009
[4] W. U. Qing-tao and S. Zhi-qing, "Detecting DDoS attacks against web server using time series analysis," Wuhan University Journal of Natural Sciences, vol. 11, no. 1, pp. 175-180, 2006

Motivation for the Work

Some of the major drawbacks of these works are:
  Assumptions of time invariance and stability of the process
  Window sizes of the order of seconds or less
  Can we have a model valid for a longer period?

Lack of description of:
  Model identification
  Model validation

Relevance of linear time series models?


Network Trace

Figure: Tenet Network Architecture

Traces collected at the edge router using tcpdump
Feature: SYN - SYN/ACK, also called the half-open count
Sampling Interval: 10 seconds
Data Set-1: 26th July 2010 to 30th July 2010
Data Set-2: 23rd August 2010 to 27th August 2010
Data Set-3: 20th September 2010 to 24th September 2010
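As a rough illustration of the half-open count feature, the sketch below bins packet events into 10-second intervals and reports SYN minus SYN/ACK per interval. The event format (timestamp, kind) and the name half_open_series are assumptions made for this example; the slides do not describe how the tcpdump output was parsed.

```python
from collections import Counter


def half_open_series(events, interval=10.0):
    """Return SYN minus SYN/ACK counts, one value per sampling interval.

    events: iterable of (timestamp_seconds, kind) pairs, where kind is
    "SYN" or "SYN/ACK". Timestamps are assumed to be non-negative and
    measured from the start of the trace (hypothetical input format).
    """
    syn, synack = Counter(), Counter()
    for ts, kind in events:
        bucket = int(ts // interval)  # index of the sampling interval
        if kind == "SYN":
            syn[bucket] += 1
        elif kind == "SYN/ACK":
            synack[bucket] += 1
    if not syn and not synack:
        return []
    last = max(list(syn) + list(synack))
    # A Counter returns 0 for intervals in which nothing was seen.
    return [syn[b] - synack[b] for b in range(last + 1)]


events = [(0.5, "SYN"), (1.0, "SYN"), (1.2, "SYN/ACK"),
          (12.0, "SYN"), (25.0, "SYN"), (25.5, "SYN/ACK")]
series = half_open_series(events)
```

Each element of the returned list is one sample of the discrete-time series that the later slides model.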

Representing Network Traffic as a Discrete Time Signal


Discrete Time Signal

Figure: Network Signal (discrete signal, amplitude versus time)

No access to input signals
Consider the series as a sequence of impulse responses $\mu_t$, for time $t \geq 0$

Stability of the System

For a linear system, Total Response = Zero-State Response + Zero-Input Response [1]
External Stability: Zero-State Response
Internal Stability: Zero-Input Response
Internal Stability ⇒ External Stability [1]
For internal stability, the impulse response must die off:

$$\sum_{j=0}^{\infty} |\mu_j| < \infty \tag{1}$$

$\mu_j$: Impulse response at the $j$-th time lag

[1] B. P. Lathi, "Principles of Linear Systems and Signals", Oxford University Press, 2009

Time Series Models

Box-Jenkins Time Series Models


Time Series Models

Idea from the observations of Yule [1]

$$x_t = a_t + \alpha_1 a_{t-1} + \alpha_2 a_{t-2} + \dots \tag{2}$$

$x_t$: Output signal at time $t$
$a_t, a_{t-1}, \dots$: Random shocks or white noise process
$\alpha_1, \alpha_2, \dots$: Model coefficients
Also called the Linear Filter Model
Stationarity: First and second order moments finite and independent of time [2]
LTI and Stability ⇔ Stationarity
Can be used to build models for prediction

[1] G. U. Yule, "On a method of investigating periodicities in disturbed series, with special reference to Wolfer's sunspot numbers", Philos. Trans. Roy. Soc. A226, 267-298, 1927
[2] G. E. P. Box, G. M. Jenkins, and G. C. Reinsel, "Time Series Analysis: Forecasting and Control", Pearson Education, 1994

Auto-Regressive (AR) Model

An AR model can be written as

$$x_t = \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + \dots + \alpha_p x_{t-p} + a_t \tag{3}$$

$x_t, x_{t-1}, \dots$: Output values
$\alpha_1, \alpha_2, \dots$: Model coefficients, where $p$ is the model order
$a_t$: Random shock at time $t$
Can be written as an infinite series of random shocks

Consider an AR(2) model:

$$x_t = \alpha_1 x_{t-1} + \alpha_2 x_{t-2} + a_t \tag{4}$$

$$x_{t-1} = \alpha_1 x_{t-2} + \alpha_2 x_{t-3} + a_{t-1} \tag{5}$$

$$x_{t-2} = \alpha_1 x_{t-3} + \alpha_2 x_{t-4} + a_{t-2} \tag{6}$$

$$\vdots$$

$$x_t = a_t + \psi_1 a_{t-1} + \psi_2 a_{t-2} + \dots \tag{7}$$
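The step from the AR form to the infinite series of shocks in Equation (7) can be checked numerically. The sketch below uses the standard recursion $\psi_0 = 1$, $\psi_j = \sum_i \alpha_i \psi_{j-i}$, which follows from repeated substitution of the AR equation into itself. The function name psi_weights and the coefficient values 0.5 and 0.3 are illustrative assumptions, not values from the slides.

```python
def psi_weights(alphas, n):
    """First n impulse-response weights psi_0 .. psi_{n-1} of an AR(p) model.

    alphas: AR coefficients (alpha_1, ..., alpha_p).
    Uses psi_0 = 1 and psi_j = sum_i alpha_i * psi_{j-i} for j >= 1.
    """
    psi = [1.0]
    for j in range(1, n):
        psi.append(sum(a * psi[j - i]
                       for i, a in enumerate(alphas, start=1)
                       if j - i >= 0))
    return psi


# Illustrative AR(2) coefficients (assumed, not taken from the traces).
weights = psi_weights((0.5, 0.3), 200)

# A finite partial sum of |psi_j| that stops growing suggests the
# absolute-summability condition for stability is met.
abs_sum = sum(abs(w) for w in weights)
```

For these coefficients the weights decay geometrically, so the partial sum settles quickly; for an unstable choice such as (0.9, 0.3) it keeps growing with n.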

Auto-Regressive (AR) Model

Computing the ACF:

$$E(x_t x_{t-k}) = E(a_t x_{t-k} + \psi_1 a_{t-1} x_{t-k} + \psi_2 a_{t-2} x_{t-k} + \dots) \tag{8}$$

$$\gamma_k = E(x_{t-k} a_t) + \psi_1 E(x_{t-k} a_{t-1}) + \psi_2 E(x_{t-k} a_{t-2}) + \dots \tag{9}$$

where $\gamma_k$ is the autocovariance. The above equation can be generalised to:

$$\gamma_k = \sigma_a^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+k} \tag{10}$$

$\sigma_a^2$: Variance of $a_t$, which has mean zero

In terms of the impulse response, the ACF becomes

$$\rho_k = \frac{\sigma_a^2 \sum_{j=0}^{\infty} \psi_j \psi_{j+k}}{\gamma_0} \tag{11}$$

Hence, an AR process is an infinite impulse response system
For stability ⇒ $\sum_{j=0}^{\infty} |\psi_j| < \infty$

Auto-Regressive (AR) Model

Multiply Equation (4) by $x_{t-k}$ and take expectations on both sides (for $k > 0$):

$$\gamma_k = \alpha_1 \gamma_{k-1} + \alpha_2 \gamma_{k-2} \tag{12}$$

Dividing by $\gamma_0$, (12) becomes:

$$\rho_k = \alpha_1 \rho_{k-1} + \alpha_2 \rho_{k-2} \tag{13}$$

$$\rho_k - \alpha_1 \rho_{k-1} - \alpha_2 \rho_{k-2} = 0 \tag{14}$$

Characteristic equation:

$$\lambda^2 - \alpha_1 \lambda - \alpha_2 = 0 \tag{15}$$

The general solution is of the form:

$$\rho_k = C_1 \lambda_1^k + C_2 \lambda_2^k \tag{16}$$

$\lambda_1, \lambda_2$: Roots (if distinct)
$C_1$ and $C_2$: Arbitrary constants
For Stability ⇒ $|\lambda_1| < 1$ and $|\lambda_2| < 1$
Yule-Walker equations: estimating the model coefficients
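For the AR(2) case, Equation (13) at lags 1 and 2 gives two linear equations in $\alpha_1$ and $\alpha_2$, and the stability condition can be tested on the roots of Equation (15). The following is a minimal sketch under those equations only; the function names and the numeric values are illustrative assumptions, not the authors' implementation.

```python
import cmath


def yule_walker_ar2(rho1, rho2):
    """Solve rho1 = a1 + a2*rho1 and rho2 = a1*rho1 + a2 for (a1, a2)."""
    denom = 1.0 - rho1 ** 2
    alpha1 = rho1 * (1.0 - rho2) / denom
    alpha2 = (rho2 - rho1 ** 2) / denom
    return alpha1, alpha2


def characteristic_roots(alpha1, alpha2):
    """Roots of lambda^2 - alpha1*lambda - alpha2 = 0 (may be complex)."""
    disc = cmath.sqrt(alpha1 ** 2 + 4.0 * alpha2)
    return (alpha1 + disc) / 2.0, (alpha1 - disc) / 2.0


def is_stable(alpha1, alpha2):
    """True when both characteristic roots lie inside the unit circle."""
    return all(abs(r) < 1.0 for r in characteristic_roots(alpha1, alpha2))


# Autocorrelations implied by an assumed AR(2) with alpha = (0.5, 0.3).
rho1 = 0.5 / (1.0 - 0.3)
rho2 = 0.5 * rho1 + 0.3
alpha1, alpha2 = yule_walker_ar2(rho1, rho2)
```

In practice rho1 and rho2 would be sample autocorrelations estimated from one day of training data, so the recovered coefficients would only approximate the underlying ones.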

Moving Average (MA) Model

An MA model can be written as

$$x_t = a_t - \psi_1 a_{t-1} - \psi_2 a_{t-2} - \dots - \psi_q a_{t-q} \tag{17}$$

$x_t$: Output signal at time $t$
$a_t, a_{t-1}, \dots$: Random shocks or white noise process
$\psi_1, \psi_2, \dots, \psi_q$: Model coefficients, where $q$ is the model order
Finite linear filter model
Can be written as an infinite series of past values

Consider an MA model of order 1:

$$x_t = a_t - \psi_1 a_{t-1} \tag{18}$$

Moving Average (MA) Model

Multiply Equation (18) by $x_{t-k}$ and take expectations on both sides:

$$\gamma_k = E(a_t x_{t-k} - \psi_1 a_{t-1} x_{t-k})$$

$$\gamma_0 = E(a_t x_t - \psi_1 a_{t-1} x_t) \tag{19}$$

$$\gamma_0 = E\big(a_t (a_t - \psi_1 a_{t-1}) - \psi_1 a_{t-1} (a_t - \psi_1 a_{t-1})\big) \tag{20}$$

$$\gamma_0 = \sigma_a^2 + \psi_1^2 \sigma_a^2 \tag{21}$$

$$\gamma_1 = -\psi_1 \sigma_a^2 \tag{22}$$

$$\gamma_2 = 0 \tag{23}$$

$\gamma_k = 0$ for all $k > 1$
Hence, an MA process is a finite impulse response system
A time-invariant MA process is always stationary
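The MA(1) results in Equations (21) to (23) are a special case of the general MA(q) autocovariance, which cuts off after lag q. The sketch below computes it directly from the coefficients, using the sign convention of Equation (17); the function name and the values $\psi_1 = 0.5$, $\sigma_a^2 = 1$ are assumptions chosen for illustration.

```python
def ma_autocovariance(psis, sigma2, max_lag):
    """Theoretical autocovariances gamma_0 .. gamma_max_lag of an MA(q) model.

    psis: coefficients (psi_1, ..., psi_q) in x_t = a_t - psi_1 a_{t-1} - ...
    sigma2: variance of the white noise a_t.
    """
    theta = [1.0] + [-p for p in psis]  # filter weights applied to the shocks
    q = len(psis)
    gammas = []
    for k in range(max_lag + 1):
        if k <= q:
            gammas.append(sigma2 * sum(theta[j] * theta[j + k]
                                       for j in range(q + 1 - k)))
        else:
            gammas.append(0.0)  # no shared shocks beyond lag q
    return gammas


gammas = ma_autocovariance([0.5], 1.0, 3)
```

With these values gamma_0 equals $1 + \psi_1^2$, gamma_1 equals $-\psi_1$, and every later lag is zero, matching the finite impulse response behaviour described above.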

Duality Property: For Model Identification

For an AR(p) process, the ACF decays slowly, but the PACF cuts off after lag p
For an MA(q) process, the PACF decays slowly, but the ACF cuts off after lag q
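The cut-off behaviour can be seen by computing the PACF from an ACF. The slides do not say how the PACF was obtained; the sketch below uses the Durbin-Levinson recursion as one common choice, and all function names are hypothetical. The AR(2) coefficients 0.5 and 0.3 are illustrative.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations rho_0 .. rho_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def ar2_acf(alpha1, alpha2, max_lag):
    """Theoretical ACF of a stationary AR(2) process, from rho_k recursion."""
    rho = [1.0, alpha1 / (1.0 - alpha2)]
    for k in range(2, max_lag + 1):
        rho.append(alpha1 * rho[k - 1] + alpha2 * rho[k - 2])
    return rho[:max_lag + 1]


def pacf_from_acf(rho, max_lag):
    """Partial autocorrelations phi_11 .. phi_kk via Durbin-Levinson.

    rho[k] is the autocorrelation at lag k, with rho[0] == 1.
    """
    pacf = []
    prev = []  # phi_{k-1,1} .. phi_{k-1,k-1} from the previous order
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        prev.append(phi_kk)
        pacf.append(phi_kk)
    return pacf


rho = ar2_acf(0.5, 0.3, 6)
pacf = pacf_from_acf(rho, 5)
```

For this theoretical AR(2) input the ACF tails off gradually while the PACF is zero beyond lag 2; with sample_acf on real data the later PACF values would only be close to zero, within sampling error.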

Time Series Transformations

Study on time-invariant feature: inconclusive
Transformation of the time series
  Averaging and Differencing
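The two transformations can be sketched as follows. The slides do not state the averaging window or whether the difference is signed or absolute, so the moving-average form, the window parameter, and the signed first difference used here are all assumptions.

```python
def moving_average(x, window):
    """Mean of each run of `window` consecutive samples (assumed averaging)."""
    return [sum(x[i:i + window]) / window
            for i in range(len(x) - window + 1)]


def first_difference(x):
    """Signed first difference x[t] - x[t-1] (assumed form of differencing)."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]


half_open = [1, 4, 9, 16, 25]          # made-up sample values
averaged = moving_average(half_open, 3)
differenced = first_difference(half_open)
```

Averaging smooths short bursts, while differencing removes slowly varying trends; either may bring a series closer to stationarity before an AR or MA model is fitted.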

Average Series

Figure: Average Time Series (Data Set-1, Monday to Thursday; average half-open count versus sampling interval in seconds)

Difference Series

Figure: Difference Time Series (Data Set-1, Monday to Thursday; first difference of the half-open count versus sampling interval in seconds)

Analysis and Results


Stationarity Check: Mean Estimation

(a) Mean: Original Series
Day        Data Set-1  Data Set-2  Data Set-3
Monday     13.4132     7.8968      8.1603
Tuesday    11.4301     8.1568      6.7047
Wednesday  14.0949     8.4121      4.9967
Thursday   14.3704     8.4447      4.7113
Friday     13.3957     8.2669      6.0029
Average    13.3409     8.2355      6.1152

(b) Mean: Difference Series
Day        Data Set-1  Data Set-2  Data Set-3
Monday     5.5403      5.0499      4.8475
Tuesday    5.0008      5.0958      4.2121
Wednesday  5.6499      5.1832      3.7834
Thursday   5.7435      5.0452      3.6704
Friday     5.3730      5.2722      3.9951
Average    5.4615      5.1292      4.1017

(c) Mean: Average Series
Day        Data Set-1  Data Set-2  Data Set-3
Monday     13.4352     7.8969      8.1667
Tuesday    11.4376     8.1533      6.7134
Wednesday  14.1062     8.4100      4.9954
Thursday   14.3830     8.4461      4.7138
Friday     13.3990     8.2724      6.0130
Average    13.3524     8.23572     6.1204

Stationarity Check: Autocorrelation Estimation of Original Series

Figure: ACF versus lag (0 to 20) for Monday to Friday, with a zero reference line; panels (d) ACF: Data Set-1, (e) ACF: Data Set-2, (f) ACF: Data Set-3

Stationarity Check: Autocorrelation Estimation of Difference Series

Figure: ACF versus lag (0 to 20) for Monday to Friday, with a zero reference line; panels (g) ACF: Data Set-1, (h) ACF: Data Set-2, (i) ACF: Data Set-3

Stationarity Check: Autocorrelation Estimation of Average Series

Figure: ACF versus lag (0 to 20) for Monday to Friday, with a zero reference line; panels (j) ACF: Data Set-1, (k) ACF: Data Set-2, (l) ACF: Data Set-3

Stability Check

Figure: Model roots in the complex plane (real axis versus imaginary axis, each from -1 to 1); panels (m) Roots: Orig. Series, (n) Roots: Diff. Series, (o) Roots: Avg. Series

Model Identification

Figure: Sample autocorrelation and sample partial autocorrelation versus lag (0 to 20); panels (p) Sample ACF - Difference Series, (q) Sample PACF - Difference Series

Modeling and Prediction

Only AR and no MA component
AR model of order 2 - from PACF
Parameter Estimation: Yule-Walker Method
Training: One day data
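Once the two AR coefficients are estimated, prediction and the RMSE used for validation can be sketched as below. The one-step-ahead form follows the AR(2) equation with the shock term set to its mean of zero; the function names and sample values are illustrative, and the sketch assumes the series has already been transformed and mean-adjusted as needed, which the slides do not spell out.

```python
import math


def predict_ar2(x, alpha1, alpha2):
    """One-step-ahead predictions for x[2:], each from the two prior samples."""
    return [alpha1 * x[t - 1] + alpha2 * x[t - 2] for t in range(2, len(x))]


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(predicted))


series = [1.0, 2.0, 2.0, 3.0]           # made-up samples
preds = predict_ar2(series, 0.5, 0.5)   # assumed coefficients
error = rmse(series[2:], preds)
```

A model trained on one day can be scored this way against every other day, which is the kind of comparison an RMSE validation table summarises.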

Model Validation: ACF Spread of Prediction Error

Figure: ACF of the prediction error for Monday to Friday, showing the 95% confidence interval and the 95% spread (1.9 times the inter-quartile range); panels (r) Data Set-1, (s) Data Set-2, (t) Data Set-3

Model Validation: Root Mean Square Error

Day      Model Mon-1  Model Mon-2  Model Mon-3
Mon-1    9.0393       9.0493       9.0509
Tue-1    6.6171       6.6318       6.6235
Wed-1    7.5437       7.5765       7.5391
Thur-1   7.6595       7.6819       7.6610
Fri-1    6.2821       6.3041       6.2892
Mon-2    7.6121       7.6061       7.6297
Tue-2    7.7220       7.7161       7.7566
Wed-2    13.3783      13.4851      13.3488
Thur-2   7.5915       7.5793       7.6356
Fri-2    7.7785       7.7825       7.7899
Mon-3    7.6859       7.7005       7.6736
Tue-3    6.4507       6.4603       6.4648
Wed-3    7.5276       7.5402       7.5327
Thur-3   6.9074       6.8949       6.9275
Fri-3    8.1178       8.1250       8.1197

Figure: RMSE: Model built on Monday Traffic

Model Validation: N-fold Cross Validation

Figure: N-Fold Cross Validation (prediction error versus days 1 to 15, Mon-1 to Fri-3; Wed-2 is marked as an outlier: Wednesday, Data Set-2)

Mean and variance consistent across all models
Model valid for a long period of time
Gives an estimate of the threshold to be fixed

Prediction of SYN Attack

Trace-driven simulation
Results are an ensemble average over 50 such simulated attacks
SYN rate: 10 syn/sec to 20 syn/sec
Threshold error based on RMSE

Figure: Prediction error versus sampling interval (in seconds) for Data Set-1, with the attack period and possible false alarms marked

Detection Efficacy

Figure: (a) False Positive Vs False Negative: probability of false positive (FP) and probability of false negative (FN) versus threshold (2 to 10), with the threshold value for 0% FN and 11% FP marked; (b) Detection Delay (in seconds) versus threshold (2 to 10)
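The trade-off between false positives, false negatives and detection delay can be illustrated with a simple threshold rule on the prediction error. The slides do not define how these quantities were counted, so the per-interval definitions below, the function names, and the sample numbers are all assumptions; only the 10-second sampling interval comes from the trace description.

```python
def detection_rates(errors, is_attack, threshold):
    """Per-interval false positive and false negative rates.

    errors[i]: prediction error in interval i.
    is_attack[i]: True if interval i lies inside an attack period.
    An interval raises an alarm when |error| exceeds the threshold.
    """
    flagged = [abs(e) > threshold for e in errors]
    normal = [f for f, a in zip(flagged, is_attack) if not a]
    attack = [f for f, a in zip(flagged, is_attack) if a]
    fp = sum(normal) / len(normal) if normal else 0.0
    fn = sum(not f for f in attack) / len(attack) if attack else 0.0
    return fp, fn


def detection_delay(errors, attack_start, threshold, interval=10):
    """Seconds from attack start to the first alarm, or None if never raised."""
    for i in range(attack_start, len(errors)):
        if abs(errors[i]) > threshold:
            return (i - attack_start) * interval
    return None


errors = [1, 2, 9, 1, 20, 30, 2]                      # made-up errors
labels = [False, False, False, False, True, True, True]
fp, fn = detection_rates(errors, labels, threshold=8)
delay = detection_delay(errors, attack_start=4, threshold=25)
```

Raising the threshold lowers the false positive rate but tends to increase both the false negative rate and the delay, which is the shape of trade-off the two panels describe.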


Conclusion

Systematic approach
  Stationarity
  Stability
  Appropriate Transformation

Stressed model identification and validation
Demonstrated the efficacy of the model
  For modeling normal traffic
  For a longer period of time
  Detecting TCP SYN DoS attacks

Effective for distributed SYN attacks as well
Approach can be extended to other DoS attacks as well
