A Statistical Model for Detecting Abnormality in Static-Priority Scheduling Networks with Differentiated Services
Ming Li and Wei Zhao
Abstract. This paper presents a new statistical model for detecting signs of abnormality in static-priority scheduling networks with differentiated services at the connection level on a class-by-class basis. Formulas for the detection probability, miss probability, classification probabilities, and detection threshold are proposed.
Keywords: Anomaly detection, real-time systems, traffic constraint, static-priority scheduling networks, differentiated services, time series.
Y. Hao et al. (Eds.): CIS 2005, Part II, LNAI 3802, pp. 267–272, 2005. © Springer-Verlag Berlin Heidelberg 2005
1 Introduction
Anomaly detection has found applications in computer communication networks, for example in network security; see e.g. [1], [2], [3], [4], [5], [6], [7]. This paper considers the identification of abnormality in arrival traffic time series (traffic for short) at the connection level, which relates to traffic models. In traffic engineering, traffic models fall into two categories [8]. One is statistical modeling, as can be seen from [9], [10], [11]. The other is bounded modeling; see e.g. [12], [13], [14], [15]. Although statistical modeling has made considerable progress, it is worth noting that statistical models agree well with real-life data in the aggregated case. In general, however, they are insufficient when traffic at the connection level has to be taken into account. In fact, traffic modeling at the connection level remains challenging in the field [16]. In computer science, a notable approach to modeling traffic at the connection level is to study traffic from the viewpoint of deterministic queuing theory, which is often called network calculus or bounded modeling. One of the contributions of this paper is to develop the traffic constraint (a kind of deterministically bounded model [13]) into a statistical bound of traffic.
Recent developments in networking show an increased interest in differentiated services (DiffServ) [13], [17]. From the viewpoint of abnormality detection, instead of detecting abnormality of all connections, in practice we are more interested in identifying abnormality of some connections. Thus, this paper studies abnormality detection in the DiffServ environment.
As far as detection is concerned, the current situation is not a lack of detection methods [18] but a shortage of reliable ones, as can be seen from the following statement: “The challenge is to develop a system that detects close to 100 percent of attacks. We are still far from achieving this goal [19].” From the viewpoint of statistical detection, however, instead of developing a way to detect close to 100 percent of abnormality, we study how to achieve accurate detection for a given detection probability. By accurate detection, we mean that a detection model is able to report signs of abnormality for a predetermined detection probability.
This paper proposes an accurate detection model of abnormality in static-priority scheduling networks with DiffServ based on two points: 1) the null hypothesis and 2) averaging the traffic constraint in [13]. A key point in this contribution is to randomize the traffic constraint on an interval-by-interval basis so that time series techniques can be used to obtain a statistical traffic bound, which we shall call the average traffic constraint for simplicity. To the best of our knowledge, this paper is the first attempt to propose an average traffic constraint from the viewpoint of stochastic processes and, moreover, to apply it to abnormality detection.
The rest of the paper is organized as follows. Section 2 introduces the average traffic constraint in static-priority scheduling networks with DiffServ. Section 3 discusses the detection probability and the detection threshold. Section 4 concludes the paper.
2 Average Traffic Constraint
In this section, we first briefly review the conventional traffic constraint and then randomize it into a statistical constraint of traffic. The traffic constraint is given by the following definition.
Definition 1: Let $f(t)$ be the arrival traffic function. If $f(t + I) - f(t) \le F(I)$ for $t > 0$ and $I > 0$, then $F(I)$ is called the traffic constraint function of $f(t)$ [13].
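As an illustration of Definition 1, the following minimal Python sketch estimates the tightest constraint value $F(I)$ from sampled data. It assumes that the cumulative arrival function $f(t)$ is available as a list of samples at unit-spaced instants and that $I$ is given in samples; the function name and the sample values are hypothetical and do not come from [13].

```python
from typing import Sequence


def empirical_traffic_constraint(cum_arrivals: Sequence[float], interval: int) -> float:
    """Smallest F(I) with f(t + I) - f(t) <= F(I) on the observed samples.

    cum_arrivals[t] is the cumulative arrival function f(t) sampled at
    unit-spaced instants (an assumption of this sketch); interval is I,
    expressed in samples.
    """
    if interval <= 0 or interval >= len(cum_arrivals):
        raise ValueError("interval must satisfy 0 < I < len(cum_arrivals)")
    # The largest observed increment over any window of length I is the
    # tightest bound consistent with the data.
    return max(
        cum_arrivals[t + interval] - cum_arrivals[t]
        for t in range(len(cum_arrivals) - interval)
    )


# Hypothetical cumulative byte counts f(0), f(1), ..., f(6).
f = [0, 3, 4, 9, 10, 10, 15]
print(empirical_traffic_constraint(f, 2))  # largest increment over 2 samples: 6
```

Any value larger than the one returned also satisfies the inequality of Definition 1 on the observed samples; the sketch reports the smallest such value.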
Definition 1 is a general description of the traffic constraint: the increment of traffic $f(t)$ over any interval of length $I$ is upper-bounded by $F(I)$. It is in effect a bounded traffic model [13]. The practical significance of such a model is that it describes traffic at the connection level. Accordingly, we write the traffic constraint function of a group of flows as follows.
Definition 2: Let $f_{p,j,k}^{i}(t)$ be all flows of class $i$ with priority $p$ going through server $k$ from input link $j$. Let $F_{p,j,k}^{i}(t)$ be the traffic constraint function of $f_{p,j,k}^{i}(t)$. Then, $F_{p,j,k}^{i}(t)$ is given by $f_{p,j,k}^{i}(t + I) - f_{p,j,k}^{i}(t) \le F_{p,j,k}^{i}(I)$ for $t > 0$ and $I > 0$.
Definition 2 provides a bounded model of traffic in static-priority scheduling networks with DiffServ at the connection level. Nevertheless, it is still a deterministic model in the bounded modeling sense. We now present a statistical model from the viewpoint of bounded modeling. Theoretically, the interval length $I$ can be any positive real number. In practice, however, it is usually selected as a finite positive integer. Fix the value of
$I$ and observe $F_{p,j,k}^{i}(I)$ in the interval $[(n-1)I, nI]$, $n = 1, 2, \ldots, N$. For each interval, there is a traffic constraint function $F_{p,j,k}^{i}(I)$, which is also a function of the index $n$. We denote this function by $F_{p,j,k}^{i}(I, n)$. Usually, $F_{p,j,k}^{i}(I, n) \ne F_{p,j,k}^{i}(I, q)$ for $n \ne q$. Therefore, $F_{p,j,k}^{i}(I, n)$ is a random variable over the index $n$. Now, divide the interval $[(n-1)I, nI]$ into $M$ non-overlapping segments, each of length $L$. For the $m$th segment, we compute the mean $E[F_{p,j,k}^{i}(I, n)]_m$ $(m = 1, 2, \ldots, M)$, where $E$ is the mean operator. Again, $E[F_{p,j,k}^{i}(I, n)]_l \ne E[F_{p,j,k}^{i}(I, n)]_m$ for $l \ne m$. Thus, $E[F_{p,j,k}^{i}(I, n)]_m$ is a random variable too. According to statistics, if $M \ge 10$, $E[F_{p,j,k}^{i}(I, n)]_m$ follows a Gaussian distribution quite accurately [1], [20]. In this case,
$$E[F_{p,j,k}^{i}(I, n)]_m \sim \frac{1}{\sqrt{2\pi}\,\sigma_F} \exp\left[ -\frac{\{E[F_{p,j,k}^{i}(I, n)]_m - F_\mu(M)\}^2}{2\sigma_F^2} \right], \tag{1}$$
where $\sigma_F^2$ is the variance of $E[F_{p,j,k}^{i}(I, n)]_m$ and $F_\mu(M)$ is its mean. We call $E[F_{p,j,k}^{i}(I, n)]_m$ the average traffic constraint of traffic flow $f_{p,j,k}^{i}(t)$.
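The averaging procedure of this section can be sketched in Python as follows. This is one possible reading, not the authors' implementation: it assumes that the per-interval value is the traffic increment within the $n$th window, that the resulting $N$ values are split into $M$ consecutive segments of $L$ values each, and that the spread is estimated by the sample standard deviation. All function names and data are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def per_interval_constraints(cum_arrivals: Sequence[float], interval: int) -> List[float]:
    """F(I, n) for n = 1..N, taken here as f(nI) - f((n-1)I) (an assumption)."""
    n_windows = (len(cum_arrivals) - 1) // interval
    return [
        cum_arrivals[n * interval] - cum_arrivals[(n - 1) * interval]
        for n in range(1, n_windows + 1)
    ]


def segment_means(values: Sequence[float], seg_len: int) -> List[float]:
    """Mean of each of the M = len(values) // L non-overlapping segments."""
    m = len(values) // seg_len
    return [mean(values[k * seg_len:(k + 1) * seg_len]) for k in range(m)]


def template(seg_means: Sequence[float]) -> Tuple[float, float]:
    """Return (F_mu(M), sigma_F): mean and sample std of the segment means."""
    return mean(seg_means), stdev(seg_means)


# Hypothetical cumulative arrivals f(0), ..., f(12); I = 1 and L = 3 give M = 4.
# The Gaussian approximation in the text asks for M >= 10; M is kept small
# here only so that the numbers can be checked by hand.
f = [0, 2, 6, 12, 13, 16, 21, 23, 25, 27, 31, 35, 42]
F_n = per_interval_constraints(f, 1)
xi = segment_means(F_n, 3)
F_mu, sigma_F = template(xi)
print(xi, F_mu, round(sigma_F, 3))
```

In this toy data set the segment means are 4, 3, 2 and 5, so the template value is 3.5. With real traffic, $M \ge 10$ segments would be needed before the Gaussian model above is a reasonable approximation.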
3 Detection Probability
In the case of $M \ge 10$, it is easily seen that
$$\mathrm{Prob}\left[ z_{1-\alpha/2} < \frac{F_\mu(M) - E[F_{p,j,k}^{i}(I, n)]_m}{\sigma_F / \sqrt{M}} \le z_{\alpha/2} \right] = 1 - \alpha, \tag{2}$$
where $(1 - \alpha)$ is called the confidence coefficient. Let $C_F(M, \alpha)$ be the confidence interval with confidence coefficient $(1 - \alpha)$. Then,
$$C_F(M, \alpha) = \left[ F_\mu(M) - \frac{\sigma_F z_{\alpha/2}}{\sqrt{M}},\; F_\mu(M) + \frac{\sigma_F z_{\alpha/2}}{\sqrt{M}} \right]. \tag{3}$$
The above expression shows that $F_\mu(M)$ is a template of the average traffic constraint. Statistically, we have $100(1 - \alpha)\%$ confidence that $E[F_{p,j,k}^{i}(I, n)]_m$ takes $F_\mu(M)$ as its approximation with a variation less than or equal to $\sigma_F z_{\alpha/2} / \sqrt{M}$.
Denote $\xi = E[F_{p,j,k}^{i}(I, n)]_m$. Then,
$$\mathrm{Prob}\left( \xi > F_\mu(M) + \frac{\sigma_F z_{\alpha/2}}{\sqrt{M}} \right) = \frac{\alpha}{2}. \tag{4}$$
On the other hand,
$$\mathrm{Prob}\left( \xi \le F_\mu(M) - \frac{\sigma_F z_{\alpha/2}}{\sqrt{M}} \right) = \frac{\alpha}{2}. \tag{5}$$
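The confidence interval around the template $F_\mu(M)$ can be evaluated numerically with the Python standard library. The sketch below assumes that $z_{\alpha/2}$ denotes the upper $\alpha/2$ point of the standard normal distribution, so that it equals the $(1 - \alpha/2)$ quantile, and that the interval half-width is $\sigma_F z_{\alpha/2} / \sqrt{M}$. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def confidence_interval(f_mu: float, sigma_f: float, m: int, alpha: float) -> Tuple[float, float]:
    """Return (lower, upper) bounds of the (1 - alpha) interval around f_mu."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if m <= 0:
        raise ValueError("m must be a positive integer")
    # Upper alpha/2 point of the standard normal (assumed meaning of z_{alpha/2}).
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = sigma_f * z / sqrt(m)
    return f_mu - half_width, f_mu + half_width


# Hypothetical template values: F_mu = 100, sigma_F = 10, M = 25, alpha = 0.05.
lower, upper = confidence_interval(100.0, 10.0, 25, 0.05)
print(round(lower, 2), round(upper, 2))  # roughly 96.08 and 103.92
```

Under the Gaussian model, a smaller $\alpha$ widens the interval, and each tail outside it carries probability $\alpha/2$.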
To facilitate the discussion, two terms are explained as follows. Correctly recognizing an abnormal sign is called detection, and failing to recognize it is called a miss. We state the detection probability and the miss probability in the following theorem.
Theorem 1 (Detection probability and detection threshold): Let
$$V = F_\mu(M) + \frac{\sigma_F z_{\alpha/2}}{\sqrt{M}}$$
be the detection threshold. Let $P_{det}$ and $P_{miss}$ be the detection probability and the miss probability, respectively. Then,
$$P_{det} = P\{V < \xi < \infty\} = (1 - \alpha/2),$$
$$P_{miss} = P\{-\infty < \xi < V\} = \alpha/2.$$
Proof: The probability of $\xi \in C_F(M, \alpha)$ is $(1 - \alpha)$. According to (2) and (5), the probability of $\xi \le V$ is $(1 - \alpha) + \alpha/2 = (1 - \alpha/2)$. Therefore, $\xi > V$ exhibits a sign of abnormality with probability $(1 - \alpha/2)$. Hence, $P_{det} = (1 - \alpha/2)$. Since the detection probability plus the miss probability equals 1, $P_{miss} = \alpha/2$. □
From Theorem 1, we obtain the following statistical classification criterion for a given detection probability by setting the value of $\alpha$.
Corollary 1 (Classification): Let $f_{p,j,k}^{i}(t)$ be arrival traffic of class $i$ with priority $p$ going through server $k$ from input link $j$ at a protected site. Then,
$$f_{p,j,k}^{i}(t) \in N \quad \text{if } E[F_{p,j,k}^{i}(I, n)]_m \le V,$$
where $N$ denotes the normal set of traffic flows, and
$$f_{p,j,k}^{i}(t) \in A \quad \text{if } E[F_{p,j,k}^{i}(I, n)]_m > V,$$
where $A$ denotes the abnormal set. The proof is straightforward from Theorem 1. The diagram of our detection model is shown in Fig. 1.
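The threshold of Theorem 1 and the decision rule of Corollary 1 can be combined in a short Python sketch. It assumes that the template mean $F_\mu(M)$ and the spread $\sigma_F$ have already been estimated from traffic regarded as normal, that the threshold uses the $\sqrt{M}$ scaling of the confidence interval, and that $z_{\alpha/2}$ is the upper $\alpha/2$ point of the standard normal distribution. The function names, the labels "N" and "A" as strings, and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def detection_threshold(f_mu: float, sigma_f: float, m: int, alpha: float) -> float:
    """V = F_mu(M) + sigma_F * z_{alpha/2} / sqrt(M)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # Upper alpha/2 point of the standard normal (assumed meaning of z_{alpha/2}).
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return f_mu + sigma_f * z / sqrt(m)


def classify(xi: float, threshold: float) -> str:
    """Return "A" (abnormal set) if xi > V, otherwise "N" (normal set)."""
    return "A" if xi > threshold else "N"


# Hypothetical template: F_mu = 100, sigma_F = 10, M = 25.
# alpha = 0.05 corresponds to a stated detection probability of 1 - alpha/2 = 0.975.
V = detection_threshold(100.0, 10.0, 25, 0.05)
for observed in (98.0, 103.0, 105.0):
    print(observed, classify(observed, V))
```

With these values the threshold is roughly 103.92, so only the observation 105.0 falls in the abnormal set. Choosing a smaller $\alpha$ raises the threshold $V$.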
[Figure 1: block diagram. Recoverable labels: "Setting detection probability $(1 - \alpha/2)$", "$f(t)$", "Feature extractor", "$\xi$", "Establishing template", "$F_\mu(M)$ Template", "Detection threshold".]
Fig. 1. Diagram of detection model
4 Conclusions
In this paper, we have extended the traffic constraint in [13], which is conventionally a bound function of arrival traffic, to a time series by averaging the traffic constraints of flows on an interval-by-interval basis in the DiffServ environment. We have then derived a statistical traffic constraint to bound traffic. Based on this, we have proposed a statistical model for abnormality detection in static-priority scheduling networks with differentiated services at the connection level. With the present model, signs of abnormality can be identified on a class-by-class basis according to a predetermined detection probability. The detection probability can be made very high, and the miss probability very low, by setting $\alpha$ very small. The results in the paper suggest that signs of abnormality can be detected at an early stage of their occurrence, since identification is done at the connection level.
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China (NSFC) under the project grant number 60573125, by the National Science Foundation under Contracts 0081761, 0324988, 0329181, by the Defense Advanced Research Projects Agency under Contract F30602-99-1-0531, and by Texas A&M University under its Telecommunication and Information Task Force Program. Any opinions, findings, conclusions, and/or recommendations expressed in this material, either expressed or implied, are those of the authors and do not necessarily reflect the views of the sponsors listed above.
