A Distributed Data Streaming Algorithm for Network-wide Traffic Anomaly Detection
Yang Liu, Linfeng Zhang, and Yong Guan
Iowa State University, Ames, IA 50010, USA
{yangl, zhanglf, guan}@iastate.edu
ABSTRACT
The Internet today suffers from security problems and network failures that are hard to resolve, such as botnet attacks, polymorphic worm/virus spreading, DDoS attacks, and flash crowds. To address many of these problems, we need a network-wide view of the traffic dynamics and, more importantly, the ability to detect traffic anomalies in a timely manner. To our knowledge, Principal Component Analysis (PCA) is the best-known spatial detection method for network-wide traffic anomalies. However, existing PCA-based solutions have scalability problems: they require $O(m^2 n)$ running time and $O(mn)$ space to analyze traffic measurements from $m$ aggregated traffic flows within a sliding window of length $n$. We propose a novel data streaming algorithm for PCA-based network-wide traffic anomaly detection in a distributed fashion. Our algorithm achieves $O(wn \log n)$ running time and $O(wn)$ space at local monitors, and $O(m^2 \log n)$ running time and $O(m \log n)$ space at the Network Operation Center (NOC), where $w$ denotes the maximum number of traffic flows at a local monitor.

1. INTRODUCTION
The Internet has become an essential part of daily life for billions of users worldwide. People use and rely on a large variety of services built on top of the Internet, such as web browsing, online banking, shopping, entertainment, VoIP, video on demand, auctions, and social networks. However, every day we still read news stories about major security breaches, new polymorphic worm/virus spreading, identity theft, botnet activity, DDoS attacks, and phishing emails.

Traffic anomaly detection has become an important issue for network management in the Internet and has attracted considerable research interest [3, 5, 6, 7]. To deal with low-profile coordinated traffic anomalies, Lakhina et al. [5, 6] proposed PCA-based detection methods that utilize traffic measurements from multiple links. Li et al. [7] aggregated traffic flows into sketch subspaces and also detected traffic anomalies based on PCA. Because of the high communication cost of PCA-based methods, Huang et al. [3] designed a local algorithm that filters data at the local monitors.

Spatial analysis such as PCA has been verified to be an effective method for traffic anomaly detection. However, applying PCA online in practice raises several challenges [9]. We first formalize the traffic anomaly detection problem, and then propose a novel data streaming algorithm that uses random projection [11] to improve the performance of PCA-based detection methods.

2. PROBLEM DEFINITION
To address problems such as intrusion detection, fault detection and recovery, and QoS provisioning, many ISPs have chosen a distributed architecture for network monitoring, which consists of local monitors and the NOC. In this framework, local monitors collect data from routers and other network devices, perform some processing at or close to the data sources, and then transfer their measurements to the NOC. The NOC is responsible for mining characteristics of interest from the collected measurements and for identifying problems and their root causes.

To detect traffic anomalies, we introduce the anomaly distance, a distance norm under which normal traffic measurements are close to each other with high probability. We then pick a typical normal measurement, denoted by $x_{normal}$, and compute the probability distribution of the anomaly distance of a random measurement $x$ from $x_{normal}$. Finally, we detect traffic anomalies as outliers with a statistical confidence.

Definition 1. Given $x_{normal}$, a measurement $y$ is detected as a traffic anomaly at the $(1 - \varrho)$ confidence level ($0 < \varrho < 1$) if

$$P(x : d(x) > d(y)) \le \varrho, \tag{1}$$

where $d(y)$ denotes the anomaly distance of $y$ from $x_{normal}$ and $P(\cdot)$ denotes the probability, taken over a random measurement $x$, that $d(x) > d(y)$.
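As an illustration of Definition 1, the probability in Eq. (1) can be estimated empirically from a sample of anomaly distances. The following is a minimal sketch under that assumption, not the paper's procedure (which uses a threshold derived from the Q-statistic); the function name `is_anomaly` and the argument `reference_distances` are hypothetical.

```python
from bisect import bisect_right


def is_anomaly(d_y, reference_distances, rho):
    """Empirical form of Definition 1.

    Flags y as an anomaly when the fraction of reference measurements x
    with d(x) > d(y) is at most rho. The reference sample stands in for
    the probability distribution of d(x); that substitution is an
    assumption of this sketch.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    if not reference_distances:
        raise ValueError("need at least one reference distance")
    ordered = sorted(reference_distances)
    # Count the reference distances strictly greater than d_y.
    larger = len(ordered) - bisect_right(ordered, d_y)
    return larger / len(ordered) <= rho
```

For example, with 100 reference distances and `rho = 0.05`, a measurement is flagged only if at most 5 of the reference distances exceed its own.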
3. SKETCH-BASED METHOD
In this paper, PCA is applied to the traffic volumes to compute the anomaly distance, and the Q-statistic [4] is used to detect traffic anomalies as outliers [5]. The traffic volumes of the $m$ traffic flows within the sliding window of length $n$, denoted by $x_{ij}$, are organized into an $n \times m$ traffic matrix $X$, which is adjusted to a matrix $Y$ with zero column mean, i.e.,

$$y_{ij} = x_{ij} - \bar{x}_{tj}, \qquad \bar{x}_{tj} = \frac{1}{n} \sum_{i=t-n+1}^{t} x_{ij}.$$

The singular value decomposition (SVD) of $Y$ is denoted by $Y = \sum_{j=1}^{m} \eta_j u_j v_j^T$, where $v_j$ is the $j$-th principal component and $\eta_j$ is the corresponding singular value. Lakhina et al. found that the traffic volumes of multiple flows have a low intrinsic dimensionality, so the normal traffic effectively resides in an $r$-dimensional subspace with $r \ll m$. We choose the mean of the traffic volumes as $x_{normal} = (\bar{x}_{t1}, \ldots, \bar{x}_{tm})^T$, and the anomaly distance is computed from the last $m - r$ principal components, sorted by singular value so that $\eta_1 \ge \eta_2 \ge \cdots \ge \eta_m \ge 0$:

$$d_Y(y_i) = \sqrt{\sum_{j=r+1}^{m} \left(v_j^T y_i\right)^2}. \tag{2}$$
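The anomaly distance of Eq. (2) can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation; the function name `pca_anomaly_distances` is hypothetical, and the code assumes $n \ge m$ so that the SVD returns all $m$ principal components.

```python
import numpy as np


def pca_anomaly_distances(X, r):
    """Return d_Y(y_i) of Eq. (2) for every row y_i of the centered matrix Y.

    X is the n x m traffic matrix (rows: time intervals, columns: flows)
    and r is the dimension of the normal subspace. Assumes n >= m.
    """
    X = np.asarray(X, dtype=float)
    Y = X - X.mean(axis=0)  # subtract the column means
    # Rows of Vt are the principal components v_j^T, ordered by
    # decreasing singular value eta_j.
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    residual = Vt[r:]  # the last m - r principal components
    # Norm of the projection of each y_i onto the residual subspace.
    return np.linalg.norm(Y @ residual.T, axis=1)
```

With `r = 0` the residual subspace is the whole space, so the distance reduces to the Euclidean norm of each centered row, which gives a quick sanity check.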
[Figure 1: System model of the sketch-based method. Diagram labels: Local Monitor, Packet Stream, Aggregation, Volume Counter, Traffic Volume, Sketch Computation, Network Operation Center, PCA Computation, Computing Anomaly Distance, Computing Threshold, d > ?, Alarm?]

In Eq. (2), $y_i$ is a vector of adjusted traffic volumes. If $d_Y(y_i)$ is larger than the threshold $Q_\varrho$, which is determined by the size $r$ of the normal subspace and the singular values $\eta_j$ [4], the NOC raises a traffic anomaly alarm.

The system model of the sketch-based method, shown in Fig. 1, is designed to compute the anomaly distance and the threshold efficiently. Local monitors implement a specific aggregation method and report the flow ID and the packet size to the Volume Counter module, which maintains a list of buckets for each traffic flow. Each bucket $U_{ij}$ stores the traffic volume $x_{ij}$ of the $j$-th flow at the $i$-th time interval. At each time interval, the Sketch Computation module generates $l$ independent and identically distributed (i.i.d.) pseudo-random numbers, denoted by $r_{i1}, \ldots, r_{il}$, from the standard normal distribution or other specific probability distributions [2]. A sketch $z_{kj}$ is computed as

$$z_{kj} = \frac{1}{\sqrt{l}} \sum_{i=t-n+1}^{t} r_{ik}\, y_{ij} \tag{3}$$

for $k = 1, \ldots, l$. The NOC collects all sketches $z_{kj}$ from the local monitors and organizes them into an $l \times m$ matrix $Z$. The SVD of the sketch matrix is denoted by $Z = \sum_j \lambda_j b_j a_j^T$, and the anomaly distance can be computed as

$$d_Z(y_i) = \sqrt{\sum_{j=r+1}^{m} \left(a_j^T y_i\right)^2}. \tag{4}$$

A threshold $\delta_\varrho$ is computed based on $\lambda_j$. Our method detects traffic anomalies on the same principle as Lakhina's method: if $d_Z(y_i) > \delta_\varrho$, the NOC raises an alarm. Using matrix perturbation theory [10], we can prove that the anomaly distance $d_Y(\cdot)$ is $\varepsilon$-bounded by $d_Z(\cdot)$ with $l = O(\log n)$, where $\varepsilon$ is a user-specified bound.

Theorem 1. If $l > \frac{C \log n}{\varepsilon^2}$ for a large enough constant $C$, then

$$|d_Z(y) - d_Y(y)| \le \frac{\sqrt{2}\,\varepsilon\, \|Y\|_F^2\, \|y\|}{\left|\eta_{r+1}^2 - \eta_r^2\right|} \tag{5}$$

with probability $1 - 2e^{-\frac{C \log n}{4}}$.

The proof of the above theorem is provided in our technical report [8]. The threshold $Q_\varrho$ can also be approximated by $\delta_\varrho$ using Mirsky's theorem [8].

Furthermore, our method can improve the performance of PCA-based methods. Its computation and space requirements are determined by the following steps. At a local monitor, each $y_{ij}$ is multiplied by $l$ random numbers $r_{ik}$, which requires $O(wnl)$ multiplications. The local monitor needs to store the traffic volumes, which requires $O(wn)$ space. At the NOC, the computation complexity of the SVD of an $l \times m$ matrix is $O(m^2 l)$, and storing the sketch matrix requires $O(ml)$ space. Because $l = O(\log n)$, we have the following result [8].

Theorem 2. The computation complexity is $O(wn \log n)$ and the space requirement is $O(wn)$ at the local monitor. The computation complexity is $O(m^2 \log n)$ and the space requirement is $O(m \log n)$ at the NOC.

4. EVALUATION
We evaluate our method on data from the Abilene Observatory Data Collections [1]. We first apply Lakhina's method to detect traffic anomalies, and then treat these detected anomalies as the real anomalies when evaluating the detection accuracy of our sketch-based method. The results show that both false negative errors and false positive errors decrease quickly as the sketch size $l$ increases. A small sketch size, such as $l = 100$ for $n = 2016$, is enough for ISPs to detect traffic anomalies, which can improve the performance of PCA-based methods in a distributed computing environment.
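To make the sketch size discussed above concrete, the following is a minimal NumPy sketch of the random projection of Eq. (3) and the residual distances of Eqs. (2) and (4). It is an illustration under stated assumptions, not the evaluated implementation. The function names are hypothetical, the data in the demo is synthetic rather than Abilene data, and the code assumes that every flow is projected with the same random numbers $r_{ik}$ (for example through a seed shared by all local monitors) and that $l \ge m$.

```python
import numpy as np


def sketch(Y, l, rng):
    """Eq. (3): Z = R^T Y / sqrt(l), with R an n x l matrix of i.i.d. N(0, 1).

    Row i of R holds the l random numbers r_i1..r_il drawn for time
    interval i. Using one shared R for all flows is an assumption here.
    """
    n = Y.shape[0]
    R = rng.standard_normal((n, l))
    return R.T @ Y / np.sqrt(l)  # l x m sketch matrix


def residual_distances(M, Y, r):
    """Distance of each row of Y from the top-r right singular subspace of M.

    M = Y gives d_Y of Eq. (2); M = Z gives d_Z of Eq. (4).
    Assumes M has at least as many rows as columns.
    """
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return np.linalg.norm(Y @ Vt[r:].T, axis=1)


if __name__ == "__main__":
    # Synthetic stand-in: n = 2016 intervals, m = 8 flows driven by two
    # common trends plus noise (hypothetical data, for illustration only).
    rng = np.random.default_rng(0)
    n, m, r, l = 2016, 8, 2, 100
    trends = rng.standard_normal((n, r)) * 100.0
    X = trends @ rng.standard_normal((r, m)) + rng.standard_normal((n, m))
    Y = X - X.mean(axis=0)
    Z = sketch(Y, l, np.random.default_rng(1))
    d_y = residual_distances(Y, Y, r)
    d_z = residual_distances(Z, Y, r)
    print("largest |d_Z - d_Y| over the window:", np.abs(d_z - d_y).max())
```

In this form the NOC factors an $l \times m$ matrix instead of an $n \times m$ one, which is where the $O(m^2 l)$ cost quoted in Section 3 comes from.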
5. ACKNOWLEDGMENTS
This project has benefited from the use of measurement data collected on the Internet2 network as part of the Internet2 Observatory Project. This work was partially supported by NSF under grants No. CNS-0644238, CNS-0626822, and CNS-0831470.
6. REFERENCES
[1] Abilene observatory data collections. www.internet2.edu/observatory/.
[2] N. Alon, P. B. Gibbons, Y. Matias, and M. Szegedy. Tracking join and self-join sizes in limited storage. PODS '99, pages 10–20, 1999.
[3] L. Huang, X. L. Nguyen, M. Garofalakis, J. Hellerstein, M. Jordan, A. Joseph, and N. Taft. Communication-efficient online detection of network-wide anomalies. INFOCOM '07, pages 134–142, 2007.
[4] J. E. Jackson and G. S. Mudholkar. Control procedures for residuals associated with principal component analysis. Technometrics, pages 341–349, 1979.
[5] A. Lakhina, M. Crovella, and C. Diot. Diagnosing network-wide traffic anomalies. SIGCOMM Comput. Commun. Rev., 34(4):219–230, 2004.
[6] A. Lakhina, M. Crovella, and C. Diot. Mining anomalies using traffic feature distributions. SIGCOMM '05, pages 217–228, 2005.
[7] X. Li, F. Bian, M. Crovella, C. Diot, R. Govindan, G. Iannaccone, and A. Lakhina. Detection and identification of network anomalies using sketch subspaces. IMC '06, pages 147–152, 2006.
[8] Y. Liu, L. Zhang, and Y. Guan. Sketch-based network-wide traffic anomaly detection. Technical Report, ECpE Department, Iowa State University (http://home.eng.iastate.edu/~yangl/), April 2009.
[9] H. Ringberg, A. Soule, J. Rexford, and C. Diot. Sensitivity of PCA for traffic anomaly detection. SIGMETRICS '07, pages 109–120, 2007.
[10] G. Stewart and J.-G. Sun. Matrix perturbation theory. Academic Press, Boston, 1990.
[11] S. S. Vempala. The Random Projection Method. American Mathematical Society, Rhode Island, 2004.