Dynamic Binary Trees for Hierarchical Clustering of IP Traffic

Patrick Truong and Fabrice Guillemin
France Telecom R&D, 2, avenue Pierre Marzin, F-22307 Lannion, France
{patrick.truong, fabrice.guillemin}@orange-ftgroup.com

Abstract—This paper proposes a computationally and memory-efficient technique for on-line unidimensional clustering of individual IP addresses in order to detect high-volume traffic clusters (Hierarchical Heavy Hitters). Our technique is based on a Patricia tree and can cope with today's traffic volume. We test our algorithm by using a traffic trace composed of NetFlow records sent by a few tens of routers of the France Telecom IP backbone network. We moreover show how our algorithm can be used for network anomaly detection.

I. INTRODUCTION

Since the Internet has become over the past ten years a critical infrastructure for society and business, it is today crucial for network operators and service providers to accurately monitor and analyze traffic in almost real time in order to identify irregular traffic patterns and unexpected phenomena. This is intended to prevent any degradation in the performance of the network and to protect honest users from malicious usage (e.g., flooding attacks such as DoS or DDoS).

The most basic task in traffic anomaly detection consists of identifying which flows have a volume larger than a given proportion of the total amount of traffic. (Note that volume can be expressed in terms of bytes, SYN segments, ICMP Echo packets, etc.) In this context, flows are typically identified by certain combinations of information fields in packet headers (e.g., IP source and destination addresses, port numbers, and protocol). While this simple observation may be well adapted to detecting an attack against a single end user, it cannot capture network usage changes at higher aggregation levels, for instance the range of IP addresses or port numbers involved in a DDoS attack, a port scan or worm propagation.

Capturing anomalies at higher aggregation levels requires grouping IP addresses into prefixes (e.g., sub-nets). When properly aggregated, a group of hosts can be responsible for a significant portion of the traffic volume. These hierarchical IP clusters are referred to as Hierarchical Heavy Hitters (HHHs) in the technical literature. The concept of HHH is extremely relevant in the case of a DoS attack where victims belonging to the same sub-net are massively flooded with unusual messages; maintaining an observation of traffic based on destination IP prefixes makes it possible to detect such an attack.
To address this specific problem, we develop in this paper a new algorithm for identifying Hierarchical Heavy Hitters, that is, one-bit hierarchical IP clusters defined by IP prefixes of any length. Hierarchical Heavy Hitters have been introduced in the context of network security by Estan et al. [1] and Cormode et al. [2] as a natural generalization of Heavy Hitters in order to reflect the hierarchy inherent to network data. Estan et al. [1] present an effective technique for off-line computation of HHHs. Cormode et al. [2] propose deterministic and randomized memory-efficient algorithms for on-line HHH identification. The deterministic algorithm of [2] is based on Lossy Counting [3]: it generalizes Lossy Counting to take hierarchical levels into account. The probabilistic algorithm of [2] is based on the Count-Min sketch [4]: there is one Count-Min sketch for each hierarchy level of the IP addressing space. Zhang et al. [5] present another deterministic algorithm for on-line HHH detection: their algorithm relies on a binary tree whose expansion is limited by a parameter (the expansion smoother).

In this paper, we develop a new algorithm for HHH identification, which is close to that of Zhang et al. [5] but uses a slightly different definition of HHH. In our algorithm, we manipulate the discounted volume for detecting HHHs, while Zhang et al. use the cumulative volume, so that an HHH is simply a prefix whose cumulative volume exceeds φN, where φ ∈ (0, 1) and N is the total volume. Using the cumulative volume, however, has a drawback: the algorithm returns much superfluous information. Indeed, if an address is identified as a heavy hitter, then all its ancestors are HHH prefixes. Our algorithm is based on a Patricia tree [6] and we use the same technique as in the algorithm of Zhang et al. to limit the number of nodes in the tree. But, contrary to their algorithm, our method does not need to reconstruct missed volumes.

This paper is organized as follows. The different concepts used in this paper are defined in Section II. Our algorithm is presented in Section III, and performance evaluation results are given in Section IV.
We show in Section V an application of the algorithm in the context of brute-force volume attack detection as a case study. Section VI presents concluding remarks.

II. PROBLEM FORMULATION

The concept of Hierarchical Heavy Hitters is defined in [2] and is based on the concept of Heavy Hitters. The latter are items whose frequency (or volume, quantified by the number of bytes or packets) is larger than a user-specified ratio. For example, IP source addresses are Heavy Hitters if they send a volume (measured during an observation period) larger than a given fraction of the total amount of traffic. As discussed in the Introduction, this flat summary does not adequately reflect the possible hierarchical structure of IP data. The IP addressing space has a natural hierarchy: IP addresses can be aggregated into IP prefixes, which are sub-nets and, at a higher level, networks. This hierarchy in the IP addressing space leads to the following definition of Hierarchical Heavy Hitters (HHHs).

Definition 1 (Hierarchical Heavy Hitter): Fix a given observation period and consider an IP data stream as a sequence of arrivals of the form (i, v), where i is an IP address and v denotes the associated volume. Let fi be the total volume generated by address i and N the total traffic volume during the observation period (N = Σi fi). The cumulative volume of a prefix p is defined as the sum of the volumes of the addresses that are descendants of p. Given a threshold φ (0 ≤ φ ≤ 1), Hierarchical Heavy Hitters are recursively defined as follows:
• At the lowest level 0 of the hierarchy, the Hierarchical Heavy Hitters are those IP addresses that are Heavy Hitters, i.e., IP addresses that have a volume larger than φN.
• At level l of the hierarchy, the Hierarchical Heavy Hitters are those prefixes of length (32 − l) with a volume greater than φN even after discounting the volumes of those descendants that are already identified as HHHs.

While the cumulative volume of a prefix p aggregates all the volumes of the descendants of p, the discounted volume of p aggregates only the volumes of those descendant addresses not included in any HHH that is a strict descendant of p. The Hierarchical Heavy Hitters are then those prefixes whose discounted volume is greater than φN. The above definition slightly differs from that used in the paper by Zhang et al. [5], where the volume of a prefix accounts for the volumes of all its descendants.
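As a reference point, Definition 1 can be computed exactly off-line by a bottom-up pass over prefix lengths. The sketch below is our own (the function name, the integer encoding of addresses, and the `bits` parameter for small toy universes are assumptions, not the authors' code); it propagates upward only the volume not already claimed by an HHH found below, which is exactly the discounting of Definition 1:

```python
from collections import defaultdict

def naive_hhh(stream, phi, bits=32):
    """Exact HHHs per Definition 1 (off-line): addresses are integers in
    [0, 2**bits); returns ({(prefix_value, prefix_len): discounted_volume}, N)."""
    volume = defaultdict(int)
    for addr, v in stream:
        volume[addr] += v
    N = sum(volume.values())
    hhh = {}
    # start at the address level (prefix length = bits) and move to the root,
    # passing upward only the volume not claimed by an HHH found below
    level = {(a, bits): v for a, v in volume.items()}
    while level:
        parents = defaultdict(int)
        for (p, l), v in level.items():
            if v > phi * N:
                hhh[(p, l)] = v                 # discounted volume exceeds phi*N
            elif l > 0:
                parents[(p >> 1, l - 1)] += v   # residual volume goes to parent
        level = parents
    return hhh, N
```

With a 4-bit toy universe, an address holding half the traffic is reported at level 0 of the hierarchy, and the root is reported again once the remaining hosts jointly exceed φN after the discount.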
To determine HHHs, a naive and straightforward solution consists of building a binary tree of depth 32 in order to represent all the addresses, as in Figure 1. Each node of the tree has at most two children, one associated with bit 0 and the other with bit 1. But this solution suffers from several shortcomings:
• With today's high-speed links, the number of distinct addresses is very large, so maintaining a standard binary tree is prohibitive in terms of memory; this problem is all the more critical as HHH detection algorithms have to be implemented in pieces of equipment (for instance, in a traffic collector in a supervision center) which have to run many tasks in parallel and for which memory is expensive.
• The computational cost per update is 32 node visits or creations, which may be too slow if the analysis has to be performed on the fly as data streams are observed, for instance by a traffic collector.

So a major challenge in the HHH problem is the ability to process massive network traffic streams on-line, in near real time and with limited memory.
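A back-of-envelope comparison makes the memory shortcoming concrete (the traffic figures below are hypothetical, chosen only for illustration): the naive trie needs up to h nodes per distinct address, while the smoothed tree presented in Section III is bounded by h²/ε nodes regardless of the number of addresses.

```python
# Illustrative node-count comparison (hypothetical traffic mix):
# the naive depth-32 trie stores up to h nodes per distinct address,
# while the tree of Section III uses O(h**2 / eps) nodes in total.
h = 32                                  # depth of the IPv4 hierarchy
distinct_addresses = 1_000_000          # hypothetical number of hosts seen
naive_worst_case = h * distinct_addresses   # about 32 million trie nodes
eps = 0.001                             # estimation error of Section III
smoothed_bound = h * h / eps            # about one million nodes, address-independent
assert smoothed_bound < naive_worst_case
```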

Fig. 1. The HHHs are underlined and the numbers represent the discounted volumes of prefixes.

III. DESCRIPTION OF OUR ON-LINE HHH ALGORITHM

In this section, we present our algorithm for on-line detection of IP prefixes that are HHHs. The algorithm uses a Patricia tree combined with an expansion smoother to limit the creation of nodes in the tree: we create fewer nodes than we would with a classical Patricia tree. This expansion smoother technique is inspired by the algorithm of Zhang et al. [5].

A. Tree Data Structure

Our algorithm maintains a binary tree based on a Patricia tree. Each node of the tree has the following fields:
• uPrefix: the IP prefix represented by the node,
• uPrefixLen: the IP prefix length,
• tVolume: the estimated cumulative volume of the prefix,
• pLeft and pRight: pointers to the children.

The user-supplied parameters of the algorithm are:
• a threshold φ (0 < φ < 1) for HHH detection,
• an estimation error ε (0 < ε < 1).

Note that, with regard to accuracy, it is important to ensure that ε < φ. Another algorithm parameter is then defined by Tsmooth = εN/h, where N is the total traffic volume during the observation period and h = 32 is the hierarchy depth of the IPv4 addressing space. We call this parameter the expansion smoother; it makes it possible to limit the growth of the tree. As the value of N is not known before the end of the observation period, we show in Section III-D how to cope with this lack of information. When executing the algorithm, we also maintain a counter actualTotalVolume to keep track of the total traffic volume received so far, which gives the total traffic volume at the end of the observation period.
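A minimal sketch of this data structure in Python (the class and helper names are our own; the fields mirror the list above, with prefixes stored as integers):

```python
class PatriciaNode:
    """One node of the HHH tree; fields mirror Section III-A."""
    __slots__ = ("uPrefix", "uPrefixLen", "tVolume", "pLeft", "pRight")

    def __init__(self, prefix, plen, volume):
        self.uPrefix = prefix      # integer value of the prefix bits
        self.uPrefixLen = plen     # prefix length in bits, 0..32
        self.tVolume = volume      # estimated cumulative volume of the prefix
        self.pLeft = None          # child whose next bit is 0
        self.pRight = None         # child whose next bit is 1

def expansion_smoother(epsilon, total_volume, h=32):
    """T_smooth = eps * N / h; accuracy requires eps < phi (Section III-A)."""
    return epsilon * total_volume / h
```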

B. Update Procedure

For each node pNode of the tree, we use the following C-like notation:
• pNode->uPrefix is the IP prefix represented by the node,
• pNode->uPrefixLen is the length of the prefix,
• pNode->tVolume is the estimated cumulative volume of the prefix.

The binary tree is updated in a way similar to that of a Patricia tree, but with the extra use of the expansion smoother Tsmooth to limit the growth of the tree. When an update (IPaddress, V) arrives at time t, the running counter actualTotalVolume is incremented by V to maintain the total traffic volume received up to time t, and we walk down the tree until we reach the most neighboring node to the address IPaddress, defined as the node in the tree whose prefix shares the longest prefix match (LPM) with IPaddress. We denote this node by pNode, the longest prefix match between pNode->uPrefix and IPaddress by LPMPrefix, and its length by LPMPrefixLen. Two cases can then occur:
1) If LPMPrefixLen < pNode->uPrefixLen (meaning that, in the case of a traditional Patricia tree, we should create the LPM node), then:
   a) If (pNode->tVolume + V) ≤ Tsmooth, we only need to change the field values of pNode:
      • pNode->uPrefix = LPMPrefix,
      • pNode->uPrefixLen = LPMPrefixLen,
      • pNode->tVolume = pNode->tVolume + V.
   b) Otherwise we create the longest prefix match node, denoted by pLPMNode, and we initialize its volume counter as pLPMNode->tVolume = pNode->tVolume + V. We then distinguish three sub-cases:
      i) If V ≤ Tsmooth, we only attach pNode as a child of pLPMNode according to the value of the (LPMPrefixLen + 1)-th bit of pNode->uPrefix.
      ii) Else, if pNode->tVolume ≤ Tsmooth, we change the field values of pNode:
         • pNode->uPrefix = IPaddress,
         • pNode->uPrefixLen = 32,
         • pNode->tVolume = V,
         and we attach pNode as a child of pLPMNode.
      iii) Else, we create a new node pNewNode:
         • pNewNode->uPrefix = IPaddress,
         • pNewNode->uPrefixLen = 32,
         • pNewNode->tVolume = V,
         and we attach the two nodes pNode and pNewNode as the children of pLPMNode.
2) If LPMPrefixLen = pNode->uPrefixLen, then:

a) If LPMPrefixLen = 32, the address IPaddress is already present in the tree and is identified by the node pNode; we only increment its volume counter: pNode->tVolume += V.
b) If (pNode->tVolume + V) ≤ Tsmooth, we just need to increment the volume counter of pNode: pNode->tVolume += V.
c) Otherwise we create a new node pNewNode:
   • pNewNode->uPrefix = IPaddress,
   • pNewNode->uPrefixLen = 32,
   • pNewNode->tVolume = V,
   and we attach pNewNode as a child of pNode.

It is important to note that, while we walk down the tree to find the most neighboring node, we also update the volume counter of each visited node. At any time, the counter tVolume of each node in the tree gives an estimate of the cumulative volume of the prefix uPrefix; this estimate is less than the true value by an error which is at most equal to ε·actualTotalVolume. At each level in the tree, there cannot be more than N/Tsmooth internal nodes, otherwise the sum of the internal node volumes would exceed N, which is impossible. Consequently, our algorithm uses O(h²/ε) nodes.

C. Detecting HHHs

At the end of the observation period, the value of the counter actualTotalVolume gives the total traffic volume. To detect HHHs, we recursively scan the tree bottom-up. A node is an HHH if its counter tVolume exceeds φ·actualTotalVolume after discounting the volumes of descendant nodes that are already identified as HHHs. This prevents us from taking redundant information into account. In addition, in the spirit of HHH detection, a neat delineation between the different levels of the addressing space should be ensured. The basic principle is to detect HHHs at a given level, and the decision should not be influenced by the presence of HHHs at lower levels. For instance, a class A network (with a mask of 8 bits) could be detected as an HHH, but if we take a mask of 9 bits, then we should not detect an HHH, simply because the corresponding address space has no semantic meaning.
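The bottom-up scan can be sketched as follows. This is our own simplified sketch, not the authors' code: it assumes the node fields of Section III-A (a small stand-in node class is included for illustration) and performs a post-order traversal that discounts each reported HHH from its ancestors:

```python
class Node:
    """Minimal stand-in for a tree node with the fields of Section III-A."""
    def __init__(self, prefix, plen, volume, left=None, right=None):
        self.uPrefix, self.uPrefixLen, self.tVolume = prefix, plen, volume
        self.pLeft, self.pRight = left, right

def detect_hhh(root, phi, actual_total_volume):
    """Post-order scan: a node is an HHH if its volume, after discounting
    the cumulative volumes of descendant HHHs, exceeds phi * total volume.
    Returns a list of (uPrefix, uPrefixLen, discounted_volume)."""
    hhhs = []

    def scan(node):
        # returns the volume under `node` already claimed by descendant HHHs
        if node is None:
            return 0
        claimed = scan(node.pLeft) + scan(node.pRight)
        discounted = node.tVolume - claimed
        if discounted > phi * actual_total_volume:
            hhhs.append((node.uPrefix, node.uPrefixLen, discounted))
            return node.tVolume    # the whole subtree is now discounted upward
        return claimed

    scan(root)
    return hhhs
```

On a toy tree where one address carries half the traffic, both that address and the root (whose discounted volume still exceeds φN) are reported, matching Definition 1.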
Hence, an HHH may make sense with a mask of 8 bits but not with a mask of 9 bits. This implies that the volume of HHHs at lower levels should be discounted from the volumes at upper levels in order to avoid misinterpretations.

D. A Method for Estimating the Expansion Smoother

The expansion smoother Tsmooth = εN/32 depends on N, the total traffic volume during the observation period, which is not known before the end of that period. To overcome this problem, we present in this section a method for estimating N. Suppose that the duration of the observation period is w minutes. Before starting HHH detection, we recommend a learning phase whose duration is equal to m measurement intervals of w minutes each. For each learning measurement interval Ij, 1 ≤ j ≤ m, we define two variables νj and ν̂j as follows:
• νj is a counter keeping track of the (true) volume of traffic during the interval Ij,
• ν̂j is an estimate of νj obtained via exponential smoothing.

At the end of each interval Ij, we obtain νj by means of traffic observation and we compute ν̂j+1, the prediction of the volume for the next interval, according to the exponential smoothing formula:

ν̂j+1 = α νj + (1 − α) ν̂j, with 0 < α < 1.

When the learning phase finishes, we finally have νm and ν̂m+1. As ν̂m+1 is the prediction of the total traffic volume for the next observation period, we can start the HHH detection by taking the estimate N = β ν̂m+1 (0 < β < 1) in order to make sure that the estimate is no larger than the real value of N. The learning process also continues during HHH detection, so that the estimate of N is adaptively computed from the most recent traffic measurements.

IV. EVALUATION OF OUR ALGORITHM

We test the algorithm presented in the previous section by using a trace of NetFlow records [7] captured on the interconnection IP backbone network of France Telecom. The trace comprises NetFlow streams coming from more than 50 routers and contains an average of 60,000 records per second. Each NetFlow record contains statistics (such as the number of octets or packets) for an IP flow. Traffic observation is done on IP destination addresses, so the Hierarchical Heavy Hitters are destination prefixes.

We first evaluate the performance of our algorithm by examining the space usage, the false positive rate and the false negative rate as a function of the threshold φ. The duration of the observation period is set to one minute: every minute, we detect HHH prefixes in the one-minute interval that has just ended. For each threshold value φ, we choose an estimation error ε ten times smaller than φ and run our algorithm over 40 observation periods. The curves plotted in Figures 2 and 3 correspond to the means over the 40 observation periods. To assess the accuracy of the algorithm, we use the naive binary-tree solution to compute the true HHHs. We also compare our results against the algorithm of Zhang et al. [5], since our algorithm uses the same expansion smoother technique.

From Figures 2 and 3, we observe that the false positive and false negative rates are quite small (about 3%) and are comparable for both algorithms. Of course, the accuracy is much better when φ is small; our algorithm gives slightly better results than the algorithm of Zhang et al. To estimate the gain in memory compared with the naive binary-tree approach, Figure 4 displays the ratio

(number of nodes in our algorithm's tree) / (number of nodes in the naive binary trie).

It clearly appears from this figure that we substantially gain memory efficiency by using a sufficiently large threshold φ. If we take φ = 1%, we obtain a substantial gain in memory while keeping relatively small false positive and false negative rates.

Fig. 2. False positive rate (our algorithm vs. Zhang et al.'s algorithm)

Fig. 3. False negative rate (our algorithm vs. Zhang et al.'s algorithm)

Fig. 4. Space usage of our algorithm

V. APPLICATION TO BRUTE-FORCE VOLUME ATTACK DETECTION

In this section, we describe how our HHH detection algorithm can be used as a pre-filtering module to detect network anomalies in the case of a brute-force volume attack, or in the case of a DDoS attack, by capturing the attacking sources in a single cluster. While examining the HHH prefixes over consecutive observation periods, we note that they remain nearly identical: from one period to the next, an HHH prefix only differs by a few one-bit hierarchy levels, and in most cases we obtain the same list of HHHs modulo a few bits. Hierarchically clustering IP traffic thus gives us a good view of the traffic composition in terms of the largest bandwidth contributors.

On the basis of this observation, we can perform change detection on the HHH output. We define the normal HHH behavior by a mean µ and a standard deviation σ, which are estimated on-line. The quantity µ represents the mean volume above the threshold φ for the discounted volume of an HHH, and σ represents the corresponding standard deviation. We first compute the values of µ and σ from learning data. For an observation interval Ij, let Nj be the total traffic volume and x_j^i the discounted volume of an HHH i. We consider HHH i as suspect if the difference y_j^i = x_j^i − φNj exceeds the alarm threshold µ + Lσ, where L > 0. This parameter is used to detect traffic that is too far away from the mean value and therefore seems abnormal. The mean µ and the variance σ² are estimated by using an exponentially weighted moving average (EWMA) scheme, that is,

µ = λ y_j^i + (1 − λ) µ,   (1)
σ² = δ (y_j^i − µ)² + (1 − δ) σ²,   (2)

where δ and λ are EWMA factors. Typically, the so-called innovation parameters δ and λ take values between 0.01 and 0.3. The HHH i is declared abnormal if the alarm threshold is exceeded for a minimum number K > 1 of consecutive intervals. The parameter K is used to reduce false positives. In summary, the proposed anomaly detection algorithm makes use of the following parameters:
• the on-line estimated mean µ and variance σ²,
• the exceedance parameter L, which defines the exceedance level µ + Lσ,
• the exceedance tolerance K (an HHH is suspect if its discounted volume exceeds µ + Lσ in at least K consecutive intervals).

We evaluate our proposed anomaly detection algorithm with a NetFlow trace (from the France Telecom transit backbone network) for which anomalies are already known; these anomalies were detected by an exhaustive algorithm storing the volume of traffic for each level. We set the observation period equal to 60 seconds. We compute the alarm threshold µ + Lσ as specified by Equations (1) and (2). For each observation period, we also compute the mean of all the HHH volumes after subtracting the proportion of traffic corresponding to the threshold φ. The parameters of our method are: φ = 0.01, δ = 0.0225, λ = 0.15, and L = 2.4. Figure 5 shows our experimental results: anomalies are detected when the HHH behavior curve exceeds the alarm threshold curve.

Fig. 5. HHH anomalies (alarm threshold vs. HHH behaviour)
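One step of the EWMA scheme of Equations (1) and (2) combined with the exceedance test can be sketched as follows. The function and state names are our own; applying Equation (2) with the pre-update mean is one reasonable reading of the paper, and the default parameter values are those of Section V:

```python
def ewma_alarm_step(y, mu, var, L=2.4, delta=0.0225, lam=0.15):
    """One observation interval: test y = x - phi*N against mu + L*sigma,
    then update (mu, sigma^2) by Equations (1) and (2). The paper further
    requires K > 1 consecutive exceedances before declaring an anomaly."""
    exceeded = y > mu + L * var ** 0.5                    # exceedance test
    new_var = delta * (y - mu) ** 2 + (1 - delta) * var   # Eq. (2)
    new_mu = lam * y + (1 - lam) * mu                     # Eq. (1)
    return exceeded, new_mu, new_var
```

In practice, the caller would keep a per-HHH counter of consecutive exceedances and raise an alarm only once it reaches K.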

VI. CONCLUSION

Observing traffic at different aggregation levels provides valuable information for IP network management. We have proposed in this paper an efficient on-line algorithm for accurately detecting Hierarchical Heavy Hitters. The hierarchical analysis of network data, however, remains an open topic: instead of clustering along one dimension, we can generalize to multiple hierarchies (e.g., source and destination addresses) to find Multidimensional Hierarchical Heavy Hitters (see [8] for more details about the complexity of the problem). Moreover, as mentioned in [9], another challenge with hierarchical data is to find hierarchical deltoids, i.e., IP prefixes that differ significantly in volume from one time window to another.

REFERENCES

[1] C. Estan, S. Savage, and G. Varghese, "Automatically inferring patterns of resource consumption in network traffic," in Proceedings of ACM SIGCOMM, 2003.
[2] G. Cormode, F. Korn, S. Muthukrishnan, and D. Srivastava, "Finding hierarchical heavy hitters in data streams," in Proceedings of the International Conference on Very Large Data Bases, 2003, pp. 464–475.
[3] G. Manku and R. Motwani, "Approximate frequency counts over data streams," in Proceedings of the 28th International Conference on Very Large Data Bases, 2002, pp. 346–357.
[4] G. Cormode and S. Muthukrishnan, "An improved data stream summary: The count-min sketch and its applications," Journal of Algorithms, 2004.
[5] Y. Zhang, S. Singh, S. Sen, N. Duffield, and C. Lund, "Online identification of hierarchical heavy hitters: Algorithms, evaluation and applications," in Proceedings of IMC '04, October 2004.
[6] D. R. Morrison, "PATRICIA - Practical Algorithm To Retrieve Information Coded In Alphanumeric," Journal of the ACM, vol. 15, no. 4, pp. 524–534, October 1968.
[7] "Introduction to Cisco IOS NetFlow," http://www.cisco.com/application/pdf/en/us/guest/products/ps6601/c1244/cdccont 0900aecd80406232.pdf.
[8] J. Hershberger, N. Shrivastava, S. Suri, and C. D. Tóth, "Space complexity of hierarchical heavy hitters in multi-dimensional data streams," in Proceedings of PODS 2005, June 2005.
[9] G. Cormode and S. Muthukrishnan, "What's new: Finding significant differences in network data streams," in Proceedings of IEEE Infocom, 2004.
