Taming Large Classifiers with Rule Reference Locality

Hyogon Kim (1), Jaesung Heo (1), Lynn Choi (1), Inhye Kang (2), and Sunil Kim (3)

(1) Korea University, (2) University of Seoul, (3) Hongik University

Abstract. An important aspect of the packet classification problem on which little light has been shed so far is the rule reference dynamics. In this paper, we argue that for any given classifier there is likely a significant skew in the rule reference pattern. We term this phenomenon rule reference locality, which we believe stems from biased traffic patterns and/or the existence of "super-rules" that cover a large subset of the rule hyperspace. Based on this observation, we propose an adaptive classification approach that dynamically accommodates the skewed and possibly time-varying reference pattern. It is not a new classification method per se, but it can effectively enhance existing packet classification schemes, especially for large classifiers. As an instance, we present a new classification method called segmented RFC with dynamic rule base reconfiguration (SRFC+DR). When driven by several large real-life packet traces, it yields a several-fold speedup for 5-field 100K-rule classification compared with another scalable method, ABV. In general, we believe that exploiting rule reference locality is a key to scaling to a very large number of rules in future packet classifiers.

1 Introduction

Classifying incoming packets based on header fields is the first and most basic step for many networking functions such as Diff-Serv traffic conditioning, firewalls, virtual private networks (VPN), traffic accounting and billing, load balancing, and policy-based routing. These functions need to track flows, i.e., equivalence classes of packets, and give the same treatment to the packets in a flow. Since a flow is defined by the header values, a classifier's duty is to examine the values of packet header fields and then identify the corresponding flow. The rules that prescribe flow definitions and the associated actions reside in the rule base (a.k.a. "policy base" or "filter table") located in memory. Lying on the per-packet processing path, the classification function should minimize accesses to memory in order to maximize packet throughput. Of particular concern, therefore, is the case where the rule base becomes large and the search complexity increases. Although no such large classifiers exist as of today, rule bases of up to 100K [2] or even 1 million [1, 10] entries are believed to be of practical interest.

Recently, there has been active research on fast and/or scalable classification techniques [1, 2, 4, 5, 6, 7, 8, 9, 10]. With the exception of [9, 10], these works focus on relatively small classifiers, e.g., with fewer than 20K entries. A notable property shared by most of these works is that the dynamic aspect of rule reference behavior is not considered. For some, the lack of information on the reference pattern hardly causes a problem as long as classifiers are small. For instance, the time complexity of the RFC [1] algorithm is constant no matter what the rule base size is, given that table pre-computation succeeds within the given memory budget. For others, such as computational geometry formulations [5, 8, 9], it is inherently difficult to accommodate any dynamics in the reference (or "point location") pattern.

H.-K. Kahng (Ed.): ICOIN 2003, LNCS 2662, pp. 928-937, 2003. © Springer-Verlag Berlin Heidelberg 2003

As we deal with increasingly large classifiers, however, the rule reference dynamics can play a vital role in designing a scalable methodology. We firmly believe that a realistic assumption for any given classifier is that some rules are more popular than others, receiving more references. Moreover, we argue that we can even exploit this to improve the performance of large classifiers. In fact, this is the point of departure and the contribution of this paper. A novel aspect of our paper is that we go as far as reconfiguring the rule base itself, to adapt the classifier to whatever the rule reference pattern may be. Most classifier implementations already keep a usage counter for each rule for statistics collection purposes [11]. By dynamically reordering the rule base with respect to such usage counts, we can exploit the notion of reference skew and yield good average-case performance, so that we can scale to a very large number of rules. One constraint on rule movement is that rules should retain the same relative ordering relations, or "dependency", as before the reconfiguration. However, this constraint does not seem to prohibit the relatively free movement of rules required in our approach, because a survey of numerous real-life rule bases [1] shows that rule intersection is very rare.
We will elaborate on these issues in the rest of the paper. Note that we do not advocate a new data structure or a new search algorithm for the purpose of accelerating the rule search. Rather, we propose to dynamically reflect the rule reference pattern in the rule base structure so that search methods scale well with the number of rules. Also, rule base optimization by eliminating redundant rules is not the focus of this paper. Our approach works for any given rule base, whether it abounds in redundancy or not. Finally, in this paper, we focus on the problem of scalable classification for applications in which most packets get classified using non-default rules, such as firewalls.

This paper is organized as follows. In Section 2, we discuss a rule base reconfiguration method based on our approach. We also present a practical classification method where our approach is used. Section 3 shows experimental results. We demonstrate that our approach can easily reap a several-hundred-percent speedup over existing scalable methods. The paper concludes with Section 4.

2 Dynamic Rule Base Reconfiguration

There are a few causes for a skewed reference pattern in a rule base. First, the most frequently appearing IP protocol numbers at any observation locale are almost always TCP, UDP, and ICMP, although depending on the locale GRE, PIM, and others can exceed ICMP [3]. Second, probably the most direct cause of a skewed reference pattern is that within TCP and UDP there are dominantly popular port numbers. For TCP, for example, 80 (HTTP) is the unquestionable winner. Third, the IP address fields also show non-uniform reference patterns. Evidence of a non-uniform pattern at the network edge was first noted in [4]. Last but not least, there are rules with shorter address prefixes and wider port number ranges. These "super-rules" occupy a large subset of the rule base hyperspace and have the potential to capture more packets than others. Below, we design a method to exploit the skewed reference pattern.
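The degree of skew just described can be quantified directly from the per-rule usage counters that most implementations already maintain. The sketch below is an illustrative measurement of our own, not part of the paper's scheme; the function name and the synthetic reference stream are assumptions.

```python
from collections import Counter

def reference_skew(matched_rule_ids, top_k=10):
    """Fraction of all references captured by the top_k most-hit rules.

    matched_rule_ids: iterable of rule IDs, one per classified packet
    (e.g. collected from per-rule usage counters). A value near 1.0
    indicates strong rule reference locality.
    """
    counts = Counter(matched_rule_ids)
    total = sum(counts.values())
    top = sum(n for _, n in counts.most_common(top_k))
    return top / total if total else 0.0

# Illustrative synthetic stream: rule 0 (say, a TCP/80 "super-rule")
# dominates, and the remaining references are spread thinly.
stream = [0] * 80 + [1] * 10 + list(range(2, 12))
print(reference_skew(stream, top_k=2))  # 0.9: top 2 rules take 90% of hits
```

A skew near 1.0 for small top_k is exactly the situation the reconfiguration method below is designed to exploit.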

2.1 Rule Dependency

Let the depth of a rule r be defined as the number of rules preceding r when the rule base is sorted in decreasing order of priority (let us assume that a smaller number denotes higher priority). Therefore, in this paper we will equate the depth with the priority, which the system operator can arbitrarily assign so as to meet the needs of the network. Before discussing rule dependency, let us first define a few other terms.

Definition 1. We say that a rule r1 = (<F_i^1>, A1) overlaps with a rule r2 = (<F_i^2>, A2) if F_i^1 ∩ F_i^2 ≠ ∅ for all i, where F_i^k is the ith element of the flow definition and A_k the associated action of rule k. Intuitively, two rules overlap if there is any instance of packet header values that matches both. In our flow definition of <proto, srcIP prefix, srcport range, dstIP prefix, dstport range>, all 5 fields must have a non-null intersection for two rules to overlap. Usually, a mask is associated with the IP address prefix [2], and the IP address intersection is determined as follows:

    min_mask <- mask1 & mask2
    intersection? ((IPaddress1 & min_mask) = (IPaddress2 & min_mask))

where '&' is the bit-wise AND operator.

Definition 2. The strict ordering is a constraint under which two overlapping rules r1 and r2 such that priority(r1) > priority(r2) are prohibited from exchanging locations if as a consequence it becomes priority(r1) < priority(r2).

Definition 3. The loose ordering is a constraint under which two overlapping rules r1 and r2 such that priority(r1) > priority(r2) and A1 = A2 are prohibited from exchanging locations if as a consequence it becomes priority(r1) < priority(r2).

The loose ordering constraint can make rule relocation much easier than the strict ordering. In particular, a highly popular rule with a wildcard in one of its fields can be promoted past a less popular overlapping rule [3]. For instance, given


    r1 = (<TCP, *, *, 123.45.6.0/255.255.255.0, 80>, accept)
    r2 = (<TCP, 123.45.10.0/255.255.255.0, *, 123.45.6.0/255.255.255.0, 80>, accept)
    priority(r1) < priority(r2)

r1 cannot be promoted beyond r2 under the strict ordering constraint. On the other hand, we can freely change the priority relation (i.e., locations) under the loose ordering constraint. This may be important in gathering only the truly popular rules at the top locations of the rule base, thereby achieving a high "cache hit ratio". Since it is not very meaningful to assign actions to rules in the synthetic rule base, we enforce only the strict ordering rule in this paper. Rule base reconfiguration using the loose ordering constraint is a subject of future work.

A related and important observation of [1] is that in practice rule overlap is very rare. As we push up a rule, this implies, there are not many rules that have to be moved up piggybacked, even if we enforce the strict ordering instead of the loose ordering. Once the overlapping rules are known, we can compute the set of dependent rules D(r) for a rule r as D(r) = { r' : r' overlaps with r and priority(r') > priority(r) }. When we promote r, each rule in D(r) should also be promoted, and must retain a higher priority than r after the move. Then the complexity C of a single rule relocation, in terms of the number of comparison operations, is bounded as follows:

    O(Fd) = F(d - 1) ≤ C ≤ F · Σ_{k=1}^{d-1} k = F · d(d - 1)/2 = O(Fd²)

where the depth of the rule is d and F is the dimension. The worst case occurs when all (d − 1) rules above the relocated rule overlap with it, and the best case is when there is no overlap. The average number of comparisons is determined by the probability of overlap for a given pair of rules. It is very difficult to compute, but an important point is that once the rule dependency is computed, we can easily perform subsequent relocations by reusing the already-computed dependency relations. Excluding the additions of new rules, the comparison operations above need to be performed only for the first "epoch". Even if there are additions, the complexity of incrementally updating the dependency relations is O(N) for each addition. When we attempt a rule relocation, we retrieve its dependency relations in O(1) operations (because D(r) is small) and move all dependents along with it. In addition, in each epoch the entire rule base must first be sorted with respect to the reference count, with complexity O(N logN), where N is the rule base size. An algorithm for moving up a frequently referenced rule is described below.
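Definition 1's overlap test and the dependent set D(r) can be sketched as follows. The rule encoding (dicts with explicit 32-bit masks, (lo, hi) port ranges, and None for a wildcard) is our own illustrative choice, not the paper's data structure; the destination prefix of r2 is taken as overlapping the example's 123.45.6.0/24.

```python
def ip_overlap(ip1, mask1, ip2, mask2):
    # min_mask <- mask1 & mask2; two prefixes intersect iff both
    # addresses agree under the shorter (common) mask
    min_mask = mask1 & mask2
    return (ip1 & min_mask) == (ip2 & min_mask)

def range_overlap(r1, r2):
    # ports are (lo, hi) tuples; None denotes a wildcard
    if r1 is None or r2 is None:
        return True
    return r1[0] <= r2[1] and r2[0] <= r1[1]

def rules_overlap(a, b):
    # Definition 1: all 5 fields must have a non-null intersection
    proto_ok = a['proto'] is None or b['proto'] is None or a['proto'] == b['proto']
    return (proto_ok
            and ip_overlap(a['src_ip'], a['src_mask'], b['src_ip'], b['src_mask'])
            and ip_overlap(a['dst_ip'], a['dst_mask'], b['dst_ip'], b['dst_mask'])
            and range_overlap(a['sport'], b['sport'])
            and range_overlap(a['dport'], b['dport']))

def dependents(rule_base, i):
    """D(r): higher-priority rules (smaller index = higher priority)
    that overlap the rule at index i and must move with it."""
    return [j for j in range(i) if rules_overlap(rule_base[j], rule_base[i])]

# The r1/r2 pair from the example above (addresses as 32-bit integers;
# 0x7B2D0600 == 123.45.6.0, 0x7B2D0A00 == 123.45.10.0)
r1 = dict(proto='TCP', src_ip=0, src_mask=0, sport=None,
          dst_ip=0x7B2D0600, dst_mask=0xFFFFFF00, dport=(80, 80))
r2 = dict(proto='TCP', src_ip=0x7B2D0A00, src_mask=0xFFFFFF00, sport=None,
          dst_ip=0x7B2D0600, dst_mask=0xFFFFFF00, dport=(80, 80))
print(rules_overlap(r1, r2))    # True
print(dependents([r2, r1], 1))  # [0]: r2 is a dependent of r1
```

Note how a wildcard source (mask 0) makes min_mask zero, so the source fields of r1 and r2 trivially intersect, which is why "super-rules" overlap so many others.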

2.2 Reconfiguration Algorithm

The reconfiguration algorithm proceeds in two phases. In the first phase, we sort the rule base with respect to the reference counts recorded in the previous epoch. Then we attempt to move the top TOPHITS rules up. If the dependency relations are not available, we find the overlapping rules as we promote the popular rule, combine them into a group, and attempt to move up the group in its entirety. Otherwise, we omit the dependency computation. Since the traffic pattern can change over time, we need to run the reconfiguration algorithm periodically. Fig. 1 shows the reconfiguration algorithm for the first epoch, when the dependency relations are not available. sorted_rule_table is an array of size N, where each element has two pieces of information: rule ID (rule) and reference count (ref_count). As a result of sort(), the array is sorted in decreasing order of reference count. Starting from the most popular rule, we promote the TOPHITS rules one by one. For each promotion of a rule r located at the dth position, we examine the d − 1 higher-priority rules. If some higher-priority rule r' has a higher reference count, r is placed just below r' and the promotion stops there. In case there is an overlapping rule, it is combined with r in the same promotion group. Each rule in the promotion group needs to be checked for overlap with all higher-priority rules. highest_rule() returns the top rule in the block, while lowest_rule() returns the bottom rule. The bottom rule is the one we originally wanted to promote: since overlapping rules are stacked up on the popular rule, the lowest rule is the popular rule. Note that even if there is more than one rule in a promotion group, the reference count comparison is done only against the popular rule.

procedure reconfigure() {
    /* sort the rule base by reference counts in the last epoch */
    sorted_rule_table[] = sort(RULE_BASE);
    for (i = 0; i < TOPHITS; i++)   /* relocate top TOPHITS rules to the top */
        push_up({sorted_rule_table[i].rule});   /* move rule i as far up as possible */
}

procedure push_up(R) {   /* R is a set of overlapping rules */
    for (i = depth(highest_rule(R)) - 1; i >= 0; i--) {
        r = lowest_rule(R);   /* r is the rule we want to promote */
        if (RULE_BASE[depth(i)].ref_cnt > r.ref_cnt) {
            insert(R, i);   /* insert R below ith rule */
            return;         /* stop */
        }
        for each r in R {
            if (overlap?(r, RULE_BASE[depth(i)])) {   /* merge */
                push_up(combine(R, RULE_BASE[depth(i)]));
                return;
            }
        }
    }
}

Fig. 1. An example of the rule base reconfiguration algorithm
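For concreteness, the same idea can be rendered as runnable Python, under simplifying assumptions of our own: rules carry only an id and a ref_cnt, the overlap predicate is supplied by the caller, and the promotion group hops past non-overlapping, less-popular rules one position at a time (an iterative rendering of the recursive merge in Fig. 1, not the authors' implementation).

```python
def push_up(rules, pos, overlaps):
    """Promote the rule at index pos; overlapping higher rules join the
    group and stay above it, preserving the strict ordering constraint."""
    lo = hi = pos                        # group occupies rules[lo..hi]; rules[hi] is popular
    while lo > 0:
        above = rules[lo - 1]
        if above['ref_cnt'] > rules[hi]['ref_cnt']:
            break                        # a more popular rule blocks further promotion
        if any(overlaps(above, rules[k]) for k in range(lo, hi + 1)):
            lo -= 1                      # merge: the dependent rule moves with the group
        else:
            # hop the whole group above the non-overlapping rule
            rules[lo - 1:hi + 1] = rules[lo:hi + 1] + [above]
            lo -= 1
            hi -= 1
    return rules

def reconfigure(rules, overlaps, tophits=2):
    """Promote the tophits most-referenced rules, most popular first."""
    order = sorted(range(len(rules)), key=lambda i: rules[i]['ref_cnt'], reverse=True)
    for rule_id in [rules[i]['id'] for i in order[:tophits]]:
        pos = next(i for i, r in enumerate(rules) if r['id'] == rule_id)
        push_up(rules, pos, overlaps)
    return rules

# With no overlaps, the top rules simply bubble to the top of the base.
rules = [{'id': 'a', 'ref_cnt': 1}, {'id': 'b', 'ref_cnt': 5}, {'id': 'c', 'ref_cnt': 9}]
print([r['id'] for r in reconfigure(rules, lambda x, y: False)])  # ['c', 'b', 'a']
```

When two rules do overlap, the less popular one is carried along above its dependent rather than hopped over, which is exactly the piggybacked movement discussed in Section 2.1.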

3 Performance Evaluation

In this section, we apply our approach to a well-known classification method, RFC [1]. Since RFC cannot handle rule bases larger than 15,000 rules, we assume that such rule bases are segmented into smaller pieces. We refer to the segmented version of RFC as SRFC, and SRFC+DR is the version augmented with the proposed dynamic rule base reconfiguration. In this section, we evaluate the general effect of dynamic rule base reconfiguration, and then compare the performance of SRFC, SRFC+DR, and ABV. Since no large classifiers are publicly available, we synthesize them. We then pass several large real-life packet traces through the synthesized rule bases to measure the packet throughput with SRFC, SRFC+DR, and ABV. We first discuss how we steer the random rule base syntheses so as to maintain generality and fairness. We utilize several large real-life packet traces both in the rule base syntheses and in the evaluation of the three schemes.

3.1 Rule Base Synthesis

It is difficult to extract general characteristics of classifier rule bases. Rule base configuration is at the discretion of the system administrator, and it can take any structure according to the needs of a particular network. For fairness and generality, therefore, we opt to randomly generate test rule bases and repeat the performance evaluation for different rule bases a few times. This is the approach that many prior works take [1, 2, 9, 10]. We also make use of the observations made about real-life classifiers [1, 2] in order to generate rule bases as close to reality as possible. We follow these guidelines in rule generation, which accommodate some of the key observations in [1, 2]:

1. No rules can appear that do not match any packet in the test traffic trace.

2. In the IP address fields, a wildcard can appear with a certain fixed probability. Generally, in the IP address fields, a prefix of length 0 ≤ l(j) ≤ 32 can appear with a probability P(l(j)). The ABV paper [2] reports that in the industrial firewalls they used, most prefixes have either a length of 0 or 32, and that there are some prefixes with lengths of 21, 23, 24 and 30. In particular, the destination and source prefix fields in roughly half the rules are wildcarded [2]. We follow these guidelines when fixing P(l(j)).

3. The transport layer protocol field is restricted to a small set of values: TCP, UDP, ICMP, IGMP, (E)IGRP, GRE, IPINIP, and wildcard [1]. The IP protocol number is randomly chosen from a pool of these protocols, with a probability associated with each.

4. The transport port number(s) for a given rule can be arbitrary. The RFC paper [1] reports that roughly 10% of rules have range specifications, and in particular "> 1023" occurs in 9% of the rules. For simplicity, we assume that well-known port numbers are used as singletons (i.e., no range), and range specifications are generated only with ephemeral port numbers, with 10% probability.


Table 1. Traces used in the rule generation

Trace  Date                         Duration        Total no. of packets
T1     July 23-24, 11:01pm-4:01am   5 hrs.          159.3 million
T2     Dec. 14, 9:35am-12:13pm      2 hrs. 38 min.  211.9 million
T3     Dec. 17, 9:06am-10:31am      1 hr. 25 min.   109.6 million

5. There is one all-wildcard rule that matches all packets, and this rule is located at the bottom of the rule base. (In firewalls, for instance, this is the "all-deny" rule.)

Table 1 shows the size and the duration of each packet trace used in the rule generation process and the experiments.
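The guidelines above can be sketched as a generator. This is our own illustration: the protocol and prefix-length weights are stand-ins shaped by the cited observations, not the values actually used in the paper, and guideline 1 (trace-driven pruning of unmatched rules) is omitted.

```python
import random

PROTOCOLS = ['TCP', 'UDP', 'ICMP', 'IGMP', 'IGRP', 'GRE', 'IPINIP', '*']
PROTO_WEIGHTS = [40, 25, 10, 5, 5, 5, 5, 5]        # assumed bias toward TCP/UDP
PREFIX_LENGTHS = [0, 21, 23, 24, 30, 32]           # lengths reported in [2]
PREFIX_WEIGHTS = [50, 2, 2, 10, 2, 34]             # ~half wildcarded, most rest 0/32
WELL_KNOWN_PORTS = [80, 25, 53, 23, 110]           # illustrative singleton ports

def random_prefix():
    # returns (address, prefix length); length 0 is the wildcard
    l = random.choices(PREFIX_LENGTHS, PREFIX_WEIGHTS)[0]
    mask = (0xFFFFFFFF << (32 - l)) & 0xFFFFFFFF if l else 0
    return (random.getrandbits(32) & mask, l)

def random_port():
    if random.random() < 0.10:
        return (1024, 65535)           # "> 1023"-style range spec, ~10% of rules
    p = random.choice(WELL_KNOWN_PORTS)
    return (p, p)                      # well-known ports as singletons

def synthesize(n):
    """Generate n rules; the last one is the all-wildcard bottom rule."""
    rules = [{'proto': random.choices(PROTOCOLS, PROTO_WEIGHTS)[0],
              'src': random_prefix(), 'sport': random_port(),
              'dst': random_prefix(), 'dport': random_port()}
             for _ in range(n - 1)]
    rules.append({'proto': '*', 'src': (0, 0), 'sport': (0, 65535),
                  'dst': (0, 0), 'dport': (0, 65535)})  # guideline 5: "all-deny"
    return rules
```

In an actual experiment, each generated rule would additionally be checked against the test trace and regenerated if it matches no packet (guideline 1).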

3.2 Performance Comparison of SRFC, SRFC+DR, and ABV

The primary metric that we use in this paper is the number of memory accesses. Before discussing this per-packet processing overhead, let us briefly consider the additional memory and table computation overhead of SRFC+DR. In terms of memory usage, there is no additional overhead compared with SRFC [1], except for the temporary memory required for rule sorting and relocation. As far as relocation is concerned, we can apply some optimization when rule movement is limited: namely, we can re-compute only the partitions affected by the movement. By maintaining a small free space in each partition for incoming rules, we can avoid cascading movement of rules and subsequent re-computation [12]. As for table computation time, the known sorting complexity is O(N logN), and the reconfiguration complexity in terms of comparison operations is O(nFd²) and O(nFd) without and with dependency information, respectively, as discussed in Section 2.1. Note that n is the number of rules to promote, which can be configured to be small when the reference skew is severe.

Figure 2 compares the average number of memory accesses for SRFC, SRFC+DR and ABV. For our 5-field classification, SRFC needs 13 memory accesses for each segmented table lookup. Although there is one more field (i.e., source port) compared with [1], the number of tables to look up is the same [12]. The numbers for SRFC+DR include the distances before the first reconfiguration as well as after. For 64- and 128-bit ABV, we used a 2-level ABV hierarchy (i.e., BV-ABV-AABV). For 32-bit ABV, we used a 3-level ABV hierarchy (i.e., BV-ABV-AABV-AAABV). For SRFC and SRFC+DR, X(Y) denotes the number of memory accesses X and the number of rule base partitions Y. For t1, the total number of rules is 100K, so Y = 10 means that each rule base partition contains 10K rules. For ABV and ABV+DR, X(Y) denotes the number of memory accesses X and the memory bus width Y.
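Given the stated 13 memory accesses per segmented table lookup, SRFC's location dependence admits a back-of-the-envelope model. This is our own simplification, assuming partitions are probed in priority order until the matching rule's partition is reached:

```python
ACCESSES_PER_LOOKUP = 13  # per segmented RFC table lookup, as stated above

def srfc_expected_accesses(num_partitions):
    """Expected accesses when the matching rule is uniformly distributed
    over the partitions: the average number of partitions probed is
    (1 + 2 + ... + Y) / Y = (Y + 1) / 2."""
    return ACCESSES_PER_LOOKUP * (num_partitions + 1) / 2

print(srfc_expected_accesses(10))  # 71.5
print(srfc_expected_accesses(20))  # 136.5
```

Under this toy model, doubling the partition count roughly doubles the expected cost, which is consistent with SRFC's near-linear growth in Table 2; reference skew (and hence SRFC+DR) breaks the uniform-location assumption and pulls the average far below (Y + 1)/2 partitions.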
We experimented with the three most probable widths: Y = 32, 64, and 128. To our great surprise, ABV shows the worst performance among the three experimented schemes. In particular, even SRFC in the 10K-rule segmented


Table 2. Average number of memory accesses per classification for t1 with 100K-rule bases

Rule Set  SRFC           SRFC+DR       ABV
r0        71.617 (10)    18.225 (10)   150.015 (32)
          136.760 (20)   24.194 (20)   142.469 (64)
                                       115.323 (128)
r1        51.004 (10)    16.907 (10)   161.574 (32)
          101.848 (20)   21.337 (20)   143.018 (64)
                                       122.780 (128)
r2        71.751 (10)    18.786 (10)   202.554 (32)
          136.402 (20)   25.176 (20)   171.376 (64)
                                       136.968 (128)
r3        72.057 (10)    18.696 (10)   154.578 (32)
          134.974 (20)   24.889 (20)   140.767 (64)
                                       115.525 (128)
r4        59.718 (10)    17.625 (10)   169.281 (32)
          112.458 (20)   22.896 (20)   145.234 (64)
                                       121.301 (128)
Avg.      65.229 (10)    18.048 (10)   167.600 (32)
          124.488 (20)   23.698 (20)   148.573 (64)
                                       122.379 (128)

configuration outperforms ABV. In the 5K-rule segmentation, which is much safer in terms of memory usage, SRFC records results comparable (2 better, 1 almost equal, 2 worse) with the best ABV configuration, which uses a 128-bit memory bus width. An examination of the 50K-rule cases in Fig. 11-2 reveals an even larger performance gap: in both the 5K- and 10K-rule segmentations, SRFC performs better than ABV. This seems to demonstrate that, up to a certain rule base size, simply by partitioning the rule base and thereby avoiding possible memory explosion [1], RFC can outperform ABV.

Whether the segmented variant of RFC (SRFC) will retain its performance edge over ABV is not certain when rule bases scale beyond 100K rules. Since SRFC is location-dependent (unlike RFC itself), the number of memory accesses required for a (randomly located) matching rule scales linearly with the rule base size. In contrast, ABV performance is not so location-dependent, thanks to its rule sorting algorithm [2], so the performance gap between SRFC and ABV might close with larger rule base sizes. In fact, the ratio of the best numbers of each scheme with growing rule base size seems to agree with our expectation:

    ABV(50K, 128) / SRFC(50K, 5) = 2.37
    ABV(75K, 128) / SRFC(75K, 8) = 2.01
    ABV(100K, 128) / SRFC(100K, 10) = 1.88

The scalability concern for SRFC largely disappears with SRFC+DR. This is because the rule reference pattern under the rule base reconfiguration is no longer arbitrary. Rather, SRFC+DR continually condenses the most popular rules to


the top partitions, so SRFC+DR is much less location-dependent than SRFC. In fact, the best-performance ratio above increases with the number of rules under SRFC+DR, as we can see below:

    ABV(50K, 128) / SRFC+DR(50K, 5) = 5.566
    ABV(75K, 128) / SRFC+DR(75K, 8) = 6.578
    ABV(100K, 128) / SRFC+DR(100K, 10) = 6.780

Therefore, SRFC+DR maintains or even strengthens its lead over ABV as the rule base size increases. The performance ratio exceeds 550% for rule base sizes larger than 50K. As for SRFC, it is slower than SRFC+DR by at least a factor of 2.5. The narrowest performance gap is for the 5-partition 50K-rule bases, and the gap widens as the number of rules increases. For the 5-partition 100K-rule bases, the speedup is in excess of 3.5. The gap widens even more as the number of partitions doubles, due to SRFC's scalability limitation, and the speedup is over 5. SRFC memory accesses increase almost linearly with the number of partitions, whereas SRFC+DR exhibits much slower growth. Finally, we observe that ABV improves with larger memory bandwidth. However, the improvement is linear rather than exponential with exponentially growing memory bandwidth.
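The 100K-rule ratios quoted above can be recomputed directly from the Table 2 averages; the snippet below simply checks that arithmetic.

```python
# Average memory accesses per classification for t1, 100K rules (Table 2):
abv_128    = 122.379  # ABV, 128-bit memory bus
srfc_10    = 65.229   # SRFC, 10 partitions
srfc_dr_10 = 18.048   # SRFC+DR, 10 partitions

print(round(abv_128 / srfc_10, 2))     # 1.88 = ABV(100K, 128) / SRFC(100K, 10)
print(round(abv_128 / srfc_dr_10, 2))  # 6.78 = ABV(100K, 128) / SRFC+DR(100K, 10)
```

The second ratio, 6.78, is the 550%+ speedup figure cited in the text.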

4 Conclusion

This paper illuminates the value of a new aspect of the packet classification problem, i.e., rule reference dynamics. Circumstantial evidence points to the possibility that any classifier will exhibit a (highly) skewed rule reference pattern. Based on this premise, we design a classification scheme that captures the reference pattern of the subject packet stream and reconstitutes its own rule base. With several large rule bases synthesized from real-life packet traces, we demonstrate that this approach indeed yields a few-hundred-percent speedup over other scalable schemes, such as ABV. We firmly believe that exploiting rule reference locality is a key to scaling to a very large number of rules in future packet classifiers.

A few issues still await further exploration. Event-triggered reconfiguration, as opposed to periodic reconfiguration, and relocation with the loose ordering constraint instead of the strict ordering constraint are both expected to improve the performance of our approach. We will employ these new techniques in our future work, where we will attempt classification with 1 million rules and beyond. Finally, the current scheme fits best with applications that classify most packets with non-default ("bottom") rules.

References

[1] P. Gupta and N. McKeown, "Packet classification on multiple fields," ACM SIGCOMM 1999.
[2] F. Baboescu and G. Varghese, "Scalable packet classification," ACM SIGCOMM 2001.


[3] Flow analysis of passive measurement data, http://pma.nlanr.net/PMA/Datacube.html.
[4] M. Poletto et al., "Practical Approaches to Dealing with DDoS Attacks," NANOG22, May 2001.
[5] T. V. Lakshman and D. Stiliadis, "High-speed policy-based packet forwarding using efficient multi-dimensional range matching," ACM SIGCOMM 1998, pp. 191-202.
[6] V. Srinivasan, S. Suri, G. Varghese, and M. Waldvogel, "Fast and scalable layer 4 switching," ACM SIGCOMM 1998, pp. 203-214.
[7] V. Srinivasan, G. Varghese, and S. Suri, "Fast packet classification using tuple space search," ACM SIGCOMM 1999, pp. 135-146.
[8] M. M. Buddhikot, S. Suri, and M. Waldvogel, "Space decomposition techniques for fast layer-4 switching," Protocols for High Speed Networks, vol. 66, no. 6, pp. 277-283, 1999.
[9] A. Feldmann and S. Muthukrishnan, "Tradeoffs for packet classification," IEEE Infocom 2000.
[10] T. Woo, "A modular approach to packet classification: algorithms and results," IEEE Infocom 2000.
[11] P. Gupta and N. McKeown, "Packet classification using hierarchical intelligent cuttings," Hot Interconnects VII, 1999.
[12] H. Kim, "Exploiting reference skew in large classifiers for scalable classifier design," technical report, 2001.
[13] K. Houle et al., "Trends in Denial of Service Attack Technology," CERT Coordination Center, Oct. 2001. Available at http://www.cert.org/archive/pdf/DoS_trends.pdf.
