tuple pruning using bloom filters for packet classification - CSIE -NCKU

[3B2-14]

mmi2010030048.3d

28/6/010

13:17

Page 48

Tuple Pruning Using Bloom Filters for Packet Classification

Tuple pruning for packet classification provides fast search and low implementation complexity. The tuple pruning algorithm reduces the search space to a subset of tuples determined by individual field lookups that cause off-chip memory accesses. The authors propose a tuple-pruning algorithm that reduces the search space through Bloom filter queries, which do not require off-chip memory accesses.

Hyesook Lim So Yeon Kim Ewha Womans University

Packet classification enables routers to support various value-added services, such as blocking traffic from insecure sites, giving preferential treatment to premium traffic, and routing based on traffic type and source [1]. Routers classify arriving packets by comparing them to a set of predefined rules and finding the highest priority rule or the rule that best matches the packet header fields (the best matching rule, or BMR). A rule consists of a set of fields made up of the IP source prefix, the IP destination prefix, the source port range, the destination port range, and the protocol type and flags. The difficulty of packet classification lies in performing multiple field lookups at wire speed for every incoming packet, given that the packet arrival rate can be several million packets per second.

Various algorithms have attempted to find an effective solution. Most of these efforts use a small, high-bandwidth on-chip memory while locating the rule database in a slower, higher-capacity off-chip memory [2]. Many packet classification algorithms, such as tuple space pruning [3], cross-producting [4], coarse-grained tuple space [5], and modified cross-producting [6], perform a separate lookup on each field to narrow the search space. Hence, these algorithms cause off-chip memory accesses for both the individual field lookups and the final combined lookup. We propose to replace each field lookup with an on-chip Bloom filter and to add a new tuple Bloom filter to further reduce unnecessary off-chip memory accesses. The proposed idea can be applied to any of the above algorithms. Here, we show how we apply it to tuple space pruning. (See the "Related work in using tuple spaces for packet classification" sidebar for a discussion of this approach.)
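As a point of reference for the classification problem described above, the following is a minimal sketch of the naive approach: compare the packet against every rule and keep the highest priority match. The `Rule` class, the field encoding, and the convention that a smaller `priority` value means higher priority are assumptions made for illustration; they are not taken from the article.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rule:
    priority: int                 # assumed convention: smaller value = higher priority
    src: str                      # source prefix, e.g. "10.1.0.0/16"
    dst: str                      # destination prefix
    sport: Tuple[int, int]        # inclusive source port range (start, end)
    dport: Tuple[int, int]        # inclusive destination port range (start, end)
    proto: Optional[int]          # protocol number, or None as a wildcard


def matches(rule, src, dst, sport, dport, proto):
    """True if every header field of the packet falls inside the rule's fields."""
    return (ip_address(src) in ip_network(rule.src)
            and ip_address(dst) in ip_network(rule.dst)
            and rule.sport[0] <= sport <= rule.sport[1]
            and rule.dport[0] <= dport <= rule.dport[1]
            and rule.proto in (None, proto))


def best_matching_rule(rules, src, dst, sport, dport, proto):
    """Linear search over all rules; returns the highest priority match or None."""
    hits = [r for r in rules if matches(r, src, dst, sport, dport, proto)]
    return min(hits, key=lambda r: r.priority, default=None)


rules = [
    Rule(0, "10.1.0.0/16", "192.168.0.0/24", (0, 65535), (80, 80), 6),
    Rule(1, "10.0.0.0/8", "0.0.0.0/0", (0, 65535), (0, 65535), None),
]
print(best_matching_rule(rules, "10.1.2.3", "192.168.0.9", 1234, 80, 6))
```

A linear scan like this touches every rule for every packet, which is exactly the cost that pruning the search space is meant to avoid.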

Published by the IEEE Computer Society. IEEE Micro, May/June 2010.

0272-1732/10/$26.00 © 2010 IEEE

Sidebar: Related work in using tuple spaces for packet classification

A tuple space search algorithm [1] converts a packet classification problem into multiple exact-match problems in tuple space. A tuple is defined as a vector of D lengths for D-dimensional (or D-field) packet classification. It is denoted as (i1, i2, ..., iD), where id (for d = 1, ..., D) is the length of the dth field. For example, a 2D packet classification using two IP prefix fields can have 33 × 33 different tuples, given prefix lengths of 0 to 32 bits. Hence, a packet arriving at a link can be queried against each tuple. In searching a tuple, we can use an exact-match method, such as hashing, since a tuple is composed of known lengths.

Because it is too time consuming to query every tuple in classifying a packet, Srinivasan et al. introduced a practical solution called tuple space pruning (TSP) [1]. Assume a two-field TSP algorithm, which, for a given input, performs individual field lookups. If it finds matches at p different lengths for one field and at q different lengths for the other field, the number of intersected tuples that should be queried is reduced to p × q. Additionally, because many of these tuples are likely to be inactive (that is, not associated with any rule), the number usually becomes less than p × q. Another interesting pruning approach is to partition the tuple space into coarse-grained tuples with dissimilar rules to limit the number of subsets to be searched [2]. Yet another pruning approach uses precomputed markers to direct the search to the next tuple in the search space [3].

The TSP algorithm is easy to realize and provides fast search performance. However, it can be improved. First, it requires individual field lookups, which cause off-chip memory accesses. Second, the intersected list of tuples includes unnecessary tuples, since each tuple is generated only by combining the lengths without considering the values. We describe these issues in more detail in the main article.

Sidebar references

1. V. Srinivasan, S. Suri, and G. Varghese, "Packet Classification Using Tuple Space Search," Proc. ACM SIGCOMM, ACM Press, 1999, pp. 135-146.
2. H. Song, J. Turner, and S. Dharmapurikar, "Packet Classification Using Coarse-Grained Tuple Spaces," Proc. Architecture for Networking and Comm. Systems (ANCS), ACM Press, 2006, pp. 41-50.
3. P. Wang et al., "Scalable Packet Classification for Enabling Internet Differentiated Services," IEEE Trans. Multimedia, vol. 8, no. 6, 2006, pp. 1239-1249.

Bloom filter theory

Many current networking applications use Bloom filters [5-7]. A Bloom filter is a space-efficient data structure consisting of a bit vector that is used to test whether an element is a member of a set. A Bloom filter supports two operations: programming and querying.

Programming

A Bloom filter, which represents a set P = {x1, x2, ..., xn} of n elements, is described by an array of m bits, initially all set to 0. For each element in P, k different hash functions are computed in such a way that each resulting hash index j (which is a pointer to a Bloom filter bit location) is in the range 0 ≤ j < m. All the Bloom filter bit locations corresponding to these indices are set to 1.

Querying

To test whether an element x is a member of P, we perform the following query. For input x, we generate k hash indices using the same hash functions we used to program the filter. We then check the bit locations in the Bloom filter corresponding to the hash indices. If at least one of the locations is 0, the element is definitely not a member of P: if it were a member of the set, every bit location corresponding to its hash indices would have been set to 1 during programming. This result is called negative. If all the bit locations are set to 1, the result is called positive. However, even if all the bit locations are set to 1, they might not have been set by the element being queried. Some other elements could have set them. This type of positive result is termed a false positive. On the whole, a Bloom filter might produce false positives but never false negatives.
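The programming and querying operations of a Bloom filter can be sketched in a few lines. This is a minimal illustration, not the hardware structure used in the article: the class name, the method names, and the way the k hash indices are derived (salting SHA-256 with the hash function number) are assumptions chosen only to keep the example self-contained.

```python
import hashlib


class BloomFilter:
    """An m-bit Bloom filter with k hash functions (illustrative sketch)."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m          # all bits start at 0

    def _indices(self, item):
        # Assumed hashing scheme: salt SHA-256 with the hash function number
        # and reduce modulo m, so that every index j satisfies 0 <= j < m.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def program(self, item):
        """Set the k bit locations selected by the hash indices of item."""
        for j in self._indices(item):
            self.bits[j] = 1

    def query(self, item):
        """False means definitely absent; True means present or a false positive."""
        return all(self.bits[j] == 1 for j in self._indices(item))


bf = BloomFilter(m=64, k=2)
for prefix in (b"00", b"1", b"01", b"101"):
    bf.program(prefix)
print([bf.query(p) for p in (b"00", b"1", b"01", b"101")])  # programmed elements are always positive
```

Querying an element that was never programmed may still return True; that is the false positive case, and its likelihood depends on m, k, and the number of programmed elements.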

The proposed algorithm

Here, we show the 2D version of our proposed algorithm. We can extend this algorithm to an arbitrary number of fields. Figure 1 shows the overall architectures of the tuple space pruning (TSP) algorithm and our proposed algorithm (using the source and destination prefix fields). Compared to TSP, the proposed architecture replaces each individual field lookup with a Bloom filter: either a source Bloom filter (src-Bloom filter) or a destination Bloom filter (dst-Bloom filter). We also add a tuple Bloom filter (tuple-Bloom filter).

Figure 1. A comparison of the overall structures of the algorithms: the tuple space pruning (TSP) algorithm (a), and the proposed tuple pruning algorithm (b). In (a), the input packet goes to a source prefix lookup and a destination prefix lookup, each using an off-chip table; the matched lengths form an intersected list that is searched in the off-chip hash table. In (b), the input packet goes to an on-chip source Bloom filter and an on-chip destination Bloom filter; the positive lengths form an intersected list that is passed through the on-chip tuple Bloom filter, and only the positive tuples are searched in the off-chip hash table.

When programming a fixed number of elements into a Bloom filter, the Bloom filter's size and the number of hash functions affect performance, which is determined by the false positive rate. Analytic discussions of Bloom filter characteristics are available elsewhere [7, 8]. Determining the optimum size of a Bloom filter and the optimum number of hash functions needed to minimize the false positives is beyond this article's scope. Because using between 2 and 5 hash functions for a small Bloom filter (about 4 to 8 times the given number of elements) minimizes the probability of a false positive [7], for simplicity we describe the proposed algorithm using an example with fixed-size Bloom filters and two hash functions.

Any arbitrary hash generator can serve as a hash function. The Bloom filter used in the proposed algorithm must accommodate prefixes of arbitrary lengths in a single Bloom filter, so it requires a hash generator that produces hash indices for variable-length inputs. A cyclic redundancy check (CRC) generator suits our purpose [9]. CRC generators scramble the bits of a given input and produce a fixed-length binary sequence known as a CRC code, regardless of the input length. We can easily obtain any number of hash indices (each of which is used as a pointer to a Bloom filter bit location or to a hash table entry) from the generated CRC code by selecting different combinations of bits.

Let Pt be a rule set composed of source and destination prefix pairs, and Lt be the set of the distinct length pairs of Pt. If Pt = {R1(00*, *), R2(1*, 00*), R3(01*, 100*), R4(101*, 100*), R5(101*, 11*), R6(1*, *)}, then Lt = {(2, 0), (1, 2), (2, 3), (3, 3),

(3, 2), (1, 0)}. Let P1 and P2 be the sets of distinct source prefixes and distinct destination prefixes included in Pt, and let L1 and L2 be the corresponding sets of distinct lengths. Then, P1 = {00*, 1*, 01*, 101*}, P2 = {*, 00*, 100*, 11*}, L1 = {1, 2, 3}, and L2 = {0, 2, 3}.

Figure 2 shows the detailed structure of the proposed algorithm. Figure 2a shows the CRC-8 generator. A random number (11111100 in this example) initializes the registers of the 8-bit CRC generator. Figure 2b shows the proposed architecture programmed for Pt. Figure 2c shows the hash table's resultant entry structure when we use the two prefix fields for tuple pruning. It is a simple hash table storing 5D rules without any additional data structure. Each entry is 22 bytes wide. A port range is represented by a start and an end in a single entry. We assume the rules mapped to the same hash table entry are stored in a linked list in decreasing order of priority (as shown in Figure 2b for rules R1 and R6), and the last field stores a pointer for the linked list.

When programming the src-Bloom filter for P1, we enter a source prefix bit serially

Figure 2. The proposed architecture programmed for an example rule set: the cyclic redundancy check (CRC)-8 generator (a), the programmed Bloom filters and the off-chip hash table (b), and the entry structure of the off-chip hash table (c).

(a) CRC-8 generator: an 8-bit register (bit 0 to bit 7) fed serially by the input.

(b) On-chip Bloom filters and off-chip hash table, each indexed through a CRC-8 generator:
  src-BF (bits 0 to 7), L1 = {1, 2, 3}: 0 1 1 0 1 0 1 1
  dst-BF (bits 0 to 7), L2 = {0, 2, 3}: 0 1 1 1 0 0 1 1
  tuple-BF (bits 0 to 15), Lt = {(1,0), (1,2), (2,0), (2,3), (3,2), (3,3)}: 0 0 1 1 1 0 0 0 0 1 0 1 1 1 0 1
  Off-chip hash table (entries 0 to 7): entry 1 holds R5, entry 3 holds R2, entry 4 holds R4, entry 5 holds R3, and entry 7 holds R1 linked to R6; entries 0, 2, and 6 are empty.

(c) Hash table entry fields: entry valid (1 bit), rule no. (13 bits), source prefix length (6 bits), source prefix (32 bits), destination prefix length (6 bits), destination prefix (32 bits), source port start (16 bits), source port end (16 bits), destination port start (16 bits), destination port end (16 bits), protocol type (8 bits), protocol wild (1 bit), linked list (13 bits).

into the CRC generator. After entering the last bit of the prefix, we obtain a CRC code. We choose two hash indices from the generated CRC code, set the corresponding Bloom filter bits, and repeat this procedure for all elements in P1. We also record L1 for use in the search procedure. Similarly, we program P2 into the dst-Bloom filter and record L2. Table 1 shows the CRC codes and hash indices we used for programming the proposed architecture. We programmed the 8-bit src-Bloom filter and dst-Bloom filter in Figure 2b using two hash indices chosen from the first three and the last three bits of the CRC code. We can use any combination of the two prefixes to program the tuple-Bloom filter. Here we use the concatenated strings of the two prefixes. For the 16-bit tuple-Bloom filter shown in Figure 2b, we chose the hash indices from the first and the last four bits of


Table 1. The distinct prefixes (or concatenated prefixes), cyclic redundancy check (CRC) codes, and hash indices used for programming the proposed architecture.

Type         Rule  Tuple   Distinct prefix or    CRC code  Bloom filter       Hash table
                           concatenated prefix             indices (decimal)  index (decimal)
-----------  ----  ------  --------------------  --------  -----------------  ---------------
Source       N/A   N/A     00*                   00111111  1, 7               N/A
             N/A   N/A     1*                    11001111  6, 7               N/A
             N/A   N/A     01*                   10001110  4, 6               N/A
             N/A   N/A     101*                  01000010  2, 2               N/A
Destination  N/A   N/A     00*                   00111111  1, 7               N/A
             N/A   N/A     100*                  11110011  7, 3               N/A
             N/A   N/A     11*                   01010110  2, 6               N/A
Tuple        R1    (2, 0)  00*                   00111111  3, 15              7
             R2    (1, 2)  100*                  11110011  15, 3              3
             R3    (2, 3)  01100*                10111101  11, 13             5
             R4    (3, 3)  101100*               00100100  2, 4               4
             R5    (3, 2)  10111*                11111001  15, 9              1
             R6    (1, 0)  1*                    11001111  12, 15             7

the CRC code, as Table 1 shows. Assuming that the number of hash table entries is 2^⌈log2 N⌉, where N is the number of rules, we need a 3-bit hash index to store a rule into the hash table. We programmed the hash table in Figure 2b using the last three bits of the CRC code, as shown in Table 1.

Figure 3 describes the proposed algorithm's search procedure. Assume a search where the input length is 4 bits (it is 32 bits in IPv4) and the input packet has a source and destination address pair (A1, A2) = (0100, 1001). Table 2 shows the CRC codes (generated when we enter the substrings of the input into the CRC generator), the corresponding Bloom filter indices, and the Bloom filter results. For L1 = {1, 2, 3}, referring to the src-Bloom filter in Figure 2b, the 1-bit string of the source has a negative result, since the bit value of entry 3 (one of the Bloom filter indices of 0* in Table 2) is zero. The 2- and 3-bit strings have positive results, so L1(A1) = {2, 3}. Similarly, for L2 = {0, 2, 3}, L2(A2) = {0, 2, 3}. Note that a zero length cannot be programmed to the Bloom filter, so it is always treated as positive. Therefore, the intersected list is L1(A1) × L2(A2) = {(2, 0), (2, 2), (2, 3), (3, 0), (3, 2),

(3, 3)}. However, the (2, 2) and (3, 0) tuples are inactive tuples that are not included in Lt. Hence, the candidate list is Lc = {(2, 0), (2, 3), (3, 2), (3, 3)}. Table 2 also shows the queried tuples, the concatenated string for each tuple, the corresponding CRC codes, the Bloom filter indices, and the Bloom filter results. Since tuples (2, 0), (3, 2), and (3, 3) turn out to be negative when we check the tuple-Bloom filter bits in Figure 2b at the Bloom filter indices in Table 2, the hash table is not accessed for these tuples. Hence, the off-chip hash table needs to be accessed only once, for the tuple (2, 3). Table 2 also shows the corresponding hash table index. The algorithm compares the input to the entry at index 5 of the hash table shown in Figure 2b and finds that it matches R3 in the first two fields. The rule R3 is returned as the BMR if all the remaining fields also match.

If we apply the TSP algorithm to the same example, the source lookup against P1 produces a match of length 2, and the destination lookup against P2 produces matches of lengths 0 and 3. Each lookup will cause at least 4 to 5 off-chip memory accesses. The intersected list of tuples is (2, 0) and (2, 3), and the TSP algorithm will access the hash table for these two tuples.


1D_Bloom_filter_Search (Ai) {
    Li(Ai) = empty;
    for (l = 32; l > 0; l--) {
        if (l is not a member of Li) continue;      // l is an inactive length, so skip it
        CRC_code = CRC_32 (S(Ai, l));
        ind1 = CRC_code [31 : 31 - li + 1];         // most significant li bits of the CRC code, where the size of
                                                    // a src-Bloom filter (or dst-Bloom filter) is 2^li
        ind2 = CRC_code [li - 1 : 0];               // least significant li bits
        if ((1D_Bloom_filter[ind1] & 1D_Bloom_filter[ind2]) == 1)   // positive
            put l into Li(Ai);
    }
    if (0 is a member of Li) put 0 into Li(Ai);     // a zero length is always positive
    return Li(Ai);
}

Search (in_packet) {
    BMR = N - 1;                                    // default BMR is the lowest priority rule
    L1(A1) = 1D_Bloom_filter_Search (A1);           // L1(A1) = {l | l ∈ L1 and S(A1, l) is a positive}
    L2(A2) = 1D_Bloom_filter_Search (A2);           // L2(A2) = {l | l ∈ L2 and S(A2, l) is a positive}
    Lc = Lt ∩ (L1(A1) × L2(A2));                    // L1(A1) × L2(A2) is the intersected set of L1(A1) and L2(A2)
    while (Lc is not empty) {                       // for each element (l1, l2) of Lc
        tuple_value = concate (S(A1, l1), S(A2, l2));
        CRC_code = CRC_64 (tuple_value);
        ind1 = CRC_code [63 : 63 - lt + 1];         // most significant lt bits of the CRC code, where the size of
                                                    // the tuple-Bloom filter is 2^lt
        ind2 = CRC_code [lt - 1 : 0];               // least significant lt bits
        if ((tuple_Bloom_filter[ind1] & tuple_Bloom_filter[ind2]) == 1) {   // positive
            ind_hash = CRC_code [lh - 1 : 0];       // least significant lh bits of the CRC code, where the size of
                                                    // the hash table is 2^lh
            rule = Hash_Table [ind_hash];
            if ((in_packet matches rule) && (priority(rule) is higher than BMR))
                BMR = priority(rule);
        }
        remove the current element from Lc;
    }
    return BMR;
}

Figure 3. The proposed algorithm’s search procedure, where A1 and A2 are the given source and destination addresses of an input packet, and S(Ai, l) is the substring of the most significant l bits of Ai.
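For readers who want to run the search procedure, the following is a small software sketch of it for the 4-bit example rule set. It is not the authors' implementation: it uses Python's `zlib.crc32` as a stand-in for the CRC generator (so its CRC codes and Bloom filter contents differ from those in Figure 2 and Tables 1 and 2), represents prefixes as bit strings, and keeps the off-chip hash table as a dictionary of buckets. All function and variable names are assumptions made for illustration.

```python
import zlib

# Example rule set Pt: (name, source prefix, destination prefix), highest priority first.
RULES = [("R1", "00", ""), ("R2", "1", "00"), ("R3", "01", "100"),
         ("R4", "101", "100"), ("R5", "101", "11"), ("R6", "1", "")]
FIELD_BITS, TUPLE_BITS, HASH_BITS = 3, 4, 3   # log2 of the filter and table sizes


def crc(text):
    """32-bit CRC of a bit string; a stand-in for the article's CRC generator."""
    return zlib.crc32(text.encode("ascii"))


def two_indices(text, nbits):
    """Two hash indices: the most and least significant nbits of the CRC code."""
    code = crc(text)
    return code >> (32 - nbits), code & ((1 << nbits) - 1)


def program(keys, nbits):
    bits = [0] * (1 << nbits)
    for key in keys:
        if key:                               # a zero-length prefix is never programmed
            for j in two_indices(key, nbits):
                bits[j] = 1
    return bits


def positive(bits, text, nbits):
    return all(bits[j] for j in two_indices(text, nbits))


src_bf = program({s for _, s, _ in RULES}, FIELD_BITS)
dst_bf = program({d for _, _, d in RULES}, FIELD_BITS)
tuple_bf = program({s + d for _, s, d in RULES}, TUPLE_BITS)
L1 = {len(s) for _, s, _ in RULES}
L2 = {len(d) for _, _, d in RULES}
LT = {(len(s), len(d)) for _, s, d in RULES}

hash_table = {}                               # bucket index -> rules in decreasing priority
for priority, (name, s, d) in enumerate(RULES):
    bucket = crc(s + d) & ((1 << HASH_BITS) - 1)
    hash_table.setdefault(bucket, []).append((priority, name, s, d))


def positive_lengths(bits, lengths, addr):
    """Lengths whose substring is positive; a zero length is always positive."""
    return {l for l in lengths if l == 0 or positive(bits, addr[:l], FIELD_BITS)}


def search(a1, a2):
    """Return (name of the best matching rule or None, number of hash table accesses)."""
    pairs = {(l1, l2) for l1 in positive_lengths(src_bf, L1, a1)
             for l2 in positive_lengths(dst_bf, L2, a2)}
    best, accesses = None, 0
    for l1, l2 in sorted(pairs & LT):         # drop inactive tuples
        key = a1[:l1] + a2[:l2]
        if not positive(tuple_bf, key, TUPLE_BITS):
            continue                          # filtered on chip, no off-chip access
        accesses += 1
        bucket = crc(key) & ((1 << HASH_BITS) - 1)
        for priority, name, s, d in hash_table.get(bucket, []):
            if s == a1[:l1] and d == a2[:l2] and (best is None or priority < best[0]):
                best = (priority, name)
    return (best[1] if best else None), accesses


print(search("0100", "1001"))
```

Because every Bloom filter positive is verified against the stored prefixes, false positives can only add hash table accesses; they cannot change which rule is returned. The access count printed here depends on the stand-in CRC and need not equal the single access obtained in the article's example.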

Therefore, the TSP algorithm requires 10 to 12 off-chip accesses (8 to 10 for the two field lookups plus 2 for the hash table), whereas the proposed algorithm requires only one. Note that the tuple (2, 0) caused an unnecessary access in the TSP algorithm, but was filtered out by the tuple-Bloom filter in the proposed algorithm. In the TSP algorithm, a tuple consists only of lengths, and values are not used in determining the tuples to be accessed. However, in the proposed algorithm, the tuple-Bloom filter uses the combined prefix values and filters out unnecessary tuples, such as the tuple (2, 0) in this example.
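The worked example can be checked mechanically. The sketch below takes the CRC codes exactly as printed in Tables 1 and 2 (it does not recompute them, since the article does not give the generator polynomial), slices the hash indices out of each code as described in the text, programs the three Bloom filters, and replays the query for the input (0100, 1001). The helper names are assumptions made for illustration.

```python
def slice_indices(crc_code, nbits):
    """Return the first and the last nbits of a binary CRC string as integers."""
    return int(crc_code[:nbits], 2), int(crc_code[-nbits:], 2)


def program(size, crc_codes, nbits):
    """Set the two bit locations selected by each CRC code."""
    bits = [0] * size
    for code in crc_codes:
        for j in slice_indices(code, nbits):
            bits[j] = 1
    return bits


def positive(bits, crc_code, nbits):
    return all(bits[j] == 1 for j in slice_indices(crc_code, nbits))


# CRC codes copied from Table 1 (programming).
SRC_CODES = ["00111111", "11001111", "10001110", "01000010"]
DST_CODES = ["00111111", "11110011", "01010110"]
TUPLE_CODES = ["00111111", "11110011", "10111101", "00100100", "11111001", "11001111"]
LT = {(2, 0), (1, 2), (2, 3), (3, 3), (3, 2), (1, 0)}

src_bf = program(8, SRC_CODES, 3)       # 8-bit filter, indices from 3 bits
dst_bf = program(8, DST_CODES, 3)
tuple_bf = program(16, TUPLE_CODES, 4)  # 16-bit filter, indices from 4 bits

# CRC codes copied from Table 2 (query for input (0100, 1001)).
SRC_QUERY = {1: "01111110", 2: "10001110", 3: "01000111"}
DST_QUERY = {2: "11100111", 3: "11110011"}
TUPLE_QUERY = {(2, 0): "10001110", (2, 3): "10111101",
               (3, 2): "00001001", (3, 3): "10000100"}

l1 = {l for l, c in SRC_QUERY.items() if positive(src_bf, c, 3)}
l2 = {0} | {l for l, c in DST_QUERY.items() if positive(dst_bf, c, 3)}  # length 0 is always positive
candidates = sorted({(a, b) for a in l1 for b in l2} & LT)
accessed = [t for t in candidates if positive(tuple_bf, TUPLE_QUERY[t], 4)]
hash_index = int(TUPLE_QUERY[(2, 3)][-3:], 2)   # last three bits of the CRC code

print(l1, l2, candidates, accessed, hash_index)
```

The positive results for the source string 010* and the destination string 10* are false positives (neither prefix is in the rule set), yet only the tuple (2, 3) survives the tuple-Bloom filter, which is the single off-chip access counted in the text.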

Table 2. The Bloom filter query results for input (0100, 1001).

Type                 Length or tuple  Prefix    CRC code  Bloom filter indices  Bloom filter result  Hash table index
-------------------  ---------------  --------  --------  --------------------  -------------------  ----------------
Source address       1                0*        01111110  3, 6                  Negative             N/A
                     2                01*       10001110  4, 6                  Positive             N/A
                     3                010*      01000111  2, 7                  Positive             N/A
Destination address  2                10*       11100111  7, 7                  Positive             N/A
                     3                100*      11110011  7, 3                  Positive             N/A
Tuple                (2, 0)           01*       10001110  8, 14                 Negative             N/A
                     (2, 3)           01100*    10111101  11, 13                Positive             5
                     (3, 2)           01010*    00001001  0, 9                  Negative             N/A
                     (3, 3)           010100*   10000100  8, 4                  Negative             N/A

For D-dimensional packet classification with D > 2, we extend the proposed algorithm by using a Bloom filter for each field after converting the port ranges to prefixes [10]. As more fields are involved in the tuple space, the number of tuples increases, which negatively affects search performance. Alternatively, using a single field minimizes the number of tuples (strictly speaking, these are no longer tuples, since a tuple generally involves more than one field), but the number of rules associated with each tuple increases. In this case, different rules can have the same value for the chosen field. These rules are mapped to the same hash table entry, degrading the search performance. For efficient tuple space search, the number of tuples should be small while the field values composing the tuple space should have great variety. Hence, it is necessary to select the proper number and type of fields when composing the tuple space. We are currently investigating how to determine the number and type of fields that will optimize the tuple pruning performance.

Performance evaluation

We performed simulations for rule sets created by ClassBench [11], which is widely used in evaluating the performance of packet classification algorithms [5, 6]. In these simulations, we used two prefix fields for tuple pruning. We stored the remaining fields, including the port ranges, in an off-chip hash table, and compared all fields with a given input when the hash entry was accessed. Because the proposed method does not need to convert a port range into a number of prefixes [10], it is simpler to implement. We generated three types of 5D rule sets: access control list (ACL), IP chain (IPC), and firewall (FW), with two sets for each type, one of about 1,000 rules and the other of about 5,000 rules. We also generated input traces. Table 3 shows the characteristics of the rule sets. Let n(S) be the number of elements included in a set S. The ACL type has the smallest and the IPC type has the largest number of distinct tuples, n(Lt). The FW type has the smallest variety and the IPC type has the largest variety in n(Pt). These characteristics affect the Bloom filter performance and the off-chip hash table performance, as we will show.

Table 3. The number of distinct prefixes (or tuples) and the distinct lengths.

                                    Source          Destination     Tuple
Type  Rule set  No. of rules N      n(P1)   n(L1)   n(P2)   n(L2)   n(Pt)   n(Lt)
----  --------  ------------------  -----   -----   -----   -----   -----   -----
ACL   ACL1      902                 47      13      266     25      460     65
      ACL5      4,660               641     14      898     30      2,443   102
IPC   IPC1      972                 282     19      408     20      876     162
      IPC5      4,468               121     33      446     33      2,876   680
FW    FW1       852                 133     25      167     15      374     90
      FW5       4,351               64      33      120     33      1,144   579
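The Bloom filters in this evaluation are sized as a small multiple of the element count rounded up to a power of two. The sketch below shows that sizing rule applied to the ACL5 counts from Table 3, together with a rough false positive estimate. The estimate uses the standard textbook approximation (1 - e^(-kn/m))^k for a filter of m bits, n elements, and k hash functions; that formula is not taken from the article, and the function names are assumptions made for illustration.

```python
import math


def pow2_ceiling(n):
    """Smallest power of 2 that is equal to or greater than n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def bloom_bits(n_elements, factor):
    """Filter size in bits: factor times the rounded-up element count."""
    return factor * pow2_ceiling(n_elements)


def false_positive_estimate(n_elements, m_bits, k):
    """Textbook approximation of the false positive probability."""
    return (1.0 - math.exp(-k * n_elements / m_bits)) ** k


# Element counts for ACL5, copied from Table 3.
ACL5 = {"P1": 641, "P2": 898, "Pt": 2443}
sizes = {name: bloom_bits(n, 8) for name, n in ACL5.items()}   # size factor 8
total_bytes = sum(sizes.values()) // 8

for name, n in ACL5.items():
    p = false_positive_estimate(n, sizes[name], k=2)
    print(f"{name}: {sizes[name]} bits, estimated false positive rate {p:.4f}")
print("total on-chip Bloom filter memory:", total_bytes, "bytes")
```

The estimate treats each filter in isolation with independent queries, so it is only a guide; the measured per-packet counts in the simulations also depend on how many lengths and tuples are queried for each packet.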


Let ⟨n(P)⟩ be the smallest power of 2 that is equal to or greater than n(P), that is,

$\langle n(P) \rangle = 2^{\lceil \log_2 n(P) \rceil}.$

We can adjust the sizes of the source, destination, and tuple Bloom filters in proportion to ⟨n(P1)⟩, ⟨n(P2)⟩, and ⟨n(Pt)⟩, respectively. We first investigated the performance of the individual Bloom filters (the src-Bloom filter and the dst-Bloom filter) in relation to their size. We fixed the tuple-Bloom filter size at 4⟨n(Pt)⟩ and set the sizes of the individual Bloom filters to 4, 8, 16, and 32 times ⟨n(P1)⟩ and ⟨n(P2)⟩. Figure 4 shows the number of tuple-Bloom filter queries, negatives, and positives, all in terms of the average per packet. As the size of the individual Bloom filters increases, the number of queries to the tuple-Bloom filter decreases quickly. Therefore, the individual Bloom filters' tuple pruning performance improves with the filters' size, and a large size effectively filters out the lengths that cannot match. The number of positives (that is, the sum of true and false positives) is directly related to the number of off-chip hash table accesses, and this number decreases as the sizes of the individual Bloom filters increase. The number of true positives is the number of rules matching the two prefix fields among all the positives, and the number of true matches is the number of rules matching all the fields among the true positives.

Figure 4. The number of queries, negatives, true positives, false positives, positives, and true matches in terms of the average per packet in the tuple-Bloom filter for the 5D rule sets: access control list (ACL5) (a), IP chain (IPC5) (b), and firewall (FW5) (c). In each panel, the horizontal axis is the source/destination Bloom filter size (factors 4, 8, 16, and 32) and the vertical axis is the number per packet.

Our next simulation sought to determine the effectiveness of the tuple-Bloom filter in relation to its size, since the false positive rate is inversely related to a Bloom filter's size. For fixed-size source and destination Bloom filters, 4⟨n(P1)⟩ and 4⟨n(P2)⟩, Figure 5 shows the number of tuple-Bloom filter negatives, positives, true positives, false positives, and true matches, all in terms of the average per packet, as the size of the tuple-Bloom filter increases by a factor of 1, 2, ..., 32. Because the size of the individual Bloom filters is fixed, the number of queries to the tuple-Bloom filter is constant: 13.4, 46.8, and 40.5 for the ACL5, IPC5, and FW5, respectively. As the tuple-Bloom filter size increases, the number of negatives increases.

Thus, the tuple-Bloom filter effectively filters out unnecessary tuples. The number of positives (and the number of false positives) decreases as the tuple-Bloom filter size increases. For a tuple-Bloom filter of size 8⟨n(Pt)⟩, the average number of tuple-Bloom filter positives per packet is 4.0, 11.5, and 9.6 for the ACL5, IPC5, and FW5, respectively.

Figure 5. The number of negatives, positives, true positives, false positives, and true matches in terms of the average per packet in the tuple-Bloom filter for the 5D rule sets as the tuple-Bloom filter's size increases by a factor of 1 to 32: access control list (ACL5) (a), IP chain (IPC5) (b), and firewall (FW5) (c). In each panel, the horizontal axis is the tuple Bloom filter size (factors 1, 2, 4, 8, 16, and 32) and the vertical axis is the number per packet.

Table 4 compares the proposed algorithm's performance with that of other algorithms in terms of memory requirements and the average and worst-case search performance per packet, measured by the number of memory accesses. We can determine the proper sizes of the Bloom filters for our proposed algorithm based on the decreasing rate of false positives in Figures 4 and 5. Because we used two hash indices in this simulation, the false positives do not decrease much for any size factor larger than 8. The simulation results in Table 4 use Bloom filters with sizes 8⟨n(P1)⟩, 8⟨n(P2)⟩, and 8⟨n(Pt)⟩. The hash table entry number is ⟨n(N)⟩, where

$\langle n(N) \rangle = 2^{\lceil \log_2 n(N) \rceil},$

and N is the number of rules.

In an ideal case, the average number of off-chip hash accesses is equal to the average number of tuple-Bloom filter positives in our proposed algorithm, but it is shown to be larger. Notably, the FW type has many more hash accesses than tuple-Bloom filter positives. One reason is that in our simulation, we assume the rules mapped to a single hash entry are compared sequentially. Because the FW type has the smallest variety in Pt, as Table 3 shows, many different rules have the same source-destination pair and map to a single hash entry; they cause many accesses by being compared sequentially. We can solve this issue by applying some of the other fields to tuple pruning. The other reason is that two different rules with different source-destination pairs can collide in the same hash entry. We can solve this issue by finding a better hash function that distributes the rules more uniformly. Techniques to organize the hash table to reduce the number of collided entries are described elsewhere [8, 9].

In the TSP algorithm, each field lookup consumes 4 to 5 off-chip memory accesses using the algorithm of binary search on levels [1]. Moreover, the TSP algorithm does not use the combined prefix values in reducing the number of tuples. Hence, its search

performance is much worse than that of the proposed algorithm. Detailed descriptions of the hierarchical trie (H-trie), area-based quad-trie (AQT), bit-vector (BV), and priority-based quad-trie (PQT) algorithms can be found in previous work [1, 12]. Even though the simulation result is for the simplest case of our proposed algorithm, which uses two hash functions and assumes rules mapped to the same entry are compared sequentially, the proposed algorithm shows the best performance in all metrics.

Table 4. The performance comparison to the other algorithms.

Memory requirement (Kbytes)

Rule set  No. of rules (N)  Proposed  TSP [3]  H-trie [1]  Area-based     Priority-based  Bit vector [1]
                                                           quad-trie [1]  quad-trie [12]
--------  ----------------  --------  -------  ----------  -------------  --------------  --------------
ACL1      902               22.1      62.8     82.9        56.4           29.9            153.3
ACL5      4,660             174.2     273.2    401.5       200.2          145.6           2,793.0
IPC1      972               23.0      63.8     121.6       71.2           30.9            154.3
IPC5      4,468             172.7     184.2    224.7       234.3          139.9           2,531.0
FW1       852               22.0      42.9     39.4        35.2           27.2            111.9
FW5       4,351             170.3     172.1    119.1       479.8          136.0           2,340.0

Memory accesses per packet (average)

Rule set  No. of rules (N)  Proposed  TSP [3]  H-trie [1]  Area-based     Priority-based  Bit vector [1]
                                                           quad-trie [1]  quad-trie [12]
--------  ----------------  --------  -------  ----------  -------------  --------------  --------------
ACL1      902               6.48      17.5     77.2        38.6           35.6            66.0
ACL5      4,660             11.4      19.2     84.0        50.1           59.6            64.1
IPC1      972               7.90      27.8     71.9        94.5           73.6            63.6
IPC5      4,468             18.7      36.3     85.6        344.8          202.1           151.9
FW1       852               33.6      45.4     52.1        369.3          197.9           196.6
FW5       4,351             69.2      39.5     44.2        660.5          571.1           738.8

Memory accesses per packet (worst case)

Rule set  No. of rules (N)  Proposed  TSP [3]  H-trie [1]  Area-based     Priority-based  Bit vector [1]
                                                           quad-trie [1]  quad-trie [12]
--------  ----------------  --------  -------  ----------  -------------  --------------  --------------
ACL1      902               55        65       124         64             75              68
ACL5      4,660             59        59       177         94             113             76
IPC1      972               19        42       128         119            106             80
IPC5      4,468             40        65       192         415            295             230
FW1       852               75        93       117         444            293             318
FW5       4,351             85        89       146         1,193          999             1,044

The proposed algorithm's memory requirements, shown in Table 4, are the sum of the memories for the Bloom filters (2 to 6 Kbytes) and the off-chip hash table (21 to 168 Kbytes). If we use a 4-bit counter for each Bloom filter bit to support the incremental deletion of rules, the required memory of the Bloom filters would be 8 to 24 Kbytes, still small enough to fit into a chip. For the off-chip hash table, if we use a 250-MHz QDRII SRAM [2] with a 36-bit width, each hash entry of 22 bytes is read through five accesses, taking 20 nanoseconds. Since the proposed algorithm consumes 7 to 40 memory accesses per packet, it takes 140 to 800 ns. Hence, the average throughput is 1.25 to 7.14 million packets per second (Mpps). In an ideal case, considering the average of 4 to 12 tuple-Bloom filter positives shown in Figure 5, the proposed algorithm could achieve a throughput of 4.17 to 12.5 Mpps.
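The throughput figures above follow from simple arithmetic, which the sketch below reproduces. It assumes one 36-bit word is read per clock cycle (4 ns at 250 MHz), which is consistent with the 20 ns quoted for five reads; the function names and default parameters are assumptions made for illustration.

```python
import math


def entry_read_ns(entry_bytes=22, bus_bits=36, clock_mhz=250):
    """Time to read one hash entry, assuming one bus-width word per clock cycle."""
    reads = math.ceil(entry_bytes * 8 / bus_bits)   # 176 bits over a 36-bit bus
    return reads * 1000.0 / clock_mhz               # 1000 / MHz = ns per cycle


def throughput_mpps(accesses_per_packet, ns_per_access):
    """Packets per microsecond, that is, million packets per second."""
    return 1000.0 / (accesses_per_packet * ns_per_access)


ns = entry_read_ns()
print("time per hash entry:", ns, "ns")
for accesses in (40, 7, 12, 4):
    print(accesses, "accesses per packet:", round(throughput_mpps(accesses, ns), 2), "Mpps")
```

With 7 to 40 entry accesses per packet this gives the 1.25 to 7.14 Mpps range, and with the ideal 4 to 12 accesses it gives 4.17 to 12.5 Mpps.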

Because the performance of tuple pruning depends on the number and type of selected fields, and because the Bloom filter is a probabilistic data structure, it is not easy to formulate the performance of our proposed algorithm. The mathematical analysis of optimizing the pruning performance through the Bloom filters should be investigated further. Recently, new network applications demanding multimatch packet classification have emerged. In these applications, all matching results, including the BMR, must be returned [10]. We therefore need efficient algorithms that can perform both highest priority match and multimatch packet classification. Because it is simple to return all matching results in the hash table lookups, the proposed algorithm naturally enables both highest priority match and multimatch packet classification.


Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) through a grant funded by the Korean government (2010-0000483) and by the Korean Ministry of Knowledge Economy under the HNRC-ITRC support program supervised by the NIPA (NIPA-2010-C1090-10110010).

References

1. H.J. Chao, “Next Generation Routers,” Proc. IEEE, vol. 90, no. 9, 2002, pp. 1518-1588.
2. H. Yu and R. Mahapatra, “A Memory-Efficient Hashing by Multi-Predicate Bloom Filters for Packet Classification,” Proc. IEEE Int’l Conf. Computer Comm. (INFOCOM), IEEE Press, 2008, pp. 2467-2475.
3. V. Srinivasan, S. Suri, and G. Varghese, “Packet Classification Using Tuple Space Search,” Proc. ACM SIGCOMM, ACM Press, 1999, pp. 135-146.
4. V. Srinivasan et al., “Fast and Scalable Layer Four Switching,” Proc. ACM SIGCOMM, ACM Press, 1998, pp. 191-202.
5. H. Song, J. Turner, and S. Dharmapurikar, “Packet Classification Using Coarse-Grained Tuple Spaces,” Proc. Architecture for Networking and Comm. Systems (ANCS), ACM Press, 2006, pp. 41-50.
6. S. Dharmapurikar et al., “Fast Packet Classification Using Bloom Filters,” Proc. Architecture for Networking and Comm. Systems (ANCS), ACM Press, 2006, pp. 61-70.
7. S. Dharmapurikar, P. Krishnamurthy, and D. Taylor, “Longest Prefix Matching Using Bloom Filters,” IEEE/ACM Trans. Networking, vol. 14, no. 2, 2006, pp. 397-409.
8. H. Song et al., “Fast Hash Table Lookup Using Extended Bloom Filter: An Aid of Network Processing,” Proc. ACM SIGCOMM, 2005, pp. 181-192.
9. C. Martinez, D. Pandya, and W. Lin, “On Designing Fast Non-uniformly Distributed IP Address Lookup Hashing Algorithms,” IEEE/ACM Trans. Networking, vol. 17, no. 6, 2009, pp. 1916-1925.
10. F. Yu and T.V. Lakshman, “Efficient Multimatch Packet Classification and Lookup with TCAM,” IEEE Micro, vol. 25, no. 1, 2005, pp. 50-59.
11. D.E. Taylor and J.S. Turner, “ClassBench: A Packet Classification Benchmark,” IEEE/ACM Trans. Networking, vol. 15, no. 3, 2007, pp. 499-511.
12. H. Lim, M. Kang, and C. Yim, “Two-dimensional Packet Classification Algorithm Using a Quad-tree,” Computer Comm., vol. 30, no. 6, 2007, pp. 1396-1405.

Hyesook Lim is an associate professor in the Department of Electronics Engineering at Ewha Womans University, Seoul, Korea. Her research interests include router design issues such as IP address lookup, packet classification, and deep packet inspection. Lim has a PhD in electrical and computer engineering from the University of Texas at Austin. She is a member of IEEE.

So Yeon Kim is a research engineer at Samsung Electronics. Her research interests include fast IP address lookup and packet classification algorithms. Kim has an MS from the Department of Electronics Engineering at Ewha Womans University, Seoul, Korea.

Direct questions and comments about this article to Hyesook Lim, Ewha Womans University, Seoul 120-750, Korea; hlim@ewha.ac.kr.
