On Adding Bloom Filters to Longest Prefix Matching Algorithms

IEEE TRANSACTIONS ON COMPUTERS, VOL. 63, NO. 2, FEBRUARY 2014

Hyesook Lim, Senior Member, IEEE, Kyuhee Lim, Nara Lee, and Kyong-Hye Park, Student Member, IEEE

Abstract—High-speed IP address lookup is essential to achieve wire-speed packet forwarding in Internet routers. Ternary content addressable memory (TCAM) technology has been adopted to solve the IP address lookup problem because of its ability to perform fast parallel matching. However, the applicability of TCAMs is limited by cost and power dissipation issues. Various algorithms and hardware architectures have been proposed to perform the IP address lookup using ordinary memories such as SRAMs or DRAMs without using TCAMs. Among these, we focus on two efficient algorithms providing high-speed IP address lookup: the parallel multiple-hashing (PMH) algorithm and the binary-search-on-levels algorithm. This paper shows how effectively an on-chip Bloom filter can improve those algorithms. A performance evaluation using actual backbone routing data with 15,000-220,000 prefixes shows that adding a Bloom filter removes the complicated hardware for parallel access in the parallel multiple-hashing algorithm without a search performance penalty, and improves the search speed of the binary-search-on-levels algorithm by 30-40 percent.

Index Terms—Internet, router, IP address lookup, longest prefix matching, Bloom filter, multihashing, binary search on levels, leaf pushing

H. Lim, K. Lim, and N. Lee are with the Department of Electronics Engineering, Ewha Womans University, 428 Asan Engineering Building, 52, Ewhayeodae-gil, Seodaemun-gu, Seoul 120-750, Korea. E-mail: [email protected]. K.-H. Park is with the Mobile Communication Business Unit, Samsung Electronics, Seoul, Korea.

Manuscript received 5 Nov. 2011; revised 22 July 2012; accepted 25 July 2012; published online 6 Aug. 2012. Recommended for acceptance by V. Eramo. Digital Object Identifier no. 10.1109/TC.2012.193.

1 INTRODUCTION

Address lookup determines an output port using the destination IP address of an incoming packet. The address aggregation technology currently used for the Internet is a bitwise prefix matching scheme called classless interdomain routing (CIDR), which uses variable-length subnet masking to allow arbitrary-length prefixes. An IP address is said to match a prefix if the most significant l bits of the address and an l-bit prefix are the same. When an IP address matches more than one prefix, the longest matching prefix is selected as the best matching prefix (BMP) [1], [2], [3], [4]. The IP address lookup is one of the most challenging operations in router design because of the amount of traffic and the number of networks, which have increased dramatically in recent years. Using application-specific integrated circuits (ASICs) with off-chip ternary content addressable memories (TCAMs) has been the best solution to provide wire-speed packet forwarding. However, TCAMs have some limitations [5]. TCAMs consume 150 times more power per bit than SRAMs and account for around 30-40 percent of the total line card power. As line cards are stacked together, TCAMs impose a high cost on the cooling system. System vendors are willing to accept some latency penalty if the


power of a line card can be lowered [6]. TCAMs also cost about 30 times more per bit of storage than DDR SRAMs. Various algorithms have been studied to replace TCAMs with ordinary memories such as SRAMs or DRAMs [1], [2], [3], [4], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26]. A fast on-chip SRAM is often used in many applications to hold critical data with a guaranteed fast access time [27], since an access to off-chip memory (usually DRAM) is 10-20 times slower than an on-chip memory access. It is therefore important to partition data properly so that a small part is stored in on-chip memories and most of the data is stored in slower, higher-capacity off-chip memories. Several metrics are used for evaluating the performance of IP address lookup algorithms and architectures. Since the IP address lookup should be performed at wire speed for every incoming packet, which can be a hundred million packets per second, search performance is the most important metric. Search performance is often measured by the number of off-chip memory accesses. The next metric is the required memory size for storing a routing table. The incremental update of a routing table is also an important metric. Scalability for large routing data sets and migration to IPv6 should also be considered. The performance in these metrics depends on data structures and search algorithms, and thus, it is essential to have an efficient structure and search algorithm to provide high-performance IP address lookup. IP address lookup algorithms can be roughly categorized into trie (or tree)-based algorithms [2], [3], [4], [8], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], hashing-based algorithms [9], [10], [11], and bitmap-based algorithms [6], [12]. Recently, dynamic programming-based

approaches have been proposed to improve the search performance and/or storage performance [15], [16], [17], [18], [19], [20], [21]. Hashing is a well-defined procedure for turning each key into a smaller integer called a hash index, which serves as a pointer into an array. Hashing has been used mostly in search algorithms to quickly locate a data record for a given search key. For the IP address lookup, hashing is applied to each prefix length, and the longest prefix among the matched prefixes is selected as the best match [9], [10], [11]. Among trie-based algorithms, binary searching on hash tables organized by prefix lengths [13], [14], [15] provides the best IP address lookup performance. A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. Bloom filters have been widely applied to network algorithms [7], [26], [28], [29], [30], [31]. This paper shows how effectively an on-chip Bloom filter can improve the search performance of known efficient IP address lookup algorithms. This paper is organized as follows: Section 2 describes the Bloom filter theory. Section 3 introduces two different algorithms providing high-speed IP address lookup: parallel multiple-hashing (PMH) and binary search on levels. Section 4 describes our proposed method to improve those algorithms using a Bloom filter. Section 5 shows performance evaluation results, and Section 6 concludes the paper.

2 BLOOM FILTER THEORY

A Bloom filter is basically a bit-vector used to represent the membership information of a set of elements. A Bloom filter that represents a set S = {x1, x2, ..., xn} of n elements is described by an array of m bits, initially all set to 0. A Bloom filter supports two different operations: programming and querying. In programming, for an element x in the set S, k different hash functions are computed in such a way that each resulting hash index h_i(x) lies in the range 0 <= h_i(x) < m for i = 1, ..., k. Then all the bit-locations corresponding to the k hash indices are set to 1 in the Bloom filter. The pseudocode to program a Bloom filter for an element x is as follows [7]:

BFProgramming(x)
  for (i = 1 to k)
    BF[h_i(x)] = 1;

A query is performed to test whether an element y is in S. For an input y, k hash indices are generated using the same hash functions that were used to program the filter. The bit-locations in the Bloom filter corresponding to the hash indices are checked. If at least one of the locations is 0, then y is definitely not a member of the set S, and the result is termed a negative. If all the hash index locations are set to 1, then the input may be a member of the set, and the result is termed a positive. The querying procedure is as follows [7]:

BFQuery(y)
  for (i = 1 to k)
    if (BF[h_i(y)] == 0) return negative;
  return positive;

However, a positive does not mean that all those bit-locations were set by the current element under query; they may have been set by other elements in the set. This type of positive result is termed a false positive. It is important to properly control the false-positive rate in designing a Bloom filter. For a given ratio of m/n, it is known that the false-positive probability is minimized when the number of hash functions k satisfies the following relationship [7]:

$$k = \frac{m}{2^{\lceil \log_2 n \rceil}} \ln 2. \qquad (1)$$

On the whole, a Bloom filter may produce false positives but not false negatives.
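The two operations above can be captured in a few lines of C. The following is a minimal sketch, not code from the paper: the filter size, the number of hash functions, and the FNV-style hash family are illustrative assumptions (the paper instead derives its indices from a CRC code, as described in Section 4).

```c
#include <stdint.h>
#include <stddef.h>

#define M_BITS 4096              /* filter size m (illustrative choice)   */
#define K_HASH 3                 /* number of hash functions k (assumed)  */

static uint8_t bf[M_BITS / 8];   /* m-bit vector, initially all zero      */

/* Illustrative hash family; any k independent hash functions work. */
static uint32_t hash_i(const void *key, size_t len, uint32_t i)
{
    const uint8_t *p = key;
    uint32_t h = 2166136261u ^ (i * 16777619u);   /* FNV-1a style mix */
    for (size_t j = 0; j < len; j++)
        h = (h ^ p[j]) * 16777619u;
    return h % M_BITS;
}

/* BFProgramming(x): set the k bit positions for element x. */
static void bf_program(const void *key, size_t len)
{
    for (uint32_t i = 0; i < K_HASH; i++) {
        uint32_t idx = hash_i(key, len, i);
        bf[idx / 8] |= (uint8_t)(1u << (idx % 8));
    }
}

/* BFQuery(y): returns 0 (negative, definitely absent) or 1 (positive,
 * possibly present; may be a false positive). */
static int bf_query(const void *key, size_t len)
{
    for (uint32_t i = 0; i < K_HASH; i++) {
        uint32_t idx = hash_i(key, len, i);
        if (!(bf[idx / 8] & (1u << (idx % 8))))
            return 0;            /* at least one bit is 0 -> negative */
    }
    return 1;                    /* all k bits are 1 -> positive      */
}
```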

3 RELATED WORKS

The IP address lookup problem can be defined formally as follows [8]. Let P = {P1, P2, ..., PN} be a set of routing prefixes, where N is the number of prefixes. Let A be an incoming IP address and S(A, l) be the substring of the most significant l bits of A. Let n(Pi) be the length of a prefix Pi. A is defined to match Pi if S(A, n(Pi)) = Pi. Let M(A) be the set of prefixes in P that A matches; then M(A) = {Pi in P : S(A, n(Pi)) = Pi}. The longest prefix matching (LPM) problem is to find the prefix Pj in M(A) such that n(Pj) > n(Pi) for all Pi in M(A), i != j. Once the longest matching prefix Pj is determined, the input packet is forwarded to an output port directed by the prefix Pj.
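For concreteness, the definition can be rendered directly as a linear scan over the prefix set. This is only an illustrative sketch of the definition (the struct and function names are ours, not the paper's); the algorithms surveyed below exist precisely to avoid such an O(N) scan.

```c
#include <stdint.h>

/* Each prefix P_i is kept as its value left-aligned in 32 bits plus its
 * length n(P_i).  S(A, l) is the most significant l bits of address A. */
struct prefix { uint32_t bits; int len; };

static int lpm(uint32_t addr, const struct prefix *p, int n_prefixes)
{
    int best = -1, best_len = -1;
    for (int i = 0; i < n_prefixes; i++) {
        uint32_t mask = p[i].len ? ~0u << (32 - p[i].len) : 0;
        /* A matches P_i if S(A, n(P_i)) equals P_i */
        if ((addr & mask) == p[i].bits && p[i].len > best_len) {
            best = i;                  /* longest match so far */
            best_len = p[i].len;
        }
    }
    return best;   /* index of the BMP, or -1 if M(A) is empty */
}
```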

3.1 IP Address Lookup Algorithms Using Bloom Filters

An IP address lookup architecture proposed by Dharmapurikar et al. [7] is the first algorithm employing a Bloom filter. It performs parallel queries on W Bloom filters sorted by prefix length to determine the possible lengths of a prefix match, where W is 32 in IPv4. For a given IP address, off-chip hash tables are probed for the prefix lengths that turn out to be positive in the Bloom filters, starting from the longest prefix. This architecture has a high implementation complexity because of the Bloom filters as well as the hash tables in each prefix length. Depending on the prefix distribution, the size of the Bloom filters and the size of the hash tables can be highly skewed. To bound the worst-case search performance by limiting the number of distinct prefix lengths, which equals the worst-case number of hash table probes, controlled prefix expansion (CPE) [22] is suggested in the paper. However, prefix replication is inevitable in the CPE. Moreover, the naive hash table employed in this architecture incurs collisions, and resolving the collisions using chaining adversely affects the worst-case lookup-rate guarantees that routers should provide [30], [32].

3.2 Parallel Multiple-Hashing

A hash function is used to map the search key to a hash index. In general, a hash function may map several different keys to the same hash index. This is called a collision. Collisions are an intrinsic problem of hashing. Broder and Mitzenmacher [9] proposed to use multiple


Fig. 2. The binary trie for an example set of prefixes.

Fig. 1. Parallel multiple-hashing architecture.

hash functions to reduce collisions. Instead of searching for a perfect hash function in which each distinguishable search key is mapped to a different hash index, a multiple-hashing architecture [9] uses multiple hash functions for each search key. The number of hash tables is equal to the number of hash indices. Assuming two hash indices, the corresponding two hash tables are named the left table and the right table. Each slot of a hash table should contain a set of entries, and for this reason, each slot of a hash table is often called a bucket. In storing a given prefix, two hash indices are obtained; the index from hash function 1 is used to access the left table, and the index from hash function 2 is used to access the right table. The loads of the two accessed buckets are compared, and the prefix is stored in the bucket with the smaller load (a sketch of this insertion policy is given below). With multiple hashing, prefixes are distributed more evenly across the hash tables. The number of collisions can be controlled by three parameters: the number of hash tables, the number of buckets in a table, and the number of entries in a bucket. To apply multiple hashing to an IP address lookup problem with variable-length prefixes, a PMH architecture [10] constructs a separate multihash table for each group of prefixes with a distinct length and, additionally, an overflow TCAM. A prefix is stored into the overflow TCAM when both buckets are already full. Fig. 1 shows the overall PMH architecture. Multiple hash tables (here two) are constructed for each length, and prefixes in each length are stored into either a left table entry or a right table entry of the corresponding length. The search procedure is as follows: For a given input address, hash indices for all possible lengths are obtained. Using these hash indices, the multihashing tables are accessed in parallel, and matching prefixes in each length (if any) are returned. The overflow TCAM is also assumed to be accessed in parallel. Among the returned prefixes, the longest matching prefix is selected by the priority encoder. By accessing the multihashing tables of every length in parallel, the BMP is obtained in a single access cycle. However, since the tables in each length should be implemented using a separate memory for parallel access and the size of the tables can be highly skewed depending on the prefix distribution, the implementation complexity can become very high.
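The following C sketch illustrates the two-choice insertion policy just described: the prefix goes into the less-loaded of its two candidate buckets, and overflows to a small TCAM when both are full. The table sizes and the two hash functions are illustrative assumptions; the paper derives both bucket indices from a single CRC code instead.

```c
#include <stdint.h>

#define BUCKETS  256            /* buckets per table (illustrative)      */
#define SLOTS    2              /* entries per bucket                    */

struct bucket { uint32_t key[SLOTS]; int load; };

static struct bucket left_tbl[BUCKETS], right_tbl[BUCKETS];

/* hash1/hash2 stand in for the two index generators. */
static unsigned hash1(uint32_t key) { return (key * 2654435761u) % BUCKETS; }
static unsigned hash2(uint32_t key) { return ((key ^ 0x5bd1e995u) * 2246822519u) % BUCKETS; }

/* Store the key in whichever of the two candidate buckets currently holds
 * fewer entries; report overflow (to be kept in a TCAM) if both are full. */
static int multihash_insert(uint32_t key)
{
    struct bucket *l = &left_tbl[hash1(key)];
    struct bucket *r = &right_tbl[hash2(key)];
    struct bucket *dst = (l->load <= r->load) ? l : r;

    if (dst->load == SLOTS)
        return -1;                     /* overflow: both buckets full */
    dst->key[dst->load++] = key;
    return 0;
}
```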

3.3 Binary Search on Trie Levels by Waldvogel

A binary trie (B-Trie) is a tree-based data structure which applies linear search on length [4]. Each prefix resides in a node of the trie, in which the level and the path of the node from the root node correspond to the prefix length and the prefix value, respectively. Fig. 2 shows the binary trie for an example set of prefixes P = {1, 00, 010, 111, 1101, 11111, 110101}. In Fig. 2, black nodes represent prefixes, and white nodes represent empty internal nodes. At each node, the search proceeds to the left or right according to sequential inspection of the address bits starting from the most significant bit. If the bit is 0, the search proceeds to the left child and otherwise to the right child, until it reaches a leaf node. The binary trie structure is simple and easy to implement. However, the search performance of the binary trie is linearly related to the length of the IP address, since each bit is examined one at a time. As an attempt to improve the search performance of the trie, algorithms performing binary search on trie levels have been proposed [13], [14]. The binary search on levels structure proposed by Waldvogel et al. [13] separates the binary trie according to the levels of the trie and stores the nodes included in each level in a hash table. Binary search is performed on the hash tables of each level. When accessing the medium-level hash table, if there is a match, the search space becomes the longer half; otherwise, the search space becomes the shorter half. Fig. 3 shows Waldvogel's binary search on lengths (W-BSL) structure with the denotation of access levels [13]. Level 3 is the first level of access, levels 1 and 5 are the second levels of access, and levels 2, 4, and 6 are the last levels of access.

Fig. 3. Waldvogel’s binary search on lengths.


Fig. 5. CRC-8 generator.

Fig. 4. Lim’s binary search on lengths (L-BSL).

The W-BSL structure uses precomputed markers and BMPs. Markers are precomputed in the internal nodes if there is a longer prefix in the levels accessed later. A precomputed BMP is maintained at each marker, and this BMP is returned when there is no match in longer levels. The markers and the BMPs are not maintained for the last levels of access; they are precomputed only for nodes of the preceding levels of access, as shown in Fig. 3. As a search example, for a 6-bit input 111000, the most significant 3 bits, which are 111, are used for hashing in accessing level 3. Since the input matches P5, it is remembered as the current BMP, and the search goes to level 5. In level 5, the most significant 5 bits of the input, 11100, do not match any node. The search goes to level 4 and does not match. Since it is the last level of access, the search is over and prefix P5 is returned as the BMP. Using the binary search on trie levels, three memory accesses were performed to find the longest matching prefix for this input. Waldvogel's algorithm provides O(log l_dist) hash-table accesses, where l_dist is the number of distinct prefix lengths. Srinivasan and Varghese [22] have proposed to improve the search performance by using controlled prefix expansion to reduce the value of l_dist. Kim and Sahni [15] have proposed to optimize the storage requirement by selecting the prefix lengths that minimize the number of markers and precomputed BMPs when l_dist is given.
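A skeleton of this plain binary search on trie levels, before any Bloom filter is added, is sketched below in C. The hash-table lookup and node-inspection interfaces are assumed names, not code from the paper; the pseudocode in Section 4.2 extends essentially this loop with a Bloom filter probe.

```c
#include <stdint.h>

/* Assumed interfaces: a per-level hash table of markers/prefix nodes and a
 * helper returning S(A, level).  These are placeholders for illustration. */
struct node;
extern struct node *hash_lookup(int level, uint32_t substr);
extern int  node_has_prefix_or_bmp(const struct node *n);
extern uint32_t substr_of(uint32_t addr, int level);      /* S(A, level) */

static const struct node *wbsl_search(uint32_t addr, int min_level, int max_level)
{
    const struct node *bmp = 0;          /* best match found so far */
    int low = min_level, high = max_level;

    while (low <= high) {
        int level = (low + high) / 2;    /* next level of access */
        struct node *n = hash_lookup(level, substr_of(addr, level));
        if (n == 0) {
            high = level - 1;            /* no marker: search shorter levels */
        } else {
            if (node_has_prefix_or_bmp(n))
                bmp = n;                 /* remember precomputed BMP */
            low = level + 1;             /* marker found: search longer levels */
        }
    }
    return bmp;
}
```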

3.4 Binary Search on Trie Levels in a Leaf-Pushed Trie

The W-BSL requires complex precomputation because of the prefix nesting relationship, in which one prefix becomes a substring of another prefix. That is, since each node can have prefixes in both upper and lower levels, the markers and their BMPs must be precomputed. If all prefixes are disjoint from each other, they are located only in leaves and are free from the prefix nesting relationship. Hence, the binary search on trie levels for a set of disjoint prefixes can be performed without precomputation. Lim's binary search on levels structure (L-BSL) [14] uses leaf-pushing [22] to make every prefix disjoint. Fig. 4 shows the leaf-pushed binary trie for the same set of prefixes. Leaf-pushed nodes are shown connected to the trie by dotted edges. The levels of access for performing the binary search on levels are also shown. Assume a 6-bit input 110110. Note that we use a different input example from the W-BSL to show the search procedure for the L-BSL in detail. Since the first level of

access is 4, the most significant 4 bits, which are 1101, are used for hashing. Since the search encounters an internal node, it proceeds to a longer level. In level 6, the most significant 6 bits of the input, 110110, do not match any node. Hence, the search proceeds to a shorter level, which is level 5. The input matches prefix P4 in level 5. The prefix P4 is returned and the search is over. The L-BSL finishes a search either when a match to a prefix occurs or when it reaches the last level of access, while the W-BSL always finishes a search when it reaches the last level of access.

4 THE PROPOSED ARCHITECTURES

4.1 Adding a Bloom Filter to Parallel Multiple-Hashing Architecture (PMH-BF)

In this section, we propose to add an on-chip Bloom filter to the PMH architecture (shown in Fig. 1) to reduce the implementation complexity. In implementing hash tables, previous architectures, such as [7], [9], [10], [11], [13], and [32], require separate hash tables for each length, and this increases the implementation complexity by requiring multiple variable-sized memories. To reduce the required number of memories by reducing the number of distinct prefix lengths, Dharmapurikar et al. [7] suggested using the CPE. However, the CPE causes prefix replications and increases the memory requirement. In [32], prefix collapsing is suggested. However, prefix collapsing increases the number of collisions in hashing. On the whole, in using hashing for IP address lookup, it is essential to reduce the required number of off-chip memories. If there is no need for parallel access in each prefix length, the hash table can be designed to accommodate prefixes with various lengths within a single table. We describe the proposed architecture using the same example set. We use a single hash function based on a cyclic redundancy check (CRC) generator to obtain multiple hash indices for prefixes of every length [29]. The CRC generator is composed of shift-right registers with XOR logic, and hence it is easy to implement. There is an advantage to using a CRC generator as a hash generator: hash indices can be obtained consistently for various-length prefixes. Fig. 5 shows an example of an 8-bit CRC generator. All the registers of the CRC generator are initially set to 0. Once a prefix with an arbitrary length is serially entered to the CRC generator and XORed with the stored register values, a fixed-length scrambled code is obtained. By selecting a set of registers or multiple sets of registers from the scrambled code, we obtain as many hash indices as desired of any length. Let n, m, and k be the number of prefixes, the number of bits in a Bloom filter, and the number of hash indices, respectively. As an example case, we set the Bloom filter size m as 16. The number of hash indices k should be


TABLE 1 CRC Code and Bloom Filter Index for Each Prefix

derived from (1), and here we set k as 2. Since m is 16, we need 4 bits for each hash index. We arbitrarily select the first 4 bits and the last 4 bits of the 8-bit CRC codes as the Bloom filter indices. Assuming an 8-bucket multihashing table, we can also obtain hash indices for the multihashing table from the CRC code. We arbitrarily select the first 3 bits and the last 3 bits of the 8-bit CRC codes as the multihashing indices. Table 1 shows the CRC codes for the example set of prefixes and the selected indices. In Fig. 6, we show the overall structure of our proposed architecture. The on-chip Bloom filter was programmed using the BF indices in Table 1. The multihashing table, which is composed of two hash tables, has eight buckets and two entries per bucket storing the prefixes. Prefixes were stored into the multihashing table using the indices shown in Table 1. In this example, there is no overflow. The search procedure for the proposed algorithm is summarized in the following pseudocode. Let A be the destination address of a given input packet, and S(A, l) be the substring of the most significant l bits of A. Let n(x) be the length of an element x.

SearchMHB(A) {
  TCAM_BMP = TCAM_search(A);
  if (TCAM_BMP != NULL) len = n(TCAM_BMP);
  else len = 0;
  for (l = W to len+1) {
    if (valid[l] == 1) { // l is a valid length
      inString = S(A, l);
      CRC = crc_gen(inString);
      for (i = 1 to k) bf_idx[i] = extract(CRC);
      rst = probe_BF(bf_idx[1], ..., bf_idx[k]);
      if (rst == positive) {
        for (i = 1 to 2) h_idx[i] = extract(CRC);
        BMP = hash_table(inString, h_idx[1], h_idx[2]);
        if (BMP != NULL) return BMP;
      }
    }
  }
  return TCAM_BMP;
}

In this example set, the valid levels are lengths 6, 5, 4, 3, 2, and 1. As a search example for the 6-bit input address 111000, the CRC code generated for this input using the 8-bit CRC generator shown in Fig. 5 is 10010010. Hence, by selecting the first 4 bits and the last 4 bits, we have BF indices 9 and 2. The Bloom filter shown in Fig. 6 produces a negative since one of the Bloom filter bits is 0, and hence the hash table is not accessed. Reducing the input by 1 bit, 11100 is entered,

Fig. 6. Our proposed multiple hashing architecture.

and the generated code is 00100101. The BF indices are 2 and 5, the Bloom filter produces a negative, and hence the hash table is not accessed. Next, the input 1110 is tried. The generated CRC is 01001010, and the BF indices are 4 and 10. The Bloom filter produces a positive, and the hash table is accessed using two hash indices obtained from the first 3 bits and the last 3 bits of the CRC code, which are 2 and 2. There is no match for 1110 in bucket 2 of the left table or bucket 2 of the right table, so it turns out that the Bloom filter produced a false positive. Next, the input 111 is tried. The generated CRC is 10010100, and the BF indices are 9 and 4. The Bloom filter produces a positive, and the hash table is accessed. We obtain a matching prefix in bucket 4 of the left table, and therefore the search is over. The search for a given input is terminated when a true positive occurs. In this example, the Bloom filter was accessed four times, for lengths 6, 5, 4, and 3, and it generated two negatives, one false positive, and one true positive. The number of hash table accesses is 2, which is the same as the number of Bloom filter positives. As will be shown in the simulation section, the false-positive rate can be reduced to less than 0.3 percent by increasing the size of the Bloom filter to 16 times the number of prefixes. Parallel accesses to hash tables in each length are not necessary in the proposed architecture, since the Bloom filter filters out the lengths of the input that do not have a matching prefix. The Bloom filter is small enough to be implemented in a fast cache or embedded in a chip. Hence, the implementation complexity is significantly reduced by adding a simple Bloom filter without sacrificing the search performance.
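As a rough illustration of how one CRC code can supply all of the indices above, the following C sketch implements a bit-serial CRC-8 together with the bit selections used in Table 1. The generator polynomial (0x07) and the exact register taps are assumptions for illustration; the generator in Fig. 5 of the paper may use a different polynomial.

```c
#include <stdint.h>

/* Bit-serial CRC-8 sketch: the prefix bits enter MSB first, XORed into the
 * feedback of a shift register that starts at 0 (polynomial assumed). */
static uint8_t crc8_serial(uint32_t prefix, int len)
{
    uint8_t crc = 0;                           /* registers initially 0        */
    for (int i = len - 1; i >= 0; i--) {       /* enter prefix bits serially   */
        uint8_t in = (uint8_t)((prefix >> i) & 1u);
        uint8_t fb = (uint8_t)(((crc >> 7) & 1u) ^ in);   /* feedback bit      */
        crc = (uint8_t)((crc << 1) ^ (fb ? 0x07 : 0x00));
    }
    return crc;                                /* fixed-length scrambled code  */
}

/* Derive the two 4-bit Bloom filter indices and the two 3-bit bucket indices
 * from one CRC code (first bits and last bits of the code, as in Table 1). */
static void derive_indices(uint8_t crc, unsigned *bf1, unsigned *bf2,
                           unsigned *h1, unsigned *h2)
{
    *bf1 = (crc >> 4) & 0xF;   /* first 4 bits -> Bloom filter index 1 */
    *bf2 = crc & 0xF;          /* last 4 bits  -> Bloom filter index 2 */
    *h1  = (crc >> 5) & 0x7;   /* first 3 bits -> left-table bucket    */
    *h2  = crc & 0x7;          /* last 3 bits  -> right-table bucket   */
}
```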

4.2 Adding a Bloom Filter to Binary Search on Trie Levels by Waldvogel (WBSL-BF)

In this section, we propose to add an on-chip Bloom filter to Waldvogel's binary search on levels algorithm to improve the search performance. A preliminary version of this proposal was presented in [26]. Here, the proposed algorithm is described in detail in the context of adding a Bloom filter. The role of the Bloom filter is to filter out the substrings of each input that do not have a node in the binary trie.


TABLE 2 CRC Codes and Bloom Filter Indices for WBSL-BF

Table 2 shows the CRC codes and Bloom filter indices for every node in the levels of access (levels 1 through 6) of the W-BSL trie shown in Fig. 3. We use the CRC-8 generator in Fig. 5 to obtain the BF indices. Fig. 7 shows the WBSL-BF trie, which has a Bloom filter programmed using the Bloom filter indices. The search procedure for the WBSL-BF is summarized in the following pseudocode.

SearchWBSL(A) {
  TCAM_BMP = TCAM_search(A);
  low = min_level; high = max_level;
  while (low <= high) {
    accLevel = floor((low + high) / 2);
    inString = S(A, accLevel);
    CRC = crc_gen(inString);
    for (i = 1 to k) bf_idx[i] = extract(CRC);
    rst = probe_BF(bf_idx[1], ..., bf_idx[k]);
    if (rst == negative)
      high = accLevel - 1;
    else { // positive
      h_idx = extract(CRC);
      node = access_hash_table(inString, h_idx);
      if (node == NULL)
        high = accLevel - 1;
      else { // a node exists
        low = accLevel + 1;
        if (prefix node or bmp node)
          tree_BMP = node.BMP;
      }
    }
  }
  if (n(TCAM_BMP) < n(tree_BMP)) return tree_BMP;
  else return TCAM_BMP;
}

As a search example for the input 111000, the CRC code generated for the 3-bit substring 111 is 10010100. Hence, we have BF indices 9 and 4. The Bloom filter shown in Fig. 7 produces a positive, and hence the hash table is accessed and P5 is obtained as the current best match. For the 5-bit substring of the input, 11100, the CRC code is 00100101, hence the BF indices are 2 and 5, and the result is a negative. Hence, the hash table is not accessed, and the search space becomes the shorter lengths. For the 4-bit substring of the input, 1110, the CRC code is 01001010, hence the BF indices are 4 and 10, and the result is a positive. Hence, the hash table is accessed. There is no match, so it turns out to be a false positive. The P5 is returned

Fig. 7. Adding a Bloom filter to the W-BSL.

as the BMP. Compared with the search procedure in the W-BSL, the number of off-chip hash table accesses is reduced from 3 to 2 because the Bloom filter produced one negative.

4.3 Adding a Bloom Filter to Binary Search on Trie Levels in a Leaf-Pushed Trie (LBSL-BF)

Table 3 shows the CRC codes and Bloom filter indices for every node in the levels of access (levels 2 through 6) of the L-BSL trie shown in Fig. 4. Fig. 8 shows the LBSL-BF trie, which has a Bloom filter programmed using the Bloom filter indices. The search procedure for the LBSL-BF is summarized in the following pseudocode.

SearchLBSL(A) {
  TCAM_BMP = TCAM_search(A);
  low = min_level; high = max_level;
  while (low <= high) {
    accLevel = ceil((low + high) / 2);
    inString = S(A, accLevel);
    CRC = crc_gen(inString);
    for (i = 1 to k) bf_idx[i] = extract(CRC);
    rst = probe_BF(bf_idx[1], ..., bf_idx[k]);
    if (rst == negative)
      high = accLevel - 1;
    else { // positive
      h_idx = extract(CRC);
      node = access_hash_table(inString, h_idx);
      if (node == NULL)
        high = accLevel - 1;
      else if (node.type == internal)
        low = accLevel + 1;
      else { // prefix node
        tree_BMP = node.BMP;
        if (n(TCAM_BMP) < n(tree_BMP)) return tree_BMP;
        else return TCAM_BMP;
      }
    }
  }
  return TCAM_BMP;
}

TABLE 3 CRC Codes and Bloom Filter Indices for LBSL-BF


Fig. 9. Entry structure of multihashing table.

Fig. 8. Adding a Bloom filter to L-BSL.

For an example input 110110, at level 4, the CRC code of the 4-bit string of this input is 00110100, and hence we have BF indices 3 and 4. The Bloom filter produces a positive, and hence the hash table is accessed and an internal node is encountered. For the 6-bit string 110110, the CRC code is 11011000, and hence the BF indices are 13 and 8. The result is a negative, and hence the hash table is not accessed. For the 5-bit substring 11011, the CRC code is 10110001, hence the BF indices are 11 and 1, and the result is a positive. The hash table is accessed and P4 stored in the hash table is encountered; P4 is returned as the BMP. Compared with the search procedure in the L-BSL, the number of off-chip hash table accesses is reduced from 3 to 2 because of one Bloom filter negative. (In determining the sequence of access levels, either a flooring operation or a ceiling operation can be used; the ceiling operation is used in this example.)

5 PERFORMANCE EVALUATION

5.1 Search Performance Improvement by Adding a Bloom Filter to Multiple-Hashing Architecture

A performance evaluation was carried out in C++ using five different sets of actual backbone routing data [35]. Throughout our simulation, hash indices were consistently generated using a 32-bit CRC generator [29].

The number of overflows depends on several factors, such as the number of hash tables, the number of buckets in each hash table, and the number of entries in each bucket. For N prefixes, we have two hash tables, each of which has N' = 2^⌈log₂N⌉ buckets. Each bucket has two entries, and each entry has 46 bits. For the five different prefix sets, there is no overflow except for one overflow that occurred in PORT80. Overflow prefixes are assumed to be stored in a TCAM. Fig. 9 shows the entry structure of the multihashing table. Hence, the memory requirement is 4 × 46 × N' bits. From our simulation, we found that the average number of hash table accesses (H_avg) is not as low as expected because of the false positives of the Bloom filter. These unexpected false positives are caused by prefixes having the same bit patterns but different lengths. To eliminate them, the hash key of the Bloom filter should carry more information than the prefix value itself. We therefore padded zeros after each prefix to make every prefix 32 bits long and attached 6 bits of prefix length information after that, so each hash key has 38 bits (a sketch of this key construction is given below). Performance evaluation results for the prefix sets are shown in Table 4. The number of prefixes in each routing set is shown in parentheses. The number of input traces generated for the simulation is three times the number of prefixes. The size of the Bloom filter M is set proportional to N'. The number of hash indices, K, is calculated using (1), and the result is 2, 2, 3, 6, and 11, respectively, for M = N', 2N', 4N', 8N', and 16N'. An input IP address is probed only for the distinct lengths that exist in the routing data, starting from the longest length. If a positive is returned from the Bloom filter, the hash table is probed. If there is a match in the hash table, the search is over. If there is no match, the same procedure is repeated for the next longest length, and so on. The maximum number of Bloom filter queries (Q_max) can be up to the number of distinct lengths that exist in the routing data. Since the Bloom filter queries stop when the longest matching prefix is found, the average number of Bloom filter queries (Q_avg) is smaller than the maximum. As the size of the Bloom filter increases, the maximum number of hash table probes (H_max) and the average number of hash table probes (H_avg) decrease as expected, since the number of negatives from the Bloom filter increases and the number of false positives decreases. The hash table access rate in the last column is the average number of hash table probes versus the average number of Bloom filter queries, i.e., H_avg/Q_avg. As the size of the Bloom filter increases from N' to 16N', the hash table access rate is reduced exponentially. This means that the Bloom filter effectively avoids unnecessary memory accesses to the off-chip hash table as its size increases. When the size of the Bloom filter is 16N', the maximum number of hash table probes (H_max) is greater than one, but the average number of hash table accesses (H_avg) is only 1.
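The 38-bit hash key construction described above can be sketched in C as follows. This is only an illustration of the padding-and-length scheme; the variable names and the bit layout within a 64-bit word are our assumptions, not code from the paper.

```c
#include <stdint.h>

/* Build the 38-bit Bloom filter hash key: the prefix is zero-padded to
 * 32 bits and its 6-bit length is appended, so prefixes with identical bit
 * patterns but different lengths hash differently. */
static uint64_t make_hash_key(uint32_t prefix_bits, unsigned len)
{
    /* prefix_bits holds the prefix left-aligned in a 32-bit word
     * (already zero-padded in the low-order bits). */
    return ((uint64_t)prefix_bits << 6) | (uint64_t)(len & 0x3F);
}
```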


TABLE 4 Performance Evaluation Results of PMH-BF

TABLE 5 Total Number of Input Traces and the Number of Inputs with At Least One False Positive

TABLE 6 Comparison in Search Performance with and without Bloom Filter for PMH

For a given input trace, if the Bloom filter produces false positives, the number of hash table probes becomes more than one. For the Bloom filter size 16N', Table 5 shows the total number of input traces injected into our simulation and the number of inputs that have at least one false positive. As shown, the number of inputs with at least one false positive is very small compared with the total number of input traces. Hence, the fractional part of the average number of hash table accesses (H_avg) is zero to two decimal places. This means that most of the false positives are removed when the size of the Bloom filter is sufficiently large, and each IP address lookup requires only one off-chip hash table access on average. Hence, the complex hardware for parallel access and the multiple separate memories required in the previous architecture are effectively removed by adding a simple on-chip Bloom filter with a size of up to 512 Kbytes for about 200 K prefixes.
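The quoted 512-Kbyte figure follows directly from the 16N' sizing rule; the following is an illustrative check of the arithmetic (the prefix count is rounded for the example), not a calculation taken from the paper.

```latex
% Illustrative check of the on-chip Bloom filter size under the 16N' rule.
% For the largest table (roughly N = 220,000 prefixes):
%   N' = 2^{\lceil \log_2 N \rceil} = 2^{18} = 262{,}144
\[
  m \;=\; 16\,N' \;=\; 16 \times 2^{18} \;=\; 2^{22}\ \text{bits}
        \;=\; 524{,}288\ \text{bytes} \;=\; 512\ \text{Kbytes}.
\]
```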

Table 6 shows the performance comparison with and without the Bloom filter for the PMH algorithm. When there is no Bloom filter, the average number of off-chip memory accesses for an IP address lookup using the PMH algorithm is 6.77 to 11.96, and the maximum number is 22 to 30. In our PMH-BF algorithm, the size of the Bloom filter is 32 to 512 Kbytes, which is 16N', where N' = 2^⌈log₂N⌉ for N prefixes. For each given input, off-chip hash table accesses are avoided for every length that is a negative in the Bloom filter query; a negative means that there is no prefix corresponding to that specific length of the input. The average number of off-chip memory accesses for an IP address lookup becomes 1.00 in every case, and the maximum number is 2 or 3. Simulations have been performed to compare the search performance of our proposed PMH-BF algorithm with that of Dharmapurikar's algorithm (D-BF) [7]. The D-BF algorithm


TABLE 7 Search Performance Comparison with [7]

applies controlled prefix expansion to bound the worst-case search performance to three hash table accesses. A direct lookup array is used for prefixes of length less than or equal to 20 bits. Prefixes of length 21 to 23 are expanded to length 24, and prefixes of length 25 to 31 are expanded to length 32. The D-BF algorithm requires two Bloom filters: one for prefixes of length 24 and the other for prefixes of length 32. For a fair comparison, we implemented both algorithms under the same constraints in terms of the memory amount for Bloom filters, the memory amount for hash table implementation, the hash functions, and the handling of collided hash keys. In generating hash indices for a Bloom filter, the ANSI C function rand() was used for both algorithms, as suggested in [7]. Each hash key used as an input to rand() is less than or equal to 32 bits. In generating hash keys for our proposed algorithm, 6 bits of prefix length information were attached after each prefix unless the resulting key became longer than 32 bits. Table 7 shows the result. The memory size for the Bloom filters was fixed at 16N bits for N prefixes. Since our proposed PMH-BF has a single Bloom filter, 16N bits were used for that filter. For the D-BF algorithm, the bits were allocated to the two Bloom filters in proportion to the number of prefixes. Multihashing tables were used for both algorithms. The memory amount for hash table implementation was determined as follows: for the D-BF algorithm, we allocated 1 Mbyte for the direct lookup array. The number of hash table entries for the other prefixes was determined as 2(N_24 + N_32), where N_24 is the number of prefixes with length 24 and N_32 is the number of prefixes with length 32. The same amount of memory required for the implementation of the D-BF algorithm was allocated for the multihashing table of the PMH-BF. Table 7 shows that the D-BF algorithm has large prefix replication, in which the number of prefixes stored in the hash table is larger than the number of original prefixes. Hence, the algorithm shows slightly worse search performance both in

the average and in the worst case than our proposed algorithm. It was assumed that collided prefixes are connected by a linked list without assuming a perfect hash function for a given set of prefixes in this simulation. Hence, the worst case number of hash table accesses is not bounded by 3 for [7] as shown in Grouptlcom and Telstra.

5.2 Search Performance Improvement by Adding a Bloom Filter to W-BSL

In implementing binary search on levels algorithms, if there is no prefix in a level of the binary trie, it is an invalid level, and nodes in invalid levels are not stored into the Bloom filter or the hash table. Every node in the valid levels, including prefix nodes and internal nodes, is stored into the Bloom filter and the hash table. Throughout this simulation, we assume a perfect hash function for storing the nodes of the trie in an off-chip hash table. The worst-case number of off-chip memory accesses is equal to ⌈log₂(W + 1)⌉ in the W-BSL algorithm, which is 6. Table 8 shows the performance comparison for the W-BSL algorithm. The number of nodes represents the total number of nodes stored. When there is no Bloom filter, the average number of off-chip memory accesses for an IP address lookup using the W-BSL algorithm is 4.33 to 4.78. The simulation results shown in Table 8 are for a Bloom filter of size 8U', where U' = 2^⌈log₂U⌉ and U is the total number of nodes. The size of the Bloom filter is 16 to 64 Kbytes. The average number of Bloom filter queries is equal to the average number of memory accesses when there is no Bloom filter. For each given input, off-chip hash table accesses are avoided for every length that is a negative in the Bloom filter query; a negative means that there is no node corresponding to that specific length of the input in the trie. The average number of off-chip memory accesses for an IP address lookup with the WBSL-BF became 2.50 to 3.19, and hence the number of memory accesses was reduced by around 40 percent by adding a Bloom filter.

TABLE 8 Comparison in Average Search Performance with and without Bloom Filter for W-BSL


TABLE 9 Comparison in Average Search Performance with and without Bloom Filter for L-BSL

5.3 Search Performance Improvement by Adding a Bloom Filter to L-BSL

Table 9 shows the performance evaluation results for the L-BSL algorithm. As in the W-BSL case, if there is no prefix in a level of the leaf-pushed trie, the level is an invalid level, and nodes in invalid levels are not stored into the Bloom filter or the hash table. The number of nodes represents the total number of nodes, including prefix nodes and internal nodes, in the valid levels. The number in parentheses represents the number of prefix nodes after leaf-pushing, which is slightly larger than the number of original prefix nodes. When there is no Bloom filter, the average number of off-chip memory accesses for an IP address lookup is 3.57 to 4.49. The search performance of the L-BSL is slightly better than that of the W-BSL, since a search can be terminated when a prefix node is encountered even if it is not the last level of access. The simulation results are again for a Bloom filter of size 8U'. The size of the Bloom filter is 16 to 128 Kbytes. The average number of off-chip memory accesses for an IP address lookup with the LBSL-BF became 2.62 to 3.47, and hence the number of memory accesses was reduced by approximately 30 percent by adding a Bloom filter. The performance improvement is smaller for the LBSL-BF than for the WBSL-BF: because of the leaf-pushing, many internal nodes are created, and hence a smaller number of Bloom filter negatives is produced in the LBSL-BF.

5.4 Search Performance Comparison with Other Algorithms

This section shows simulation results comparing the proposed algorithms with other algorithms in terms of the required memory amount and the search performance. The algorithms in the comparison are the binary trie [4], priority trie (P-Trie) [23], binary search on range (BSR) [24], binary search with prefix vector (BST-PV) [8], Waldvogel's BSL (W-BSL) [13], Lim's BSL (L-BSL) [14], the logW Elevator algorithm (logW-E) [25], Dharmapurikar's algorithm [7], and the proposed algorithms (PMH-BF, WBSL-BF,

and LBSL-BF). The details of the B-Trie, P-Trie, BSR, BST-PV, and logW-E algorithms can be found in [4]. Our simulation used the same prefix sets as those used in [4]. Table 10 shows the required memory amount for each algorithm. For the algorithms requiring a Bloom filter, which are the D-BF, the PMH-BF, the WBSL-BF, and the LBSL-BF, the required memory amounts for the Bloom filters are also shown. The sizes of the Bloom filters are reasonably small, so that each Bloom filter can be embedded in a chip. The memory amount for the hash table implementation of the WBSL-BF and the LBSL-BF is the same as that of the W-BSL and the L-BSL, respectively. Algorithms requiring a multihashing table, such as the D-BF and the PMH-BF, generally consume more memory than the other algorithms, while they provide better search performance, as will be shown shortly. Figs. 10 and 11 show the worst-case and average-case search performance, respectively. The D-BF algorithm and the proposed PMH-BF algorithm provide the best performance in both the worst-case and the average-case search performance. The WBSL-BF and the LBSL-BF are next. The search performance of known algorithms was effectively improved by adding an on-chip Bloom filter.

6 CONCLUSION

This paper shows how effectively an on-chip Bloom filter can improve the search performance of known efficient IP address lookup algorithms. The parallel multiple-hashing architecture provides high-speed IP address lookup with a single access cycle of off-chip memory, but it requires complicated hardware for parallel accesses to the separate memories storing prefixes of each length. This paper shows how to avoid the parallel access to off-chip memories by adding a small on-chip Bloom filter. For a given input, the Bloom filter is queried first, starting from the longest length. If the result is a negative, access to the off-chip hash table is avoided for that specific length.

TABLE 10 Memory Requirement (Mbyte)

*Bloom filter size is in Kbyte.


Fig. 10. Worst case number of memory accesses for each algorithm. (a) Mae-West (14553 prefixes). (b) Mae-East (39464 prefixes). (c) PORT80 (112310 prefixes). (d) Grouptlcom (170601 prefixes). (e) Telstra (227223 prefixes).

Fig. 11. Average number of memory accesses for each algorithm. (a) Mae-West (14553 prefixes). (b) Mae-East (39464 prefixes). (c) PORT80 (112310 prefixes). (d) Grouptlcom (170601 prefixes). (e) Telstra (227223 prefixes).

The off-chip hash table is accessed only for a positive result of the Bloom filter. When the result turns out to be a true positive, the search for the input is finished. It is shown that the proposed architecture provides average search performance comparable to the parallel multiple-hashing architecture by properly

controlling the false positive rate. The proposed architecture requires much less hardware since it only has a small on-chip Bloom filter and a single multihashing table and does not require complicated hardware or separate memories for parallel access.


Among trie-based algorithms, algorithms based on binary search on trie levels provide the best search performance, since their performance is proportional to O(log l_dist), where l_dist is the number of distinct prefix lengths. This paper shows how to further improve the search performance of those algorithms by adding a simple on-chip Bloom filter. For each given input, the Bloom filter is queried first for the current level of access. If the result is a negative, it means that there is no node at that level of the trie. Hence, the search can proceed to a shorter level without accessing the off-chip hash table. It is shown that the average search performance is improved by 30-40 percent by effectively avoiding accesses to the off-chip hash table when there is no node in the current level. Multibit tries with controlled prefix expansion [22], [33], [34] provide better search performance than binary tries by reducing the number of distinct levels. Binary search on trie levels can be applied to multibit tries without loss of generality, and hence our proposed approach using an on-chip Bloom filter can also be applied to binary search on the trie levels of a multibit trie. We believe that the Bloom filter is a simple but extremely powerful data structure that will improve the performance of many other applications as well [31], and we are actively seeking possible applications.

ACKNOWLEDGMENTS

This work was supported by the National Research Foundation of Korea grant funded by the Korea government (2012-005945). This research was also supported by the Ministry of Knowledge Economy, Korea, under the ITRC support program supervised by the NIPA (NIPA2012-H0301-12-4004). The preparation of this paper would not have been possible without the efforts on simulations of our students in the SoC Design Lab at Ewha Womans University. The authors are particularly grateful to Jungwon Lee and Youngju Choi. This work was performed while all the authors were affiliated with Ewha Womans University.

REFERENCES

[1] H.J. Chao, "Next Generation Routers," Proc. IEEE, vol. 90, no. 9, pp. 1518-1588, Sept. 2002.
[2] M.A. Ruiz-Sanchez, E.M. Biersack, and W. Dabbous, "Survey and Taxonomy of IP Address Lookup Algorithms," IEEE Networks, vol. 15, no. 2, pp. 8-23, Mar./Apr. 2001.
[3] S. Sahni, K. Kim, and H. Lu, "Data Structures for One-Dimensional Packet Classification Using Most-Specific-Rule Matching," Int'l J. Foundations of Computer Science, vol. 14, no. 3, pp. 337-358, 2003.
[4] H. Lim and N. Lee, "Survey and Proposal on Binary Search Algorithms for Longest Prefix Match," IEEE Comm. Surveys and Tutorials, vol. 14, no. 3, pp. 1-17, July/Sept. 2012.
[5] F. Yu, R.H. Katz, and T.V. Lakshman, "Efficient Multimatch Packet Classification and Lookup with TCAM," IEEE Micro, vol. 25, no. 1, pp. 50-59, Jan./Feb. 2005.
[6] H. Lu and S. Sahni, "Dynamic Tree Bitmap for IP Lookup and Update," Proc. Int'l Conf. Networking, 2007.
[7] S. Dharmapurikar, P. Krishnamurthy, and D. Taylor, "Longest Prefix Matching Using Bloom Filters," IEEE/ACM Trans. Networking, vol. 14, no. 2, pp. 397-409, Feb. 2006.
[8] H. Lim, H. Kim, and C. Yim, "IP Address Lookup for Internet Routers Using Balanced Binary Search with Prefix Vector," IEEE Trans. Comm., vol. 57, no. 3, pp. 618-621, Mar. 2009.
[9] A. Broder and M. Mitzenmacher, "Using Multiple Hash Functions to Improve IP Lookups," Proc. IEEE INFOCOM, vol. 3, pp. 1454-1463, 2001.


[10] H. Lim and Y.J. Jung, "A Parallel Multiple Hashing Architecture for IP Address Lookup," Proc. IEEE High Performance Switching and Routing (HPSR), pp. 91-95, 2004.
[11] H. Lim, J. Seo, and Y. Jung, "High Speed IP Address Lookup Architecture Using Hashing," IEEE Comm. Letters, vol. 7, no. 10, pp. 502-504, Oct. 2003.
[12] W. Eatherton, G. Varghese, and Z. Dittia, "Tree Bitmap: Hardware/Software IP Lookups with Incremental Updates," ACM SIGCOMM Computer Comm. Rev., vol. 34, no. 2, pp. 97-122, Apr. 2004.
[13] M. Waldvogel, G. Varghese, J. Turner, and B. Plattner, "Scalable High Speed IP Routing Lookups," Proc. ACM SIGCOMM, pp. 25-35, 1997.
[14] J.H. Mun, H. Lim, and C. Yim, "Binary Search on Prefix Lengths for IP Address Lookup," IEEE Comm. Letters, vol. 10, no. 6, pp. 492-494, June 2006.
[15] K. Kim and S. Sahni, "IP Lookup by Binary Search on Prefix Length," J. Interconnection Networks, vol. 3, pp. 105-128, 2002.
[16] W. Lu and S. Sahni, "Succinct Representation of Static Packet Classifiers," IEEE/ACM Trans. Networking, vol. 17, no. 3, pp. 803-816, June 2009.
[17] W. Lu and S. Sahni, "Recursively Partitioned Static Router Tables," IEEE Trans. Computers, vol. 59, no. 12, pp. 1683-1690, Dec. 2010.
[18] S. Sahni and K. Kim, "An O(log n) Dynamic Router Table Design," IEEE Trans. Computers, vol. 53, no. 3, pp. 351-363, Mar. 2004.
[19] H. Lu and S. Sahni, "O(log n) Dynamic Router-Tables for Prefixes and Ranges," IEEE Trans. Computers, vol. 53, no. 10, pp. 1217-1230, Oct. 2004.
[20] W. Lu and S. Sahni, "Packet Classification Using Space-Efficient Pipelined Multi-Bit Tries," IEEE Trans. Computers, vol. 57, no. 5, pp. 591-605, May 2008.
[21] K. Kim and S. Sahni, "Efficient Construction of Pipelined Multibit-Trie Router-Tables," IEEE Trans. Computers, vol. 56, no. 1, pp. 32-43, Jan. 2007.
[22] V. Srinivasan and G. Varghese, "Fast Address Lookups Using Controlled Prefix Expansion," ACM Trans. Computer Systems, vol. 17, no. 1, pp. 1-40, Feb. 1999.
[23] H. Lim, C. Yim, and E.E. Swartzlander Jr., "Priority Trie for IP Address Lookup," IEEE Trans. Computers, vol. 59, no. 6, pp. 784-794, June 2010.
[24] B. Lampson, B. Srinivasan, and G. Varghese, "IP Lookups Using Multiway and Multicolumn Search," IEEE/ACM Trans. Networking, vol. 7, no. 3, pp. 324-334, June 1999.
[25] R. Sangireddy, N. Futamura, S. Aluru, and A.K. Somani, "Scalable, Memory Efficient, High-Speed Algorithms for IP Lookups," IEEE/ACM Trans. Networking, vol. 13, no. 4, pp. 802-812, Aug. 2005.
[26] K. Lim, K. Park, and H. Lim, "Binary Search on Levels Using a Bloom Filter for IPv6 Address Lookup," Proc. IEEE/ACM Fifth Symp. Architectures for Networking and Comm. Systems (ANCS), pp. 185-186, 2009.
[27] P. Panda, N. Dutt, and A. Nicolau, "On-Chip vs. Off-Chip Memory: The Data Partitioning Problem in Embedded Processor-Based Systems," ACM Trans. Design Automation of Electronics Systems, vol. 5, no. 3, pp. 682-704, July 2000.
[28] H. Lim and S. Kim, "Tuple Pruning Using Bloom Filters for Packet Classification," IEEE Micro, vol. 30, no. 3, pp. 784-794, May/June 2010.
[29] A.G.A. Priya and H. Lim, "Hierarchical Packet Classification Using a Bloom Filter and Rule-Priority Tries," Computer Comm., vol. 33, no. 10, pp. 1215-1226, June 2010.
[30] H. Song, S. Dharmapurikar, J. Turner, and J. Lockwood, "Fast Hash Table Lookup Using Extended Bloom Filter: An Aid to Network Processing," Proc. ACM SIGCOMM, Aug. 2005.
[31] S. Taroma, C.E. Rothenberg, and E. Lagerspetz, "Theory and Practice of Bloom Filters for Distributed Systems," IEEE Comm. Surveys and Tutorials, vol. 14, no. 1, pp. 131-155, Jan.-Mar. 2012.
[32] J. Hasan, S. Cadambi, V. Jakkula, and S. Chakradhar, "Chisel: A Storage-Efficient, Collision-Free Hash-Based Network Processing Architecture," Proc. 33rd Int'l Symp. Computer Architecture (ISCA), pp. 203-215, 2006.
[33] W. Lu and S. Sahni, "Packet Forwarding Using Pipelined Multibit Tries," Proc. IEEE Symp. Computers and Comm., pp. 802-807, May 2006.


[34] W. Lu and S. Sahni, "Packet Classification Using Pipelined Two-Dimensional Multibit Tries," Proc. IEEE Symp. Computers and Comm., pp. 808-813, May 2006.
[35] http://www.potaroo.net.

Hyesook Lim (M'91-SM'13) received the BS and MS degrees from the Department of Control and Instrumentation Engineering, Seoul National University, Korea, in 1986 and 1991, respectively, and the PhD degree from the University of Texas at Austin, in 1996. From 1996 to 2000, she was employed as a member of the technical staff at Bell Labs of Lucent Technologies, Murray Hill, New Jersey. From 2000 to 2002, she worked for Cisco Systems, San Jose, California. She is currently a professor in the Department of Electronics Engineering and the associate vice president of faculty and academic affairs, Ewha Womans University, Seoul, Korea. Her research interests include router design issues such as address lookup and packet classification, Bloom filter application to various distributed algorithms, and packet forwarding in content centric networks. She is a senior member of the IEEE.

Kyuhee Lim received the BS degree from the Department of Electronics Engineering, Ewha Womans University, Seoul, Korea, in 2005. From 2005 to 2009, she was employed at Hynix Semiconductor, Korea, where she worked on memory design. Her research interests include address lookup and packet classification algorithms and TCAM architecture design.


Nara Lee received the BS and MS degrees from the Department of Electronics Engineering, Ewha Womans University, Seoul, Korea, in 2009 and 2012, respectively. Her research interests include various network algorithms such as IP address lookup and packet classification, web caching, and Bloom filter application to various distributed algorithms. She is currently a Research Engineer with the IP Technical Team, DTV SoC Department, SIC Lab, LG Electronics, Inc., Seoul, Korea.

Kyong-Hye Park received the BS and MS degrees from the Department of Electronics Engineering, Ewha Womans University, Seoul, Korea, in 2007 and 2009, respectively. She works for Mobile Communication Business Unit, Samsung Electronics, Korea, where she is currently developing Android handsets. She is a student member of the IEEE.

