
Survey and Proposal on Binary Search Algorithms for Longest Prefix Match

Hyesook Lim, Member, IEEE, and Nara Lee, Student Member, IEEE

Abstract—IP address lookup has been a major challenge for Internet routers, against a background of advances in link bandwidth and rapid growth in Internet traffic and in the number of networks. This survey paper explores binary search algorithms as a simple and efficient approach to the IP address lookup problem. Binary search algorithms are categorized into algorithms based on the trie structure, algorithms performing binary search on prefix values, and algorithms performing binary search on prefix lengths. In this paper, the algorithms in each category are described in terms of their data structures, routing tables, and performance. Performance is evaluated with respect to pre-defined metrics, such as search speed and memory requirement. Table update, scalability toward large routing data, and the migration to IPv6 are also discussed. Simulation results are shown for real routing data with sizes of 15,000 to 227,000 prefixes acquired from backbone routers. Suggestions are made for the choice of algorithms depending on the table size, the routing data statistics, or the implementation flexibility.

Index Terms—Algorithm, IP address lookup, longest prefix match, best matching prefix, binary trie, binary search.

I. INTRODUCTION

IP ADDRESS lookup is the process of finding the output port and the next-hop address corresponding to the destination IP address of each incoming packet [1], [2]. It is a basic operation for packet forwarding in Internet routers. An IP address has a hierarchical structure consisting of a network identifier and a host identifier; the network identifier is termed a prefix. IP address lookup is based solely on the prefix: it searches the pre-defined routing table for a prefix that matches the destination IP address of each incoming packet.

IP address lookup is a major bottleneck in routers, based on observations of the current Internet [2]. First, the number of packets arriving at routers has increased dramatically, driven by the explosive growth of link bandwidth and the emergence of new applications and services, such as multimedia streaming, on-line games, and instant messengers. Second, about 50% of Internet traffic consists of TCP acknowledgement packets, which are minimum-sized. Assuming that minimum-sized packets arrive back-to-back over an aggregated link bandwidth of 40 Gbps, routers should process hundreds of millions of packets per second in the worst case.

Third, backbone routers cannot resort to using a default route; they should recognize all network identifiers of incoming packets. Backbone routers currently have more than 150,000 routes, and this number is expected to grow even further, to 500,000 or a million. Hence, the amount of memory required to store the routing table becomes a critical issue. Fourth, there are about 250,000 concurrent flows in backbone routers; this means that the traditional approach, which exploits the locality of flows and caches routing table entries, no longer works. Fifth, routing tables should be updated every 1 ms to 1 s, because the BGP protocol, which is used for inter-autonomous-system routing, is very unstable; hence, table updating is also a critical issue. Finally, it is necessary to consider the migration to IPv6, since the address lookup problem becomes even more complex under IPv6, which has a 128-bit address space.

The current solution for IP address lookup is the use of ternary content addressable memories (TCAMs). TCAMs support parallel lookup over all entries as a hardware-based search mechanism and thus provide more than 250 million lookups per second [3]. However, high power dissipation and large area consumption are the major disadvantages of using TCAMs: while an SRAM uses only six transistors to store a bit, a TCAM uses 16 transistors. These issues worsen for IPv6 due to the much longer prefixes that need to be stored. Hence, algorithmic approaches that can replace the use of TCAMs have been widely studied. For example, the tree bitmap algorithm [4], [5] was successfully implemented in the Cisco CRS-1 router [6].

Several metrics are used in evaluating the performance of various algorithms and architectures. First, lookup speed is the most important metric, since it is essential to process incoming packets at wire speed. The lookup speed is evaluated by the number of memory accesses required for an address lookup, considering that the memory access is the most time-consuming operation in the lookup process. The amount of memory required to store the routing table is the next most important metric. The amount of memory depends on the data structure used; it is essential to have efficient data structures that optimize the required memory size and provide fast search at the same time. The possibility of incremental update of the routing table is also an important metric. Scalability to huge routing data and the possibility of migration to IPv6 should also be considered.

In this paper, we describe various IP address lookup algorithms and compare their characteristics. A survey on IP address lookup algorithms was already published in [7]; however, several interesting algorithms have been proposed since then, and most of the algorithms covered in this paper were published after the work in [7].


Our approach differs from the approach in [7]. We use a consistent example set to describe the data structure and the search procedure of each algorithm, so that each algorithm can be easily understood, compared, and practically implemented. The evaluation method also differs: while the algorithms in [7] were evaluated by execution time, this paper evaluates each algorithm in terms of the minimum, worst-case, and average-case numbers of memory accesses, as well as memory requirements.

We found that binary search algorithms can be a low-cost solution that is easily and cheaply implemented without requiring special-purpose hardware. Hence, this paper describes algorithms based on binary search. Here, binary search means that the search space for a given input is, in the ideal case, halved each time a memory entry is accessed.

The organization of the paper is as follows. Section II defines the IP address lookup problem. Section III describes IP address lookup algorithms based on the trie structure, which is the most natural data structure for representing a routing table of variable-length prefixes. Section IV describes algorithms performing binary search on prefix values. Section V describes algorithms performing binary search on prefix lengths. While we describe 12 different algorithms in Sections III to V, the sections are kept relatively self-contained, so readers can skip any section depending on their needs without worrying about unexpected dependences between sections. Section VI shows performance evaluation results using real routing data with 15,000 to 227,000 prefixes acquired from backbone routers. In Section VII, we discuss and compare the characteristics of the described algorithms. A brief conclusion is given in Section VIII.

II. IP ADDRESS LOOKUP PROBLEM

In the class-based addressing scheme, prefixes had fixed lengths of 8, 16, or 24 bits for the 32-bit IPv4 address. Hence, IP address lookup was performed using an exact match operation. However, the class-based addressing scheme had some issues. IP address space is wasted due to the inflexibility of fixed-length prefixes. Moreover, the size of routing tables grew massively due to the lack of address aggregation. For example, assume that a subnet with 1000 hosts is created. If a 16-bit prefix is allocated, many IP addresses are wasted, since a 16-bit prefix can cover up to 2^16 IP addresses. If, instead, four 24-bit prefixes are allocated, as each 24-bit prefix covers 2^8 addresses, the routing table of each router should keep four entries for the subnet instead of a single entry.

The classless inter-domain routing (CIDR) scheme was introduced to overcome these issues. The CIDR scheme allows prefixes of arbitrary length; thus, a 22-bit prefix (covering up to 2^10 addresses) can be allocated in the example case. This avoids the waste of IP address space and allows prefix aggregation. However, an input address can match more than one prefix under CIDR. Hence, the IP address lookup problem becomes that of determining the longest prefix matching a given input, so that the input is forwarded to the most specific network. The longest matching prefix is called the best matching prefix (BMP).

Fig. 1. Example network with 6-bit IP addresses [9].

The IP address lookup problem can be defined formally as follows [8]. Let P = {P1, P2, ..., PN} be a set of routing prefixes, where N is the number of prefixes. Let A be an incoming IP address and S(A, k) be the sub-string of the most significant k bits of A. Let n(Pi) be the length of a prefix Pi. A is defined to match Pi if S(A, n(Pi)) = Pi. Let M(A) be the set of prefixes in P that A matches; then M(A) = {Pi ∈ P : S(A, n(Pi)) = Pi}. The longest prefix matching problem is to find the prefix Pj in M(A) such that n(Pj) > n(Pi) for all Pi ∈ M(A), i ≠ j. Once the longest matching prefix Pj is determined, the input packet is forwarded to the output port indicated by Pj.

Figure 1 shows an example network from [9] that has a 6-bit address space (the IPv4 address is 32 bits) as a toy example. Each router obtains a routing table composed of a set of prefixes and the corresponding output ports by running routing protocols. For the example set of prefixes shown, by searching the routing table for an arbitrary input address, 110100, we obtain M(A) = {1*, 1101*}. Of these two matching prefixes, prefix 1101* is identified as the longest matching prefix, or the best matching prefix (BMP); it represents the most specific network to which the input has to be forwarded. Hence, the input packet is forwarded toward that network through output port 2.

While TCAMs have been used in high-end routers to meet the performance requirement, most of the successful algorithmic approaches to longest prefix matching in small enterprise routers are essentially high-performance variants of trie algorithms, due to their simplicity and implementation flexibility [10].
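The definition above translates directly into a naive linear scan, shown below as a minimal C sketch for the 6-bit toy example (W would be 32 for IPv4). The struct fields, function names, and bit layout are illustrative assumptions, not taken from the paper; the point is only to make S(A, n(Pi)) = Pi and the selection of the longest member of M(A) concrete. The rest of the paper is about replacing this O(N) scan with faster structures.

```c
#define W 6   /* address width of the toy example; 32 for IPv4 */

struct prefix {
    unsigned value;   /* prefix bits, left-aligned in the low W bits */
    int      length;  /* n(Pi): number of significant bits           */
    int      port;    /* output port                                 */
};

/* S(A, n(Pi)) == Pi ?  Compare the most significant n(Pi) bits of A with Pi. */
static int matches(unsigned addr, const struct prefix *p)
{
    if (p->length == 0)
        return 1;                                   /* default route */
    unsigned mask = (~0u << (W - p->length)) & ((1u << W) - 1);
    return ((addr ^ p->value) & mask) == 0;
}

/* Longest prefix match by linear scan: pick the longest member of M(A). */
int lpm_linear(unsigned addr, const struct prefix *set, int n)
{
    int best_len = -1, best_port = -1;
    for (int i = 0; i < n; i++)
        if (matches(addr, &set[i]) && set[i].length > best_len) {
            best_len  = set[i].length;
            best_port = set[i].port;
        }
    return best_port;   /* -1 if no prefix matches */
}
```

For the example set and the input 110100, only 1* and 1101* match, and the scan returns the port of 1101*, in agreement with the discussion above.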

III. TRIE-BASED ALGORITHMS

A. Binary Trie (B-trie)

A trie [7] is a tree-based data structure that organizes prefixes on a digital basis, using the bits of the prefixes to direct the branching. Each node has at most two children in a binary trie. Each prefix maps to a node in the binary trie, of which the path and the level are determined by the prefix value and the prefix length, respectively.

Fig. 2. Binary trie (B-trie).

Figure 2 shows the binary trie for the example set of prefixes, P = {P0(00*), P1(010*), P2(1*), P3(110101*), P4(1101*), P5(111*), P6(11111*)}. From the root node, the left edge corresponds to a bit value of 0 and the right edge corresponds to a bit value of 1. The depth of the trie is determined by the longest prefix in the routing data. As shown, empty nodes (white nodes) that are not associated with any prefix are included in the paths leading to prefix nodes (dark nodes).

Table I shows the routing table implementing the data structure of the binary trie. Entries in the routing table correspond one-to-one to the nodes of the binary trie, and the entry address is represented by the gray number in each node. The first entry is the root node. The fields of the routing table include a valid bit, which distinguishes prefix nodes from empty nodes, and two memory addresses pointing to the children. Each entry also has a field for the output port used in the case of a match; we note the prefix name in this field for simplicity. It is not necessary to store the prefix value and the length in the routing table entries of the trie structure, since the value and the length of each prefix are known from the path and the level of the prefix node relative to the root node.

Search in the binary trie proceeds to a lower level by examining one bit of the input address at a time. If the bit is 0, the search proceeds to the left child; otherwise, it proceeds to the right child, until there is no pointer to follow. While going down the trie, the search process keeps track of the current best matching prefix (BMP) whenever a prefix node is encountered. When the search is over, the currently remembered BMP is returned. Assuming that the 6-bit input IP address 110100 is given, the search passes through entries 0, 2, 5, 7, 9, and 11. The current BMP is the prefix P2 at entry 2; it is replaced with prefix P4 at entry 9, and the output port of prefix P4 is returned when the search is complete.

The search space is reduced by half in a balanced binary trie each time a memory entry is accessed. Hence, the binary trie provides better search performance than a linear search, which reduces the search space by only a single entry every time a memory entry is accessed.

TABLE I
ROUTING TABLE FOR BINARY TRIE

Entry  Prefix valid  Left pointer  Right pointer  Output port
0      0             1             2              -
1      0             3             4              -
2      1             -             5              P2
3      1             -             -              P0
4      0             6             -              -
5      0             7             8              -
6      1             -             -              P1
7      0             -             9              -
8      1             -             10             P5
9      1             11            -              P4
10     0             -             12             -
11     0             -             13             -
12     1             -             -              P6
13     1             -             -              P3
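The search just described can be written as a short loop over the Table I layout. The following C sketch assumes 6-bit addresses and hard-codes Table I; the structure and names (btrie_lookup, NONE, and so on) are illustrative, not from the paper.

```c
#include <stdio.h>

#define NUM_ENTRIES 14
#define ADDR_BITS   6
#define NONE        (-1)

/* One routing-table entry of the binary trie (Table I layout). */
struct trie_entry {
    int valid;              /* 1 if this node stores a prefix            */
    int left, right;        /* child entry addresses, NONE if absent     */
    const char *port;       /* output port (prefix name), NULL if empty  */
};

/* Table I encoded as an array; entry 0 is the root. */
static const struct trie_entry table[NUM_ENTRIES] = {
    {0, 1, 2, NULL},    {0, 3, 4, NULL},    {1, NONE, 5, "P2"},    {1, NONE, NONE, "P0"},
    {0, 6, NONE, NULL}, {0, 7, 8, NULL},    {1, NONE, NONE, "P1"}, {0, NONE, 9, NULL},
    {1, NONE, 10, "P5"},{1, 11, NONE, "P4"},{0, NONE, 12, NULL},   {0, NONE, 13, NULL},
    {1, NONE, NONE, "P6"}, {1, NONE, NONE, "P3"}
};

/* Longest prefix match: examine one input bit per level, remember the BMP. */
const char *btrie_lookup(unsigned addr)
{
    const char *bmp = NULL;
    int node = 0, level = 0;
    while (node != NONE) {
        if (table[node].valid)
            bmp = table[node].port;          /* update current BMP   */
        if (level == ADDR_BITS)
            break;                           /* all input bits used  */
        int bit = (addr >> (ADDR_BITS - 1 - level)) & 1;
        node = bit ? table[node].right : table[node].left;
        level++;
    }
    return bmp;
}

int main(void)
{
    printf("BMP of 110100: %s\n", btrie_lookup(0x34));  /* 110100 -> P4 */
    return 0;
}
```

For the input 110100, the loop visits entries 0, 2, 5, 7, 9, and 11 and returns P4, matching the walk-through above.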

The binary trie supports incremental update of the routing table. To delete an obsolete prefix, it is only necessary to locate the prefix and reset its valid bit. To insert a new prefix, after locating the corresponding entry, it is only necessary to set the valid bit and write the directive information, which is an output port. If there is no such entry, it may be necessary to add multiple empty entries (for the nodes on the path to the prefix), and the prefix is inserted as a leaf. The binary trie is expected to provide very good characteristics with respect to scalability to a large routing table, because the trie becomes denser. However, for the migration to IPv6, which has a 128-bit address space, the depth of the binary trie becomes excessive and the search speed becomes a major issue.

Multi-bit tries examine more than one bit at a time to improve search performance. However, this requires more memory than the binary trie, since prefix replication occurs to adjust the number of bits examined at each level [7]. Most of the successful schemes providing high-speed IP address lookup are variants of the trie architecture, such as the tree bitmap [4], [5], the multi-bit trie [11], [12], the shape-shifting trie [10], the binary decision diagram [13], or trie partitioning [14], [15].

B. Path-Compressed Trie (PC-Trie)

The path-compressed trie (PC-trie) [7] is often referred to as a PATRICIA trie. The major motivation of the PC-trie is to improve the characteristics of the binary trie. It first uses skip values to remove single-child empty internal nodes. It also removes a child pointer by converting the sub-trie of each node into a full trie, so that each node only needs to store a pointer to its first child.

Figure 3 shows the PC-trie for the same example set, and Table II shows the corresponding routing table. All single-child empty nodes are removed using skip values. Several empty entries are created to convert the sub-trie of each node into a full trie, as shown in Table II. While the prefix values and lengths mapped to each node do not have to be stored in the binary trie, they should be stored in the PC-trie due to the bit skipping.

The search in the routing table proceeds as follows. Starting from the first entry, the given input is compared to the stored string. If they match and the entry is a valid prefix, the prefix is maintained as the current best matching prefix (BMP). In going down the trie, if the entry has a non-zero skip value, the number of input bits corresponding to the skip value is skipped. If the next bit is 0, the search is directed to the memory location of the next pointer; if the next bit is 1, the search is directed to the memory location of the next pointer plus one.


Fig. 3. Path-compressed trie (PC-trie).

Fig. 4. Priority trie (P-trie).

TABLE II
ROUTING TABLE FOR PC-TRIE

Entry  Prefix valid  Skip value  String    Length  Next pointer  Output port
0      0             0           *         0       1             -
1      0             0           0*        1       3             -
2      1             0           1*        1       5             P2
3      1             0           00*       2       -             P0
4      1             1           010*      3       -             P1
5      0             -           -         -       -             -
6      0             0           11*       2       7             -
7      1             1           1101*     4       9             P4
8      1             1           111*      3       11            P5
9      1             1           110101*   6       -             P3
10     0             -           -         -       -             -
11     0             -           -         -       -             -
12     1             0           11111*    5       -             P6

As the number of routing entries increases, the number of single-child empty nodes decreases. Hence, the relative advantage of the PC-trie over the binary trie diminishes. However, IPv6 tries are expected to have many single-child empty nodes; hence, the PC-trie will achieve much better performance than the binary trie for IPv6.

C. Priority Trie (P-trie)

The priority trie (P-trie) [16], [17] is proposed to improve the performance of the binary trie by removing empty internal nodes: the longest prefix belonging to the sub-trie of each empty internal node is relocated into the empty node. A prefix stored at the level equal to its length is called an ordinary prefix, and a prefix stored at a level higher than its length is called a priority prefix. Figure 4 shows the P-trie corresponding to the binary trie of Figure 2. If there is more than one longest prefix belonging to the sub-trie of an empty node, the leftmost one is assumed to be relocated. Priority prefixes, which are relocated to empty nodes, are represented by nodes with a bold boundary. For the empty root shown in Figure 2, since prefix P3 is the longest prefix, P3 is relocated to the root node. For the node 0*, prefix P1 is the longest, and hence P1 is relocated, and so on.

TABLE III
ROUTING TABLE FOR PRIORITY TRIE

Entry  Priority/Ordinary  Prefix    Length  Left pointer  Right pointer  Output port
0      1                  110101*   6       1             2              P3
1      1                  010*      3       3             -              P1
2      0                  1*        1       -             4              P2
3      0                  00*       2       -             -              P0
4      1                  11111*    5       5             6              P6
5      1                  1101*     4       -             -              P4
6      0                  111*      3       -             -              P5

Table III shows the routing table storing the P-trie. The number of entries is equal to the number of prefixes, as shown, since empty nodes are removed. The first field indicates whether the entry stores a priority prefix or an ordinary prefix. Fields for the prefix value and the length are stored in each entry of the routing table, while they are not stored in a binary trie.

Search in the P-trie is similar to that in the binary trie, except that if an input matches a priority prefix, the search can be terminated immediately without proceeding toward a leaf node, since the priority prefix is the longest prefix among the prefixes belonging to the sub-trie of this node. The P-trie requires less memory and improves search performance compared to the binary trie. With regard to incremental update of the routing table, it is shown in [17] that 2 to 3 nodes on average are affected by the insertion or deletion of a prefix.

If the binary trie is dense, the improvement of using the P-trie is not significant with respect to scalability to huge routing data. The priority trie will significantly improve performance in the migration to IPv6, since there would be many empty internal nodes in the binary trie of IPv6.
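As an illustration of the search rule just described, the sketch below walks Table III with early termination on a priority prefix. It assumes 6-bit addresses and the Table III layout; the published P-trie [16], [17] contains further details (update handling, for example) that are omitted here, and all identifiers are illustrative.

```c
#define NONE (-1)
#define ADDR_BITS 6

/* One entry of the priority-trie routing table (Table III layout). */
struct ptrie_entry {
    int priority;           /* 1: priority prefix, 0: ordinary prefix */
    unsigned value;         /* prefix bits, left-aligned in 6 bits    */
    int length;
    int left, right;        /* child entry addresses, NONE if absent  */
    const char *port;
};

static const struct ptrie_entry ptable[7] = {
    {1, 0x35, 6, 1, 2, "P3"},       /* 110101* relocated to the root  */
    {1, 0x10, 3, 3, NONE, "P1"},    /* 010*    at node 0              */
    {0, 0x20, 1, NONE, 4, "P2"},    /* 1*                             */
    {0, 0x00, 2, NONE, NONE, "P0"}, /* 00*                            */
    {1, 0x3E, 5, 5, 6, "P6"},       /* 11111*  at node 11             */
    {1, 0x34, 4, NONE, NONE, "P4"}, /* 1101*   at node 110            */
    {0, 0x38, 3, NONE, NONE, "P5"}  /* 111*                           */
};

static int prefix_matches(unsigned addr, unsigned value, int length)
{
    if (length == 0) return 1;
    unsigned mask = (~0u << (ADDR_BITS - length)) & ((1u << ADDR_BITS) - 1);
    return ((addr ^ value) & mask) == 0;
}

/* Like the binary-trie search, but a match to a priority prefix ends it. */
const char *ptrie_lookup(unsigned addr)
{
    const char *bmp = NULL;
    int node = 0, level = 0;
    while (node != NONE) {
        const struct ptrie_entry *e = &ptable[node];
        if (prefix_matches(addr, e->value, e->length)) {
            if (e->priority)
                return e->port;     /* longest prefix of this sub-trie  */
            bmp = e->port;          /* ordinary prefix: keep descending */
        }
        if (level == ADDR_BITS)
            break;
        int bit = (addr >> (ADDR_BITS - 1 - level)) & 1;
        node = bit ? e->right : e->left;
        level++;
    }
    return bmp;
}
```

For the input 110100, the search visits entries 0, 2, 4, and 5 and stops at the priority prefix 1101*, returning P4 after four memory accesses instead of the six required by the binary trie.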

IV. ALGORITHMS PERFORMING BINARY SEARCH ON PREFIX VALUES

The trie-based data structures described in Section III construct binary tries by utilizing the characteristic that each bit of an IP address is 0 or 1, and they perform a linear search on prefix length. Hence, the search performance of the trie-based data structures depends on the length of prefixes and is not related to the number of prefixes. If we perform binary search on prefix values, the performance will depend on the number of prefixes rather than the length of prefixes. However, prefixes with different lengths cannot be directly compared by value. Moreover, even after prefixes are sorted according to their values, naive binary search is not possible due to prefix nesting relationships. The algorithms described in this section propose methods to sort prefixes of different lengths and to handle the prefix nesting relationship, so that binary search can be performed on prefix values.

Fig. 5. Binary search on range (BSR).

A. Binary Search on Range (BSR)

In the binary search on range (BSR) [18] scheme, each prefix is considered as a range in the line [0, 2^32 − 1], and a 32-bit IP address is represented by a point on the line. Each prefix range has a fixed-length start address and a fixed-length end address, generated by padding the prefix with zeros and ones, respectively, up to the maximum length. The IP address lookup problem is now to determine the smallest range that includes the point of an input IP address. A range corresponding to a prefix can include the range of another prefix, if the prefix is a sub-string of the other prefix. The BSR scheme overcomes this issue by separating ranges into disjoint intervals and by pre-computing and storing the BMP for each interval.

Figure 5 shows the range representation of the same example set of prefixes (assuming a 6-bit address space for simplicity). The range nesting relationship is also shown in Figure 5, in which a prefix range is completely included in another prefix range; for example, the range of prefix P6 is included in the range of prefix P5 and consequently in the range of prefix P2. Table IV shows the corresponding routing table. In Table IV, each entry is either the start address or the end address of a range, padded to the maximum number of bits. If the start address of a range immediately follows the end address of another range, the start address is omitted in the routing table. Entries are sorted in order of magnitude, and BMPs for the equivalence case and the greater-than case are pre-computed as shown.

Search in this scheme starts by accessing the middle entry of the routing table. If the given input is equal to the entry value, the search is over and the pre-computed BMP in the equivalence field is returned. If the given input is smaller than the entry value, the search space is reduced to the upper half of the table; otherwise, it is reduced to the lower half of the table. When the input is bigger than the entry value, the pre-computed BMP value in the greater-than field is remembered.

For example, assume an input address of 110100. The input is compared to the entry in the middle, 110100, and matches. Hence, the search is complete and the BMP in the equivalence field, which is the prefix P4, is returned. As another example, assume an input address of 111000. The input is bigger than 110100; hence, the search space becomes the lower half of the routing table. In this case, the BMP in the greater-than field, which is the prefix P4, is remembered. In the lower half of the routing table, the input is compared to the entry in the middle (which is 110111 or 111110). Assuming that the input is compared to the entry 111110, the input is smaller, and hence the search space is reduced to the upper half (which has the two entries 110101 and 110111). Assuming that the input is then compared to the entry 110111, the input is bigger, and hence the BMP in the greater-than field, which is the prefix P5, is remembered. There is no other entry in this half; hence, the search is complete and prefix P5 is returned.

TABLE IV
ROUTING TABLE FOR BINARY SEARCH ON RANGE

Value   BMP (equivalence)  BMP (greater-than)
000000  P0                 P0
001111  P0                 P1
010111  P1                 -
100000  P2                 P2
110100  P4                 P4
110101  P3                 P4
110111  P4                 P5
111110  P6                 P6
111111  P6                 -
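The BSR search can be expressed as an ordinary binary search that additionally carries the greater-than BMP, as sketched below in C for the 6-bit example of Table IV. The table encoding and names are illustrative assumptions.

```c
#define NUM_ENTRIES 9

/* One entry of the range table (Table IV layout). */
struct range_entry {
    unsigned value;       /* 6-bit range end point                      */
    const char *eq_bmp;   /* BMP if the input equals the value          */
    const char *gt_bmp;   /* BMP if the input is greater than the value,
                             up to the next entry                       */
};

static const struct range_entry rtable[NUM_ENTRIES] = {
    {0x00, "P0", "P0"}, {0x0F, "P0", "P1"}, {0x17, "P1", NULL},
    {0x20, "P2", "P2"}, {0x34, "P4", "P4"}, {0x35, "P3", "P4"},
    {0x37, "P4", "P5"}, {0x3E, "P6", "P6"}, {0x3F, "P6", NULL}
};

/* Binary search on ranges: remember the greater-than BMP of the largest
 * entry that is still smaller than the input.                          */
const char *bsr_lookup(unsigned addr)
{
    const char *bmp = NULL;
    int lo = 0, hi = NUM_ENTRIES - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (addr == rtable[mid].value)
            return rtable[mid].eq_bmp;        /* exact end point           */
        if (addr < rtable[mid].value) {
            hi = mid - 1;                     /* search the upper half     */
        } else {
            bmp = rtable[mid].gt_bmp;         /* remember greater-than BMP */
            lo = mid + 1;                     /* search the lower half     */
        }
    }
    return bmp;
}
```

For the input 111000, the loop remembers P4 at entry 110100 and then P5 at entry 110111, and returns P5, as in the example above.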

Even though the number of entries in the BSR scheme can become twice the number of prefixes in the worst case, it provides very good search performance, since the BSR scheme performs a balanced binary search. However, due to the pre-computation of BMPs, a single update operation might cause a large and costly restructuring of the table; hence, incremental update of the routing table is not provided. Dynamic router tables, which trade search performance for incremental updatability in the BSR scheme, were studied in [19], [20]. The BSR is expected to be good with respect to scalability to a large routing table and the migration to IPv6, since it performs a balanced binary search and does not depend on the length of prefixes. It was also shown that the number of routing table entries can be significantly reduced by utilizing multi-way and multi-column range search, which improves search performance [21].

B. Binary Search Tree (BST)

While the BSR algorithm converts a variable-length prefix to a range with fixed-length start and end addresses, the binary search tree (BST) [22] provides a set of new definitions for the magnitude comparison of two different-length prefixes. In comparing two prefixes with different lengths, the shorter prefix is compared to the same-length sub-string of the longer prefix. The prefix with the bigger (smaller) numerical value is defined as bigger (smaller). If they are the same, the next bit of the longer prefix is examined: if it is one, the longer prefix is bigger; otherwise, the shorter prefix is bigger. For example, comparing the two prefixes 00* and 010*, the prefix 010* is bigger, since its first two bits are bigger than the prefix 00*. Comparing the two prefixes 1* and 1101*, the prefix 1101* is bigger, since the first bits are the same and the next bit of the prefix 1101* is 1.

Fig. 6. Binary search tree (BST).

Interestingly enough, if we scan the prefixes in the binary trie of Figure 2 from left to right, we obtain the list of prefixes 00*, 010*, 1*, 110101*, 1101*, 111*, and 11111*. This list is in the same order as when sorted using the described definition. However, binary search cannot be directly applied to this sorted list due to the prefix nesting relationship. For example, assume that we have a 6-bit input 110100. Obviously, the BMP of this input should be 1101*, but the BMP is not found correctly if we perform binary search against the sorted list. The prefix in the middle of the list, which is 110101*, is compared to the input. Since the input is smaller, the search goes to the left half of the list, which does not include the prefix 1101*. This failure occurs because prefix 1101* is contained in prefix 110101* (termed prefix nesting) and prefix 110101* was compared earlier than prefix 1101*.

The BST algorithm provides a restriction in building the binary search tree to solve this issue. A prefix is defined as an enclosure if there exists at least one other prefix that has the prefix as its sub-string. A prefix that has another prefix as its sub-string is defined as an enclosed prefix. A disjoint prefix is neither an enclosure nor an enclosed prefix. The restriction is to locate an enclosure at a higher level than its enclosed prefixes. The prefix nesting relationship and the binary search tree (BST) are shown in Figure 6. Prefixes P0 and P1 are disjoint prefixes. Prefix P2 is a first-level enclosure, and prefixes P4 and P5 are second-level enclosures.

The list used in composing the binary search tree primarily consists of the disjoint prefixes and the first-level enclosure prefixes, sorted according to the provided definition. The median prefix in the sorted order is selected as the root of the current level; if the selected prefix is an enclosure, its enclosed prefixes are inserted into the list in sorted order. The binary search tree is constructed by repeating this process. Figure 6 shows the resulting binary search tree. As shown, enclosures are located at higher levels than their enclosed prefixes, and the BST is highly unbalanced due to the restriction caused by prefix nesting. Table V shows the routing table implementing the BST data structure.

TABLE V
ROUTING TABLE FOR BINARY SEARCH TREE

Entry  Prefix    Length  Left pointer  Right pointer  Output port
0      010*      3       1             2              P1
1      00*       2       -             -              P0
2      1*        1       -             3              P2
3      1101*     4       4             5              P4
4      110101*   6       -             -              P3
5      111*      3       -             6              P5
6      11111*    5       -             -              P6
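The magnitude comparison defined at the beginning of this subsection is the core of the BST search; a minimal C sketch is given below. The representation (prefix bits left-aligned in W = 6 bits) and the function name are illustrative assumptions.

```c
#define W 6

struct bst_prefix { unsigned value; int length; };

/* Compare two prefixes of possibly different lengths, as defined for the BST:
 * compare the shorter prefix with the same-length sub-string of the longer one;
 * on a tie, the next bit of the longer prefix decides.
 * Returns <0, 0, >0 when a is smaller than, equal to, or bigger than b.       */
int bst_compare(const struct bst_prefix *a, const struct bst_prefix *b)
{
    int minlen = a->length < b->length ? a->length : b->length;
    unsigned mask = (minlen == 0) ? 0 : (~0u << (W - minlen)) & ((1u << W) - 1);

    unsigned va = a->value & mask, vb = b->value & mask;
    if (va != vb)
        return va < vb ? -1 : 1;     /* decided by the common-length bits */
    if (a->length == b->length)
        return 0;                    /* identical prefixes                */

    /* Tie: examine the next bit of the longer prefix. */
    const struct bst_prefix *longer = (a->length > b->length) ? a : b;
    int longer_is_bigger = (longer->value >> (W - 1 - minlen)) & 1;
    if (longer == a)
        return longer_is_bigger ? 1 : -1;
    else
        return longer_is_bigger ? -1 : 1;
}
```

During the search, the input address is treated as a prefix of length W; the stored prefix matches the input when the two agree on the stored prefix's bits, and the sign of bst_compare decides the branch direction, as described next.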

A given input address is compared to the stored prefix at each node during the search, using the definition described above. If the input is smaller than the prefix value, the search follows the left pointer; otherwise, the search follows the right pointer. If they are the same, the prefix is maintained as the current BMP until a new match is found while going down the tree.

For example, assume the input address 110100 is given. Since the input is bigger than the prefix 010* at the root node, the search follows the right pointer. At entry 2, the input matches the prefix 1*, and hence the current BMP is P2. Since the input is bigger, the search follows the right pointer. At entry 3, the input matches the prefix 1101*, and hence the current BMP becomes P4. Since the input is smaller, the search follows the left pointer. At entry 4, the input does not match and there is no pointer to follow. Hence, the search is terminated and prefix P4 is returned as the BMP.

The BST structure does not provide incremental update of the routing table, since complete restructuring of the tree may be required for the deletion of a prefix in the worst case, while a single insertion operation can change several entries of the routing table. Moreover, the depth of the BST becomes excessive for routing data with many levels of prefix containment; hence, it does not provide good scalability toward large routing data. However, since the BST does not depend on the length of prefixes, it can be readily migrated to IPv6 if there are not many levels of prefix containment.

C. Weighted Binary Search Tree (WBST)

As an attempt to reduce the depth of the BST, the weighted binary search tree (WBST) [23] is motivated to choose a better root at each level. While the BST algorithm selects the median prefix as the root of the current level, no matter whether the prefix is an enclosure or a disjoint prefix, WBST considers the number of enclosed prefixes included in an enclosure.

Fig. 7. Weighted binary search tree (WBST).

Fig. 8. Binary search tree with prefix vectors (BST-PV).

TABLE VI
ROUTING TABLE FOR WEIGHTED BINARY SEARCH TREE

Entry  Prefix    Length  Left pointer  Right pointer  Output port
0      1*        1       1             2              P2
1      00*       2       -             3              P0
2      1101*     4       4             5              P4
3      010*      3       -             -              P1
4      110101*   6       -             -              P3
5      111*      3       -             6              P5
6      11111*    5       -             -              P6

WBST first defines the weight of a prefix as the number of its enclosed prefixes plus one (counting the prefix itself). In the first-level list of Figure 6, the BST algorithm selects prefix P1 as the root, since it is the median of the three prefixes. However, while prefixes P0 and P1 have weight 1, prefix P2 has weight 5. Since the fourth prefix in magnitude (the middle of the 7 sorted prefixes) is included in the bag of the enclosure prefix P2, WBST makes prefix P2 the root of the current level, and repeats this process recursively. By considering the weight of each prefix, enclosure prefixes tend to be located at higher levels in WBST, which makes the constructed tree more balanced. Figure 7 shows the constructed WBST; its depth is reduced compared to the BST. Table VI shows the routing table implementing the WBST data structure. The search procedure is the same as that of the BST.

D. Binary Search Tree with Prefix Vector (BST-PV)

There are several different approaches to overcoming the prefix nesting relationship in order to construct a balanced binary search tree. Prefix grouping into multiple disjoint groups of prefixes is used in [24], [25]. The binary search tree with prefix vector (BST-PV) [8] algorithm constructs a binary search tree only with the leaves of the binary trie and makes each leaf carry the nested prefix information in a prefix vector. For example, in Figure 2, the leaf prefix P3 has the prefix vector {P2, null, null, P4, null, P3}, which means that prefix P3 has enclosures of lengths 1 and 4. Since leaf prefixes do not nest each other, the constructed binary search tree is a perfectly balanced tree. Figure 8 shows the BST-PV for the same set of prefixes, and Table VII shows the corresponding routing table.

The BST-PV algorithm builds the routing table in two steps. First, prefix vectors are constructed for each leaf prefix of the binary trie. Second, the leaf prefixes with their prefix vectors are sorted in ascending order, based on the definitions of the BST [22] algorithm described earlier.

TABLE VII
ROUTING TABLE FOR BST-PV

Prefix    Length  Prefix vector (lengths 1 to 6)
00*       2       -,  P0, -,  -,  -,  -
010*      3       -,  -,  P1, -,  -,  -
110101*   6       P2, -,  -,  P4, -,  P3
11111*    5       P2, -,  P5, -,  P6, -

We do not need to store pointers to child nodes, since the children of a node are always the middle entries of the upper and lower halves of its search space in a perfectly balanced tree. BST-PV thus provides a balanced binary search tree with a smaller number of nodes than the actual number of routing prefixes.

The search procedure is as follows. At each node, the given input address is compared to the stored prefix. If they are the same, the search is complete, and the forwarding information corresponding to the stored prefix is returned. If they are not the same, the given input is compared to the prefix vector. The length and the forwarding information of the longest matching prefix in the prefix vector of the current node are remembered. The length of the currently remembered BMP is compared to that of a new match at lower levels, and if the new match is longer, the length of the current BMP and the corresponding forwarding information are replaced by the new ones.

Incremental update of the routing table of the BST-PV algorithm is possible, but it is as complex as in BSR due to the nature of the balanced binary search; a single insertion may affect all entries in the worst case. Unlike the other binary search trees, such as BST and WBST, BST-PV provides very good scalability to large routing data, since it constructs a balanced tree composed of leaf prefixes only. BST-PV also provides very good scalability in the migration to IPv6.
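The per-node prefix-vector check described above can be sketched as follows in C, assuming 6-bit addresses and the Table VII layout (all names are illustrative). The balanced binary search over the sorted leaves simply calls this check at every visited node and keeps the longest result.

```c
#define W 6

/* One leaf entry of the BST-PV routing table (Table VII layout). */
struct pv_entry {
    unsigned value;              /* leaf prefix, left-aligned in W bits           */
    int length;
    const char *vector[W + 1];   /* vector[k]: nested prefix of length k, or NULL */
};

/* Number of leading bits in which addr agrees with the stored leaf prefix. */
static int common_length(unsigned addr, const struct pv_entry *e)
{
    int k = 0;
    while (k < e->length && (((addr ^ e->value) >> (W - 1 - k)) & 1) == 0)
        k++;
    return k;
}

/* Longest prefix recorded in the vector that the input matches.
 * Returns its length (0 if none); *port receives its forwarding info. */
int pv_longest_match(unsigned addr, const struct pv_entry *e, const char **port)
{
    int cl = common_length(addr, e);
    for (int k = cl; k >= 1; k--) {
        if (e->vector[k] != NULL) {
            *port = e->vector[k];
            return k;
        }
    }
    return 0;
}
```

For the leaf 110101* and the input 110100, the common length is 5 and the longest non-null vector entry at or below that length is P4 at length 4, which is exactly the BMP reported in the trie-based examples.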

Fig. 9. Binary search tree with switch pointer (BST-SP).

E. Binary Search Tree with Switch Pointer (BST-SP)

In this sub-section, we propose a new binary search algorithm that uses a different approach, switch pointers, to handle the prefix nesting relationship. The algorithm is named the binary search tree with switch pointers (BST-SP). The naive binary search is not performed correctly in a binary search tree if an enclosed prefix is compared earlier than its enclosure, so that the enclosure never gets a chance to be compared. In the BST-SP, each node therefore holds the length of its direct enclosure and a pointer to the direct enclosure, where the direct enclosure is the enclosure with the longest length. For example, in Figure 2, the direct enclosure of the prefix 110101* is 1101*, and the direct enclosure of the prefix 1101* is 1*. The pointer to the direct enclosure is the memory address storing the direct enclosure. That is, we construct a naive balanced binary search tree and use a pointer to the enclosure that may not be covered during the binary search. Figure 9 shows the BST-SP; the dotted edges are the pointers to the direct enclosure of each prefix.

The BST-SP search has two steps: binary search and pointer search. The binary search is the same as the usual binary search procedure, but it additionally keeps track of information about a possibly matching enclosure. That is, at each node, the binary search procedure keeps track of four records: the current best matching prefix (BMP) and its length (BML), as in the usual binary search, and additionally the current matching enclosure length (CMEL) and a switch pointer (SP). The CMEL is defined as the matching length within the length of the direct enclosure of the stored prefix. The pointer search follows the binary search only if there is a possibility of a matching prefix longer than the current BMP. Table VIII shows the routing table of the BST-SP algorithm.

For example, assume we have an input address 110100. Initially, the BMP and the BML are set to the wildcard and 0, respectively, and both the CMEL and the SP are set to 0. At the root node, the input is compared to prefix P3. The BMP and the BML are not updated, since the input does not match P3. The input matches the prefix up to 5 bits, but the direct enclosure of P3 has a length of 4 bits. Hence, the CMEL is set to 4 and the SP gets the value of the direct enclosure pointer, which is entry address 4. Since the input is smaller than P3, the search moves to the middle entry of the upper half, the entry of P1. The same procedure is repeated. Since the input does not match P1, the BMP and the BML are not updated. The length of the direct enclosure of prefix P1, which is 0, is shorter than the CMEL (4), and hence the CMEL and the SP are not updated. Since the input is bigger than P1, the search moves to the middle entry of the lower half, the entry of P2. The input matches P2, and hence the BMP becomes P2 and the BML is 1. The direct enclosure length of prefix P2 is 0, shorter than the CMEL; hence, the CMEL and the SP are not updated. Since there is no node to follow, it is now time to determine whether the pointer search is required. The current BML is 1, which is shorter than the CMEL (4); this means that a prefix longer than the current BMP may exist. Hence, the pointer search follows, using the SP.

During the pointer search, the comparison at each node results in one of the following three cases. First, if the given input matches the prefix, the search is over and the new BMP is returned. Second, if the length of the sub-string of the prefix matching the input is equal to or shorter than the current BML, there is no possibility of a longer matching prefix; hence, the search is over and the current BMP is returned. Otherwise, we need to jump to the next node using the switch pointer and repeat the pointer search.

TABLE VIII
ROUTING TABLE FOR BINARY SEARCH TREE WITH SWITCH POINTER

Entry  Prefix    Length  Output port  Switch pointer  Enclosure length
0      00*       2       P0           -               0
1      010*      3       P1           -               0
2      1*        1       P2           -               0
3      110101*   6       P3           4               4
4      1101*     4       P4           2               1
5      111*      3       P5           2               1
6      11111*    5       P6           5               3

In this example, at the node of prefix P4 (which is accessed through the SP), the input is compared to prefix P4 and matches. This is the first case; hence, the search is over and prefix P4 is returned as the BMP.

As another example, assume an input address of 111000. At the root node, the input does not match prefix P3; hence, the BMP and the BML are not updated. The input and the prefix match up to 2 bits; hence, the CMEL is 2 and the SP is 4. At entry 5, the input matches prefix P5; hence, the BMP is P5 and the BML is 3. The CMEL and the SP are not updated, since the length of the direct enclosure of the prefix is shorter than the CMEL. At entry 4, the input does not match prefix P4. The pointer search is unnecessary, since the current BML is 3 and the CMEL is 2. The search is over and returns the current BMP, the prefix P5.

The incremental update of the routing table of the BST-SP algorithm is as complex as for BSR and BST-PV. BST-SP provides very good scalability to large routing data, as well as very good scalability in the migration to IPv6, since it constructs a balanced tree.
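The two-step BST-SP search can be put together as the following C sketch over the Table VIII data (6-bit addresses). The encoding, the field names, and the exact bookkeeping are illustrative; the sketch follows the BMP/BML/CMEL/SP rules described above.

```c
#define W   6
#define NUM 7

/* One entry of the BST-SP routing table (Table VIII layout), in sorted order. */
struct sp_entry {
    unsigned value;     /* prefix bits, left-aligned in W bits                */
    int length;
    const char *port;
    int sp;             /* switch pointer to the direct enclosure, -1 if none */
    int encl_len;       /* length of the direct enclosure, 0 if none          */
};

static const struct sp_entry sptab[NUM] = {
    {0x00, 2, "P0", -1, 0}, {0x10, 3, "P1", -1, 0}, {0x20, 1, "P2", -1, 0},
    {0x35, 6, "P3",  4, 4}, {0x34, 4, "P4",  2, 1}, {0x38, 3, "P5",  2, 1},
    {0x3E, 5, "P6",  5, 3}
};

/* Leading bits in which addr agrees with the stored prefix (capped at its length). */
static int match_len(unsigned addr, const struct sp_entry *e)
{
    int k = 0;
    while (k < e->length && (((addr ^ e->value) >> (W - 1 - k)) & 1) == 0)
        k++;
    return k;
}

const char *bstsp_lookup(unsigned addr)
{
    const char *bmp = NULL;           /* current BMP                   */
    int bml = 0, cmel = 0, sp = -1;   /* BML, CMEL and switch pointer  */
    int lo = 0, hi = NUM - 1;

    /* Step 1: balanced binary search on prefix values. */
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        const struct sp_entry *e = &sptab[mid];
        int ml = match_len(addr, e);

        if (ml == e->length) {                 /* stored prefix matches     */
            bmp = e->port;
            bml = e->length;
        }
        int cand = ml < e->encl_len ? ml : e->encl_len;
        if (cand > cmel) {                     /* a longer match may exist  */
            cmel = cand;
            sp = e->sp;
        }
        if (ml == e->length && e->length == W)
            break;                             /* input equals this prefix  */
        /* Branch: the first differing bit, or the input's next bit on a match. */
        if ((addr >> (W - 1 - ml)) & 1)
            lo = mid + 1;                      /* input is bigger           */
        else
            hi = mid - 1;                      /* input is smaller          */
    }

    /* Step 2: pointer search, only if a longer match is still possible. */
    while (bml < cmel && sp >= 0) {
        const struct sp_entry *e = &sptab[sp];
        int ml = match_len(addr, e);
        if (ml == e->length)
            return e->port;                    /* case 1: enclosure matches */
        if (ml <= bml)
            break;                             /* case 2: nothing longer    */
        sp = e->sp;                            /* case 3: next switch ptr   */
    }
    return bmp;
}
```

For the two example inputs above, the sketch reproduces the traces in the text: 110100 ends the pointer search at entry 4 and returns P4, while 111000 skips the pointer search and returns P5.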

V. ALGORITHMS PERFORMING BINARY SEARCH ON PREFIX LENGTHS

The trie-based algorithms compare one bit at a time; hence, they perform a linear search on prefix lengths. If the maximum prefix length is W, they require W memory accesses in the worst case. The search performance of the algorithms performing binary search on prefix values is proportional to log2 N, where N is the number of prefixes. The algorithms described in this section perform binary search on prefix lengths and hashing on prefix values [26]-[30]; hence, their search performance is proportional to log2 W.

A. Waldvogel's Binary Search on Length (W-BSL)

The binary search on length structure proposed by Waldvogel et al. [26] separates the binary trie according to the level of the trie and stores the nodes included in each level in a hash table. Binary search is performed on the hash tables of the levels. When accessing the hash table of the middle level, if there is a match, the search space becomes the longer half; otherwise, the search space becomes the shorter half. Figure 10 shows Waldvogel's binary search on length (W-BSL) structure, including the access levels, for the binary trie of Figure 2.

Fig. 10. Waldvogel's binary search on lengths (W-BSL).

Level 3 is the first level of access. If a given input matches a node in level 3, the search proceeds to level 5; if the input does not match any node, the search proceeds to level 1. However, even though the given input matches a prefix node in the current level of access, there may be no longer matching prefix. Moreover, even though there is no match in the current level, there may be a longer matching prefix. This contradicts the basic search procedure of this approach. The W-BSL structure uses pre-computation of markers and BMPs to resolve this issue. Markers are pre-computed in the internal nodes if there is a longer prefix in a level accessed later. However, markers can sometimes mislead the search: the search proceeds to a longer level because of a marker, but there may be no match in the longer levels. In that case, back-tracking to the marker node would be required. Back-tracking is not preferred, since it degrades search performance. To avoid back-tracking, a pre-computed BMP is maintained at each marker, and this pre-computed BMP is returned when there is no match in the longer levels.

Figure 10 shows the markers and their BMPs. The markers and the BMPs are pre-computed and maintained for nodes in the preceding levels of access. They do not need to be maintained for the last levels of access, which are levels 2, 4, and 6 in Figure 10. Table IX shows the hash tables of each level. The hash tables are constructed for the levels including prefixes and contain all prefix nodes and internal nodes. They also contain pointers indicating to which table the search needs to proceed in the cases of a match and of no match. (Here, we only show the hash entries corresponding to nodes of the trie, assuming that perfect hash functions are given for each level.)

As a search example, for the 6-bit input 110100, the most significant 3 bits are used for hashing when accessing level 3. In Table IX(a), the input matches a marker node; it is remembered that the current BMP is P2, and the search goes to level 5. In level 5, shown in Table IX(c), the most significant 5 bits of the input match a marker node; hence, prefix P4 is remembered as the current BMP. The search goes to level 6, shown in Table IX(f), and does not match. The search is over and prefix P4 is returned as the BMP, since this is the last level of access.

TABLE IX
HASH TABLES FOR BINARY SEARCH ON LENGTHS BY WALDVOGEL

(a) Level 3
Prefix/Internal  Node  Marker  Prefix / pre-computed BMP
1                010   0       P1
0                110   1       P2
1                111   0       P5

(b) Level 1
Prefix/Internal  Node  Marker  Prefix / pre-computed BMP
0                0     1       -
1                1     0       P2

(c) Level 5
Prefix/Internal  Node   Marker  Prefix / pre-computed BMP
0                11010  1       P4
1                11111  0       P6

(d) Level 2
Prefix/Internal  Node  Marker  Prefix / pre-computed BMP
1                00    0       P0
0                01    0       -
0                11    0       -

(e) Level 4
Prefix/Internal  Node  Marker  Prefix / pre-computed BMP
1                1101  0       P4
0                1111  0       -

(f) Level 6
Prefix/Internal  Node    Marker  Prefix / pre-computed BMP
1                110101  0       P3

The W-BSL scheme, as the first binary-search-on-length structure, provides very good search performance. However, it does not provide incremental update of routing tables, due to the pre-computation of markers and their BMPs. It provides good scalability toward large routing data, the same as the binary trie. While the binary trie does not perform well for IPv6, this scheme would provide good performance for IPv6, since it performs binary search on prefix lengths.
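A compact sketch of the binary search on lengths is given below in C. The per-level hash lookup is replaced by a linear scan for brevity, the tables are assumed to be built as in Table IX (one table per length, with markers carrying pre-computed BMPs), and all names are illustrative.

```c
#include <stddef.h>

#define W 6

/* One node of a per-level table (Table IX layout). */
struct level_node {
    unsigned bits;        /* the node's first-L bits, left-aligned       */
    int is_prefix;        /* 1 if a prefix ends at this node             */
    int is_marker;        /* 1 if longer prefixes exist in later levels  */
    const char *bmp;      /* the prefix itself, or the pre-computed BMP  */
};

struct level_table {
    int length;                    /* L: number of significant bits */
    const struct level_node *nodes;
    int count;
};

/* Linear scan stands in for the per-level hash lookup. */
static const struct level_node *find(const struct level_table *t, unsigned addr)
{
    unsigned mask = (~0u << (W - t->length)) & ((1u << W) - 1);
    for (int i = 0; i < t->count; i++)
        if (((addr ^ t->nodes[i].bits) & mask) == 0)
            return &t->nodes[i];
    return NULL;
}

/* Binary search on prefix lengths: a hit moves to longer lengths,
 * a miss moves to shorter lengths; markers carry pre-computed BMPs.
 * tables[] is indexed by length 1..W (index 0 unused).               */
const char *wbsl_lookup(unsigned addr, const struct level_table *tables)
{
    const char *bmp = NULL;
    int lo = 1, hi = W;
    while (lo <= hi) {
        int level = (lo + hi) / 2;
        const struct level_node *n = find(&tables[level], addr);
        if (n != NULL) {
            bmp = n->bmp;            /* prefix or pre-computed BMP */
            lo = level + 1;          /* try longer prefix lengths  */
        } else {
            hi = level - 1;          /* try shorter prefix lengths */
        }
    }
    return bmp;
}
```

With lo = 1 and hi = W = 6, the first probed level is 3, a hit moves to level 5 and then 6, and a miss at level 3 moves to level 1, matching the access order described above.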

Fig. 11. Lim's binary search on lengths (L-BSL).

B. Binary Search on Length in a Leaf-Pushed Trie (L-BSL)

W-BSL requires complex pre-computation because of the prefix nesting relationship, in which one prefix is a sub-string of another prefix. That is, since each node can have prefixes in both upper and lower levels, the markers and their BMPs must be pre-computed. If every prefix is disjoint from the others, the prefixes are located only in leaves and are free from the prefix nesting relationship. Hence, binary search on prefix lengths over a set of disjoint prefixes can be performed without pre-computation. Lim's binary search on length structure (L-BSL) [27] uses leaf-pushing [31] to make every prefix disjoint. Figure 11 shows the leaf-pushed binary trie and the levels of access for performing the binary search on lengths; leaf-pushed nodes are connected to the trie by dotted edges. Table X shows the routing table performing the binary search on lengths in the leaf-pushed trie. Each hash table has pointers to the next table to continue the search.

L-BSL finishes the search either when a match to a prefix occurs or when it reaches the last level of access, while W-BSL always finishes the search when it reaches the last level of access. If the search encounters an internal node in the current level, the search proceeds to a longer level, since it is guaranteed that there is no matching prefix at a shorter level. If there is no matching node in the current level, the search proceeds to a shorter level, since it is guaranteed that there is no node at a longer level. If the search encounters a prefix, the search is over, since prefixes are located only in leaves.

For the same example of the 6-bit input 110100, the most significant 3 bits are used for hashing when accessing level 3. In Table X(a), the input matches an internal node, and hence the search goes to level 5. In level 5, shown in Table X(c), the most significant 5 bits of the input, 11010, match an internal node; hence, the search goes to level 6, shown in Table X(e). The input matches prefix P4 in level 6. Hence, the search is over and P4 is returned as the BMP.

Even though this scheme does not require complex pre-computation, the number of nodes is increased and prefix deletion is not possible due to leaf-pushing. L-BSL provides very good scalability toward large routing data and the migration to IPv6, as does the W-BSL structure.

TABLE X
HASH TABLES FOR BINARY SEARCH ON LENGTHS BY LIM

(a) Level 3
Prefix/Internal  Node  Prefix
1                010   P1
0                110   -
0                111   -

(b) Level 2
Prefix/Internal  Node  Prefix
1                00    P0
0                01    -
1                10    P2
0                11    -

(c) Level 5
Prefix/Internal  Node   Prefix
0                11010  -
1                11011  P4
1                11110  P5
1                11111  P6

(d) Level 4
Prefix/Internal  Node  Prefix
1                1100  P2
0                1101  -
1                1110  P5
0                1111  -

(e) Level 6
Prefix/Internal  Node    Prefix
1                110100  P4
1                110101  P3

C. logW-Elevators Algorithm (logW-E)

The logW-Elevators (logW-E) algorithm [29], [30] is another interesting algorithm based on binary search on prefix lengths. This algorithm constructs multiple kth-level tries, for k = W/2, W/4, ..., 2, required to perform a binary search on levels, in addition to a PATRICIA trie. Figure 12 shows the logW-E structure for the example prefix set; the figure shows the hash tables and the pointers to the next trie at each node. While the logW-E algorithm is based on the PATRICIA trie, we describe it here based on a binary trie for simplicity. In this example, assuming that the maximum prefix length is 8 bits, a 4th-level trie, a 2nd-level trie, and a binary trie are constructed.

Fig. 12. logW-Elevators algorithm (logW-E).

The 4th-level trie has nodes at levels that are multiples of W/2; hence, it has nodes at levels 0, 4, and 8 (if they exist). Similarly, the 2nd-level trie has nodes at levels that are multiples of W/4, and the binary trie has nodes at every level. For update and memory efficiency, it is assumed that the output port information corresponding to each prefix is stored only in the binary trie, so that the BMP is identified from the binary trie.

For the same example of the 6-bit input 110100, by accessing the hash table of the root node of the 4th-level trie, the search is directed to node 1101, which is a prefix node. This means that the length of the BMP is at least 4 bits. If it were a leaf node, the search would be over after identifying the output port information from the corresponding node of the binary trie. Since it is not a leaf node and it does not have a hash entry for level 8, the length of the BMP is less than 8. The search follows the pointer to the corresponding node of the 2nd-level trie. Since there is no hash entry corresponding to the next two bits, 00, the length of the BMP is less than 6. The search follows the pointer to the corresponding node of the binary trie and finds a match to P4. The search continues to the left child; since it is not a prefix node, the search is over and returns P4 as the BMP.

As another example, for the 6-bit input 111000, the length of the BMP is less than 4, since there is no entry corresponding to 1110 in the hash table of the root node of the 4th-level trie. The search goes to the root node of the 2nd-level trie. Since there is a hash entry for the first two bits, 11, the search goes to level 2 of the 2nd-level trie and follows the pointer to the corresponding node of the binary trie. The search finds a match to P2 (represented by a gray node) by accessing node 11 of the binary trie, then follows the right pointer and finds a longer match to P5. The search is over and returns P5 as the BMP.

This algorithm assumes that the nodes crossing the kth levels of the binary trie (levels 2, 4, and 6 of the binary trie in this example) have pre-computed BMPs, represented by gray nodes. Otherwise, if there were no match at level 3 in this example, the search would fail to find the BMP. Five memory accesses were needed to find the BMP for each input in these examples.

This algorithm requires a large amount of memory, as will be shown in the simulation, since the number of tries is equal to log2 W. The update performance can be O(N) in the worst case, due to the pre-computation of BMPs for the nodes crossing every other level of the binary trie; however, it is shown in [29] that the update performance is O(log W) for more than 97% of inserted prefixes.

D. Binary Search on Lengths in Multiple Tries (BSL-MT)

In this sub-section, we propose a new algorithm performing binary search on lengths. It is the only algorithm providing incremental update while performing binary search on prefix lengths. The binary search on length structures described so far require either complex pre-computation or prefix replication. We propose to separate the prefix set into multiple disjoint groups and to perform binary search on prefix lengths in each group, so that the prefixes of each group are disjoint. We term this scheme binary search on lengths in multiple tries (BSL-MT). This scheme eliminates both the pre-computation requirement and the leaf-pushing requirement.

The binary tries and the levels of access in each trie are shown in Figure 13 for BSL-MT. The set is separated into three groups, since the maximum number of prefix containments is three for the example set. The most efficient way of prefix grouping is to separate tries based on the relative levels of prefixes: a group for the longest prefixes in a prefix nesting relationship, a group for the next longest prefixes, and so on. In Figure 13, the first trie is composed of the longest prefix of each path, the second trie is composed of the next longest prefixes, and the third trie is composed of the shortest prefixes. Table XI shows the routing table for BSL-MT.

The binary search on prefix lengths is easily performed in BSL-MT. Starting from the first trie, if the search encounters an internal node, the search proceeds to a longer level, since there is no prefix at a shorter level. If there is no matching node in the current level, the search proceeds to a shorter level, since there is no node at a longer level. If there is a matching prefix in the current level, the search is over, since it is the longest matching prefix of the given input. If the search does not find a matching prefix in the current trie, the search continues in the second trie, and so on; a sketch of this procedure is given below. Note that the search can be terminated immediately when it encounters a prefix in BSL-MT, since the tries are composed according to the relative levels of the prefix hierarchy, from the longest to the shortest.

This scheme provides incremental update of the routing table. The worst-case search performance is not good with respect to scalability toward large routing data, but the average search performance is comparable to the other binary search on length algorithms, as will be shown in the next section. For the migration to IPv6, if there is an excessive number of prefix containments, the performance will be degraded due to the excessive number of tries constructed.
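The following C sketch shows the BSL-MT lookup referred to above: a binary search on the levels present in each trie, tried group by group, with no markers and no pre-computed BMPs. The data layout and names are illustrative assumptions; as in the W-BSL sketch, a linear scan stands in for the per-level hash lookup.

```c
#include <stddef.h>

#define W 6

struct mt_node {
    unsigned bits;       /* node bits, left-aligned in W bits  */
    int is_prefix;       /* 1: prefix (leaf), 0: internal node */
    const char *port;
};

struct mt_level {
    int length;                  /* prefix length of this level */
    const struct mt_node *nodes;
    int count;
};

struct mt_trie {                 /* one disjoint prefix group   */
    const struct mt_level *levels;   /* sorted by length        */
    int num_levels;
};

static const struct mt_node *find(const struct mt_level *l, unsigned addr)
{
    unsigned mask = (~0u << (W - l->length)) & ((1u << W) - 1);
    for (int i = 0; i < l->count; i++)
        if (((addr ^ l->nodes[i].bits) & mask) == 0)
            return &l->nodes[i];
    return NULL;                 /* linear scan stands in for hashing */
}

/* Binary search on lengths inside each trie; the first prefix hit is the BMP,
 * because the tries are ordered from the longest nesting level to the shortest. */
const char *bslmt_lookup(unsigned addr, const struct mt_trie *tries, int num_tries)
{
    for (int t = 0; t < num_tries; t++) {
        int lo = 0, hi = tries[t].num_levels - 1;
        while (lo <= hi) {
            int mid = (lo + hi) / 2;
            const struct mt_node *n = find(&tries[t].levels[mid], addr);
            if (n == NULL) {
                hi = mid - 1;           /* no node: try shorter lengths */
            } else if (n->is_prefix) {
                return n->port;         /* disjoint prefix: search done */
            } else {
                lo = mid + 1;           /* internal node: try longer    */
            }
        }
    }
    return NULL;                        /* no matching prefix           */
}
```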

Fig. 13. A binary search on lengths in multiple tries (BSL-MT).

VI. PERFORMANCE EVALUATION

We performed simulations on real routing data acquired from backbone routers [32] using the C language. We used five different sets of routing data with approximately 15,000, 40,000, 112,000, 170,000, and 227,000 prefixes.

Table XII shows the required memory for each algorithm. Among the trie-based algorithms, the priority trie (P-trie) shows the best performance with respect to memory.

The performance difference between the binary trie and the P-trie is noticeable for a small routing table, but it becomes insignificant for the large routing sets, such as Grouptlcom and Telstra, as expected, due to the reduced number of empty internal nodes.

Among the algorithms performing binary search on prefix values, the BST, WBST, and BST-SP algorithms provide the same performance, since the number of nodes is equal to the number of prefixes in these algorithms and the entry widths are almost the same. BSR shows the best scalability with regard to the number of prefixes; the rate of increase of its required memory is relatively slow compared to the other algorithms.

For the algorithms performing binary search on prefix lengths, we assume perfect hash functions for the prefixes of each level, and the number of hash entries in each level is set to 2^ceil(log2 Nl), where Nl is the number of nodes in level l. W-BSL and L-BSL show similar performance, while BSL-MT requires more memory, as expected. The logW-E algorithm consumes more memory than the other algorithms, since it constructs multiple tries in addition to a binary trie and also requires hash tables for the nodes crossing the levels in the kth-level tries.

We can think of two different approaches with respect to memory consumption. The first approach is to reduce the number of memory entries.

BST-PV [8] follows this approach; its number of memory entries is smaller than the number of prefixes. The P-trie [17], BST [22], WBST [23], and BST-SP algorithms also follow this approach; their required number of memory entries is equal to the number of prefixes, obtained by removing unnecessary entries. The other approach is to reduce memory consumption by reducing the memory width. The B-trie [7] does not require storing prefix values and lengths and thus requires a smaller width. The PC-trie [7] removes a child pointer by making the sub-trie of each node a full trie. BSR [18] completely removes child pointers by achieving a perfectly balanced binary search. However, if routing tables are stored in on-chip memory, we can utilize wide-word parallelism, and hence it is better to reduce the number of entries rather than the width of memory entries.

Search performance is evaluated in terms of the number of memory accesses. Figures 14, 15, and 16 show the minimum, maximum, and average numbers of memory accesses for locating the longest matching prefix of each input packet, respectively. From the minimum number of memory accesses shown in Figure 14, it can be seen that the search can finish as soon as a match occurs in some algorithms, such as the P-trie [17], BSR [18], BST-PV [8], L-BSL [27], and BSL-MT. In the case of the logW-E algorithm [29], the minimum number of memory accesses occurred for an input matching a leaf node in the first trie.


TABLE XI
HASH TABLES FOR BINARY SEARCH ON LENGTHS IN MULTIPLE TRIES

(a) Level 3 (first trie)
Prefix/Internal  Node    Prefix
1                010     P1
0                110
0                111

(b) Level 2 (first trie)
Prefix/Internal  Node    Prefix
1                00      P0
0                01
0                11

(c) Level 5 (first trie)
Prefix/Internal  Node    Prefix
0                11010
1                11111   P6

(d) Level 6 (first trie)
Prefix/Internal  Node    Prefix
1                110101  P3

(e) Level 3 (second trie)
Prefix/Internal  Node    Prefix
0                110
1                111     P5

(f) Level 4 (second trie)
Prefix/Internal  Node    Prefix
1                1101    P4

(g) Level 1 (third trie)
Prefix/Internal  Node    Prefix
1                1       P2

Search performance is evaluated in terms of the number of memory accesses. Figures 14, 15, and 16 show the minimum, maximum, and average numbers of memory accesses required to locate the longest matching prefix for each input packet, respectively. The minimum numbers of memory accesses in Figure 14 show that the search can finish as soon as a match occurs in some algorithms, such as the P-trie [17], BSR [18], BST-PV [8], L-BSL [27], and BSL-MT. In the case of the logW-E algorithm [29], the minimum number of memory accesses occurred for an input matching a leaf node in the first trie; this results in three memory accesses: the hash table of the root node, a 16th-level node, and the corresponding node of the binary trie. In all other algorithms, the search always finishes when a leaf node or the last level of access is encountered.

For the worst-case numbers of memory accesses shown in Figure 15, the W-BSL and L-BSL algorithms provide the best performance, with six memory accesses. These algorithms also provide very good scalability, since their performance does not depend on the amount of routing data. The logW-E algorithm [29] requires at most two memory accesses in the binary trie after searching the kth-level tries, and hence shows 9 to 11 memory accesses in the worst case, depending on the routing set. The BSL-MT algorithm yields the worst performance for large routing data sets, since it constructs multiple tries, depending on the number of prefix containments. BSR [18], B-trie [7], PC-trie [7], P-trie [17], BST-PV [8], and BST-SP provide reasonable worst-case performance and also show very good scalability. Note that the BST [22] and WBST [23] algorithms provide better worst-case performance than the trie-based algorithms for small routing data sets, but worse performance for large routing data sets. The BST algorithm shows poor performance, especially for large routing data such as Telstra, because it constructs an unbalanced tree; large routing sets are expected to have an excessive number of prefix containments. As expected, WBST achieves better performance than the BST algorithm.


In the average search performance shown in Figure 16, W-BSL [26] and L-BSL [27] achieve the best performance of all the algorithms, if we assume perfect hash functions for each level of a given routing data set; their average number of memory accesses is less than four. The logW-E [29] and BSL-MT algorithms achieve slightly worse, but still very good, performance compared with W-BSL and L-BSL. BSR [18], BST-PV [8], and BST-SP achieve the next best performance, with fewer than 18 memory accesses. The PC-trie [7] and the P-trie [17] improve on the average search performance of the B-trie [7].

VII. DISCUSSION

Table XIII summarizes the characteristics of the algorithms in terms of incremental updatability, scalability to large routing data, and performance after migration to IPv6. The B-trie [7] and the P-trie [17] provide incremental update; hence, the complexity of inserting new prefixes incrementally is low. The algorithms performing binary search on prefix values do not provide incremental update. In general, algorithms requiring pre-computation based on the distribution of a given routing data set do not provide incremental update. BSR [18] requires the pre-computation of BMPs in each entry. A new algorithm was recently proposed to provide incremental update to the BSR algorithm [19], but the data structure was modified significantly and the overall structure became very complex. W-BSL [26] requires the pre-computation of markers and BMPs, and L-BSL [27] requires leaf-pushing; hence, they do not provide incremental update. BSL-MT is the only scheme providing incremental update among the algorithms performing binary search on prefix lengths.

The characteristics of each algorithm regarding scalability to a large routing set that may have millions of prefixes, such as in default-free zone (DFZ) routers, can be inferred from the simulation results. Trie-based algorithms provide good scalability, since the memory requirement does not increase much and the search performance does not degrade much with an increasing number of prefixes. More levels of prefix containment are expected as routing data grow; thus, the performance degradation of algorithms performing binary search on prefix values, such as BST [22] and WBST [23], would be high due to the restriction that an enclosure must be located at a higher level than the prefixes it encloses. BSR and BST-PV provide the best scalability among the binary search on prefix value algorithms, since they construct balanced search trees. The W-BSL [26], L-BSL [27], and logW-E [29] algorithms provide very good scalability in search speed. The BSL-MT algorithm provides very good scalability in average search performance, but not in worst-case search performance.

The migration to IPv6 should be considered in terms of search performance degradation. Even though the depth of the P-trie [17] depends on the prefix length, most prefixes are expected to be stored as priority prefixes in IPv6; hence, search performance degradation is expected to be low. Algorithms performing binary search on prefix values will achieve good search performance for IPv6, since their search performance is related to the number of prefixes rather than the prefix lengths.


TABLE XII
MEMORY REQUIREMENT (MBYTE)

Routing Data   No. of Prefixes   B-Trie  PC-Trie  P-Trie  BSR   BST   WBST  BST-PV  BST-SP  W-BSL  L-BSL  logW-E  BSL-MT
Mae-West       14553             0.45    0.28     0.14    0.17  0.14  0.14  0.34    0.14    0.45   0.40   1.01    0.44
Mae-East       39464             0.99    0.76     0.39    0.46  0.39  0.39  0.92    0.39    0.99   0.71   2.58    1.00
PORT80         112310            1.29    1.51     1.07    0.99  1.07  1.07  1.65    1.07    1.29   1.43   3.03    2.11
Grouptlcom     170601            1.80    2.28     1.67    1.50  1.67  1.67  2.50    1.67    1.80   1.96   4.36    2.99
Telstra        227223            2.59    3.34     2.22    2.00  2.22  2.22  3.77    2.22    2.59   2.75   5.70    3.80

Fig. 14. Minimum number of memory accesses for each algorithm. (a) Mae-West (14553 prefixes). (b) Mae-East (39464 prefixes). (c) PORT80 (112310 prefixes). (d) Grouptlcom (170601 prefixes). (e) Telstra (227223 prefixes).

However, in the case of BST and WBST, the search performance for IPv6 may not be good, since deep prefix nesting may produce highly unbalanced trees. Even though the number of prefix levels increases up to 128 for IPv6, the search time of the algorithms performing binary search on prefix lengths increases only logarithmically; hence, their search performance is not much degraded. However, in the case of BSL-MT, the performance may not be good, since an excessive number of tries may be constructed, depending on the number of prefix nesting levels.
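As a rough illustration of this logarithmic scaling (ignoring markers, backtracking, and the exact set of populated levels, so the measured worst case in Figure 15 may differ by a small constant), binary search over the possible prefix lengths needs on the order of

\[
\lceil \log_2 32 \rceil = 5 \quad \text{probes for IPv4}, \qquad \lceil \log_2 128 \rceil = 7 \quad \text{probes for IPv6},
\]

so moving from 32-bit to 128-bit addresses adds only about two hash probes per lookup.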

VIII. CONCLUSION

IP address lookup is becoming a major bottleneck in routers due to advances in communication link technologies, rapidly growing Internet traffic, exponentially growing routing tables, and the increasing number of dynamically changed routes.


Fig. 15. Worst case number of memory accesses for each algorithm. (a) Mae-West (14553 prefixes). (b) Mae-East (39464 prefixes). (c) PORT80 (112310 prefixes). (d) Grouptlcom (170601 prefixes). (e) Telstra (227223 prefixes).

The solution currently adopted in commercial routers, which uses ternary content addressable memory (TCAM), faces a scalability problem with respect to both large routing data and the migration to IPv6. Many IP address lookup algorithms utilizing ordinary memories, such as SRAM or DRAM, have been proposed. In this paper, we described ten previous algorithms and proposed two new algorithms, focusing on the data structures that compose their routing tables. We compared their characteristics based on pre-defined metrics. Simulations using real routing data from backbone routers were presented to show the performance of each algorithm in terms of search speed and memory requirement. The binary search on prefix length algorithms [26]-[30] achieve the best performance in the overall metrics, such as search speed, scalability toward large routing data, and migration to IPv6.

However, obtaining perfect hash functions for each level of a given set of routing data is a prerequisite of these algorithms, and this is a time-consuming process. The search performance of the BST-PV [8] and BSR [18] algorithms is very good and scalable toward large routing data. However, the BSR algorithm cannot provide incremental update of the routing table; hence, it should be used for stable routing data. The P-trie algorithm [17] can be the best solution for dynamically changing routing data, since it provides reasonable search performance, an optimized memory requirement, incremental update of the routing table, and good scalability toward both large routing data and the migration to IPv6. Many IP address lookup algorithms based on hashing and Bloom filters have recently been proposed [33]-[35]. These algorithms can provide significant throughput by being implemented in hardware.


Fig. 16. Average number of memory accesses for each algorithm. (a) Mae-West (14553 prefixes). (b) Mae-East (39464 prefixes). (c) PORT80 (112310 prefixes). (d) Grouptlcom (170601 prefixes). (e) Telstra (227223 prefixes).

Research on these algorithms is currently ongoing. Another research topic is the investigation of algorithms devoted to IPv6 address lookup [35]. The IP address lookup problem is not only important in itself; its solutions can also be used as a building block for the more general problem of packet classification [36]-[42], which is a basic function for providing value-added features, such as policy-based routing, quality of service, and security services via access control in Internet routers.

ACKNOWLEDGMENTS

The preparation of this survey paper would not have been possible without the simulation efforts of my students in the SOC Design Lab. at Ewha Womans University. I am particularly grateful to Bomi Lee, Ju Hyoung Mun, A. G. Alagu Priya, Soohyun Lee, and Seh Won Min.

REFERENCES

[1] R. Perlman, "Interconnections: Bridges, Routers, Switches, and Interconnecting Protocols," Addison Wesley, 2005.
[2] G. Varghese, "Network Algorithmics," Morgan Kaufmann, 2005.
[3] H. Song, F. Hao, M. Kodialam, and T. V. Lakshman, "IPv6 Lookup Using Distributed and Load Balanced Bloom Filters for 100Gbps Core Router Line Cards," IEEE INFOCOM, 2009, pp. 2518–2526.
[4] W. Eatherton, G. Varghese, and Z. Dittia, "Tree Bitmap: Hardware/Software IP Lookups with Incremental Updates," ACM SIGCOMM Computer Communications Review, vol. 34, no. 2, pp. 97–122, Apr. 2004.
[5] D. E. Taylor, J. S. Turner, J. W. Lockwood, and T. S. Sproull, "Scalable IP Lookup for Internet Routers," IEEE J. Sel. Areas Commun., vol. 21, no. 4, pp. 522–534, May 2003.
[6] http://www.cisco.com/en/US/products, Cisco CRS, 2007.
[7] M. A. Ruiz-Sanchez, E. M. Biersack, and W. Dabbous, "Survey and Taxonomy of IP Address Lookup Algorithms," IEEE Networks, vol. 15, no. 2, pp. 8–23, Mar./Apr. 2001.
[8] H. Lim, H. Kim, and C. Yim, "IP Address Lookup for Internet Routers Using Balanced Binary Search with Prefix Vector," IEEE Trans. Commun., vol. 57, no. 3, pp. 618–621, Mar. 2009.
[9] J. F. Kurose and K. W. Ross, "Computer Networking: A Top-Down Approach," Pearson Education, 2010.


TABLE XIII
CHARACTERISTICS OF EACH ALGORITHM

Algorithms   Complexity to provide    Search performance degradation     Search performance degradation
             incremental update       when applied to large routing data  when applied to IPv6
B-Trie       Very low                 Very low                            High
PC-Trie      Medium                   Medium                              Low
P-Trie       Low                      Low                                 Very low
BSR          Very high                Very low                            Very low
BST          Very high                High                                May or may not be low*
WBST         Very high                High                                May or may not be low*
BST-PV       Very high                Low                                 Very low
BST-SP       Very high                High                                May or may not be low*
W-BSL        Very high                Very low                            Low
L-BSL        Very high                Very low                            Low
logW-E       High                     Very low                            Low
BSL-MT       Very low                 Low in average, high in worst case  May or may not be low*
* Depending on the number of nesting levels

[10] H. Song, J. Turner, and J. Lockwood, "Shape Shifting Tries for Faster IP Route Lookup," Proc. IEEE ICNP, 2005, pp. 358–367.
[11] S. Sahni and K. Kim, "Efficient Construction of Multibit Tries for IP Lookup," IEEE/ACM Trans. Netw., vol. 11, no. 4, pp. 650–662, Aug. 2003.
[12] K. Kim and S. Sahni, "Efficient Construction of Pipelined Multibit-Trie Router-Tables," IEEE Trans. Comput., vol. 56, no. 1, pp. 32–43, Jan. 2007.
[13] R. Sangireddy and A. K. Somani, "High-Speed IP Routing with Binary Decision Diagrams Based Hardware Address Lookup Engine," IEEE J. Sel. Areas Commun., vol. 21, no. 4, pp. 513–521, Apr. 2003.
[14] W. Lu and S. Sahni, "Recursively Partitioned Static IP Router-Tables," IEEE Trans. Comput., vol. 59, no. 12, pp. 1683–1690, Dec. 2010.
[15] H. Lu, K. Kim, and S. Sahni, "Prefix and Interval-Partitioned Dynamic IP Router-Tables," IEEE Trans. Comput., vol. 54, no. 5, pp. 545–557, May 2005.
[16] H. Lim and J. Mun, "An Efficient IP Address Lookup Algorithm Using a Priority-Trie," Proc. IEEE Globecom, 2006, pp. 1–5.
[17] H. Lim, C. Yim, and E. E. Swartzlander, Jr., "Priority Trie for IP Address Lookup," IEEE Trans. Comput., vol. 59, no. 6, pp. 784–794, Jun. 2010.
[18] B. Lampson, V. Srinivasan, and G. Varghese, "IP Lookups Using Multiway and Multi-column Search," IEEE/ACM Trans. Netw., vol. 7, no. 3, pp. 324–334, Mar. 1999.
[19] S. Sahni and K. Kim, "An O(log n) Dynamic Router-Table Design," IEEE Trans. Comput., vol. 53, no. 3, pp. 351–363, Mar. 2004.
[20] H. Lu and S. Sahni, "O(log n) Dynamic Router-Tables for Prefixes and Ranges," IEEE Trans. Comput., vol. 53, no. 10, pp. 1217–1230, Oct. 2004.
[21] H. Lu and S. Sahni, "A B-tree Dynamic Router-Table Design," IEEE Trans. Comput., vol. 54, no. 7, pp. 814–824, Jul. 2005.
[22] N. Yazdani and P. S. Min, "Fast and Scalable Schemes for the IP Address Lookup Problem," Proc. IEEE High Performance Switching and Routing, 2000, pp. 83–92.
[23] C. Yim, B. Lee, and H. Lim, "Efficient Binary Search for IP Address Lookup," IEEE Commun. Lett., vol. 9, no. 7, pp. 652–654, Jul. 2005.
[24] H. Lim, B. Lee, and W. Kim, "Binary Searches on Multiple Small Trees for IP Address Lookup," IEEE Commun. Lett., vol. 9, no. 1, pp. 75–77, Jan. 2005.
[25] H. Lim, W. Kim, B. Lee, and C. Yim, "High-speed IP Address Lookup Using Balanced Multi-way Trees," Computer Communications, vol. 29, no. 11, pp. 1927–1935, Jul. 2006.
[26] M. Waldvogel, G. Varghese, J. Turner, and B. Plattner, "Scalable High Speed IP Routing Lookups," Proc. ACM SIGCOMM, 1997, pp. 25–35.
[27] J. H. Mun, H. Lim, and C. Yim, "Binary Search on Prefix Lengths for IP Address Lookup," IEEE Commun. Lett., vol. 10, no. 6, pp. 492–494, Jun. 2006.
[28] K. Kim and S. Sahni, "IP Lookup by Binary Search on Prefix Length," Proc. IEEE ISCC, 2003, pp. 77–82.
[29] R. Sangireddy, N. Futamura, S. Aluru, and A. K. Somani, "Scalable, Memory Efficient, High-Speed Algorithms for IP Lookups," IEEE/ACM Trans. Netw., vol. 13, no. 4, pp. 802–812, Aug. 2005.
[30] N. Futamura, R. Sangireddy, S. Aluru, and A. K. Somani, "Scalable, Memory Efficient, High-Speed Lookup and Update Algorithms for IP Routing," Proc. IEEE Computer Communications and Networks (ICCCN), Oct. 2003, pp. 257–263.
[31] V. Srinivasan and G. Varghese, "Fast Address Lookups Using Controlled Prefix Expansion," Proc. ACM Sigmetrics, 1998, pp. 1–11.
[32] http://www.potaroo.net

[33] S. Dharmapurikar, P. Krishnamurthy, and D. Taylor, "Longest Prefix Matching Using Bloom Filters," IEEE/ACM Trans. Netw., vol. 14, no. 2, pp. 397–409, Feb. 2006.
[34] H. Lim, J. Seo, and Y. Jung, "High Speed IP Address Lookup Architecture Using Hashing," IEEE Commun. Lett., vol. 7, no. 10, pp. 502–504, Oct. 2003.
[35] K. Lim, K. Park, and H. Lim, "Binary Search on Levels Using a Bloom Filter for IPv6 Address Lookup," IEEE/ACM ANCS, 2009, pp. 185–186.
[36] H. Jonathan Chao, "Next Generation Routers," Proc. IEEE, vol. 90, no. 9, pp. 1518–1588, Sep. 2002.
[37] V. Srinivasan, S. Suri, and G. Varghese, "Packet Classification Using Tuple Space Search," ACM SIGCOMM Computer Communication Review, vol. 29, no. 4, pp. 135–146, 1999.
[38] H. Lim and S. Kim, "Tuple Pruning Using Bloom Filters for Packet Classification," IEEE Micro, vol. 30, no. 3, pp. 784–794, May/Jun. 2010.
[39] A. G. Alagu Priya and H. Lim, "Hierarchical Packet Classification Using a Bloom Filter and Rule-Priority Tries," Computer Communications, vol. 33, no. 10, pp. 1215–1226, Jun. 2010.
[40] H. Lim, H. Chu, and C. Yim, "Hierarchical Binary Search Tree for Packet Classification," IEEE Commun. Lett., vol. 11, no. 8, pp. 689–691, Aug. 2007.
[41] H. Lim, M. Kang, and C. Yim, "Two-dimensional Packet Classification Algorithm Using a Quad-Tree," Computer Communications, vol. 30, no. 6, pp. 1396–1405, Mar. 2007.
[42] D. E. Taylor and J. S. Turner, "ClassBench: A Packet Classification Benchmark," IEEE/ACM Trans. Netw., vol. 15, no. 3, pp. 499–511, Jun. 2007.

Hyesook Lim (M'91) received the B.Eng. and M.S. degrees in control and instrumentation engineering from Seoul National University, Seoul, Korea, in 1986 and 1991, respectively, and the Ph.D. degree from the University of Texas at Austin, Austin, TX, in 1996. From 1996 to 2000, she was a Member of Technical Staff at Bell Labs, Lucent Technologies, Murray Hill, NJ. From 2000 to 2002, she worked for Cisco Systems, San Jose, CA. She is currently a professor in the Department of Electronics Engineering, Ewha Womans University, Seoul, Korea. Her research interests include router design issues such as address lookup, packet classification, deep packet inspection, and data stream mining, as well as the hardware implementation of various network algorithms.

Nara Lee received the B.Eng. degree in the Department of Electronics Engineering from Ewha Womans University, Seoul, Korea, in 2009, where she is currently pursuing the M.S. degree. Her research interests include various network algorithms such as IP address lookup and packet classification, and their hardware implementation.
