Journal of High Speed Networks 13 (2004) 223–232 IOS Press
Efficient IP forwarding engine with incremental update

Shuo-Cheng Hu a, Chia-Tai Chan b, Hung-Yi Chang c and Pi-Chung Wang d,∗

a Department of Information Management, Ming-Hsin University of Science and Technology, Hsinchu 304, Taiwan, ROC
E-mail: [email protected]
b Telecommunication Laboratories, Chunghwa Telecom Co., Ltd., Taipei 106, Taiwan, ROC
E-mail: [email protected]
c Department of Information Management, I-Shou University, Kaohsiung 840, Taiwan, ROC
E-mail: [email protected]
d Institute of Computer Science and Information Technology, National Taichung Institute of Technology, Taichung 404, Taiwan, ROC
E-mail: [email protected]

Abstract. The table lookup scheme commonly used for IP routing today is based on classless interdomain routing (CIDR). With CIDR, routers must find the best matching prefix (BMP) in order to forward each IP packet, which complicates the IP lookup. Since IP lookup performance is a major design issue for new-generation routers, in this article we investigate the properties of the routing table and present a new IP lookup scheme. With the proposed scheme, the forwarding table for a large routing table with 58 000 routing entries can be compressed to 360 Kbytes. A data structure for incremental update is also introduced at the cost of 40% additional storage. The new data structure can accomplish a single route update within 120 ns. Even where route flaps impede lookup performance, the performance degrades by only 0.05% with 4000 route updates per second. Furthermore, the scheme is scalable to IPv6.

Keywords: Routing table, routing lookup, Internet
1. Introduction

Since the introduction of the WWW in 1992, the Internet has experienced an explosion of both traffic and users. To meet the resulting demand for bandwidth, multi-gigabit links have become commercially available. The key to improving the performance of the Internet lies in fast routers, which in turn rely on fast lookup schemes. A major obstacle to forwarding multiple millions of packets per second (MPPS) [1] in a high-performance router is the slow IP lookup process, which requires multiple memory accesses. IP routes are identified by a (routing prefix, prefix length) pair, where the prefix length has varied from 1 to 32 bits since the deployment of CIDR in 1993. An IP address might match several prefixes in a routing table, so we refer to the best matching prefix (BMP) as the valid route [2]. Searching for the BMP can be time consuming, especially in a backbone router with a large number of table entries. To attain fast IP lookup, the following metrics should be considered simultaneously.

– Routing-table size: Routing lookup performance can be improved significantly by reducing the routing table so that it fits into high-speed SRAM.
– Simplicity: The IP lookup process and the associated data structures should be simple to avoid search complexity, for example by using routing intervals [3] to avoid the best-matching problem.

* Corresponding author.
0926-6801/04/$17.00 2004 – IOS Press and the authors. All rights reserved
S.-H. Hu et al. / Efficient IP forwarding engine with incremental update
– Fast incremental update: BGP (Border Gateway Protocol) route updates can occur 100 times per second or more [4,5]. The routing-table update rate is a less obvious metric, but it is just as important to router performance. A router that cannot keep up with all of the updates may trigger route flaps, which cause poor network performance and degrade the overall efficiency of the Internet infrastructure. Therefore, incremental update is a major issue for backbone routers.

In this paper, we propose a compact data structure based on the level compression trie. For a large routing table with 58 000 routing entries, the forwarding table consumes only 360 Kbytes, and one IPv4 route lookup requires at most nine memory accesses. The proposed scheme also supports fast incremental update with 40% extra storage and can accomplish each route update within 500 ns. Even where route flaps impede lookup performance, the performance degrades by only 0.05% with 4000 route updates per second. Furthermore, the scheme can support IPv6 routing lookup.

The rest of this paper is organized as follows. Section 2 describes the related works. Section 3 presents the proposed scheme. Section 4 describes the detailed procedure of the route update. The experimental results are shown in Section 5, and concluding remarks are given in Section 6.
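The 0.05% degradation quoted above can be reproduced with simple arithmetic. The sketch below is only an illustration: it takes the 24 accessed entries and the 5 ns SRAM access time from the evaluation in Section 5, and it assumes (our simplification, not a statement from the paper) that lookups are stalled for the full duration of every update. The function and variable names are hypothetical.

```python
SRAM_ACCESS_NS = 5        # SRAM access time quoted in the evaluation
WORST_CASE_ENTRIES = 24   # worst-case entries touched by one IPv4 update


def update_overhead(updates_per_second: float, update_time_ns: float) -> float:
    """Return the fraction of each second spent applying route updates.

    Assumption: lookups are blocked for the whole duration of an update.
    """
    return updates_per_second * update_time_ns * 1e-9


ipv4_update_ns = WORST_CASE_ENTRIES * SRAM_ACCESS_NS     # 24 * 5 = 120 ns
ipv4_overhead = update_overhead(4000, ipv4_update_ns)    # about 0.048%
ipv6_overhead = update_overhead(4000, 475)               # about 0.19%

print(f"IPv4: {ipv4_update_ns} ns per update, {ipv4_overhead:.3%} overhead")
print(f"IPv6: 475 ns per update, {ipv6_overhead:.2%} overhead")
```

Under this assumption the IPv4 overhead rounds to the 0.05% reported in the paper, and the 475 ns IPv6 estimate of Section 5 gives the 0.19% reported there.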
2. Related works

IP forwarding engines have recently been studied extensively to improve routing lookup performance. The proposals include both hardware-based [6–8] and software-based [3,5,9–12] solutions, using techniques such as trie-based data structures, bit mapping, caching and protocol exchanging. Several typical schemes are briefly described in the following.

– Small forwarding table: The main idea in [6] is to compress the trie nodes so that they fit into the on-chip cache memory. Repeated elements in a node array are represented only once, and a bitmap specifies the number of times an element is repeated. By restricting the trie levels and compressing the nodes, a routing table with 40 000 entries can be reduced to 150–160 Kbytes. For a lookup with a hardware implementation, the minimum and maximum numbers of memory accesses are two and nine, respectively. However, updating the bitmap is difficult, and the implicit use of leaf pushing also causes a large insertion time.
– Binary search on prefix lengths: Waldvogel et al. proposed a lookup scheme [10] based on a binary search mechanism that scales very well with increasing address and routing-table size. The entries are stored in hash tables, and the algorithm exploits the fact that the entries are unevenly distributed. A lookup is performed as a binary search over the different prefix lengths. Thus five hash lookups are needed for IPv4, and seven for IPv6 (128 bits).
– Controlled prefix expansion: Srinivasan et al. presented a data structure [9] based on multiple branching. The basic idea is to reduce a set of arbitrary-length prefixes to a predefined set of lengths by using a technique called "controlled prefix expansion". Applying dynamic programming can minimize the required storage, but it also makes the trie construction time consuming.
– LC-trie: The LC-trie scheme [12] is based on implementing multibit tries. This scheme exhibits significantly less dependency on the specific distribution of addresses within IP routing tables. However, the array layout and the requirement for full subtries make updates slow in the worst case. For example, deleting the root of the trie may change the subtrie decomposition or require the trie to be reconstructed.
– Binary search on prefixes: By precomputing the best-matching prefix associated with a range, the scheme proposed in [11] solves the longest-matching-prefix problem with a binary search in a sorted array. The multiway search tree exploits the fact that most processors prefetch an entire cache line when doing a memory access. With six-way branching, the worst case is five cache-line fills on a Pentium Pro with a 32-byte cache line. However, the insertion or deletion of a prefix may result in a table reconstruction because the precomputed information has to be recalculated.
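To make the idea behind controlled prefix expansion [9] more concrete, the following is a minimal sketch under our own assumptions: prefixes are given as bit strings, the set of allowed lengths is chosen by hand, and the longest allowed length is at least as long as the longest prefix. The function name `expand_prefixes` is hypothetical, and the dynamic programming used in [9] to choose the lengths is not shown.

```python
def expand_prefixes(prefixes, allowed_lengths):
    """Expand each prefix to the nearest allowed length.

    `prefixes` maps a bit string such as "101" to a next hop.
    When two expansions collide, the one that originates from the
    longer original prefix wins, which preserves best matching.
    """
    allowed = sorted(allowed_lengths)
    expanded = {}  # bit string -> (original prefix length, next hop)
    for bits, hop in prefixes.items():
        # Smallest allowed length that can hold this prefix.
        target = next(length for length in allowed if length >= len(bits))
        pad = target - len(bits)
        for i in range(2 ** pad):
            suffix = format(i, f"0{pad}b") if pad else ""
            key = bits + suffix
            if key not in expanded or expanded[key][0] < len(bits):
                expanded[key] = (len(bits), hop)
    return {key: hop for key, (_, hop) in expanded.items()}


table = expand_prefixes({"1": "A", "10": "B", "101": "C"}, [2, 3])
```

Here the length-1 prefix is expanded into the two length-2 entries `10` and `11`, and the original prefix `10` overrides the expanded copy because it is more specific.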
3. Proposed scheme

An IP router mainly consists of a network processor and a forwarding engine. The network processor executes the routing protocols, such as BGP and OSPF, and maintains a routing table. The forwarding engine employs a forwarding table to make the routing decision for packet forwarding. The forwarding table is a reduction of the routing table and contains an index of IP prefixes, each associated with an outgoing interface. A router may have multiple forwarding engines that share one routing table. Once the routing table changes, the forwarding engine updates its forwarding table based on the update information from the network processor. This separation ensures that routing instability does not affect the performance of the packet forwarding engine.

A straightforward way to improve the lookup performance is to reduce the forwarding table so that it fits into high-speed SRAM. Figure 1 shows a "complete" binary trie of routing prefixes, which is constructed by adding the complementary child to each single-path child, as used in [6]. The gray circles are leaves, and the labels in the leaves are the best-matched prefixes. This "leaf-pushing" process avoids the BMP problem [9]. In the following, we present how to collect the trie information into the heavy-lined circles to reduce the forwarding table size and improve the lookup performance.

Fig. 1. Complete binary trie of routing prefixes.

3.1. Data structure

Before presenting the proposed scheme, we introduce two data structures, "Node" and "Pointer". They are described in the following.

– Node: Figure 2 shows the data structure of the node, which consists of an ID indicator, an index and four leaf routes. The ID value is set to zero. For a subtrie with no more than four leaves, each node has a 3-bit index that selects one of eight different leaf distributions, as illustrated in Fig. 3. When the lookup procedure reaches a node, the bit vector corresponding to the indexed leaf distribution is retrieved. Then we extract the next 3 bits of the IP address to find the mapped leaf. For example, if the value of the extracted 3 bits is 010, it corresponds to the first leaf and prefix P3 is used as the route.
– Pointer: A pointer is an intermediate node that guides the traversal of the binary trie toward the corresponding route. Like the pointer in Fig. 4, each pointer is the root of a subtrie with a depth of (M + 1) bits. When the traversal of an IP address reaches a pointer, the lookup procedure extracts the next M bits and finds the corresponding node/pointer in the subtrie. Figure 5 shows the data structure of a pointer. The ID value is set to one. The vector size depends on the depth of the subtrie. We set M = 4, thus the vector is $2^4 = 16$ bits.

Fig. 2. Data structure of the node.
Fig. 3. Illustration of the node vector.
Fig. 4. Illustration of the pointer vector.
Fig. 5. Data structure of the pointer.
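The complete binary trie with leaf pushing that underlies the scheme can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the `TrieNode` class, the bit-string representation of prefixes and the function names are our own assumptions, and the compression into Nodes and Pointers is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrieNode:
    route: Optional[str] = None
    left: Optional["TrieNode"] = None    # branch taken on bit 0
    right: Optional["TrieNode"] = None   # branch taken on bit 1


def insert(root: TrieNode, bits: str, route: str) -> None:
    """Insert a prefix given as a bit string into the binary trie."""
    node = root
    for b in bits:
        if b == "0":
            node.left = node.left or TrieNode()
            node = node.left
        else:
            node.right = node.right or TrieNode()
            node = node.right
    node.route = route


def leaf_push(node: TrieNode, inherited: Optional[str] = None) -> None:
    """Complete the trie and push the best-matched route down to the leaves."""
    best = node.route if node.route is not None else inherited
    if node.left is None and node.right is None:
        node.route = best
        return
    # Add the complementary child to a single-path child.
    node.left = node.left or TrieNode()
    node.right = node.right or TrieNode()
    node.route = None            # internal nodes no longer carry a route
    leaf_push(node.left, best)
    leaf_push(node.right, best)


def lookup(root: TrieNode, addr_bits: str) -> Optional[str]:
    """Walk the address bits until a leaf is reached and return its route."""
    node = root
    for b in addr_bits:
        if node.left is None:    # complete trie: a node has two children or none
            break
        node = node.left if b == "0" else node.right
    return node.route


root = TrieNode()
for prefix, route in [("1", "P1"), ("10", "P2"), ("1011", "P3")]:
    insert(root, prefix, route)
leaf_push(root)
```

After leaf pushing, every lookup ends at a leaf that already holds the best-matched route, so no backtracking is needed. The cost is that one prefix may be copied to many leaves, which is why updates are discussed separately in Section 4.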
3.2. Location and construction of the forwarding table

To construct the forwarding table, we first label the nodes and pointers in the binary trie. This procedure is called "location", and its pseudocode is given below.
Location algorithm
Input: The complete binary trie built from the routing prefixes.
Output: The root of the located binary trie.

Locate(Root) {
    If (Root != NULL) {
        If (all the leaves below Root are processed)
            Return;
        If (Root→Number_of_Leaves ≤ 4) {
            Root→Node = true;
            Process_leaves(Root→First, Root→Last);  // mark the leaves below the Node as processed
            Return;
        }
        Locate(Root→Left_Child);
        Locate(Root→Right_Child);
        If ((Root→CurrentBits mod M) == 0)
            Root→Pointer = true;
    }
}

After locating the nodes and the pointers, we construct the forwarding table from the located binary trie. The construction procedure traverses all pointers in a top-down manner, explores the pointer subtries, and saves the node/pointer information of each subtrie in the forwarding table. The resulting vectors of pointers and nodes for the trie in Fig. 1 are shown in Fig. 6 (M = 3). The detailed algorithm is shown below.
Construction algorithm
Input: Located complete binary trie of routing prefixes.
Output: Forwarding table.
Global variable: ForwardingTable[]

Root = root of complete binary trie;
boundary = Root→CurrentBits + M;

Pointer(Root, boundary, data_array, vector) {
    If (Root→Node == true) {
        New_Node = TrackNode(Root);  // Get the Node's value
        Append New_Node to data_array;
        Set vector;
    }
    Else If (Root→Pointer == true) {
        New_Pointer = Pointer(Root, boundary + M, new_data_array, new_vector);
        // Explore the new Pointer subtree and get the new Pointer's value
        Append New_Pointer to data_array;
        Set vector;
    }
    Pointer(Root→Left_Child, boundary, data_array, vector);
    Pointer(Root→Right_Child, boundary, data_array, vector);
    If (all Nodes/Pointers in Root's M-bit subtree have been traversed) {
        Create a variable Pointer;
        Save vector in Pointer's Vector field;
        Address = address of ForwardingTable[]'s tail + 1;
        Save Address in Pointer's Base field;
        Append data_array's value to ForwardingTable[];
        Return Pointer;
    }
}

Fig. 6. The generated vectors of the nodes & pointers.

By applying the proposed algorithms to the binary trie in Fig. 1, the forwarding table can be generated as shown in Fig. 6. The proposed scheme differs from the existing schemes that use level compression [9,12]: level compression generates the child nodes exhaustively according to the expansion bits, while the proposed scheme generates only the necessary ones.

3.3. IP routing lookup algorithm

To perform an IP lookup, the destination address is broken into chunks that are used to traverse the trie until a node is fetched. The mapped leaf is then extracted from the node. This scheme can be implemented in software or in hardware. The software design is simple and cheap, while the hardware design can adopt pipelining and parallel techniques to further speed up forwarding.
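The location procedure of Section 3.2 can also be written as a short runnable sketch. The version below is an illustration under our own assumptions, not the authors' code: the `Trie` class and the `perfect` helper are hypothetical, the trie is assumed to be complete, and leaves are counted recursively instead of being read from a stored Number_of_Leaves field. Because the function returns as soon as a subtrie is marked as a Node, the separate "processed" flag of the pseudocode is not needed.

```python
from dataclasses import dataclass
from typing import Optional

M = 4            # stride in bits, as chosen in Section 3.1
MAX_LEAVES = 4   # a Node holds at most four leaf routes


@dataclass
class Trie:
    depth: int = 0                    # CurrentBits in the pseudocode
    left: Optional["Trie"] = None
    right: Optional["Trie"] = None
    is_node: bool = False
    is_pointer: bool = False


def count_leaves(t: Trie) -> int:
    """Count the leaves below t (the trie is assumed to be complete)."""
    if t.left is None and t.right is None:
        return 1
    return count_leaves(t.left) + count_leaves(t.right)


def locate(t: Optional[Trie]) -> None:
    """Mark Nodes (small subtries) and Pointers (roots at multiples of M)."""
    if t is None:
        return
    if count_leaves(t) <= MAX_LEAVES:
        t.is_node = True       # the whole subtrie collapses into one Node
        return
    locate(t.left)
    locate(t.right)
    if t.depth % M == 0:
        t.is_pointer = True


def perfect(depth: int, height: int) -> Trie:
    """Hypothetical helper: build a perfect binary trie of the given height."""
    t = Trie(depth=depth)
    if height > 0:
        t.left = perfect(depth + 1, height - 1)
        t.right = perfect(depth + 1, height - 1)
    return t


root = perfect(0, 3)   # eight leaves
locate(root)
```

In this small example the root has eight leaves, so it becomes a Pointer at depth 0, while each of its two children covers four leaves and is therefore marked as a Node.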
4. Route update

As the network topology changes, new routing information is disseminated among the routers, leading to changes in the routing tables. As a result, one or more entries must be added to, updated in or deleted from the table. The complete binary tree in our scheme supports prefix updates, so the forwarding table can be adjusted accordingly. However, the cost may be large, because the references to a prefix may be spread widely across the complete binary tree due to leaf pushing [9]. To achieve fast incremental update, we must restrict the updates in the binary tree and record the update information in the forwarding table. Before discussing route updates and the related procedures, we need to make some modifications to the nodes and pointers. We then discuss the updates under three conditions: route change, route insertion and route deletion.
Fig. 7. Data structure of the renewable node.
Fig. 8. Data structure of the renewable pointer.
4.1. Renewable data structures

Each node in the complete binary tree has a best-matched prefix, and the length of the best-matched prefix is recorded in the forwarding table. Moreover, the addresses of both the nodes and the pointers in the forwarding table must be kept in the binary tree for incremental update. Figures 7 and 8 show the renewable data structures. A two-byte field for the new route is added to both data structures, which increases the storage by 40%. Note that the length field is 7 bits wide for IPv6 (128-bit) scalability. This field is used to avoid the exhaustive updates incurred by leaf pushing. A prefix length of zero means that there is no new route.

4.2. Route change

Route change is the most frequent kind of route update. When a prefix incurs a route change, we modify only the local area below the node that represents the prefix. If the updated prefix corresponds to a leaf, the leaf field in the corresponding node is modified directly. Otherwise, the new route information is written to both the pointer and the node. Figure 9 shows a portion of the complete binary trie. The binary trie is divided into levels of M bits each. If a prefix P1 has a new route R, only the nodes and the pointers in level k that are covered by P1 need to be modified. Thus only the pointer PTRA and the node N1 are affected by the update in Fig. 9.

4.3. Route insertion

There are two situations in route insertion. If there is any leaf below the inserted prefix node in the complete binary trie, the case is called route insertion (I), and the procedure is the same as that of route change. In the second case, route insertion (II), new pointers/nodes are generated according to the length of the inserted prefix. A subtrie is constructed under the leaf that is closest to the inserted prefix, and pointers or nodes are generated where necessary. We use an example to explain the procedure.

Suppose that a new prefix P5 is inserted. The corresponding leaf of P5 is under the leaf of P2, and the length difference between P5 and P2 is 4 bits. Figure 10 shows the procedure of the route insertion. The node N3 is converted to PTR3, and two new nodes, N7 and N8, are inserted for the unchanged leaves. Then the pointer PTR4 and the nodes N9 and N10 are generated for the subtrie beyond M bits. Since the data of the forwarding table are saved sequentially and the update scheme does not release memory, the memory utilization of the proposed scheme is 100%.
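The two-byte new-route field of Section 4.1 can be pictured with a small packing sketch. The exact layout is defined by Figs 7 and 8, which are not reproduced here, so the layout below is purely an assumption for illustration: the 7-bit length occupies the high bits and the remaining 9 bits hold a hypothetical route index. All function names are ours.

```python
LENGTH_BITS = 7                    # length field width stated in Section 4.1
ROUTE_BITS = 16 - LENGTH_BITS      # assumed: remaining bits of the 2-byte field


def pack_new_route(length: int, route: int) -> int:
    """Pack a prefix length and a route index into one 16-bit field."""
    if not 0 <= length < (1 << LENGTH_BITS):
        raise ValueError("length does not fit in 7 bits")
    if not 0 <= route < (1 << ROUTE_BITS):
        raise ValueError("route index does not fit in the remaining bits")
    return (length << ROUTE_BITS) | route


def unpack_new_route(field: int) -> tuple:
    """Return (length, route index) stored in a 16-bit field."""
    return field >> ROUTE_BITS, field & ((1 << ROUTE_BITS) - 1)


def has_new_route(field: int) -> bool:
    """A length of zero means that no new route is recorded."""
    return (field >> ROUTE_BITS) != 0


field = pack_new_route(24, 5)

# Storage check from the text: 40% on top of 360 Kbytes gives 504 Kbytes.
renewable_kbytes = 360 * 140 // 100
```

The storage check at the end matches the 504 Kbytes reported for the 58 000-prefix table in Section 5.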
Fig. 9. A portion of the complete binary trie.
Fig. 10. Route insertion.
4.4. Route deletion

The route deletion procedure is identical to the route change procedure. We only need to change the route of the deleted prefix to the route of the matched sub-prefix, and to change the length of the updated nodes in the binary tree to the length of the sub-prefix.

4.5. Modified IP lookup

The IP lookup procedure is modified to support the new data structure. During the lookup, for each traversed node or pointer whose length field is larger than zero, we keep the length of its best-matched prefix and its new route in a scoreboard.
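The scoreboard of the modified lookup can be sketched as follows. The text does not spell out the final comparison, so this sketch assumes that the recorded new route replaces the leaf route whenever its prefix length is at least the length of the leaf's best-matched prefix. The `Entry` type and the `resolve` function are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class Entry(NamedTuple):
    """New-route field of one traversed node or pointer."""
    new_len: int = 0                  # 0 means that no new route is recorded
    new_route: Optional[str] = None


def resolve(path: Sequence[Entry], leaf_len: int, leaf_route: str) -> str:
    """Choose between the leaf route and the new routes seen on the path."""
    score_len, score_route = 0, None  # the scoreboard
    for entry in path:
        if entry.new_len > 0 and entry.new_len >= score_len:
            score_len, score_route = entry.new_len, entry.new_route
    # Assumption: an update wins only if it is at least as specific
    # as the prefix that leaf pushing stored in the leaf.
    if score_route is not None and score_len >= leaf_len:
        return score_route
    return leaf_route


changed = resolve([Entry(), Entry(16, "R_new")], leaf_len=16, leaf_route="R_old")
kept = resolve([Entry(8, "R_new")], leaf_len=24, leaf_route="R_old")
```

In the first call a route change on a /16 prefix overrides a leaf whose best match is that same /16; in the second call a leaf with a more specific /24 match keeps its own route.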
5. Performance evaluation

To evaluate the performance of a software implementation, we use a 300-MHz Pentium II with a 512-Kbyte L2 cache running Windows 2000. Real-world IPv4 routing tables from the IPMA project [13] are used as the basis for comparison; these data provide a daily snapshot of the routing tables used by some major Network Access Points (NAPs). However, the available IPv6 routing tables contain only hundreds of prefixes. Thus the IPv6 performance can only be estimated and cannot be determined experimentally.

Table 1 presents the performance of the proposed scheme without considering updates. For a large routing table with more than 58 000 entries, the forwarding table is small enough to fit into high-speed SRAM. When implemented in software, the worst-case lookup time is less than 500 ns, which is due to the time-consuming bit counting. If a hardware implementation is adopted, the lookup time can be improved to 45 ns with 5-ns SRAM.

Table 1
Performance evaluation with five routing tables

Database    Routing prefixes    Memory requirement (Kbytes)    Maximum lookup time (ns)
AADS        24 770              208                            440
Mae-East    58 101              360                            480
Mae-West    36 943              270                            451
PacBell     32 388              249                            441
Paix        13 395              128                            440

The memory requirement increases by 40% to support updates. In our experiments, the required storage for a routing table with 58 000 prefixes is 504 Kbytes. Table 2 lists the number of accessed entries under different update situations. In the worst case, 24 entries are accessed. At 5 ns per SRAM access, each route update takes no more than 120 ns. As a result, the performance degradation is as low as 0.05% even with 4000 route updates per second.

Table 2
Number of accessed entries

Update event                                        Accessed entries                  Worst case
Route change, deletion and insertion (I)            ⌈L_P/4⌉ + 16                      24 (L_P = 29)
Route insertion (II), L_D ≤ 3                       ⌈L_N/4⌉ + 3                       11
Route insertion (II), L_D > 3                       ⌈L_N/4⌉ + 3⌈(L_D − 3)/4⌉ + 1      23 (L_N = 1, L_P = 32)

L_P is the length of the updated prefix. L_N is the length of the existing node upon the inserted prefix. L_D is the length difference between L_P and L_N.

To estimate the performance in IPv6, the IPv4 routing tables are projected to IPv6 directly. Since both the storage requirement and the lookup time of the proposed scheme are proportional to the address length, the required storage would be 1440 Kbytes and the lookup time of the software implementation would be 1920 ns. Obviously, the software-based implementation is not fast enough. If the hardware design is adopted, the lookup time is improved to 165 ns, enabling 6 MPPS to be achieved. Pipelined hardware could further enhance the lookup performance. The worst-case update time is 475 ns, so the performance is reduced by 0.19% with 4000 updates per second.

In Table 3, we compare other existing algorithms with ours. The results of previous works have been scaled to a 300-MHz CPU to ease the comparison. Since the worst-case lookup time of the LC-trie [12] is not addressed in the literature, we use its average lookup time instead.

Table 3
Comparison with other existing works

Existing schemes                     Worst-case lookup time (ns)    Memory requirement (Kbytes)
Patricia trie [12]                   1650                           3262
Binary search on hash tables [10]    650                            1600
Lulea scheme [6]                     409                            160
6-way search tree [11]               490                            960
LC-trie [12]                         500                            464
Multibit trie [9]                    236                            640
Proposed scheme                      440                            378

The "Supporting update" and "IPv6" columns survive in the source only as unaligned marks (√ = yes, χ = no), in this order: χ √ / χ √ √ / √ χ χ / √ / χ χ √ / √ / √.

The Lulea scheme consumes less storage than the proposed scheme because of its aggressive compression; however, the information reorganization also makes updates difficult. The multibit trie [9] provides the fastest lookup performance, but it needs complex dynamic programming to minimize the storage. The other schemes require changing the complete data structure during a route update, and thus
feature slow insertion/deletion performance. To our knowledge, only the proposed scheme offers a feasible implementation while supporting both incremental update and IPv6.

6. Conclusions

In this paper, we present how an IP routing table can be succinctly represented and efficiently searched by compressing the complete binary tree of routing prefixes. By traversing only the necessary nodes in the complete binary tree, our data structure is simple, has high memory utilization (100%) and is not based on any ad hoc assumptions about the distribution of the prefix lengths in routing tables. For a routing table with more than 58 000 entries, the size of the forwarding table is no larger than 360 Kbytes. Also, each address lookup can be accomplished within nine memory accesses, and 20 MPPS can be achieved with 5-ns SRAM. We also present how to support incremental update by localizing updates in the forwarding table. By adding 40% storage, the proposed scheme can accomplish a single route update within 120 ns for IPv4. Even with the extra storage, the required memory of the proposed scheme is still small compared to the existing schemes. This benefit comes from avoiding exhaustive branch expansion. Even where route flaps impede lookup performance, the performance degrades by only 0.05% with 4000 route updates per second. We also address IPv6 scalability by estimating the performance. The fast update makes the proposed scheme viable in unstable networks.

References

[1] C. Partridge et al., A 50-Gb/s IP Router, IEEE/ACM Trans. on Networking 6(3) (1998), 237–248.
[2] S. Keshav and R. Sharma, Issues and trends in router design, IEEE Communications Magazine 36(5) (1998), 144–151.
[3] P.C. Wang, C.T. Chan and Y.C. Chen, Performance enhancement of IP forwarding by using Routing Interval, IEEE/KICS Journal of Communications and Networks 3(4) (2001), 374–382.
[4] C. Labovitz, G. Malan and F. Jahanian, Internet routing instability, IEEE/ACM Trans. on Networking 6(5) (1998), 515–528.
[5] P.C. Wang, C.T. Chan and Y.C. Chen, A fast table update scheme for high-performance IP forwarding, IEICE Transactions on Communications E85-B(1) (2002), 318–324.
[6] M. Degermark, A. Brodnik, S. Carlsson and S. Pink, Small forwarding tables for fast routing lookups, in: ACM SIGCOMM, 1997, pp. 3–14.
[7] S. Sikka and G. Varghese, Memory-efficient state lookups with fast updates, in: ACM SIGCOMM, 1997, pp. 335–347.
[8] P.C. Wang, C.T. Chan and Y.C. Chen, A fast IP lookup scheme for high-speed networks, IEEE Communications Letters 6(3) (2001), 125–127.
[9] V. Srinivasan and G. Varghese, Fast IP lookups using controlled prefix expansion, ACM Trans. on Computer Systems 17 (1999), 1–40.
[10] M. Waldvogel, G. Varghese, J. Turner and B. Plattner, Scalable high speed IP routing lookups, in: ACM SIGCOMM, 1997, pp. 25–36.
[11] B. Lampson, V. Srinivasan and G. Varghese, IP lookups using multiway and multicolumn search, IEEE/ACM Trans. on Networking 7(4) (1999), 323–334.
[12] S. Nilsson and G. Karlsson, IP-address lookup using LC-Tries, IEEE JSAC 17(6) (1999), 1083–1092.
[13] Merit Networks Inc., IPMA Project.