Scalable IP Routing Lookup in Next Generation Network Chia-Tai Chan1 , Pi-Chung Wang1 , Shuo-Cheng Hu2 , Chung-Liang Lee1 , and Rong-Chang Chen3 1
Telecommunication Laboratories, Chunghwa Telecom Co., Ltd. 7F, No. 9 Lane 74 Hsin-Yi Rd. Sec. 4, Taipei, Taiwan 106, R.O.C. {ctchan,abu,chlilee}@cht.com.tw 2 Department of Info. Management Ming-Hsin University of Science and Technology 1 Hsin-Hsing Rd. Hsin-Fong, Hsinchu, Taiwan 304, R.O.C.
[email protected] 3 Department of Logistics Engineering and Management National Taichung Institute of Technology No. 129, Sec. 3, Sanmin Rd., Taichung, Taiwan 404, R.O.C.
[email protected]
Abstract. Ternary content-addressable memory has been widely used to perform fast routing lookups. It is able to accomplish the best matching prefix problem in O(1) time without considering the number of prefixes and its lengths. As compared to the software-based solutions, the Ternary content-addressable memory can offer sustained throughput and simple system architecture. It is attractive for IPv6 routing lookup. However, it also comes with several shortcomings, such as the limited number of entries, expansive cost and power consumption. Accordingly, we propose an efficient algorithm to reduce the required size of Ternary contentaddressable memory. The proposed scheme can eliminate 98 percentage of Ternary content-addressable memory entries by adding tiny DRAM. We also address related issues in supporting IPv6 anycasting.
1
Introduction
Due to the advance of the World Wide Web and the promise of future ecommerce, it has shown that the Internet access continues to grow exponentially. The Internet has been facing the depletion of IPv4 address. Network administrators must increasingly rely on network address translation (NAT) technologies to deploy network. However, it complicates the network management and breaks the end-to-end principle of the Internet. Some applications cannot work across a NAT device, such as IPsec. Furthermore, the Internet hosts are no longer just computers, but a whole new range of information appliances, that will require global IP addresses. All these issues are the main driving force of IPv6 for its large address space. For example, the current IPv6 address allocation policy recommendation is to H.-K. Kahng (Ed.): ICOIN 2003, LNCS 2662, pp. 46–55, 2003. c Springer-Verlag Berlin Heidelberg 2003
Scalable IP Routing Lookup in Next Generation Network
47
allocate a 48-bit prefix to every site on the Internet, whether homes, small offices, or large enterprise sites. The 48-bit prefix allows 65,000 subnets within each site, each of which could accommodate a virtually infinite number of hosts. IPv6 also brings such benefits as stateless auto-configuration, more efficient mobility management and integrated IPsec. The major obstacle for the design of the high-speed router is the relatively slow IP lookup scheme. To forward packets toward their destinations, a router must perform forwarding decision, the next hop for the incoming packet, based on the information gathered by the routing protocols. Since the development of CIDR in 1993 [1], IP routes have been identified by a (route prefix, prefix length) pair, where the prefix length varies from 1 to 32 bits. Due to the variable prefix lengths, the search of best match prefix (BMP) may be time consuming for a backbone router with a large number of table entries. The exponential growth of the Internet hosts has further stressed the routing system. It is difficult for the packet-forwarding rate to keep up with the increased traffic demand. Specifically, the address lookup operation is a major bottleneck in the forwarding performance of today’s routers. Ternary content-addressable memory (TCAM) is one popular hardware device to perform fast IP lookups. As compared to the software-based solutions, the TCAM can offer sustained throughput and simple system architecture, thus makes it attractive for IPv6 routing lookup. However, it also comes with several shortcomings, such as the limited number of entries, power consumption and expansive cost. Specifically, it will need fourfold size of TCAM to process the IPv6 address (128-bit) with identical IPv4 entries. In this article, we propose an efficient algorithm to eliminate 98 percentage of TCAM entries by adding tiny DRAM and also address related issues in supporting IPv6 anycasting. The rest of the paper is organized as follows. Firstly, the related algorithms are introduced in Section 2. Section 3 presents the proposed algorithm. The experiment results are presented in Section 4. Finally, a summary is given in Section 5.
2
Related Works
There has been an extensive study in constructing the routing tables during the past few years. The proposals include both hardware and software solutions. In [2], Degermark et al. use a trie-like data structure. The main idea of their work is to quantify the prefix lengths to levels of 16, 24 and 32 bits and expand each prefix in the table to the next higher level. It is able to compact a large routing table with 40,000 entries into a table with size 150-160 Kbytes. The minimum and maximum numbers of memory accesses for a lookup are two and nine, respectively in hardware implementation. Gupta et al. presented fast routinglookup schemes based on a huge DRAM [3]. The scheme accomplishes a routing lookup with the maximum of two memory accesses in the forwarding table of 33 megabytes. By adding an intermediate-length table, the forwarding table can be reduced to 9 megabytes; however, the maximum number of memory accesses
48
Chia-Tai Chan et al.
for a lookup is increased to three. When implemented in a hardware pipeline, it can achieve one route lookup every memory access. This furnishes 20 million packets per second (MPPS) approximately. Huang et al. further improve it by fitting the forwarding table into SRAM [4]. Regarding software solutions, algorithms based on tree, hash or binary search have been proposed. Srinivasan et al. [5] present a data structure based on binary tree with multiple branches. By using a standard trie representation with arrays of children pointers, insertions and deletions of prefixes are supported. However, to minimize the size of the tree, dynamic programming is needed. In [6], Karlsson et al. solve the BMP problem by LC tries and linear search. Waldvogel et al. propose a lookup scheme based on a binary search mechanism [7]. This scheme scales very well as the size of address and routing tables grows. It requires a worst-case time of log2 (address bits) hash lookups. Thus, five hash lookups are needed for IPv4, and seven for IPv6 (128-bit). The software-based schemes can be further improved by using multiway and multicolumn search techniques [8]. Although these approaches feature certain advantages, however, they either use complicated data structures [1, 3, 5, 8] or are not scalable for IPv6 [2, 3, 6].
3 3.1
IPv6-aware Router Design Router Architecture
Figure 1 schematically depicts the architecture of the IPv6-aware router. With the advent of the high-speed switch capacity and increased traffic volume, distributed routing architecture has became a significant improvement to achieve high capacity in router design. It mainly consists of a network processor module and forwarding engines interconnected to the switching fabric. The network processor executes the routing protocols, such as BGP and OSPF, and maintains a routing table (RT). In each line card, the forwarding engine employs a forwarding table (FT) to make the routing decision for packet forwarding. The forwarding table is derived from the routing table and contains an index of IP prefix associated with an outgoing interface. This separation is used to ensure the routing instability does not impact the performance of packet forwarding engine. The conceptual configuration of the FE is shown in Figure 2. For an incoming IP packet, the header verification, header update, and route-lookup process (based on the destination IP address) are initiated simultaneously. If the IP header is not correct, the packet is dropped, and the lookup is terminated. Otherwise, the packet header is updated (TTL decrement and checksum update), and the route-lookup process provides the next hop (port number) where the packet should be forwarded. The MAC-address substitution module then substitutes the source-MAC address and the destination-MAC address of the packet before it is forwarded into the interface port. The bottleneck of the FE is the route lookup process, so that our study focuses on the design of a fast and scalable IP lookup scheme.
Scalable IP Routing Lookup in Next Generation Network
49
Network Processor FT
LINE CARD 1
FE
FE
¯¯ FE
RT
LINE CARD 1
FE
LINE CARD 2
FE
¯
¯¯ LINE CARD N
FE
LINE CARD 2
¯ LINE CARD N
Switching Fabric
Fig. 1. Architecture of the IPv6-aware router
3.2
TCAM Entry-Reduction Algorithm
With the compelling technical advantages, TCAM based networking devices become a preferred solution for fast, sophisticated IP packet forwarding. The original of the trend is in humble beginnings. The first binary Content Addressable Memories brought to market in the early 1990s and suffered from various limitations. It was too expensive and the software based look-up solutions were adequate to modest traffic-forwarding loads at that time. However, as faster line rates began choking the legacy table lookup solutions, CAM based design became increasingly attractive due to their massively parallel and highly deterministic search characteristics to rapidly perform IP lookup. The introduction of TCAM opened new possibilities, particularly for the BMP problems. TCAM devices are able to prioritize search results in such a way that multiple search matches, corresponding to different prefix lengths, could be resolved in accordance with BMP requirements. We use an example to explain the TCAM operation, as shown in Figure 3. There are six routing prefixes expressed as the form < pref ix/pref ixlength >. For an incoming packet with destination address ”140.113.215.207”, the TCAM performs searching within all the prefixes in parallel. Several prefixes may match the destination address (P1
IP Header
TTL Decrement and Checksum Update
Header Verification
IPv4/IPv6 Route Lookup
MAC Address Substitution
Fig. 2. Function Components of the IPv4/IPv6 forwarding engine
Chia-Tai Chan et al.
Destination IP Address 140.113.215.207
Prefix
Nexthop
(P1) 140.113.215/24 (P2) 140.113.147/24 (P3) 120.3.96/20
210.19.4.33 210.19.4.10 188.3.20.3
(P4) 140.113/16 (P5) 96.40/16
210.19.4.10 300.1.1.3
(P6) 12/8
8.2.100.9
Match Indicate
Priority Encoder
50
(P1) 140.113.215/24 210.19.4.33
TCAM Entries
Fig. 3. Solve the BMP problem by sorting the prefixes in decreasing order of lengths
and P4 in this case). A priority encoder then selects the longest matching entry as the result. Even though the application of TCAM technology is growing gradually, it still comes with some limitations. It is clear that TCAM operates with lower clock rate, much higher power consumption/price and larger package as compared with SRAM. In next generation network, the TCAM will be suffering from a limited number of entries. Table management is another issue of TCAM. As described above, the prefixes in the TCAM are listed in sorted order. However, forwarding tables in routers are dynamic; prefixes can be inserted or deleted due to the changes in network status. These changes can occur at the rate as high as 1,000 prefixes per second [9], and hence it is desirable to obtain quick TCAM updates and keep the incremental update time as small as possible. Forwarding table updates complicate the keeping of sorted prefixes in the TCAM. This issue is explained with the example in Figure 3. Assume that a new prefix < 120.3.128/18 > is inserted into the forwarding table. It must be stored between prefixes P3 (< 120.3.96/20 >) and P4 (< 140.113/16 >) to maintain the sorted order. In the worst case, the inserted prefix is compared to all existed prefixes. Furthermore, inserting the prefix in its correct place involve moving other elements. The naive solution is not efficient for a forwarding engine with large amount of entries. Another solution is to keep few locations for following updates. In [9], Gupta et al. proposed a PLO OPT algorithm. By keeping all the unused entries in the center of the TCAM, each prefix insertion/deletion will cause prefixes swap between different lengths. Hence, the worst-case update time is W/2. The reduction of TCAM entries can improve the TCAM performance in terms of power consumption, price and board usage. Consequently, we introduce a novel TCAM entry-reduction algorithm. The new scheme can reduce the TCAM entries, and also make the current TCAM to support large IPv6 routing table. The basic idea is to merge multiple routing prefixes into one representative prefix. The generated prefixes are inserted into the TCAM and the original multiple prefixes are recorded in the extra memory, such as DRAM. The routing
Scalable IP Routing Lookup in Next Generation Network
51
Binary Tree Representation
Prefixes Default Route P1 00000 P2 00001 P3 001 P4 01 P5 010000 P6 010001 P 1 P7 0101 P8 0111 P9 11
Default Route
P4
P9
P3 P2
P7
P5
P8
P6
Fig. 4. Representation of the routing prefixes with binary tree
lookup consists of one TCAM and one DRAM access. Although one extra memory access is required, it can use pipeline design to alleviate the performance degradation. However, the system design cost can be reduced dramatically since DRAM is much cheap and power saving. To begin with the proposed algorithm, we have to construct the binary tree from the routing prefixes. Assume there are 10 prefixes in the routing table, including the default route. The TCAM will need ten entries to record the complete routing information. The binary tree is constructed according to the bit stream of the routing prefixes, as shown in Figure 4. From the binary tree, we can divide the tree into multiple subtrees. Let the height of the subtrees equal to 2, the binary tree can be divided into four
Binary Tree Representation Default Route
P4
P9
P3 P1
P2
P7
P5
P8
P6
Fig. 5. Four subtrees with a height of two-bit
52
Chia-Tai Chan et al.
subtrees, as shown in Figure 5. Each routing prefix will be contained in at least one subtree. Consequently, the four bit streams corresponding to the roots of the subtree are inserted into TCAM, rather than the original prefixes. The detailed algorithm of the subtree construction is listed below. It selects the subtree from the leaves of binary tree. It is because that each leaf (i.e., routing prefix) must be covered. The bottom-up construction can obtain minimum number of subtrees. Subtree Construction Algorithm: Input:The root of the constructed binary tree from the routing table Output:The constructed subtrees and the corresponding bit stream Constructor (Parent, Current.Root, Current.Depth); Begin If ((the depth of the deepest child in the subtree - Current.Depth) == H) Generate Subtree (Parent, Current.Root); Else Constructor(New.Parent,Current.Root− >Left.Child,Current.Depth+1); Constructor(New.Parent,Current.Root− >Right.Child,Current.Depth+1); If ((the length of the longest unprocessed prefix - Current.Depth) == H) Generate Subtree (Parent, Current.Root); Endif End
Routing Lookup: The routing lookup procedure is divided into two parts. The first one is to find out the best match ”subtree”. The second step is to extract the best match prefix within the subtree. Since the size of the subtree is much smaller, we can expand it completely and put it into the DRAM. In our example, each subtree will be expanded to four entries. The entries of TCAM will indicate their corresponding addresses of subtrees, as shown in Figure 6. Consequently, the follow-up two bits behind the length of matched bits are used to select the coincident entry. For example, we perform the search for address 0101. It will match the P3 in the TCAM whose length is 2 and corresponding subtree is S2. Consequently, the third and fourth bits (’0’ and ’1’ respectively) indicate the second entry in the S2 and the best matching prefix is P2. Route Update: Since the original routing prefixes have been encapsulated in the subtrees, each route update must enquire the best matching subtree to check whether the updated prefix can be covered. If yes, the related entries in the DRAM will be modified. Otherwise, the routing prefix is inserted into the TCAM directly. After accumulating a certain number of such entries, we can rebuild the subtrees to keep the number of TCAM entries few. The worst-case update time is equal to max(W/2, 2height of subtree ). Routing Lookup for IPv6 Anycasting: Anycasting is a new addressing scheme in IPv6. Each anycast address could indicate a set of servers. Any con-
Scalable IP Routing Lookup in Next Generation Network
Prefix
Nexthop
(P1) 0100/4
S4
(P2) 00/2 (P3) 01/2
S3 S2
(P4) */0
S1
53
TCAM Entries
00 01 10 11
Default
P4
P1
P5
P4 Default
P7 P4
P2 P3
P6 P4
P8
P3
P4
P9
DRAM Entries
Fig. 6. Reorganize the binary tree into four subtrees with a height of two-bit
nection to the anycast address will be routed to the nearest server. This can help to balance the server load and improve network resiliency. A route to an anycast address can be treated as a host route. Thus the routing lookup for anycasting is exact-matching, rather than best-matching. We could handle the anycast addresses by using an additional CAM (or hash table). The routing lookup for each packet is performed for both TCAM and CAM while the result of CAM has higher priority..
4
Performance Evaluation
Through experiments, we demonstrate that the proposed scheme features much lower TCAM entries with less extra DRAM. Currently, the IPv6 routing tables consists only a few hundred prefixes. To realize the scalability of the proposed scheme, we use the real data available from the IPMA [10] and NLANR [11] projects for comparison, these data provide a daily snapshot of the routing tables used by some major Network Access Points (NAPs). The major performance metrics include the number of generated TCAM entries and the required DRAM storage. Figure 7 shows the process results for different routing tables. We set the height of the subtree as four and eight. For the largest table with 102271 prefixes (NLANR), it generates 30,260 and 6,010 TCAM entries accompanying with 473 Kbytes and 1.5 Mbytes DRAM, respectively. Intuitively, the number of TCAM entries is reduced with larges subtree, but the required DRAM is also increased as well. Both performance metrics are proportionate to the number of prefixes. By changing the height of the subtree, the TCAM entries and the required DRAM can be adjusted according to the practical environments. It has been observed that there are a large amount of single-path prefixes in the routing tables [6]. The sparseness caused by the single-path prefixes will make the subtree construction un-efficient. We can adopt the path compression
54
Chia-Tai Chan et al. 40,000 35,000
1,600 TCAM Entries (H=8)
1,400
30,000
Memory Required (H=8)
1,200
Memory Required (H=4) 25,000
1,000
20,000
800
15,000
600
10,000
400
5,000
200
0 10,000 20,000
Required DRAM (KBytes)
Number of TCAM Entries
TCAM Entries (H=4)
0 30,000 40,000
50,000 60,000 70,000
80,000 90,000 100,000 110,000
Number of Prefixes
Fig. 7. The performance metrics for different routing tables
to eliminate the single-path prefixes before the subtree construction. As shown in Figure 8, the performance is improved significantly. The TCAM entries can be reduced to as low as 3,536 entries with only 884 Kbytes DRAM (Subtree Height = 8) which shows a dramatically improvement. Note that the lookup result may not be the matched prefixes due to the path compression, thus an extra memory access to its shorter prefix would be necessary.
1,000 TCAM Entries (H=8)
20,000
TCAM Entries (H=4)
Number of TCAM Entries
Memory Required (H=8) 16,000
Memory Required (H=4)
900 800 700 600 500
12,000
400 8,000
300
Required DRAM (KBytes)
24,000
200
4,000
100 0 10,000 20,000
0 30,000 40,000
50,000 60,000 70,000
80,000 90,000 100,000 110,000
Number of Prefixes
Fig. 8. The performance metrics for different routing tables.(Subtree Height = 4, 8; Path Compression Enabled)
Scalable IP Routing Lookup in Next Generation Network
5
55
Conclusion
This study investigates the related issues in TCAM-based FE design. To make use of the TCAM in IPv6 routing lookup, we need a more efficient approach. The proposed algorithm reduces the number of TCAM entries by merging routing prefixes into subtrees according to the associative positions. The subtrees’ roots and their interior information are stored in the TCAM and DRAM, respectively. Each routing lookup procedure consists of one TCAM and one DRAM access which can proceed in pipelining. By adjusting the height of subtrees, we can decide the number of generated TCAM entries and the required DRAM size. If the technique of path compression is applied, both the required TCAM and DRAM can be further reduced. But the scheme will need one extra memory access due to possible ”incorrect” match. In the experiments, we illustrate various performance metrics with different settings. In the best results, the proposed algorithm can eliminate 98 percentage of TCAM entries with adding only 2.2 Mbytes DRAM. We also discuss the process for anycast addresses. We believe that this scheme would simplify the design of the IPv6 routers by alleviating the TCAM cost dramatically.
References [1] Y. Rekhter, T. Li, ”An Architecture for IP Address Allocation with CIDR.” RFC 1518, Sept. 1993. 47, 48 [2] M. Degermark, A. Brodnik, S.Carlsson, and S. Pink. ”Small Forwarding Tables for Fast Routing Lookups,” In Proc. ACM SIGCOMM ’97, pages 3-14, Cannes, France, Sept. 1997. 47, 48 [3] P. Gupta, S. Lin, and N. McKeown, ”Routing Lookups in Hardware at Memory Access Speeds,” In Proc. IEEE INFOCOM ’98, San Francisco, USA, March 1998. 47, 48 [4] N. F. Huang, S. M. Zhao, and J. Y. Pan, ”A Fast IP Routing Lookup Scheme for Gigabit Switch Routers,” In Proc. IEEE INFOCOM ’99, New York, USA, March 1999. 48 [5] V. Srinivasan and G. Varghese: Fast IP lookups using controlled prefix expansion. ACM Trans. On Computers, Vol. 17. (1999) 1-40. 48 [6] S. Nilsson and G. Karlsson, ”IP-Address Lookup Using LC-Tries,” IEEE JSAC, 17(6):1083-1029, June 1999. 48, 53 [7] M. Waldvogel, G. vargnese, J. Turner, and B. Plattner, ”Scalable High Speed IP Routing Lookups,” In Proc. ACM SIGCOMM ’97, pages 25-36, Cannes, France, Sept. 1997. 48 [8] B. Lampson, V. Srinivasan and G. Varghese, ”IP Lookups Using Multiway and Multicolumn Search,” IEEE/ACM Trans. On Networking, 7(4):324-334, June 1999. 48 [9] D. Shah and P. Gupta, ”Fast updating Algorithms for TCAMs,” IEEE Micro Mag., 21(1):36-47, Jan.-Feb. 2001. 50 [10] Merit Networks, Inc. Internet Performance Measurement and Analysis (IPMA) Statistics and Daily Reports. See http : //www.merit.edu/ipma/routingt able/. 53 [11] NLANR Project. See http : //moat.nlanr.net/. 53