IEEE COMMUNICATIONS LETTERS, VOL. 7, NO. 10, OCTOBER 2003
High Speed IP Address Lookup Architecture Using Hashing
Hyesook Lim, Member, IEEE, Ji-Hyun Seo, and Yeo-Jin Jung, Student Member, IEEE
Abstract—One of the most important design issues for IP routers responsible for datagram forwarding in computer networks is the route-lookup mechanism. In this letter, we explore a practical IP address lookup scheme that converts the longest prefix matching problem into an exact matching problem. In the proposed architecture, the forwarding table is composed of multiple SRAMs, and each SRAM holds the address lookup table for a single prefix length. Hashing functions are applied to each address lookup table to find matching entries in parallel, and the entry with the longest matching prefix among them is selected. Simulation using data from the MAE-WEST router shows that a large routing table with 37 000 entries is compacted to a forwarding table of 189 kbytes in the proposed scheme, which achieves one route lookup every two memory accesses on average.
Index Terms—Address lookup, best matching prefix, parallel hashing, longest prefix matching.

I. INTRODUCTION

Datagram forwarding is one of the most challenging tasks performed by Internet routers as defined in RFC 1812 [1]. Forwarding an IP datagram generally requires a router to choose the interface leading to a next-hop router or a destination host by consulting a route database called the forwarding table. Using the hierarchical structure of IP addresses, IP routers forward datagrams based only on the network part of the address until the packets arrive at the destination network. The network part of an IP address is called the prefix. In the classless inter-domain routing (CIDR) scheme, prefix lengths are not fixed, and hence prefixes of arbitrary length are used. When a datagram arrives, the router finds the entries that match the datagram's destination address and selects the entry with the longest prefix. The selected entry provides the output interface to which the datagram should be sent. This is called longest prefix matching, or best matching prefix (BMP).

Since the destination address of an arriving datagram does not carry prefix length information, routers need to search the space of all prefix lengths as well as the space of all prefixes of a given length. Hence longest prefix matching is more complicated to implement than exact matching. Moreover, since routers should process 10 million average-sized packets per second assuming OC192 line speed, the problem becomes more challenging. Hashing is a very efficient method for link-layer address lookup, which requires exact matching [2]. However, hashing is known to be difficult to apply to the longest prefix matching problem, since it does not accommodate the hierarchy that exists in IP addresses. In this letter, we explore converting the longest prefix matching problem of IP address lookup into an exact matching problem and apply hashing to IP address lookup.

This letter is organized as follows. Section II summarizes the existing IP address lookup schemes. Section III presents our proposed scheme and its hardware architecture. Section IV evaluates the performance of the proposed scheme using actual data from the MAE-WEST router. Brief conclusions are given in Section V.

Manuscript received March 12, 2003. The associate editor coordinating the review of this letter and approving it for publication was Dr. J. H. Kim. This work was supported by Hynix Semiconductor, Seoul, Korea.
H. Lim and Y.-J. Jung are with the Information Electronics Engineering Department, Ewha W. University, Seoul 120-750, Korea (e-mail: [email protected]).
J.-H. Seo is with Hynix Semiconductor, Seoul 135-840, Korea.
Digital Object Identifier 10.1109/LCOMM.2003.818885

II. EXISTING SCHEMES
Several factors are considered in evaluating a lookup scheme: lookup speed, scalability, memory requirement, and update overhead. Content-addressable memories (CAMs) apply hardware parallelism to improve address lookup speed: the contents stored in a CAM are compared directly with the key of an incoming packet in parallel. However, CAMs are usually slower and much more costly than common memory, and hence a CAM implementation becomes very expensive for IPv6, which uses 128-bit addresses. An extensive survey and complexity analysis of IP address lookup algorithms, focused on trie structures, is provided in [3].

Degermark et al. [4] proposed a scheme that constructs a forwarding table using a small amount of memory. As a software-based scheme, it requires a minimum of two and a maximum of nine memory accesses. The lookup scheme proposed by Gupta et al. [5] needs at most two memory accesses per address lookup, but it uses 33 Mbytes of DRAM. Waldvogel et al. [6] proposed an address lookup scheme based on binary search of hash tables. The scheme requires a worst-case time of log2(address bits) hash lookups and additional memory to store markers that remember the last BMP found. Adding or deleting a single prefix can change the BMP values of a large number of markers, and hence updating the forwarding table is expensive in that scheme. Huang and Zhao [7] proposed a scheme that constructs a next-hop array based on an extension of the node-compression concept used in tries.

III. APPROACH

The most intuitive method of implementing an address lookup is to use the IP address itself as a memory pointer. Unfortunately, this method is impractical since it requires a huge memory space. A hashing function takes a longer address field and produces a shorter field that can be used as an index to a subset of the table in memory. Hence the required size of memory can be significantly reduced when the
hashing output is used as an index [2]. Hashing has been used effectively for exact-match search. However, applying hash functions to longest-prefix search raises an issue: the entry that matches the input address in the greatest number of bits must be located, so it is not known in advance how many bits of the address should be used as the input of the hash function. Waldvogel et al. [6] presented a binary search over prefix lengths using hashing, but in the worst case the binary search takes a number of memory accesses that is logarithmic in the number of distinct prefix lengths. That scheme also assumes perfect hashing hardware and does not consider hash collisions. The scheme proposed in this letter applies hashing in parallel to the address table of each prefix length and applies binary search to the colliding entries.

Fig. 1 shows the proposed scheme. Each plane represents separate hardware that performs hashing for one prefix length. Each hash table is composed of a main table and a sub-table. Since a main table has a finite number of entries, two or more prefixes may hash to the same entry, and the sub-table is provided to resolve such collisions. Each main entry includes a prefix, a pointer to the forwarding RAM, a pointer to the sub-table, and the number of collisions corresponding to this entry. The hashing result locates one entry in the main table. If the prefix used as the hashing input matches the prefix stored in that entry, the search for this prefix length is done, and the forwarding RAM pointer is passed to a priority encoder. If the input prefix does not match the stored prefix, the pointer to the sub-table is used to direct the search to the sub-table. Each sub-table entry includes a prefix and a pointer to the forwarding RAM. Binary search is performed in the sub-table over the colliding entries.

Fig. 1. Proposed architecture.
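The lookup flow described above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the authors' hardware design or the pseudo code of Fig. 2: the names `xor_fold`, `MainEntry`, `PrefixPlane`, and `longest_match` are hypothetical, addresses are assumed to be 32-bit IPv4 integers, the hash is assumed to be the XOR folding mentioned in Section IV (padding with zeros, since the letter allows arbitrary padding bits), and the sub-table is modeled as a sorted Python list rather than a pointer plus a collision count. Each table read is counted as one memory access, which is an assumption about the accounting.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def xor_fold(prefix: int, length: int, out_bits: int) -> int:
    """Fold a `length`-bit prefix into `out_bits` bits by XOR-ing bit groups."""
    pad = (-length) % out_bits          # pad so the length is a multiple of out_bits
    value = prefix << pad               # assumption: padding bits are zeros
    mask = (1 << out_bits) - 1
    h = 0
    while value:
        h ^= value & mask
        value >>= out_bits
    return h


@dataclass
class MainEntry:
    prefix: Optional[int] = None        # prefix stored in the main table
    fwd_ptr: Optional[int] = None       # pointer into the forwarding RAM
    # Stand-in for "pointer to sub-table + collision count": sorted (prefix, fwd_ptr).
    sub: List[Tuple[int, int]] = field(default_factory=list)


class PrefixPlane:
    """One plane of the architecture: the hash table for a single prefix length."""

    def __init__(self, length: int, out_bits: int) -> None:
        self.length = length
        self.out_bits = out_bits
        self.main = [MainEntry() for _ in range(1 << out_bits)]

    def insert(self, prefix: int, fwd_ptr: int) -> None:
        entry = self.main[xor_fold(prefix, self.length, self.out_bits)]
        if entry.prefix is None or entry.prefix == prefix:
            entry.prefix, entry.fwd_ptr = prefix, fwd_ptr
        else:                           # collision: keep the sub-table sorted
            others = [e for e in entry.sub if e[0] != prefix]
            entry.sub = sorted(others + [(prefix, fwd_ptr)])

    def lookup(self, addr: int) -> Tuple[Optional[int], int]:
        """Return (forwarding pointer or None, number of memory accesses)."""
        key = addr >> (32 - self.length)        # leading `length` bits of the address
        entry = self.main[xor_fold(key, self.length, self.out_bits)]
        accesses = 1                            # one read of the main table
        if entry.prefix == key:
            return entry.fwd_ptr, accesses
        lo, hi = 0, len(entry.sub)
        while lo < hi:                          # binary search over colliding entries
            mid = (lo + hi) // 2
            accesses += 1                       # one read per sub-table probe
            p, ptr = entry.sub[mid]
            if p == key:
                return ptr, accesses
            if p < key:
                lo = mid + 1
            else:
                hi = mid
        return None, accesses


def longest_match(planes: List[PrefixPlane], addr: int) -> Optional[Tuple[int, int]]:
    """Pick the hit with the longest prefix; hardware would query planes in parallel."""
    best = None
    for plane in planes:
        ptr, _ = plane.lookup(addr)
        if ptr is not None and (best is None or plane.length > best[0]):
            best = (plane.length, ptr)
    return best
```

As a usage example, inserting the 10-bit prefixes 41 and 44 into `PrefixPlane(10, 2)` makes both hash to the same main entry, so the second one lands in the sub-table and is found with one extra access. The sequential loop in `longest_match` only models the selection of the longest hit; it says nothing about the timing of the parallel hardware.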
Once the address lookups for each prefix length return matching entries, the priority encoder selects the entry with the longest prefix among them. The information retrieved from the forwarding RAM location pointed to by the selected entry is used to forward the packet. As shown in Fig. 1, the proposed architecture has a modular structure and hence is easy to implement. The proposed scheme can be expressed as in Fig. 2. Since the main tables are separated by prefix length and only a limited number of sub-table entries are involved in each binary search, inserting or deleting entries can be made more efficient by distributing empty entries in between the colliding prefixes of the sub-tables. The proposed scheme scales easily to IPv6 address lookup because of its modular characteristics.

Fig. 2. Pseudo code of proposed architecture.

IV. SIMULATION RESULT AND PERFORMANCE COMPARISON

A simulation is performed to evaluate the performance of the proposed scheme using data gathered from the MAE-WEST router¹. Fig. 3 depicts the route distribution of the MAE-WEST router over prefix lengths as of March 15, 2002. The prefixes range from 8 to 32 bits in length.

¹Merit Networks, Inc. [Online]. Available: http://www.merit.edu

Since address lookups in the main table and the sub-table are performed successively, the main table and the sub-table can be constructed using a single SRAM. The proposed architecture is implemented using 24 SRAMs for the prefix lengths of 8 to 32 (except the length of 31). The size of the memory for each prefix length is adjusted according to the number of different routes shown in Fig. 3, and the number of bits of the hashing output is calculated accordingly. For instance,
TABLE I PERFORMANCE COMPARISON WITH OTHER SCHEMES
Fig. 3. Route distribution in each prefix length.
Fig. 4. Distributions of the number of memory accesses.
a small main-table memory with 4 entries is allocated for prefix length 10, and hence the hashing hardware receives 10 bits as input and generates 2 bits as output. The hashing function is selected for its hardware simplicity: it groups the bits of the prefix into groups as wide as the required hashing result and applies EXOR logic across the groups. If the prefix length is not a multiple of the number of bits required in the hashing result, arbitrary bits are padded to the end. Each main entry has four fields: 8 through 32 bits of prefix, 5 bits of pointer to the forwarding RAM, 15 bits of pointer to the sub-table, and 4 bits for the number of colliding entries. Sub-table entries include only a prefix and a pointer to the forwarding RAM. The forwarding RAM is composed of 24 entries, and each entry has 16 bits of output interface information. The total number of entries included in the simulation is about 37 000, and the required memory size is 189 kbytes including the forwarding RAM.

Fig. 4 shows the simulation results as the number of memory accesses versus the number of routes. Address lookups for more than 78% of routes are completed within 2 memory accesses, and address lookups for more than 95% of routes are completed within 3 memory accesses. Table I compares the performance of the proposed scheme with other schemes. As shown in Table I, the proposed scheme requires the smallest memory space except for SFT [4], which is a software-based scheme. Even though the proposed scheme has the overhead of requiring multiple SRAMs and multiple hashing logic blocks, multiple small SRAMs are easily integrated on a chip in current technology
and hashing using EXOR logic does not require much area. The simulation shows that the average number of memory accesses per packet is 1.93, which is comparable to other hardware-based schemes that require huge memory. The maximum number of memory accesses is 5. The maximum can be adjusted by allocating more entries to the main table. Hardware pipelining can be considered to reduce the effective number of memory accesses per lookup. In other words, if the main tables and sub-tables are implemented using separate SRAMs, the address lookups in the main table and the sub-table can be performed in parallel for an incoming stream of datagrams.

V. CONCLUSION

A practical IP address lookup architecture is proposed that applies exact matching and parallel hashing for each prefix length. In the proposed scheme, the forwarding table is composed of multiple SRAMs, and each SRAM holds the address lookup table for one prefix length. Simple hashing in the main table and binary search in the sub-table are applied to find the matching entries in parallel, and the entry with the longest matching prefix among them is selected. A simulation using data from the MAE-WEST router is performed to evaluate the performance of the proposed scheme. It shows that a large routing table with 37 000 entries can be compacted to a forwarding table of 189 kbytes of memory. The number of memory accesses is 1.93 on average, one at minimum, and five at maximum. The proposed scheme is a hardware architecture with a modular structure that performs address lookup in a reasonable number of memory accesses using a small amount of memory.

REFERENCES

[1] RFC 1812, Requirements for IP Version 4 Routers. [Online]. Available: http://www.faqs.org/rfcs/rfc1812.html
[2] R. Jain, "A comparison of hashing schemes for address lookup in computer networks," Digital Equipment Corp., Tech. Paper DEC-TR-593.
[3] M. A. Ruiz-Sanchez, E. W. Biersack, and W. Dabbous, "Survey and taxonomy of IP address lookup algorithms," IEEE Network, vol. 15, pp. 8–23, Mar./Apr. 2001.
[4] M. Degermark, A. Brodnik, S. Carlsson, and S. Pink, "Small forwarding tables for fast routing lookups," in Proc. ACM SIGCOMM'97, Sept. 1997, pp. 3–14.
[5] P. Gupta, S. Lin, and N. McKeown, "Routing lookups in hardware at memory access speeds," in Proc. IEEE INFOCOM, vol. 3, Mar./Apr. 1998, pp. 1240–1247.
[6] M. Waldvogel, G. Varghese, J. Turner, and B. Plattner, "Scalable high speed IP routing lookups," in Proc. ACM SIGCOMM'97, Sept. 1997, pp. 25–35.
[7] N. Huang and S. Zhao, "A novel IP-routing lookup scheme and hardware architecture for multigigabit switching routers," IEEE J. Select. Areas Commun., vol. 17, pp. 1093–1104, June 1999.