Longest Prefix Match and Updates in Range Tries

Ioannis Sourdis
Computer Science and Engineering, Chalmers University of Technology, Sweden
[email protected]

Sri Harsha Katamaneni
Computer Engineering, TU Delft, The Netherlands
[email protected]
Abstract—In this paper, we describe an IP-lookup method for network routing. We extend the basic Range Trie data-structure to support Longest Prefix Match (LPM) and incremental updates. Range Tries improve on the existing Range Trees by allowing comparisons shorter than the address width. In so doing, Range Tries scale their lookup latency and memory requirements better with the wider upcoming IPv6 addresses. However, as in Range Trees, a Range Trie does not inherently support LPM, while incremental updates have a performance and memory overhead. We describe the additions required to the basic Range Trie structure and its hardware design in order to store and dynamically update prefixes for supporting LPM. The proposed approach is prototyped in a Virtex4 FPGA and synthesized for 90-nm ASICs. The Range Trie is evaluated using Internet routing tables and traces of updates. Supporting LPM roughly doubles the memory size of the basic Range Trie, which is still half of that of the second best related work. The proposed design performs one lookup per cycle and one prefix update every four cycles.

978-1-4577-1292-0/11/$26.00 © 2011 IEEE

I. INTRODUCTION

The proliferation of the Internet introduces significant challenges in network routers. The demand for network bandwidth doubles every six months [4] and needs to be supported by the network infrastructure. In addition, the number of online devices is rapidly increasing, exhausting the IPv4 address space [7] and requiring larger routing tables. IP lookup and Longest Prefix Match (LPM) is a key function of Internet routers which needs to keep pace with the above developments of the Internet evolution. Address lookup (IP lookup) is defined as follows: given an address space [0, 2^n) and k unique addresses A_i, which define k+1 address ranges (where 0 < A_i < 2^n − 1 and i = 1, 2, ..., k), address lookup is the function that determines the address range an incoming address A_IN belongs to. In current network routers, the address space [0, 2^n) is the IPv4 or IPv6 address space (n = 32 and 128, respectively), while the incoming address A_IN is the destination IP address of a packet. In network routing, the address ranges are expressed as address prefixes; e.g., for n = 5, the address prefix P=01* describes the address range [01000, 01111], or else [01000, 10000); during a lookup the longest matching prefix is reported (LPM). The upcoming new generation of the Internet Protocol (IPv6) uses four times wider addresses than IPv4; this, together with the increasing size of routing tables, poses the two main challenges for IP lookup. Future address lookup solutions will need to scale well with the increasing IP address width and routing table size.

Currently, the most widely used IP lookup solutions are Trie-based data structures, such as Cisco's Tree-bitmap Tries [3], the Lulea Tries [2] and the Level-Compressed (LC) Tries [6]. They store address prefixes and directly support LPM with fast updates. However, their linear scalability to the address width becomes a significant drawback. Such approaches are expected to quadruple their lookup latency and update time, as well as to significantly increase their memory size, when moving from IPv4 to IPv6 addresses. A less popular lookup approach is the Range Tree [1], [5], [12]. Range Trees scale their lookup latency to the address width better than the Tries, but their storage is linear in the address width. In addition, Range Trees do not inherently support LPM and therefore require significant additions to store prefixes. Finally, updates in hardware Range Tree implementations can be slow, as discussed in Section III. The Range Trie data-structure, introduced in [10], improves on existing IP lookup solutions, addressing the challenges posed by the use of IPv6 as well as by the increasing size of routing tables. A Range Trie is a multi-way tree which performs comparisons on parts of addresses. As shown in [10], the Range Trie memory size and lookup latency scale well with the address width. Only 5 and 7 tree levels are required to store half a million IPv4 and IPv6 prefixes, respectively, occupying about 2 Mbytes of memory. However, similarly to the Range Trees, the basic Range Trie does not support LPM, while updating the data structure in hardware is not simple. In this paper, we introduce a solution for LPM support and incremental updates in Range Tries; more precisely, we:
• describe a technique for storing and updating prefixes in a Range Trie in order to support LPM;
• present the new Range Trie hardware design, prototyped in a Virtex4-60 FPGA and synthesized for 90-nm ASICs;
• measure the memory overhead of supporting LPM in a Range Trie using real Internet routing tables;
• extract traces of Internet router updates over a period of 12 months and evaluate the performance of our proposal.
The remainder of this paper is organized as follows: Section II offers some background on existing address-lookup solutions as well as on the basic Range Trie structure. In Section III, we describe the modified Range Trie for LPM and incremental updates. Then, in Section IV we evaluate our design and compare it with related work. Finally, Section V concludes the paper.
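The prefix-to-range correspondence used throughout the paper (e.g., P=01* describing [01000, 10000) for n = 5), together with the naive LPM it implies, can be sketched as follows; this is an illustrative sketch, and the function names are not part of any implementation discussed in the paper.

```python
# Hedged sketch: prefix-to-range conversion and a naive longest-prefix
# match, for the n = 5 example of the introduction (P=01* -> [01000, 10000)).
def prefix_to_range(prefix, n):
    """Return the half-open address range [lo, hi) covered by a prefix."""
    free = n - len(prefix)        # number of wildcard bits
    lo = int(prefix, 2) << free   # fill the wildcards with 0s
    hi = lo + (1 << free)         # one past filling them with 1s
    return lo, hi

def naive_lpm(address, prefixes, n):
    """Report the longest prefix whose range contains the address."""
    best = None
    for p in prefixes:
        lo, hi = prefix_to_range(p, n)
        if lo <= address < hi and (best is None or len(p) > len(best)):
            best = p
    return best

lo, hi = prefix_to_range("01", 5)
print(f"[{lo:05b}, {hi:05b})")                     # [01000, 10000)
print(naive_lpm(0b01011, ["0", "01", "0101"], 5))  # 0101
```

The naive scan is linear in the number of prefixes; the data structures surveyed in Section II exist precisely to avoid this cost.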
51
ASAP 2011
[Figure 1(a) data — Intervals: R1: [00000,00110), R2: [00110,01000), R3: [01000,01010), R4: [01010,01100), R5: [01100,10000), R6: [10000,11100), R7: [11100,11101), R6': [11101,100000). Prefixes: R1: 0*, R2: 0011*, R3: 0100*, R4: 0101*, R5: 011*, R6: 1*, R7: 11100.]
II. BACKGROUND

Many address lookup algorithms have been proposed in the past; most of them are summarized in the survey by Ruiz-Sanchez et al. [9]. We follow the taxonomy of [9] and discuss the different approaches, providing more details on the basic Range Trie data-structure. A set of address ranges can be expressed, as shown in Figure 1(a), either directly as intervals, where the complete bit representation of the addresses is compared to perform a lookup, or as prefixes, out of which the longest matching one should be reported. As indicated in [9], address lookup involves searching in two dimensions: length and value. Consequently, existing address lookup data structures are categorized according to the dimension their search is based on, namely into “search on length” and “search on values” approaches. Tries are considered a “search on length” approach, as they perform a sequential search on the length dimension, matching at step n prefixes of length n (Figure 1(b)). Improvements on the basic Trie structure include prefix expansion for multibit strides [11] with or without leaf pushing, compressed Tries [6], the Lulea bitmaps to reduce the storage requirements [2], and Tree bitmaps [3]. Trie-based structures inherently support longest prefix match, but their major drawback is that the number of tree levels scales linearly with the address width [9]. Multibit Tries reduce the decision-tree height but do not improve scalability in the address width, while significantly increasing memory size. The Range Tree is a “search on values” approach: it avoids the length dimension, performing comparisons on the expanded prefixes (complete addresses). As depicted in Figure 1(c), Range Trees perform address comparisons creating a balanced decision tree. They store complete addresses to be compared at each node and therefore consume considerable memory. Multiway Range Trees read and compare multiple addresses at every step [12].
The number of comparisons per node, however, is limited by the available memory bandwidth, which, consequently, reduces scalability with respect to the address width. As described in [5], [9], [12], Range Trees need to store additional information in order to support longest prefix match.
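The "search on values" idea can be illustrated with the bounds of Figure 1(a): the k sorted full-width addresses split [0, 2^n) into k+1 ranges, and a lookup amounts to a binary search over those bounds, which a balanced Range Tree performs level by level. A minimal sketch (illustrative, not the paper's implementation):

```python
# Hedged sketch of a "search on values" lookup: a binary search over the
# sorted full-width bounds of the example set of Figure 1(a).
import bisect

# The seven bounds 00110, 01000, 01010, 01100, 10000, 11100, 11101
# define the eight ranges R1..R7, R6' of Figure 1(a).
bounds = [0b00110, 0b01000, 0b01010, 0b01100, 0b10000, 0b11100, 0b11101]

def range_index(address):
    """Index of the address range the incoming address falls into."""
    return bisect.bisect_right(bounds, address)

print(range_index(0b01011))  # 3, i.e. R4 = [01010, 01100)
```

Each comparison here involves a complete n-bit address, which is exactly the storage and bandwidth cost that the Range Trie rules of Section II-A reduce.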
[Figure 1 panels — (a) Address Ranges Set (Prefixes or Intervals); (b) Search on Length: Trie; (c) Search on Values: Range Tree; (d) Range Trie. CP: Common Prefix, ‘-’: omitted bit, L: less, E: equal, GE: greater or equal, G: greater.]
A. The basic Range Trie data-structure for address-lookup

In general, Tries perform exact match on parts of addresses, while Range Trees perform comparisons of full addresses. The proposed Range Tries attempt to combine the advantages of the above, performing comparisons on parts of addresses [10]. Figure 1(d) illustrates a Range Trie example and shows that comparing fewer address bits can be sufficient for address lookup. At the root node, comparing the two most significant bits “01---” and the most significant bit “1----” is equivalent to comparing the complete addresses “01000” and “10000”. In the second iteration, after taking the middle root branch, we do not need to compare the two most significant bits, since after the first step we know the incoming address is “01xxx”. Similarly, after taking the right branch of the root node we know that the most significant bit is “1xxxx”. Then, the two
addresses to be compared (“11100” and “11101”) have a common prefix (“-110x”), which is shared and compared separately. The decision of that node is based on the outcome of the common-prefix comparison and (if needed) the comparison of the least significant bit. As shown in [10], Range Tries employ five rules to reduce the number of address bits needed per comparison. One or multiple of these rules can be applied to each Range Trie node. Figure 2 illustrates examples of the rules and shows that, compared to a Range Tree, fewer address bits are required.

Fig. 1. The Trie, Range Tree, and the basic Range Trie data-structures.
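As a small numeric illustration of the root-node comparisons of Figure 1(d): since the bounds 01000 and 10000 have zero suffixes, comparing only the top address bits is equivalent to the two full 5-bit comparisons. A sketch (the function name is illustrative):

```python
# Hedged sketch of the part-comparison idea of Fig. 1(d): the root bounds
# 01000 and 10000 have zero suffixes, so comparing the top bits of the
# incoming address against "01" and "1" suffices to pick a branch.
def root_branch(a):
    top2 = a >> 3              # two most significant bits
    if top2 < 0b01:            # equivalent to a < 01000
        return 0               # left branch
    if (a >> 4) < 0b1:         # single-bit comparison: a < 10000
        return 1               # middle branch: a is 01xxx
    return 2                   # right branch: a is 1xxxx

# The shortened comparisons agree with the full 5-bit ones:
for a in range(32):
    assert root_branch(a) == (0 if a < 0b01000 else 1 if a < 0b10000 else 2)
```

The rules below generalize this observation, so that the bits actually stored and compared per node depend on the node's address range rather than on the address width.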
[Figure 2 examples — (a) Node Common Prefix Range-Trie rule; (b) Zero-Suffix Range-Trie rule; (c) Common Prefix Range-Trie rule; (d) Common Suffix Range-Trie rule; (e) Address Alignment Range-Trie rule.]

Fig. 2. Range Trie rules.
The five Range Trie rules are the following:
1) Omit Node Common Prefix: the address prefix of length L (L < W, where W is the address width), which is common for all addresses in the node address-range [Na, Nb), can be omitted from the comparisons between the incoming address AIN and an address bound Ai stored in the node.
2) Omit address Zero Suffix: let an address bound Ai have a zero suffix of length L bits, where L < W. Then, this suffix of Ai does not need to be compared against the L last bits of AIN.
3) Share addresses’ Common Prefix: the common prefix of the addresses Ai, of length L (L < W), can be shared among concurrent comparisons and processed separately.
4) Share addresses’ Common Suffix: the common suffix of the addresses Ai, of length L (L < W), can be shared among concurrent comparisons and processed separately.
5) Address Alignment: the lookup of address AIN in node N, which maps to the address range [Na, Nb) and stores addresses Ai, is equivalent to the lookup of address A′IN = AIN − Na in node N′, which maps to the address range [0, Nb − Na) and stores addresses A′i = Ai − Na.
Based on the last rule, a Range Trie node which maps to an address range [Na, Nb) requires at most ⌈log2(Nb − Na)⌉ address bits per comparison. In practice this means that, as opposed to Range Tree and Trie structures, the memory requirements as well as the number of tree levels do not depend directly on the address width, improving the scalability of the structure. Each node in the basic Range Trie stores a number of address parts to be compared, some control bits to define the length and alignment of the compared address parts, and a single pointer to the next tree level. Our previous designs of the basic Range Trie used tree nodes of 256 bits for the address parts plus 22-44 bits for the next-level pointer and the control. Using the same node size, a Range Tree is able to perform 8-9 IPv4 or only 2 IPv6 comparisons in a node. On the contrary, the above Range Trie configuration performs up to 28 8-bit comparisons, increasing the number of branches per node as well as reducing the memory size and the overall number of tree levels.

III. RANGE TRIES WITH LPM & INCREMENTAL UPDATES

At each step, a Range Trie performs comparisons between the incoming address (destination IP address) and the (parts of) addresses stored in a node; in so doing, the correct branch to the next-level node is determined. As opposed to a Range Tree, which compares full addresses, parts of the address comparisons in a Range Trie can be shared or even omitted, as illustrated in the examples of Figures 1(d) and 2. However, similarly to Range Trees, a Range Trie requires modifications in order to support LPM. Below, we first describe how prefixes can be stored in a Range Trie and how LPM lookup is performed. Then, we discuss our approach for supporting prefix updates and present the new Range Trie hardware design.

A. Storing Address-Prefixes in a Range Trie

As shown in Figure 3, there are three alternatives for storing a prefix in a Range Trie, adopted from Range Tree solutions [1], [5], [9], [12]. The first option, illustrated in Figure 3(a), is to store the prefix in the leaf nodes contained in the prefix range [9]; this, however, has an excessive update complexity of O(n), where n is the number of leaves, as well as increased storage requirements. The second approach, shown in Figure 3(b), is to further allow storing prefixes in internal nodes [1], [12]; then, when a prefix is stored in a node, its children do not need to store it. This approach reduces the storage and update
complexity to O(m log_m(n)), where m is the branching factor of the tree. Still, at each level of the tree up to 2m nodes need to be updated when inserting or deleting a prefix. The third option, illustrated in Figure 3(c), improves on the previous one by storing the prefix of a node in its parent [5]. In this approach, a parent node maintains one entry per child for storing prefixes. Although storage requirements remain unchanged, the update complexity of the third case is reduced to O(log_m(n)). In order to store prefixes in the Range Tries, we adjust the third alternative, which better suits our pipelined hardware design. In so doing, during a prefix update only two nodes per tree level need to be accessed and updated, as explained below. During an address lookup, the incoming address visits a single node per level on its way down to a leaf; only the last, longest matching prefix stored in one of the visited nodes is reported at the output. As opposed to Range Tree implementations [1], [5], [12], storing a prefix in a Range Trie involves storing only its length and a pointer to the related action¹; thereby, we minimize the on-chip memory requirements of the Range Trie.

¹A prefix-action is the Routing Table entry of the corresponding prefix.

Fig. 3. Alternative ways for storing a prefix in a Range Trie: (a) storing only at the leafs; (b) storing at leafs and internal nodes; (c) storing at the parent node.

B. Supporting Incremental Updates

Updating the Range Trie data-structure involves inserting new prefixes and deleting existing ones. In general, a prefix update requires a lookup using the bounds of the respective prefix and an update of the nodes visited during this lookup. Figure 4 illustrates an example of inserting a new prefix. As shown also in Figure 3(c), each node maintains a number of entries, equal to the number of its branches, for storing the respective prefixes (prefix lengths and action pointers). In the example of Figure 4, the new prefix Pnew ∈ [b2, d2) is inserted in the Range Trie. The prefix bounds b2 and d2 are used to lookup in the tree. At every visited node, the low prefix bound b2 may update the node entries which are greater-equal than b2, marked as lb_up in the figure, while the high prefix bound d2 updates node entries lower than d2 (hb_up). When the two prefix bounds visit the same node at a tree level, only the intersection of lb_up and hb_up can be updated. For each node entry that may potentially be updated, we check whether the newly inserted prefix is longer than the existing one; only then is the new prefix inserted in the corresponding entry, by updating its prefix length and action pointer. When deleting a prefix, the node entries to be updated are identified in the same way. Then, the prefix is deleted from the entries and replaced with the next longer prefix.

Fig. 4. Inserting a prefix in a Range Trie.

Inserting and Deleting Prefix Bounds: In the above example, the updated prefix had address bounds (endpoints) which preexisted in the Range Trie. However, it may be the case that a newly inserted prefix has bounds which are not stored in the data-structure, or that the bounds of a deleted prefix are no longer needed for the remaining prefixes. Not deleting unneeded prefix bounds may be tolerated, but inserting new bounds is necessary for inserting new prefixes in a Range Trie. Inserting a new prefix bound in a node can be complex, since it requires supporting the Range Trie optimizations (rules) employed to reduce the number of address bits per comparison. When a Range Trie is constructed from scratch, these optimizations are computed in software; however, supporting them in hardware would require complex logic and hence slow circuits. To avoid the above problem and still support updates of the stored prefix bounds, we split the Range Trie in two parts, namely, the fixed and the updatable part, as shown in Figure 5. Both tree parts allow updates of prefixes with endpoints in preexisting bounds; however, prefix bounds can be inserted or deleted only at the updatable part of the tree. We know whether a prefix bound already exists in the tree, and hence whether it should be inserted, only after a lookup in the tree and after reaching a leaf. To insert a new bound, in case that leaf is in the fixed part, the leaf is replaced by a new node created at the updatable part. When the leaf is in the updatable part, the bound is inserted in the last visited node if it is not full; otherwise a new node is created at the next spare updatable level. In general, under each leaf of the fixed part it is possible to create an updatable subtree, given there are enough resources: i.e., free memory and spare tree levels.

Fig. 5. A Range Trie that supports incremental updates. Prefixes are updated in the entire tree, but bounds are inserted/deleted only in the updatable part.

To reduce the complexity of inserting and deleting bounds in the updatable levels, we apply in the corresponding nodes only Range Trie rule 5. Thereby, nodes in a subtree at the updatable part can store addresses of length ⌈log2(Nb − Na)⌉², after performing a single subtraction between the stored addresses and Na. This reduces the number of bits per comparison in the updatable part and allows all stored addresses to have the same width, hence simplifying updates. Although the above restrictions substantially reduce the complexity of new bound insertions and deletions, the bound updates need to be performed in a feed-forward manner in order to suit our pipelined hardware design. Related Range Tree (software) implementations allow node splits or merges during insertion and deletion of prefix bounds, respectively [1], [12]. As illustrated in Figure 6, when a new bound needs to be inserted in a node that is full, that node needs to be split in two, while deleting a bound may allow two neighboring nodes to be merged into one. This technique results in well-balanced trees, but requires feeding back and updating the parent of the split/merged node. In the worst case, the phenomenon can propagate back to the root of the tree; e.g., a node split or merge will require updating its parent, which in turn may cause a subsequent split or merge, and so on. Therefore, this approach is not suitable for our hardware Range Trie design, because it would require stalling the pipeline for O(log_m(n)) cycles per update. Instead, in the Range Trie we choose to avoid node splits and merges at the cost of less balanced trees. In the Range Trie, a new prefix bound is inserted in the last visited node after the bound lookup, when that node is in the updatable part and has room for storing one more bound; otherwise a new node is created under the last visited one. Thereby, we eliminate long feedbacks in our pipeline, but the subtrees may become unbalanced. For example, in-order bound insertions in the same subtree may create unbalanced trees such as the one in Figure 7(a). Given that a Range Trie implementation has a limited number of spare tree levels to support the updatable tree part, subtrees such as the one of Figure 7(a) need to be rebuilt and balanced like the one in Figure 7(b). Still, after some point there is a chance that rebuilding a subtree may no longer allow new bound insertions. That may be caused by the lack of either spare tree levels (pipeline stages) or free memory in the updatable part. In that case, the entire Range Trie (fixed and updatable part) needs to be rebuilt. Rebuilding a subtree of 3 levels requires stalling incoming address lookup requests for a few hundred cycles, while rebuilding an entire Range Trie of about 250k prefixes needs a stall of several hundreds of thousands of cycles. As shown in Section IV, the cost of the infrequent tree rebuilds is the price to pay for the fast updates in the common case.

²[Na, Nb) is the address range of the respective leaf node in the fixed part.

Fig. 6. Splitting or merging nodes in a Range Tree after inserting a bound.

Fig. 7. A subtree can be unbalanced after bound insertions, e.g., in-order insertions (a); then the subtree should be rebuilt to get balanced again (b).

C. Range Trie Hardware Design for LPM

The Range Trie hardware design is pipelined, having twice as many stages as the number of implemented tree levels. Figure 8 depicts such a Range Trie pipeline. The two stages corresponding to a tree level involve a memory access, to read the contents of a tree node, and a processing stage, to determine the correct branch to the next level. Next to each memory stage there is logic to support updates, while in the updatable tree part, next to the processing, there is logic to manage the spare-levels memory and to support bound insertions and deletions. A node configuration is stored in a single memory line at the respective level. The basic Range Trie structure [10] maintained three fields: the compared address-part values, some control bits for the comparisons, and a pointer to the next level. For LPM support, we have added the lengths and the action pointers of the prefixes stored in the node, as well as the action of the corresponding prefix stored at each parent; e.g., node A3 in Figure 4 will store four prefix lengths and four action pointers for its four children and one action corresponding to Pnew stored at its parent. At the last level of the fixed part and at the updatable part, an extra field is reserved to store a counter for each bound in the node; this way we can keep track of whether a bound is still needed after prefix updates. The processing logic computes the branch to the next level during a lookup, using the incoming address at hand and the node contents read from memory, as described in [10].

Fig. 8. Range Trie Hardware Design.
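The per-node prefix-update rule of Section III-B (Figure 4) can be sketched in software as follows; the node layout and the function signature are illustrative assumptions for exposition, not the hardware memory format.

```python
# Hedged software sketch of the per-node prefix update of Fig. 4: entry i
# holds (prefix_length, action) for child i. The low bound may update
# entries at branches >= its own (lb_up), the high bound entries at
# branches < its own (hb_up); when both bounds visit the same node, only
# the intersection is updated, and a longer prefix overwrites a shorter one.
def update_node(entries, lb_branch, hb_branch, length, action):
    """entries: list of (prefix_length, action) pairs, one per child."""
    lo = lb_branch if lb_branch is not None else 0             # lb_up limit
    hi = hb_branch if hb_branch is not None else len(entries)  # hb_up limit
    for i in range(lo, hi):
        if entries[i][0] < length:     # keep only the longest prefix
            entries[i] = (length, action)
    return entries

node = [(1, "A1"), (1, "A1"), (0, None), (0, None)]
update_node(node, 1, 3, 3, "Anew")
print(node)  # entries 1 and 2 now hold the longer prefix (3, "Anew")
```

Passing `None` for one bound models a node visited by only the low or only the high bound of the updated prefix.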
First, the correct part of the incoming address is selected and, subsequently, the address-part comparisons are performed against the compare values stored in the node. Then, the multiple comparison results are encoded, the branch to the next level is determined and the memory location of the next visited node is computed. During a prefix update, the prefix bounds are used for lookup at each level and, subsequently, the update logic computes the changes to be written back to the memory entry of the node visited at that level. The update logic determines the node entries to be updated, as described in Figure 4, and then updates the corresponding fields in the memory. At the updatable part of the tree, an extra block is designed in parallel with the processing logic. This block implements the spare-level logic, responsible for inserting and deleting prefix bounds during updates as well as for creating new nodes when needed. Since we cannot know beforehand how the subtrees will be formed during updates, we decided to create a common pool of on-chip memory blocks for all the spare levels of the updatable tree part. These memory blocks are allocated dynamically, on demand, to the spare levels of the updatable part during the incremental updates of the Range Trie. The management of the spare memory is performed by the spare-level logic.

Fig. 9. The Range Trie pipeline during lookups and prefix updates: (a) prefix update in the example tree; (b) Range Trie pipeline.

Figure 9(b) shows the functionality of the Range Trie pipeline during lookups and prefix updates on the tree of Figure 9(a). During a lookup (e.g., of address AIN1 or AIN2) the node contents are read from the memory of each tree level and then processed in the subsequent processing stage. During a prefix update, the bounds of the updated prefix (LB_p1, HB_p1) are used for lookups in two consecutive cycles. After the lookup of the two bounds at a tree level, we can determine the fields to be updated in the visited node(s). In addition to the two cycles for lookup, two more cycles are required in the pipeline to update the nodes (memory write-back). In total, a prefix update requires a bubble of four cycles, as shown in Figure 9(a). In the example of the figure, both bounds follow the same branch after the lookup in node 1.1; therefore, this node is not updated. On the contrary, after the lookup at node 2.1 of the second level, the prefix bounds split into different branches and the intersection of the node entries (as in node A of Figure 4) needs to be updated. This is only known after the lookup of the second bound, HB_p1, in cycle 6; then, in cycle 7 the contents of node 2.1 are updated. In the third level, the two prefix bounds LB_p1 and HB_p1 continue independently, with lookups and subsequent updates at nodes 3.2 and 3.3, respectively. In general, the pipeline supports one lookup per cycle and, in most cases, since subtree and tree rebuilds are rare, a prefix update takes four cycles.
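The pipeline behavior above implies a simple throughput model: one lookup per cycle, minus a four-cycle bubble per prefix update. Under this model, and at the 630 MHz ASIC frequency reported in Section IV, a constant rate of one million updates/sec leaves 626 million lookups/sec, matching the 626 MPPS measurement of Section IV.

```python
# Hedged back-of-the-envelope check: one lookup per cycle, a 4-cycle
# bubble per prefix update. At 630 MHz, one million updates/sec cost
# 4e6 cycles/sec, leaving 630e6 - 4e6 = 626e6 lookups/sec (626 MPPS).
def lookup_rate(freq_hz, updates_per_sec, bubble_cycles=4):
    """Sustained lookups/sec under a constant update rate."""
    return freq_hz - bubble_cycles * updates_per_sec

print(lookup_rate(630e6, 1e6) / 1e6, "MPPS")  # 626.0 MPPS
```

The model deliberately ignores the rare subtree and full-tree rebuilds, which add stalls of hundreds to hundreds of thousands of cycles but are infrequent, as evaluated in Section IV.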
IV. EXPERIMENTAL RESULTS

In this section, we present the Range Trie implementation results, measure the overhead of supporting LPM and updates, and compare with related work. We have implemented our Range Trie designs and synthesized them for 90-nm ASIC technology, achieving a frequency of 630 MHz. A three-level Range Trie with support for three extra updatable spare levels has been further prototyped in a Xilinx Virtex4-60 FPGA. We used IPv4 routing tables from RIPE [8], containing about 270k prefixes each, and compared the proposed Range Trie solution with related data-structures. We further converted the above IPv4 tables to IPv6 and measured again the tree height and memory size. Supporting LPM in a Range Trie requires roughly double the memory compared to the basic data-structure introduced in [10], while the number of tree levels remains unchanged. As shown in Figure 10(a), a Range Trie requires 5 tree levels for the IPv4 routing tables and about 1.4 Mbytes of memory, while other solutions require at least 6 levels and twice as much memory. When using IPv6 addresses, the Range Trie design needs 3 times fewer tree levels compared to the second best approach, and occupies 1.9 Mbytes of memory, which is 4 times less than linear search and 10 times less than other related structures. This shows that the Range Trie scales better than related works to IPv6 in terms
of latency and memory.
In order to measure the overhead of incremental updates, we extracted from RIPE traces of routing-table updates over 12 months (June '09 to May '10) and fed them to our Range Trie, replayed in real time. On average, there are about 100 updates per second. Updates arrive in five-minute batches, with a peak of 3.5 million updates in a single five-minute time-slot. Figure 12 depicts the time spent on updates (measured in nanoseconds) for every five-minute slot over the 12-month period, considering the ASIC operating frequency of our design. In the worst case, 0.1 second was spent on updates within a five-minute time-slot, while the average overhead was about 1 msec. Over the 12 months, we needed to completely rebuild the entire Range Trie 6 times, while about 50-250 subtree rebuilds per month were required. Still, the performance overhead was negligible, as shown in Figure 13. Even with a constant rate of one million updates/sec, our design serves 626 million packets per second (MPPS), showing that in practice the overhead of a prefix update is four cycles.

Fig. 10. The number of tree levels of the Range Trie and related data-structures when storing different IPv4 and IPv6 Internet routing tables: (a) IPv4; (b) IPv6.

Fig. 11. The memory requirements of the Range Trie and related data-structures when storing different IPv4 and IPv6 Internet routing tables: (a) IPv4; (b) IPv6.

ASAP 2011
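To make the range-based lookup being evaluated concrete, the sketch below (our own illustrative Python, not the paper's hardware design) maps prefixes to the address ranges they cover and answers longest-prefix-match queries by binary search over the sorted range boundaries — the operation that range-based structures such as Range Trees and Range Tries accelerate. All function names are hypothetical, and the boundary annotation is computed naively for clarity.

```python
# Illustrative sketch: LPM reduced to a search over sorted range boundaries.
import bisect

def prefix_to_range(prefix, length, width=32):
    """Inclusive address range [start, end] covered by a prefix of given length."""
    span = 1 << (width - length)
    start = prefix & ~(span - 1)
    return start, start + span - 1

def build_boundaries(prefixes, width=32):
    """Sorted boundary list; each interval between consecutive boundaries is
    annotated with the longest (most specific) prefix covering it, or None."""
    points = {0, 1 << width}
    for p, l in prefixes:
        s, e = prefix_to_range(p, l, width)
        points.update((s, e + 1))
    bounds = sorted(points)
    best = []
    for s in bounds[:-1]:  # naive O(n^2) precomputation, for clarity only
        covering = [(p, l) for p, l in prefixes
                    if prefix_to_range(p, l, width)[0] <= s
                    <= prefix_to_range(p, l, width)[1]]
        best.append(max(covering, key=lambda pl: pl[1]) if covering else None)
    return bounds, best

def lpm(addr, bounds, best):
    """Longest-prefix match via binary search over the boundaries."""
    return best[bisect.bisect_right(bounds, addr) - 1]
```

For example, with 10.0.0.0/8 and 10.10.0.0/16 stored, an address inside 10.10/16 matches the /16, while other 10/8 addresses fall back to the /8.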
V. CONCLUSIONS
In this paper, we introduced a technique for supporting LPM and incremental updates in the Range Trie data-structure. We described the additions required to the basic structure in order to store prefixes and support updates. We partitioned the Range Trie into a first, upper part that is more rigid but also more efficient, exploiting all Range Trie optimizations, and a subsequent, second part that is less efficient but simpler to update. The overhead of supporting LPM in a Range Trie is the memory size, which doubles compared to the basic structure; however, it is still half that of related works for IPv4 and 10× lower for IPv6 routing tables. We opted for fast updates
Fig. 12. The update overhead in the Range Trie design during a period of 12 months; overhead measured in nanoseconds spent in updates every 5 minutes.

Fig. 13. Range Trie performance in MPPS, considering various update rates.
in the common case, which introduces a four-cycle bubble in our pipeline per prefix update, at the cost of slow but infrequent partial or complete tree rebuilds. The Range Trie design performs over 625 million lookups/sec even with 1 million updates/sec.
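The reported throughput can be sanity-checked with simple arithmetic, assuming (as stated in the evaluation) a 630 MHz clock, one lookup per cycle, and a four-cycle pipeline bubble per prefix update; the helper below is our own back-of-the-envelope sketch, not part of the design.

```python
def lookup_rate_mpps(clock_hz, updates_per_s, cycles_per_update=4):
    """Sustained lookups/sec (in millions) when each prefix update steals
    `cycles_per_update` slots from a one-lookup-per-cycle pipeline."""
    return (clock_hz - cycles_per_update * updates_per_s) / 1e6

# 630 MHz with 1M updates/sec: 630e6 - 4 * 1e6 = 626e6 lookups/sec,
# i.e. 626 MPPS, matching the figure reported in the evaluation.
print(lookup_rate_mpps(630e6, 1e6))
```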
ACKNOWLEDGEMENTS
This research was partially supported by the Technology Foundation STW, applied science division of NWO and the technology programme of the Ministry of Economic Affairs (project number 11572).

REFERENCES
[1] Y.-K. Chang and Y.-C. Lin, “Dynamic segment trees for ranges and prefixes,” IEEE Trans. Comput., vol. 56, no. 6, pp. 769–784, 2007.
[2] M. Degermark, A. Brodnik, S. Carlsson, and S. Pink, “Small forwarding tables for fast routing lookups,” SIGCOMM Comput. Commun. Rev., vol. 27, no. 4, pp. 3–14, 1997.
[3] W. Eatherton, G. Varghese, and Z. Dittia, “Tree bitmap: hardware/software IP lookups with incremental updates,” SIGCOMM Comput. Commun. Rev., vol. 34, no. 2, pp. 97–122, 2004.
[4] G. Gilder, Telecosm: How Infinite Bandwidth Will Revolutionize Our World. Free Press, September 2000.
[5] H. Lu and S. Sahni, “A B-tree dynamic router-table design,” IEEE Trans. Comput., vol. 54, no. 7, pp. 813–824, 2005.
[6] S. Nilsson and G. Karlsson, “IP-address lookup using LC-tries,” IEEE JSAC, vol. 17, no. 6, pp. 1083–1092, 1999.
[7] NRO, “IPv6 growth increases 300 percent in two years,” www.nro.net/documents/press release 031108.html, Dec 2008.
[8] RIPE Network Coordination Centre, http://www.ripe.net/.
[9] M. Ruiz-Sanchez, E. Biersack, and W. Dabbous, “Survey and taxonomy of IP address lookup algorithms,” IEEE Network, vol. 15, no. 2, pp. 8–23, 2001.
[10] I. Sourdis, G. Stefanakis, R. D. Smet, and G. N. Gaydadjiev, “Range tries for scalable address lookup,” in ACM/IEEE ANCS, 2009, pp. 143–152.
[11] V. Srinivasan and G. Varghese, “Fast address lookups using controlled prefix expansion,” ACM Trans. Comput. Syst., vol. 17, no. 1, pp. 1–40, 1999.
[12] P. Warkhede, S. Suri, and G. Varghese, “Multiway range trees: scalable IP lookup with fast updates,” Comput. Netw., vol. 44, no. 3, pp. 289–303, 2004.