m-Trie: An Efficient Approach to On-chip Logic Minimization
Seraj Ahmad, Rabi Mahapatra
Department of Computer Science, Texas A&M University, College Station, Texas 77843
{seraj,rabi}@tamu.edu

Abstract

Boolean logic minimization is increasingly being applied to new applications that demand very fast and frequent minimization services. These applications typically offer very limited computing and memory resources, rendering traditional logic minimizers ineffective. We present a new approximate logic minimization algorithm based on a ternary trie. We compare its performance with the Espresso-II and ROCM logic minimizers for routing table compaction and demonstrate that it is 100 to 1000 times faster and can run with as little as 16 KB of data memory. The proposed approach can also support up to 25,000 incremental updates per second, positioning it as an ideal on-chip logic minimization algorithm.

1. Introduction

Logic minimization techniques traditionally found their use in logic synthesis, to reduce the number of gates required for a given circuit. In recent years, however, logic minimization has been applied to numerous applications other than logic synthesis, such as routing table reduction [5], access control list reduction in network processors [6], and hardware/software partitioning [10]. These applications require the logic minimizer to be kept close to the application and to share the limited computing and memory resources available on-chip. The minimizer must therefore be as fast as possible, with an acceptable loss of minimization quality. It should also have a small code footprint and be able to operate within a given data memory budget, in order to support even the most resource-constrained devices. These applications also demand high-frequency minimization services, because the application database targeted for minimization is updated continuously.

The logic minimization problem is known to be NP-complete, making exact algorithms unsuitable for problems of practical size. The currently available approximate algorithms are also unsuitable, due to their large computing and memory requirements and their inflexibility in adapting to resource-constrained devices. For example, the Espresso-II logic minimization algorithm described in [3] takes 109 seconds to minimize a routing table containing 11091 entries and supports 2 worst-case updates per second on a 400 MHz ARM platform. Further, it requires 500 kilobytes of data memory and 100 kilobytes of instruction memory. A newer logic minimizer, ROCM, described in [6], takes 120 seconds to minimize the same routing table but offers about 30 worst-case updates per second. Contrast this with the over 100,000 routing table entries and peak rates of over 2000 worst-case updates per second required in current backbone routers.

This paper introduces a novel approximate minimization technique of complexity O(N), based on a trie data structure called m-Trie (minimization trie). The technique provides fast minimization and efficient incremental updates using a localized minimization technique. It has a very small code footprint of about 20 KB and can operate with a data memory budget as small as 16 KB. Experimental results show that it can attain up to 25,000 updates per second.

The rest of this paper is organized as follows. Sections 2 and 3 describe the m-Trie data structure and the insertion/deletion algorithms. Section 4 discusses a case study and results on routing table compaction to demonstrate the usefulness of the proposed approach. Conclusions and future enhancements are discussed in Section 5.
0-7803-8702-3/04/$20.00 ©2004 IEEE.
2. Preliminaries

The m-Trie is defined in terms of a 3-ary tree known as a ternary trie. Each non-leaf node v in a ternary trie can have up to three children, labelled v0, v1 and vx. The edges connecting the node to its children v0, v1 and vx are labelled 0, 1 and x respectively, to specify their direction. The labels 0 and 1 denote two disjoint directions, while x is defined as the union of the directions 0 and 1. The subtrie rooted at node v is denoted T(v). The basic unit of insertion and deletion in the ternary trie is a path. A path containing all the edges between nodes vi and vj is denoted Pvi∼vj and can be uniquely mapped to a string s(Pvi∼vj) in {0, 1, x}*, known as its route. The route
is formed by concatenating the directions of all the edges between nodes vi and vj. A path can therefore be specified as P(u, s), where u is the starting node and s specifies the route taken by the path. A path can also be specified simply as s, where the starting node is implicitly assumed to be the root. A cover Q of a given path P is any path formed by promoting one or more of the 0 and 1 directions of P to x. The height h(v) of a node v is the number of nodes visited to reach a leaf node from v. The height of the root node is denoted W.

The trend τ(e1, e2) between two edges e1 = (ui, ui+1) and e2 = (vi, vi+1) in the ternary trie is defined according to Table 1.

Table 1. Truth Table For τ(e1, e2)

    e1\e2 |  0  |  1  |  x
      0   |  0  |  ∅  |  0
      1   |  ∅  |  1  |  1
      x   |  0  |  1  |  x

This definition can be extended to define the trend between two paths Pu1∼ul and Pv1∼vk as follows:

    τ(Pu1∼ul, Pv1∼vk) = τ1 · τ2 ··· τ(l−1)   if l = k
                         ∅                    otherwise

where τi = τ((ui, ui+1), (vi, vi+1)). Two edges e1 and e2 are said to match if τ(e1, e2) ≠ ∅, i.e. if they follow the same trend. The distance δ(Pu1∼uk, Pv1∼vl) between two paths Pu1∼uk and Pv1∼vl in the ternary trie is the number of mismatches between (ui, ui+1) and (vi, vi+1) for 1 ≤ i ≤ min(l, k). The similarity Pv∼w ⊓ Px∼y of two paths Pv∼w and Px∼y is defined as follows:

    Pv∼w ⊓ Px∼y = τ(Pv∼w, Px∼y)   if δ(Pv∼w, Px∼y) = 0
                   ∅               if δ(Pv∼w, Px∼y) ≥ 1

The two paths are said to be dissimilar when Pv∼w ⊓ Px∼y = ∅. Two trees T1 and T2 are said to be dissimilar if there are no paths P ∈ T1 and Q ∈ T2 such that P ⊓ Q ≠ ∅.

3. Minimized Trie

3.1. Definition

An m-Trie, or minimized trie, is a ternary trie T such that for every node v ∈ T the following properties are satisfied:

1. h(v0) = h(v1) = h(vx)
2. m-Trie(v0) ⊓ m-Trie(v1) = ∅
3. m-Trie(v0) ⊓ m-Trie(vx) = ∅
4. m-Trie(v1) ⊓ m-Trie(vx) = ∅

If we map the route s of a path P(s) to a minterm, the two-level logic minimization problem can be modelled as minimizing the number of leaf nodes. The m-Trie attempts to minimize the number of leaf nodes by merging similar paths in directions 0 and 1 and rerouting them towards direction x. Property 1 requires all leaf nodes to be at the same level, to facilitate checking for similarity and path merging. Properties 2, 3 and 4 are minimization constraints and state that the m-Tries rooted at the children must be pairwise dissimilar. If there is some similarity between the subtries T(v0) and T(v1), the paths showing the similarity must be removed from T(v0) and T(v1), merged, and rerouted to the subtrie T(vx). If there is some similarity between the subtries T(v0) and T(vx), the path in T(v0) showing the similarity is removed and merged with the similar path in T(vx). The same reasoning applies to similarity between the subtries T(v1) and T(vx).

The algorithm to remove the similarity between a pair of tries with m and n paths respectively takes m × n path comparisons. If there are N prefix paths present in the trie, distributed equally among the children at every level, then it takes 3 × (N/3 × N/3) = N²/3 path comparisons at each level to build the m-Trie. The complexity of building the m-Trie from a ternary trie is therefore O(N²W). The following subsection provides an O(W²) insertion algorithm on the m-Trie. Since building an m-Trie is equivalent to N insertions on the m-Trie, we can build it in O(NW²).

3.2. Insertion

The insert operation on an m-Trie consists of two steps. The first step involves traversing the path to be inserted and creating those parts of the path which do not already exist in the trie.

Algorithm 1. Traverse(v, s)
1   i = 0, dir = s[i++]
2   while(v_dir ≠ NULL)
3       |v| = |v| + 1
4       v = v_dir
5       dir = s[i++]
6   endwhile
7   repeat
8       v_dir = CreateNewNode(dir)
9       |v| = 1
10      v = v_dir
11      dir = s[i++]
12  until(s[i] = ∅)
13  return v
end
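The trend and similarity definitions above can be sketched in executable form. This is an illustrative sketch, not the authors' implementation; routes are plain strings over {0, 1, x} and None stands in for the empty result ∅.

```python
# Table 1: trend tau(e1, e2) between two edge directions.
TREND = {
    ('0', '0'): '0', ('0', '1'): None, ('0', 'x'): '0',
    ('1', '0'): None, ('1', '1'): '1', ('1', 'x'): '1',
    ('x', '0'): '0', ('x', '1'): '1', ('x', 'x'): 'x',
}

def trend(p, q):
    """Trend of two routes; None if lengths differ or any edge pair mismatches."""
    if len(p) != len(q):
        return None
    taus = [TREND[(a, b)] for a, b in zip(p, q)]
    if None in taus:          # distance >= 1, so the similarity is empty
        return None
    return ''.join(taus)

def distance(p, q):
    """Number of mismatching edge pairs between two routes."""
    return sum(1 for a, b in zip(p, q) if TREND[(a, b)] is None)

# A route and one of its covers (0/1 promoted to x) follow the same trend;
# two routes that disagree in a 0-vs-1 direction are dissimilar.
print(trend('0101', '010x'))    # '0101'
print(trend('0101', '0111'))    # None
print(distance('0101', '0111')) # 1
```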
The traversal procedure maintains a path counter |v| to track the number of paths passing through each node v. Traversal increments the path counter of each visited node by 1 and returns the leaf node. Figure 1 shows a path to be inserted in the trie. The path from the root to node u already exists in the tree, so it is simply traversed up to node u. There is no path from u to the leaf node v, so the path Pu∼v is created during traversal. The second step merges the newly added path P by rerouting. It consists of inspecting all the nodes on P, starting from the leaf node, for a similarity condition with their offsprings, and resolving it. Please note that the existence of a similarity condition is a violation of properties 2-4. To remove the similarity at a node v ∈ P, the path segments showing the similarity are untraversed and traversed again in direction x at v. Figure 2 shows a path similarity condition at node v. Here the path Pv∼w shows similarity with the path Pu∼w, where node u is an offspring of v. To break the similarity, the paths Pv∼w and Pu∼w are untraversed and traversed again in direction x.
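The traversal step just described can be sketched as follows. The node layout is an assumption for illustration, not the authors' code: each node keeps a child map and the path counter |v|.

```python
class Node:
    def __init__(self):
        self.child = {}   # direction ('0'/'1'/'x') -> Node
        self.count = 0    # |v|: number of paths passing through this node

def traverse(v, route):
    """Walk `route` from node v, bumping counters and creating missing nodes."""
    for d in route:
        v.count += 1                 # path counter of each visited node
        if d not in v.child:
            v.child[d] = Node()      # create the missing part of the path
        v = v.child[d]
    v.count += 1                     # the leaf also counts the path
    return v

root = Node()
traverse(root, '010')
traverse(root, '011')                        # shares the prefix '01'
print(root.count, root.child['0'].count)     # 2 2
```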
Figure 1. Path Traversal During Insertion.
Algorithm 2. Insert(T, P)
1   v = Traverse(root(T), s(P))
2   while(v ≠ root(T))
3       S = offspring(v)
4       p = parent(v)
5       for every u ∈ S
6           τ = T(v) ⊓ T(u)
7           if(τ ≠ ∅)
8               UnTraverse(v, τ)
9               UnTraverse(u, τ)
10              Traverse(p_x, τ)
11              Traverse(p_x, τ)
12          endif
13      endfor
14      v = p
15  endwhile
end
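The merge-and-reroute step that Insert performs on the trie can be illustrated on routes directly. The sketch below is a simplification under assumed semantics (routes as strings, no path counters or bitmaps): two routes that differ at exactly one position merge into a single route with x at that position, which covers both the 0/1 merge and the absorption of a 0- or 1-route by an existing x-route.

```python
def merge_routes(routes):
    """Greedy pairwise merging of routes; returns the minimized route set."""
    routes = set(routes)
    changed = True
    while changed:
        changed = False
        for p in sorted(routes):
            for q in sorted(routes):
                if p >= q:
                    continue
                diff = [i for i in range(len(p)) if p[i] != q[i]]
                if len(diff) == 1:            # mergeable: one differing branch
                    i = diff[0]
                    routes -= {p, q}
                    routes.add(p[:i] + 'x' + p[i+1:])  # reroute to direction x
                    changed = True
                    break
            if changed:
                break
    return routes

# 000 and 001 merge to 00x; 011 stays, since it differs from 00x in two places.
print(sorted(merge_routes(['000', '001', '011'])))  # ['00x', '011']
```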
Figure 2. Similarity at nodes u and v.

Insertion maintains two bitmaps b0 and b1 of length W at every leaf node. The bitmap b0 (b1) remembers the node positions where a path in direction 0 (1) was merged with a path in direction x. The path counters and bitmaps together are used to support deletion in the m-Trie. The correctness and complexity of the insertion algorithm are established in the following lemma.

Lemma 1. The minimization constraints for an m-Trie are violated only along the newly added path during insertion upon the m-Trie. The constraints can be restored in O(W²) steps.

Let v be a node on the newly inserted path P in the m-Trie, and let T(off1) and T(off2) be the subtries rooted at the offsprings of v at depth ℓ. If we assume that the minimization constraints are not violated below v, the subtrie T(v) has only one path, which may have a similar path in either T(off1) or T(off2). If a similar path exists, both paths are merged and rerouted to the subtrie rooted in direction x to restore the minimization constraints. The merging and rerouting of path P implies that there is still one path, now up to parent(v), which may have similarity with other paths in the subtries rooted at the offsprings of parent(v). This shows that constraint violations are localized along path P. Further, the constraint restoration procedure requires searching for the presence of an identical path in the subtries T(off1) and T(off2), which takes 2 × ℓ steps. The procedure has to be applied to all the nodes on P, from the last non-leaf node up to the root, in a bottom-up manner to completely restore the minimization constraints, requiring Σ_{ℓ=1..W} 2ℓ = W(W + 1) steps. Therefore the constraints can be restored in O(W²) steps.

3.3. Deletion

The deletion operation on the m-Trie consists of three major steps: searching, un-traversal and balancing. Due to path merging and rerouting during insertion, the inserted path may not be directly present in the m-Trie. For
example, an edge (u, v) in direction 0 or 1 may have been rerouted to direction x during insertion to restore the minimization constraints. This rerouting scheme merges the original path into another path which is a cover of P. For example, the path P in Figure 2 is rerouted at node parent(v) during insertion and can be found only along path Q. The first step in deletion is therefore the search for a covering path. The uniqueness and search complexity of the covering path are proved in Lemma 2. The second step involves untraversing the path to be deleted along the covering path. Untraverse decrements |v| by 1 for every visited node v and returns the leaf node. Un-traverse deletes any path segment containing nodes with |v| = 0. During insertion, path pairs of the form (P1(s1.0.s2), P2(s1.1.s2)), (P3(s1.0.s2), P4(s1.x.s2)) and (P5(s1.1.s2), P6(s1.x.s2)) are merged into the path Q1(s1.x.s2). The deletion of P1, P2, P4 or P6 from the path Q1 breaks the similarity, causing an imbalance and indicating the need to reroute the surviving partner path (P2, P1, P3 or P5 respectively) in its original direction. The node positions where the covering path Q differs from the path to be deleted P may have an imbalance. We can capture these positions with a bitmap b = P ⊕ Q. However, the bitmap b indicates only potential imbalance positions, which need not correspond to an actual imbalance. For example, P5 and Q1 differ at one position, but the deletion of P5 from Q1 requires no rerouting. The third step consists of finding the imbalance positions and rerouting the unbalanced paths in their original directions. Please note that if several contiguous imbalance points are encountered on the path, they are rerouted together in the original direction of the node farthest from the leaf. The following two lemmas discuss the complexity of the different steps involved in the deletion procedure.
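The potential-imbalance bitmap described above can be sketched as follows; the representation (routes as strings, positions as indices) is an assumption for illustration.

```python
def imbalance_positions(p, q):
    """Indices where the cover Q differs from the deleted path P (b = P xor Q).

    Each such index is only a *candidate* imbalance: whether rerouting is
    actually needed depends on which partner paths survive at that node.
    """
    assert len(p) == len(q)
    return [i in range(0) or i for i in range(len(p)) if p[i] != q[i]]

# P = s1.0.s2 was merged into its cover Q = s1.x.s2 during insertion;
# deleting P flags position 1 as a candidate for rerouting the partner path.
print(imbalance_positions('000', '0x0'))  # [1]
```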
Lemma 2. The covering path Q for a given path P is unique and can be found in O(2^W) steps.

Due to rerouting, we may have to explore the original direction (0 or 1) as well as direction x at every level in order to find the exact path. From this we can infer that the search complexity for a prefix cover in an m-Trie with ℓ levels satisfies T_ℓ = 2 × T_{ℓ−1}. In the worst case we may have to perform a bidirectional search at each level, so the search complexity for the covering path of P becomes O(2^W). Further, the path P will be found either in its original direction or in direction x, but not both, as that would violate constraints 3 and 4 given in Section 3.1. Hence there is a unique path, denoted Q, in the m-Trie covering the given path P.

Lemma 3. Let P be the path to be deleted and Q be a path present in the m-Trie such that Q covers P. The minimization constraints for the m-Trie are violated only along path Q. The constraints can be restored in O(W²) steps.

The deletion of path P along the covering path Q may leave unbalanced path segments at every node in direction x where path P suggests a 0 or 1 direction. The rerouting of a path segment in the m-Trie at level ℓ takes 2 × ℓ steps. In the worst case we can encounter a node having an imbalance at each level on the path Q. Therefore the complexity of restoring the balance after the untraverse operation is Σ_{ℓ=1..W} 2ℓ = W(W + 1).

Algorithm 3. Delete(T, P)
1   Q = FindCoveringPath(P)
2   w = UnTraverse(root(T), s(Q))
3   b0 = w→b0, b1 = w→b1
4   bitmap = ((Q ⊕ P) ⊕ b0) | ((Q ⊕ P) ⊕ b1)
5   v = w
6   while(!root(v))
7       if(bitmap & 1)
8           UnTraverse(v, s(Qv∼w))
9           Traverse(v, s(Pv∼w))
10      endif
11      bitmap = bitmap >> 1
12      v = parent(v)
13  endwhile
end

Algorithm 4. UnTraverse(v, s)
1   i = 0, dir = s[i++]
2   while(v_dir ≠ NULL)
3       |v| = |v| − 1
4       if(|v| = 0)
5           DeletePath P_{v_dir∼leaf}
6           return NULL
7       endif
8       v = v_dir
9       dir = s[i++]
10  endwhile
11  return parent(v)
end

4. Case Study: Routing Table Compaction in TCAM

4.1. Background

The Internet is a packet-switched network and consists of a number of routers and hosts interconnected by communication links. Hosts reside at the boundary of the network, and host-to-host communication takes place via a number of routers using the Internet Protocol (IP). The communication between two nodes is converted into a series of packets known as IP datagrams. Every datagram carries a 32-bit destination address to facilitate independent routing. The packet is forwarded to the node corresponding to the best-known path to the destination, which is called the next-hop. The next-hop is determined by an IP lookup operation performed on a routing table. The IP lookup operation searches for the most specific network hosting the specified destination in a list of variable-length network identifiers known as IP prefixes. IP prefixes can be represented as a ternary string Pl·xx···x (with x repeated 32−l times), where Pl denotes a 32-bit prefix and l the prefix length. Pl represents the l most significant bits of the prefix, and only Pl is compared against the specified destination address to decide a match. The IP lookup operation searches the routing table to find the longest prefix matching the destination.

4.2. Issues in IP Lookup

A survey of software- and hardware-based methods for IP lookup can be found in [4]. Lookup algorithms designed for conventional memory to solve the longest prefix match problem require several memory accesses to retrieve the next-hop. This can quickly become a bottleneck for high-speed backbone routers operating at gigabit speeds with large routing tables containing more than 100,000 entries. Francis et al. investigated techniques for O(1) IP lookup using binary content addressable memory (BCAM) and ternary content addressable memory (TCAM) [7]. BCAMs allow the storage of 0 and 1 in each memory cell and can perform only fixed-length matches. Hence multiple BCAMs are required to search variable-length prefixes in a single cycle, which can lead to significant under-utilization of the available memory. TCAMs are similar to BCAMs but allow the storage of the states 0, 1 and x. The state x is treated as a don't care and is ignored during a matching operation. Thus TCAMs allow the storage of variable-length prefixes in a single unit, achieving more economy. TCAMs also offer easier management and update of routing tables. Despite these advantages, TCAM-based lookup solutions remained unpopular due to their high cost, low capacity and poor performance. However, recent advances in manufacturing and interconnection technology allow the fabrication of high-capacity, high-performance and low-cost TCAM units matching the requirements of today's backbone routers. For example, the latest TCAM available in the market operates at 100 million searches per second and offers capacities up to 16 MB [2].

4.3. Reducing TCAM Power Consumption

TCAM-based fast lookup seems promising, but it is not without disadvantages. TCAMs consume a lot of power in normal operating conditions, proportional to the number of TCAM entries enabled for searching. Research efforts to reduce TCAM power consumption can be divided into two categories. The first approach attempts to reduce power consumption by partitioning the entire TCAM memory into a set of TCAM pages and then finding a suitable hashing algorithm to map each entry into a set of target pages [11], [8]. During searching, only the target pages are enabled. This reduces power consumption by a ratio of p/n, where p and n are the average number of target pages and the total number of pages respectively. The second approach reduces power consumption by compacting routing table entries using logic minimization techniques, as discussed in [5] and [6]. IP prefixes contain the symbol x only at the end, while in minimized IP prefixes it can occur at any position. Since TCAM allows the storage of x at any bit position, routing semantics can be guaranteed even with a minimized routing table. Here, the reduction in power consumption depends on the compaction ratio achieved by the logic minimization technique applied. Statistics show that logic minimization can reduce power consumption by up to 30-40% [5]. Both techniques have merits of their own; however, a hybrid approach should achieve higher performance and economy.

4.4. Logic Minimization Using m-Trie

In order to apply logic minimization, a given IP routing table R is first partitioned according to the next-hops hi ∈ R into subtables Rhi. Each of these subtables is pruned to remove overlapping entries and further subdivided on the basis of prefix length into subtables R(hi,ℓ). The logic minimization algorithm is applied to each of these subtables R(hi,ℓ). The m-Trie based logic minimization is described in Algorithm 5.

Algorithm 5. LogicMin(R(hi,ℓ), µb)
1   mTrie = CreateRootNode()
2   µc = 0 /* Memory Consumed */
3   for every prefix s ∈ R(hi,ℓ)
4       µa = µb − µc /* Available Memory */
5       if(µa > µmin)
6           µc += Insert(mTrie, s)
7       else
8           FlushPrefixes(mTrie)
9           DeleteTrie(mTrie)
10          mTrie = CreateRootNode()
11          µc = 0
12      endif
13  endfor
end

The algorithm takes a subtable R(hi,ℓ) and a memory budget µb as its input. The prefixes in the subtable are treated as routes of paths in the m-Trie. To minimize the
subtable, an empty m-Trie is created and prefix paths are added to it one at a time. If the memory budget is reached before all the prefixes in the subtable have been consumed, the paths in the current m-Trie are enumerated as minimized prefixes. The m-Trie is then deleted to reclaim the memory consumed by it. The whole process is repeated until all the prefixes in the subtable are consumed by the algorithm. Please note that µmin is the minimum amount of memory needed to create a prefix path during insertion in the m-Trie.
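The control flow of this budgeted loop can be sketched as follows. The function names and the unit-cost memory model are illustrative assumptions; minimize_batch stands in for building an m-Trie and flushing its paths as minimized prefixes.

```python
def logic_min(prefixes, budget, cost_per_path, minimize_batch):
    """Feed prefixes into batches bounded by a memory budget (a sketch of
    the flush-and-rebuild flow, not the authors' implementation)."""
    out, batch, used = [], [], 0
    for s in prefixes:
        if used + cost_per_path > budget:     # budget reached: flush, rebuild
            out.extend(minimize_batch(batch))
            batch, used = [], 0
        batch.append(s)
        used += cost_per_path                 # upper bound; merging uses less
    if batch:                                 # flush the final partial batch
        out.extend(minimize_batch(batch))
    return out

# With a stand-in minimizer that records batches, a budget of 2 paths
# splits 5 prefixes into batches of 2, 2 and 1.
batches = []
def fake_min(b):
    batches.append(list(b))
    return b

logic_min(['000', '001', '010', '011', '100'], 2, 1, fake_min)
print([len(b) for b in batches])  # [2, 2, 1]
```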
4.5. Covering Path Lookup in TCAM

As noted in Section 3.3, the most expensive step in the deletion procedure is finding the covering path in the m-Trie. After minimization, each path in the m-Trie is mapped to a minimized prefix and stored in TCAM. Hence the covering path can be searched in O(1) time in TCAM, with a suitable enhancement of the TCAM architecture. This enables the deletion algorithm to perform on a par with the insertion algorithm. The rest of this section describes the standard TCAM architecture and the enhancements for searching the covering path.

Table 2. Encoding of Ternary Symbols

    d1 d0 | T
     0  0 | x (tcam)
     0  1 | 0
     1  0 | 1
     1  1 | x (search register)

A NOR-based TCAM cell is shown in Figure 3(a). It uses two SRAM-based binary storage cells to store the states 0, 1 and x, based on the encoding scheme given in Table 2. It has four transistor switches T1-T4 to assist comparison. These transistor switches prevent the matchline from being shorted to ground when a match occurs. For example, a 0 state in the TCAM cell turns off the transistor T3. A search for 0, applied on search lines sl0 and sl1 = ¬sl0, turns off transistor T1, blocking the matchline from being shorted to ground and implying a match. However, a search for 1 turns on the transistor T1, creating a path to ground through transistor T4. On the other hand, an x state turns off both T3 and T4, blocking all paths to ground and thus matching any search key applied to the TCAM cell.

A simplified TCAM architecture, adapted from [9], is shown in Figure 3(b). An array of TCAM cells is arranged to form a TCAM word. In order to perform word comparison, all cells belonging to a single word share a common matchline. Since the data being searched can match multiple words in the TCAM due to variable-length matching, all the matchlines are connected to a priority encoder. The priority encoder selects the word at the lowest address among all the matched words.

To initiate a search in TCAM, the matchline is charged high. The data to be searched is stored in the search register. Since the search data is fed to a large number of cells, the TCAM provides drivers to handle the capacitive sink load contributed by each cell. TCAM words which do not match the search data cause the matchline to be discharged, which is detected with the aid of a sense amplifier.

Table 3. Match Behavior For Ternary Symbols

    search\stored |  0  |  1  |  x (tcam)
         0        |  y  |  n  |  y
         1        |  n  |  y  |  y
      x (sr)      |  n  |  n  |  y

In order to perform covering path lookup in TCAM, the search register is enhanced to store 0, 1 and x. This is easily accomplished by using 2 bits to represent each symbol, using the encoding scheme given in Table 2. The corresponding hardware overhead is quite small and does not impact the TCAM cells or interconnections. The opposite encoding of the "don't care" symbol x in the TCAM and in the search register results in the match behavior shown in Table 3. For example, consider the TCAM cell shown in Figure 3(a). If the symbol x is stored in the search register, the encoding scheme implies that it turns on the transistors T1 and T2. If a "0" or "1" is stored in the cell, this causes a mismatch by turning off the transistor T3 or T4 and thereby creating a discharge path to ground. However, if the "don't care" symbol x is stored in the TCAM, it matches any of 0, 1 and x in the search register by turning off the transistors T3 and T4 and blocking all paths to ground. This differential match behavior for the symbol x gives TCAMs the desired combined IP lookup and covering path lookup capability.

Table 4. Sample Routing Table

    Prefix | Next Hop
    1000*  | H3
    100*   | H2
    10*    | H1

For example, a search for IP addresses prefixed with "100000" matches all the TCAM entries in Table 4 and returns 1000*, which is the longest prefix among all the matched prefixes. However, a search for path routes of the form 100* matches 10* and 100* and returns 100*, which is the longest matching route in this case.
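The asymmetric match behavior of Tables 2 and 3 can be simulated in software. This is a behavioral sketch, not a circuit model; entries are listed in priority order to mimic the priority encoder, and the 4-symbol ternary words correspond to the prefixes of Table 4.

```python
def cell_match(stored, searched):
    """Table 3: stored x matches anything; searched x matches only stored x."""
    if stored == 'x':
        return True
    return stored == searched      # searched 'x' vs stored 0/1 is a mismatch

def tcam_lookup(entries, key):
    """Return the next-hop of the first (highest-priority) matching word."""
    for word, nexthop in entries:
        if all(cell_match(s, k) for s, k in zip(word, key)):
            return nexthop
    return None

# Table 4 as 4-symbol ternary words, longest prefix at the lowest address.
table = [('1000', 'H3'), ('100x', 'H2'), ('10xx', 'H1')]
print(tcam_lookup(table, '1000'))  # 'H3': ordinary IP lookup
print(tcam_lookup(table, '100x'))  # 'H2': x in the key skips 1000, hits 100x
print(tcam_lookup(table, '10xx'))  # 'H1': covering-path lookup for 10*
```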
Figure 3. (a) NOR based TCAM cell and (b) TCAM architecture with modified search register.

4.6. Experimental Results

To establish the suitability of m-Trie based minimization, we evaluated its performance on the standard routing table traces used in [5] and [6], as well as two additional large routing table traces from bbnplanet and attcanada, against the two existing logic minimizers Espresso-II and ROCM. All results were obtained on a CerfCube running embedded Linux on a 400 MHz Intel XScale processor [1].

Table 5. Comparison of Performance

             Base Tables        Espresso-II (8000 lines)       ROCM (1800 lines)              m-Trie (300 lines)
    Router   Orig.    Pruned    Data(KB)  Time(s)    Size      Data(KB)  Time(s)    Size      Data(KB)  Time(s)  Size
    paix     13914    11091     499.5     108.860    8863      76.0      119.840    8984      15.6      0.240    8881
    pacbell  22165    16124     1119.0    1119.010   11222     288.0     1220.000   11604     15.5      0.310    11351
    maewest  29585    22042     1156.1    925.050    16305     272.0     929.320    16728     15.7      0.440    16354
    aads     33740    24795     1183.4    910.000    18440     276.0     1013.410   18898     15.7      0.510    18531
    att      112412   79743     4164.2    6187.870   54773     256.0     8405.360   57175     15.9      2.380    57918
    bbn      124538   92773     3937.0    1691.600   69316     224.0     2858.340   70984     15.9      2.820    70716
The main results are summarized in Table 5. The first row lists the code complexity, in lines of C code, beside each algorithm. The code complexity of m-Trie is the lowest, at 300 lines of C code, compared to 1800 and 8000 lines for the other algorithms. The first column gives the original and pruned routing table sizes for each router. It is worth mentioning that pruning alone can compact the routing tables by 20-30%. The remaining columns give the data memory, execution time and size of the minimized routing table, in that order, for each algorithm. As can be seen, m-Trie based minimization is the fastest among all the existing approaches, by a factor of 100 to 1000 for all the standard routing tables. Further, we found that m-Trie shows only a 0.2-5.0% loss of compaction with a memory budget of 16 KB, which is 10 to 100 times lower than what is required by the other algorithms.

The routing table compaction achieved by m-Trie with respect to different memory budgets is shown in Figure 4. Please note that the total routing table compaction achieved by m-Trie rapidly approaches 95% of peak compaction within a memory budget of 16 KB, and 99% within 200 KB. This is due to the fact that each instance of path merging reduces the average memory required to insert a path; therefore more paths can be inserted for a given memory budget. m-Trie keeps inserting paths until the memory budget is reached, thereby giving more compaction opportunities to the regions where compaction is higher. This greedy approach results in better compaction with a limited memory budget.

It is also worth mentioning that the peak compaction achieved by m-Trie mostly outperforms the other algorithms. This extra compaction is due to the pruning property built into the m-Trie: it causes the m-Trie to greedily prune overlapping prefixes even during minimization, which the other algorithms lack. The peak compaction for each router is given in Figure 4.

We further observed that the performance of Espresso-II and ROCM depends heavily on the distribution of subtable sizes. For example, ROCM or Espresso-II minimization on the bbnplanet routing table takes less time than on the attcanada routing table, even though bbnplanet has more routing table entries. This occurs because most of the attcanada routing entries are concentrated in a few partitions, while the bbnplanet routing table maintains a fairly uniform distribution. In contrast, we observed a linear dependence of m-Trie based minimization on the routing table size. Thus m-Trie shows a very predictable execution time even under worst-case partitioning.

We also evaluated the update performance of the different algorithms on four randomly selected subtables from the attcanada and bbnplanet routing tables. The results for a single update in routing tables of different sizes are summarized in Table 6. We found that an m-Trie based update requires about 40 microseconds in our case study, outperforms the Espresso-II and ROCM update methods by 1000 to 10000 times, and remains fairly independent of the routing table size. The speed advantage, however, comes at the price of 17 MB of additional data memory for the attcanada testbed. This data memory is required to maintain the m-Trie by preventing garbage collection. We also evaluated the performance of a memory-efficient update scheme on the m-Trie, which re-minimizes the updated subtables. This update scheme can be supported with as little as 16 KB of memory and achieves update rates comparable to ROCM at the better end.

Table 6. Update Performance

    Table Size   Espresso-II (sec)   ROCM (sec)   m-Trie (microsec)   m-Trie memory efficient (sec)
    1047         0.460               0.030        19                  0.020
    3955         81.760              0.210        37                  0.150
    7581         220.930             0.360        36                  0.280
    15255        745.030             0.770        38                  0.590

Figure 4. Compaction Performance vs Memory Budget (compaction in % of total size vs memory budget in kilobytes, for PAIX (8844), PacBell (11187), MAEWEST (16159), AADS (18318), ATT (54986) and BBN (68982)).

5. Conclusions And Future Work

The m-Trie based logic minimization approach offers memory-efficient and fast logic minimization and incremental updates, suitable for high-frequency logic minimization services on resource-constrained devices. m-Trie based logic minimization works well on the benchmark set used by ROCM and Espresso-II; however, its suitability for other logic minimization problems remains to be investigated.

6. Acknowledgements

The authors would like to acknowledge initial discussions with V. C. Ravikumar, which helped develop the idea. The authors would also like to thank R. Lysecky and F. Vahid for providing the ROCM code and benchmarks to help us compare performance.

References

[1] CerfCube 255 with Embedded Linux. Intrinsyc, http://www.intrinsyc.com/products/cerfcube.
[2] SCT2000CB3, Ultra 2M Family. SiberCore Technologies, www.sibercore.com, March 2004.
[3] R. Brayton et al. Logic Minimization Algorithms for VLSI Synthesis. Kluwer Academic Publishers, Boston, MA, 1984.
[4] P. Gupta. Algorithms for Routing Lookups and Packet Classification. PhD thesis, Stanford University, December 2000.
[5] H. Liu. Routing table compaction in ternary-CAM. IEEE Micro, 15(5):58-64, Jan/Feb 2002.
[6] R. Lysecky and F. Vahid. On-chip logic minimization. In Proceedings of the 40th Conference on Design Automation, pages 334-337. ACM Press, 2003.
[7] A. J. McAuley and P. Francis. Fast routing table lookup using CAMs. In INFOCOM (3), pages 1382-1391. IEEE, 1993.
[8] R. Panigrahy and S. Sharma. Reducing TCAM power consumption and increasing throughput. In Hot Interconnects, page 107. IEEE, August 2002.
[9] K. J. Schultz. CAM-Based Circuits for ATM Switching Networks. PhD thesis, University of Toronto, 1996.
[10] G. Stitt, R. Lysecky, and F. Vahid. Dynamic hardware/software partitioning: a first approach. In Proceedings of the 40th Conference on Design Automation, pages 250-255. ACM Press, 2003.
[11] F. Zane, G. Narlikar, and A. Basu. CoolCAMs: Power-efficient TCAMs for forwarding engines. In INFOCOM. IEEE, April 2003.