Performance Evaluation and Cache Behavior of LC-Trie for IP-Address Lookup

Jing Fu
School of Electrical Engineering, KTH, Royal Institute of Technology, SE-100 44 Stockholm, Sweden
Email: [email protected]

Olof Hagsand
Network Operation Center, NADA, KTH, Royal Institute of Technology, SE-100 44 Stockholm, Sweden
Email: [email protected]

Gunnar Karlsson
School of Electrical Engineering, KTH, Royal Institute of Technology, SE-100 44 Stockholm, Sweden
Email: [email protected]

Abstract— Many IP-address lookup software algorithms use a trie-like data structure to perform longest-prefix match. LC-trie is an efficient algorithm that uses level compression and path compression on tries. Using realistic and synthetically generated traces, we study the performance of the LC-trie algorithm. Our study includes trie search depth, prefix vector access behavior, cache behavior, and packet lookup service time. The results show that for a realistic traffic trace, the LC-trie algorithm is capable of performing 20 million packet lookups per second on a Pentium 4, 2.8 GHz computer, which corresponds to a 40 Gb/s link for average-sized packets. Further, the results show that the LC-trie performs up to five times better on the realistic trace than on a synthetically generated network trace. This illustrates that the choice of traces may have a large influence on the results when evaluating lookup algorithms.

I. INTRODUCTION

As packet transmission rates grow, routing of IP packets requires faster IP-address lookups. For each packet, a router needs to extract the destination IP address, make a lookup in a routing table, and transmit the packet on an outgoing interface. Each entry in the routing table consists of a prefix and prefix-length pair. Instead of an exact match, the IP-address lookup algorithm performs a longest-prefix match.

LC-trie is an efficient routing lookup algorithm, proposed by S. Nilsson and G. Karlsson, that uses a modified trie data structure [1]. The algorithm is implemented in Linux kernel 2.6.13. It uses level compression and path compression to reduce the number of trie nodes and the depth of the trie, so a typical IP-address lookup in an LC-trie requires fewer node traversals than in an ordinary trie. Fewer node traversals also lead to fewer memory accesses and instruction executions, which increases the performance of the algorithm.

In this work, we evaluate various aspects of the LC-trie's performance. Our study covers the search depth of LC-trie lookups and the memory access and cache behavior for the trie nodes as well as for the base and prefix vectors. Thereafter, we derive a model to estimate the packet lookup service time. Using this model, we estimate the performance achieved by the LC-trie lookup algorithm for different traces. In addition, we validate the model by performing experiments that measure the packet lookup service time.

In addition to a synthetically generated random network trace, we also use a real trace from FUNET [2]. We argue that the FUNET trace is representative of Internet backbone traffic. By comparing the results from the synthetic network trace and the FUNET trace, we show that the LC-trie performs much better on real data.

The rest of the paper is organized as follows. Section II describes related work. Section III gives a brief introduction to the LC-trie algorithm. Section IV presents the data, including the routing table and associated traffic traces. Section V describes our performance measurement experiments and introduces our model for estimating the packet lookup service time. Section VI presents and analyzes the results from the model and from the experiments. Section VII concludes the paper.

II. RELATED WORK

Current approaches for performing IP-address lookup use both hardware and software solutions. The hardware solutions make use of Content Addressable Memory (CAM) and Ternary Content Addressable Memory (TCAM) to perform IP-address lookup [3], [4], [5]. To achieve higher performance and reduce the time for memory accesses, many hardware solutions, such as [6] and [7], use a pipelined architecture.

There are also many approaches for fast IP-address lookup that are based on software solutions. The majority of these algorithms use a modified version of a trie data structure. Degermark et al. try to fit the data structure into the cache [8]. Srinivasan and Varghese also present a modified trie data structure for performing IP-address lookup [9].

There are several efforts to perform accurate performance measurements of IP-address lookup algorithms. Narlikar and Zane present an analytical model to predict the performance of software-based IP-address lookup algorithms [10]. Kawabe et al. also present a method for predicting IP-address lookup performance based on statistical analysis of Internet traffic [11]. Ruiz-Sanchez et al. survey different route lookup algorithms [12]. In particular, they examined the packet lookup service times of different lookup algorithms. However, they did not analyze the packet lookup service time in detail and used only a random network trace for their study.

III. THE LC-TRIE LOOKUP ALGORITHM

A trie is a general data structure for storing strings [13]. Each string is represented by a leaf node in the trie, and the value of the string corresponds to the path from the root to that leaf. Prefixes in a routing table can be considered as binary strings whose values are stored in leaf nodes. In a longest-prefix match, a given address is looked up by traversing the trie from the root. Every bit in the address represents a path selection: if the bit is zero, the left child is selected; otherwise, the right one is selected.

However, the plain trie data structure is not very efficient: the number of nodes for a whole routing table may be large and the trie may be deep. For IPv4 addresses, the search depth varies from 8 to 32 depending on the prefix length. The search depth is larger for IPv6, which has longer prefixes. To deal with this problem, the LC-trie uses level compression and path compression to reduce the number of nodes and the average depth of the trie. Path compression removes internal nodes with only one child. Level compression replaces the i highest complete levels of a binary trie with a single node of degree 2^i. With these methods, the LC-trie reduces the number of nodes and the depth of the trie. For data from a Bernoulli-type process with character probabilities that are not all equal, the expected average search depth is Θ(log log n), where n is the number of routing entries [14]. In addition, the average search depth is independent of the prefix length.

To further improve performance, the LC-trie may use a weaker criterion for level compression. It uses a fill factor that only requires a fraction of the prefixes to be present to allow level compression. In addition, it allows fixed branching at the root independent of the fill factor. Experiments show that fixed branching to 2^16 children at the root gives the best performance, because most of the prefix lengths are longer than 16.

The routing table consists of four parts. The first part is the LC-trie data structure described above. The second part is a base vector containing the complete strings. The third part is a next-hop vector storing next-hop addresses. Finally, there is a prefix vector that stores proper prefixes of other strings.

When performing a lookup of an IP address, the main part of the search is spent traversing the trie. In a standard setup with fixed branching to 2^16 children at the root and a fill factor of 0.5, the search depth varies from one to five for most routing tables, which corresponds to an equal number of memory references. The next step is to access the base vector by following the pointer from the leaf node; this accounts for an additional memory reference. If the IP address does not match the string in the base vector, lookups in the prefix vector are required. In most cases, the prefix vector is accessed no more than once per lookup. Finally, the lookup algorithm needs to access the next-hop table once to determine the next-hop address.
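To make the traversal concrete, the following sketch shows one possible C representation of a trie node and of the main search loop. The field names, the separate-struct layout, and the extract() helper are our own illustrative assumptions; the reference implementation packs this information more compactly, but the control flow is the same level-compressed descent described above.

#include <stdint.h>

/* Hypothetical node layout: 'branch' is the number of address bits inspected
 * at this node (the node has 2^branch children; branch == 0 marks a leaf),
 * 'skip' is the number of bits removed by path compression, and 'adr' is the
 * index of the first child, or of a base-vector entry for a leaf. */
typedef struct {
    uint32_t branch;
    uint32_t skip;
    uint32_t adr;
} lc_node_t;

/* Extract 'n' bits of 'addr' starting at bit position 'pos' (bit 0 = MSB). */
static inline uint32_t extract(uint32_t pos, uint32_t n, uint32_t addr)
{
    return (addr << pos) >> (32 - n);
}

/* Follow level-compressed branches from the root until a leaf is reached and
 * return its base-vector index. The caller then compares the address against
 * the base-vector string and, on a mismatch, follows its pointer into the
 * prefix vector, as described in the text. */
uint32_t lc_trie_walk(const lc_node_t *trie, uint32_t addr)
{
    const lc_node_t *node = &trie[0];          /* root node */
    uint32_t pos    = node->skip;
    uint32_t branch = node->branch;
    uint32_t adr    = node->adr;

    while (branch != 0) {
        node   = &trie[adr + extract(pos, branch, addr)];
        pos   += branch + node->skip;          /* consume branch and skipped bits */
        branch = node->branch;
        adr    = node->adr;
    }
    return adr;                                /* index into the base vector */
}

With fixed branching to 2^16 children at the root, the first iteration alone consumes 16 address bits, which is consistent with the short search depths reported later in the paper.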

IV. DATA

In order to evaluate the LC-trie algorithm, the following input data is required: (1) a routing table consisting of prefixes, and (2) a trace of destination IP addresses. In the following subsections, the routing table and the associated traces used in the experiments are described and characterized.

A. Routing Table

We used a snapshot of a routing table from the Helsinki core router provided by the Finnish University Network (FUNET). The routing table is also available at CSC's FUNET looking glass page [2]. The routing table was downloaded in May 2005 and contains 160129 distinct routing entries. Duplicate routing entries that store multi-path routes to the same prefix were removed for simplicity. The prefix lengths of the entries vary from 8 to 32. Fig. 1 shows the distribution of the prefix lengths. As can be seen from the figure, the most common prefix length is 24, and more than 99.8% of the prefixes have a length between 16 and 24.

Fig. 1. Prefix length distribution for the FUNET routing table (number of entries per prefix length).

We built the LC-trie routing table from the FUNET routing table, using fixed branching to 2^16 children at the root and a fill factor of 0.5. The sizes of the four parts of the routing table are shown in Table I. The total size of the table is 3.45 MB.

TABLE I
THE FUNET ROUTING TABLE.

Site             Routing entries   LC-trie (B)   Base vector (B)   Prefix vector (B)   Next-hop vector (B)
FUNET Helsinki   160129            324677 × 4    146676 × 16       13453 × 12          10 × 4

B. FUNET Trace

The first trace we used is a real packet trace containing more than 20 million packets with destination addresses. It was captured on one of the interfaces of FUNET's Helsinki core router. The router is located close to Helsinki University of Technology and carries FUNET's international traffic. The interface where the trace was recorded is connected to the FICIX2 exchange point, which is part of the Finnish Communication and Internet Exchange (FICIX). It peers with other operators in Finland. In addition to universities in Finland, some student dormitories are also connected through FUNET. The trace itself is confidential due to the sensitive nature of the packets. We were unable to find any publicly available trace on the Internet containing destination IP addresses.

C. Random Network Trace

The second trace has been synthetically generated by choosing prefixes randomly from a uniform distribution over the 160129 routing entries in the routing table. The length of the generated trace is equal to the length of the FUNET trace. In such a trace, every entry in the routing table has an equal probability of being chosen. However, as the table may contain prefixes that are more specific than others, this introduces a slight non-uniformity, but we believe this to be of minor importance. As an alternative, the trace could be generated by choosing IP addresses randomly over the IP address space. Such a trace would give a higher probability to shorter prefixes and higher locality in the lookups.
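As a minimal sketch of how such a trace can be generated (the entry layout, the use of rand(), and the choice to randomize the remaining host bits are our own assumptions; padding the chosen prefix with zeros would be an equally valid choice):

#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint32_t prefix;   /* prefix value, left-aligned in a 32-bit word */
    uint32_t len;      /* prefix length, 8..32 */
} route_entry_t;

/* Draw one synthetic destination address: pick a routing entry uniformly at
 * random and fill the remaining host bits with random values. Assumes
 * RAND_MAX is large enough to index all entries. */
uint32_t random_trace_address(const route_entry_t *table, size_t n_entries)
{
    const route_entry_t *e = &table[(size_t)rand() % n_entries];
    uint32_t host = (e->len == 32) ? 0 : ((uint32_t)rand() & (0xFFFFFFFFu >> e->len));
    return e->prefix | host;
}

Repeating this draw as many times as there are packets in the FUNET trace yields a synthetic trace of equal length, as described above.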

Fig. 2. Traffic distribution by prefix: cumulative fraction of traffic versus the number of most popular prefixes (1 to 160129), for the random network trace and the FUNET trace.

D. Trace Characterization

Fig. 2 shows the fraction of destination IP addresses captured by the most popular prefixes. As can be seen from the figure, the FUNET trace is heavily skewed towards a few popular prefixes: 40% of the destination addresses are captured by the 30 most popular prefixes and 80% of the destination addresses are captured by the 1000 most popular prefixes. In addition, all destination addresses are captured by 100000 prefixes, even though there are 160129 prefixes in total. For the random network trace, the 1000 most popular prefixes cover only 0.6% of the destination addresses. Thus, the traces are very different. In the FUNET trace, the destination addresses are concentrated to a small number of networks, while the destination addresses in the random network trace are evenly distributed.

Fig. 3 shows the prefix length distribution for the FUNET lookups. We have scaled the number of lookups to 160129, which corresponds to the number of distinct routing entries. For the random network trace, the prefix length distribution is similar to that shown in Fig. 1. As can be seen from these figures, the prefix length for most of the FUNET lookups is 16, while the prefix length for most of the random network lookups is 24. The differences in prefix length for the two traces may result in differences in trie search depth, which further affects the lookup performance.

Fig. 3. Prefix length distribution for the FUNET lookups (number of lookups per prefix length).

V. PERFORMANCE MEASUREMENT EXPERIMENTS

We performed a series of experiments to study the behavior of the LC-trie lookup algorithm and its performance. First, we were interested in the average search depth and the search depth distribution in the LC-trie lookups.

A. Search Depth

The search depth in a trie has an impact on the number of memory references and the number of executed instructions; in turn, it affects the packet lookup service time. To perform the experiments, the LC-trie lookup code [2] was modified to record each of the trie node references in the lookups. We ran the modified code with the FUNET routing table and the two traces. Using the output from the experiments, we could calculate the average search depths and search depth distributions for the two traces.

B. Prefix Vector Reference

We were also interested in how often the prefix vector is referenced. Fewer prefix vector references reduce the number of memory references and executed instructions, which decreases the packet lookup service time. In the experiments, we modified the LC-trie source code to record each of the prefix vector references in the lookups.
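As an illustration of the kind of instrumentation used, the sketch below shows counters that could be incremented inside the lookup code; the structure and hook names are our own, not the actual modification.

/* Per-run statistics collected by the instrumented lookup code. */
struct lookup_stats {
    unsigned long long lookups;        /* number of lookups performed */
    unsigned long long node_refs;      /* trie node references */
    unsigned long long prefix_refs;    /* prefix vector references */
    unsigned long long depth_hist[33]; /* search depth distribution */
};

static struct lookup_stats stats;

/* Call once per trie node visited during a lookup. */
static inline void count_node_ref(void)    { stats.node_refs++; }

/* Call once per prefix vector entry examined. */
static inline void count_prefix_ref(void)  { stats.prefix_refs++; }

/* Call at the end of each lookup with the search depth reached. */
static inline void count_lookup(unsigned depth)
{
    stats.lookups++;
    if (depth < 33)
        stats.depth_hist[depth]++;
}

Dividing node_refs and prefix_refs by lookups then gives the averages D_s and D_p used in the model of Section V-D.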

C. Cache Behavior

We also looked at the cache behavior when performing LC-trie lookups. This includes the cache behavior for the trie nodes, the prefix vector, and the base vector. With current cache and memory technology, a cache reference requires about 5 ns, while a main-memory reference requires about 60 ns. So, depending on whether a data item can be found in the cache or not, the access time for the data item can be very different. This may have a large impact on the packet lookup service time, because a significant part of the packet lookup time is spent on data accesses.

We have designed a cache model and implemented two cache replacement policies. The cache model supports a direct-mapped cache with a line size of 16 bytes and two cache replacement policies. In addition, the cache model supports multiple cache levels, since most computers today have two cache levels. The supported cache replacement policies are the random replacement policy and the least recently used (LRU) policy. The random replacement policy simply removes a randomly chosen cache item when the cache is full. The LRU policy records the latest time an item was accessed and removes the least recently used item when the cache is full. In the experiments, we used the trie node, base vector, and prefix vector references as input data. By varying the cache size and cache replacement policy, we obtained the cache behavior for these data structures.

D. Packet Lookup Service Time

We were also interested in the packet lookup service time for different packet traces. As the packet lookup service time depends on the computer architecture, we derived a general model to calculate it. In addition, we validated our results by performing experiments on a computer and compared the results obtained from the model and from the experiments. The LC-trie implementation in C is sequential, and the average packet lookup service time T_s is the sum of the average instruction execution time T_i and the average data access time T_d.

1) Average Instruction Execution Time: T_i can be calculated as the product of the average number of instructions in a lookup, N_i, and the time required to execute a single instruction, T_n. N_i depends on the average trie search depth D_s and the average number of prefix vector references D_p. The formula for calculating N_i is

N_i = D_s N_t + D_p N_p + N_c,   (1)

where N_t is the number of instructions spent in a trie node, N_p is the number of instructions spent in a prefix vector entry, and N_c is the count of other instructions executed in the lookup function. The LC-trie lookup source code was compiled using the gcc compiler with the highest optimization level (-O4). Afterwards, we used the text output of the compiled assembler instructions to calculate N_t, N_p, and N_c.
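The miss percentages used in the data-access term below come from the cache model of Section V-C. A minimal sketch of such a simulator for one direct-mapped level with 16-byte lines is shown here; the interface and sizes are our own assumptions (in a direct-mapped cache each block maps to a single line, so the replacement policies mentioned above only come into play for associative configurations).

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LINE_SIZE 16u   /* cache line size in bytes, as in the cache model */

typedef struct {
    uint32_t *tags;                      /* one tag per line, UINT32_MAX = empty */
    size_t n_lines;
    unsigned long long accesses, misses;
} dm_cache_t;

static void dm_cache_init(dm_cache_t *c, size_t cache_bytes)
{
    c->n_lines = cache_bytes / LINE_SIZE;
    c->tags = malloc(c->n_lines * sizeof *c->tags);
    memset(c->tags, 0xFF, c->n_lines * sizeof *c->tags);
    c->accesses = 0;
    c->misses = 0;
}

/* Simulate one data reference; returns 1 on a miss, 0 on a hit. */
static int dm_cache_access(dm_cache_t *c, uint32_t addr)
{
    size_t   index = (addr / LINE_SIZE) % c->n_lines;
    uint32_t tag   = addr / (LINE_SIZE * (uint32_t)c->n_lines);

    c->accesses++;
    if (c->tags[index] != tag) {         /* miss: fetch the line */
        c->tags[index] = tag;
        c->misses++;
        return 1;
    }
    return 0;
}

Feeding the recorded trie, base-vector, and prefix-vector reference streams through two such levels (L1 and L2) yields the per-structure miss percentages used in the formulas below.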

2) Average Data Access Time: T_d can be calculated as the sum of the access times spent in the L1 cache, the L2 cache, and main memory:

T_d = N_1 T_1 + N_2 T_2 + N_m T_m,   (2)

where N_1, N_2, and N_m are the average numbers of accesses to the L1 cache, the L2 cache, and main memory, while T_1, T_2, and T_m are the access latencies of the L1 cache, the L2 cache, and main memory. For all memory references, the L1 cache is accessed first. Therefore, N_1 is also the number of memory references and can be calculated as the sum of the trie, prefix vector, and base vector references:

N_1 = D_s + D_p + 1.   (3)

The average number of accesses to the L2 cache, N_2, is the same as the number of L1 cache misses. It can be calculated as the sum of the L1 cache misses in the trie, the prefix vector, and the base vector:

N_2 = D_s P_1t + D_p P_1p + P_1b,   (4)

where P_1t is the L1 cache miss percentage for the trie nodes, P_1p is the L1 cache miss percentage for the prefix vector, and P_1b is the L1 cache miss percentage for the base vector. In a similar way, we have the formula for N_m:

N_m = D_s P_2t + D_p P_2p + P_2b,   (5)

where P_2t is the L2 cache miss percentage for the trie nodes, P_2p is the L2 cache miss percentage for the prefix vector, and P_2b is the L2 cache miss percentage for the base vector. By knowing the number of cache and memory references and their access latencies, we derive T_d according to (2), (3), (4), and (5). By adding T_d and T_i, we obtain the average packet lookup service time T_s.

3) Validation: Finally, we validate our results by performing experiments on a Pentium 4 computer with a clock frequency of 2.8 GHz running Windows XP. If we assume that the computer performs one instruction per clock cycle, then the time required to perform one instruction is 0.36 ns. The computer has two cache levels, an 8 kB L1 data cache and a 512 kB L2 cache. In addition, it has 512 MB of main memory. According to the hardware specification, the access latencies of the L1 cache, L2 cache, and main memory are 3 ns, 8 ns, and 60 ns respectively. When measuring the packet lookup service time, the system clock was read before and after the lookup code.
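The model lends itself to a direct calculation. The sketch below evaluates (1)-(5) with the constants used in this study (the instruction counts N_t = N_p = 25 and N_c = 49 reported in Section VI, T_n = 0.36 ns, and the latencies T_1 = 3 ns, T_2 = 8 ns, T_m = 60 ns); the function and structure names are our own.

#include <stdio.h>

/* Per-trace inputs: average search depth D_s, average number of prefix
 * vector references D_p, and the measured miss ratios (0..1) of the trie
 * nodes (t), prefix vector (p) and base vector (b) in the L1 and L2 caches. */
typedef struct {
    double Ds, Dp;
    double P1t, P1p, P1b;   /* L1 miss ratios */
    double P2t, P2p, P2b;   /* L2 miss ratios */
} trace_params_t;

#define NT 25.0    /* instructions per trie node (Section VI-D) */
#define NP 25.0    /* instructions per prefix vector entry */
#define NC 49.0    /* other instructions per lookup */
#define TN 0.36    /* ns per instruction: one instruction per cycle at 2.8 GHz */
#define T1 3.0     /* L1 access latency, ns */
#define T2 8.0     /* L2 access latency, ns */
#define TM 60.0    /* main memory access latency, ns */

/* Evaluate the service-time model, equations (1)-(5). */
double lookup_service_time_ns(const trace_params_t *m)
{
    double Ni = m->Ds * NT + m->Dp * NP + NC;               /* (1) instructions */
    double Ti = Ni * TN;                                    /* instruction time */

    double N1 = m->Ds + m->Dp + 1.0;                        /* (3) all references */
    double N2 = m->Ds * m->P1t + m->Dp * m->P1p + m->P1b;   /* (4) L1 misses */
    double Nm = m->Ds * m->P2t + m->Dp * m->P2p + m->P2b;   /* (5) L2 misses */
    double Td = N1 * T1 + N2 * T2 + Nm * TM;                /* (2) data access time */

    return Ti + Td;                                         /* T_s = T_i + T_d */
}

For example, plugging in the FUNET trace's D_s = 1.42 and D_p = 0.29 together with its measured miss ratios for a 512 kB L2 cache should reproduce the roughly 51 ns reported in Section VI.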

VI. RESULTS AND ANALYSIS

In this section, we present and analyze the results of the experiments.

TABLE II
AVERAGE TRIE SEARCH DEPTHS AND SEARCH DEPTH DISTRIBUTIONS.

                  D_s    Percentage of lookups with search depth
                         1       2       3       4
Random network    2.61   4.8%    37.5%   50.7%   5.7%
FUNET trace       1.42   63.9%   30.2%   5.5%    0.4%

TABLE III
PREFIX VECTOR REFERENCE: AVERAGE AND PERCENTAGE OF LOOKUPS WITH DIFFERENT NUMBER OF REFERENCES.

                  D_p     Number of prefix vector references
                          0       1       2
Random network    0.045   95.5%   4.4%    0.05%
FUNET trace       0.285   71.8%   27.9%   0.3%

A. Search Depth

As can be seen from Table II, there is a large difference in search depth between the random network trace and the FUNET trace. The average search depth D_s for the random network trace is nearly twice as high as for the FUNET trace. The percentage of lookups with a specific search depth is also shown in Table II. As can be seen from the table, 63.9% of the FUNET trace lookups have a search depth of one, resulting in a very short search depth in this case. In the random network trace, however, the average search depth is 2.61 and most of the lookups have search depths of two or three.

Fig. 4. Cache behavior for the random network trace, random replacement policy (miss percentage versus cache size, 8 kB to 2 MB, for the base vector, prefix vector, and trie nodes at levels 1-4).

B. Prefix Vector Reference

Table III shows the prefix vector access behavior. As can be seen from the table, the prefix vector is not accessed often: for the random network trace and the FUNET trace, the prefix vector is not accessed in 95.5% and 71.8% of the lookups, respectively. Further, the prefix vector is rarely accessed twice in a lookup.

Fig. 5. Cache behavior for the FUNET trace, random replacement policy.

C. Cache Behavior

The cache behavior for the different traces is shown in Figs. 4 to 7. The cache size has been varied from 8 kB to 2 MB. The cache-miss percentages for the trie nodes, prefix vector, and base vector are shown on the y-axis, as are the cache-miss percentages for the different levels of trie nodes. As can be seen from the figures, the FUNET trace has a much lower cache-miss percentage than the random network trace. If a 512 kB L2 cache is assumed, the average trie-node cache-miss percentage for the FUNET trace is 3% for the LRU replacement policy and 6% for the random replacement policy. For the random network trace, the trie-node cache-miss percentage is 30% for the LRU replacement policy and 50% for the random replacement policy. There are even larger differences in cache-miss percentages for the base vector and the prefix vector: 4% and 1%, respectively, for the FUNET trace, while for the random network trace the miss percentage is approximately 90% for both vectors.

Fig. 6. Cache behavior for the random network trace, LRU replacement policy.

Fig. 7. Cache behavior for the FUNET trace, LRU replacement policy.


It can also be seen that the LRU replacement policy gives a slightly lower cache-miss percentage than the random replacement policy, especially for the FUNET trace. This behavior is most likely caused by the traffic locality in the FUNET trace. If we look at the trie nodes, we observe that low-level nodes have smaller cache-miss percentages, most likely because they are accessed more often in the lookups. This behavior further reduces the lookup time for lookups with short search depths. Finally, as we would expect, a larger cache size leads to a lower cache-miss percentage.

D. Packet Lookup Service Time

In this section, we first present the average instruction execution time T_i and the average data access time T_d. Afterwards, we analyze the average packet lookup service time T_s obtained from the model and from the experiments.

1) Average Instruction Execution Time: From the assembly output, we calculated N_t to be 25, N_p to be 25, and N_c to be 49. If we set the execution time for a single instruction, T_n, to 0.36 ns on a Pentium 4, 2.8 GHz computer, then the average instruction execution time T_i is as shown in Table IV. As can be seen from the table, the random network trace requires 41 ns on average per packet lookup, compared to 31 ns for the FUNET trace.

2) Average Data Access Time: Equation (2) shows that the average data access time T_d depends on the number of cache and memory references and on the access latencies. The number of cache and memory references depends on the cache size and replacement policy.

TABLE IV
AVERAGE INSTRUCTION EXECUTION TIME.

                  D_s    D_p     N_i   T_i (ns)
Random network    2.61   0.045   114   41
FUNET trace       1.42   0.29    85    31

Fig. 8. Average data access time, T_d, versus L2 cache size (64 kB to 2 MB) for the FUNET trace and the random network trace.

In the experiments, all configurations have a fixed 8 kB L1 cache and use the random replacement policy; the size of the L2 cache is varied. The access latencies T_1, T_2, and T_m are set to 3 ns, 8 ns, and 60 ns respectively, according to the hardware specification of the Pentium 4 computer. Fig. 8 shows the average data access time T_d for the two traces. As can be seen from the figure, T_d decreases as the cache size increases. Also, T_d is much higher for the random network trace than for the FUNET trace, which is explained by the lower cache-miss percentage and the shorter search depth of the FUNET trace.

3) Validation: The average packet lookup service time T_s according to the model is shown in Fig. 9. The figure is quite similar to Fig. 8, which shows the average data access time T_d: we have simply added 31 ns for the FUNET trace and 41 ns for the random network trace. So, T_s is still higher for the random network trace than for the FUNET trace.

Fig. 9. Average packet lookup service time, T_s, versus L2 cache size (64 kB to 2 MB) for the FUNET trace and the random network trace.

TABLE V
AVERAGE PACKET LOOKUP SERVICE TIME, L2 CACHE SIZE IS 512 kB.

                  T_s model (ns)   T_s experiment (ns)
Random network    210              286
FUNET trace       51               49

As we only performed experiments on a Pentium 4 computer with a 512 kB L2 cache, the comparison of the results shown in Table V only covers the setup with a 512 kB L2 cache. Also, the T_s obtained from the experiments is the shortest running time among 20 experiments; the longest running times are about 19 ns and 5 ns higher for the random network trace and the FUNET trace, respectively. As can be seen from the table, the values of T_s obtained from the model and from the experiments are quite close. For the FUNET trace, the difference is only 2 ns. For the random network trace, the difference is about 70 ns. The differences are acceptable since we did not model the Pentium 4 computer in detail; many factors, such as branch prediction and instruction cache misses, may cause this discrepancy.

Another observation is that T_s for the random network trace is about five times higher than for the FUNET trace. In particular, out of the total 210 ns required for the random network trace, about 41 ns are used for instruction execution and 169 ns for data access. Hence, the best way to improve the performance is to reduce the data access time by using a larger cache. For the FUNET trace, out of the 51 ns, 31 ns are used for instruction execution and only 20 ns for data access. Therefore, the impact of a larger cache on the packet lookup service time is lower.

We have compared our packet lookup service times with the results reported by Ruiz-Sanchez et al. [12]. Their average lookup time for a random network trace is 1200 ns, while ours is 286 ns. However, they used a 200 MHz Pentium computer with a 512 kB L2 cache, while we used a Pentium 4 computer at 2.8 GHz. Even though the cache sizes are similar, our clock frequency is 14 times higher, which explains the large performance difference. In addition, their results show that the LC-trie has a rather short packet lookup service time compared to other lookup algorithms.

VII. CONCLUSION

In this work, we have performed a thorough investigation of LC-trie performance. The performance measurements include trie search depth, prefix vector access behavior, cache behavior, and packet lookup service time. The results show that the LC-trie requires only about 50 ns to perform a lookup for the FUNET traffic on a Pentium 4, 2.8 GHz computer. This is approximately 20 million lookups per second and corresponds to an average throughput of 40 Gb/s for average-sized packets.

We used the FUNET trace, which is a realistic and representative trace of aggregated Internet traffic. It contains both university and student dormitory traffic; this is a typical example of aggregated traffic that contains both office and home traffic. The high throughput of 40 Gb/s on the FUNET trace indicates that a standard PC is capable of performing route lookup for commercial high-speed links. Of course, other limitations, such as the capability of the PC's forwarding functionality, may set the limit. The possibility to predict the performance based on our model might be useful now that the LC-trie is implemented in Linux kernel 2.6.13.

We observed large performance differences between the random network trace and the FUNET trace. We conclude that it is especially important to use a proper trace and to understand the characteristics of the trace when evaluating route lookup algorithms.

ACKNOWLEDGMENT

The authors would like to thank Markus Peuhkuri at the Networking Laboratory of Helsinki University of Technology for providing the FUNET routing table and packet traces. The implementation of the LC-trie for Linux was made by Robert Olsson and Jens Låås from SLU, the Swedish University of Agricultural Sciences, and Hans Liss, Uppsala University, Sweden.

REFERENCES

[1] S. Nilsson and G. Karlsson, "IP-address lookup using LC-tries," IEEE Journal on Selected Areas in Communications, vol. 17, no. 6, pp. 1083-1092, June 1999.
[2] CSC, Finnish IT Center for Science, FUNET Looking Glass. Available: http://www.csc.fi/sumoi/funet/noc/looking-glass/lg.cgi
[3] A. McAuley and P. Francis, "Fast routing table lookups using CAMs," in Proc. of IEEE Infocom'93, vol. 3, pp. 1382-1391, San Francisco, 1993.
[4] F. Zane, N. Girija, and A. Basu, "CoolCAMs: Power-efficient TCAMs for forwarding engines," in Proc. of IEEE Infocom'03, pp. 42-52, San Francisco, May 2003.
[5] E. Spitznagel, D. Taylor, and J. Turner, "Packet classification using extended TCAMs," in Proc. of ICNP'03, pp. 120-131, Nov. 2003.
[6] P. Gupta, S. Lin, and N. McKeown, "Routing lookups in hardware at memory access speeds," in Proc. of IEEE Infocom'98, pp. 1240-1247, San Francisco, Apr. 1998.
[7] A. Moestedt and P. Sjödin, "IP address lookup in hardware for high-speed routing," in Proc. of Hot Interconnects VI, Stanford, 1998.
[8] M. Degermark et al., "Small forwarding tables for fast routing lookups," in Proc. of ACM SIGCOMM'97, pp. 3-14, Oct. 1997.
[9] V. Srinivasan and G. Varghese, "Faster IP lookups using controlled prefix expansion," in Proc. of ACM SIGMETRICS 1998, Madison, pp. 1-10, 1998.
[10] G. Narlikar and F. Zane, "Performance modeling for fast IP lookups," in Proc. of ACM SIGMETRICS 2001, pp. 1-12, 2001.
[11] R. Kawabe, S. Ata, and M. Murata, "Performance prediction method for IP lookup algorithms," in Proc. of IEEE Workshop on High Performance Switching and Routing 2002, pp. 111-115, Kobe, Japan, May 2002.
[12] M. A. Ruiz-Sanchez, E. W. Biersack, and W. Dabbous, "Survey and taxonomy of IP address lookup algorithms," IEEE Network Magazine, vol. 15, no. 2, pp. 8-23, Mar. 2001.
[13] E. Fredkin, "Trie memory," Communications of the ACM, pp. 490-499, 1960.
[14] A. Andersson and S. Nilsson, "Faster searching in tries and quadtrees - an analysis of level compression," in Proc. of Second Annual European Symposium on Algorithms, pp. 82-93, 1994.
