IEEE COMMUNICATIONS LETTERS, VOL. 18, NO. 5, MAY 2014


Energy Efficient Exact Matching for Flow Identification with Cuckoo Affinity Hashing

P. Reviriego, S. Pontarelli, and J. A. Maestro

Abstract—Energy efficiency has become an important design goal for networking equipment. Traditionally, routers and switches have been designed to minimize peak power consumption, but they operate most of the time with settings and traffic that are far from that peak. Therefore, many elements and functions of networking equipment are being redesigned to improve energy efficiency. A common function in networking is flow identification, which is needed in many applications. Flow identification can be implemented with Content Addressable Memories (CAMs) or, alternatively, with several data structures. Among those, one efficient option is Cuckoo hashing, which enables fast searches and high memory utilization at the cost of complicating the insertion procedure. In this letter, the energy efficiency of exact matching using Cuckoo hashing is first analyzed, and then a technique is presented to improve the energy efficiency of Cuckoo hashing. The proposed scheme is evaluated using a traffic monitoring application and compared with traditional Cuckoo hashing. The results show that significant energy savings can be obtained by using the proposed technique.

Index Terms—Cuckoo hashing, flow identification, traffic monitoring.

Manuscript received March 10, 2014. The associate editor coordinating the review of this letter and approving it for publication was A. Vinel. This work was supported by the Spanish Ministry of Science and Education under Grant AYA2009-13300-C03. This letter is part of a collaboration in the framework of COST ICT Action 1103 ‘Manufacturable and Dependable Multicore Architectures at Nanoscale’. P. Reviriego and J. A. Maestro are with the Universidad Antonio de Nebrija, C/ Pirineos, 55 E-28040, Madrid, Spain (e-mail: {previrie, jmaestro}@nebrija.es). S. Pontarelli is with Consorzio Nazionale Interuniversitario per le Telecomunicazioni, CNIT, Via del Politecnico 1 - 00133 Rome, Italy (e-mail: [email protected]). Digital Object Identifier 10.1109/LCOMM.2014.040214.140506

I. INTRODUCTION

In the last decade, energy efficiency has become an important design goal for networking equipment [1],[2]. The energy consumption of routers and other communication equipment has traditionally been almost independent of the traffic load and configuration [3]. This results in a large inefficiency because 1) the traffic load typically presents large variations over time and 2) the router configuration is also far from the worst case (for example, the number of entries in the routing table is lower than the maximum value). Therefore, many efforts have focused on designing network equipment that adapts its energy consumption to the traffic load and configuration [4].

One function that is widely used in networking is flow identification, which is employed for purposes such as routing, security, quality of service or traffic monitoring [5],[6],[7]. Flow identification is a particular case of exact matching. In exact matching, a group of fields or bits of the packets is compared to a set of stored values to find a match. Exact matching can be directly implemented using a Content Addressable Memory (CAM) [8]. Another option is to use data structures that enable efficient search operations, such as different types of multiple hash tables [9]. Among those structures, Cuckoo hashing [10] provides efficient search operations and high memory utilization at the cost of making the insertion procedure more complex. This makes it attractive for high performance packet processing platforms [11],[12].

The energy consumption required for exact matching depends mainly on two factors: the number of packets that have to be matched and the size of the set against which the match is performed. The first factor is directly related to the traffic load and, more precisely, to the number of packets per second that are sent or received. The second factor depends on how exact matching is implemented. In the case of data structures, most of the energy consumption is caused by the memory accesses, and those typically depend on the size of the stored set. This second factor can be estimated with the average number of memory accesses needed to perform flow identification. The total energy consumption is then given by the number of packets times the average number of memory accesses per flow identification. Therefore, a reduction in the number of accesses required for flow identification maps directly to the same relative reduction in the total number of accesses, and hence in energy consumption.

In this letter two contributions are presented. The first one is a study of the average number of memory accesses of exact matching using Cuckoo hashing in a traffic monitoring application. This analysis shows the energy efficiency versus the set size for traditional Cuckoo hashing. The second contribution is a new scheme, Cuckoo affinity hashing, that modifies Cuckoo hashing to improve its energy efficiency. The proposed scheme is compared to traditional Cuckoo hashing, and the results show that significant energy savings are achieved for a wide range of set sizes.
The rest of the letter is organized as follows. Section II gives the basic preliminaries on Cuckoo hashing. Section III presents the analysis of the energy efficiency of Cuckoo hashing in a traffic monitoring application. The proposed technique, Cuckoo affinity hashing, is described and evaluated in Section IV. Finally, the conclusions and ideas for future work are summarized in Sections V and VI.

II. PRELIMINARIES

The data structure used in Cuckoo hashing is a set of d hash tables such that an element x can be placed in tables 1, 2, ..., d in positions h1(x), h2(x), ..., hd(x) given by a set of d hash functions. The following operations can be performed on the structure:

• Match: the table given by i = 1 is selected and position hi(x) is accessed and compared with x. If there is no match, the second table is selected and the process is repeated. If no match is found in that table, the third table is selected, and the search continues until a match is found or all d tables have been searched.

• Insertion: the table given by i = 1 is selected and position hi(x) is accessed. If it is empty, the new element is inserted there. If not, the second table is selected and the process is repeated. If the position in that table is not empty, the third table is selected, and the search continues until a table with an empty position is found or all d tables have been searched. At that point a random table j is selected and the new element x is stored in position hj(x). Then, the insertion process is executed for the entry y that was displaced when inserting x, but without considering table j as an option for insertion. This procedure is recursive and tries to move elements to accommodate the new element if needed.

• Removal: the same operation as a match, but when the element is found it is removed.

Several architectural options can be used to implement Cuckoo hashing. The most direct implementation is to store the d hash tables in a single memory and perform the access operations sequentially. Another option is to have d memories such that each table is stored in a different memory. In that case, to minimize power consumption, it is common to use a pipeline to access the memories. The rest of the letter assumes that a sequential architecture is used, as it is the simplest implementation that serves to illustrate the energy savings. The results obtained are also applicable to the parallel pipeline implementation.

III. ENERGY EFFICIENCY OF CUCKOO HASHING

As mentioned in the introduction, the design of networking equipment has traditionally focused on the worst case. For Cuckoo hashing this corresponds to having to access all the tables and therefore requires d memory accesses. However, to evaluate the energy efficiency, the average number of memory accesses is needed.
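The match, insertion and removal operations of Section II can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the C implementation used in the letter: the class name CuckooHash, the salted use of Python's built-in hash() in place of the d hash functions, the iterative eviction loop and the max_kicks bound are all introduced here for illustration. Keys are assumed to be distinct and d is assumed to be at least 2.

```python
import random


class CuckooHash:
    """Sketch of d-table Cuckoo hashing with sequential table access."""

    def __init__(self, d=4, m=1024, max_kicks=500, seed=0):
        self.d, self.m, self.max_kicks = d, m, max_kicks
        self.tables = [[None] * m for _ in range(d)]
        self.rng = random.Random(seed)
        # One salt per table stands in for the hash functions h_1..h_d (assumption).
        self.salts = [self.rng.getrandbits(64) for _ in range(d)]

    def _pos(self, i, x):
        # Position of x in table i (0-based table index).
        return hash((self.salts[i], x)) % self.m

    def match(self, x):
        """Return (found, number_of_memory_accesses)."""
        for i in range(self.d):
            if self.tables[i][self._pos(i, x)] == x:
                return True, i + 1
        return False, self.d

    def insert(self, x):
        """Insert x, displacing stored elements if needed.

        Returns False after max_kicks displacements; in that case the
        element held at that moment is dropped (a simplification).
        """
        banned = None  # table the current element was just evicted from
        for _ in range(self.max_kicks):
            candidates = [i for i in range(self.d) if i != banned]
            for i in candidates:
                p = self._pos(i, x)
                if self.tables[i][p] is None:
                    self.tables[i][p] = x
                    return True
            # All candidate positions are full: evict from a random table j.
            j = self.rng.choice(candidates)
            p = self._pos(j, x)
            x, self.tables[j][p] = self.tables[j][p], x
            banned = j
        return False

    def remove(self, x):
        """Same search as match, but clear the slot when x is found."""
        for i in range(self.d):
            p = self._pos(i, x)
            if self.tables[i][p] == x:
                self.tables[i][p] = None
                return True
        return False
```

The second value returned by match() is the number of tables read before the search stops, which is the quantity whose average is studied in the rest of this section.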
The average number of memory accesses varies with the occupancy of the tables and, to the best of our knowledge, it has not been studied in the literature. In this section the average number of accesses needed for Cuckoo hashing in a traffic monitoring application is studied.

A. Traffic monitoring application

Monitoring plays a fundamental role in communication networks, where it is useful for detecting problems, for network planning and for security, among other applications [13]. For IP networks, a typical monitoring application identifies traffic flows (usually defined by the 5-tuple composed of source and destination addresses, source and destination ports, and the protocol field) and counts the number of packets and bytes for each flow [14],[15]. The network device (router, switch, etc.) that acts as a monitor inserts a new flow record into the hash table if a packet does not belong to an existing flow. The flow record is removed when some specific conditions occur (e.g., a packet belonging to a flow contains flags signaling the end of the transmission, or no packet belonging to the flow arrives for a certain time interval). Incoming packets are matched against the set of stored flows, and an exact match is found in most cases, with the exception of the first packet of each flow. Since flows usually have many packets, the successful match operations represent a fraction that is close to 100%. Therefore, to simplify the analysis, it is assumed in the following that all match operations are successful.

To implement the traffic monitoring described, flow identification, that is, exact matching of incoming packets to the set of active flows, is needed. This can be implemented using Cuckoo hashing. The size of the tables has to be sufficient to store the maximum number of active flows. Therefore, they are dimensioned for a peak traffic condition. However, most of the time the traffic load is much lower and fewer flows are stored [16]. As an example, figure 1 shows the number of active flows over a one week period for a 10 Gb/sec Internet link [17]. It can be observed that the number varies widely over time, with a maximum value close to 150k flows. However, most of the time the link has fewer than 30k active flows. Most network links experience these variations. In addition to the variations, it can be observed that the link stays most of the time at medium or low values. This clearly suggests that, to optimize the energy efficiency of flow identification, performance at medium table occupancies is important.

Fig. 1. Number of active flows during a week for a 10 Gb/sec link [17].

B. Energy efficiency analysis

The energy consumption of flow identification is related, as mentioned before, to the number of packets that have to be matched and to the number of stored flows. The first factor determines the number of matches and the second the complexity of the matching. The number of packets and the number of active flows are given by traffic conditions and cannot be influenced. On the other hand, the variation of the energy consumption with the set size for matching depends on the data structure used and how it is implemented. This is therefore the factor that can be influenced by router or switch design and implementation.
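The relation between traffic, table occupancy and energy described above can be written compactly. The following is a sketch under stated assumptions rather than a formula taken from the letter: the symbols N_pkt (number of packets matched), alpha (table occupancy), p_j(alpha) (probability that a match is found on access j), A-bar(alpha) (average number of accesses per match) and E_acc (energy per memory access, assumed constant) are introduced here only for illustration. All matches are assumed successful, a sequential implementation is assumed, and static memory consumption is ignored.

```latex
\bar{A}(\alpha) = \sum_{j=1}^{d} j \, p_j(\alpha),
\qquad
E_{\text{total}} \approx N_{\text{pkt}} \cdot \bar{A}(\alpha) \cdot E_{\text{acc}}
```

Under this model N_pkt and alpha are fixed by the traffic, so the only term a design can lower is the average number of accesses per match.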
The average consumption per match is directly related to the average number of memory accesses needed to find a match. Assuming a sequential implementation, once a match is found at access j the remaining d − j accesses are no longer needed. The probability of finding a match on access j depends on the number of elements stored. The average number of accesses has been evaluated by simulation using a C implementation of Cuckoo hashing. For each occupancy factor, the simulation first inserts a number of items corresponding to the occupancy factor under analysis and then performs 1000 match operations on the set of items stored in the hash table. For each occupancy factor, 10000 simulation runs have been performed and the average is reported. The simulations have been done for a traditional Cuckoo hash with d = 2, 4, 8, hash tables of size m = 32k, and varying table occupancy.

The maximum table occupancy that can be achieved in Cuckoo hashing depends on the number of tables (d). This was observed in the simulations: the maximum occupancy only reaches 50.8% for d = 2, increases to 97.6% for d = 4, and finally is 99.9% for d = 8. The results in terms of the average number of memory accesses are presented in figures 2 to 4 (in figure 2 the plot stops at 50%, which is close to the maximum occupancy that can be achieved). It can be observed that as occupancy (or, equivalently, the number of active flows) grows, so does the average number of accesses. Therefore, exact matching with Cuckoo hashing adjusts its energy consumption to the traffic conditions. The results also show that the average number of accesses increases with d. This shows a trade-off between memory occupancy and number of accesses. Typically d = 4 is used to achieve good memory occupancy with a moderate number of accesses [11].

In the analysis so far, it has been assumed that the energy consumption is mostly due to the memory accesses. Energy is also consumed in the computation of the hash functions. However, this factor does not influence the results because the number of hash computations is proportional to the number of memory accesses, so it does not affect the relative savings. Additionally, since simple, non-cryptographic hash functions are commonly used in these applications [18], their power consumption is in most cases negligible compared to that of memory accesses.
As an example, the H3 hash function [18] with 15 bits has been implemented in Verilog and synthesized using Synopsys Design Compiler for a 45 nm library [19]. The power consumption estimate provided by the synthesis tool is 0.0014 mWatts. This is negligible compared to the consumption of the memory, which is detailed in the following.

For the memory there are two main contributors to energy consumption. The first one is a static energy consumption that is independent of the number of memory accesses. The second is the dynamic consumption, which is related to the number of accesses. To evaluate the energy consumption of the memory for the proposed case study, the CACTI tool has been used [20]. A low power SRAM memory of 20 Mbits with a 104-bit width, capable of storing approximately 200k flows, is selected. The consumption estimates for a 45 nm technology are 21 mWatts for static power consumption and 240 mWatts for dynamic power consumption in the worst utilization case. The estimates clearly show that memory consumption is much larger than that of the hash functions and that the dynamic consumption caused by memory accesses will dominate even at low and medium loads.

IV. CUCKOO AFFINITY HASHING

A. Description

The proposed Cuckoo affinity hashing scheme modifies traditional implementations by using an additional hash function ha(x). This function is used to set an affinity of each element to one of the d hash tables. Insertion and matching operations then start with that table instead of starting from the first table. The corresponding procedures are modified as follows:

• Match: the table given by i = ha(x) is selected and position hi(x) is accessed and compared with x. If there is no match, the next table 1 + mod(i, d) is selected and the process is repeated. If no match is found in that table, the next table is selected, and the search continues until a match is found or all d tables have been searched.

• Insertion: the table given by i = ha(x) is selected and position hi(x) is accessed. If the position is empty, the new entry x is stored there. If not, the next table 1 + mod(i, d) is selected and the process is repeated. If the position in that table is not empty, the next table is selected, and the search continues until a table with an empty position is found or all d tables have been searched. At that point a random table j is selected and the new element x is stored in position hj(x). Then, the insertion process is executed for the entry y that was displaced when inserting x, but without considering table j as an option for insertion. This procedure is recursive and tries to move elements to accommodate the new element if needed.

• Removal: the same operation as a match, but when the element is found it is removed.

For low and medium memory occupancies, affinity increases the probability that an element is found in the first memory accesses. As an example, consider an implementation with four hash tables and a memory occupancy of 25%. In a traditional implementation, many elements will not be placed in the first table (a 25% global occupancy is equivalent to 100% occupancy in a single table). Therefore, in many cases two or more memory accesses will be needed.
However, with affinity the load of each table will be approximately 25%, and therefore in most cases the elements will be placed in the table given by ha(x). This means that they will be found on the first memory access. This intuition on the benefits of the proposed scheme is confirmed in the following by the simulation results.

B. Energy efficiency analysis

The proposed scheme has been simulated with the parameters used in the previous section. The average number of memory accesses is shown in figures 2 to 4 and compared to that of traditional Cuckoo hashing. It is clear that the addition of affinity reduces the average number of memory accesses and thus improves energy efficiency. The benefits are larger when d = 4 and d = 8, with reductions of more than 20% and 30% respectively over a wide range of occupancies. For d = 2, the benefits are smaller, with values around 5%. As discussed before, d = 4 is commonly used, as d = 2 severely reduces memory occupancy and d = 8 increases the average number of accesses. The maximum memory utilization has also been evaluated by simulation, and the results are 50.8%, 97.5% and 99.9% for d = 2, 4, 8 respectively. These values are similar to those obtained in Section III.

Fig. 2. Average number of memory accesses for each match operation (d=2). [Plot versus memory occupancy; curves: Traditional, Affinity.]

Fig. 3. Average number of memory accesses for each match operation (d=4). [Plot versus memory occupancy; curves: Traditional, Affinity.]

Fig. 4. Average number of memory accesses for each match operation (d=8). [Plot versus memory occupancy; curves: Traditional, Affinity.]

Finally, the power consumption versus the memory bandwidth use is shown for d = 4 in figure 5 for the memory configuration described in Section III.B and a memory occupancy of 20%. It can be observed that the savings increase when the memory is more frequently used.
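The affinity-based procedures of Section IV and the kind of measurement reported in figures 2 to 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' simulator: the names AffinityCuckoo and average_accesses, the salted use of Python's built-in hash() for h1..hd and ha, the iterative eviction loop and the max_kicks bound are all hypothetical choices made here, and the table size is much smaller than the m = 32k used in the letter. Tables are indexed from 0, so the wrap-around "1 + mod(i, d)" becomes (i + 1) % d. Keys are assumed distinct, d at least 2 and the occupancy greater than zero.

```python
import random


class AffinityCuckoo:
    """d-table Cuckoo hashing; affinity=True starts each probe at table h_a(x)."""

    def __init__(self, d=4, m=1024, affinity=True, max_kicks=500, seed=0):
        self.d, self.m = d, m
        self.affinity, self.max_kicks = affinity, max_kicks
        self.tables = [[None] * m for _ in range(d)]
        self.rng = random.Random(seed)
        # d salts emulate h_1..h_d; the extra last salt emulates h_a (assumption).
        self.salts = [self.rng.getrandbits(64) for _ in range(d + 1)]

    def _pos(self, i, x):
        return hash((self.salts[i], x)) % self.m

    def _order(self, x):
        # Probe order: start at the affinity table (or table 0) and wrap around.
        start = hash((self.salts[self.d], x)) % self.d if self.affinity else 0
        return [(start + k) % self.d for k in range(self.d)]

    def match(self, x):
        """Return (found, number_of_memory_accesses)."""
        for accesses, i in enumerate(self._order(x), start=1):
            if self.tables[i][self._pos(i, x)] == x:
                return True, accesses
        return False, self.d

    def insert(self, x):
        """Return False after max_kicks displacements (held element is dropped)."""
        banned = None  # table the current element was just evicted from
        for _ in range(self.max_kicks):
            order = [i for i in self._order(x) if i != banned]
            for i in order:
                p = self._pos(i, x)
                if self.tables[i][p] is None:
                    self.tables[i][p] = x
                    return True
            # No empty position: displace the entry in a random table j.
            j = self.rng.choice(order)
            p = self._pos(j, x)
            x, self.tables[j][p] = self.tables[j][p], x
            banned = j
        return False


def average_accesses(d, m, occupancy, affinity, seed=0):
    """Fill the tables to the given occupancy and average accesses per match."""
    table = AffinityCuckoo(d, m, affinity, seed=seed)
    keys = range(int(occupancy * d * m))
    for x in keys:
        if not table.insert(x):
            raise RuntimeError("insertion failed; occupancy too high for this sketch")
    return sum(table.match(x)[1] for x in keys) / len(keys)
```

Calling average_accesses(4, 1024, 0.25, affinity=False) and then the same call with affinity=True gives the two averages to compare for the four-table, 25% occupancy example of Section IV.A; the affinity value should come out lower, although the exact figures depend on the hash stand-ins and table size assumed here.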

V. CONCLUSIONS

In this letter, first the energy efficiency of flow identification in a traffic monitoring application using Cuckoo hashing has been studied. The conclusion is that the energy consumption adapts well to the number of active flows. Then Cuckoo affinity hashing, an enhancement to improve energy efficiency, has been proposed and evaluated. The results show that it can significantly reduce energy consumption at medium and low memory occupancies.

VI. FUTURE WORK

Future work will consider the theoretical analysis of the average number of accesses for a match operation as a function of the table occupancy. More generally, Cuckoo affinity hashing shows that existing data structures can be enhanced to improve energy efficiency. In fact, the same idea of introducing affinity can be used in other schemes like d-left hashing.

REFERENCES

[1] K. J. Christensen, C. Gunaratne, B. Nordman, and A. D. George, “The next frontier for communications networks: power management,” Computer Commun., vol. 27, no. 18, Dec. 2004.
[2] M. Gupta and S. Singh, “Greening of the Internet,” in Proc. 2003 SIGCOMM.
[3] D. Kharitonov, “Time-domain approach to energy efficiency: high-performance network element design,” in Proc. 2009 IEEE GLOBECOM (Workshops).
[4] L. Niccolini, G. Iannaccone, S. Ratnasamy, J. Chandrashekar, and L. Rizzo, “Building a power-proportional software router,” in Proc. 2012 USENIX Annual Technical Conference.
[5] P. Gupta and N. McKeown, “Algorithms for packet classification,” IEEE Network, vol. 15, no. 2, pp. 24–32, Mar./Apr. 2001.
[6] A. Kirsch, M. Mitzenmacher, and G. Varghese, “Hash-based techniques for high-speed packet processing,” Algorithms for Next Generation Networks, pp. 181–218. Springer, 2010.
[7] M. Waldvogel, G. Varghese, J. Turner, and B. Plattner, “Scalable high speed IP routing lookups,” ACM SIGCOMM Computer Commun. Rev., vol. 27, no. 4, pp. 25–36, Oct. 1997.
[8] K. Pagiamtzis and A. Sheikholeslami, “Content-addressable memory (CAM) circuits and architectures: a tutorial and survey,” IEEE J. Solid-State Circuits, vol. 41, no. 3, pp. 712–727, Mar. 2006.
[9] A. Broder and M. Mitzenmacher, “Using multiple hash functions to improve IP lookups,” in Proc. 2001 IEEE INFOCOM.
[10] R. Pagh and F. F. Rodler, “Cuckoo hashing,” in J. of Algorithms, pp. 122–144, May 2004.
[11] P. Bosshart, et al., “Forwarding metamorphosis: fast programmable match-action processing in hardware for SDN,” in Proc. 2013 SIGCOMM.
[12] O. E. Ferkouss, et al., “A 100gig network processor platform for openflow,” in Proc. 2011 IEEE International Conference on Network and Service Management.
[13] M. Crovella and B. Krishnamurthy, Internet Measurement: Infrastructure, Traffic and Applications. John Wiley and Sons, 2006.
[14] C. Estan et al., “Building a better NetFlow,” in Proc. 2004 SIGCOMM.
[15] N. McKeown et al., “OpenFlow: enabling innovation in campus networks,” ACM SIGCOMM Computer Commun. Rev., vol. 38, no. 2, pp. 69–74, Apr. 2008.
[16] P. Borgnat, et al., “Seven years and one day: sketching the evolution of Internet traffic,” 2009 IEEE INFOCOM.
[17] CAIDA realtime passive network monitors. Available: http://www.caida.org/data/realtime/passive.
[18] M. V. Ramakrishna, E. Fu, and E. Bahcekapili, “Efficient hardware hashing functions for high performance computers,” IEEE Trans. Computers, vol. 46, no. 12, pp. 1378–1381, Dec. 1997.
[19] E. Stine et al., “FreePDK: an open-source variation-aware design kit,” in Proc. 2007 IEEE Int. Conf. Microelectronic Systems Education, pp. 173–174.
[20] T. Shyamkumar, N. Muralimanohar, and N. P. Jouppi, “CACTI 5.0,” HP Laboratories, Technical Report, 2007.

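Fig. 5. Power consumption versus memory bandwidth use for d = 4. [Plot of power consumption (mWatts) versus percentage of memory bandwidth used; curves: Traditional, Affinity.]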
