Boundary Cutting for Packet Classification - Logic Systems

0 downloads 0 Views 3MB Size Report
Education, Science and Technology and by the ITRC support program (NIPA- ... Digital Object Identifier 10.1109/TNET.2013.2254124 .... packet classification problem and the list of matching rules ...... [7] H. Song and J. W. Lockwood, “Efficient packet classification for net- .... She works for the Technology IT Team, Hyundai.
IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 22, NO. 2, APRIL 2014

443

Boundary Cutting for Packet Classification Hyesook Lim, Senior Member, IEEE, Nara Lee, Geumdan Jin, Jungwon Lee, Youngju Choi, Student Member, IEEE, and Changhoon Yim, Member, IEEE

Abstract—Decision-tree-based packet classification algorithms such as HiCuts, HyperCuts, and EffiCuts show excellent search performance by exploiting the geometrical representation of rules in a classifier and searching for a geometric subspace to which each input packet belongs. However, decision tree algorithms involve complicated heuristics for determining the field and number of cuts. Moreover, fixed interval-based cutting not relating to the actual space that each rule covers is ineffective and results in a huge storage requirement. A new efficient packet classification algorithm using boundary cutting is proposed in this paper. The proposed algorithm finds out the space that each rule covers and performs the cutting according to the space boundary. Hence, the cutting in the proposed algorithm is deterministic rather than involving the complicated heuristics, and it is more effective in providing improved search performance and more efficient in memory requirement. For rule sets with 1000–100 000 rules, simulation results show that the proposed boundary cutting algorithm provides a packet classification through 10–23 on-chip memory accesses and 1–4 off-chip memory accesses in average. Index Terms—Binary search, boundary cutting, decision tree algorithms, HiCuts, packet classification.

I. INTRODUCTION

P

ACKET classification is an essential function in Internet routers that provides value-added services such as network security and quality of service (QoS) [1]. A packet classifier should compare multiple header fields of the received packet against a set of predefined rules and return the identity of the highest-priority rule that matches the packet header. The use of application-specific integrated circuits (ASICs) with off-chip ternary content addressable memories (TCAMs) has been the best solution for wire-speed packet forwarding [2], [3]. However, TCAMs have some limitations. TCAMs consume 150 more power per bit than static Manuscript received July 10, 2010; revised May 31, 2011; October 21, 2011; March 17, 2012; and January 03, 2013; accepted February 26, 2013; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor M. Kodialam. Date of publication April 11, 2013; date of current version April 14, 2014. This work was supported by the Mid-Career Researcher Program (2012-005945) under the National Research Foundation of Korea (NRF) grants funded by the Ministry of Education, Science and Technology and by the ITRC support program (NIPA2013-H0301-13-1002) under the NIPA funded by the Ministry of Knowledge Economy (MKE), Korea. The work of C. Yim was supported by the Basic Science Research Program (2011-0009426) under the NRF funded by the Ministry of Education, Science and Technology. H. Lim, N. Lee, G. Jin, J. Lee, and Y. Choi are with Ewha Womans University, Seoul 120-750, Korea (e-mail: [email protected]). N. Lee, G. Jin, and Y. Choi were with Ewha Womans University, Seoul 120750, Korea. C. Yim is with Konkuk University, Seoul 143-701, Korea. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNET.2013.2254124

random access memories (SRAMs). TCAMs consume around 30%–40% of total line card power [4], [5]. When line cards are stacked together, TCAMs impose a high cost on cooling systems. TCAMs also cost about 30 more per bit of storage than double-data-rate SRAMs. Moreover, for an -bit port range field, it may require TCAM entries, making the exploration of algorithmic alternatives necessary. Many algorithms and architectures have been proposed over the years in an effort to identify an effective packet classification solution [6]–[35]. Use of a high bandwidth and a small on-chip memory while the rule database for packet classification resides in the slower and higher capacity off-chip memory by proper partitioning is desirable [36], [37]. Performance metrics for packet classification algorithms primarily include the processing speed, as packet classification should be carried out in wire-speed for every incoming packet. Processing speed is evaluated using the number of off-chip memory accesses required to determine the class of a packet because it is the slowest operation in packet classification. The amount of memory required to store the packet classification table should be also considered. Most traditional applications require the highest priority matching. However, the multimatch classification concept is becoming an important research item because of the increasing need for network security, such as network intrusion detection systems (NIDS) and worm detection, or in new application programs such as load balancing and packet-level accounting [7]. In NIDS, a packet may match multiple rule headers, in which case the related rule options for all of the matching rule headers need to be identified to enable later verification. In accounting, multiple counters may need to be updated for a given packet, making multimatch classification necessary for the identification of the relevant counters for each packet [4]. Our study analyzed various decision-tree-based packet classification algorithms. If a decision tree is properly partitioned so that the internal tree nodes are stored in an on-chip memory and a large rule database is stored in an off-chip memory, the decision tree algorithm can provide very high-speed search performance. Moreover, decision tree algorithms naturally enable both the highest-priority match and the multimatch packet classification. Earlier decision tree algorithms such as HiCuts [8] and HyperCuts [9] select the field and number of cuts based on a locally optimized decision, which compromises the search speed and the memory requirement. This process requires a fair amount of preprocessing, which involves complicated heuristics related to each given rule set. The computation required for the preprocessing consumes much memory and construction time, making it difficult for those algorithms to be extended to large rule sets because of memory problems in building the decision trees. Moreover, the cutting is based on a fixed interval, which

1063-6692 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

444

IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 22, NO. 2, APRIL 2014

does not consider the actual space that each rule covers; hence it is ineffective. In this paper, we propose a new efficient packet classification algorithm based on boundary cutting. Cutting in the proposed algorithm is based on the disjoint space covered by each rule. Hence, the packet classification table using the proposed algorithm is deterministically built and does not require the complicated heuristics used by earlier decision tree algorithms. The proposed algorithm has two main advantages. First, the boundary cutting of the proposed algorithm is more effective than that of earlier algorithms since it is based on rule boundaries rather than fixed intervals. Hence, the amount of required memory is significantly reduced. Second, although BC loses the indexing ability at internal nodes, the binary search at internal nodes provides good search performance. The organization of the paper is as follows. Section II provides an overview of the earlier decision tree algorithms. Section III describes the proposed BC algorithm. Section IV describes the refined structure of the BC algorithm, termed selective BC. Section V shows the data structure of each decision tree algorithm. Section VI shows the performance evaluation results using three different types of rule sets with 1000–100 000 rules each acquired from a publicly available database. Section VII offers a brief conclusion. II. RELATED WORKS Packet classification can be formally defined as follows [10]: Packet matches rule , for , if all the header fields , for , of the packet match the corresponding fields in , where is the number of rules and is the number of fields. If a packet matches multiple rules, the rule with the highest priority is returned for a single-best-match packet classification problem and the list of matching rules is returned for the multimatch packet classification problem. Rule sets are generally composed of five fields. The first two fields are related to source and destination prefixes and require prefix match operation. The next two fields are related to source and destination port ranges (or numbers), which require range match. The last field is related to protocol type and requires an exact match. A. HiCuts Each rule defines a five-dimensional hypercube in a five-dimensional space, and each packet header defines a point in the space. The HiCuts algorithm [8] recursively cuts the space into subspaces using one dimension per step. Each subspace ends up with fewer overlapped rule hypercubes that allow for a linear search. In the construction of a decision tree of the HiCuts algorithm, a large number of cuts consumes more storage, and a small number of cuts causes slower search performance. It is challenging to balance the storage requirement and the search speed. The HiCuts algorithm uses two parameters, a space factor (spfac) and a threshold (binth), in tuning the heuristics, which trade off the depth of the decision tree against the memory amount. The field in which a cut may be executed is chosen to minimize the maximum number of rules included in any subspace. The spfac, a function of space measure, is used

to determine the number of cuts for the chosen field. The binth is a predetermined number of rules included in the leaf nodes of the decision tree for a linear search. For simplicity, considering a two-dimensional (2-D) plane composed of the first two prefix fields, a rule can be represented by an area. Fig. 1 shows the areas that each rule covers in a prefix plane for a given 2-D example rule set. As shown, a rule with ) lengths in and fields covers the area of , where is the maximum length of the field ( is 32 in IPv4). An example decision tree of the HiCuts algorithm for the given rule set is illustrated in Fig. 2. In this example, the spfac and binth are set as 2 and 3, respectively. The number of cuts to be made in each internal node is determined by considering spfac and the number of rules included in the subspace covered by the internal node. In Fig. 2, each internal node (oval node) has a cut field and the number of cuts, and each leaf node (rectangular node) has rules included in the subspace corresponding to the leaf. For the first cut, eight cuts in the field are selected. Cutting should be repeated for nodes having rules more than the binth. The subspaces corresponding to the cuts made by the decision tree shown in Fig. 2 are indicated by dotted lines in Fig. 1. For example, at the root node, the field ( -axis in Fig. 1) is split into eight partitions. The first partition is split further by two cuts in the field ( -axis in Fig. 1), and so on. Rules covering (or belonging to) a partition are stored into the corresponding leaf node. Starting from the root node, the search procedure examines a number of bits of a header field that is defined at each internal node until a leaf node is reached. For example, for a given input (000110, 111100, 19, 23, TCP), the first three bits of the field are examined since the root node has eight cuts. Since they are 000, the search follows the first edge of the decision tree. Since the internal node of the second level has two cuts in the field, the first bit of the field is examined and determined to be one; hence, the search follows the second edge. Rules , , and are stored in the corresponding leaf node, and the given input is compared first with and finds a match. For the multimatch classification problem, other rules are compared and the list of matching rules is identified. For the best match classification, other rules in the leaf node are not necessarily compared if rules are stored in decreasing order of priority and a match is found. Note that when the port range fields are used for cutting, determining the child pointer to follow does not rely on simple indexing since the interval specified by a port range is not always a power of two. B. HyperCuts While the HiCuts algorithm only considers one field at a time for selecting cut dimension, the HyperCuts algorithm [9] considers multiple fields at a time. For the same example set, the decision tree of the HyperCuts algorithm is shown in Fig. 3. The spfac and binth are set as 1.5 and 3, respectively. As shown at the root node, the and fields are used simultaneously for cutting. Note that each edge of the root node represents the bit combination of 00, 10, 01, and 11, respectively, which is one bit in the first field followed by one bit in the second field. For the same input, the search follows the third edge of the root node

LIM et al.: BOUNDARY CUTTING FOR PACKET CLASSIFICATION

445

Fig. 1. Rules in a prefix plane.

Fig. 2. Decision tree of the HiCuts algorithm.

Fig. 3. Decision tree of the HyperCuts algorithm.

and compares it to . The search then follows the first edge and compares to and at a leaf. Compared to the HiCuts algorithm, the decision tree of the HyperCuts algorithm generally has a smaller depth (not shown in the figures), as multiple fields are used at the same time in a single internal node.

Since the HyperCuts algorithm generally incurs considerable memory overhead compared to the HiCuts algorithm, several techniques have been proposed to refine the algorithm, such as space compaction and pushing common rule subsets upward [9]. For example, rules and at the second level

446

IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 22, NO. 2, APRIL 2014

of the HyperCuts tree in Fig. 3 were pushed upward to the parent internal node because those rules are included in every child of the internal node. These refinements effectively reduce the number of replicated rules, but require complicated processing. Moreover, as will be shown through the simulation, if upward pushing occurs at high levels, search performance is significantly degraded since rules stored in the high levels of a decision tree are compared in the earlier search phase that involves many inputs. For example, if we apply upward pushing to the HiCuts tree in Fig. 2, rules and should be located at the root node, and those rules should be compared to every input. This results in search performance degradation. There is a concern in implementing HiCuts or HyperCuts algorithm. If the predetermined binth is improperly small, the algorithms will try to split a space even though there is no rule boundary inside. If there is no rule boundary inside the space covered by a node, the number of rules included in the node cannot be reduced further by more cutting. It is suggested to use an optimization called rule overlapping. If every rule included in a leaf covers the entire space corresponding to the leaf, which means that if there is no rule boundary inside the space covered by the leaf, rules except the highest-priority rule are removed from the leaf by the rule overlapping optimization. By doing this, the number of rules included in the leaf becomes less than the binth. However, no rule should be ignored for multimatch classification. Hence, HiCuts and HyperCuts algorithms require a stopping condition other than the binth to be applied to the multimatch classification problem. As will be shown in the simulation section, we will find a proper value of binth avoiding this problem instead of finding out other stopping condition. Recently, a new decision tree algorithm, EffiCuts, is proposed [11]. In solving the issues of previous decision tree algorithms, EffiCuts employs several new ideas such as tree separation and equi-dense cut. The tree separation is to separate small rules from large rules and makes multiple decision trees so that fine cutting for small rules does not cause the replication of large rules. Here, a small (large) rule means a rule covering a small (large) subspace. While HiCuts and HyperCuts employ equal-sized cuts as shown in Figs. 2 and 3, the equi-dense cut is to employ unequal-sized cuts based on rule density in each subspace in order to distribute rules evenly among the children.

Fig. 4. Decision tree of the boundary cutting algorithm.

HiCuts and HyperCuts algorithms perform cutting based on a fixed interval, and hence the partitioning is ineffective in reducing the number of rules belonging to a subspace. Moreover, when the number of cuts in a field is being determined, complicated preprocessing should be made to balance the required memory size and the search performance. In this study, we propose a deterministic cutting algorithm based on each rule boundary, termed as boundary cutting (BC) algorithm.

given input are compared for entire fields to the rules belonging to the subspace (represented by a leaf node of the decision tree). For example, regarding the rules shown in Fig. 1, by padding zeros to reach the maximum length [20] (assuming six in this case), the start boundaries in the field of the rules can be derived as 000000, 000100, 001000, 010000, 011100, 100000, and 110000. Hence, the entire line of the field can be split into seven intervals—[0, 3], [4, 7], [8, 15], [16, 27], [28, 31], [32, 47], and [48, 63]—versus the eight regular intervals in the HiCuts algorithm. The same processing can be repeated for the field. The starting boundaries in the and fields are noted in Fig. 1. The decision tree of the proposed algorithm that makes cuts according to the starting rule boundaries is shown in Fig. 4. Here, binth is also set as three. At the root node, the field is used to split the entire line to seven disjointed intervals; hence, the root node has seven binary search entries. At level 2, the field is used to further split each subspace that has more rules than binth. Comparison of the proposed decision tree to the HiCuts decision tree shows that the BC algorithm does not cause unnecessary cutting. Clearly, no two leaves have the same set of rules. Since cutting does not occur in the absence of a boundary, unnecessary cutting is avoided. Hence, rule replication is reduced in the BC algorithm. Additionally, the depth of the decision tree is always less than or equal to six including a leaf since each field is used once at the most.

A. Building a BC Decision Tree

B. Searching in the BC Algorithm

When the cutting of a prefix plane according to rule boundaries is performed, both the starting and the ending boundaries of each rule can be used for cutting, but cutting by either is sufficient since decision tree algorithms generally search for a subspace in which an input packet belongs and the headers of the

The cuts at each internal node of the BC decision tree do not have fixed intervals. Hence, at each internal node of the tree, a binary search is required to determine the proper edge to follow for a given input. However, it will be shown in Section VI that the BC algorithm provides better search performance than the

III. PROPOSED ALGORITHM

LIM et al.: BOUNDARY CUTTING FOR PACKET CLASSIFICATION

HiCuts algorithm despite of the memory access for the binary search at the internal nodes. During the binary search, the pointer to the child node is remembered when the input matches the entry value or when the input is larger than the entry value [20]. Consider an input packet with headers (000110, 111100, 19, 23, TCP), for example; since is used at the root node, a binary search using the header of the given input is performed. The header 000110 is compared to the middle entry of the root node, which is 010000. Since the input is smaller, the search proceeds to the smaller half and compares the input to the entry 000100. Since the input is larger, the child pointer (the second edge) is remembered, and the search proceeds to a larger half. The input is compared to 001000, and it is found to be smaller, but there is no entry to proceed in a smaller half. Hence, the search follows the remembered pointer, the second edge. At the second level, by performing a binary search, the last edge is selected for the header 111100. The linear search, which is the same as that in the HiCuts or HyperCuts algorithm, is performed for rules stored in the leaf node. The binary search at each internal node takes time finding a suitable edge to follow for entries. IV. SELECTIVE BC ALGORITHM This section proposes a refined structure for the BC algorithm. The decision tree algorithms including the BC algorithm use binth to determine whether a subspace should become an internal node or a leaf node. In other words, if the number of rules included in a subspace is more than binth, the subspace becomes an internal node; otherwise, it becomes a leaf node. In the BC algorithm, if a subspace becomes an internal node, every starting boundary of the rules included in the subspace is used for cutting as shown in Fig. 4. We propose a refined structure using the binth to select or unselect the boundary of a rule at an internal node. In other words, the refined structure activates a rule boundary only when the number of rules included in a partition exceeds the binth. For example, there are three internal nodes in the second level of the decision tree shown in Fig. 4. For the first internal node, the middle boundary, which is 100000, separates the area covered by the internal node into two partitions, which are [0, 31] and [32, 63] in the interval. Since the first partition has two rules, and , there is no need to use the other boundary 011000 included in [0, 31], and since the second partition has three rules (which does not exceed binth), , , and , there is no need to use the other boundary 111000 included in [32, 63]. Hence, the internal node results in only two partitions. Similarly, in the second internal node, the middle boundary (110000) separates the area covered by the internal node into two partitions, each of which has three rules; hence, the other boundaries do not have to be used. The same thing happens in the last internal node. The refined structure of the BC algorithm, called selective BC (SBC), is shown in Fig. 5. At the root node, the middle boundary of each partition is recursively considered to determine whether it should be activated or inactivated, but each partition has more rules than binth; hence, all boundaries are used. Upon comparison of the SBC and the BC processes, the number

447

Fig. 5. Decision tree of the selective boundary cutting algorithm.

of leaves is reduced from 16 to 10; consequently, the number of stored rules is reduced from 34 to 27. V. DATA STRUCTURE There are two different ways of storing rules in decision tree algorithms. The first way separates a rule table from a decision tree. In this case, each rule is stored only once in the rule table, while each leaf node of a decision tree has pointers to the rule table for the rules included in the leaf. The number of rule pointers that each leaf must hold equals the binth. In searching for the best matching rule for a given packet or the list of all matching rules, after a leaf node in the decision tree is reached and the number of rules included in the leaf is identified, extra memory accesses are required to access the rule table. The other way involves storing rules within leaf nodes. In this case, search performance is better since extra access to the rule table is avoided, but extra memory overhead is caused due to rule replication. In our simulation in this paper, it is assumed that rules are stored in leaf nodes since the search performance is more important than the required memory. A leaf node includes multiple rule entries. Table I shows the data structure of a rule entry stored in a leaf node of decision tree algorithms. The data structure of the rule entry is the same for all decision tree algorithms except for the last field, which is the next rule pointer. The number of bits allocated for the next rule pointer field is , where is the total number of stored rules. As will be shown in Section VI, rule sets with about 100 000 rules cause tremendous rule replication; hence, becomes an extremely large number. The next rule pointer shown in Table I is roughly estimated as 30 bits that hold up to 10 entries. Table II shows the data structure of the internal nodes of each decision tree. For BC and selective boundary cutting (SBC), represents the total number of boundary entries at internal nodes. The number of entries field is used to specify the entries that are involved in the binary search at an internal node. Each internal node of the BC and SBC algorithms requires 3.625 and 3.5 B, respectively, to represent node information such as field selection and the number of boundary entries. Each boundary

448

IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 22, NO. 2, APRIL 2014

TABLE I DATA STRUCTURES OF STORING A RULE AT LEAF NODES

entry of the internal node has a 32-bit boundary value and a child pointer. Note that each boundary entry has a child pointer for the BC and SBC algorithms. For the HiCuts and HyperCuts algorithms, represents the number of internal nodes. The internal nodes of every decision tree algorithm have bits that are used to represent a field selection (multiple fields for HyperCuts). Since an internal node can have both internal nodes and leaf nodes as its children, separately storing internal nodes from leaf nodes and avoiding storing many child pointers require the presence of a map that represents whether each child is an internal node or a leaf node and two pointers that direct the first internal child and the first leaf child. VI. SIMULATION RESULTS Simulations using C++ have been extensively performed for rule sets created by Classbench [38]. Three different types of rule sets—access control list (ACL), firewall (FW), and Internet protocol chain (IPC)—are generated with sizes of approximately 1000, 5000, 10 000, 50 000, and 100 000 rules each. Rule sets are named using the set type followed by the size such as with ACL1K, which means an ACL type set with about 1000 rules. The performance of each decision tree is highly dependent on the rule set characteristics, especially the number of rules with a wildcard in the prefix fields, since prefix fields are mostly used for cutting because of the variety of distinct values. The numbers and rates of those rules with a wildcard in each field are shown in Table III, in which is the number of rules included in each rule set. ACL type sets have the smallest rate of wildcards in the prefix fields, but a high rate in the port number field. The source port number is always a wildcard. IPC type sets have a high rate of wildcards in port number fields and in the protocol type field. Compared to other types, FW type sets have a much higher rate of wildcards in the prefix fields, especially in the destination prefix field. A. Setting binth The HiCuts and HyperCuts algorithms do not have a cut-stop condition other than binth. Setting binth as an arbitrary small number will cause an infinite loop of unnecessary cutting if

there is no rule boundary inside the subspace covered by an internal node. The optimization of rule overlapping that removes rules except the highest-priority rule in such nodes can solve this problem. However, no rule should be ignored for multimatch packet classification. To solve this issue and reduce the unpredictability that arises from using different binth values, we performed the simulation in two steps. The first step is to determine the proper binth value. We first constructed the BC algorithm setting binth as an arbitrary small number and then identified jumbo leaves, those here that have more rules than binth. The jumbo leaves exist because the number of rules cannot be reduced further by more cutting since there is no rule boundary inside the subspace covered by them. Among the number of rules included in the jumbo leaves, the largest number is identified, and we name the number lower bound binth, meaning that binth should not be smaller than this number to avoid unnecessary cutting. For example, considering the prefix plain shown in Fig. 1, there is no rule boundary inside the areas covered by , , and . Among these areas, in the area of , and are also involved. Hence, the largest number is three, and hence the lower bound binth is three. B. Decision Tree Characteristics In the second step, we constructed decision trees of BC, SBC, HiCuts, and HyperCuts algorithms setting binth as the identified lower bound value. Cutting is recursively performed and stopped if the number of rules included in a subspace is less than or equal to the lower bound binth. The performances of the HiCuts and HyperCuts algorithms depend on spfac as well. In our simulation, spfac was set to 1.5 for 1K sets and 2 for all other sets. The optimization of rule overlapping was not included in our simulation to allow the decision trees to simultaneously support the multimatch and best-match classifications. Node merging optimization was implemented for HiCuts tree. Node merging, space compaction, and pushing common rules upward optimizations were implemented for HyperCuts tree. Regarding the optimization of the pushing common rules upward, the HyperCuts study [9] states that all common rules associated with every child of an internal node should be pushed upward. There can be two different ways of pushing upward: during the building or after the building of a decision tree. If we consider the pushing upward optimization during the building of a decision tree, it is implemented as follows. Once one or more fields are selected as cut fields at an internal node, rules having wildcards in the selected fields are located at the internal node (by means of the pushing upward) without splitting into child nodes. In this case, rule replication is greatly reduced, but search performance is significantly degraded since many rules are located at internal nodes and those rules are linearly compared to inputs arriving at the node. The search performance is not tolerable for large rule sets or rule sets with many wildcard rules or short-length prefixes. On the other hand, if we consider pushing upward after the completion of tree building, it is implemented as follows. A decision tree is built without performing the optimization, and hence each leaf node has a number of rules less than or equal to binth. From the bottom of the tree, each subtree is examined to find common rules in every child, and those rules are pushed

LIM et al.: BOUNDARY CUTTING FOR PACKET CLASSIFICATION

449

TABLE II DATA STRUCTURES OF INTERNAL NODES IN EACH DECISION TREE

TABLE III CHARACTERISTICS OF RULE SETS IN THE NUMBER AND RATE OF WILDCARDS

upward into the root of the subtree. This processing is repeated until the root of the decision tree is visited. In this case, the total number of rules located at internal nodes along a path from the root to a leaf node is guaranteed to be at most binth. Hence, the search performance is not degraded much by the optimization, but the rule replication becomes significant. For the HyperCuts simulation, we implemented two different versions: HyperCuts_1 and HyperCuts_2. Those two implementations are the same for all other things except for the optimization of pushing upward. In HyperCuts_1, the optimization was implemented during the building of the decision tree. In HyperCuts_2, the optimization was implemented after the building of the decision tree. The time to construct each decision tree varies widely depending on the tree and rule set types. The time to build the HiCuts and HyperCuts_2 are longer than building other trees because they create many internal nodes and a high number of replicated rules. Other trees show similar building time. FW types requires the most time because their sets have many rules with wildcards (Table III) or short-length prefixes and these rules are replicated many times. The characteristics of each decision tree for the identified lower bound binth are shown in Table IV, where is the

constructed tree depth and is the rule replication factor, which is the total number of stored rules divided by . For large rule sets such as ACL100K, IPC50K, IPC100K, FW50K, and FW100K, the decision trees of HyperCuts_2 could not be built because of memory shortage problem due to the high rule replication. It is represented by na (not available) in corresponding fields. As shown, the depth of the BC and the SBC is 4–6 including a leaf since each field is used once at the most. The HiCuts depth is the worst. The depth of the HyperCuts_2 is between the HyperCuts_1 and the HiCuts. Comparing the rule replication factors of the BC and the HiCuts, the BC is sometimes better and sometimes worse. The reason is that all of the boundaries are used in the BC for cutting if a node is determined to be an internal. The rule replication factor of the SBC algorithm is always smaller than that of the BC and much better than that of the HiCuts. Because of the optimization of pushing upward in the HyperCuts_1, the rule replication factor is the smallest except for the IPC sets. Comparing the HyperCuts_1 and the HyperCuts_2, is much increased by performing pushing upward optimization after the building of the tree. The number of nodes affected by the optimization is shown in Tables V and VI for HiCuts and HyperCuts, respectively.

450

IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 22, NO. 2, APRIL 2014

TABLE IV CHARACTERISTICS OF EACH DECISION TREE

TABLE V OPTIMIZATION CHARACTERISTICS (HICUTS ALGORITHM)

TABLE VI OPTIMIZATION CHARACTERISTICS (HYPERCUTS ALGORITHM)

The node merging and region compaction optimizations reduce tree depth and rule replication. Node merging optimization is not required in the BC and SBC algorithms since leaf nodes having the same rule subset are not created. The optimization of region compaction is not required either because it is naturally provided in the BC and SBC algorithms. The pushing upward optimization can be applied to the HiCuts and the BC and SBC algorithms. However, it must be reconsidered since it can cause search performance degradation. As shown in Table VI, the maximum number of rules pushed upward at an internal node in the HyperCuts_1 can exceed 1000 in large rule sets. This results in significant search performance degradation.

Comparing the HyperCuts_1 and the HyperCuts_2, we observed several interesting facts. In the HyperCuts_2, more number of internal nodes have rules, the total number of rules located at internal nodes are a lot more, but the maximum number of rules located at an internal node is much smaller than in the HyperCuts_1. C. Required Memory Table VII shows the estimated memory required to store the internal nodes of each algorithm. In this table, represents the number of internal nodes, while represents the total number of boundary entries at the internal nodes of

LIM et al.: BOUNDARY CUTTING FOR PACKET CLASSIFICATION

451

TABLE VII REQUIRED MEMORY FOR STORING INTERNAL NODES OF EACH DECISION TREE

TABLE VIII REQUIRED MEMORY FOR STORING RULES IN EACH DECISION TREE

the BC and SBC algorithms. The required memory is calculated by MB for BC nodes and MB for SBC nodes (Table II). The required memory for the internal nodes of HiCuts and HyperCuts is calculated using MB and MB, respectively. The BC and SBC algorithms require a reasonable amount of memory, less than 4 MB except for large sets such as IPC100K, FW50K, and FW100K; hence, they can be implemented using a cache or embedded memory. The memory requirement for internal HiCuts nodes is much higher, up to 50 that of SBC algorithm. Hence, it is not practical to implement the internal nodes of large rule sets in an on-chip memory or a fast cache for HiCuts. While the HyperCuts_1 requires less memory than the BC or the SBC, mainly due to the pushing upward optimization, the HyperCuts_2 requires much more memory. Table VIII lists the estimated memory required to store rules in each decision tree. The memory amount is estimated by MB based on the width of a rule entry shown in Table I, where is the total number of stored rules. The required memory for storing rules for decision tree algorithms is generally reasonable for small sets with up to 10K rules. However, the memory overhead is significantly increased for large

sets. The HyperCuts_1 requires the least memory for all of the ACL sets, but requires more memory than SBC for IPC sets. The memory requirement for the HyperCuts_2 is more than that of the HyperCuts_1 in all cases because of the large rule replication. The HiCuts always requires more memory than the SBC (up to 8 ). Building any of the decision trees for a large number of FW type rules is not practical in required memory amount. Table IX lists the total memory amount (bytes per rule) required by each decision tree for storing a rule. The number is the required memory amount storing internal nodes shown in Table VII plus the required memory amount storing rules shown in Table VIII, divided by the number of rules, . The memory amount for decision tree algorithms depends on rule number and type. The ACL and IPC types require several kilobytes per rule, while the FW type requires several hundred kilobytes for large sets. The HyperCuts_1 requires the smallest memory overhead except for the IPC50K and IPC100K. As shown, all of the decision tree algorithms require considerable memory overhead, especially for large sets. The memory overhead can be controlled by increasing the binth and reducing the spfac at the expense of degraded search performance. Determining the balance between memory overhead and search performance is not a simple task.

452

IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 22, NO. 2, APRIL 2014

TABLE IX TOTAL REQUIRED MEMORY AMOUNT

PER

RULE (kB)

Fig. 6. Worst-case number of memory accesses at internal nodes. (a) ACL. (b) IPC. (c) FW.

Fig. 7. Average number of memory accesses at internal nodes. (a) ACL. (b) IPC. (c) FW.

D. Search Performance

match packet classification. Rules are stored in descending order of priority, and the search is immediately stopped when a match is found. For HyperCuts, rules stored in the internal nodes by pushing upward optimization are also compared to the descending order of priority. If a match is found, the

Input traces have been also generated from the Classbench [38] to evaluate search performance. The number of inputs is about 3 the number of rules. The search performance was evaluated assuming the highest-priority

LIM et al.: BOUNDARY CUTTING FOR PACKET CLASSIFICATION

453

Fig. 8. Worst-case number of memory accesses at leaf nodes. (a) ACL. (b) IPC. (c) FW.

Fig. 9. Average number of memory accesses at leaf nodes. (a) ACL. (b) IPC. (c) FW.

current best match is remembered. When the search continues at a child node, the priority of the current best match is always compared to the priority of a rule under comparison to avoid unnecessary rule comparison in cases that the priority is lower than that of current best match. Figs. 6 and 7 compare the worst-case and the average-case search performances in terms of the number of memory accesses required at internal nodes. For the HyperCuts_1 and the HyperCuts_2, the rule comparison at internal nodes caused by the pushing upward optimization was not included in these graphs because it is different from the usual operation performed at the internal nodes. It will be accounted for the number of memory accesses at the leaf nodes in later graphs. These graphs show that the number of memory accesses required at internal nodes of the BC or SBC algorithms is much

better than that of the HiCuts and slightly worse than that of the HyperCuts, even though they perform binary searches in each internal node. The HiCuts provides the worst performance because of its large depths. The worst-case number is up to 76 for the HiCuts, while it is up to 29 for the proposed algorithms. The average number is also better in the BC and SBC algorithms than in the HiCuts. The HyperCuts_1 provides the best performance in the number of memory accesses at the internal nodes since the depth is greatly reduced by the pushing upward optimization. Figs. 8 and 9 show the worst-case and average-case search performances at leaf nodes. The worst-case number of memory accesses at the leaf nodes is closely related to binth. Because of the rule comparison at internal nodes by the pushing upward optimization, the HyperCuts_1 shows the worst performance,

454

Fig. 10. Total number of memory accesses in the worst case. (a) ACL. (b) IPC. (c) FW.

which is not tolerable for high-speed packet classification. The HyperCuts_2 shows much better performance than HyperCuts_1, but slightly worse performance than other algorithms. In the average search performance, the BC algorithm provides the best for all cases since every rule boundary is used; hence, leaves with the smallest number of rules are generated and require only 1.0–2.8 accesses. The SBC algorithm also shows excellent search performance at leaf nodes, approximately 1.4–4.0. Assuming that the internal nodes in our proposed algorithms are stored in an on-chip memory or a fast cache, the packet classification of our proposed decision tree algorithms requires 1.0–4.0 in average and 4–47 in the worst cases in terms of the number of off-chip memory accesses for the sets with up to 100 000 rules. To the best of our knowledge, it is the best among the known packet classification algorithms.

IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 22, NO. 2, APRIL 2014

Fig. 11. Total number of memory accesses in average. (a) ACL. (b) IPC. (c) FW.

Figs. 10 and 11 show the total number of memory accesses including the internal node access and the leaf node access in the worst case and in the average case, respectively, assuming that both nodes are stored in off-chip memories. The HyperCuts_1 provides excellent search performance at internal nodes, but the worst search performance at leaf nodes, while the HiCuts provide the worst search performance at internal nodes, but excellent search performance at leaf nodes. The proposed algorithms and the HyperCuts_2 algorithm provide a reasonable search performance as a whole, but the memory requirement for the HyperCuts_2 is not tolerable as shown in Table IX. VII. CONCLUSION In this paper, decision tree algorithms have been studied, and a new decision tree algorithm is proposed. Throughout the

LIM et al.: BOUNDARY CUTTING FOR PACKET CLASSIFICATION

extensive simulation using Classbench databases [38] for the previous decision tree algorithms, HiCuts and HyperCuts, we discovered that the performance of decision tree algorithms is highly dependent on the rule set characteristics, especially the number of rules with a wildcard or a short-length prefix. For example, the HiCuts algorithm can provide high-speed search performance, but the memory overhead for large sets or sets with many wildcard rules makes its use impractical. We also discovered that the HyperCuts algorithm either does not provide high-speed search performance or requires a huge amount of memory depending on how to implement the pushing upward optimization. While the cutting in the earlier decision tree algorithms is based on a regular interval, the cutting in the proposed algorithm is based on rule boundaries; hence, the cutting in our proposed algorithm is deterministic and very effective. Furthermore, to avoid rule replication caused by unnecessary cutting, a refined structure of the proposed algorithm has been proposed. The proposed algorithms consume a lot less memory space compared to the earlier decision tree algorithms, and it is up to several kilobytes per rule except for FW50K and FW100K. The proposed algorithms achieve a packet classification by 10–23 on-chip memory accesses and 1.0–4.0 off-chip memory accesses in average. New network applications have recently demanded a multimatch packet classification [4] in which all matching results along with the highest-priority matching rule must be returned. It is necessary to explore efficient algorithms to solve both classification problems. The decision tree algorithms including the proposed algorithms in this paper naturally enable both the highest priority match and the multimatch packet classification. ACKNOWLEDGMENT The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper, especially related to the implementation of HyperCuts. REFERENCES [1] H. J. Chao, “Next generation routers,” Proc. IEEE, vol. 90, no. 9, pp. 1518–1588, Sep. 2002. [2] A. X. Liu, C. R. Meiners, and E. Torng, “TCAM razor: A systematic approach towards minimizing packet classifiers in TCAMs,” IEEE/ACM Trans. Netw., vol. 18, no. 2, pp. 490–500, Apr. 2010. [3] C. R. Meiners, A. X. Liu, and E. Torng, “Topological transformation approaches to TCAM-based packet classification,” IEEE/ACM Trans. Netw., vol. 19, no. 1, pp. 237–250, Feb. 2011. [4] F. Yu and T. V. Lakshnam, “Efficient multimatch packet classification and lookup with TCAM,” IEEE Micro, vol. 25, no. 1, pp. 50–59, Jan. –Feb. 2005. [5] F. Yu, T. V. Lakshman, M. A. Motoyama, and R. H. Katz, “Efficient multimatch packet classification for network security applications,” IEEE J. Sel. Areas Commun., vol. 24, no. 10, pp. 1805–1816, Oct. 2006. [6] H. Yu and R. Mahapatra, “A memory-efficient hashing by multi-predicate bloom filters for packet classification,” in Proc. IEEE INFOCOM, 2008, pp. 2467–2475. [7] H. Song and J. W. Lockwood, “Efficient packet classification for network intrusion detection using FPGA,” in Proc. ACM SIGDA FPGA, 2005, pp. 238–245. [8] P. Gupta and N. Mckeown, “Classification using hierarchical intelligent cuttings,” IEEE Micro, vol. 20, no. 1, pp. 34–41, Jan.–Feb. 2000.

455

[9] S. Singh, F. Baboescu, G. Varghese, and J. Wang, “Packet classification using multidimensional cutting,” in Proc. SIGCOMM, 2003, pp. 213–224. [10] P. Gupta and N. Mckeown, “Algorithms for packet classification,” IEEE Netw., vol. 15, no. 2, pp. 24–32, Mar.–Apr. 2001. [11] B. Vamanan, G. Voskuilen, and T. N. Vijaykumar, “EffiCuts: Optimizing packet classification for memory and throughput,” in Proc. ACM SIGCOMM, 2010, pp. 207–218. [12] H. Song, M. Kodialam, F. Hao, and T. V. Lakshman, “Efficient trie braiding in scalable virtual routers,” IEEE/ACM Trans. Netw., vol. 20, no. 5, pp. 1489–1500, Oct. 2012. [13] J. Treurniet, “A network activity classification schema and its application to scan detection,” IEEE/ACM Trans. Netw., vol. 19, no. 5, pp. 1396–1404, Oct. 2011. [14] L. Choi, H. Kim, S. Ki, and M. H. Kim, “Scalable packet classification through rulebase partitioning using the maximum entropy hashing,” IEEE/ACM Trans. Netw., vol. 17, no. 6, pp. 1926–1935, Dec. 2009. [15] F. Baboescu and G. Varghese, “Scalable packet classification,” IEEE/ACM Trans. Netw., vol. 13, no. 1, pp. 2–14, Feb. 2005. [16] P. C. Wang, C. L. Lee, C. T. Chan, and H. Y. Chang, “Performance improvement of two-dimensional packet classification by filter rephrasing,” IEEE/ACM Trans. Netw., vol. 15, no. 4, pp. 906–917, Aug. 2007. [17] P. C. Wang, C. L. Lee, C. T. Chan, and H. Y. Chang, “O(log W) multidimensional packet classification,” IEEE/ACM Trans. Netw., vol. 15, no. 2, pp. 462–472, Apr. 2007. [18] X. Sun, S. K. Sahni, and Y. Q. Zhao, “Packet classification consuming small amount of memory,” IEEE/ACM Trans. Netw., vol. 13, no. 5, pp. 1135–1145, Oct. 2005. [19] W. Lu and S. Sahni, “Succinct representation of static packet classifiers,” IEEE/ACM Trans. Netw., vol. 17, no. 3, pp. 803–816, Jun. 2009. [20] B. Lampson, B. Srinivasan, and G. Varghese, “IP lookups using multiway and multicolumn search,” IEEE/ACM Trans. Netw., vol. 7, no. 3, pp. 324–334, Jun. 1999. [21] D. E. Taylor, “Survey and taxonomy of packet classification techniques,” Comput. Surveys, vol. 37, no. 3, pp. 238–275, Sep. 2005. [22] V. Srinivasan, G. Varghese, S. Suri, and M. Waldvogel, “Fast and scalable layer four switching,” in Proc. ACM SIGCOMM, 1998, pp. 191–202. [23] T. V. Lakshnam and D. Stiliadas, “High-speed policy-based packet forwarding using efficient multi-dimensional range matching,” in Proc. ACM SIGCOMM, Oct. 1998, pp. 203–214. [24] M. M. Buddhikot, S. Suri, and M. Waldvogel, “Space decomposition techniques for fast layer-4 switching,” in Proc. Protocols High Speed Netw., Aug. 1999, pp. 25–41. [25] F. Baboescu and G. Varghese, “Scalable packet classification,” in Proc. ACM SIGCOMM, Aug. 2001, pp. 199–210. [26] F. Baboescu, S. Singh, and G. Varghese, “Packet classification for core routers: Is there an alternative to CAMs?,” in Proc. IEEE INFOCOM, 2003, pp. 53–63. [27] S. Dharmapurikar, H. Song, J. Turner, and J. Lockwood, “Fast packet classification using Bloom filters,” in Proc. ACM/IEEE ANCS, 2006, pp. 61–70. [28] P. Wang, C. Chan, C. Lee, and H. Chang, “Scalable packet classification for enabling internet differentiated services,” IEEE Trans. Multimedia, vol. 8, no. 6, pp. 1239–1249, Dec. 2006. [29] H. Lim, M. Kang, and C. Yim, “Two-dimensional packet classification algorithm using a quad-tree,” Comput. Commun., vol. 30, no. 6, pp. 1396–1405, Mar. 2007. [30] H. Lim, H. Chu, and C. Yim, “Hierarchical binary search tree for packet classification,” IEEE Commun. Lett., vol. 11, no. 8, pp. 689–691, Aug. 2007. [31] H. Lim and J. Mun, “High-speed packet classification using binary search on length,” in Proc. IEEE/ACM ANCS, Orlando, FL, USA, Dec. 2007, pp. 137–144. [32] A. G. Alagu Priya and H. Lim, “Hierarchical packet classification using a Bloom filter and rule-priority tries,” Comput. Commun., vol. 33, no. 10, pp. 1215–1226, Jun. 2010. [33] H. Lim and S. Y. Kim, “Tuple pruning using bloom filters for packet classification,” IEEE Micro, vol. 30, no. 3, pp. 48–58, May–Jun. 2010. [34] H. Song, J. Turner, and S. Dharmapurikar, “Packet classification using coarse-grained tuple spaces,” in Proc. ACM/IEEE ANCS, 2006, pp. 41–50. [35] V. Srinivasan, S. Suri, and G. Varghese, “Packet classification using tuple space search,” in Proc. ACM SIGCOMM, 1999, pp. 135–146.

456

IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 22, NO. 2, APRIL 2014

[36] H. Song, S. Dharmapurikar, J. Turner, and J. Lockwood, “Fast hash table lookup using extended bloom filter: An aid to network processing,” in Proc. ACM SIGCOMM, 2005, pp. 181–192. [37] P. Panda, N. Dutt, and A. Nicolau, “On-chip vs. off-chip memory: The data partitioning problem in embedded processor-based systems,” Trans. Design Autom. Electron. Syst., vol. 5, no. 3, pp. 682–704, Jul. 2000. [38] D. E. Taylor and J. S. Turner, “ClassBench: A packet classification benchmark,” IEEE/ACM Trans. Netw., vol. 15, no. 3, pp. 499–511, Jun. 2007. Hyesook Lim (M’91–SM’13) received the B.S. and M.S. degrees in control and instrumentation engineering from Seoul National University, Seoul, Korea, in 1986 and 1991, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Texas at Austin, Austin, TX, USA, in 1996. From 1996 to 2000, she had been employed as a Member of Technical Staff with Bell Labs, Lucent Technologies, Murray Hill, NJ, USA. From 2000 to 2002, she was a Hardware Engineer for Cisco Systems, San Jose, CA, USA. She is currently a Professor with the Department of Electronics Engineering, Ewha Womans University, Seoul, Korea, where she does research on packet forwarding algorithms such as IP address lookup, packet classification, and deep packet inspection, and research on hardware implementation of various network protocols such as TCP/IP and Mobile IPv6. She is currently working as the Associate Vice President of the Office of Faculty and Academic Affairs at Ewha Womans University.

Nara Lee received the B.S. and M.S. degrees in electronics engineering from Ewha Womans University, Seoul, Korea, in 2009 and 2012, respectively. She is currently a Research Engineer with the IP Technical Team, DTV SoC Department, SIC Lab, LG Electronics, Inc., Seoul, Korea. Her research interests include various network algorithms such as IP address lookup, packet classification, Web caching, and Bloom filter application to various distributed algorithms.

Geumdan Jin received the B.S. and M.S. degrees in electronics engineering from Ewha Womans University, Seoul, Korea, in 2008 and 2010, respectively. She is currently working for the TMS Service Development Team, Hyundai Motor Company, Seoul, Korea. Her research interests are layer-2 and layer-3 architectures, fast packet classification, and wireless networks.

Jungwon Lee received the B.S. degree in mechatronics engineering from Korea Polytechnic University, Gyeonggi-do, Korea, in 2011, and the M.S. degree in electronics engineering from Ewha Womans University, Seoul, Korea, in 2013, and is currently pursuing the Ph.D. degree in electronics engineering from Ewha Womans University. Her research interests include address lookup and packet classification algorithms and packet forwarding technology at content-centric networks.

Youngju Choi (S’12) received the B.S. and M.S. degrees in electronics engineering from Ewha Womans University, Seoul, Korea, in 2011 and 2013, respectively. She works for the Technology IT Team, Hyundai Consulting and Information Company, Ltd., Seoul, Korea. Her research interests include various network algorithms and TCAM architectures.

Changhoon Yim (M’91) received the B.S. degree in control and instrumentation engineering from Seoul National University, Seoul, Korea, in 1986, the M.S. degree in electrical engineering from Korea Advanced Institute of Science and Technology, Seoul, Korea, in 1988, and the Ph.D. degree in electrical and computer engineering from the University of Texas at Austin, Austin, TX, USA, in 1996. He was a Research Engineer working on HDTV with the Korean Broadcasting System, Seoul, Korea, from 1988 to 1991. From 1996 to 1999, he was a Member of the Technical Staff with the HDTV and Multimedia Division, Sarnoff Corporation, Princeton, NJ, USA. From 1999 to 2000, he worked with Bell Labs, Lucent Technologies, Murray Hill, NJ, USA. From 2000 to 2002, he was a Software Engineer with the KLA-Tencor Corporation, Milpitas, CA, USA. From 2002 to 2003, he was a Principal Engineer with Samsung Electronics, Suwon, Korea. He is currently a Professor with the Department of Internet and Multimedia Engineering, Konkuk University, Seoul, Korea. His research interests include multimedia communication, digital image processing, packet classification, and IP address lookup.

Suggest Documents