Scalable Packet Classification Using Condensate


PAPER

Special Section on Internet Technology V

Scalable Packet Classification Using Condensate Bit Vector

Pi-Chung WANG†,††† a), Member, Hung-Yi CHANG††, Chia-Tai CHAN†††, and Shuo-Cheng HU††††, Nonmembers

† The author is with the Institute of Computer Science and Information Technology, National Taichung Institute of Technology, Taichung, Taiwan 404, ROC.
†† The author is with the Department of Information Management, I-Shou University, Kaohsiung, Taiwan 840, ROC.
††† The authors are with the Telecommunication Laboratories, Chunghwa Telecom Co., Ltd., Taipei, Taiwan 106, ROC.
†††† The author is with the Department of Information Management, Ming-Hsin University of Science and Technology, Hsinchu 304, Taiwan, ROC.
a) E-mail: [email protected]
Manuscript received June 30, 2004. Manuscript revised October 1, 2004.
DOI: 10.1093/ietcom/e88–b.4.1440

SUMMARY Packet classification is important in fulfilling the requirements of differentiated services in next-generation networks. One interesting hardware solution proposed for the packet classification problem is the bit vector algorithm. Unlike other hardware solutions such as ternary CAMs, it uses memory efficiently and achieves excellent performance on medium-size policy databases; however, it exhibits poor worst-case performance when the number of policies grows large. In this paper, we propose an improved bit-vector algorithm, named Condensate Bit Vector, which can be adapted to the large policy databases of backbone networks. Experiments show that the proposed algorithm drastically improves storage requirements and search speed compared with the original algorithm.
key words: packet classification, bit vector, aggregation, scalability

1. Introduction

Packet classification is important in fulfilling the requirements of differentiated services in next-generation networks. Internet service providers (ISPs) strive to correlate business decisions with network operations by applying policies for bandwidth management, firewall access control, routing, MPLS tunneling, and quality of service (QoS). Packet classification has been extensively employed in the Internet for secure filtering and service differentiation, allowing administrators to reflect policies of network operation and resource allocation. Using the pre-defined policies, packets can be assigned to a given class. However, packet classification with a potentially large number of policies is difficult and exhibits poor worst-case performance. A classifier, which consists of a set of policies, is used to distinguish incoming packets into multiple classes. A policy F is a k-tuple, F = (f[1], f[2], ..., f[k]), where each field f[i] of the packet header is either a variable-length prefix bit string, a range, or an explicit value. The most common fields of a packet header are the IP source address (SA), the destination address (DA), the

protocol type, the source and destination port numbers, and protocol flags. A packet P is said to match a particular policy F if, for all i, the ith field of the header satisfies f[i]. Each policy has an associated action. To decide the best matching policy, each policy is usually assigned a cost that defines its priority among the matched policies. The least-cost matched policy is applied to the arriving packet. In this paper, we propose the Condensate Bit Vector (CBV) algorithm based on the existing Lucent Bit Vector (BV) scheme. Simulation results show that the CBV algorithm significantly outperforms the Lucent BV scheme in required storage. The CBV scheme can be further improved by applying the concept of bit aggregation to construct the Condensate and Aggregate Bit Vector (CABV). The rest of this paper is organized as follows. Section 2 introduces related work on packet classification and describes the basic ideas of the Lucent BV and Aggregate Bit Vector (ABV) schemes in detail. Section 3 describes the main idea of the CBV algorithm and how the aggregation concept is applied to construct the CABV. Section 4 presents the storage and speed comparisons of the Lucent BV, ABV, CBV and CABV schemes. Finally, Sect. 5 concludes the work.
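To make the matching semantics concrete, the following is a minimal C sketch (ours, purely illustrative; the rule values and names are made up, not taken from the paper) that checks a two-field packet header against a small priority-ordered rule set and reports the least-cost match.

/* Two-field prefix matching over a priority-ordered rule list (illustrative). */
#include <stdio.h>
#include <stdint.h>

struct rule {
    uint8_t src, dst;          /* prefix bits, left-aligned in 8-bit fields */
    int src_len, dst_len;      /* prefix lengths; 0 means "*" */
};

static int prefix_match(uint8_t addr, uint8_t prefix, int len)
{
    uint8_t mask = (uint8_t)(len ? 0xFF << (8 - len) : 0);
    return (addr & mask) == (prefix & mask);
}

int main(void)
{
    /* Rules sorted by cost (priority); the first match is the least-cost match. */
    struct rule db[] = {
        { 0x00, 0xC0, 3, 2 },  /* R0: (000*, 11*)  */
        { 0x40, 0x40, 2, 3 },  /* R1: (01*,  010*) */
        { 0x00, 0x40, 3, 3 },  /* R2: (000*, 010*) */
    };
    uint8_t src = 0x0A, dst = 0x4C;   /* header fields 00001010 and 01001100 */

    for (int i = 0; i < 3; i++)
        if (prefix_match(src, db[i].src, db[i].src_len) &&
            prefix_match(dst, db[i].dst, db[i].dst_len)) {
            printf("least-cost match: R%d\n", i);   /* prints R2 */
            return 0;
        }
    printf("no match\n");
    return 0;
}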

2. Related Work

Over the past few years, researchers have become increasingly interested in the packet classification problem and have proposed several algorithms to solve it [1]. The related studies can be categorized into two classes: software-based and hardware-based. Several software-based schemes have been proposed in the literature, such as linear search/caching, grid of tries/cross-producting, tuple space search, recursive flow classification, fat inverted segment trees, and hierarchical intelligent cuttings [2]–[9]. The software-based solutions do not scale well in either time or storage, so we focus on the hardware-based solutions, which are briefly described below. Ternary content addressable memories (TCAMs) have been widely used for policy search because of their high degree of hardware parallelism, and they do provide a speed advantage. However, a TCAM with a particular word width cannot be used when flexible policy specifications are required, and manufacturing TCAMs with words wide enough to contain all bits of a policy is difficult. TCAMs also suffer from power consumption and scalability problems [10].




In [11], Lakshman and Stiliadis proposed a scheme called bit parallelism, which uses geometric space decomposition to solve the original k-dimensional packet classification problem. Every policy is projected onto each dimension, and a bit vector maintains the positional information of the policy database. In [12], Baboescu et al. refer to bit parallelism as the Lucent BV scheme. The scheme constructs k one-dimensional tries, and each prefix node in every one-dimensional trie is associated with a bit vector (bv) whose length equals the number of policies in the policy database. Each bit position maps to a corresponding policy in the database. Since the policy database is assumed to be sorted in descending order of priority, an AND operation is performed on all matched bvs and the first set bit of the result identifies the best matching policy. Naturally, the length of the bv grows in proportion to the number of policies, so the Lucent BV scheme is only suitable for medium-size policy databases. Baboescu et al. [12] proposed a bit-vector aggregation method to enhance the original BV scheme.

Table 1  Two-dimensional policy database with 16 rules.

Policy  Source Prefix  Dest. Prefix    Policy  Source Prefix  Dest. Prefix
F0      000*           11*             F8      10*            110*
F1      0*             0000*           F9      011*           10*
F2      1*             1111*           F10     000*           0110*
F3      01*            010*            F11     011*           0*
F4      111*           00*             F12     000*           010*
F5      0*             110*            F13     011*           010*
F6      000*           1111*           F14     1*             00*
F7      10*            1110*           F15     *              00*


Their ABV scheme is based on the following observations:

• The set bits in the bit vectors are sparsely distributed.
• A packet usually matches only a few policies in the policy database.

By adding a new bit vector called the aggregate bit vector (abv), the number of memory accesses is decreased significantly. The length of the abv is defined as N/A, where N is the number of policies and A is the aggregate size. Since the number of memory accesses is the main performance factor of a classification algorithm, the ABV scheme improves on the Lucent BV scheme by one order of magnitude. We use the simple two-dimensional (source prefix, destination prefix) policy database with 16 rules shown in Table 1 to illustrate the bit vectors constructed by the BV and ABV schemes. There are 8 unique prefix nodes in the source trie and 10 unique prefix nodes in the destination trie, depicted as the shaded nodes in Fig. 1. Each prefix node in both tries is labelled with two bit vectors: a bv (16 bits) and an abv (4 bits). As described earlier, the length of the bv equals the number of policies in the database, and the length of the abv is determined by the aggregate size (4 here). For an incoming address pair (00001010, 01001100), the Lucent BV scheme uses the source address 00001010 to walk through the source trie; it stops at P0 and obtains the bv “1100011000101001.” Similarly, walking through the destination trie with the destination address 01001100, it stops at P3 and obtains the bv “0001000010011100.” Finally, the Lucent BV scheme performs an AND operation on these two bvs and obtains “0000000000001000,” which shows that the matched policy is F12. With the ABV scheme, an AND operation on the abvs “1111” and “1011” yields “1011,” which indicates that the second 4-bit segment contains no set bits and can be skipped. However, the ABV scheme might produce a so-called false match. For example, the ABV scheme would access the first segment of both bvs (“1100”

Fig. 1  Bit vectors constructed by the BV and ABV schemes: (a) source trie; (b) destination trie.

and “0001”) but would not find any match there. In the worst case, the performance of the ABV scheme can therefore be worse than that of the BV scheme because of the extra memory accesses for the abvs and the false matches. Moreover, the ABV scheme requires more storage than the Lucent BV scheme.
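The following is a minimal C sketch (ours, not the authors' implementation) of the lookup just described: it rebuilds the 4-bit abvs from the two 16-bit bvs of the running example, ANDs them to skip empty segments, and reports the first set bit of the remaining AND result as the best match. The bit values are taken from the example above; everything else (names, layout) is illustrative.

/* A C sketch of the BV/ABV lookup for the running example (illustrative only).
 * Bit 0 of a bv is its leftmost bit, matching the figures. */
#include <stdio.h>
#include <stdint.h>

#define AGG_SIZE 4                       /* aggregate size used in Fig. 1 */

/* Build the 4-bit abv: one bit per 4-bit segment of a 16-bit bv. */
static uint8_t make_abv(uint16_t bv)
{
    uint8_t abv = 0;
    for (int seg = 0; seg < 16 / AGG_SIZE; seg++)
        if (bv & (0xF000 >> (seg * AGG_SIZE)))
            abv |= 0x8 >> seg;
    return abv;
}

int main(void)
{
    uint16_t bv_src = 0xC629;            /* 1100011000101001 (node P0) */
    uint16_t bv_dst = 0x109C;            /* 0001000010011100 (node P3) */

    uint8_t abv_and = make_abv(bv_src) & make_abv(bv_dst);  /* 1111 & 1011 = 1011 */

    uint16_t result = 0;
    for (int seg = 0; seg < 16 / AGG_SIZE; seg++) {
        if (!(abv_and & (0x8 >> seg)))
            continue;                    /* ABV: skip segments with no set bit */
        /* Segment 0 is still fetched although its AND is all zero: a false match. */
        result |= (bv_src & bv_dst) & (0xF000 >> (seg * AGG_SIZE));
    }

    for (int i = 0; i < 16; i++)         /* first set bit = best matching policy */
        if (result & (0x8000 >> i)) {
            printf("best match: F%d\n", i);   /* prints F12 */
            return 0;
        }
    printf("no match\n");
    return 0;
}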

3. Condensate Bit Vector (CBV)

In the bit vector algorithm proposed by Lakshman and Stiliadis [11], the hardware solution achieves good performance for medium-size databases but suffers performance degradation as the number of policies increases. While the ABV scheme improves the average speed of the BV scheme, it increases the required storage by appending an extra abv to each bv. Consequently, we propose an efficient method to improve the performance of the Lucent BV and ABV schemes in both speed and storage. The new scheme is inspired by two observations.

• The number of bvs is equal to the number of different prefixes in each dimension, which is pertinent to storage.


Subtrie Root Selection Algorithm
Input: the root of the trie in each dimension
Output: a new trie with CBV

Subtrie_Constructor(Node, Dimension)
BEGIN
    IF (ChildWithPrefix(Node) == Included_Prefix_Num)
        MarkSubtrieRoot(Node, Dimension);
    ELSE
        Subtrie_Constructor(Node->Left_Child, Dimension);
        Subtrie_Constructor(Node->Right_Child, Dimension);
        IF (ChildWithPrefix(Node) >= Included_Prefix_Num)
            MarkSubtrieRoot(Node, Dimension);
        ELSE IF (Node == Trie_Root && ChildWithPrefix(Node) != Included_Prefix_Num)
            MarkSubtrieRoot(Node, Dimension);
END
/* ChildWithPrefix() calculates the number of prefix nodes under the current node. */
/* MarkSubtrieRoot() marks the current node as a subtrie root. */

Fig. 2  Subtrie root selection algorithm.

• The length of each bv is equal to the number of policies, which directly influences both the speed and the storage of the bit vector algorithm.

Our proposed scheme consists of three parts: 1) subtrie root selection, 2) condensed policy database construction, and 3) condensate bv construction. In the first part, the original prefixes extracted from the policies are used to construct binary tries. For each one-dimensional binary trie, the following algorithm marks subtrie roots according to a pre-defined threshold, Included_Prefix_Num. Traversing the binary trie in depth-first order, the Subtrie Root Selection Algorithm checks whether the number of prefixes under the current node equals the threshold. If so, the node is marked as a subtrie root; otherwise, its left and right children are traversed recursively. The detailed algorithm is shown in Fig. 2. We assume that the number of clustered prefixes (Included_Prefix_Num) is 2. After applying the algorithm to the tries in Fig. 1, the constructed subtrie roots are as shown in Fig. 3, where each dark-gray circle represents the position of a subtrie root. Next, the original prefixes in each policy are replaced by the bit strings of their nearest ancestor subtrie roots. For example, F0 (000*, 11*) in Table 1 becomes (0*, 11*), and F6 (000*, 1111*) becomes (0*, 111*). Some new policies might be redundant, e.g., the new policies generated by F0 and F5; such policies are merged, and the indices of the original policies are appended to the new policy. Table 2 presents the new policy database. Once a matching policy in the new database is found, the included original policies must be examined to derive the exact result. A linear search suffices for this task because only a few included policies are present. The condensate bv construction is based on the new policy database. Since the number of policies in the new database is reduced, each bit vector requires fewer bits than in the Lucent BV scheme.
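Below is a runnable C rendering of the pseudocode in Fig. 2, reflecting one possible reading: we assume ChildWithPrefix() counts only prefix nodes that are not yet covered by an already-marked subtrie root, and that the binary trie is built by the caller. All names and the node layout are illustrative, not the authors' code.

/* One possible C reading of the subtrie-root selection in Fig. 2 (illustrative). */
#include <stdlib.h>

#define INCLUDED_PREFIX_NUM 2          /* number of clustered prefixes */

struct node {
    struct node *left, *right;
    int is_prefix;                     /* node corresponds to a prefix */
    int is_subtrie_root;               /* output of the algorithm */
};

/* Number of uncovered prefix nodes in the subtree rooted at n
 * (assumption: prefixes below a marked subtrie root are excluded). */
static int child_with_prefix(const struct node *n)
{
    if (n == NULL) return 0;
    if (n->is_subtrie_root) return 0;  /* already clustered below here */
    return n->is_prefix
         + child_with_prefix(n->left)
         + child_with_prefix(n->right);
}

static void subtrie_constructor(struct node *n, struct node *trie_root)
{
    if (n == NULL) return;
    if (child_with_prefix(n) == INCLUDED_PREFIX_NUM) {
        n->is_subtrie_root = 1;        /* exactly threshold prefixes: cluster here */
        return;
    }
    subtrie_constructor(n->left, trie_root);
    subtrie_constructor(n->right, trie_root);
    if (child_with_prefix(n) >= INCLUDED_PREFIX_NUM)
        n->is_subtrie_root = 1;
    else if (n == trie_root && child_with_prefix(n) != INCLUDED_PREFIX_NUM)
        n->is_subtrie_root = 1;        /* catch the remaining prefixes at the root */
}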

Fig. 3  An example of bit vector condensation: (a) source trie; (b) destination trie.

Table 2  New policy database.

Policy  First Dimension  Second Dimension  Included Policies
F0      0*               11*               F0, F5
F1      0*               00*               F1
F2      1*               111*              F2, F7
F3      01*              01*               F3, F13
F4      1*               00*               F4, F14
F5      0*               111*              F6
F6      1*               11*               F8
F7      01*              *                 F9, F11
F8      0*               01*               F10, F12
F9      *                00*               F15
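As an illustration of the verification step, the sketch below (ours) takes the bucket of the new policy (0*, 11*) from Table 2, whose included original policies are F0 and F5, and linearly checks them against a packet header; the header values and helper names are made up for the example.

/* Linear-search verification of a bucket's included policies (illustrative). */
#include <stdio.h>
#include <stdint.h>

struct rule { uint8_t src, dst; int src_len, dst_len; int id; };

static int prefix_match(uint8_t addr, uint8_t prefix, int len)
{
    uint8_t mask = (uint8_t)(len ? 0xFF << (8 - len) : 0);
    return (addr & mask) == (prefix & mask);
}

int main(void)
{
    /* Bucket of the new policy (0*, 11*): included original policies F0, F5. */
    struct rule bucket[] = {
        { 0x00, 0xC0, 3, 2, 0 },      /* F0: (000*, 11*) */
        { 0x00, 0xC0, 1, 3, 5 },      /* F5: (0*, 110*)  */
    };
    uint8_t src = 0x40, dst = 0xC8;   /* header fields 01000000 and 11001000 */

    for (int i = 0; i < 2; i++)       /* bucket is kept in priority order */
        if (prefix_match(src, bucket[i].src, bucket[i].src_len) &&
            prefix_match(dst, bucket[i].dst, bucket[i].dst_len)) {
            printf("exact match: F%d\n", bucket[i].id);   /* prints F5 */
            return 0;
        }
    printf("no match in this bucket\n");
    return 0;
}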

Furthermore, the number of different prefixes in each dimension is reduced, so fewer vectors are generated. Figure 4 shows how the 16-bit bvs of Fig. 1 are transformed into new 10-bit bvs; in addition, the number of bvs is reduced from 18 to 9 with the proposed scheme.

3.1 Condensate and Aggregate Bit Vector

In Fig. 4, we also present the condensate bit vector with aggregation, which we call the condensate and aggregate bit vector (cabv). The aggregate size is again 4. The construction of the cabv follows the same procedure as that of the abv. Since our CBV scheme condenses several bits into one, the cabv is even more compact than the cbv. The cabv might also cause false matches, but in most cases the number of false matches observed in the CABV scheme is far less


Fig. 4  Modified bit vector condensation: (a) source trie; (b) destination trie.

than in the ABV scheme. This is mainly because the geometrical information implicit in the cbvs improves the accuracy of the bit aggregation. By contrast, the abv, built on the original bvs, relies only on the priority ordering of the policies to perform aggregation, which hardly reflects the geometrical relations.

4. Performance Evaluation

In this section, we evaluate the performance of the CBV scheme and compare it with both the Lucent BV and ABV schemes. Two different types of databases are used in our simulation. First, we use 22 databases collected from 11 backbone routers of the Abilene Network [13] as real-world classifiers. These databases were downloaded at different times and contain at most 13 K policies. To test the scalability of our CBV scheme, synthetic databases are used to evaluate whether the CBV scheme can accommodate future network service configurations. The synthetic databases are generated by randomly selecting source and destination addresses from these 22 classifiers, and their size varies from 10 K to 100 K. We assume that the memory word size and the aggregate size of ABV are both 32 bits, identical to the settings used in [12].

4.1 Performance Metrics

Two performance metrics are measured in our experiments: the storage requirement and the classification speed. The required storage is mainly tied to the number of bit vectors and their length: the number of bit vectors equals the number of prefix nodes, while their length equals the number of policies. The speed of each packet classification is measured in terms of the number of memory accesses. Because the speed of current hardware/software architectures is limited by memory bandwidth, it is reasonable to examine the algorithmic time complexity by counting memory accesses. The accounting is as follows. The number of memory accesses for the bit vector of each dimension is ⌈Length_of_Bit_Vector / Word_Size⌉, where Length_of_Bit_Vector is the length of the compared bit vector and Word_Size is the memory word size of the hardware architecture. Next, the accesses required for the linear search are considered. In the aggregation cases, the number of false matches is included; the worst-case memory accesses are ⌈Length_of_Bit_Vector / (Word_Size × Aggregate_Size)⌉ + Σ_i (⌈Aggregate_Size / Word_Size⌉ + L_i), where Aggregate_Size is the aggregate size of the ABV construction and L_i is the number of policies fetched in the linear search for the ith set bit. Since the linear-search time cannot be predicted in the general case, we use the cross products of all unique source and destination prefixes to construct the headers of the incoming packets, and we estimate the number of worst-case false matches for these headers. In our experiments, the number of clustered prefixes is set to 2, 4, 8 and 16 in order to evaluate the effect of condensation. Since the number of clustered prefixes does not affect the Lucent BV and ABV schemes, their storage requirement and speed remain constant.
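The following small C sketch (ours) applies this accounting, under our reading of the formulas above, to an illustrative 13 K-policy, single-dimension case; the trace values (number of set aggregate bits and the L_i) are invented for the example.

/* Memory-access accounting for one dimension (illustrative trace values). */
#include <stdio.h>

#define CEIL_DIV(a, b) (((a) + (b) - 1) / (b))

int main(void)
{
    int N = 13000;           /* bit-vector length = number of policies */
    int word_size = 32;      /* memory word size (bits)                */
    int agg_size = 32;       /* aggregate size of ABV                  */

    /* Lucent BV: read the whole bit vector of one dimension. */
    int bv_accesses = CEIL_DIV(N, word_size);                 /* 407 */

    /* ABV worst case: read the abv, then one bv segment plus a linear
     * search of L[i] policies for every set aggregate bit. */
    int set_bits = 3;                        /* example trace values */
    int L[] = { 2, 1, 2 };
    int abv_accesses = CEIL_DIV(N, word_size * agg_size);     /* 13 */
    for (int i = 0; i < set_bits; i++)
        abv_accesses += CEIL_DIV(agg_size, word_size) + L[i];

    printf("BV: %d accesses, ABV worst case: %d accesses\n",
           bv_accesses, abv_accesses);
    return 0;
}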

4.2 Evaluation Results

Storage Requirements in Real Classifiers

The storage for the various real classifiers is evaluated and the results are summarized in Fig. 5. In Fig. 5(a), our CBV and CABV schemes do not outperform the existing schemes when classifiers contain fewer than 1,000 policies; the effect of prefix clustering is not pronounced for small classifiers, and the overhead incurred by our schemes exceeds that of the original Lucent BV and ABV schemes. Once classifiers grow larger than 3,698 policies, our schemes reduce storage by at least 50% compared with both the Lucent BV and ABV schemes. If we enlarge the number of clustered prefixes to four, only 25% of the storage is required, as shown in Fig. 5(b). Figure 5(c) shows the same level of improvement once the classifier size exceeds 2,348, and in Fig. 5(d) the reduction ratio can be up to 92%. Thus the proposed CBV and CABV schemes are scalable for large classifiers.

Storage Requirement in Synthetic Classifiers

The performance of the proposed schemes on the synthetic databases is presented next. The characteristics of the synthetic databases are similar to those of the large real-world classifiers.


Fig. 5  Storage requirements for real classifiers (lower is better): (a)–(d) clustered prefixes = 2, 4, 8, and 16.

Figure 6 shows the results for various settings. Since the prefixes in the synthetic databases are sampled uniformly, the curves are quite smooth, and storage reduction increases drastically as the number of clustered prefixes multiplies. Although the proposed schemes can effectively reduce the storage required for large classifiers, the larger policy buckets might increase the number of memory accesses per packet classification. The following subsections present the speed of the proposed schemes under different settings.

Speed in Real Classifiers

The worst-case memory accesses for the 22 real-world classifiers are shown in Fig. 7. In the CBV-based schemes, the number of worst-case memory accesses is the sum of the memory accesses for the bit vectors and for the linear search; we assume that fetching one policy from the classifier takes one memory access. Figure 7(a) shows that the CBV scheme performs much better than the Lucent BV scheme, while the CABV scheme behaves quite similarly to the ABV scheme. When the number of clustered prefixes is enlarged, as shown in Fig. 7(b), the CBV scheme requires only half the memory accesses of the Lucent BV scheme to accomplish one packet classification. The performance of CABV is unstable compared with ABV, and in some cases CABV is even worse than CBV. This is because the false matches caused by bit aggregation

degrade the worst-case performance. In Fig. 7(c), the curve of the CBV scheme approaches that of the ABV scheme, since prefix clustering has a strong effect. If the classifier is large enough, CABV provides the best performance of all the schemes. In Fig. 7(d), the speed of the CBV and CABV schemes is almost the same for all classifiers, because as the number of clustered prefixes increases, the number of policies processed in the linear-search phase dominates the worst-case performance. Also in Fig. 7(d), for classifiers smaller than 2,202 policies our schemes perform worse than the ABV scheme, since prefix clustering cannot take effect and the growing number of policies in the linear-search phase degrades the speed. This can be avoided by carefully selecting the clustering parameter according to the characteristics of each classifier.

Speed in Synthetic Classifiers

Lastly, we demonstrate the scalability of our schemes with the synthetic classifiers in Fig. 8. The number of memory accesses increases linearly as the size of the classifier multiplies. Figure 8(a) shows that the performance of the CABV scheme is similar to that of the ABV scheme. For the 50 K-entry classifier, the effect of false matches degrades the performance of the proposed schemes, as seen in Fig. 8(b). However, the lookup speed of the proposed schemes improves as the number of prefixes increases.


Fig. 6  Storage requirement in synthetic databases (lower is better): (a)–(d) clustered prefixes = 2, 4, 8, and 16.

Fig. 7  Worst-case memory accesses in real classifiers (lower is better): (a)–(d) clustered prefixes = 2, 4, 8, and 16.


Fig. 8  Worst-case memory accesses in synthetic databases (lower is better): (a)–(d) clustered prefixes = 2, 4, 8, and 16.

In Fig. 8(c) and Fig. 8(d), the speed of the CBV scheme approaches that of the ABV scheme, while the CABV scheme consistently outperforms the ABV scheme.

5. Conclusions

Packet classification is an essential function in fulfilling the requirements of differentiated services, such as firewalls, MPLS tunnelling and QoS, in next-generation networks. Several algorithms have appeared in the literature to deal with the packet classification problem; the bit vector algorithm is one of them and employs an intriguing mechanism to solve the problem. Nevertheless, it does not scale to an increasing number of policies. In this work, we proposed the condensate bit vector with linear search to improve classification performance and decrease the required storage. The main idea of CBV is that bit vectors can be inherited from their ancestors, so that the trie structure can be used to condense several vectors into one. We also proposed the condensate and aggregate bit vector, which incorporates bit aggregation into the CBV to decrease the number of memory accesses for large classifiers. In our experiments, the CBV scheme needs less than 2 Mbytes for the synthetic database with 100 K policies, while the Lucent BV and ABV schemes need nearly 50 Mbytes. Clearly, CBV-based schemes can fit the searchable structures into

fast SRAM, enabling BV-based schemes to support large classifiers. The experimental results also demonstrate that the CBV-based schemes achieve speed similar to the ABV scheme and much faster than the Lucent BV scheme, while requiring far less storage. In conclusion, the flexibility of the CBV-based schemes is superior to that of the existing algorithms. With their flexible data structure, the CBV and CABV schemes are suitable for the larger classifiers that will be widely adopted in the near future.


References
[1] P. Gupta and N. McKeown, “Algorithms for packet classification,” IEEE Netw., vol.15, no.2, pp.24–32, March/April 2001.
[2] V. Srinivasan, G. Varghese, S. Suri, and M. Waldvogel, “Fast and scalable level four switching,” ACM SIGCOMM’98, pp.191–202, Sept. 1998.
[3] V. Srinivasan, S. Suri, and G. Varghese, “Packet classification using tuple space search,” ACM SIGCOMM’99, pp.135–146, Sept. 1999.
[4] P. Gupta and N. McKeown, “Packet classification on multiple fields,” ACM SIGCOMM’99, pp.147–160, Sept. 1999.
[5] P. Gupta and N. McKeown, “Packet classification using hierarchical intelligent cuttings,” IEEE Micro, vol.20, no.1, pp.34–41, Feb. 2000.
[6] A. Feldmann and S. Muthukrishnan, “Tradeoffs for packet classification,” IEEE INFOCOM, pp.1193–1202, March 2000.
[7] T.Y.C. Woo, “A modular approach to packet classification: Algorithms and results,” IEEE INFOCOM, pp.1213–1222, 2000.
[8] P.-C. Wang, C.-T. Chan, S.-C. Hu, C.-L. Lee, and W.-C. Tseng, “High-speed packet classification for differentiated services in next-generation networks,” IEEE Trans. Multimedia, vol.6, no.6, pp.925–935, 2004.
[9] P.-C. Wang, C.-T. Chan, S.-C. Hu, and C.-L. Lee, “Performance improvement of packet classification by using lookahead caching,” IEICE Trans. Commun., vol.E87-B, no.2, pp.377–379, Feb. 2004.
[10] F. Baboescu, S. Singh, and G. Varghese, “Packet classification for core routers: Is there an alternative to CAMs?” IEEE INFOCOM, pp.53–63, March 2003.
[11] T. Lakshman and D. Stiliadis, “High-speed policy-based packet forwarding using efficient multi-dimensional range matching,” ACM SIGCOMM’98, pp.203–214, Sept. 1998.
[12] F. Baboescu and G. Varghese, “Scalable packet classification,” ACM SIGCOMM’01, pp.199–210, Aug. 2001.
[13] “Abilene netflow nightly report,” http://www.itec.oar.net/abilenenetflow/

Pi-Chung Wang received his Ph.D. degree in Computer Science and Information Engineering from National Chiao-Tung University, Hsinchu, Taiwan in 2001. From 2002 to 2004, he was with the Telecommunication Laboratories, Chunghwa Telecom Co., Ltd., where he worked on network planning. He is currently an Assistant Professor in the Institute of Computer Science and Information Technology, National Taichung Institute of Technology. His research interests include Internet multimedia communications, traffic control in high-speed networks, and L3/L4 switching technology. He is a member of IEEE.

Hung-Yi Chang was born in Taiwan in 1970. He received his M.S. and Ph.D. degrees in Computer Science and Information Engineering from National Chiao-Tung University, Hsinchu, Taiwan, in 1994 and 1999, respectively. He is now an assistant professor in the Department of Information Management in I-Shou University, Kaohsiung, Taiwan. His research interests include network management and interconnection networks.

Chia-Tai Chan received his Ph.D. degree in Computer Science and Information Engineering from National Chiao-Tung University, Hsinchu, Taiwan in 1998. He is now with the Telecommunication Laboratories, Chunghwa Telecom Co., Ltd. His research interests include the design, analysis and traffic engineering of broadband multiservice networks.

Shuo-Cheng Hu received the Ph.D. degree in computer science from National Chiao-Tung University in 2000. He is currently an Assistant Professor in the Department of Information Management, Ming-Hsin University of Science & Technology. His main research interest is in computer networking, with an emphasis on Internet quality of service, fast address lookup algorithms and VoIP.
