IET Communications
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication in an issue of the journal. To cite the paper please use the doi provided on the Digital Library page.
Efficient Bloom Filter for Network Protocols Using AES Instruction Set

Yao Zhang1,2, Zhiming Zheng2, Xiao Zhang2,*

1 Security Technology Department, Inspur Electronic Information Industry Co., Ltd, Beijing 100083, People's Republic of China
2 LMIB and School of Mathematics and Systems Science, Beihang University, Beijing 100191, People's Republic of China
* [email protected]

Abstract: The Internet continues to flourish, and a growing number of network applications deploy Bloom filters. However, the heterogeneity of Bloom filter realizations complicates the deployment of these applications. Moreover, when a Bloom filter is applied to traffic at gigabit rates, even small delays accumulate and limit the effectiveness of real-time protocols. In this paper, we present a Bloom filter construction that can be adopted easily and consistently at network nodes while offering considerable processing speed. Specifically, we show that AES-based hashes are adequate for building correct Bloom filters. We then illustrate how AES New Instructions (AES-NI) can be leveraged to accelerate the Bloom filter realization. According to our experimental results, the proposed Bloom filter achieves the best speed among the competing approaches.
1. Introduction

A Bloom filter [1] is a space-efficient data structure that answers whether an arbitrary element belongs to a given set. Recently, thousands of real-time network applications and services [2–9], such as DDoS attack detection and protection [2–4], multicast routing [6], cache filtering in CDNs [7], high-speed flow measurement [8], and data-stream monitoring [9], have adopted the Bloom filter as a fundamental component. These applications are only the tip of the iceberg: Bloom filters have contributed substantially to the functionality of dedicated network protocols.

Yet building Bloom filters is not always straightforward. Realization heterogeneity remains a major challenge when deploying various protocols at network nodes, because the actual realization of a Bloom filter differs with the underlying hash functions. Although modern algorithms such as Murmur Hash are more likely to be selected, traditional hash functions such as MD5 [10] are still widely used [11, 12]. Such inconsistency unavoidably increases deployment complexity when multiple protocols are taken into account.

The other concern is processing performance. With potentially large numbers of element queries (summarized in Table 1), slight processing delays accumulate and significantly constrain the effectiveness of the relevant applications. Speed optimization of a Bloom filter thus becomes critical, while the simplicity of the data structure itself leaves little room for acceleration.

As a remedy, this paper presents a fast and lightweight Bloom filter that can be adopted by end hosts and intermediate routers in a consistent manner. Rather than using legacy
IET Review Copy Only
Table 1. Bloom filter applications and the scale of the potential element queries.

Application case         Insertion/query elements   Potential query scale
DDoS protection [4]      Packet information         10^6
Multicast routing [6]    Forwarding address         10^7
Cache filtering [7]      Web objects                10^7
Flow measurement [8]     Flow identifier            10^7
Data monitoring [9]      Data value                 10^8
hash functions, we employ the Advanced Encryption Standard (AES) [13] to create the hashing streams required by a Bloom filter. AES is a standard, widely used block cipher that offers both strong security and high speed. In particular, since AES is supported by most microprocessors, Bloom filter realizations driven by AES can be easily unified. Our approach leverages AES New Instructions (AES-NI) [14, 15], a cryptographic microprocessor instruction set proposed by Intel. We employ AES counter mode [16] to generate hash functions efficiently, and we illustrate how to realize Bloom filter functions with the corresponding AES-NI operations. According to our experimental results, the processing time of such an AES-NI supported Bloom filter is reduced significantly.

We summarize the main contributions of the paper as follows:
• We present a novel lightweight Bloom filter construction based on AES counter mode, together with its realization using AES-NI. Only AES-based one-way functions are involved in building the Bloom filter. Because the approach is supported by the microprocessor, it naturally unifies Bloom filter realizations and facilitates the deployment of relevant applications.
• We theoretically prove the feasibility and correctness of the proposal. According to our analysis, our AES-based Bloom filter satisfies all required hashing properties.
• We implement the proposed Bloom filter and four other popular Bloom filters in a common framework. Based on the comprehensive comparison results, our Bloom filter outperforms all competing methods in terms of processing time.

The rest of the paper is organized as follows. We illustrate how a standard Bloom filter works in Section 2. Then we present our detailed design in Section 3, followed by a formal analysis of our method in Section 4. We show the evaluation results in Section 5 and give our remarks in Section 6.
Related work is presented in Section 7, and we conclude the paper in Section 8.

2. Background: Bloom Filter

In this section we briefly introduce the standard form of a Bloom filter and the mathematical preliminaries behind it. More complete specifications can be found in [11, 17].

A Bloom filter is a data structure with two primary operations: insertion and query. For insertion, a Bloom filter represents the elements of a given n-element set A = {a1, a2, ..., an} with a binary array of m bits, initially set to all zeros. Each element is processed by k independent hash functions to determine k positions in the m-bit array. If
the corresponding values at those positions are 0, they are set to 1; otherwise they remain unchanged. All hash values are assumed to be generated uniformly at random. After the elements of set A are inserted, the Bloom filter (i.e., the resulting m-bit array) supports query operations by answering whether an element bi is in set A. Element bi is processed with the same hash functions to check whether all the corresponding positions are set to 1. If so, the query is answered with “yes”; otherwise “no” is returned.

[Figure 1: three panels (Initialization, Insertion, Query) showing a 16-bit array that is all zeros at initialization and reads 0 1 0 1 0 1 0 1 0 1 0 0 0 1 0 1 after inserting a1, a2, a3; b1 and b2 are the queried elements.]
Fig. 1. Examples of Bloom filter initialization, insertion, and query. The Bloom filter uses 3 hash functions. The original set contains 3 elements a1, a2, a3. The queries for b1 and b2 return “no” and “yes”, respectively, and the answer to the second query happens to be a false positive.

Because queries do not search the original data set, a Bloom filter is well suited to space-limited scenarios. Also, as the basic operations are conducted with hash functions, a Bloom filter is able to respond to a query in O(1) time. However, due to the use of hash functions, the representation of set A is lossy. The query operation can cause false positives, meaning that a “yes” answer may be returned for an element bi that is not in A (whereas a “no” answer is always correct). The two operations and a false positive example are shown in Fig. 1.

Fortunately, the false positive probability can be minimized by choosing the parameters appropriately. For a given set A with n elements, if the acceptable false positive probability is p, then the optimal length m of the Bloom filter is given as:

$$m = -\frac{n \ln p}{(\ln 2)^2}, \tag{1}$$

and the optimal number of hash functions k is calculated as:

$$k = \frac{m}{n} \ln 2. \tag{2}$$
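Equations (1) and (2) can be evaluated directly. The following is a minimal sketch, not the authors' implementation: the function name `optimal_parameters` is hypothetical, and rounding m up and k to the nearest integer is an assumption of this sketch.

```python
import math
from typing import Tuple


def optimal_parameters(n: int, p: float) -> Tuple[int, int]:
    """Return (m, k) for n elements and target false positive probability p.

    m follows Equation (1) and k follows Equation (2). Rounding m up and
    k to the nearest integer is an assumption made for this sketch.
    """
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    return m, k


if __name__ == "__main__":
    m, k = optimal_parameters(10_000, 0.01)
    print(f"m = {m} bits, k = {k} hash functions")
```

For n = 10,000 and p = 1% this reproduces the values m = 95,851 and k = 7 used as the worked example in Section 3.2.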
Thereby, with the above parameter setting (n, p, m, k), one can always initialize a Bloom filter whose false positive probability is limited to p.

3. Design

In this section, we first present our goals for building a Bloom filter and then describe how our scheme works in detail. We prove the correctness of our design in Section 4.
3.1. Design goals

The following design goals enable an efficient Bloom filter for network protocols:

Functionality. First of all, any proposed Bloom filter must process the basic insertion/query operations correctly. More specifically, all hashing outputs should be uniformly distributed and independent [17].

Deployability. When Bloom filters are deployed at network nodes (such as routers and servers), the underlying technique should be unified and supported by all the nodes. This gives adopters (e.g., Internet Service Providers) more incentive to deploy it.

Efficiency. Additionally, the Bloom filter should achieve high processing speed for both insertion and query operations, so that it introduces minimal delay into real-time network applications.

3.2. AES-driven Bloom filter

Recall the description in Section 2. When the set scale is n and the chosen false positive probability is p, a standard Bloom filter occupies m bits and requires k hash functions. We hereby envision a vector with m buckets. For a given element insertion/query, the output of each hash function (i.e., fi(·), i ∈ {1, ..., k}) represents the position of one bucket. Hence, the required length l of each hash function output (denoted also as |fi(·)|) can be calculated as:

$$l = |f_i(\cdot)| = \left\lceil \log_2\left(-\frac{n \ln p}{(\ln 2)^2}\right) \right\rceil. \tag{3}$$
As a consequence, if the outputs of all k hash functions are concatenated, the total length L of the bit chain is:

$$L = k \times \left\lceil \log_2\left(-\frac{n \ln p}{(\ln 2)^2}\right) \right\rceil. \tag{4}$$

Intuitively, given a symmetric cipher whose ciphertext has sufficient randomness, we can use the ciphertext in place of the bit streams from the hash functions, as long as both outputs are uniformly distributed and independent. Among the trusted symmetric ciphers, we select the Advanced Encryption Standard (AES) [13] for its high security margin, widespread usage, and microprocessor support. We first elaborate the process of AES-driven hashing stream generation; details of the AES-NI supported implementation are presented in Section 3.3.

Since AES is a 128-bit block cipher, the number of blocks b required to generate an L-bit stream is:

$$b = \left\lceil \frac{k \times \left\lceil \log_2\left(-\frac{n \ln p}{(\ln 2)^2}\right) \right\rceil}{N} \right\rceil, \tag{5}$$

where the block length N is 128 for AES. Note that in our scheme the maximum size of an element is bounded by b · N bits. We assume here that the size of the input elements is deterministic and comparable (i.e., within the above bound determined by b). This restriction is reasonable: in the network applications mentioned above (Table 1), the inserted elements are pre-defined by the protocols. In the multicast routing scheme, for instance, the elements for insertion/query are forwarding addresses of uniform size (e.g., 32 bits).

The specific procedures of our method are shown in Fig. 2. For both insertion and query operations, if the initial element is shorter than the maximum input length, the element will be shaped
[Figure 2: two stages. Element Processing: the insertion/query element is (1) padded (Element || Padding) and (2) duplicated into b N-bit AES input blocks. Hash Functions Generation: AES-CTR encryption yields b N-bit AES output blocks, which are cut into consecutive l-bit segments f1( ), f2( ), f3( ), ...]
Fig. 2. Procedure of using AES-CTR encryption for the generation of k hash functions. For element processing, step (1) refers to block padding and step (2) refers to self-duplication. After the original Bloom filter insertion/query element is processed, b input blocks are encrypted with the CTR mode of the AES algorithm. Finally, the desired hash functions are obtained every l bits from the AES ciphertext.

for alignment. This procedure is referred to as element processing. Specifically, we first use the PKCS#7 [18] padding mode to align the element's size to a multiple of 16 bytes (the AES block size). Namely, if the last block of the element is shorter than 16 bytes, it is padded with bytes that each hold the length of the padding (e.g., 0x030303 for padding three bytes). Then, if the padded element is still shorter than b blocks, it is self-duplicated as many times as needed to fill them. In Fig. 2, we show an example in which the original element is N/2 bits (i.e., 64 bits).

As the block number b is likely to be more than 1, AES encryption has to be executed with a mode of operation [16]. We select the counter mode (CTR mode) for its speed and security advantages [16, 19]. Also, CTR mode requires no chaining between encryption blocks. Each input block is associated with a 128-bit message that consists of a 64-bit nonce concatenated with a 64-bit successive counter, and the encryption is conducted by XORing the input with the ciphertext of the corresponding message. As each block is encrypted independently, CTR mode offers potential for parallelization.

After AES-CTR encryption, the first l bits of the ciphertext are used as the output of the first hash function f1(·), the next l bits as the output of f2(·), and so forth. According to the calculation above, b blocks of AES output are sufficient for k hash functions. As an example, if n = 10,000 and p = 1%, then m is 95,851 bits and k is 7.
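The pipeline described in this section (sizing by Equations (1) to (5), element processing, CTR-style encryption, and cutting the ciphertext into l-bit positions) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. Python's standard library has no AES, so a keyed BLAKE2b call stands in for the AES block encryption; all function names are hypothetical. Padding only a partial last block follows the description above (strict PKCS#7 would also pad an aligned input), and truncating after self-duplication and reducing each l-bit value modulo m are assumptions of this sketch.

```python
import hashlib
import math

BLOCK_BYTES = 16  # AES block size, N = 128 bits


def sizing(n, p):
    """Return (m, k, l, L, b) following Equations (1)-(5)."""
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = max(1, round(m / n * math.log(2)))
    l = math.ceil(math.log2(m))
    L = k * l
    b = math.ceil(L / (8 * BLOCK_BYTES))
    return m, k, l, L, b


def process_element(element, b):
    """Pad a partial last block PKCS#7-style, then self-duplicate to b blocks."""
    if not element or len(element) > b * BLOCK_BYTES:
        raise ValueError("element must be non-empty and at most b*N bits")
    rem = len(element) % BLOCK_BYTES
    if rem:
        pad = BLOCK_BYTES - rem
        element += bytes([pad]) * pad
    target = b * BLOCK_BYTES
    reps = -(-target // len(element))       # ceiling division
    return (element * reps)[:target]        # truncation is an assumption


def stand_in_block_cipher(key, block):
    """Keyed 128-bit function used in place of AES. This is NOT AES."""
    return hashlib.blake2b(block, key=key, digest_size=BLOCK_BYTES).digest()


def ctr_encrypt(key, nonce, data):
    """CTR mode: XOR each input block with E_key(nonce || counter)."""
    out = bytearray()
    for i in range(0, len(data), BLOCK_BYTES):
        counter = (i // BLOCK_BYTES).to_bytes(8, "big")
        stream = stand_in_block_cipher(key, nonce + counter)
        out += bytes(x ^ y for x, y in zip(data[i:i + BLOCK_BYTES], stream))
    return bytes(out)


def hash_positions(ciphertext, k, l, m):
    """Cut the first k*l ciphertext bits into k values of l bits each."""
    bits = int.from_bytes(ciphertext, "big")
    total = 8 * len(ciphertext)
    mask = (1 << l) - 1
    # Reducing modulo m keeps positions inside the array (an assumption).
    return [((bits >> (total - (j + 1) * l)) & mask) % m for j in range(k)]


m, k, l, L, b = sizing(10_000, 0.01)
key, nonce = b"example-user-key", bytes(8)   # 64-bit nonce, illustrative values
block = process_element(b"\xc0\xa8\x00\x01", b)  # a 32-bit forwarding address
positions = hash_positions(ctr_encrypt(key, nonce, block), k, l, m)
print(m, k, l, L, b, positions)
```

With a real AES-NI backend only `stand_in_block_cipher` would change; the sizing, padding, and slicing steps are independent of the cipher.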
Furthermore, the required hash function length l is 17. L is therefore 119 bits and the required number of blocks b is just 1.

3.3. AES-NI supported implementation

We further present the details of our implementation using AES-NI [14, 15], an AES instruction set proposed by Intel in 2008 and ubiquitously deployed in current Intel and AMD microprocessors. Since earlier CPUs may not support AES-NI, our Bloom filter implementation will first detect the
Algorithm 1 Pseudocode used for a sequence of r Bloom filter insertions
  ai - element for Bloom filter insertion,
  r - number of insertion requests,
  key - pre-defined userkey, can be updated when the Bloom filter is reset,
  k - number of required hash functions.
  Function Insert(a1, a2, ..., ar):
    AES_Key_Expansion(key);            % Generate the round keys for all subsequent insertions.
    for i := 1 to r do
      Pro_ai = Processing(ai);         % Adjust the element to be encrypted (i.e., element processing).
      f1(Pro_ai) || ... || fk(Pro_ai) ← AES_CTR_Encrypt(Pro_ai);
      for j := 1 to k do
        w ← fj(Pro_ai);                % Determine a position w according to the output of each hash.
        if (BF_w == 0)                 % Check if the Bloom filter has zero value at position w.
          BF_w ← 1;

compatibility of the system (on which the Bloom filter is to be used) according to the output of the following command:

  $ grep module /proc/crypto | sort -u

CPUs that support AES-NI then proceed with the Bloom filter operations. As counter mode is used in our method, our implementation requires two routines built on the instruction set: AES_CTR_Encrypt() for text encryption and AES_Key_Expansion() for round-key scheduling [15]. Note that the round-key expansion is far more time-consuming than the encryption itself [20]. Hence, based on a pre-defined userkey, we execute the round-key expansion only once, before the insertion/query operations start. The userkey can be updated whenever the Bloom filter is reset. Algorithm 1 presents the pseudocode for r-element Bloom filter insertions. Similar steps apply to queries, which we omit here for simplicity.

4. Analysis

The correctness of a Bloom filter is guaranteed when the selected k hash functions are independent and uniformly distributed [17]. In this section, we formally prove that our AES-based Bloom filter construction satisfies these two hashing properties. We first give the definitions of two relevant cryptographic primitives according to Katz et al.
[21]: Pseudo Random Function (PRF) and Pseudo Random Permutation (PRP).

Definition 4.1. (Pseudo Random Function) Let $G : \{0,1\}^x \times \{0,1\}^y \to \{0,1\}^z$ be an efficient, deterministic, keyed function. Then G is a PRF if for any probabilistic polynomial-time (p.p.t.) distinguisher D, there exists a negligible $\epsilon$ such that:

$$\left| \Pr[D^{G_s(\cdot)}(1^x) = 1] - \Pr[D^{g(\cdot)}(1^x) = 1] \right| \le \epsilon, \tag{6}$$

where s and g are chosen uniformly at random from $\{0,1\}^y$ and the set of functions mapping x-bit strings to z-bit strings, respectively.

Definition 4.2. (Pseudo Random Permutation) Let $H : \{0,1\}^x \times \{0,1\}^y \to \{0,1\}^x$ be an efficient, deterministic, keyed bijection (i.e., the inverse of H exists). Then H is a PRP if for any probabilistic
polynomial-time (p.p.t.) distinguisher D, there exists a negligible $\epsilon$ such that:

$$\left| \Pr[D^{H_s(\cdot)}(1^x) = 1] - \Pr[D^{h(\cdot)}(1^x) = 1] \right| \le \epsilon, \tag{7}$$

where s and h are chosen uniformly at random from $\{0,1\}^y$ and the set of permutations on x-bit strings, respectively.

In our analysis, we assume that the AES algorithm is as secure as a PRP. Also, the associated counter mode is assumed to be correctly initialized: as suggested in our design, a nonrandom nonce (upper 64 bits) is concatenated with a counter (lower 64 bits). Thus, the counter mode encryption can be seen as a PRF [19] when the following relation is satisfied:

$$\mathrm{Adv}[\mathrm{PRF}](t, b) \le \mathrm{Adv}[\mathrm{PRP}](t, b) + b^2 2^{-N-1}, \tag{8}$$
where b is the number of encrypted message blocks, t is the distinguishing time, N is the block length, and Adv[A] denotes the advantage of distinguishing algorithm A from random. In our setting, N is 128 and Adv[PRP](t, b) is taken to be 0 for AES under the above assumption. The advantage Adv[PRF] is then bounded by a negligible $\epsilon$ as long as $b^2 \ll 2^{128}$. According to Equation (5), the number of AES-CTR encryption blocks required by our approach is of logarithmic order in the number of elements and in the defined false positive probability, which guarantees that b is significantly smaller than $2^{64}$ (also shown experimentally in Section 5). In a nutshell, the AES-CTR ciphertext in our scheme is indistinguishable from a random source. With the above remarks on AES counter mode encryption, we give the following two theorems:

Theorem 4.1. Considering AES encryption as a PRP, the k hash functions generated by our scheme are uniformly distributed.

Proof. To prove that every hashing output has a uniform distribution, we show that all k hash functions (of length l) are PRFs. Suppose that one of these hash functions, say fi(·), is not a PRF. Then by definition:

$$\left| \Pr[D_i^{f_i(\cdot)}(1^l) = 1] - \Pr[D_i^{g(\cdot)}(1^l) = 1] \right| > \delta, \tag{9}$$

where Di is a probabilistic polynomial-time distinguisher and δ is a non-negligible advantage. Denote by Oi(·) the operation that obtains the hashing output fi(·) from the AES-CTR ciphertext. Oi(·) is a direct extraction from the ciphertext and can therefore be accomplished in polynomial time. Consequently, by first applying Oi(·) to the ciphertext, AES-CTR encryption itself can be distinguished by the probabilistic polynomial-time distinguisher D = Di(Oi(·)). This contradicts the given condition that AES-CTR encryption is a PRF. Hence, we conclude that all k hash functions are PRFs and thus uniformly distributed.

Theorem 4.2. Considering AES encryption as a PRP, the k hash functions generated by our scheme are independent.
Proof. Based on the results of Theorem 4.1, all produced hash functions (fi(·), i ∈ {1, ..., k}) are PRFs. Then with an equivalent output size l:

$$\prod_{i=1}^{k} \Pr[f_i(\cdot)] = \left(\frac{1}{2^l}\right)^k, \tag{10}$$
where $\Pr[f_i(\cdot)]$ refers to the probability that hash function fi(·) returns a given bit stream. Since each hash function is a PRF, for every bit the probabilities of producing zero and one are equal. Thereby, every $\Pr[f_i(\cdot)]$ (i ∈ {1, ..., k}) equals $\frac{1}{2^l}$. Furthermore, since AES-CTR encryption is considered a PRF, a ciphertext of length lk satisfies $\Pr[\mathrm{AES}_{\mathrm{CTR}}(\cdot)] = \frac{1}{2^{lk}}$. As the k hash functions are generated in order from the ciphertext, taking the entire concatenated (hashing) output into account we have:

$$\Pr[f_1(\cdot) \cap f_2(\cdot) \cap \dots \cap f_k(\cdot)] = \Pr[f_1(\cdot) \parallel f_2(\cdot) \parallel \dots \parallel f_k(\cdot)] = \Pr[\mathrm{AES}_{\mathrm{CTR}}(\cdot)] = \frac{1}{2^{lk}}. \tag{11}$$

In conclusion, the following relation is obtained:

$$\Pr[f_1(\cdot) \cap f_2(\cdot) \cap \dots \cap f_k(\cdot)] = \prod_{i=1}^{k} \Pr[f_i(\cdot)]. \tag{12}$$
This indicates that all k hash functions in our scheme are independent.

5. Evaluation

In this section, we evaluate the performance of the proposed scheme. Specifically, we test two aspects of a Bloom filter: the real false positive rate (concerning correctness) and the processing time for massive operations (concerning speed). For comparison, we choose 4 prominent Bloom filters (namely the Cassandra, Sdroege, Squid, and Python-supported Bloom filters), in which Murmur Hash, FNV1a, MD5 [10], and SHA-1 [22] are respectively deployed as hash functions. We implement all 5 Bloom filters on a PC with an Intel i5-3470 CPU @ 3.20GHz. To eliminate irrelevant variables, we realize a common framework (e.g., Bloom filter initialization and underlying bit operations) for all Bloom filters and implement only the corresponding hash functions differently for each Bloom filter. As to the output size of the hash algorithms, MD5 and SHA-1 have their standard sizes (128 and 160 bits, respectively), and we select a 128-bit length for Murmur Hash and a 256-bit length for FNV1a. For the implementation of our scheme, we choose AES-128 (supported by AES-NI). All the tests are implemented in C++11 (on Linux) with the compiler flags -msse4.1, -maes, and -O4. Without loss of generality, by default we use strings of at most 128 bits as the elements for insertion/query operations.

5.1. Real false positive rate

First of all, we measure the real false positive rate of each Bloom filter realization. We initialize every Bloom filter by inserting 10,000 randomly chosen elements and vary the intended false positive probability (i.e., 0.25%, 0.5%, 1%, 2.5%, and 5%). Then we observe the false positive rate of each Bloom filter scheme under 3 query scales (i.e., $10^4$, $10^6$, $10^8$). For the test we intentionally select valid query elements that are not in the set of inserted elements, and all the tests are conducted independently.
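The measurement procedure of Section 5.1 can be sketched in a few lines. This is a minimal illustration, not the C++ framework used in the paper: the class and function names are hypothetical, and the k positions are taken from a SHAKE-256 digest as a stand-in for the five hash schemes compared here, so the measured rate is not expected to match Table 2 exactly.

```python
import hashlib
import math


class BloomFilter:
    """Standard Bloom filter sized by Equations (1) and (2)."""

    def __init__(self, n: int, p: float):
        self.m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: bytes):
        # Stand-in hashing (an assumption): k chunks of 64 bits each from
        # one SHAKE-256 digest, reduced modulo m.
        digest = hashlib.shake_256(item).digest(8 * self.k)
        for j in range(self.k):
            yield int.from_bytes(digest[8 * j:8 * j + 8], "big") % self.m

    def insert(self, item: bytes) -> None:
        for w in self._positions(item):
            self.bits[w >> 3] |= 1 << (w & 7)

    def query(self, item: bytes) -> bool:
        return all(self.bits[w >> 3] & (1 << (w & 7))
                   for w in self._positions(item))


def measured_fp_rate(n: int, p: float, queries: int) -> float:
    """Insert n members, then query elements known not to be members."""
    bf = BloomFilter(n, p)
    for i in range(n):
        bf.insert(b"member-%d" % i)
    hits = sum(bf.query(b"outsider-%d" % i) for i in range(queries))
    return hits / queries


if __name__ == "__main__":
    rate = measured_fp_rate(10_000, 0.01, 20_000)
    print(f"measured false positive rate: {100 * rate:.2f}%")
```

Every "yes" counted here is a false positive by construction, because the queried strings are disjoint from the inserted ones.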
Table 2. Real false positive rate (%) of 5 Bloom filter implementations. Columns (1) to (5) refer to Bloom filters based on (1) Murmur Hash, (2) FNV1a, (3) MD5, (4) SHA-1, and (5) our approach, respectively.

Query times   FP(%)   (1)    (2)    (3)    (4)    (5)
10^4          0.25    0.32   0.37   0.33   0.32   0.31
10^4          0.5     0.68   0.80   0.66   0.68   0.70
10^4          1       1.42   1.53   1.43   1.46   1.33
10^4          2.5     3.09   3.35   3.13   3.16   3.16
10^4          5       5.32   5.70   5.42   5.30   5.45
10^6          0.25    0.32   0.40   0.32   0.31   0.34
10^6          0.5     0.71   0.84   0.72   0.71   0.83
10^6          1       1.45   1.62   1.47   1.45   1.44
10^6          2.5     3.08   3.33   3.07   3.09   3.19
10^6          5       5.38   5.75   5.43   5.38   5.55
10^8          0.25    0.34   0.40   0.32   0.32   0.32
10^8          0.5     0.72   0.83   0.71   0.69   0.82
10^8          1       1.44   1.60   1.46   1.46   1.41
10^8          2.5     3.09   3.35   3.09   3.04   3.01
10^8          5       5.36   5.71   5.31   5.29   5.53
As shown in Table 2, all 5 Bloom filters perform correctly, as the real rates follow the initial settings. As to individual performance, the cryptographic hash functions (MD5 and SHA-1) provide more stable rates across all query cases. Of the two non-cryptographic hash functions, Murmur Hash performs better than FNV1a. Our scheme achieves an acceptable false positive rate, attaining the best (or tied-best) rate in 6 cases out of 15. Hence, we experimentally confirm the correctness of our AES-based Bloom filter. In our measured samples, all false positive rates are marginally higher than the predefined probabilities, which are the theoretical expectations. Therefore, for applications requiring exact false positive control, it is recommended to set the initial probability slightly lower.

5.2. Processing time comparison

Next, we compare the processing time of each scheme. For hash-based approaches, a straightforward design is to employ k different hash functions (when k hash functions are required). Yet with the less-hashing technique proposed by Kirsch et al. [23], only two seed hash functions are necessary to construct a Bloom filter. Hence in our implementation, all 4 hash-based Bloom filters are optimized accordingly. For the test configuration, we set the false positive probability to 1%, 0.1%, and 0.01%, while the number of operation (insertion/query) requests varies from $10^4$ to $10^8$.

Table 3. Bloom filter processing time (in seconds) for different scales of operations. Default false positive probability is set to 1%.

Involved Algorithm   #Operation (FP=1%)
                     10^4                    10^6                    10^8
Common               8.80 × 10^-4            9.00 × 10^-2            9.07
AES-NI               +1.90 × 10^-4           +1.40 × 10^-2           +1.43
Murmur               +1.65 × 10^-3 (88.5%)   +2.70 × 10^-2 (48.2%)   +2.53 (43.5%)
FNV1a                +2.88 × 10^-3 (93.4%)   +1.33 × 10^-1 (89.5%)   +13.23 (89.2%)
SHA-1                +3.75 × 10^-3 (94.9%)   +3.75 × 10^-1 (96.3%)   +37.13 (96.2%)
MD5                  +4.99 × 10^-3 (96.2%)   +4.35 × 10^-1 (96.8%)   +43.33 (96.7%)
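The percentages in parentheses in Table 3 are not defined explicitly in the text. They appear to be the relative reduction of the hash-specific time increment when AES-NI replaces the listed hash; the sketch below checks this reading against the $10^6$-operation column. The formula and the names used are assumptions of this sketch, and small deviations from the table (for example 48.1% versus 48.2%) are consistent with the table entries having been rounded.

```python
# Hash-specific time increments (seconds) for 10^6 operations at FP = 1%,
# copied from Table 3. "common" is the shared framework time.
common = 9.00e-2
increments = {
    "AES-NI": 1.40e-2,
    "Murmur": 2.70e-2,
    "FNV1a": 1.33e-1,
    "SHA-1": 3.75e-1,
    "MD5": 4.35e-1,
}


def improvement(baseline: float, ours: float) -> float:
    """Relative reduction of the hash-specific time, in percent (assumed formula)."""
    return 100.0 * (baseline - ours) / baseline


if __name__ == "__main__":
    ours = increments["AES-NI"]
    print(f"AES-NI total: {common + ours:.3f} s")
    for name, t in increments.items():
        if name != "AES-NI":
            print(f"{name}: total {common + t:.3f} s, "
                  f"improvement {improvement(t, ours):.1f}%")
```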
Table 4. Bloom filter processing time (in seconds) for different scales of operations. Default false positive probability is set to 0.1%.

Involved Algorithm   #Operation (FP=0.1%)
                     10^4                    10^6                    10^8
Common               1.23 × 10^-3            1.24 × 10^-1            12.70
AES-NI               +1.30 × 10^-4           +1.10 × 10^-2           +1.40
Murmur               +2.07 × 10^-3 (93.7%)   +2.70 × 10^-2 (59.3%)   +2.40 (41.7%)
FNV1a                +3.02 × 10^-3 (95.7%)   +1.31 × 10^-1 (91.6%)   +13.00 (89.2%)
SHA-1                +3.69 × 10^-3 (96.5%)   +3.72 × 10^-1 (97.0%)   +37.10 (96.2%)
MD5                  +4.99 × 10^-3 (97.4%)   +4.35 × 10^-1 (97.5%)   +43.20 (96.8%)
The evaluation results are shown in Tables 3, 4, and 5. “Common” in the tables refers to the underlying instructions shared by all five Bloom filters, and each entry gives the processing time in seconds needed to complete the given number of operations consecutively. In general, our AES-NI based Bloom filter outperforms the other 4 schemes in all sampled cases. Compared to the cryptographic hash functions (SHA-1 and MD5), the non-cryptographic hash functions (Murmur Hash and FNV1a) yield faster Bloom filters. Taking Table 3 as an example, the AES-NI based Bloom filter takes only 0.104 seconds in total to execute $10^6$ operations, of which 0.09 seconds is common processing time. Strikingly, our scheme leads to an improvement of 48.2%, 89.5%, 96.3%, and 96.8% over the Bloom filters driven by Murmur Hash, FNV1a, SHA-1, and MD5, respectively. The average processing time of each scheme is also shown in Fig. 3. As can be seen from the figure, the proposed approach provides a stable and much faster processing speed.

Table 5. Bloom filter processing time (in seconds) for different scales of operations. Default false positive probability is set to 0.01%.

Involved Algorithm   #Operation (FP=0.01%)
                     10^4                    10^6                    10^8
Common               1.63 × 10^-3            1.58 × 10^-1            16.30
AES-NI               +1.40 × 10^-4           +1.50 × 10^-2           +1.80
Murmur               +2.28 × 10^-3 (93.9%)   +3.10 × 10^-2 (51.6%)   +2.60 (30.8%)
FNV1a                +2.83 × 10^-3 (95.1%)   +1.33 × 10^-1 (88.7%)   +13.00 (86.2%)
SHA-1                +3.60 × 10^-3 (96.1%)   +3.71 × 10^-1 (96.0%)   +36.90 (95.1%)
MD5                  +5.36 × 10^-3 (97.4%)   +4.54 × 10^-1 (96.7%)   +43.10 (95.8%)
When combining Equations (1) and (2), the number of hash functions k can also be calculated as:

$$k = -\frac{\ln p}{\ln 2}. \tag{13}$$
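For completeness, the substitution behind Equation (13) can be written out: inserting Equation (1) into Equation (2) cancels n and one factor of ln 2. The numerical values below are rounded to the nearest integer, which is an assumption about the rounding used for the hash function counts in this section.

```latex
k = \frac{m}{n}\ln 2
  = \left(-\frac{n \ln p}{(\ln 2)^2}\right)\frac{\ln 2}{n}
  = -\frac{\ln p}{\ln 2}
  = \log_2 \frac{1}{p}

p = 10^{-2}: \quad k = \log_2 10^{2} \approx 6.64 \;\rightarrow\; 7
p = 10^{-3}: \quad k = \log_2 10^{3} \approx 9.97 \;\rightarrow\; 10
p = 10^{-4}: \quad k = \log_2 10^{4} \approx 13.29 \;\rightarrow\; 13
```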
Thus the false positive probability solely determines the number of hash functions. In our configuration, when the false positive probability is 1%, 0.1%, and 0.01%, a Bloom filter requires 7, 10, and 13 hash functions, respectively. As can be seen in the tables, under a tighter false positive setting all schemes take a longer processing time, but mainly
[Figure 3: bar chart. x-axis: hash used in the Bloom filter (AES-NI, Murmur, FNV1a, SHA-1, MD5); y-axis: per-operation processing time in seconds, scale ×10^-7, range 0 to 5.]
Fig. 3. Per-operation processing time for each Bloom filter.

at the common part, which has to deal with more bit operations. Note that the generation of additional AES blocks barely influences the processing speed. In fact, the number of input blocks of the AES-CTR encryption is only 4 (according to Equation (5)) when handling $10^8$ operations with a 0.01% false positive probability (as in Table 5). Overall, our scheme provides attractive performance compared to the other approaches. At most 4 blocks of AES encryption are adequate for processing massive Bloom filter operations under tight false positive restrictions.

6. Discussion

As illustrated in Section 5, our scheme is capable of dealing with large-scale Bloom filter operations. Indeed, with a 4-block AES-CTR encryption, our approach can support as many as $3 \times 10^{10}$ operations with a $10^{-4}$ false positive probability, or over $4 \times 10^{7}$ operations with a false positive constraint of $10^{-5}$. Both settings refer to extremely harsh situations, which indicates a high upper bound for using the proposed Bloom filter.

Moreover, our approach can be further optimized by leveraging parallelization. Since the key scheduling of AES encryption is conducted beforehand, only the AES-CTR encryption needs to be executed for element insertion/query. For instance, if 4 blocks are required, then with a parallelism degree of 4 the encryption of all blocks can be accomplished simultaneously. In a nutshell, our approach provides good efficiency to the deployed applications.

It is also worth mentioning that when our scheme runs on a platform whose microprocessor does not support AES-NI, the Bloom filter acceleration is not feasible. Yet our scheme remains promising, as network routers, servers, and even individual devices now widely use AES-NI.
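The 4-block capacity figures quoted at the start of this section can be approximated by inverting Equations (1), (5), and (13). The sketch below is an illustration with a hypothetical function name; rounding k to the nearest integer is an assumption, and the results agree with the quoted figures only to the stated order of magnitude.

```python
import math


def max_elements(p: float, blocks: int, block_bits: int = 128) -> float:
    """Largest n whose k hash values of l bits each fit into `blocks` blocks."""
    k = max(1, round(-math.log(p) / math.log(2)))    # Equation (13), rounded
    l_max = (blocks * block_bits) // k               # longest l with k*l <= b*N
    m_max = 2 ** l_max                               # largest m with ceil(log2 m) <= l_max
    return m_max * math.log(2) ** 2 / -math.log(p)   # invert Equation (1)


if __name__ == "__main__":
    for p in (1e-4, 1e-5):
        print(f"p = {p:g}: up to about {max_elements(p, 4):.2e} elements with 4 blocks")
```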
According to our analysis, assisted by Intel's Processor Feature Filter, 864 of the 1243 processors that Intel produced between 2010 and 2016 support AES New Instructions. As shown in Fig. 4, this ratio rose three-fold, from 31% in 2010 to 95% in 2016. This indicates the increasing deployability of a Bloom filter construction based on the AES instruction set.

[Figure 4: bar chart. Horizontal axis: years, 2010 to 2016. Vertical axis: number of processors, 0 to 250. Two series: all types of processors and AES-NI enabled processors, annotated with the AES-NI ratio per year (31% in 2010, 95% in 2016).]
Fig. 4. The ratio of AES-NI enabled processors among all produced Intel CPUs during the last 7 years.

Choosing AES-NI as the unified underlying technique reduces the complexity of Bloom filter realization. In our implementation, the AES-NI based hash function requires around 60 lines of code [15], compared to hundreds of lines of code for other legacy hash function realizations. Furthermore, since AES is widely used to provide confidentiality or authenticity, building a Bloom filter with the AES-NI technique also enhances reusability in applications that already have encryption or identity-verification functions.

7. Related Work

In this paper, we focus on the realization of a standard Bloom filter [1], while many variants of Bloom filters (compressed Bloom filter, hierarchical Bloom filter, etc.) have been introduced to tackle different limitations or requirements. A comprehensive summary of Bloom filter variants is given by Tarkoma et al. [11]. Recently, Fan et al. [24] proposed the cuckoo filter, which is comparable to a revised Bloom filter that supports dynamic insertion and deletion of items. Our work is orthogonal to these directions of Bloom filter enhancement.

Regarding Bloom filter implementation, Kirsch and Mitzenmacher [23] proposed a double-hashing technique: the $k$ hash functions required by a Bloom filter can be generated from two independent seed hashes. With fewer hash computations, the Bloom filter operates faster.

The AES instruction set [14, 15] has contributed substantially to the performance of AES-involved protocols and of other fundamental designs such as hash functions. Benadjila et al. [25] presented optimized implementation results for the blockcipher-based SHA-3 candidates. Later, Bos et al. [20] presented a software benchmark of hash compression operations whose designs leverage block ciphers.
Instead of targeting the construction of a hash algorithm with arbitrary inputs, this paper discusses how to build an efficient Bloom filter using AES-NI, which is distinct from the existing applications of this hardware-level AES support.
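The double-hashing technique of Kirsch and Mitzenmacher [23] mentioned above can be illustrated with a minimal sketch, in which index $i$ is computed as $g_i(x) = (h_1(x) + i \cdot h_2(x)) \bmod m$. This is not the AES-based construction proposed in this paper. The two seed hashes are taken from disjoint halves of a SHA-256 digest purely as an illustrative assumption, and the names `seed_hashes`, `double_hash_indexes`, and `DoubleHashBloomFilter` are hypothetical.

```python
import hashlib


def seed_hashes(item: bytes):
    """Derive two 64-bit seed hashes from one SHA-256 digest.

    Assumption: disjoint digest halves stand in for two independent hashes.
    """
    digest = hashlib.sha256(item).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big")
    return h1, h2


def double_hash_indexes(item: bytes, k: int, m: int):
    """Return the k positions g_i(x) = (h1(x) + i * h2(x)) mod m, i = 0..k-1."""
    h1, h2 = seed_hashes(item)
    return [(h1 + i * h2) % m for i in range(k)]


class DoubleHashBloomFilter:
    """Standard Bloom filter whose k indexes come from two seed hashes."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)  # m bits, all initially zero

    def add(self, item: bytes) -> None:
        for idx in double_hash_indexes(item, self.k, self.m):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in double_hash_indexes(item, self.k, self.m))


bf = DoubleHashBloomFilter(m=1 << 16, k=7)
bf.add(b"10.0.0.1")
print(b"10.0.0.1" in bf)  # True: an inserted element is never reported absent
```

Only one digest is computed per element regardless of $k$, which is where the saving over $k$ separate hash evaluations comes from.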
8. Conclusion

In this paper, we present a Bloom filter construction together with its implementation method. By employing AES counter mode encryption and the microprocessor-supported AES-NI instruction set, the proposed Bloom filter guarantees consistent realization at network nodes and high-speed element operations. According to our comparison results, our Bloom filter processes operations faster than the other prominent approaches in all sampled cases. Moreover, our proposal shows substantial potential for parallelization, and an optimized parallel implementation is left for future work.

9. Acknowledgements

This work is supported by the Major Program of the National Natural Science Foundation of China (11290141), NSFC (61402030), and Fundamental Research of Civil Aircraft no. MJ-F-2012-04. We would also like to thank Adrian Perrig, Qi Wang, Sitao Wang, and Yagu Xu for their valuable discussions and feedback.

10. References

[1] Bloom, B.H.: ‘Space/time trade-offs in hash coding with allowable errors’, Commun. ACM, 1970, 13, (7), pp. 422-426
[2] Zhang, Y., Wang, X., Perrig, A., et al.: ‘Tumbler: Adaptable link access in the bots-infested Internet’, Comput. Netw., 2016, 105, pp. 180-193
[3] Kandula, S., Katabi, D., Jacob, M., et al.: ‘Botz-4-sale: Surviving organized DDoS attacks that mimic flash crowds’, Proc. USENIX NSDI, Boston, USA, 2005, pp. 287-300
[4] Dixon, C., Anderson, T.E., Krishnamurthy, A.: ‘Phalanx: Withstanding multimillion-node botnets’, Proc. USENIX NSDI, San Francisco, USA, 2008, pp. 45-58
[5] Zhang, Y., Zheng, Z., Szalachowski, P., et al.: ‘Collusion-resilient broadcast encryption based on dual-evolving one-way function trees’, Secur. Commun. Netw., 2016, 9, (16), pp. 3633-3645
[6] Grönvall, B.: ‘Scalable multicast forwarding’, ACM SIGCOMM Comput. Commun. Rev., 2002, 32, (1), pp. 68-68
[7] Maggs, B.M., Sitaraman, R.K.: ‘Algorithmic nuggets in content delivery’, ACM SIGCOMM Comput. Commun. Rev., 2015, 45, (3), pp. 52-66
[8] Kumar, A., Xu, J., Wang, J.: ‘Space-code bloom filter for efficient per-flow traffic measurement’, IEEE J. Sel. Area Commun., 2006, 24, (12), pp. 2327-2339
[9] Aguilar-Saborit, J., Trancoso, P., Muntes-Mulero, V., et al.: ‘Dynamic adaptive data structures for monitoring data streams’, Data Knowl. Eng., 2008, 66, (1), pp. 92-115
[10] Rivest, R.: ‘The MD5 message-digest algorithm’, RFC 1321, 1992
[11] Tarkoma, S., Rothenberg, C.E., Lagerspetz, E.: ‘Theory and practice of bloom filters for distributed systems’, IEEE Commun. Surv. Tut., 2012, 14, (1), pp. 131-155
[12] Fan, L., Cao, P., Almeida, J., et al.: ‘Summary cache: A scalable wide-area web cache sharing protocol’, IEEE/ACM Trans. Netw., 2000, 8, (3), pp. 281-293
[13] Daemen, J., Rijmen, V.: ‘The design of rijndael: AES-the advanced encryption standard’ (Springer Science & Business Media, 2013)
[14] Gueron, S.: ‘Intel’s new AES instructions for enhanced performance and security’, Proc. Int. Workshop on Fast Software Encryption, Leuven, Belgium, 2009, pp. 51-66
[15] Gueron, S.: ‘Intel advanced encryption standard (AES) instructions set’, Intel Corporation, 2010
[16] Dworkin, M.: ‘Recommendation for block cipher modes of operation: Methods and techniques’, DTIC Document, 2001
[17] Broder, A., Mitzenmacher, M.: ‘Network applications of bloom filters: A survey’, Internet Mathematics, 2004, 1, (4), pp. 485-509
[18] Housley, R.: ‘PKCS #7: Cryptographic message syntax’, RFC 2315, 1998
[19] McGrew, D.: ‘Counter mode security: Analysis and recommendations’, Cisco Systems, 2002
[20] Bos, J.W., Özen, O., Stam, M.: ‘Efficient hashing using the AES instruction set’, Proc. Int. Workshop on Cryptographic Hardware and Embedded Systems, Nara, Japan, 2011, pp. 507-522
[21] Katz, J., Lindell, Y.: ‘Introduction to modern cryptography’ (CRC Press, 2014), pp. 85-95
[22] Eastlake, D. 3rd., Jones, P.: ‘US secure hash algorithm 1 (SHA1)’, RFC 3174, 2001
[23] Kirsch, A., Mitzenmacher, M.: ‘Less hashing, same performance: Building a better bloom filter’, Random Struct. Algor., 2008, 32, (2), pp. 187-218
[24] Fan, B., Anderson, D.G., Kaminsky, M., et al.: ‘Cuckoo filter: Practically better than bloom’, Proc. ACM Int. Conf. Emerging Networking Experiments and Technologies, Sydney, Australia, 2014, pp. 75-88
[25] Benadjila, R., Billet, O., Gueron, S., et al.: ‘The Intel AES instructions set and the SHA-3 candidates’, Proc. Int. Conf. on Theory and Application of Cryptology and Information Security, Tokyo, Japan, 2009, pp. 162-178