Adaptive Compression Trie based Bloom Filter: Request Filter for NDN Content Store


Ran Zhang∗, Jiang Liu∗†, Tao Huang∗‡, Tian Pan∗‡, and Lixuan Wu∗
∗State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, P.R. China
†Beijing Laboratory of Advanced Information Networks, Beijing, P.R. China
‡Beijing Advanced Innovation Center for Future Internet Technology, Beijing, P.R. China
Email: {zhangran, liujiang, htao, pan, shinning}@bupt.edu.cn

Abstract—In Named Data Networking (NDN), the Content Store (CS) is proposed to provide on-path cache service. When a user's request carrying a content name is forwarded to an NDN node, an exact-match search in CS is carried out first. Under a moderate cache hit ratio, most requests result in a mismatch in the CS search process, which imposes a large overhead on packet forwarding, and the overhead rises as the scale of CS increases. In this work, a request filter for CS is studied and the Compression Trie based Bloom Filter (CT-BF) is proposed. CT-BF takes advantage of an on-chip Bloom Filter to quickly filter out mismatched requests, and a Compression Trie is adopted to accommodate the large CS name Trie into the space-limited on-chip Bloom Filter. The Optimal Compression Trie under a space constraint is discussed for the first time, and a heuristic approach, the Adaptive Compression Trie based Bloom Filter (ACT-BF), is proposed for on-line operation. Simulation results show that ACT-BF can efficiently filter out mismatched requests within a given on-chip space constraint and hence reduce the average CS search delay.

Index Terms—Named Data Networking, Compression Trie, Bloom Filter, Content Store

I. INTRODUCTION

Named Data Networking (NDN) [1], [2] is a promising future network architecture that shifts the network semantics from IP addresses to content names. In NDN, users request content by sending an Interest packet carrying the content name; the request is then processed at each hop to determine whether a local copy can serve it or how to forward it further towards the content source. Three key components within an NDN node enable this feature: Content Store (CS), Pending Interest Table (PIT) and Forwarding Information Base (FIB). CS maintains cached content chunks and their name index table; for every request forwarded to an NDN node, a CS index table search is carried out first. If an exact-match entry is found, CS serves the request with its local copy; otherwise the request is forwarded to PIT and FIB for further processing. As described, CS is the first component to deal with all incoming requests, and its performance can be a bottleneck of the system throughput. Some researchers have recognized this and put forward solutions for CS design. In paper [3], the CS index table scale was analyzed and possible applications of several memory technologies in NDN were discussed. Paper [4] proposed a two-layer CS architecture, namely an SRAM-based hash index table and a DRAM-based chunk store, to improve throughput and reduce the energy cost of CS. In [5], Pan et al. put forward a Locality-Aware Skip List to accelerate skip list searching. A hierarchical CS assisted with prefetching and multi-thread processing was thoroughly studied to achieve 10 Gbps throughput in [6]. Paper [7] proposed a similar caching architecture, with improvements to support video service in Information Centric Networking based on the correlation of user requests. In [8], terabyte-scale caching was studied by improving block device I/O efficiency.

Jiang Liu is the corresponding author of the paper.

Although such worthy efforts have been made on CS architecture design, most of them treat every request equally and optimize the operation after a cache hit happens. However, under moderate request redundancy, a large proportion of requests cannot be satisfied by cached content, and the overhead of massive searching for this non-existent content in CS degrades system performance. Worse, the side effect is amplified as the CS scale increases, which means a larger index table and a more complex search operation for every request. This problem hinders the performance improvement of an NDN node in terms of throughput and processing delay. To solve it, we propose the Compression Trie based Bloom Filter (CT-BF) for CS. An on-chip Bloom Filter is adopted to quickly filter out requests for which no local copy exists. Since on-chip memory is usually scarce and often shared with other threads, a space constraint is considered in CT-BF, and the Compression Trie is responsible for accommodating the large name index table into the capacity-limited on-chip Bloom Filter. Since the filter performance is influenced by both the Bloom Filter and the Compression Trie, CT-BF takes both factors into account in the construction phase.

Another problem raised by compression is how to construct the optimal Compression Trie for request filtering. In this paper, the Optimal Compression Trie under a space constraint is studied for the first time, and a heuristic solution, Adaptive Compression Trie (ACT), is proposed to provide optimized filter performance. The remainder of the article is organized as follows. Section 2 introduces related work and the technology background of CT-BF. Section 3 discusses the forwarding model in NDN and


states the technical challenges to be solved. Section 4 presents the proposed solutions, CT-BF and ACT-BF. Section 5 describes the simulation and analyzes the system performance to validate the improvement. Section 6 concludes the paper with perspectives on future work.

II. RELATED WORKS AND TECHNICAL BACKGROUND

A. NDN Data Plane Data Structure and Table Lookup Scheme

The ability of CS to deal with content requests depends on its data structure and table lookup scheme. These problems have been studied extensively in traditional IP forwarding; for example, in [9] Sarang et al. applied a Counting Bloom Filter to the Longest Prefix Matching (LPM) of IP addresses, and in [10] Will et al. proposed Tree Bitmap, which compactly encodes a large prefix index table and accelerates forwarding to support the OC-192 (10 Gbps) rate. However, the features of NDN names bring new challenges. The scale of names in NDN is much larger than that of IP addresses, which means a larger table to search during on-line forwarding; in addition, NDN names are variable-length and usually longer than IP addresses, which adds extra complexity to name processing. To solve these problems, many efforts have been made, and basically three kinds of approaches have been proposed. The first is character-tree or Trie based, which organizes the name entries of the NDN index table as a Trie according to some segmentation principle, with search and update carried out on the Trie. A typical implementation of this idea is [11], where Wang et al. proposed a parallel search algorithm that divides the name index table into multiple modules, considering the access probability of each module, and performs a parallel search among them. The second is hash-function based, such as hash tables and Bloom Filters. In [12], Lee et al. proposed an on-chip Bloom Filter based pre-searching scheme that reduces the amount of off-chip searching. Wang et al. proposed a two-stage Bloom Filter consisting of a one-memory-access Bloom Filter [13] and a merged Bloom Filter, which improves lookup throughput and reduces memory occupation. The last is an integration of the preceding approaches: in [14], Quan et al. proposed to split the content name into a B-prefix and a T-suffix, organized as Bloom Filter, hash table and Trie; the B-prefix helps carry out longest prefix matching and also accelerates the T-suffix search that returns the final result. In [15], Shi et al. proposed NDN-NIC, which applies an on-chip Bloom Filter to filter out irrelevant requests and a Trie to help update the on-chip Bloom Filter.

Each element of the set is hashed to k values, and the k corresponding bits in the bit array are set to 1; the process repeats until all elements have been fed into the bit array. The same operation is conducted when adding new elements to the Bloom Filter. To test whether an entry is a member of the set, the same k hash functions are applied to the entry and the corresponding k bits in the bit array are checked; a positive conclusion is drawn when all k bits are 1. However, the conventional Bloom Filter does not support element deletion, because the hashing zones of different elements may intersect, so the Counting Bloom Filter (CBF) was proposed in [18] to solve the problem: instead of using k single bits to represent the state of an element, k integers are adopted; the corresponding integers are incremented when an element is inserted and decremented when it is deleted. As long as an integer is positive, the position is considered set, and hence the deletion of one element does not influence others. The cost is that space occupation grows from m to Size(Integer) · m. Hash collisions can happen in a Bloom Filter as well: different entries can hash to the same value under one hash function, and if all k hash values happen to coincide for two different entries of which only one is in the set, a false positive occurs when querying for the one not in the set. The probability of false positive is given in [19] as (1).

(1 − [1 − 1/m]^(kn))^k ≈ (1 − e^(−kn/m))^k    (1)

The parameters k, m and n are as defined in the preceding paragraphs. As can be seen from (1), the false positive rate is decided by k, m and n as well as the homogeneity of the hash functions. Another important feature of the Bloom Filter is that although false positives can happen, there are no false negatives: as long as the query result is false, the entry is definitely not in the set.
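As a concrete illustration of the mechanics above, here is a toy Counting Bloom Filter in Python (our own sketch, not the paper's implementation; the double-hashing scheme and all parameters are illustrative): a query checks all k positions, insertion increments the k counters, and deletion decrements them without disturbing other elements.

```python
import hashlib

class CountingBloomFilter:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.counts = [0] * m    # k integers per element instead of k single bits

    def _positions(self, item):
        # derive k array positions from one digest via double hashing
        d = hashlib.md5(item.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def insert(self, item):
        for p in self._positions(item):
            self.counts[p] += 1

    def delete(self, item):
        for p in self._positions(item):
            self.counts[p] -= 1

    def query(self, item):
        # positive only if all k positions are set; false positives are
        # possible, false negatives are not
        return all(self.counts[p] > 0 for p in self._positions(item))

cbf = CountingBloomFilter(m=1024, k=4)
cbf.insert("/com/google/www")
cbf.insert("/com/example")
cbf.delete("/com/example")           # does not disturb /com/google/www
print(cbf.query("/com/google/www"))  # True — no false negatives
```

Collapsing each counter to a single bit (1 if positive) yields the plain Bloom Filter of [16], which is exactly what the Size(Integer) · m space cost in the text refers to.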

C. Memory Technology

To support line-speed cache operation, memory access speed and capacity need to be taken into account when designing the CS data structure and table lookup scheme. In [3], Perino et al. made a survey on memory technology, as shown in Table I; although several years have passed since the survey, the relationships among the different memory media still hold today.

TABLE I: Memory Technology

                    TCAM      SRAM      DRAM      SSD
Access Latency/ns   4         0.45      55        10000
Max Capacity        ∼20Mb     ∼210Mb    ∼10GB     ∼1TB
Price/$/MB          200       27        0.016     0.003

B. Bloom Filter Theory

Bloom Filter [16] is a space-efficient data structure for membership queries over a given set of elements. It has been widely deployed in search systems, such as the typical large-scale application in [17]. A basic Bloom Filter consists of a bit array of length m and k hash functions. For a given set of n elements to be accommodated into the Bloom Filter for membership queries, each element is hashed to k different values with the k hash functions.

Ternary Content Addressable Memory (TCAM) locates an entry within a single clock cycle by comparing all stored data at the same time; however, despite its attractive performance, its limited capacity and high price restrict its application in CS design. Static Random-Access Memory (SRAM) is a type of semiconductor memory that, unlike Dynamic Random-Access Memory (DRAM), does not need to be refreshed periodically; SRAM is faster and more expensive, so it is


typically used as CPU on-chip cache while DRAM is used as the computer's main memory. Solid State Drive (SSD) is a solid-state storage device that provides large and persistent content storage; it is faster than a traditional Hard Disk Drive though much slower than RAM.

III. FORWARDING MODEL AND TECHNICAL CHALLENGES

A. Forwarding Model

CS search is the first step of the forwarding process for every content request. Conventionally, the throughput of CS is determined by its ability to search for the names carried by requests. As Fig.1(a) shows, all requests are searched equally against the large name index table, no matter whether corresponding copies would be found. However, as Fig.1(b) suggests, the request flow actually consists of two parts: requests with locally cached content chunks to serve them, and requests without. The ability µCS to process incoming requests λin likewise consists of two parts: the ability µCShit to process requests searched and satisfied locally, and the ability µCSmiss to process requests needing further forwarding. To fully empower the system and ensure stable operation, constraint (2) must be met.

Fig. 1: Forwarding Model — (a) Mismatch-Unaware Forwarding Model (λin → CS lookup); (b) Mismatch-Aware Forwarding Model (λin → Request Filter → CS hit path, or PIT+FIB miss path)
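Constraint (2) below is a set of simple rate inequalities; as a sanity check, they can be evaluated numerically (all rates here are hypothetical, in millions of packets per second):

```python
def stable(lam_in, p_hit, mu_dist, mu_hit, mu_miss, mu_pitfib):
    """Check the stability conditions of constraint (2):
    the filter, the hit path, the miss path and PIT+FIB must each
    keep up with the share of traffic routed to them."""
    p_miss = 1.0 - p_hit
    return (lam_in < mu_dist and
            lam_in * p_hit < mu_hit and
            lam_in * p_miss < mu_miss and
            mu_miss < mu_pitfib)

# e.g. 10 Mpps input with a 30% hit ratio (service rates are made up)
print(stable(10.0, 0.3, 20.0, 5.0, 8.0, 12.0))  # True: 3 < 5, 7 < 8, 8 < 12
```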

related to the latency to identify such requests and forward them onward. Under a moderate CS hit probability, with about 30 percent redundancy [22], the demand on µCShit will be less challenging than that on µCSmiss. But as Fig.1(a) shows, traditional CS design does not support differentiating requests, although this is very necessary. So the goal of our design is to promptly distinguish requests that cannot be served locally and forward them to PIT and FIB.

B. Technical Challenges

Although the table lookup problem has been widely studied in NDN FIB design, the problem is different in CS processing. Conventional FIB lookup focuses mainly on LPM: as long as one entry in FIB matches the conjunction of the first several segments of the requested name, FIB is responsible for processing the request. However, CS caches local copies of content, and only when an exact match occurs should CS return the cached content chunks to the user, which hinders the direct application of previous research. Besides, current CS design proposals treat incoming requests equally without discrimination, which places a heavy burden on the lookup engine. A request filter for CS is proposed in this work. Similar ideas have been proposed before, e.g. in [15], but challenges remain to be overcome. The request filter is usually placed in on-chip memory to guarantee access latency, while the size of the name index table is proportional to the scale of cached content, which means a larger filter for a larger CS; on-chip memory resources are scarce and usually shared with other threads, so the memory constraint becomes an important concern, yet none of the existing proposals considers how to construct an optimal filtering model under the constraint of limited on-chip memory capacity. In this proposal, the main target is to design a CS architecture with an optimal request filter, and hence minimize the average processing delay of requests in CS under the constraint of limited space occupation.

IV. PROPOSED SOLUTION

λin < µDistinguish
λin · phit < µCShit
λin · pmiss < µCSmiss
µCSmiss < µPIT+FIB    (2)

The cascade structure of the NDN forwarding process and constraint (2) suggest that, to maximize the allowed incoming request rate λin and minimize processing delay, µCShit and µPIT+FIB are of vital importance. Since extensive research on PIT and FIB has been done, such as in [3], [14], [20], [21], we assume the capability µPIT+FIB of PIT and FIB is no hindrance to CS performance, and it is thus out of the scope of our work. This way, the key challenge in CS design is to enable the ability µDistinguish to distinguish cache hits from mismatches before the CS search, and then guarantee that µCShit is larger than λin · phit and µCSmiss is larger than λin · pmiss. µCShit is the ability to locate content chunks and return them to users; this is related to the speed of table lookup and storage device I/O throughput. µCSmiss is the ability to forward requests without locally cached copies, and this is

A. Solution Overview

The solution we propose for CS request filtering is the Compression Trie based Bloom Filter; an overview of the system design is given in Fig.2.

Fig. 2: CS Design Architecture

The basic architecture of CS consists of three parts and correspondingly three kinds of memory medium: fastest SRAM, faster DRAM and fast SSD. In on-chip SRAM, a Bloom Filter


is maintained to promptly check the local existence of the content requested by an Interest. In DRAM, popular and frequently accessed content is cached; a name Trie is maintained over the names of all locally cached content, including content in both DRAM and SSD, and accordingly a Counting Bloom Filter (CBF) is updated by inserting all name entries of the name Trie; the CBF is also responsible for updating the Bloom Filter in SRAM. SSD is a pure warehouse for large-scale data storage. When requests are forwarded to an NDN node, a large proportion will not be served locally; they are supposed to be quickly filtered out by the fast SRAM-based Bloom Filter and forwarded to PIT. For the rest, accepted by CS, a Trie search is carried out: if a local copy is found, it is directly returned to the user; otherwise a false positive of the request filter has happened and the corresponding Interest is forwarded to PIT, which guarantees that all requests are served. The fast request filtering relies on the on-chip Bloom Filter. Although the multiple hash checks of the Bloom Filter increase the frequency of memory access, the advantage of fast on-chip memory access outweighs this disadvantage. However, as mentioned in preceding sections, on-chip memory is usually scarce and shared with other threads, so the space allocated to the Bloom Filter is limited; with given memory space and false positive rate, the capacity of the Bloom Filter is determined, and once the number of name Trie entries exceeds this capacity, the performance of the Bloom Filter degrades noticeably. In this case, the Compression Trie is adopted to represent all the entries in CS.

B. Compression Trie Based Bloom Filter

1) Name Aggregation Opportunities: Content names in NDN are organized in a hierarchical manner: a name consists of several segments divided by the separator "/", and in CS the name index table is organized as a name Trie according to this segmentation.
The conjunction from the Trie root to a leaf node constitutes an intact name; children of the same node on the conjunction path share the same prefix. In the following, we will use "leaf node" to denote the conjunction from the Trie root to the leaf, for simplicity. The compression approach we come up with is the Compression Trie, which uses prefixes to represent the whole name set. From the perspective of the Trie structure, the idea is not to add the entire set of Trie leaf nodes to the Bloom Filter, but to feed the nodes at a certain level of the Trie into the Bloom Filter, i.e., to feed a prefix set into the Bloom Filter. The logic is that different names may share the same prefix, and the scale of the prefix set is smaller than that of the entire name set. In order to verify the feasibility of name compression, we analyze the URL trace described in [23], due to the lack of large-scale NDN traces. In the analysis, 4.5 million URL names are collected; the names in the URL dataset are transformed into NDN style and then organized into a Trie structure. For example, www.google.com/doodles is transformed into /com/google/www/doodles; the root node of the Trie is empty, the first- and second-level nodes are "com" and "google", and the conjunctions are "/com" and "/com/google" respectively. The Trie-based dataset information is shown in Fig.3. As is

demonstrated in Fig.3, about 80% of the URLs have names shorter than 7 segments, and most name elements lie within this length range; the number of nodes at a given level is much larger than at the parent level, which shows a promising possibility to compress the names in a hierarchical manner.
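Collecting the level-l prefix set that is later fed into the Bloom Filter can be sketched as follows (segmentation on "/" as described in the text; the nested-dict trie is our own simplification for illustration):

```python
def insert(trie, name):
    """Insert a '/'-separated NDN-style name into a nested-dict trie."""
    node = trie
    for seg in name.strip("/").split("/"):
        node = node.setdefault(seg, {})

def level_prefixes(trie, l):
    """Return the prefixes at depth l (names shorter than l stay whole)."""
    out, stack = set(), [(trie, "", 0)]
    while stack:
        node, prefix, depth = stack.pop()
        if depth == l or not node:      # reached level l, or a leaf above it
            if prefix:
                out.add(prefix)
            continue
        for seg, child in node.items():
            stack.append((child, prefix + "/" + seg, depth + 1))
    return out

trie = {}
for n in ["/com/google/www/doodles", "/com/google/mail", "/org/wiki/Main"]:
    insert(trie, n)
print(sorted(level_prefixes(trie, 2)))
# ['/com/google', '/org/wiki'] — three names compressed to two prefixes
```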


Fig. 3: Name Length and Prefix Aggregation Analysis

After the rationality is proved, the compression is carried out hierarchically as shown in Fig.4(a), and all the leaf nodes of the Compression Trie are added to the Bloom Filter. When a user request is forwarded to CS, a Bloom Filter query of the name prefix carried by the request is carried out first; if a positive result is returned, a name Trie search is executed to locate the content; otherwise the requested content is proven absent from the local CS, and the request is forwarded to PIT and FIB for further processing.
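The query-then-search path just described can be sketched as follows (a schematic only; `SetIndex` stands in for both the on-chip Bloom Filter and the name Trie's exact-match lookup, which are of course different structures in the real design):

```python
class SetIndex:
    """Toy stand-in for an index that answers membership queries."""
    def __init__(self, names):
        self.names = set(names)
    def query(self, name):
        return name in self.names

def handle_interest(name, bf, cs_chunks, pit_queue):
    """Filter a request: CS serves exact matches, everything else goes to PIT."""
    if not bf.query(name):
        pit_queue.append(name)      # definite miss: bypass the CS trie search
        return None
    chunk = cs_chunks.get(name)     # exact-match search in the name index
    if chunk is None:
        pit_queue.append(name)      # Bloom Filter false positive: still served
        return None
    return chunk                    # cache hit: return the local copy

bf = SetIndex({"/com/a/1"})         # pretend filter with no false positives
cs = {"/com/a/1": b"chunk-bytes"}
pit = []
print(handle_interest("/com/a/1", bf, cs, pit))  # b'chunk-bytes'
print(handle_interest("/com/b/2", bf, cs, pit))  # None (forwarded to PIT)
print(pit)                                       # ['/com/b/2']
```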

Fig. 4: Compression Trie — (a) Direct Compression Trie; (b) Adjusted Compression Trie

2) Construction of Compression Trie Based Bloom Filter: A problem arises in the construction of the Compression Trie Based


Bloom Filter (CT-BF). In previous work, such as [15], entries are added into the Bloom Filter until a false positive threshold is reached, and the threshold is usually set based on empirical assumptions. This approach does not apply when the false positive of the request filter results not only from the Bloom Filter but also from the Compression Trie. As Fig.4(a) shows, the false positive of the Compression Trie is reduced when it covers the third level of the name Trie rather than the second. However, involving deeper levels in the Compression Trie means more elements in the Bloom Filter and, for a given Bloom Filter space occupation, a higher false positive rate. The tradeoff between the two kinds of false positive needs to be balanced, and the balance should be achieved before the construction of CT-BF, because for a given space occupation and element number, the lowest false positive of a Bloom Filter is achieved by choosing the appropriate number of hash functions, given in (3); once decided, it cannot be changed on-line.

k = (m/n) · ln 2    (3)

CT-BF takes account of the false positives of both the Bloom Filter and the Compression Trie: for a given bit array length, CT-BF estimates the filter ratio of the Compression Trie when the elements of a certain level are added into the Bloom Filter, calculates the false positive of the Bloom Filter, then derives the total false positive and finds the configuration with the maximal filter ratio. The problem statement is (4), where CT(l) is the Compression Trie covering the l-th level of the name Trie, σ(CT(l)) is the number of elements in the Compression Trie, fp(·) is the false positive ratio of the Bloom Filter, and fr(·) is the filter ratio of the given Compression Trie. With the depth of the Compression Trie decided, CT-BF is decided accordingly. A noteworthy aspect of CT-BF is that the false positive of the Compression Trie is estimated based on its depth rather than its capacity. This is because, for a given Compression Trie capacity, the filter ratio is influenced by many factors, such as the request pattern, name length distribution and Compression Trie shape, so there is no analytical form for the mapping between filter ratio and capacity, and it is not practical to estimate the filter ratio from the capacity. For simplicity and robustness, the Compression Trie depth is adopted, which guarantees stable filter performance; besides, the range of possible depths is much smaller than that of capacities, so even enumeration is feasible.

max { (1 − fp(σ(CT(l)))) · fr(CT(l)) }
s.t. 0 ≤ l ≤ MaxTrieDep    (4)
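Since only the trie depths need enumerating, the selection in (4) is cheap; a sketch of the search, where the per-level element counts and filter-ratio estimates are illustrative placeholders rather than measured values:

```python
import math

def bf_false_positive(m, n):
    """Lowest achievable false positive for m bits and n elements,
    using the (rounded) optimal k = (m/n) ln 2 from (3)."""
    if n == 0:
        return 0.0
    k = max(1, round(m / n * math.log(2)))
    return (1 - math.exp(-k * n / m)) ** k

def best_depth(m, sigma, fr_est):
    """Pick the depth l maximizing (1 - fp(sigma[l])) * fr[l], as in (4)."""
    score = lambda l: (1 - bf_false_positive(m, sigma[l])) * fr_est[l]
    return max(range(len(sigma)), key=score)

# hypothetical per-level element counts and estimated filter ratios
sigma  = [1, 120, 4_000, 60_000, 400_000]   # sigma(CT(l)) per depth l
fr_est = [0.0, 0.25, 0.55, 0.78, 0.92]      # estimated filter ratio fr(CT(l))
print(best_depth(m=1_000_000, sigma=sigma, fr_est=fr_est))
# → 3: depth 4's better trie filtering is outweighed by its BF false positive
```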

3) Update of CT-BF: Updating CT-BF is very flexible. The Compression Trie requires no extra memory space: it lies on the name Trie, with only its leaf nodes marked there, so its update is consistent with that of the name Trie. When a new content chunk is cached in CS, its name is added to the name Trie: the first segment of the name is added to the first level, then the second, until all segments are added. If there is a Compression Trie leaf node on the traversal path, that leaf node already covers the newly added name entry and no change needs to be made to the Compression Trie; otherwise the Compression Trie is updated accordingly: if the name entry is shorter than the Compression Trie depth D, the last segment of the name is marked as a Compression Trie leaf node, and if it is longer, the D-th segment of the name is marked as the leaf. As for deletion of a name entry, the last component is deleted first, then the second to last, until the single branch of the name Trie is deleted; if there is a Compression Trie leaf node on the deletion path, it is deleted as well. After an insertion or deletion, the CBF is updated accordingly, and the on-chip Bloom Filter is updated if necessary.

C. Adaptive Compression Trie Based Bloom Filter

1) Optimal Compression Trie: The basic CT-BF can be constructed with the aforementioned solution: once the depth of the Compression Trie is decided, the capacity of CT-BF is decided as well. However, for a given capacity, the Compression Trie covering an entire level does not necessarily achieve the optimal filter performance. The Optimal Compression Trie is constructed to filter out the maximum number of irrelevant requests and hence minimize the average search delay in CS. More formally, the question can be expressed as (5).

min SD(N, T, Tc, R)
s.t.

σ(Tc) ≤ δ
U(Tc) ≤ τ    (5)

In (5), SD(·) is the function of average search delay, N is the request name set, T is the CS name Trie, Tc is the Compression Trie generated from T, R is the user request pattern, σ(Tc) is the number of elements in Tc, δ is the upper bound on the number of elements in Tc, U(Tc) is the average update time consumption, and τ is the upper bound of acceptable update time. More specifically, the construction of the Optimal Compression Trie relies on knowledge of N, T and R. For example, suppose /A/B/C is a Compression Trie leaf node added to the Bloom Filter, the name set under it consists of /A/B/C/1, /A/B/C/2 and /A/B/C/3, and CS caches only /A/B/C/1. If a user requests a content chunk with name prefix /A/B/C, the Bloom Filter query returns "True" and a Trie search is conducted, but an exact match in the Trie search is not guaranteed; the probability of an exact match in the local CS depends on the popularity distribution among the three pieces of content as well as the caching policy. In one word, to fully utilize the limited memory space for request filtering, the construction of the Optimal Compression Trie needs to take into account the request pattern, the request name set and the CS name Trie structure. The Optimal Compression Trie can then be obtained as shown in (6).

min Σ_prefix(i) Σ_{chunk(i,j) ∉ T} λ_prefix(i) · p(chunk(i, j) | prefix(i))
s.t. 0 ≤ i ≤ σ(CT(l))
∀m, n: prefix(m) ⊄ prefix(n)
∪ leaf(prefix) = T    (6)
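The objective in (6) scores a candidate prefix by the request rate it attracts times the probability that the requested chunk is not cached; a toy evaluation, in which the rates and conditional probabilities are made-up numbers (for /A/B/C, 2/3 corresponds to the text's example of three equally popular chunks with only one cached):

```python
def expected_miss_rate(prefixes, rate, p_uncached):
    """Objective of (6): sum over prefixes of lambda_prefix times the
    probability that a request sharing the prefix targets an uncached chunk."""
    return sum(rate[p] * p_uncached[p] for p in prefixes)

# hypothetical: /A/B/C attracts 10 req/s, 2 of its 3 chunks are uncached
rate       = {"/A/B/C": 10.0, "/A/D": 4.0}
p_uncached = {"/A/B/C": 2/3, "/A/D": 0.25}
print(expected_miss_rate(["/A/B/C", "/A/D"], rate, p_uncached))
# ≈ 7.67 requests/s pass the filter yet miss in CS
```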


In (6), prefix(i) is a Compression Trie leaf node, which represents a name prefix and covers several children leaf nodes of the original name Trie; λprefix(i) is the request rate of content with prefix prefix(i); chunk(i, j) is a name entry which shares prefix(i), and the restriction means that the content chunk with name chunk(i, j) is not cached locally; p(chunk(i, j)|prefix(i)) is the probability of requesting chunk(i, j) given that the request shares prefix(i). The first constraint guarantees that the number of elements, or leaf nodes, in the Compression Trie does not exceed the capacity constraint; the second guarantees there is no ambiguous search result from a Bloom Filter query; and the last guarantees all leaf nodes in the original name Trie are covered by the Compression Trie. With all information provided, formulation (6) becomes a knapsack problem, which is in general NP-hard. Another realistic problem which hinders a direct solution of (6) is that information on the request pattern and request name set is usually unavailable to the caching node. To overcome these challenges and achieve satisfying performance in realistic CS request filtering, a greedy heuristic solution, Adaptive Compression Trie (ACT), is proposed.

2) Adaptive Compression Trie: ACT dynamically adjusts the Compression Trie to achieve a higher request filter ratio under the space constraint. The principle of adjustment is to spare space to expand the Compression Trie where most mismatches occur. The term "expand the Compression Trie" means removing a leaf node of the Compression Trie and adding its children as new Compression Trie leaves, so that longer name prefixes are added to the request filter, which promise more accurate filtering. While the total space occupation of ACT is fixed, expansion usually requires more space; if the Compression Trie keeps expanding without making room for newly added prefix entries, there would eventually be no space for expansion or for the expected improvement, so shrinking of the Compression Trie is proposed as well. The term "shrink the Compression Trie" means adding as a new Compression Trie leaf a node whose children are all Compression Trie leaf nodes, and then removing all its children from the Compression Trie; in this way, space for NUM(node.children) − 1 elements can be spared for expansion. The shrinking takes place where it does least harm to ACT filtering: the node whose children attract the fewest requests is chosen as the shrinking node. The rationale of the adjustment is further supported by Fig.5. In Fig.5, 100,000 different name entries are added into CS and name search is carried out across the 4.5 million name set; for names resulting in a mismatch, the depth of the first mismatching component is recorded, and the CDF over the entire name set is shown in the figure. As can be seen, for all lengths, longer prefixes achieve better filter performance. Taking names with 5 components as an example, if the first 3 components are covered by the Compression Trie, then about 40% of mismatching 5-component names can be filtered; if the first 4 components are involved, the proportion rises to about 70%. This improvement validates the "expanding" and "shrinking" of ACT.

Fig. 5: Prefix Filter Performance

The expanding and shrinking of ACT change the Bloom Filter querying process, because the lengths of the name prefixes added to the Bloom Filter differ after the adjustment, as Fig.4(b) shows, so multiple queries for different prefix lengths become necessary. To prevent the computation overhead of Bloom Filter queries from outweighing the benefits gained from ACT-BF filtering, the ACT depth is confined by an upper and a lower bound. For a request with a name longer than the lower bound, the Bloom Filter query starts from the lower-bound length; if a negative result is returned, another component is added to the prefix and the query is conducted again, until a positive result is returned, after which the name Trie search is conducted. If even the entire name, or the prefix of upper-bound length, still returns false, it is proven that no local copy can serve the request, and it is forwarded to PIT for further processing. For a request with a name shorter than the lower bound, the entire name is regarded as the name prefix and the preceding query and processing are conducted as well. The ACT-BF construction consists of two phases. In the first phase, CT-BF evaluates the false positive of the Bloom Filter and the filter ratio of the Compression Trie under the space constraint, then constructs the basic Compression Trie with the best filter performance. In the second phase, ACT is conducted: leaf node expansion happens when the mismatches at a node reach a certain threshold, and shrinking is carried out to make space for the expansion, and thus the overall request filter performance is improved.

V. EVALUATION

In this section, simulation is carried out to validate the improvement of our design, from two aspects. In the first part, different settings of CT-BF are tested; in the second part, the greedy ACT-BF solution is tested.

A. Simulation Setup

The simulation is based on the request trace described in [23]; the request URLs in the data set are transformed into NDN style to guarantee the accuracy of the evaluation. The first 10 million requests are chosen as the experiment data set,
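The bounded multi-length query described for ACT-BF might be sketched as follows (query from the lower-bound length upward until the filter answers positive; names shorter than the lower bound are queried whole; the set-backed stand-in filter is our own simplification):

```python
def act_bf_query(bf, name, lower, upper):
    """Return the shortest prefix (within [lower, upper] components) that
    the Bloom Filter accepts, or None to forward straight to PIT."""
    segs = name.strip("/").split("/")
    if len(segs) < lower:
        prefix = "/" + "/".join(segs)         # whole short name as the prefix
        return prefix if bf.query(prefix) else None
    for l in range(lower, min(upper, len(segs)) + 1):
        prefix = "/" + "/".join(segs[:l])
        if bf.query(prefix):
            return prefix                     # candidate: go on to trie search
    return None                               # definite miss: forward to PIT

class SetBF:
    """Toy stand-in: a set behaves like a zero-false-positive Bloom Filter."""
    def __init__(self, s): self.s = set(s)
    def query(self, x): return x in self.s

bf = SetBF({"/com/google/www"})
print(act_bf_query(bf, "/com/google/www/doodles", lower=2, upper=4))
# → '/com/google/www' (found at prefix length 3)
```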

There are about 4.5 million distinct name entries in the experiment dataset. Then 40,000 entries, approximately 1% of the total data set, are randomly chosen and added into the CS. Requests are processed in their original order to avoid biasing the evaluation, and under this setting the cache hit ratio is about 25%.

B. Compression Trie Based Bloom Filter

According to 100 repeated experiments, a Compression Trie covering a given level of the CS name Trie occupies a relatively fixed amount of space and yields very similar filter performance. Fig. 6(a) validates the CT-BF filter-accuracy assumption: the dashed lines are the estimated filter ratios and the solid lines the actual filter performance. The estimation basically complies with the measurements, but for false positive ratios below 30% it tends to underestimate the filter performance, and above 30% it tends to overestimate. Another interesting observation from Fig. 6(a) is that when the Compression Trie covers the fifth level of the name Trie, the estimation underestimates the filter performance at all false positive ratios. It is straightforward that a lower false positive ratio and more elements in the Compression Trie yield better filter performance, but as Fig. 6(a) suggests, under favourable conditions the actual filter performance can exceed the theoretical estimate.

The improvement comes at a cost: the space consumption of the Bloom Filter for different false positive ratios and Compression Trie capacities is shown in Fig. 6(b). Although the Bloom Filter for a Compression Trie covering the fifth level of the name Trie with a 1% false positive rate achieves the highest filter ratio, it occupies far more space than the other settings, over six times that of a filter with the same capacity and a 50% false positive rate; from this perspective, the 1% filter is very costly. A further cost, not shown in the figure, is the number of hash functions executed per query: for a given Bloom Filter size and element count, the optimal number of hash functions and the false positive rate are determined together, and a 1% false positive rate requires 7 hash functions per query, whereas a 50% rate requires only 1. This suggests that a system with powerful computing capability and abundant on-chip memory can adopt a lower false positive ratio, while a system with limited computing and on-chip memory resources gets better cost performance from a higher one.

We then evaluate the performance of CT-BF itself: bit arrays of different lengths are allocated to the Bloom Filter to carry out request filtering. For the baseline Bloom Filter with a fixed false positive rate, a given bit array and false positive rate determine the capacity, but that capacity may be too large for covering level l yet insufficient for level l + 1. To prevent wasting precious on-chip memory, the Compression Trie is allowed to expand until there is no spare space left; in this way, the space allocated to the Bloom Filter is fully utilized.
Fig. 6: CT-BF Features — (a) Filter Ratio Regression; (b) Bloom Filter Space Occupation

The result is shown in Fig. 7: CT-BF always outperforms the fixed false positive ratio Bloom Filter in terms of request filtering. A phenomenon worth noticing is that the curves for Bloom Filters with false positive rates of 30% and 50% are incomplete. For the given bit array and CS dataset, even when the entire name Trie is fed into the Bloom Filter, the false positive rate of a properly constructed filter remains below the expected level, so the result is not meaningful for performance comparison.

Fig. 7: CT-BF Evaluation

C. Adaptive Compression Trie Based Bloom Filter

The basic CT-BF decides the optimal level to cover, taking into account the false positives of both the Bloom Filter and the Compression Trie, while further improvement can be made under the space constraint as (6) shows; the greedy heuristic ACT-BF is tested in this part. Since a Compression Trie covering the fifth level achieves the best performance for all false positive ratios, the comparison is carried out under this setting as well. The result is shown in Fig. 8: ACT-BF outperforms CT-BF with the same Bloom Filter capacity and false positive ratio. As the false positive rate of the Bloom Filter increases, the filter ratio declines for both approaches, and the improvement of ACT-BF over CT-BF declines as well. This is because ACT aims to improve the filter ratio of the Compression Trie: when the false positive rate of the Bloom Filter is low, the Compression Trie's false positives are the major factor limiting overall filter performance, whereas when the Bloom Filter's false positive rate is high, the Bloom Filter itself becomes the dominant factor hindering further improvement.

Fig. 8: Adaptive Compression Trie Filter Performance

Another important metric is the average name Trie search delay. As described above, the cache hit ratio is about 25%; of the remaining 75% of requests, up to 85% can be filtered out with CT-BF, and ACT-BF improves the filter ratio further. Even with a Bloom Filter false positive ratio of 50%, over 40% of the mismatched requests are filtered. The improvement applies to the Trie search depth as well, which reflects the time complexity of name Trie searching. The result is shown in Fig. 9. The average search depth is defined as the total DRAM access time during name Trie search divided by the number of requests. As the results show, CT-BF reduces the average search depth to half that of the original Trie-based solution when the Bloom Filter false positive rate is tuned to 1%, and ACT-BF enlarges the improvement further under the same space constraint. As the false positive rate increases, the average search depth rises as well; at 50%, the search-depth saving of ACT-BF declines to about half of the original level, consistent with the trend in request filter ratio.

Fig. 9: Average Search Depth

ACT-BF is not implemented in hardware, but its key metrics are measured and evaluated: the average DRAM access frequency, the SRAM access frequency, and the number of hash functions executed per request. The performance of ACT-BF is compared with NLAPB, proposed in [14], and the result is shown in Fig. 10. ACT-BF outperforms NLAPB in terms of DRAM accesses, the most time-consuming operation, and the improvement becomes more apparent as the on-chip memory grows: with 125 kb of on-chip memory, ACT-BF requires only half the DRAM accesses of NLAPB. On the other hand, ACT-BF requires more hash computations and SRAM accesses as the on-chip memory increases, where it is inferior to NLAPB; but since SRAM access is fast and hash execution can be accelerated by multi-threading and preprocessing, this disadvantage can be compensated. Note also that the ACT-BF SRAM-access curve overlaps the hash-function curve: in ACT-BF, SRAM is accessed only after the Bloom Filter hash functions are computed, so the two counts are identical. In summary, ACT-BF achieves lower processing delay when on-chip memory is limited.

VI. CONCLUSION

This paper proposes ACT-BF, a request filter for the NDN CS. ACT-BF prevents irrelevant requests from being searched in the CS and forwards them directly to PIT and FIB processing, which speeds up forwarding and reduces the burden of the CS lookup engine.
Fig. 10: Overall Performance Evaluation

The challenge is to take advantage of the fast yet limited on-chip memory to construct the optimal request filter and filter out as many irrelevant requests as possible. ACT-BF divides the problem into two stages: in the first, the capacity of the Bloom Filter under the given space constraint is decided, considering the false positives of both the Bloom Filter and the Compression Trie; in the second, ACT-BF optimizes the filter performance for the given Compression Trie capacity. Although we do not implement ACT-BF in hardware, the major technical challenges have been solved. Simulation results show that under a given space constraint, ACT-BF is efficient in terms of request filtering and offloading the CS search burden.

ACKNOWLEDGEMENTS

The work described in this paper was supported by the National High Technology Research and Development Program (863) of China (Project Number: 2015AA016101), the National Natural Science Foundation of China (Project Number: 61671086), the Beijing New-Star Plan of Science and Technology (Project Number: Z151100000315078), and the Consulting Project of the Chinese Academy of Engineering (Project Number: 2016XY-09).

REFERENCES

[1] V. Jacobson, D. K. Smetters, J. D. Thornton, M. F. Plass, N. H. Briggs, and R. L. Braynard, "Networking named content," in Proceedings of the 5th International Conference on Emerging Networking Experiments and Technologies. ACM, 2009, pp. 1-12.
[2] L. Zhang, A. Afanasyev, J. Burke, V. Jacobson, P. Crowley, C. Papadopoulos, L. Wang, B. Zhang et al., "Named data networking," ACM SIGCOMM Computer Communication Review, vol. 44, no. 3, pp. 66-73, 2014.
[3] D. Perino and M. Varvello, "A reality check for content centric networking," in Proceedings of the ACM SIGCOMM Workshop on Information-Centric Networking. ACM, 2011, pp. 44-49.
[4] S. Arianfar, P. Nikander, and J. Ott, "Packet-level caching for information-centric networking," in ACM SIGCOMM, ReArch Workshop, 2010.
[5] T. Pan, T. Huang, J. Liu, J. Zhang, F. Yang, S. Li, and Y. Liu, "Fast content store lookup using locality-aware skip list in content-centric networks," in Computer Communications Workshops (INFOCOM WKSHPS), 2016 IEEE Conference on. IEEE, 2016, pp. 187-192.
[6] R. B. Mansilha, L. Saino, M. P. Barcellos, M. Gallo, E. Leonardi, D. Perino, and D. Rossi, "Hierarchical content stores in high-speed ICN routers: Emulation and prototype implementation," in Proceedings of the 2nd International Conference on Information-Centric Networking. ACM, 2015, pp. 59-68.
[7] G. Rossini, D. Rossi, M. Garetto, and E. Leonardi, "Multi-terabyte and multi-Gbps information centric routers," in INFOCOM, 2014 Proceedings IEEE. IEEE, 2014, pp. 181-189.
[8] W. So, T. Chung, H. Yuan, D. Oran, and M. Stapp, "Toward terabyte-scale caching with SSD in a named data networking router," in Architectures for Networking and Communications Systems (ANCS), 2014 ACM/IEEE Symposium on. IEEE, 2014, pp. 241-242.
[9] S. Dharmapurikar, P. Krishnamurthy, and D. E. Taylor, "Longest prefix matching using bloom filters," in Proceedings of the 2003 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. ACM, 2003, pp. 201-212.
[10] W. Eatherton, G. Varghese, and Z. Dittia, "Tree bitmap: hardware/software IP lookups with incremental updates," ACM SIGCOMM Computer Communication Review, vol. 34, no. 2, pp. 97-122, 2004.
[11] Y. Wang, H. Dai, J. Jiang, K. He, W. Meng, and B. Liu, "Parallel name lookup for named data networking," in Global Telecommunications Conference (GLOBECOM 2011), 2011 IEEE. IEEE, 2011, pp. 1-5.
[12] J. Lee, M. Shim, and H. Lim, "Name prefix matching using bloom filter pre-searching for content centric network," Journal of Network and Computer Applications, vol. 65, pp. 36-47, 2016.
[13] Y. Qiao, T. Li, and S. Chen, "One memory access bloom filters and their generalization," in INFOCOM, 2011 Proceedings IEEE. IEEE, 2011, pp. 1745-1753.
[14] W. Quan, C. Xu, J. Guan, H. Zhang, and L. A. Grieco, "Scalable name lookup with adaptive prefix bloom filter for named data networking," IEEE Communications Letters, vol. 18, no. 1, pp. 102-105, 2014.
[15] J. Shi, T. Liang, H. Wu, B. Liu, and B. Zhang, "NDN-NIC: Name-based filtering on network interface card," in ICN, 2016, pp. 40-49.
[16] B. H. Bloom, "Space/time trade-offs in hash coding with allowable errors," Communications of the ACM, vol. 13, no. 7, pp. 422-426, 1970.
[17] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber, "Bigtable: A distributed storage system for structured data," ACM Transactions on Computer Systems (TOCS), vol. 26, no. 2, p. 4, 2008.
[18] L. Fan, P. Cao, J. Almeida, and A. Z. Broder, "Summary cache: a scalable wide-area web cache sharing protocol," IEEE/ACM Transactions on Networking (TON), vol. 8, no. 3, pp. 281-293, 2000.
[19] M. Mitzenmacher and E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
[20] H. Lim, M. Shim, and J. Lee, "Name prefix matching using bloom filter pre-searching," in Proceedings of the Eleventh ACM/IEEE Symposium on Architectures for Networking and Communications Systems. IEEE Computer Society, 2015, pp. 203-204.
[21] D. Perino, M. Varvello, L. Linguaglossa, R. Laufer, and R. Boislaigue, "Caesar: A content router for high-speed forwarding on content names," in Proceedings of the Tenth ACM/IEEE Symposium on Architectures for Networking and Communications Systems. ACM, 2014, pp. 137-148.
[22] X. Hu and A. Striegel, "Redundancy elimination might be overrated: A quantitative study on real-world wireless traffic," arXiv preprint arXiv:1605.04021, 2016.
[23] Y. Liu, J. Miao, M. Zhang, S. Ma, and L. Ru, "How do users describe their information need: Query recommendation based on snippet click model," Expert Systems with Applications, vol. 38, no. 11, pp. 13847-13856, 2011.
