
Split-Cache: A Holistic Caching Framework for Improved Network Performance in Wireless Ad Hoc Networks

Nahid Ebrahimi Majd†, Satyajayant Misra‡, and Reza Tourani‡

Abstract—Wireless ad hoc networks (WAHNs) consist of autonomous nodes cooperating with each other to transmit/receive data over multiple hops in the network. Caching is a useful mechanism to leverage this cooperation. Nodes with cached content can satisfy requests from other nodes, thus helping reduce network traffic and energy consumption and improve latency. With the proliferation of wireless devices on the Internet and the proposal of a future Internet with emphasis on in-network caching, improvements in caching can significantly improve network response while reducing network load. In this paper, we present a holistic caching framework, Split-Cache, which enables a network node to account for the frequency of requests for data items and their presence in the network, and to leverage a split cache (one part caches popular items, the other caches less popular items) to make caching and cache-eviction decisions. We performed exhaustive simulations to compare Split-Cache with the state of the art: Split-Cache improved the cache request resolution time on average by 30% (and by as much as 72%), and required 15% less average traffic to resolve requests, a large saving when considering a large number of requests.

Keywords: In-network caching, wireless ad hoc network, content delivery, caching.

1. INTRODUCTION

Cooperative-caching mechanisms in WAHNs seek an optimal data placement on the network nodes to reduce the overall data access cost/latency. The study of cooperative caching is motivated by the following reasons. First, given the intrinsic multi-hop communication, network performance can be greatly enhanced by caching and data provisioning from in-network nodes. Second, if a data request is satisfied by nodes close to the requester, the number of network communications drops significantly, consequently reducing the power consumed by the network nodes in forwarding data. Third, given that the network topology changes continuously on account of node mobility, obtaining data from the neighborhood is more likely to succeed than obtaining it from a distant source: by the time data from a distant source arrives, the requesting node may have already moved, resulting in delivery failure.

An important limiting factor in leveraging data caching is the cache size. In practice, a node's cache size is limited, so it must be used wisely to help improve system performance. Hence a node following any caching strategy has to make two important decisions: the caching decision, that is, whether to cache a new data item; and the cache replacement strategy, that is, which item to evict when the cache is already full. A caching decision policy determines the nodes that are qualified to cache a data item, and a cache replacement strategy nominates a subset of cached data to be evicted in order to free space for newly arriving data.

† The author is with the Computer Science and Information Systems department of California State University, San Marcos, CA. Email: {nmajd}@csusm.edu.
‡ Both authors are with the Computer Science department of New Mexico State University, Las Cruces, NM. Email: {misra, rtourani}@cs.nmsu.edu.

A. Related Work

Cooperative caching in wireless ad hoc networks was first studied by Yin and Cao in [7], where they proposed three distributed caching strategies. In HybridCache, the best among the three strategies, nodes on the data forwarding path cache the data if its size is below a threshold; if the size is large, they instead cache the path to the closest location of the data (the closer of the data source or destination) for future use. The system assumptions in [7] represent fairly general settings and hence have formed the basis of subsequent studies, including this one.

The centralized and distributed approaches presented by Tang et al. [6] employed a cache replacement function that minimized the data access cost. Like most centralized solutions, theirs do not scale in ad hoc networks. Their distributed approaches employed cache tables, thus incurring significant communication costs for table synchronization. The idea of employing cache tables in cooperative caching algorithms was expanded by Fan et al. [3] with the proposal of a Gossip-based method aimed at minimizing the total interruption intervals. In [4], Fan et al. proposed a contention-aware cache replacement to share only a single data item in a WAHN.

Fiore et al. [5] presented Hamlet, a distributed cooperative cache replacement strategy that homogeneously scatters data items based on their presence; it forms the state of the art in cooperative caching. Hamlet's design principle is based on data presence in a region, defined as the proportion of nodes in the region that have the data in their cache: the higher the proportion, the greater the data presence. In Hamlet, only the node requesting a data item caches it; if the node's cache is full, it evicts a stored item to make room. This eviction decision is based on the presence of the already cached items: the cached item with the highest presence is evicted. Hence, Hamlet attempts to cache data items with almost the same density throughout the network.

In our study of the state of the art, we identified that no existing caching algorithm is holistic and accounts for all factors affecting cache performance. The design of such a holistic algorithm is our focus here.

B. Motivation

The state-of-the-art caching algorithms in the literature can be broadly classified into two categories: algorithms based on the request frequency of data and those based on neighborhood data presence. However, these algorithms are not holistic and end up unable to leverage the available caches effectively. a) Approaches based purely on data request frequency (HybridCache and Benefit-based caching [6]) favor high-request-frequency data items (the popular first few) while neglecting low-request-frequency items. Hence they are inefficient in using the cache for lower-request-frequency items, whose aggregate frequency is still quite large, thus undermining performance.

b) On the other hand, in approaches based only on presence, high-request-frequency (popular) items, which also have high presence on account of their popularity, are invariably evicted from the nodes in a neighborhood precisely because of that high presence. This may result in an unending "eviction-request-eviction" sequence for popular items. The evictions, especially when they are not synchronized in a neighborhood, may cause subsequent requests to incur higher latencies and travel more hops, thus undermining system performance.

From our analysis, we have identified that an effective caching algorithm should consider both request frequency and data presence. We have also inferred that a differential treatment of popular and less popular data items can improve the number of data requests fulfilled (represented by the query-solving ratio), while reducing the network traffic and the average time to request fulfillment (termed the average query-solving time). However, no approach in the literature has attempted a holistic cooperative caching solution that considers the three factors mentioned above. These reasons motivate us to investigate a holistic cooperative in-network caching approach that combines data request frequency, presence index, and cache splitting (popular and non-popular) to improve the query-solving ratio, the average query-solving time, and the average network traffic.

In Section 2, we present our system model and assumptions. Section 3 presents our Split-Cache framework. In Section 4, we present the simulation results and analyses. In Section 5, we conclude the paper.

2. SYSTEM MODEL AND ASSUMPTIONS

We assume a WAHN composed of a collection of stationary/mobile nodes communicating with each other. Each node is equipped with an omnidirectional antenna with a fixed transmission range, and the two-ray ground model is the signal propagation model. There is a set of data-storage nodes (called gateway nodes) in the network, which are strategically placed and, between themselves, can satisfy all data requests. Every node u has a cache and stores the data items it requests from the network in its cache. Node u uses a cache replacement function, the Merit function (see Section 3), to evict item(s) from a full cache to accommodate newly requested data items. Each node u can count the number of requests it has received and forwarded for each individual data item. Each request contains a sequence number for unique identification. We assume that the cache on a node can be split. We use a weak cache-consistency model, in which each data item has a version number and nodes delete old versions in favor of the corresponding new versions.

A node u regularly asks for items it does not possess, at a rate governed by the items' popularity. The request frequency (popularity) of items follows a Zipf distribution [1]. A node u broadcasts a data request by mitigated flooding, that is, the number of hops a request can propagate is upper-bounded by the time-to-live (TTL) value. All nodes (including u) reachable within this TTL value comprise the neighborhood, $N_u$, of u. When a node u receives a data request, it checks its cache and, on a hit, returns the data item to the neighbor from which it received the request. On a cache miss, u consults a table of pending requests to see if there is an outstanding request for the item; if so, it does not re-broadcast the request. If the item is neither in the cache nor pending, u waits for a query-lag time and rebroadcasts the request if it sees no other rebroadcast during this period; if u sees a rebroadcast from a neighbor, it drops the request. This helps reduce redundant packet broadcasts. When a data request arrives at a gateway or at a node that has the data in its cache, the item is transmitted to the requesting node by unicast along the reverse path.
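The request-handling logic above can be summarized in code. The following is a minimal sketch, not the authors' implementation: the node helpers (`unicast`, `broadcast`, `schedule`, `saw_rebroadcast`) are hypothetical stand-ins for whatever a simulator provides, while the 50 ms query-lag value is taken from the simulation setup in Section 4.

```python
QUERY_LAG = 0.05  # query-lag time in seconds (50 ms, per the setup in Section 4)

class RequestHandler:
    """Per-node request handling under mitigated flooding (sketch)."""

    def __init__(self, node):
        self.node = node      # hypothetical node object supplying I/O helpers
        self.pending = set()  # sequence numbers of outstanding requests

    def on_request(self, req, from_neighbor):
        if req.item_id in self.node.cache:
            # Cache hit: send the item back to the neighbor the request came from.
            self.node.unicast(from_neighbor, self.node.cache[req.item_id])
        elif req.seq_no in self.pending:
            pass  # an outstanding request already covers this item: stay silent
        elif req.ttl > 0:
            self.pending.add(req.seq_no)
            # Wait a query-lag time; rebroadcast only if no neighbor beats us to it.
            self.node.schedule(QUERY_LAG, lambda: self._maybe_rebroadcast(req))

    def _maybe_rebroadcast(self, req):
        if not self.node.saw_rebroadcast(req.seq_no):
            req.ttl -= 1  # TTL bounds how far the request can propagate
            self.node.broadcast(req)
```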

3. SPLIT-CACHE FRAMEWORK

In this section, we present our framework, Split-Cache, which enables each node u to make its own independent cache-replacement decisions. In our framework, every node u keeps track of the number of unique requests for a data item i (its request frequency) by counting its own requests for i and the unique requests for i it receives from its neighbors. Each node u also keeps information on the presence of item i in its neighborhood, as it is either involved in forwarding the data or can overhear (in promiscuous mode) the data transmission to the requester. For each data item i that u helps forward to a requester r, u records the distance $h_r$, in number of hops, to r (r is likely to cache the data item as well). Node u also records the distance $h_s$, in number of hops, to the node s that satisfies the request.

Based on these records, u computes a presence index for each data item. This index indicates the data item's presence redundancy in the neighborhood. Intuitively, the higher the presence index of a data item i in u's neighborhood, the higher the probability that u evicts i to free the cache for a newly arriving requested item, as i is widely available in the neighborhood. Although data presence in the neighborhood has a significant impact on the cache-replacement decision, the effect of request frequency cannot be ignored. Hence, in Split-Cache, we consider both the presence of data items and the frequency of their requests in a neighborhood over a certain time period (τ), and use both parameters for cache-replacement decisions at a node. In addition, we add the important aspect of a split cache at a node: the cache is split into two parts, one for popular data and the other for less popular data.

Fig. 1. Scenarios for computing the presence index: node r is a requester, node s satisfies the request, node u is a forwarder, and node v overhears the item. From node u or v, $h_r$ is the distance to r and $h_s$ is the distance to s.

A. Estimation of Presence Index

For Split-Cache, we break the duration T of operation of the nodes in the network into smaller time steps. At the end of each time step j, for each data item i, a forwarding node u (or an overhearing node v) calculates the distances of all nodes that have requested i or have satisfied a request for i. Fig. 1 shows the four possible scenarios for recording these distances at an intermediate node: (1) When a data provider satisfies a request, it records the distance of the requesting node, which will likely cache the item.

(2) A forwarding node u that transfers a data item records the distances of both the provider and the requesting node. (3) An intermediate node v that does not receive/forward a request but overhears the data item records the distance of the data provider (from the time-to-live value in the packet). (4) A node receives a duplicate data item from another provider, that is, the request has already been satisfied (solved) by one other provider. In Fig. 1, a requesting node (r), colored pink, issues a request and a source or data provider (s), colored red, satisfies the request. The figure shows how a forwarding node u or an overhearing node v (small green concentric-ring circle) records the distances needed to compute the current presence index. The distances of the data provider and the requesting node from u are denoted $h_s$ and $h_r$ respectively. The sets of data providers and requesting nodes whose distances have been recorded by u for data item i during estimation time step j are denoted $s_i(u, j)$ and $r_i(u, j)$ respectively.

The estimated presence index $\hat{p}_i(u, j)$ at time step j accounts for the presence of new copies of data item i in the neighborhood of u, identified by u when u sourced, received, forwarded, or overheard i. It is defined as

$$\hat{p}_i(u, j) = \min\left\{1,\ \sum_{s \in s_i(u,j)} \frac{1}{h_s} + \sum_{r \in r_i(u,j)} \frac{1}{h_r}\right\} \quad (3.1)$$

and lies in the range [0, 1]. If $\hat{p}_i(u, j)$ is 0, then u has not seen (received, sent, overheard, or forwarded) data item i during time step j, that is, $h_r = h_s = \infty$. If u has a copy of i, or is aware of a copy of i cached one hop away, then $\hat{p}_i(u, j)$ is 1, the best-case scenario. If every copy of item i is more than one hop away from u, the estimated presence index lies between 0 and 1, with the contributions of several copies possibly adding up to 1. Thus the presence of an item at a few nearby nodes may yield the same value as its presence at several far-off nodes.
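As a quick illustration of Equation 3.1, the following is a minimal sketch (the helper name is ours, not from the paper) that computes $\hat{p}_i(u, j)$ from the hop distances recorded per the scenarios of Fig. 1:

```python
def estimated_presence_index(provider_hops, requester_hops):
    """Estimated presence index p̂_i(u, j) of Eq. (3.1).

    provider_hops:  hop distances h_s of providers in s_i(u, j)
    requester_hops: hop distances h_r of requesters in r_i(u, j)
    """
    total = sum(1.0 / h for h in provider_hops) + \
            sum(1.0 / h for h in requester_hops)
    return min(1.0, total)

# A copy one hop away saturates the index; far-off copies add up.
print(estimated_presence_index([1], []))        # 1.0
print(estimated_presence_index([3], [5]))       # 1/3 + 1/5 ≈ 0.53
print(estimated_presence_index([3, 3, 3], []))  # capped at 1.0
```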

B. Cache Replacement Function

The limited cache size implies that a node needs a decision function to choose the item(s) to evict from a full cache to accommodate newly arriving item(s). One option for the decision function is to use weighted presence indices over a recent time window, say τ (a system parameter), to decide which item to evict, where the weight of an index depends on its recency [5]. For instance, at time step j, node u can weigh all indices $\hat{p}_i(u, k) \in \{\hat{p}_i(u, j-\tau+1), \ldots, \hat{p}_i(u, j)\}$ by a smoothing factor $\omega_i(u, k, j)$, defined as

$$\omega_i(u, k, j) = \begin{cases} 1, & \text{if } (j - k) \le \Gamma(u, k) \\ 0, & \text{otherwise,} \end{cases} \quad (3.2)$$

where $\Gamma(u, k) = \lfloor f \cdot \hat{\chi}_i(u, k-1) \rfloor$, f is the frequency of estimation of the parameters in each time unit, and $\hat{\chi}_i(u, k-1)$ is the estimate, made in time step k−1, of the amount of time item i will remain cached, defined in Equation 3.5 [5]. Equation 3.2 computes the weight for each of the τ estimated presence indices of item i: the weight is one for the presence indices of the last $\Gamma(u, k)$ estimation steps and zero for all steps before that. Node u weighs the past presence indices by their corresponding weights as

$$\phi_i(u, k, j) = \omega_i(u, k, j) \cdot \hat{p}_i(u, k), \quad \forall i, k. \quad (3.3)$$

Then, the cumulative estimated presence index at estimation step j is computed as

$$\hat{p}_i(u, j, \tau) = \sum_{k=j-\tau+1}^{j} \phi_i(u, k, j), \quad (3.4)$$

where τ is the window size.

Then, the estimated caching time $\hat{\chi}_i(u, j)$ in time step j is computed as

$$\hat{\chi}_i(u, j) = \left(1 - \frac{\hat{p}_i(u, j, \tau)}{\max_{x \in C}\{\hat{p}_x(u, j, \tau)\}}\right) \cdot M_C, \quad (3.5)$$

where C is the set of items in the cache and $M_C$ is the maximum cache drop time (a system parameter).

In time step j, for two items l and h in the cache of node u with low and high popularity respectively, $\hat{\chi}_l(u, j) \ge \hat{\chi}_h(u, j)$ since $\hat{p}_l(u, j, \tau) \le \hat{p}_h(u, j, \tau)$. Following this inequality, derived using Equation 3.5, in Hamlet u evicts item h before item l. Thus, Hamlet attempts to level the playing field by giving importance to low-popularity data at the expense of high-popularity data, broadening the range of popularity among the items cached in a neighborhood. This broadening reduces the latency of requests for low-popularity data. On the flip side, however, if a node's cache is subject to significant churn, highly present (popular) data is unlikely to stay cached in Hamlet for long. This leads to an increasing number of requests, especially for popular data, in the neighborhood, which we observed in our experiments, thus increasing both the query-solving time and the network traffic. In our framework, we attempt to solve this issue by considering both the presence index and the access frequency, in addition to partitioning the cache.
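To make the estimation pipeline of Equations 3.2-3.5 concrete, here is a minimal sketch under our reading of those definitions; the function names and data layout are illustrative, not from the paper.

```python
def cumulative_presence(p_hats, gammas, tau):
    """Cumulative estimated presence index p̂_i(u, j, τ), Eqs. (3.2)-(3.4).

    p_hats: the τ per-step indices p̂_i(u, k), k = j-τ+1, ..., j
    gammas: Γ(u, k) = ⌊f · χ̂_i(u, k-1)⌋ for the same steps
    """
    j = tau - 1  # position of the current step j inside the window
    total = 0.0
    for k in range(tau):
        omega = 1.0 if (j - k) <= gammas[k] else 0.0  # Eq. (3.2)
        total += omega * p_hats[k]                    # Eqs. (3.3)-(3.4)
    return total

def estimated_caching_time(p_cum_i, p_cum_cache, M_C):
    """Estimated caching time χ̂_i(u, j), Eq. (3.5).

    p_cum_i:     p̂_i(u, j, τ) for the item under consideration
    p_cum_cache: p̂_x(u, j, τ) for every item x in the cache C
    M_C:         maximum cache drop time (system parameter)
    """
    return (1.0 - p_cum_i / max(p_cum_cache)) * M_C
```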

C. Split-Cache: A Holistic Caching Framework

Existing works such as HybridCache [7] and Benefit-based caching [6] have concentrated on request frequencies in making caching and cache-replacement decisions. However, no study in the literature considers both data presence and access frequency. In our Split-Cache framework, we propose a mechanism to do exactly that, in addition accounting for splitting of the cache at a node. The reasoning behind using data request frequency in addition to the presence index is motivated by the following two scenarios.

Scenario 1: Consider the calculation of $\hat{p}_i(u, j, \tau)$ in time step j. If i is cached several hops away from u but spread with high density in those far-off regions, then $\hat{p}_i(u, j, \tau)$ will be high. Hence u would evict i even if there are several requests for i from u's neighbors; thus u will never be able to serve i. The same is true for all nodes in $N_u$. Consequently, the many requests originating from $N_u$ would increase network traffic and query-solving latency. This could create a cycle of request-evict-request for popular items.

Scenario 2: Consider scenario b) in the Motivation subsection, where both the presence index and the request frequency of an item, say i, are high in a neighborhood ($N_u$). From detailed experimentation, we realized that eviction of i from $N_u$ causes an undesirable increase in network traffic and request-resolution latency. In this case, if u and other neighbors in $N_u$ cache i, performance may improve. However, this is not possible with the use of only the cumulative estimated presence index and the corresponding estimated caching time.

We solve the problem highlighted by these scenarios through two mechanisms: damping the presence of highly popular data to make it attractive for caching, and splitting the cache to give popular data more opportunity to be cached. We first describe the damping mechanism.

1) Frequency-Based Merit Function: For each data item i observed by node u, we define the request frequency counter, $f_i(u, j)$, as the total count of unique requests (requests are generally resent) observed by u until time step j. Note that u resets $f_i(u, j)$ when it evicts i from its cache. At time step j, Equation 3.6 defines a Merit function, $M_i(u, j)$, calculated by u for item i as

$$M_i(u, j) = \left(1 - \frac{f_i(u, j)}{\max_{x \in C}\{f_x(u, j)\}}\right)^{\beta} \cdot \hat{p}_i(u, j, \tau), \quad (3.6)$$

where $\max_{x \in C}\{f_x(u, j)\}$ is the maximum request frequency among all items in the cache C, $\hat{p}_i(u, j, \tau)$ is the cumulative estimated presence index at time step j, and β < 1 is a pre-defined constant. The term $(1 - f_i(u, j)/\max_{x \in C}\{f_x(u, j)\})^{\beta}$ is the damping function, designed to scale down the presence index of highly popular items (in the 95th percentile and higher). From our experiments, we found that β = 0.05 gives the best results. It is easy to see that $M_i(u, j)$ is zero when item i's frequency is the maximum, that is, the item will not be evicted even if its presence index is high. As the frequency decreases, the chance that the item remains in the cache decreases as well, but not drastically.

The Merit function combines the request frequency and the cumulative estimated presence index at time step j to help node u decide which data to evict. If an item n arrives at time step j, node u calculates $M_i(u, j)$ for each item i in its cache, as well as $M_n(u, j)$, and chooses the item e such that $M_e(u, j) = \max_{i \in C \cup \{n\}}\{M_i(u, j)\}$. If e is n, then n is not cached; otherwise e is evicted from C to accommodate n.

2) Identifying an Appropriate Cache-Split Ratio: We analyzed the caching and eviction procedures after incorporating the Merit function. From our analyses, we noticed that popular items, on account of their higher presence values, still get evicted from the nodes' caches frequently. This results in more requests for popular items being transmitted in the network, adding to the network traffic and the average query-solving time. To address this issue, we use the second mechanism: splitting the cache C into two parts, C1 for popular items and C2 for less popular items. Thus, the cache C at node u, of size |C|, is broken into two caches C1 and C2 of sizes $|C_1| = \alpha \cdot |C|$ and $|C_2| = (1-\alpha) \cdot |C|$ respectively, where α ≤ 1. For a node u, the popularity of data item i at time step j is proportional to the frequency counter $f_i(u, j)$, the request frequency of i since the last time it was evicted from u. In Split-Cache, we cache the δ% (a system parameter) most frequently requested items in C1, and the rest in C2. If a new data item n arrives at u at time step j and the cache is full, then u first uses the request frequency of n to identify which cache (C1 or C2) n should be put in. Then u performs the eviction procedure using the Merit value $M_n(u, j)$ and the corresponding Merit values of the other items in that cache, removing the item with the highest Merit value in the process.
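The damping behavior of Equation 3.6 can be seen in a few lines of code. This is an illustrative sketch (the helper name is ours), using the β = 0.05 the paper reports as best:

```python
BETA = 0.05  # damping exponent; the paper reports β = 0.05 works best

def merit(f_i, f_max, p_cum_i, beta=BETA):
    """Merit function M_i(u, j) of Eq. (3.6).

    f_i:     request frequency counter f_i(u, j)
    f_max:   max_{x∈C} f_x(u, j) over the relevant cache
    p_cum_i: cumulative estimated presence index p̂_i(u, j, τ)
    """
    return (1.0 - f_i / f_max) ** beta * p_cum_i

# Damping in action: the most requested item gets Merit 0 (never evicted),
# while less requested items keep a Merit close to their presence index.
print(merit(100, 100, 0.9))  # 0.0
print(merit(90, 100, 0.9))   # ≈ 0.80
print(merit(10, 100, 0.9))   # ≈ 0.90
```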

Algorithm 1 Cache eviction at node u in time step j
Input: A new data item n; statistics of the cached items, such as $M_x(u, j)\ \forall x \in C$, where C is the cache at node u.
Output: Data item e to evict.
1: if $f_n(u, j) > \min_{x \in C_1}\{f_x(u, j)\}$ then
2:   Calculate the Merit value of n: $M_n(u, j) = \left(1 - \frac{f_n(u, j)}{\max_{x \in C_1 \cup \{n\}}\{f_x(u, j)\}}\right)^{\beta} \cdot \hat{p}_n(u, j, \tau)$ {The Merit value is calculated using the cumulative estimated presence index and the request frequency counter.}
3:   for each $y \in C_1$ do
4:     $M_y(u, j) = \left(1 - \frac{f_y(u, j)}{\max_{x \in C_1 \cup \{n\}}\{f_x(u, j)\}}\right)^{\beta} \cdot \hat{p}_y(u, j, \tau)$
5:   end for
6:   Choose e s.t. $M_e(u, j) = \max_{x \in C_1 \cup \{n\}} M_x(u, j)$
7: else
8:   Calculate the Merit value of n: $M_n(u, j) = \left(1 - \frac{f_n(u, j)}{\max_{x \in C_2 \cup \{n\}}\{f_x(u, j)\}}\right)^{\beta} \cdot \hat{p}_n(u, j, \tau)$
9:   for each $y \in C_2$ do
10:    $M_y(u, j) = \left(1 - \frac{f_y(u, j)}{\max_{x \in C_2 \cup \{n\}}\{f_x(u, j)\}}\right)^{\beta} \cdot \hat{p}_y(u, j, \tau)$
11:  end for
12:  Choose e s.t. $M_e(u, j) = \max_{x \in C_2 \cup \{n\}} M_x(u, j)$
13: end if
14: If e is n, then do not cache n; else evict e to accommodate n.
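A compact Python rendering of Algorithm 1 may help. This is a sketch under stated assumptions: the `freq` and `p_cum` dictionaries stand for the node's bookkeeping of $f_x(u, j)$ and $\hat{p}_x(u, j, \tau)$, and the names are ours.

```python
def choose_eviction(n, C1, C2, freq, p_cum, beta=0.05):
    """Sketch of Algorithm 1: pick the item e to evict when item n arrives.

    C1, C2: lists of item ids in the popular / less popular partitions
    freq:   request-frequency counters f_x(u, j)
    p_cum:  cumulative estimated presence indices p̂_x(u, j, τ)
    Returns e; if e == n, the new item is simply not cached.
    """
    # Line 1: route n to C1 if it is more requested than C1's least popular item.
    part = C1 if freq[n] > min(freq[x] for x in C1) else C2
    candidates = part + [n]
    f_max = max(freq[x] for x in candidates)
    # Lines 2-12: Merit value (Eq. 3.6) of n and of every item in the partition.
    merits = {x: (1.0 - freq[x] / f_max) ** beta * p_cum[x] for x in candidates}
    # Lines 6/12 and 14: evict the candidate with the highest Merit value.
    return max(merits, key=merits.get)
```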

Note that the new item may have the highest Merit value and thus not be cached, or have one of the higher values. We performed several experiments and identified the best values of α and δ to be 0.3 and 0.2 respectively for low average query-solving time and network traffic.

Algorithm 1 summarizes the cache-eviction procedure used by a node u. In Line 1, n is identified as popular or not, based on its request frequency counter. If n is a popular item, then in Line 2 the Merit value of n is calculated, and in Lines 3-6 the popular item with the highest Merit value (e) is chosen from $C_1 \cup \{n\}$ for eviction to accommodate n. Note that n may itself be chosen if it has the highest Merit value, in which case it is not cached. If n is not a popular item, then in Lines 7-13 the same procedure is applied, but in cache C2. In Line 14, the chosen item e is evicted to accommodate n in the corresponding cache, or u decides not to cache n.

4. SIMULATION AND ANALYSES

A. Simulation Setup

We compared Split-Cache with two well-known cache-replacement techniques, HybridCache [7] and Hamlet [5]. For a fair comparison, we created a variant of HybridCache that broadcasts requests using mitigated flooding, instead of the original HybridCache, in which nodes unicast their requests to the data provider. We further assumed the data item size is small enough that no path-caching occurs, and that eviction is done based on the request frequencies of the data items.

We performed the simulations in ns-2 version 2.35 (installed on an Intel Core i7 machine with 8 GB RAM) and recreated the settings presented in [5]. We simulated 100 wireless nodes deployed uniformly at random over a square area of 100 × 100 m². We also studied a mobile scenario, where the nodes moved according to the random direction mobility model (close to human mobility in a mall-like setting) with edge reflections, with speeds uniformly distributed in the range [0 m/s, 1 m/s]. A node's communication range was 10 m, which ensured that the network started out connected. The transport protocol was UDP and the MAC layer protocol was the IEEE 802.11 standard in promiscuous mode. No routing algorithm was implemented; queries were broadcast through MAC-layer transmissions, and information messages were directed to the requesting node through a unicast path by reverse-path forwarding. The communication bandwidth was 11 Mb/s, and signal propagation followed the two-ray ground model [2]. The simulation time was 3000 seconds. Two fixed gateways were placed at the top-right and bottom-left corners of the field, and the information items were split equally between them.

At the start of the simulation, the nodes had empty caches; they then randomly requested information items not in their caches following a Poisson process with parameter λ_i = Λ · q_i, 1 ≤ i ≤ I, where Λ was the query generation rate and q_i was item i's popularity level, that is, its request probability under the Zipf distribution [1]. We used mitigated flooding for all schemes, with a TTL value of five hops for requests and a query-lag time of 50 ms. We set Λ = 0.1 and the Zipf exponent to 0.5, the value observed in the real world [1]. We recalculated the presence index f = 1 times per second (f is the estimation frequency) and set the maximum cache drop time M_C to 50 secs and 20 secs for the static and mobile scenarios respectively. The window size was τ = f · M_C estimation steps; thus the window size in the mobile scenario was only 20 time steps, which is reasonable: as the nodes move regularly and their neighborhoods change, the presence and request-frequency information becomes stale quickly. The number of data items I ranged from 100 to 500, and we assumed each item was complete in itself; the case of several data items making up a file or a movie is an easy extension. To make the simulations realistic and study cache-eviction effects, we set the cache size to 10 data items, that is, between 2% and 10% of the total number of data items. A query was 20 bytes, and an information packet had a 20-byte header and a 1024-byte payload. As in Hamlet, an issued request had a re-send timeout of 30 secs.
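As an illustration of this workload model (not the actual ns-2 scripts), the per-item Poisson processes with rates λ_i = Λ · q_i can equivalently be generated as one aggregate Poisson process of rate Λ whose requests pick item i with probability q_i; a minimal sketch:

```python
import random

I = 200          # number of data items (the paper varies I from 100 to 500)
LAM = 0.1        # query generation rate Λ
ZIPF_EXP = 0.5   # Zipf exponent

# q_i: request probability of item i under Zipf's law.
w = [i ** -ZIPF_EXP for i in range(1, I + 1)]
total = sum(w)
q = [x / total for x in w]

def next_request(rng=random):
    """Draw (item id, inter-arrival gap). Exponential gaps at aggregate rate Λ
    give a Poisson process; Zipf weights give per-item rates λ_i = Λ·q_i."""
    item = rng.choices(range(1, I + 1), weights=q)[0]
    return item, rng.expovariate(LAM)
```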

We compared the three frameworks (Split-Cache, Hamlet, and HybridCache) in terms of the following metrics: the average ratio of solved to generated requests, termed the avg. solved-queries ratio (S); the average time needed to solve a request, termed the avg. query-solving time (in secs) (T); and the average amount of query traffic transmitted per solved query, termed the avg. traffic per solved query (Tr). We use S_i to denote the number of solved queries for data item i, G_i the number of generated queries for item i, GP_i the amount of query packets (in bytes) sent to the physical layer by any node along the path(s) used to obtain item i, and T_i the average time needed to solve a query for i. The metrics, computed and averaged over all per-item results from all nodes, are

$$S = \frac{\sum_{i=1}^{I} S_i}{\sum_{i=1}^{I} G_i}, \qquad T = \frac{\sum_{i=1}^{I} T_i \times S_i}{\sum_{i=1}^{I} S_i}, \qquad Tr = \frac{\sum_{i=1}^{I} GP_i}{\sum_{i=1}^{I} S_i}.$$
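For concreteness, a sketch of the metric computation from per-item tallies (the names are illustrative):

```python
def compute_metrics(S, G, T, GP):
    """Evaluation metrics from per-item tallies.

    S[i]: solved queries for item i      G[i]: generated queries for item i
    T[i]: avg. time to solve a query for item i
    GP[i]: query bytes sent to the physical layer for item i
    """
    solved = sum(S)
    avg_solved_ratio = solved / sum(G)                          # S
    avg_query_time = sum(t * s for t, s in zip(T, S)) / solved  # T
    avg_traffic = sum(GP) / solved                              # Tr
    return avg_solved_ratio, avg_query_time, avg_traffic
```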

B. Simulation Results

Fig. 2 and Fig. 3 present the results for the static scenarios (nodes do not move) and the mobile scenarios respectively. Fig. 2(a) and Fig. 3(a) present the average solved-queries ratio (S), Fig. 2(b) and Fig. 3(b) the average query-solving time (T), and Fig. 2(c) and Fig. 3(c) the average traffic per solved query (Tr). Our results were averaged over 8 different static and mobile scenarios. The figures show the averaged results with error bars at the 95% confidence interval; some error bars are very short and invisible in the figures.

Although Split-Cache is only marginally better (by 3-5%) than Hamlet in the average solved-queries ratio (both are very good), it performs much better than Hamlet and HybridCache on the other two very important metrics. For instance, for I = 100, Split-Cache required 4.7 seconds less (or 72% less) time to solve a query, on average per item, than Hamlet. The reduction is largest for I = 100 because 100 items is not that large compared to the cache size of 10 items: in Split-Cache almost all popular items are cached on the nodes in the neighborhood, so requests are satisfied quickly by neighbors, whereas in Hamlet, as nodes also cache less popular items, queries have to travel farther in the network. The improvement is significant for the other values of I as well. For I = 200 the improvement in time is on average 26% per item, with the error bars showing that the improvement holds for all items. For I = 500 the improvement in time is on average 16.3 seconds per item, which is 20% less, a large time saving when viewed in the perspective of the number of items and the repeated queries (requests) for each of them (the total number of requests was greater than 10,000).

In the case of average traffic per solved query, Split-Cache required 9.6% less bandwidth per query than Hamlet when I = 100, and the savings increased to 13.7% per query when I = 500. Again, these numbers must be viewed in terms of the savings over the large number of queries for the items. Low latency and low bandwidth requirements are extremely important in a wireless setting, especially a mobile one. Even with node mobility our results are equally good. Split-Cache performs markedly better than the state of the art because it caches popular items longer and closer to the requesters.

In Fig. 4, we show the results for each item when I = 200 in the static scenario. Along the X-axis, items are ordered in decreasing order of popularity, with item 1 representing the most sought-after information and item 200 the least requested. Fig. 4(a), Fig. 4(b), and Fig. 4(c) present the per-item average solved-queries ratio, average query-solving time, and total network traffic.

Fig. 2. Static Scenario: 100 nodes, transmission range = 10 m, 100 × 100 m² region, number of items I ∈ {100, . . . , 500}, M_C = 50 s. Panels: (a) S, (b) T in seconds, (c) Tr in kilobytes.

Fig. 3. Mobile Scenario: 100 nodes, transmission range = 10 m, velocity ∼ [0 m/s, 1 m/s], 100 × 100 m² region, I ∈ {100, . . . , 500}, M_C = 20 s. Panels: (a) S, (b) T in seconds, (c) Tr in kilobytes.

Fig. 4. Static Scenario: Per-item results, I = 200. Panels: (a) Avg. solved-queries ratio, (b) Avg. query-solving time, (c) Total network traffic consisting of queries per item (in megabytes).

From Fig. 4(a) we observe that the solved-query ratio in Split-Cache is the highest among the three schemes for the 40 most popular items (20% of the items). For the remaining items, Split-Cache and Hamlet perform similarly, and better than HybridCache. Fig. 4(b) and Fig. 4(c) further show that the average access time and query traffic in Hamlet are much higher than in the other schemes for the most popular items, whereas for the remaining items Hamlet and Split-Cache provide similar results, again better than HybridCache. This shows that Split-Cache achieves the best of both worlds: popular items have much lower query times, and less popular items do not suffer either. Note that if we do not split the cache and only leverage the Merit index, the results are worse than Split-Cache's for the most popular items.

5. CONCLUSIONS

In this paper we introduced Split-Cache, a distributed cooperative cache replacement framework for WAHNs. In our framework, each node makes cache replacement decisions based on the presence of an item in its neighborhood and the data access frequency for the item, and by splitting its cache between popular and less popular data items. Simulation results show that, in comparison to the baseline method (HybridCache) and the state of the art (Hamlet), Split-Cache satisfies more data requests while requiring on average much less time to solve the queries and generating less network traffic.

ACKNOWLEDGMENTS

This material is based upon work supported by the National Science Foundation under Grant Nos. 1345232 and 1248109.

REFERENCES

[1] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker. Web caching and Zipf-like distributions: Evidence and implications. In IEEE INFOCOM, volume 1, pages 126-134, 1999.
[2] J. Broch, D. A. Maltz, D. B. Johnson, Y.-C. Hu, and J. Jetcheva. A performance comparison of multi-hop wireless ad hoc network routing protocols. In Proceedings of ACM/IEEE MobiCom, pages 85-97, 1998.
[3] X. Fan, J. Cao, H. Mao, and Y. Liu. Gossip-based cooperative caching for mobile applications in mobile wireless networks. Journal of Parallel and Distributed Computing, 2013.
[4] X. Fan, J. Cao, and W. Wu. Contention-aware data caching in wireless multi-hop ad hoc networks. Journal of Parallel and Distributed Computing, 71(4):603-614, 2011.
[5] M. Fiore, C. Casetti, and C. Chiasserini. Caching strategies based on information density estimation in wireless ad hoc networks. IEEE Transactions on Vehicular Technology, 60(5):2194-2208, 2011.
[6] B. Tang, H. Gupta, and S. R. Das. Benefit-based data caching in ad hoc networks. IEEE Transactions on Mobile Computing, 7(3):289-304, 2008.
[7] L. Yin and G. Cao. Supporting cooperative caching in ad hoc networks. IEEE Transactions on Mobile Computing, 5(1):77-89, 2006.
