Journal of Network and Computer Applications 50 (2015) 101–113
Cache capacity-aware content centric networking under flash crowds☆

Dabin Kim a, Sung-Won Lee a, Young-Bae Ko a,*, Jae-Hoon Kim b

a Graduate School of Computer Engineering, Ajou University, 206 World cup-ro, Suwon, Republic of Korea
b Network Biz. S/W Group, Samsung Electronics, Suwon, Republic of Korea
Article history: Received 1 February 2014; Received in revised form 26 May 2014; Accepted 30 June 2014; Available online 18 July 2014

Abstract
Content-centric networking (CCN) is a new networking paradigm intended to resolve the explosion of data traffic on the Internet caused by the rapid increase in file sharing and video streaming traffic. Networks with CCN avoid repeatedly delivering the same content over a link every time it is requested, as the content can be stored and transferred by CCN router caches. Two major features currently considered in CCN are in-network caching and content-aware routing. Even though both aspects are important, there is little comprehensive work on the interaction between them. In this paper, we propose a cache capacity-aware CCN that consists of selective caching and cache-aware routing methods that interact with each other to encompass cache management and cache-aware request forwarding. The main motivation of the proposed scheme is to utilize the network caches evenly and to redirect a content request based on the previous forwarding of the desired contents (or chunks). To enable this function, we utilize a cache capacity metric based on recent cache memory consumption and reflect it in content replication according to the popularity information at the content server. We evaluate the proposed scheme against existing cache replication algorithms and show that it leads to significant performance improvements and better utilization of network caches. © 2014 Elsevier Ltd. All rights reserved.
Keywords: Selective caching; Cache-aware routing; Content centric networking; Flash crowds
1. Introduction

In recent years, traditional IP-based network architecture has been severely burdened by the need to service massive traffic from end-to-end nodes, including high-definition multimedia streaming and user-generated content sharing (CVNI White paper, 2013). This traffic explosion has become one of the most important networking issues and can significantly impede user performance. Moreover, it is exacerbated when large numbers of people instantaneously access some temporarily popular content or web site. This phenomenon, called a flash crowd (or the Slashdot effect), is difficult to remedy quickly and results in web sites or servers becoming temporarily unavailable. To overcome these problems, content-centric networking (CCN) (Jacobson et al., 2009; Zhang et al., 2010), an emerging future Internet paradigm, has been designed to substitute content-centric communications for traditional host-centric communications. CCN proposes an architecture
☆ An earlier version of this paper was presented at IEEE GLOBECOM 2013 (Lee et al., 2013). It has been significantly extended with more realistic scenarios of flash crowds.
* Corresponding author. Tel.: +82 31 219 2432; fax: +82 31 219 1614.
E-mail address: [email protected] (Y.-B. Ko).
http://dx.doi.org/10.1016/j.jnca.2014.06.008
1084-8045/© 2014 Elsevier Ltd. All rights reserved.
that focuses on the content itself, regardless of where that content is physically located. The key features of the CCN architecture include in-network caching and content-based routing, where each piece of content has its own name as an identifier. The content-based routing mechanism delivers content request messages to the routers that hold the contents. The content request message, called an Interest or Interest packet, visits intermediate routers along the path to the designated server. The content message, often referred to as a Data or Data packet, is returned along the reverse path of the Interest packet. The content may be provided by either the content server or the cache storage of an intermediate router, and it may also be transparently replicated in an intermediate router. In-network caching tries to store portions of the content that have been requested frequently or used recently, based on the expectation that these content items will be requested again in the near future. Consequently, content requesters can pull the requested contents from adjacent cache storage instead of distant servers, thereby reducing the bandwidth usage, latency, and workload of the original content server. Existing cache replication proposals in CCN do not effectively consider actual cache capacity from a network-wide point of view. Most operate on only local information to determine content replication. Therefore, the caches near the content server are
required to handle a greater amount of incoming content than others, causing a "cache pollution" (Xie et al., 2012) problem in which more popular content is evicted in favor of newly cached content. To alleviate this problem, previous studies have proposed cooperative caching approaches that utilize cached content stored in off-path routers as well as on-path routers (Long et al., 2012; Guo et al., 2012; Wang et al., 2012). The main objective of this approach is to increase content utilization in local domain caches with prior knowledge of the caching list. In cache-aware routing, however, it is generally necessary to utilize a special control packet that shares information about the content cached in nearby routers. This can impose an additional burden on networks with respect to the exchange period and sharing range of the control packet, directly affecting the accuracy of the delivered information over time. To resolve these problems, we have designed a Cache Capacity-aware CCN (CC-CCN), an extension of our previous work (Lee et al., 2013) that aims to solve the flash crowd problem. CC-CCN is composed of selective caching and a cache-aware routing algorithm, reducing the network/server load while improving caching performance, even if unexpected bulk requests target a content server over a short period of time. The contributions of our study are threefold. First, the proposed selective caching strategy considers both the cache capacity of each router on the path and the content's popularity for efficiently disseminating the content over the network. Second, the selective caching operates based on an Interest packet monitoring function, so that it can effectively cope with the flash crowd phenomenon while also considering other routers' cache capacities. Finally, the proposed cache-aware routing scheme operates without any separate control plane protocol to exchange the cached content list among neighbors.
Instead, it piggybacks this information on Interest and Data packets. The rest of this paper is organized as follows. Section 2 introduces CCN and briefly reviews related work. Section 3 investigates the current problems of CCN, Section 4 describes the proposed CC-CCN in detail, and Section 5 evaluates it. Finally, Section 6 concludes the paper.
2. Background and related work

Several traditional web-caching approaches have been revised into chunk-based algorithms and applied to CCN. The simplest among them is the Leave Copy Everywhere (LCE) approach, which replicates content into every intermediate CCN router. Despite its simplicity, however, it generates excessive cache replacements and severe cache pollution. The Leave Copy Down (LCD) approach replicates the content (or chunk) only on the next CCN router down from the Data generator, which is either the content server or a cache (Laoutaris et al., 2004). Although LCD is regarded as a simple yet efficient mechanism, it still shows a high level of redundant cache replication on the routers near the content server. Psaras et al. (2012) proposed a probabilistic caching algorithm named ProbCache, in which received data is cached probabilistically in CCN routers to mitigate caching redundancy. The caching probability is computed by each router based on the total cache capacity of the path and the distance from the content provider, considering fair sharing with other flows. Even though ProbCache utilizes the available cache capacity, the cache capacity of each router is not handled individually, because the individual values are accumulated along the path toward the content server. Recently, Li et al. (2013) proposed a coordinated in-network caching algorithm that provisions routers' storage capability and network performance as an optimal solution. However,
it requires a coordinator node to collect the required information from every router. There is as yet no generalized protocol for content-aware routing. However, Jacobson et al. (2009) and Zhang et al. (2010) have suggested a routing methodology. A guided-diffusion flooding model was suggested for the pre-topology phase, and a route management scheme was introduced that periodically announces the name prefix(es) from (at a minimum) the content server. However, neither routing mechanism has been specified in detail. Thus, almost all research on cache management assumes a pre-constructed routing table or an IP-overlay architecture that provides at least the route to the content server. To provide enhanced content-aware routing, there are strategies for awareness of cached content not only on the path toward the content server (on-path) but also near it (off-path). The simplest way of doing this is to advertise a list of cached contents; however, this is not recommended due to its huge overhead and inefficiency. The scalable content routing for content-aware networking (SCAN) (Lee et al., 2011) aims to alleviate this scalability problem. SCAN exchanges information about the cached contents and their routes utilizing a Bloom filter, reducing the name space with a space-efficient hash-based representation. However, false positives and false negatives can occur, leading to route failures and detours. Consequently, if SCAN stands alone without an IP-overlay architecture, it incurs a large control overhead and unstable communication. Therefore, SCAN cannot be directly utilized as the normal cache-aware routing for CCN. A flash crowd, as briefly mentioned in Section 1, refers to the bulk occurrence of user requests for some interesting news or video on a web blog, Twitter, or Facebook within a short time.
This phenomenon induces serious network problems by exhausting the network bandwidth and processing capability of a content server, so that requesters cannot be provided with high quality service. Several studies have characterized and analyzed flash crowds in a Content Distribution Network (CDN) (Wendell and Freedman, 2011). To resolve the heavy network burden and resource problems caused by a flash crowd, they suggest cooperative caching strategies between cache proxies. However, the authors note that quickly growing flash crowd traffic is difficult to handle through cooperative caching due to relatively slow cache resource allocation. Unlike a CDN, moreover, CCN does not have full awareness of cached content information; therefore, a new cooperative caching strategy is necessary to handle flash crowds efficiently.
3. Problem statement

The main idea of this paper is to utilize in-network caches evenly so as to maximize the available cache capacity over the whole network. The goal is to cache efficiently across the network, alleviating the cache pollution effect that worsens in routers with a weighted caching burden. In general, routers near a content server and on the core network tend to handle more Interest and Data forwarding and thus are more likely to cache more content than others. Consequently, some content that should remain cached is evicted after a short lifetime, even if it is popular. In other words, heavily loaded routers evict content before it can be requested again, while lightly loaded routers take an unduly long time to distinguish popular content in their caches. To understand the weighted caching load and the problems it causes, we evaluated the average content residence time and average cache hit ratio via a simulation study. The simulated Internet-like
Fig. 1. Average content residence time per router. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)
topology was constructed with 100 routers based on a transit-stub model produced by the Georgia Tech Internet Topology Model (GT-ITM) (GT-ITM: Modeling Topology of Large Internetworks). In addition, the LCD caching strategy was applied, where the alpha value of the Zipf distribution varied from 0.8 to 1.5. Other simulation parameters were the same as those given in Table 2 (in Section 5). Figure 1 presents the average content residence time in the cache per router, estimated as the average time lag between content insertion and eviction. The purple dotted circle at node 15 is a content server. The red rectangles indicate routers in the transit domain, within a range of 1–3 hops from the content server; these show the least average content residence time due to the weighted caching burden and suffer from the lowest average cache hit ratio, as shown in Fig. 2. On the contrary, the routers inside the stub domain have a relatively longer content residence time and a higher cache hit ratio on average. The routers indicated by the green circles in Fig. 1, which connect the transit and stub domains at a 3 or 4 hop distance, show an overall medium level of content residence time. However, their performance is still limited in terms of the average cache hit ratio, as shown in Fig. 2. Therefore, we can observe that, even if all the routers have the same physical cache capacity, the actual cache utilization varies according to location. In these simulation results, the routers far away from the content server cache popular content with high probability. By utilizing those cached contents, cache-aware routing can enhance the caching effect. Furthermore, once our aim of alleviating the weighted caching load is achieved, a larger amount of popular content will be distributed across the network. Thus, a proper solution for cache-aware routing will significantly enhance performance. Existing studies lead us to expect that in-network caching can significantly reduce flash crowd effects. However, a fast reaction mechanism is needed because of flash crowd dynamics. In addition, this mechanism should be lightweight, because individually analyzing recent popularity trends for every content item would require impractically massive computing resources. In this paper, we propose a simple method for monitoring and reacting to flash crowds within the selective caching scheme, considering these design issues.

Fig. 2. Average cache hit ratio per hop distance from the content server.
4. Cache capacity-aware content centric networking

The proposed Cache capacity-aware CCN (CC-CCN) consists of three steps: (1) cache capacity estimation; (2) selective caching; and (3) cache-aware routing. First, the Cache Capacity Value (CCV), the key metric of the proposed scheme, is passively estimated in each router, and the highest value on the path is recorded in the forwarded Interest message. Cache capacity is generally defined as the maximal size of a caching pool. In other research, however, it can also mean the available cache capacity or supportable caching space, because cache size alone cannot explain the actual caching workload of a node. In other words, the available cache capacity can capture the variance of the caching load among routers with the same cache size. Therefore, the proposed scheme estimates the cache capacity by observing recent cache consumption. Using the CCV metric, the router with the highest cache capacity on the path caches the content, while nearby routers may cache the content according to its popularity and their cache capacity. We define this area as the selective caching range. Nodes within this range can thus cache the content according to cache capacity and content popularity; frequently forwarded and highly popular content has more opportunity to be cached in intermediate routers. In this selective caching scheme, if routers within the selective caching range do not satisfy the caching condition, they create a Temporal Forwarding Information Base (TFIB) entry that consists of the content name and the expected outgoing face. With TFIB entries, a novel cache-aware routing is performed
Table 1. Packet formats and description.

Interest packet format:
  Content name: Requested content name.
  CCV: The highest cache capacity value on the path.
  NDV: Network distance value; the hop distance from the node with the highest CCV to the content provider.
  rflag: Redirection flag, used for the cache-aware routing scheme.

Data packet format:
  Content name: Responding content name.
  CCVth: CCV threshold value calculated by the content provider.
  Wr: Weight value for calculating CCVth according to content popularity.
  oNDV: Original network distance value; hop distance between the content provider and the router with the highest CCV.
  dNDV: Decreasing network distance value; decreased by one at each hop from oNDV.
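As a sketch, the extended packet formats of Table 1 can be modeled as plain records; the field names follow the table, while the class names and types are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class Interest:
    """Extended Interest fields from Table 1."""
    content_name: str
    ccv: float = 0.0     # highest cache capacity value seen on the path
    ndv: int = 0         # hop distance from the node with the highest CCV
    rflag: bool = False  # redirection flag for cache-aware routing

@dataclass
class Data:
    """Extended Data fields from Table 1."""
    content_name: str
    ccv_th: float        # CCV threshold computed by the content provider
    w_r: float           # popularity weight used to compute ccv_th
    o_ndv: int           # fixed copy of the Interest's NDV
    d_ndv: int           # decremented by one at each hop from o_ndv

# A client with no cache initializes CCV to zero and NDV to zero.
request = Interest('/aaa/bbb/ccc')
```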
[Fig. 3 depicts the four name-indexed structures of the forwarding engine: the Content Store (CS), the Pending Interest Table (PIT) with its incoming face list, the proposed Temporal FIB (TFIB), and the Forwarding Info Base (FIB), each holding mappings from name prefixes such as /aaa/bbb/ccc to faces.]
Fig. 3. CC-CCN forwarding engine model.
without additional control overheads, providing high routing accuracy for cached contents by means of entry expiration timers.

4.1. Packet formats and data structures

In order to support the proposed CC-CCN, we need to extend the Interest and Data packet formats (as shown in Table 1) and the forwarding engine model (as shown in Fig. 3) for the selective caching and cache-aware routing schemes, respectively. The Interest packet carries three additional fields. The Cache Capacity Value (CCV) field records the highest cache capacity value between the requester and the content provider, and the Network Distance Value (NDV) is the hop distance from the router with the highest CCV during content retrieval. The redirection flag (rflag) is utilized by the cache-aware routing scheme to prevent route looping from Interest redirection. The Data packet includes four additional fields. The original NDV (oNDV) in the Data packet is a fixed value copied from the NDV of the Interest packet by the content provider. The decreasing NDV (dNDV) is decremented at each hop from the oNDV value. The CCVth is a fixed value containing the CCV threshold calculated by the content provider from the CCV of the received Interest packet and a weight value Wr. Figure 3 illustrates the TFIB data structure added to support the cache-aware routing scheme. The TFIB keeps, for each full content name, the expected outgoing face together with an entry lifetime. When a node receives an Interest packet, it first looks for an exact match on the content name in the TFIB; only then does it consult the original FIB, retrieving the entry with the longest matching prefix. If there is no matched entry, the node sends the Interest towards the content server based on the FIB.
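The lookup order just described can be sketched as follows; the in-memory data structures, names, and the TFIB entry lifetime are illustrative assumptions, not the paper's implementation:

```python
import time

class ForwardingEngine:
    """Sketch of the CC-CCN Interest lookup order: the Temporal FIB
    (exact content-name match, with entry lifetimes) is consulted
    before the regular FIB (longest-prefix match)."""

    def __init__(self, tfib_lifetime=2.0):
        self.tfib = {}   # full content name -> (face, expiry time)
        self.fib = {}    # name prefix -> face
        self.tfib_lifetime = tfib_lifetime

    def add_tfib_entry(self, name, face):
        self.tfib[name] = (face, time.monotonic() + self.tfib_lifetime)

    def lookup(self, name):
        # 1. Exact match in the TFIB, honoring the expiration timer.
        entry = self.tfib.get(name)
        if entry:
            face, expiry = entry
            if time.monotonic() < expiry:
                return face          # redirect toward a cached copy
            del self.tfib[name]      # stale entry: fall through to the FIB
        # 2. Longest-prefix match in the FIB ('/'-separated components).
        parts = name.split('/')
        for i in range(len(parts), 0, -1):
            prefix = '/'.join(parts[:i])
            if prefix in self.fib:
                return self.fib[prefix]
        return None                  # no route

fe = ForwardingEngine()
fe.fib['/aaa'] = 'face-0'                    # default route toward the server
fe.add_tfib_entry('/aaa/bbb/ccc', 'face-2')  # breadcrumb left by forwarded Data
```

Because the TFIB entry expires on its own timer, a stale redirection simply falls back to the server-bound FIB route, which is what lets the scheme avoid a separate control plane.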
4.2. Cache capacity estimation

The cache capacity is a metric representing cache utilization, and there are several ways to express it. Cache utilization can vary according to cache size, location, and management strategy. Moreover, even if every router has the same physical cache size, the actual cache utilization will differ according to other factors. The most straightforward measure is to use raw values, such as how much of the cache is empty or how many content items a router currently caches. Even though these raw values can be useful, they do not form a fair metric when comparing one router with another. For a fairer and more accurate assessment, the estimated cache capacity must be normalized. To do this, we evaluate the cache capacity as an inverse function of the amount of recent cache consumption. In addition, the physical cache size of each router is also reflected in the estimation, as the sizes could be heterogeneous. As a result, the CCV of router i is calculated as follows:

CCV_i = CS_i / L_i,   (1)

where CS_i represents the physical cache size and L_i is the caching load of router i, defined as the cumulative amount of cached content over a certain period. For instance, during the nth period, router i utilizes the L_i measured in the (n−1)th period, while accumulating the caching load for the (n+1)th period. Figure 4 shows an example scenario of how the CCV is calculated and how the highest CCV is inserted in the forwarded Interest message. In the figure, Node 1 is assumed to have a cache size of 500 and to have cached content amounting to size 200. Therefore,
the CCV of Node 1 is derived as 2.5. The other routers calculate their own CCVs in the same way. Assume that a client generates an Interest to request content from the content server. The Interest has a CCV field for recording the highest CCV on the path. At the client, which has no cache, this field is initialized to zero. When Node 1 receives the Interest from the client, it compares its own CCV (2.5 in this example) to the CCV of the Interest (zero) and updates it, as its own CCV is higher. Similarly, on receiving the Interest from Node 1, Node 2 also updates the Interest CCV (2.5) with its own higher value (4). Note that in Fig. 4, this highest CCV value is retained until the forwarded Interest eventually reaches the targeted content server, because the remaining nodes on the path all have smaller CCV values. At the content server, the router having the highest CCV can be identified using the Interest NDV field. The value of NDV is reset to one by any node that updates the CCV field; otherwise, it is incremented at each hop. As a result, the content holder (either the content server or an intermediate router) knows the highest CCV on the path and the hop distance to that router. That is, in Fig. 4, the content server is aware that the router with the highest CCV (4) is four hops away.

4.3. Flash crowd detection and selective caching algorithm

As mentioned before, our selective caching algorithm operates differently according to the situation: normal or flash crowd. In this paper, we assume that Internet traffic follows some pattern, such as a Zipf distribution, in a normal situation. In a flash crowd situation, however, unexpected content requests arise instantaneously, regardless of the normal traffic pattern. Therefore, to alter the selective caching operation for both situations, we first propose a flash crowd detection method on the content server side and then explain the selective caching algorithm in detail for each scenario.

4.3.1. How to detect flash crowds?
To detect a flash crowd phenomenon and provide an appropriate response, a content server monitors the received Interest traffic pattern and the inter-Interest arrival time. Algorithm 1 below lists the steps of the flash crowd detection process. Variable T_i records the last access time of content i, and IT_i represents the inter-Interest arrival time, i.e., the difference between the (n−1)th and (n−2)th access times of content i. Based on the estimated value of each IT_i, the content server maintains S_N, the summation of IT_i over all contents. Note that we compare IT_i against the average IT_i (i.e., S_N/N, where N is the total number of content items) to determine whether content i is the target of a flash crowd. The rationale of this process lies in the fact that the highly popular content of normal Internet traffic (which follows a Zipf distribution) tends to have a longer IT_i than the average, because intermediate caching nodes handle most of the highly popular Interests instead of the content server. On the contrary, previously rarely requested content initially shows a pattern analogous to that of unpopular content. However, as more clients request the target content, its Interest traffic pattern becomes increasingly similar to that of popular content. In other words, at the initial stage of a flash crowd, only a few intermediate nodes cache the target content; most requests are concentrated on the content server, and consequently the inter-Interest arrival time of the target content is relatively short. As the situation progresses, the inter-Interest arrival time gradually increases, because more intermediate nodes cache the content and serve it in place of the content server. Therefore, the content server can detect the target of a flash crowd when nominally unpopular content (i.e., of low Zipf rank) shows a long inter-Interest arrival time similar to that of popular content (i.e., of high Zipf rank).

Algorithm 1. Flash crowd detection.
  Initialization: T_i ← 0 and IT_i ← 0 for all contents i; S_N ← 0
  while (an Interest for content i is received) do
    if (T_i ≠ 0) then
      S_N ← S_N − IT_i
      IT_i ← current time − T_i
      S_N ← S_N + IT_i
    endif
    T_i ← current time
    FC_th ← (S_N / N) × w
    if (content i is unpopular and IT_i > FC_th) then
      mark content i as a flash crowd target
    endif
  endwhile
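As a minimal sketch of this server-side process (the class name, the externally supplied popularity test, and the default weight w are illustrative assumptions, not the paper's implementation):

```python
from collections import defaultdict

class FlashCrowdDetector:
    """Maintains the last access time T_i, the inter-Interest arrival
    time IT_i, and their sum S_N, flagging nominally unpopular content
    whose IT_i exceeds FC_th = (S_N / N) * w."""

    def __init__(self, num_contents, w=1.0):
        self.N = num_contents            # total number of content items
        self.w = w                       # tunable detection weight
        self.T = {}                      # T_i: last access time
        self.IT = defaultdict(float)     # IT_i: inter-Interest arrival time
        self.S_N = 0.0                   # sum of IT_i over all contents

    def on_interest(self, content, unpopular, now):
        """Update IT_i and S_N for one received Interest; return True
        if 'content' looks like a flash crowd target."""
        if content in self.T:
            self.S_N -= self.IT[content]
            self.IT[content] = now - self.T[content]
            self.S_N += self.IT[content]
        self.T[content] = now
        fc_th = (self.S_N / self.N) * self.w
        # A long inter-arrival time on unpopular content mimics popular
        # (heavily cached) content: the flash crowd signature.
        return unpopular and self.IT[content] > fc_th

det = FlashCrowdDetector(num_contents=2, w=1.0)
det.on_interest('popular', unpopular=False, now=0.0)
det.on_interest('popular', unpopular=False, now=1.0)      # IT = 1.0
det.on_interest('niche', unpopular=True, now=0.5)
flagged = det.on_interest('niche', unpopular=True, now=2.5)  # IT = 2.0 > 1.5
```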
To detect flash crowd behavior, it is necessary to set the flash crowd detection threshold FC_th as a criterion, obtained by multiplying the average IT_i over all contents by the weight value w.
[Fig. 4 illustrates an Interest traveling from the client through Nodes 1–5 to the content server. The CCV field of the Interest is updated from 0 to 2.5 at Node 1 and to 4 at Node 2, then retained; the NDV field is reset to 1 at each CCV update and incremented at every subsequent hop, reaching 4 at the content server. The node parameters are:

  Node:          1      2      3      4      5
  Cache size:    500    1000   1500   1500   3000
  Caching load:  200    250    600    500    1500
  CCV:           2.5    4      2.5    3      2  ]

Fig. 4. Example of cache capacity estimation.
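The behavior in Fig. 4 can be replayed with a short sketch combining Eq. (1) with the per-hop Interest field updates; the function names and the dict representation of an Interest are illustrative assumptions:

```python
def ccv(cache_size, caching_load):
    """Eq. (1): CCV_i = CS_i / L_i, with L_i the caching load
    accumulated over the previous measurement period."""
    return cache_size / caching_load

def forward_interest(interest, router_ccv):
    """Per-hop Interest update: a router whose CCV beats the recorded
    value overwrites it and resets NDV to 1; otherwise it only
    increments NDV, so NDV counts hops from the highest-CCV node."""
    if router_ccv > interest['ccv']:
        interest['ccv'] = router_ccv
        interest['ndv'] = 1
    else:
        interest['ndv'] += 1

# Node parameters from Fig. 4: (cache size, caching load) per node.
nodes = [(500, 200), (1000, 250), (1500, 600), (1500, 500), (3000, 1500)]
interest = {'ccv': 0.0, 'ndv': 0}   # initialized by the cache-less client
for cs, load in nodes:
    forward_interest(interest, ccv(cs, load))
# The server learns the path's highest CCV (4.0, at Node 2) and that
# the corresponding router is four hops away.
```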
The value w is tunable and adjusts the degree of flash crowd detection. As w becomes smaller, more content items are considered flash crowd content; conversely, a greater value of w means that fewer content items are deemed flash crowd content. Following this method, a content server recognizes a flash crowd situation if it detects content of low Zipf rank showing an inter-Interest arrival time longer than FC_th; otherwise, it regards the situation as normal. Based on this server-side detection, the content server can operate the selective caching algorithm according to each situation. Note that in order to detect the flash crowd situation, we utilize a centralized method operated on the content server instead of on intermediate nodes. The main reason is that to recognize a flash crowd situation, the detecting node must have a global view of content popularity in advance in order to observe changes in the Interest traffic pattern. Acquiring this information takes considerable time and computing overhead, as a huge amount of traffic must be analyzed. It is therefore impractical to make the intermediate nodes calculate content popularity, because they have limited memory and computing resources as well as only a local view of content popularity. For these reasons, it is rational to assume that the content server can estimate the global information; we therefore implemented the detection algorithm on the content server side and achieved an effective caching operation.
4.3.2. How does the selective caching work for normal situations?
After receiving an Interest packet, the content provider sends the corresponding Data to the requester along the breadcrumb (reverse) path, and routers store the received content according to their local caching strategy. The proposed selective caching focuses on a cache replication strategy, more specifically on how and where to cache the content while forwarding Data. The proposed scheme has two decision processes for selecting the caching router(s). The first determines the router that must cache the content as the one with the highest cache capacity on the reverse path. This information is already available from the received Interest as the highest CCV (CCV_highest) and the relevant Network Distance Value (NDV). This decision allows for only one replication on the path. If the content is not popular, this decision is sufficient and enhances the diversity of cached content. However, if the content is rather popular, only one cache placement would not be enough. A second decision is hence needed to identify a threshold for the multiple replication of more popular content, depending on
the cache capacity of intermediate nodes. Even though each CCN router has no way of recognizing content popularity, the content server can estimate the popularity (defined as a rank in the Zipf distribution) by exploiting past Interest processing data. The request count decreases exponentially as the popularity rank r of the content declines. Therefore, the order of content popularity is normalized by a logarithmic function, and the normalized value is treated as a weight parameter to estimate the threshold. The logarithmic function for normalization is as follows:

W_r = log r / log N_total,   (0 ≤ W_r ≤ 1)   (2)
where content of popularity rank r is normalized by the total number of content items N_total in the content server; consequently, content with higher popularity gets a lower weight value and is replicated with higher probability. When a content server sends a Data packet back to the requester, it calculates the CCV threshold (CCV_th) value and includes it in the Data packet, together with the oNDV, dNDV, and W_r. The threshold CCV_th is calculated by

CCV_th = CCV_highest × W_r   (3)
Note that even though the content server provides CCV_th in the Data packet, it includes W_r as well. This is because CCV_highest is estimated differently according to the time and path of the forwarded Interest, whereas W_r tends to be constant. In other words, each CCN router cannot recognize content popularity or cope with a newly discovered CCV_highest on its own. Therefore, when content is cached, the value of W_r provided by the content server is stored along with the content. When a cache hit occurs at an intermediate node, the Data will include the CCV_highest retrieved from the received Interest and the W_r from the cache. Upon receiving Data, an intermediate router compares the dNDV and its own CCV to the oNDV and CCV_th, respectively, to decide whether to cache the content locally. As shown in Fig. 2, the load near the network core tends to increase. In order to distribute such an unbalanced network load, the area for selective caching is restricted to half the distance from the node with the highest CCV to the content holder. This is represented as a dashed rectangle in Fig. 5. To create the selective caching area, whenever a router forwards Data, the dNDV value initially acquired from the relevant Interest by the content provider is decreased by one. The selective caching is performed only if the dNDV is in the range [0, oNDV/2). If the
[Fig. 5 illustrates Data returning from the content server through Nodes 5 to 1 (with CCVs 0.2, 0.4, 0.7, 0.8, and 0.3, respectively) to Client 1. The oNDV is fixed at 4, while the dNDV decreases by one at each hop (3, 2, 1, 0, −1). The dashed selective caching range covers the nodes with dNDV in [0, oNDV/2).]

Fig. 5. Example of selective caching.

Fig. 6. Example of cache-aware routing.
content is popular, then all routers in the selective caching range may cache it; otherwise, only Node 2 (with zero dNDV) will cache it. For instance, assume there are a total of 1000 content items on the content server. Node 3 can cache content ranked higher (more popular) than 75th, because its W_r must be below 0.625. For content ranked 100th, Node 3 cannot cache the content due to its CCVth of about 2.7. Through selective caching in limited areas, routers distant from the content server will cache more content when the routers near the server have a high caching load. In turn, when routers near the edge suffer higher caching loads, routers near the server will try to cache more content. This oscillation allows the proposed scheme to utilize in-network caches evenly.
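The excerpt does not spell out the closed form of W_r, but a normalized log-rank weight, W_r = log r / log N, reproduces the worked example above (rank 75 of 1000 gives W_r ≈ 0.625). The sketch below combines that assumed weight with the per-router caching decision; the function names, the router CCV of 0.5, and the form of W_r are all illustrative assumptions, not the authors' implementation:

```python
import math

def popularity_weight(rank, total):
    # Assumed form of W_r: normalized log-rank (its closed form is not
    # given in this excerpt); chosen because it reproduces the worked
    # example above, where rank 75 of 1000 yields W_r ~= 0.625.
    return math.log(rank) / math.log(total)

def should_cache(ccv, dndv, ondv, ccv_highest, w_r):
    """Hypothetical sketch of the per-router selective caching decision."""
    # Selective caching area: only routers whose dNDV lies in [0, oNDV/2).
    if not (0 <= dndv < ondv / 2):
        return False
    # Popularity-scaled threshold from Eq. (3): CCV_th = CCV_highest * W_r;
    # cache only if this router's capacity value reaches the threshold.
    return ccv >= ccv_highest * w_r

# With CCV_highest = 0.8 and a hypothetical router CCV of 0.5 at dNDV = 1,
# content is cacheable only while W_r <= 0.5 / 0.8 = 0.625, i.e. up to
# roughly rank 75 of 1000 under the assumed weight.
print(round(popularity_weight(75, 1000), 3))                       # 0.625
print(should_cache(0.5, 1, 4, 0.8, popularity_weight(100, 1000)))  # False
```

Because more popular content carries a lower W_r, the threshold drops and more routers inside the selective caching area qualify, which is the replication bias the scheme relies on.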
4.3.3. How does the selective caching work for flash crowd situations?

Our CC-CCN is designed to spread the caching burden of routers by considering content popularity and the cache capacity of the routers on the path. The advantages of the proposed method are an increase in cache diversity over the network and a reduction of cache pollution near the core routers. This is clearly desirable for handling traffic that follows a typical pattern such as the Zipf distribution. In the case of a flash crowd, however, large numbers of user requests arise simultaneously, regardless of previous content popularity. In this case, the proposed scheme finds it difficult to respond to the flash crowd quickly,
due to a selective caching method that considers content popularity rather than the current need for the content to be spread as quickly as possible. To cope with this restriction, the selective caching method presented in the previous section needs to be extended to handle the flash crowd phenomenon properly. That is, unlike content under normal traffic conditions, content under a flash crowd should be disseminated more quickly over the network to reduce the response time as well as the network burden, while still considering cache capacities. If a content server detects a flash crowd situation, it responds with a corresponding Data packet whose W_r value is set to −1. Originally, W_r was a weight value for calculating CCVth, tied to the content popularity index. Flash crowd content, however, does not follow a normal traffic pattern; it temporarily moves up sharply in the content popularity ranking. Therefore, by setting W_r to −1 (in effect, the highest possible rank), the content server marks the content as the highest caching priority. In this context, intermediate nodes receiving Data with a value of −1 for W_r and CCVth consider only the oNDV and dNDV values when deciding whether to cache it. Nodes in the selective caching area always store received Data carrying a W_r of −1 in their local caches, and this information is naturally propagated over the network through cache hits. For instance, in Fig. 5, assume that the content server sends a Data packet whose content has been detected as a flash crowd target, with W_r and CCVth set to −1. In this case, Nodes 3 and 4 can cache the target
4.4. Cache-aware routing
content because they are in the selective caching area and each of their CCVs is necessarily larger than −1. As a result, flash crowd content is cached as a deterministic cache replication that does not violate the cache replication principle of the proposed scheme. Moreover, this replication process remains in effect only while a flash crowd is detected. Algorithm 2 presents the overall process of the selective caching scheme in our CC-CCN.
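The flash crowd case folds into the same decision rule described above; the sketch below is illustrative, with the sentinel handling inferred from the text rather than taken from the authors' code:

```python
def should_cache_flash(ccv, dndv, ondv, ccv_highest, w_r):
    # Same area check as normal selective caching: dNDV in [0, oNDV/2).
    if not (0 <= dndv < ondv / 2):
        return False
    # A sentinel W_r of -1 (flash crowd content) makes the threshold
    # negative, so every router inside the area caches the chunk,
    # since any CCV is larger than -1.
    return ccv >= ccv_highest * w_r
```

For example, even a nearly exhausted router (CCV 0.1) inside the selective caching area stores flash crowd content, while ordinary unpopular content (say, W_r = 0.9) would be skipped by that same router.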
Most of the existing cache-aware routing methods take one of two strategies. The first is to share a list of the cached content periodically, and the second is to flood the Interest until the desired content is found. Both approaches may result in a considerable amount of network load.
Algorithm 2. Selective caching.
Table 2
Simulation parameters for normal situations.

  Interest arrival rate : 10 Hz (10 Interests/s)
  Content size          : 1 MByte
  Number of contents    : 10,000
  Zipf distribution     : alpha = {0.8, 1.0} (large popular content portion (Guo et al., 2008))
  Average cache size    : 1%
  Cache policies        : {LCE, LCD, MCD, Prob(0.3), ProbCache, CC-CCN} ∪ LRU
Table 3
Simulation parameters for flash crowd situations.

  Start time (s)    : 1000
  Duration (s)      : 1000
  Peak rate (Hz)    : 40, 50, 60, 70, 80, 90, 100
  Content range     : Lower {10%, 15%, 20%, 25%, 30%} of a Zipf distribution
  Zipf distribution : alpha = 1.0
  Cache policies    : {LCE, LCD, Prob(0.3), ProbCache, CC-CCN} ∪ LRU
In the proposed cache-aware routing, by contrast, a router within the selective caching range of [0, oNDV/2) can create a TFIB entry with the exact content name and an expected outgoing face toward the router with the highest CCV if it forwards Data under selective caching without caching the content itself. This is possible because at least one router in the forwarding direction will cache the content. If the router later receives an Interest for the same content, the Interest is re-routed according to the TFIB entry. For example, Node 3 in Fig. 6 forwarded the content "/Content_server/a.txt" to Node 2 without caching it, creating a TFIB entry instead. After that, Client 2 sent an Interest packet with the content name "/Content_server/a.txt". Node 3 will redirect the received Interest through Face 2 if the TFIB timer has not expired.

Note that the proposed cache-aware routing approach must manage the expiration time of TFIB entries efficiently to prevent false positive forwarding decisions. As our aim is to distribute the caching load fairly and utilize the cached content, the average residence time of content in a router is expected to be similar across all routers. Therefore, the expiration time is set from locally estimated average content residence times, which point to the cached content quite accurately. However, route looping can occur when the average content residence time differs from the actual caching time. For example, in Fig. 6, assume that the content "/Content_server/a.txt" stored at Node 2 was evicted by the local cache replacement strategy (e.g., LRU or LFU) before the TFIB timer at Node 3 expired. Node 3 will still re-route an Interest for "/Content_server/a.txt" through Face 2. In this case, Node 2 causes a route loop because it forwards the Interest back to Node 3 through Face 4 according to the normal FIB mechanism. Consequently, Node 3 aggregates the Interest into its Pending Interest Table (PIT) and discards it. Moreover, PIT aggregation due to invalid Interest redirections seriously exacerbates the problem. In CCN, every Interest receiver registers the received Interest in its PIT before forwarding it and discards identical Interests arriving from other neighbors. Therefore, a PIT entry created by an invalid TFIB redirection can aggregate another Interest that was being forwarded to the content server (not redirected by a TFIB). This increases packet loss and delay over the network by impeding normal Interest forwarding until the local PIT entry timer expires.

To solve this problem, a redirected Interest sets an rflag field to TRUE, and a router receiving a redirected Interest judges that a route loop is possible when it has neither a TFIB entry nor the requested content. In this case, the router sets the rflag of the Interest to ERROR and forwards it according to the normal FIB mechanism. The other routers receiving this Interest eliminate the corresponding TFIB entry.

5. Performance evaluation

5.1. Simulation environments
We developed our CCN simulator based on OPNET (OPNET Technologies). To create various scenarios, we also developed a CCN network scenario generator with several functionalities, such as a topology generator, hierarchical/flat naming, and homogeneous/heterogeneous cache sizing. These functions are exported to the OPNET simulator in XML format. The CCN simulator provides the various cache management schemes served in CCNx (CCNx Homepage) or ccnSim (ccnSim Homepage). The traffic generator provides three traffic model distributions: Zipf, MZipf, and a flash crowd model (Zhang et al., 2011). To consider the layering of the forwarding engine, there are two types of node models: IP-overlay and MAC-overlay routers. For this adaptation, interface modules serve as a link adaptor, as in CCNx. We implemented a simple routing protocol to discover the shortest path, and Interests are forwarded directly toward the content server. With the CCN simulator, we used an Internet-like topology generated by GT-ITM that includes 12 stub domains and one transit domain (100 CCN routers in total). It was converted into a CCN network scenario by the CCN network scenario generator. To evaluate CC-CCN, we conducted simulations for two cases: a normal situation, using the configuration in Table 2, and a flash crowd situation, with parameters as shown in Table 3. A flash crowd traffic pattern consists of four user-controlled parameters in our simulator: (1) start time; (2) duration; (3) target content range; and (4) peak rate. At the designated flash crowd start time, the traffic generator additionally schedules Interests for the target content of the flash crowd until the duration is over, while continuing normal Interest scheduling. We used 1000 s for the flash crowd duration. The user can determine the target range of flash crowd content based on the lower ranks of the Zipf distribution. We evaluated CC-CCN performance by varying the target content range from 10% to 30%. The Interest arrival rate for a flash crowd gradually increases until it reaches its peak rate and then decreases for the remainder of the duration. The traffic generator scheduled 10 Interests per second for the normal situation in addition to the Interests for the flash crowd, varying the peak rate from 40 Hz to 100 Hz. We used a Zipf popularity distribution with an alpha value of 1.0 for the normal situation. The other parameters were as listed in Table 2. The proposed scheme was compared to LCE, LCD, Prob (0.3), and ProbCache.
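The flash crowd arrival pattern described above (extra Interests ramping up to a peak rate and back down over the 1000 s duration, on top of the 10 Hz background) can be sketched as follows; the triangular ramp shape is an illustrative assumption, since the generator's exact profile is not reproduced here:

```python
def interest_rate(t, start=1000.0, duration=1000.0,
                  base_rate=10.0, peak_rate=100.0):
    """Total Interest arrival rate (Interests/s) at simulation time t.

    Background traffic runs at base_rate; during the flash crowd the
    extra rate ramps linearly up to peak_rate at mid-duration, then
    back down (triangular ramp, an illustrative assumption).
    """
    if not (start <= t < start + duration):
        return base_rate
    half = duration / 2.0
    elapsed = t - start
    if elapsed <= half:
        extra = peak_rate * (elapsed / half)               # ramp up
    else:
        extra = peak_rate * ((duration - elapsed) / half)  # ramp down
    return base_rate + extra

# Peak occurs at the middle of the flash crowd window.
print(interest_rate(1500.0))   # 110.0 (10 Hz background + 100 Hz peak)
```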
5.2. Simulation results

5.2.1. Results for normal situations

The goal of the proposed scheme is to make the cache capacity of each network cache as even as possible. The caching capacity distribution per router is compared in terms of the average content residence time in Fig. 7. In the proposed scheme, the average content residence time per router was fairly level. In contrast, LCD suffers from an extremely low average content residence time near the content server, as highlighted in the zoomed spot, while the proposed scheme shows an even result. Popular content in LCD is therefore not well replicated across the network due to severe cache pollution near the content server. In contrast, the result of the proposed scheme converged to a low level comparable to that of the stub routers under LCD. To verify this result, we evaluated the content reusability, the ratio of hits on cached content to the total number of cached content items, as shown in Fig. 8. The proposed scheme maintains much higher content reusability than LCD because our selective caching creates a high diversity of cached content and the cache-aware routing exploits it efficiently. In other words, the cache capacity of stub routers is not fully exploited in LCD, while the homogeneous average content residence time of CC-CCN is enough to enhance overall caching performance. Figure 9 presents the average cache hit ratio for each scheme according to the traffic model. The average cache hit ratio is the ratio of total cache hits to the number of generated Interests. All results are averaged over 10 simulation runs and shown with 99% confidence interval error bars. In all cases, the proposed scheme shows the highest average cache hit ratio, peaking when the alpha of the Zipf distribution is 0.8. This means that the proposed scheme efficiently deals with greater amounts of popular content, whereas the other schemes suffer significant caching performance degradation.
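The two metrics used above have simple definitions; a minimal sketch with hypothetical counter values:

```python
def average_cache_hit_ratio(total_cache_hits, total_interests):
    # Ratio of total cache hits to the number of generated Interests.
    return total_cache_hits / total_interests

def content_reusability(hits_on_cached, total_cached_items):
    # Ratio of hits on cached content to the number of cached items.
    return hits_on_cached / total_cached_items

# e.g. 4200 hits out of 10,000 Interests -> 42% average cache hit ratio
print(average_cache_hit_ratio(4200, 10000))  # 0.42
```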
When the alpha value of the Zipf distribution is 0.8, the proposed scheme shows about a 164% performance enhancement relative to LCD, which has the second highest performance. When alpha is 1.0, the performance of the proposed scheme is about 126% higher than that of LCD. ProbCache was expected to show greater performance than LCD because it is believed to perform better on tree topologies. In this paper, we set the CCV estimation duration to 10 s, as suggested by Cho et al. (2012). The server load was measured as the amount of Data generated per second by the content server. In Fig. 10, the proposed scheme decreases the server load by about 32–34.6% compared to LCD and about 41.7–46.8% compared to LCE. As mentioned previously, the proposed scheme distributes popular content across more caches while unpopular content is stored in a single cache. As a result, Interests have a higher chance of hitting the desired content in CC-CCN than in the other schemes. The network load was estimated as the average amount of Interest and Data forwarded through network faces per second, as shown in Fig. 11.
Fig. 7. Average content residence time comparison.
Fig. 8. Content reusability comparison.
Fig. 9. Average cache hit ratio comparison.
Fig. 10. Server load comparison.
Fig. 11. Network load comparison.
In contrast to the previous results for average cache hit ratio and server load, the proposed scheme showed less improvement in network load. The proposed method diminished the network load by about 8.1–8.5% compared to LCD, and about 18.2–22.6% compared to LCE. The reason for this difference lies in the cache-aware routing. The proposed cache-aware routing tries to redirect an Interest if the router has information about another router that caches the desired content, reducing the hop distance that the Data or Interest must traverse. This effectively increases the average cache hit ratio while decreasing the server load; the network load, however, improves only slightly because the detours introduced by redirection add a minor amount of traffic. Nevertheless, considering that the highest network load appears near a content server, the detours of the proposal are intuitively expected to lessen this peak. To evaluate this, we measured the link stress, the average value of the network load over the top 10% most loaded links. As shown in Fig. 12, the proposed scheme's link stress is alleviated by about 30–31% and about 46–65% compared to LCD and LCE, respectively. No route loops were detected, proving that our loop prevention method worked properly and did not severely affect the link stress.

5.2.2. Results for flash crowd situations

To investigate the performance of CC-CCN in a flash crowd situation, we estimated several performance metrics, focusing on the flash crowd duration within the overall simulation time. Figure 13 shows the estimated network load during the flash crowd for various flash crowd peak rates. The proposed scheme maintained a lower network load than the other caching strategies because CC-CCN quickly disseminated the popular content over the network routers. This is because the target content of a flash crowd is considered highly popular even though it originally
Fig. 12. Link stress comparison.
had a much lower popularity rank in normal situations. In addition, the main reason that the proposed scheme shows almost steady performance despite changes in the flash crowd peak rate lies in the selective caching operation for flash crowd situations. That is, after the content server detects the target of a flash crowd, intermediate nodes cache the content more aggressively based on their cache capacity, so the flash crowd content is quickly disseminated over the network. Therefore, when a content requester fetches content identified as a flash crowd target, the node retrieves it either from the local cache or from adjacent cache storage. This drastically reduces the hop count for content retrieval, so the network load changes very little. We can see these aspects of the network load in Fig. 14, which verifies the fast content dissemination of CC-CCN by showing the network load against elapsed simulation time when the peak rate is set to 100 Hz. We also investigated the behavior of the network load under the variation of peak rate examined in Fig. 13. Through this experiment, we discovered that, except for CC-CCN, traffic overload in the network increased as the peak rate increased in the flash crowd situation. CC-CCN, in contrast, shows a similar performance trend regardless of the increase in peak rate. As we see in the graph, the network load of CC-CCN increases slightly until around 1500 s and almost stabilizes after 1800 s. This behavior is very similar at the other peak rates. Based on this result, we conclude that CC-CCN is effective in resolving the flash crowd problem due to its fast caching resource provisioning. To show the traffic load mitigation of CC-CCN, we additionally measured the link stress during a flash crowd and plotted the top 20 links in descending order, as shown in Fig. 15.
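The link stress figures can be reproduced from per-link load samples; a sketch of the metric (mean load over the most loaded 10% of links), with made-up sample values:

```python
def link_stress(per_link_load, top_fraction=0.10):
    """Mean network load over the top `top_fraction` most loaded links."""
    ranked = sorted(per_link_load, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:k]) / k

# Synthetic per-link loads in MB/s (illustrative values only).
loads = [5.0, 1.0, 0.5, 4.0, 0.2, 0.1, 3.0, 0.4, 0.3, 0.6]
print(link_stress(loads))  # top 10% of 10 links -> the single busiest: 5.0
```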
Ranks 1–4 mainly consist of the content server and its neighbors. Three transit nodes occupy Ranks 5–7 for all cache strategies. LCD shows lower link stress on the core network than CC-CCN because LCD caches content gradually from core to edge routers, regardless of content popularity. However, this can cause high link stress on the other nodes because intermediate routers, including edge routers, must initially exchange Interest and Data packets with core routers. This causes longer traveling hop counts as well as high link stress per router, as shown in the ranks below 8 (i.e., Ranks 8–20). In contrast to LCD, CC-CCN has a low probability of caching low-popularity content, which can cause slightly higher link stress for the core routers. However, over time, the content server detects the flash crowd phenomenon and raises the target content popularity to the highest rank. Moreover, as CC-CCN selects caching routers based on the cache capacity value as well as content popularity, it disseminates highly popular content and flash crowd targets very efficiently. Therefore,
CC-CCN has lower packet traveling hop counts and shows lower link stress for the ranks below 8 (i.e., Ranks 8–20). It also achieves a lower round trip time, as shown in Fig. 16.

Fig. 13. Network load in a flash crowd duration.
Fig. 14. Network load appearance in a flash crowd duration.
Fig. 15. Link stress in a flash crowd duration.
Fig. 16. Round trip time in a flash crowd duration.

To measure the performance of CC-CCN under variation of the flash crowd content range, we fixed the flash crowd peak rate at 100 Hz. As shown in Figs. 17 and 18, we increased the target content range of the flash crowd from the lower 10% to the lower 30% of the Zipf distribution. Figure 17 shows the benefit of CC-CCN: the network load is considerably reduced even though the total amount of highly popular content increases. As the number of highly reused content items increases, the network suffers from the cache thrashing problem, in which heavily used content is evicted by other popular content due to insufficient cache size. For this reason, Interest and Data packets frequently traverse the network to fetch the content again, increasing the network load. CC-CCN, in contrast, performs its caching operation based on both the cache capacity of each node and content popularity. Nodes therefore mitigate the cache thrashing problem by caching flash crowd content with the highest priority and then caching content with a high Zipf rank. The other caching strategies consider neither content popularity nor cache capacity, resulting in a decreased cache hit ratio and an increased network load. In accordance with this effect, we confirmed that our proposed CC-CCN also shows a markedly reduced round trip time compared to the other caching strategies.

Fig. 17. Network load varying a flash crowd content range.
Fig. 18. Round trip time varying a flash crowd content range.

6. Discussion and conclusion
Content-centric networking utilizes in-network caching and content-aware routing to provide better services on the Internet. In this paper, we presented a cache capacity-aware CCN called CC-CCN that consists of selective caching and cache-aware routing schemes. Selective caching utilizes individual cache capacity and content popularity to distribute the caching load fairly. Cache-aware routing provides accurate Interest redirection without control messages by utilizing the average content residence time. The strengths of our proposed CC-CCN are (1) the fair distribution of popular content, which significantly reduces network overhead compared to other caching strategies; (2) fast caching resource provisioning by detecting flash crowd symptoms at the server side while considering the cache capacity of each router; and (3) no requirement for additional control message exchange protocols to support off-path cache utilization. Simulation results obtained in an Internet-like topology (transit-stub model) showed that CC-CCN utilizes the network caches evenly. We compared the performance of CC-CCN against previous caching strategies (Laoutaris et al., 2004), and CC-CCN achieves the highest performance in average cache hit ratio, server and network loads, and link stress in normal situations. Furthermore, we observed that CC-CCN shows stable network performance and lower latency even when flash crowd peak rates increase.
Acknowledgment

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2012R1A1B3003573).
References

Cho K, Lee M, Park K, Kwon TT, Choi Y, Pack S. WAVE: popularity-based and collaborative in-network caching for content-oriented networks. In: Proceedings of the IEEE INFOCOM NOMEN workshop, Orlando, USA; March 2012.
Cisco visual networking index: forecast and methodology, 2012–2017. White paper; 2013.
CCNx Homepage. 〈http://www.ccnx.org/〉.
ccnSim Homepage. 〈http://www.infres.enst.fr/~drossi/index.php?n=Software.ccnSim〉.
GT-ITM: modeling topology of large Internetworks. 〈http://www.cc.gatech.edu/projects/gtitm/〉.
Guo L, Tan E, Chen S, Xiao Z, Zhang X. The stretched exponential distribution of Internet media access patterns. In: ACM PODC, Toronto, Canada; 2008.
Guo S, Xie H, Shi G. Collaborative forwarding and caching in content centric networks. In: IFIP, Prague, Czech Republic; 2012.
Jacobson V, Smetters DK, Thornton JD, Plass MF, Briggs NH, Braynard RL. Networking named content. In: ACM CoNEXT, Rome, Italy; December 2009.
Laoutaris N, Syntila S, Stavrakakis I. Meta algorithms for hierarchical web caches. In: IEEE IPCCC, Phoenix, USA; April 2004.
Lee M, Cho K, Park K, Kwon TT, Choi Y. SCAN: scalable content routing for content-aware networking. In: IEEE ICC, Kyoto, Japan; June 2011.
Lee S-W, Kim D, Ko Y-B, Kim J-H, Jang M-W. Cache capacity-aware CCN: selective caching and cache-aware routing. In: IEEE GLOBECOM, Atlanta, USA; December 2013.
Li Y, Xie H, Wen Y, Zhang ZL. Coordinating in-network caching in content-centric networks: model and analysis. In: IEEE ICDCS, Philadelphia, USA; July 2013.
Long Y, Pan L, Yan Z. Off-path and on-path collaborative caching in named data network. In: AsiaFI, Kyoto, Japan; August 2012.
OPNET Technologies. 〈http://www.opnet.com/〉.
Psaras I, Chai WK, Pavlou G. Probabilistic in-network caching for information-centric networks. In: Proceedings of the ACM SIGCOMM ICN workshop, Helsinki, Finland; August 2012.
Wang Y, Lee K, Venkataraman B, Shamanna RL, Rhee I, Yang S. Advertising cached contents in the control plane: necessity and feasibility. In: Proceedings of the IEEE INFOCOM NOMEN workshop, Orlando, USA; March 2012.
Wendell P, Freedman MJ. Going viral: flash crowds in an open CDN. In: ACM SIGCOMM, Toronto, Canada; August 2011.
Xie M, Widjaja I, Wang H. Enhancing cache robustness for content-centric networking. In: IEEE INFOCOM, Orlando, USA; March 2012.
Zhang B, Iosup A, Pouwelse J, Epema D. Identifying, analyzing, and modeling flash crowds in BitTorrent. In: IEEE P2P, Japan; August 2011.
Zhang L, Estrin D, Burke J, Jacobson V, Thornton JD, Smetters DK, et al. Named Data Networking (NDN) project. Technical report, PARC; October 2010.