Web Proxy Cache Replacement: Do’s, Don’ts, and Expectations†

Peter Triantafillou¹   Ioannis Aekaterinides²

Abstract

Numerous research efforts have produced a large number of algorithms and mechanisms for web proxy caches. In order to build powerful web proxies and understand their performance, one must be able to appreciate the impact and significance of earlier contributions and how they can be integrated. To do this we employ a cache replacement algorithm, ‘CSP’, which integrates key knowledge from previous work. CSP utilizes the communication Cost to fetch web objects, the objects’ Sizes, their Popularities, an auxiliary cache, and a cache admission control algorithm. We study the impact of these components with respect to hit ratio, latency, and bandwidth requirements. Our results show that there are clear performance gains when utilizing the communication cost, the popularity of objects, and the auxiliary cache. In contrast, the size of objects and the admission controller have a negligible performance impact. Our major conclusions, going against those in related work, are that (i) LRU is preferable to CSP for important parameter values, (ii) hit ratio results tend to be very misleading in predicting the true performance of algorithms, (iii) accounting for the objects’ sizes does not improve latency and/or bandwidth requirements, and (iv) the collaboration of nearby proxies is not very beneficial. In addition to CSP and LRU we study the well-known GDS algorithm, which, although it utilizes similar terms, does so in a different way and cannot be modeled within the CSP framework. Based on these results, we chart the problem solution space, identifying which algorithm is preferable and under which conditions. Finally, we develop a dynamic replacement algorithm that continuously utilizes the best algorithm as the problem-parameter values (such as the access skew distributions) change with time.

Keywords: web proxies, caching, replacement algorithms, collaboration, performance measurements
Technical Areas: Distributed Data Management, Distributed Operating Systems

1. Introduction

A web proxy is a computer system that stands between web information servers and a community of users/clients. It stores web objects locally in its cache, so that later requests for the same objects can be served from the cache rather than from the remote web server. Since some objects are requested frequently, web traffic is reduced considerably, the server is off-loaded, and user-observed latencies are reduced [11].



† Partial funding for this work was provided by IST FET project DBGlobe, number IST-2001-32645.
¹ Department of Computer Engineering and Informatics, University of Patras, Rio Patras, 26500, Greece, [email protected]
² Department of Electronic and Computer Engineering, Technical University of Crete, Chania, 73100, Greece, [email protected]


Therefore, web proxies can prove very beneficial and thus constitute a key part of the fundamental infrastructure of web-based information systems.

Motivation and Goals of this Research

During the past few years a great wealth of knowledge on efficient web proxy cache replacement has been accumulated. This knowledge concerns the components which replacement functions must include, as well as auxiliary algorithms and resources. Briefly, the accumulated knowledge suggests employing a cache replacement function that takes into account the objects’ popularities, sizes, and access costs, favoring small, remote, and popular objects. In addition, auxiliary caches and cache admission control can help further by eliminating unfortunate, eager replacements and maintaining key statistics (e.g., for estimating popularities). Finally, collaborating ‘nearby’ proxies can further improve performance. In order to be able to build powerful web proxies and understand their performance, one must be able to appreciate the true impact and significance of these contributions and how they can be integrated. In doing so, one faces the following challenges:
1. Typically, the performance benefits of each of these components have been evaluated against a base algorithm (such as LRU). When all of these components are integrated and used by a proxy, it becomes difficult to appreciate the performance impact attributable to each component, under varying system configurations and application characteristics.
2. Despite the, admittedly, very large number of cache replacement algorithms proposed by researchers, several real-world products rely (stubbornly, one might think) on the ‘good old’ LRU as opposed to the more sophisticated algorithms. This could be partly due to not knowing how each component impacts the overall performance. At any rate, it is certainly worth studying whether and under which system configurations (i.e., cache sizes, number of collaborating proxies, etc.) and application characteristics (e.g., access distributions) the more elaborate algorithms outperform LRU, and for which metrics.
3. Typically, the related work advocating/proposing specific algorithms substantiates claims of superior performance mainly by showing higher cache hit (or byte hit) ratios (which is the single most popular


cache performance metric). Some, in addition, show lower average user latencies. However, these are estimated through overly simplistic mean access latency metrics, which are heavily dependent on the hit ratio, such as AverageLatency = MissRatio × AverageNetDelay. Alternatively, latency figures are taken from traces, which is also an unfortunate decision given the very large variance in retrieving even the same objects from the web at different times. The consequence of this erroneous estimation of latencies is that differences in the performance of algorithms with respect to the hit ratio metric translate into similar latency performance differences. However, as we shall show, this is not correct and leads to incorrect conclusions regarding the performance impact of algorithms when complex multi-term replacement criteria are deployed.
Our paper addresses these concerns. In addition, we chart the problem space, describing which algorithm is preferable according to fundamental problem parameter values and metrics, and we develop a dynamic algorithm, which uses the best algorithm as the values change with time.
The rest of the paper is organized as follows. In section 2 we present an overview of related work in proxy caching. Section 3 describes the trace-driven, simulation-based performance study setup. It discusses our access cost models, the trace generation process, the collaboration modelling, the algorithms tested, the goals, and the experiments performed. In section 4 we present and analyse the results from our experiments. We demonstrate the performance of the algorithms under various environments. The comparison is based on the metrics of hit ratio, mean latency, and bandwidth requirements (web traffic). In section 5 we present a dynamic replacement algorithm and, finally, in section 6 we present the concluding remarks of the paper.

2. Related Work

Hierarchical Web Proxy Cache Systems

Hierarchically-organized proxy caches (e.g., Harvest, Squid [6,7]) were proposed to improve overall performance. The key idea lies in the cooperation of proxies: misses on the local proxy cache are satisfied either by caches higher in the hierarchy or by caches at the same level. Each cache decides whether to fetch the object from the remote site or from some other cooperating cache in the hierarchy.


Non-hierarchical Web Proxy Cache Systems

As hierarchical caches serve more clients, the response time perceived by each client worsens, for three reasons: the large distance between clients and caches, the increased load on caches, and the large number of levels in the hierarchical cache. Thus, these cache hierarchies may perform very well regarding hit ratios, but the performance based on the clients’ response times may be poor. Three basic design choices should be adopted by large-scale caches in order to improve clients’ response times: 1) hits and misses should result in a minimum number of hops to locate the object, 2) data should be shared among many users and many caches, and 3) caches should be closer to clients. Current cache architectures routinely violate these principles at a significant performance cost. The system proposed in [10] is built around a scalable data-location service called the hint hierarchy. The hint hierarchy allows each cache to locate the source (proxy cache or remote server) of each object that requires the minimum number of hops to be accessed. In [20] a new approach is proposed, based on the ‘Summary Cache’, which allows data sharing among a large number of caches. Each cache keeps a summary of the URLs of objects cached at the other collaborating caches. In the case of a local cache miss these summaries are first checked to see whether a collaborating cache can be helpful.

ISP-wide Proxies and CRISP

CRISP (Caching and Replication for Internet Service Performance [8,9]) is a collaborative cache that can support more users than a central proxy cache. It is scalable: it can be expanded by adding new caches, thus supporting larger user communities. CRISP caches cooperate using a central directory service with a complete directory of the cache contents of all participating proxies. If the requested object is cached at any proxy in the cooperative cache, the object is retrieved from that proxy rather than from the remote web site. Upon a request for an object the proxy queries the directory server (mapping server). The directory server maintains a complete directory of all local directories that correspond to cooperating proxies. Proxies notify the directory server every time they add or remove an object from their cache. A design similar to CRISP, for similar environments, but for video data proxy caching, is adopted in the MiddleMan system [16].


Cache Replacement Algorithms

Cache replacement schemes for Web proxies can be categorized as follows: i) Traditional replacement policies and their extensions. The bulk of these are based on LRU. Some replacement algorithms that belong to this category are: Size-adjusted LRU, Least Frequently Used (LFU), SIZE [3], LRU-MIN [3], LRU-THOLD [22], and LRU-K [5]. ii) Key-based policies. The idea is to sort objects based upon a primary key, break ties based on a secondary key, and so on. Such policies are LOG2-SIZE [3] and HYPER-G [3]. iii) Function-based replacement policies. The idea is to employ a general function of different factors such as the time since last access, the entry time of the object into the cache, the cost of retrieving the object from the web site, and so on. The replacement algorithm then decides which object to evict from the cache based on the function value associated with each cached object. Such algorithms are: SLRU [1], PSS [1], LNC-R-W3-U [2], LRV [4], Greedy Dual-Size (GDS) [19], and a generalization of the Greedy Dual-Size algorithm called Greedy Dual* [21].

Web Object Access Cost Models

Since one of the most important reasons for employing web proxies is to reduce the average latency for retrieving web objects, it is important to adopt an adequate model for the communication latency component. Since this is a formidable task, the majority of researchers do not model the communication latency component of retrieving objects from their web servers at all. Instead they take advantage of the latency information included in web traces collected from various sites. This method has proved to lead to misleading conclusions. After detailed examination of web traces [17] it has been found that the communication latency to retrieve the same object varies significantly even after the passing of a short time (a fact also acknowledged by [19]). This of course leads to erroneous decisions when studying and/or comparing the impact of different cache replacement algorithms on the mean latency. In the literature, two attempts to model communication latency are as follows. The first one, proposed in [1], considers the communication latency to be equal to $q \cdot S + 2(1-q) \cdot A \cdot R$, where S is the size of the object, A is the average size of objects, q is a factor between 0 and 1, and R is a random number between 0 and 1. The second one,


proposed in [11], considers the latency to be equal to $A + S/BW$, where A is a constant amount of time, S is the size of the object, and BW is the bandwidth of the communication link.
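For concreteness, both literature models can be written down directly. The following is a minimal sketch (the function names and units are our own illustrative choices, not code from [1] or [11]; sizes and A are assumed to be in the same unit, and BW in units of size per second):

```python
import random

def latency_model_1(S, A, q):
    """Latency model from [1]: q*S + 2*(1-q)*A*R, where S is the object size,
    A is the average object size, q is in [0, 1], and R ~ Uniform(0, 1)."""
    R = random.random()
    return q * S + 2 * (1 - q) * A * R

def latency_model_2(S, A, BW):
    """Latency model from [11]: A + S/BW, a constant overhead A plus the
    transfer time of an object of size S over a link of bandwidth BW."""
    return A + S / BW
```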

3. Performance Study

We have conducted a performance study with the goal of evaluating the impact of several key components and mechanisms of a cache replacement scheme. The CSP replacement algorithm embodies these components, as they have been proposed in the literature by other researchers. We will concentrate on the performance of the proxy cache. For this reason, we will not take into account delays experienced during the interaction between the clients’ browsers and the proxy; instead, all delays will be from the point of view of the proxy.

3.1 Study Setup

Modelling Web Object Access Costs

In general, the access cost (measured in time units) of accessing an object from a web server includes several components [23]. The total costs include DNS resolution times, overheads due to the TCP protocol (e.g., connection establishment times), and (proxy) server RAM access and I/O times. In addition, the access costs depend on link bandwidths and router (processing and buffering) capacities. Since the Internet itself is a collection of different servers, routers, and links with very different performance characteristics, the task of modelling web object access costs is a formidable one and is outside the scope of this paper, especially given the trend of continuous infrastructure improvements (which can also perhaps explain why the literature contains conflicting data as to the contribution of some of the above components to the overall performance [15]). A recent study [15] identified the bottleneck to be within the Internet itself. So we focus on the communication cost component of the total access cost. We will assume the existence of well-configured and efficient DNS, proxy, and web servers and routers, as well as efficient transport protocols. For our purposes we wish to employ a simple communication cost model that satisfies four requirements:
i) it reflects the cost differences when fetching web objects of different sizes,
ii) it is parametric and sensitive to the load in the Internet,
iii) it reflects the fact that different proxy cache replacement algorithms have different impacts on the Internet load, due to the different hit ratios they achieve, and
iv) it reflects the different characteristics of the links traversed in typical scenarios.
We believe that a model that satisfies these requirements can be simple enough, on the one hand, and also powerful enough to allow the proper evaluation of different replacement algorithms with respect to their latency performance. For the above reasons we have modelled the communication costs as follows.

Modeling Communication Cost

The communication time needed to retrieve an object is not directly proportional to its size. The time also depends on the distance (hops) and the link bandwidths between the server that holds the object and the proxy. To model network retrieval times we assume that requested objects belong to four categories. This categorization is based on observations we have made examining real-world web accesses, namely how many and what type of links are required to retrieve an object from our own site. These categories are:
i) Local objects: 2 hops, each with a 2 Mbps link (reflecting, say, accesses within Crete).
ii) Nearby objects: 2 hops, each with a 2 Mbps link, and 2 hops, each with a 32 Mbps link (reflecting, say, accesses within Greece).
iii) Distant objects: 2 hops with a 2 Mbps link, 2 hops with a 32 Mbps link, and 15 hops with a 50 Mbps link (reflecting, say, accesses within Europe).
iv) Very distant objects: 2 hops with a 2 Mbps link, 2 hops with a 32 Mbps link, and 30 hops with a 50 Mbps link (reflecting, say, accesses outside Europe).
A certain percentage of the requests refers to each of these categories; in our traces we have used two scenarios. In the first one, 5% of all requests are for local objects, 15% for nearby, 40% for distant, and 40% for very distant objects. In the second scenario, each of the four categories receives 25% of the requests.
We model the delay at each hop by an M/M/1 queuing system, where the arrivals form a Poisson process at a constant average rate, so that interarrival times are independent and exponentially distributed; service times in an M/M/1 queue are likewise exponential. Since the service times depend on an object’s size, the size


distribution of objects must follow the exponential distribution. The (body of the) size distribution used in our workloads (as we have found it in the literature and in the traces we examined) obeys the lognormal distribution. We found that the particular lognormal distribution used exhibits only a slightly higher standard deviation than the one we would expect if the distribution were exponential. In fact, the value of sqrt(var(size))/E(size), where E(size) denotes the expected value of the variable ‘size’ and var(size) denotes its variance, is 1.3 (which is close to the value of 1 in the exponential case). Thus our assumption of an exponential distribution, which is implicit in the use of the M/M/1 queuing system, seems justified. Now, for each hop, consider that the link utilization is ρ and the mean number of clients waiting in the queue is N. The communication latency at each hop is written as:

$$D = (N + 1)\,\frac{1}{\mu}, \quad \text{where } N = \frac{\rho}{1-\rho} \text{ and } \frac{1}{\mu} = \frac{size}{BW}.$$

Here size is the object’s mean size in bytes and BW is the bandwidth of the link. The total communication latency (CL), for example for the fourth category of objects (very distant), can be written as:

$$CL = \left[(N+1)\frac{size}{BW_{local}}\right]\cdot 2 + \left[(N+1)\frac{size}{BW_{national}}\right]\cdot 2 + \left[(N+1)\frac{size}{BW_{international}}\right]\cdot 30$$

where BW_local = 2 Mbps, BW_national = 32 Mbps, and BW_international = 50 Mbps. (These link bandwidths are average bandwidths observed when accessing objects in Europe from Greece.) The communication cost for accessing objects in the other categories is derived in the same way. Later, when discussing latency results, we will explain how we reflect the different hit ratios achieved by the algorithms in the link utilizations.
We have used real web traces to drive our proxy servers. However, in this paper we report performance results using a tool called SURGE [13] for generating web proxy workloads, since it allows us the flexibility to test the sensitivity of our results with respect to different values of system parameters (such as the skewness of the access distribution) which are found in different traces. At any rate, our results with the SURGE tool are very similar to those we obtained from the real traces.
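The per-hop delay and the total CL can be computed as in the following sketch. The hop profiles mirror the four categories above; the bits-per-byte conversion is our assumption (the paper does not state whether sizes are converted to bits):

```python
MBPS = 1_000_000  # bits per second

# Hop profile per object category: a list of (number of hops, link bandwidth).
CATEGORIES = {
    "local":        [(2, 2 * MBPS)],
    "nearby":       [(2, 2 * MBPS), (2, 32 * MBPS)],
    "distant":      [(2, 2 * MBPS), (2, 32 * MBPS), (15, 50 * MBPS)],
    "very_distant": [(2, 2 * MBPS), (2, 32 * MBPS), (30, 50 * MBPS)],
}

def hop_delay(size_bytes, bw_bps, rho):
    """Per-hop M/M/1 delay D = (N + 1)/mu, with N = rho/(1 - rho)
    and service time 1/mu = size/BW."""
    n = rho / (1.0 - rho)                   # mean number waiting in the queue
    service_time = size_bytes * 8 / bw_bps  # assumed bits-per-byte conversion
    return (n + 1) * service_time

def communication_latency(size_bytes, category, rho):
    """Total CL: per-hop delay times the number of hops, summed over hop groups."""
    return sum(hops * hop_delay(size_bytes, bw, rho)
               for hops, bw in CATEGORIES[category])

# Example: a 15 KB object from a very distant server at 80% link utilization.
print(communication_latency(15_000, "very_distant", rho=0.8))
```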


Creating Synthetic Workloads

SURGE (Scalable URL Reference Generator) [13] was used to generate the workloads. In particular, SURGE’s models can be varied to explore expected future demands or other alternative conditions. Each generated trace contains 140,000 requests and corresponds to the one-day-long workloads observed in real traces. There are approximately 35,000 unique objects in each trace. The request arrival rate at the proxy is 1.8 requests/sec. We used the following distributions for file size, popularity, and temporal locality.
i) Object Sizes. Size distributions can be heavy-tailed, meaning that a proxy server must deal with highly variable object sizes. The model used here is the lognormal distribution for the body and a Pareto distribution for the tail. The mean size of the objects is 15 Kbytes.
ii) Popularity. This property of the workload reflects the probability of referencing particular objects. The popularity distribution for Web objects has been shown to follow Zipf’s Law [14]. Zipf’s Law states that if objects are ordered from the most popular to the least popular, then the probability P of referencing a file tends to be inversely proportional to its rank. That is, the probability of object i is:

$$P_i = \frac{k}{i^{\theta}}, \quad i = 1, 2, \ldots, N, \quad 0 \le \theta \le 1$$

where k is a normalizing constant so that all probabilities sum to 1. As θ approaches 1 (0), the distribution becomes more skewed (uniform).
iii) Temporal Locality. Based on studies of real web traces we have chosen to model the distribution of stack distances using the lognormal distribution.
The traces we used have very similar characteristics to real traces (not elaborated here for space reasons). In order to study the impact of varying the parameter θ of the Zipf distribution on proxy cache performance, we generated three sets of traces corresponding to three different values of θ: θ = 1.0, 0.8, and 0.6. Studies show that expected θ values range from 0.6 to 0.8 [14]. However, related work also considers θ values up to one [12,13]. A sketch of such a synthetic-workload generator is given below.
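The actual traces were produced with SURGE; the sketch below only illustrates the popularity and size models described above. The lognormal/Pareto parameters and the body/tail split are illustrative assumptions, and temporal locality is not modeled here:

```python
import numpy as np

rng = np.random.default_rng(0)

def zipf_probabilities(n, theta):
    """P_i = k / i**theta, with k chosen so the probabilities sum to 1."""
    ranks = np.arange(1, n + 1)
    weights = ranks ** -theta
    return weights / weights.sum()

def object_sizes_kb(n, mean_kb=15.0, body_frac=0.9):
    """Lognormal body with a Pareto tail (parameters illustrative only).
    The lognormal mu is set so the body mean equals mean_kb."""
    n_body = int(n * body_frac)
    body = rng.lognormal(mean=np.log(mean_kb) - 0.5, sigma=1.0, size=n_body)
    tail = (rng.pareto(a=1.2, size=n - n_body) + 1.0) * mean_kb
    return np.concatenate([body, tail])

# One trace: 140,000 requests over ~35,000 unique objects, as in the paper.
n_objects, theta = 35_000, 0.8
requests = rng.choice(n_objects, size=140_000, p=zipf_probabilities(n_objects, theta))
sizes = object_sizes_kb(n_objects)
```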


Modeling Proxy Collaboration

The overall performance can be improved by introducing collaboration among proxy caches. More precisely, in the case of a local cache miss, if a cooperating ‘nearby’ cache has the object, then it will be retrieved from that cache rather than from the object’s distant home site. Cooperating caches share their contents by propagating their directory to all participating caches every T seconds (see the sketch below). In order to study the benefits attainable from collaborating proxies we employed the following model, based on three parameters. First, a percentage of the objects in the traces driving each proxy was common to all proxies; unless stated otherwise, this percentage was set at 50%. Second, the total probability mass of the common objects was varied; in the traces used for this paper, 60% and 80% of all proxy accesses refer to the common objects. Finally, for the results we report in this paper the collaborating proxies were connected through dedicated 2 Mbps links.
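A minimal sketch of this collaboration model follows; the names (CollaboratingProxy, fetch_from_origin) are ours, and the periodic directory propagation is reduced to a single method call:

```python
class CollaboratingProxy:
    """Proxy that checks peers' (periodically propagated) directories on a local miss."""

    def __init__(self):
        self.cache = {}              # object id -> object
        self.peer_directories = {}   # peer -> set of object ids, refreshed every T sec
        self.peers = []

    def propagate_directory(self):
        # Called every T seconds: advertise our current contents to all peers.
        for peer in self.peers:
            peer.peer_directories[self] = set(self.cache)

    def get(self, oid, fetch_from_origin):
        if oid in self.cache:                    # local hit
            return self.cache[oid]
        for peer, directory in self.peer_directories.items():
            if oid in directory and oid in peer.cache:
                return peer.cache[oid]           # collaborative hit over the 2 Mbps link
        return fetch_from_origin(oid)            # miss: fetch from the remote web server
```

Because directories are refreshed only every T seconds, a peer's advertised directory may be stale; the sketch therefore re-checks the peer's actual cache before using an entry.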

3.2 Cache Replacement Algorithms and Mechanisms

In our study we measured and compared the performance of several replacement policies in a cooperative environment. These policies are the well-known LRU, the Cost Size Popularity (CSP) algorithm, and the CP-SQRT(S), CP-LOG(S), CP, and CS algorithms. A detailed description of how these algorithms work follows an explanation of the basic mechanisms used.

Caching Gain

For every object in the cache we compute its caching gain, which is a mathematical expression involving (for the case of the CSP replacement algorithm) the size, the popularity, and the communication cost to retrieve the object. Objects with smaller caching gain are more likely candidates for eviction.

Popularity Modeling/Estimation

We approximate an object’s popularity, since it is not known in advance, by computing the object’s request rate λi, which indicates how popular the object is. Since λi = 1/MTTRi, where MTTRi is the Mean Time To Re-access the object, we can compute MTTR using a well-known technique from the literature (such as the one in [18]). More precisely, MTTR is computed as a weighted sum of the interarrival times between


the consecutive accesses. Thus MTTR at time t1 is MTTR(t1) = (1 − α)(t1 − t0) + α·MTTR(t0), where t0 is the most recent access time and t1 is the current time. The averaging factor α can be tuned to ‘forget’ past accesses faster or slower. In our experiments we have chosen a value of 0.5 for α.

Admission Control and the Use of an Auxiliary Cache

The main task of an admission control policy is to decide which objects should be cached and which should not. Studies have shown that the admission control policy works well, especially if we combine it with a small auxiliary cache that acts as an additional filter. The auxiliary cache contains metadata information about each object and is also needed in order to compute the MTTR of each object.

Putting Everything Together

For objects fetched from the web for the first time, the proxy simply enters a record in the auxiliary cache with the object id and its reference timestamp. When an object is referenced for the second time, its MTTR value is computed and the admission controller is called to determine whether the object should be cached. This decision is based on the caching gain function associated with the replacement algorithm. The replacement algorithm determines all the objects that would be evicted to make room for the new object. The new object is admitted only if the admission controller determines that it has a greater caching gain than the sum of the caching gains of the candidate objects for eviction. Note that when using an auxiliary cache, we may experience one lost hit (since on the first reference only metadata is cached).

The LRU Algorithm

LRU deletes as many of the least recently used objects as is necessary to have sufficient space for the newly accessed object. LRU employs neither an auxiliary cache nor admission control.

The CSP (Cost Size Popularity) Algorithm

This algorithm takes into account the size, the communication access cost, and the popularity of an object. For every object i in the cache we compute its caching gain, which can be written as:

$$CG_i = \frac{Cost_i}{Size_i \cdot MTTR_i}$$
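Putting the auxiliary cache, the MTTR estimator, the admission controller, and this caching gain together, a CSP request could be processed roughly as in the following simplified sketch (our assumptions: per-object costs are known, timestamps are supplied by the caller, and the auxiliary cache is unbounded):

```python
class CSPCache:
    """Sketch of CSP: evict the objects with the lowest caching gain
    CG_i = Cost_i / (Size_i * MTTR_i), subject to admission control."""

    def __init__(self, capacity, alpha=0.5):
        self.capacity, self.used, self.alpha = capacity, 0, alpha
        self.objects = {}   # id -> [size, cost, mttr, last_access_time]
        self.aux = {}       # auxiliary cache: id -> first reference timestamp

    def _gain(self, size, cost, mttr):
        return cost / (size * mttr)

    def on_request(self, oid, size, cost, now):
        if oid in self.objects:                       # cache hit: update MTTR
            s, c, mttr, t0 = self.objects[oid]
            mttr = (1 - self.alpha) * (now - t0) + self.alpha * mttr
            self.objects[oid] = [s, c, mttr, now]
            return "hit"
        if oid not in self.aux:                       # 1st reference: metadata only
            self.aux[oid] = now
            return "miss"
        mttr = max(now - self.aux.pop(oid), 1e-9)     # 2nd reference: initial MTTR
        # Collect the lowest-gain objects as eviction candidates.
        victims, freed, victims_gain = [], 0, 0.0
        for v in sorted(self.objects, key=lambda o: self._gain(*self.objects[o][:3])):
            if self.used - freed + size <= self.capacity:
                break
            victims.append(v)
            freed += self.objects[v][0]
            victims_gain += self._gain(*self.objects[v][:3])
        if self.used - freed + size > self.capacity:
            return "miss"                             # object larger than the cache
        # Admission control: cache the newcomer only if its gain exceeds
        # the summed gain of the objects it would displace.
        if victims and self._gain(size, cost, mttr) <= victims_gain:
            return "miss"
        for v in victims:
            self.used -= self.objects.pop(v)[0]
        self.objects[oid] = [size, cost, mttr, now]
        self.used += size
        return "miss"                                 # fetched from the web, now cached
```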


The replacement algorithm orders the objects by descending caching gain. If the admission controller permits it, it evicts from the cache the objects with the lowest caching gains until enough space is free to accommodate the newly requested object. We also examined some extensions of CSP, changing the caching gain function in order to observe how (i) the size term and (ii) the popularity term influence the replacement algorithm’s performance. These algorithms are CP-SQRT(S), which has the square root of the size in the denominator of the caching gain function; CP-LOG(S), which has the logarithm of the size in the denominator; CP, which does not embody the size at all; and CS, which does not take into account the popularity term.

Performance Metrics

The following are the key performance metrics:
i) The hit ratio, defined as:

$$HR = \frac{\text{Requests that hit objects in cache}}{\text{Total number of requests}}$$

ii) The client response time (latency), measured in milliseconds. Since we are concentrating on proxy server design issues, it does not include the network latency incurred for the communication between the client’s browser and the proxy.
iii) The network bandwidth requirements (web traffic), defined as the amount of data retrieved (in Kbytes) over the web during the playing of the trace file.

Experiments and Goals

We conducted a detailed performance study consisting of a large number of experiments. Its goal is to answer the following key questions:
i)

What is the relative performance of the CSP algorithm (as it embodies the key techniques and mechanisms found/proposed by related work) against that of the simple and basic LRU algorithm? In particular, how and how much do the popularity term, the size term, the auxiliary cache, and the admission control mechanism contribute to performance?


ii) How do the performance results, as measured by the hit ratio, user latency, and network bandwidth requirements metrics, compare against each other?
iii) How do variations in the application environment and system configurations (i.e., cache sizes, number of collaborating proxies, skewness of access distributions) impact the answers to the above questions?

4. Performance Results

In this section we present the performance results answering the above questions. Unless explicitly stated otherwise, when we refer to CSP we imply the use of CSP with auxiliary cache and admission control.

4.1 Results on Hit Ratio Performance

First, we present the performance of each policy based on the hit ratio metric. As expected, as the cache size grows, all algorithms perform better: since more space is available, more objects can be stored, which means that more hits are generated. Figure 1 shows the performance of LRU and CSP for a 2-proxy configuration and for variable cache size; the θ parameter is 0.8. We also examined the performance as a function of θ. As figure 2 shows, when θ=0.6 there is a very small difference between the two policies for a cache size of 1%. In more uniform workloads (e.g., θ=0.6) the CSP policy performs poorly because it cannot exploit the popularity of each object. LRU, on the other hand, performs much better than CSP, especially with large cache sizes. As figure 1 shows, even for θ=0.8, when the cache size becomes adequately large, LRU outperforms CSP. As figure 2 shows, for large values of θ (θ=1.0) and small cache sizes, LRU performs only negligibly better compared to θ=0.6: the hit ratio stays under 30%. CSP, on the other hand, increases its hit ratio by 76.9% compared to θ=0.6. When considering very large cache sizes (30%), where there is plenty of available space, the difference in performance observed for each algorithm by varying θ is not significant, as you can


see in figure 2. The main reason for this is that there is enough space to accommodate many objects, and thus many hits are generated, which causes the hit ratio to approach its maximum value.
By comparing the two algorithms we can say that LRU outperforms CSP for large cache sizes and for more uniform workloads. In more skewed workloads (θ ≥ 0.8) CSP is better, except for very large caches (figure 1). More precisely, CSP performs better by 75.3% for small cache sizes, where space is at a premium. The main reason is that the popularity factor in CSP pays off, because the access distribution is more skewed. For very large cache sizes, though, where the available space is enough to accommodate many objects, even LRU can keep enough hot objects and derive higher hit ratios; in this case we can see in figure 1 that LRU performs better by 7.3%. The worse performance of CSP at large cache sizes can be explained by the cost of the auxiliary cache, which is necessary in order to reliably compute the MTTR of each object. Its drawback is that we lose one hit (the first one) compared to LRU.³

[Figure 1. Hit Ratio with varying cache size. θ=0.8]

[Figure 2. Hit Ratio with varying θ. Cache size is 1% and 30% of the maximum required space]

³ Suppose that we have a request for object i. It enters the auxiliary cache. On the second request for the same object, it tries to enter the main cache; the outcome depends on the admission control policy. On the third request for i we may have a hit. Under the LRU policy we would have a hit already on the second request, because neither the auxiliary cache nor the admission control policy exists. So, under the CSP policy we lose at least one hit for every object that enters the cache.


4.2 Results on Latency Performance

In section 3 we explained in detail the way we compute the time needed to retrieve an object from the Web. We name the maximum value of the link utilization ρ as ρmax and assign to it representative values for relatively highly loaded systems. As the link utilizations depend on the (hit ratio of the) proxy cache replacement algorithm, which in turn depends also on the cache size, this value (ρmax) is made to correspond to the configuration employing the LRU policy and the smallest tested cache size (1% of the total object sizes). Since other cache policies may enjoy higher hit ratios and, in general, lower network bandwidth requirements, we must clearly use a different value of ρ when computing the latencies incurred with those policies/configurations. To do this, we measure the total number of bytes fetched from the web servers (i.e., that missed the cache) for this basic LRU, 1% cache size configuration (‘Total Kbytesmax’) and for all other configurations. For each other configuration p, we compute the link utilization as follows:

$$\rho_p = \frac{Total\ Kbytes_p}{Total\ Kbytes_{max}} \cdot \rho_{max}$$
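For instance, the utilization assigned to a configuration p could be computed as in the following one-liner (a sketch; the 0.999 cap is our own safeguard to keep the M/M/1 formula finite):

```python
def scaled_utilization(total_kbytes_p, total_kbytes_max, rho_max=0.8):
    """rho_p = (Total Kbytes_p / Total Kbytes_max) * rho_max."""
    return min(0.999, total_kbytes_p / total_kbytes_max * rho_max)
```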

Unless otherwise stated, ρmax = 0.8. We will also report results with ρmax values of 0.9 and 0.95. As expected, the mean latency decreases as the cache size increases, as you can see in figure 3. For smaller cache sizes and θ=0.8, CSP performs better than LRU by 18.6% (figure 3), while for larger cache sizes LRU outperforms CSP. Generally, we noticed that the mean latency depends on the replacement policy. For small caches and more skewed distributions, CSP performs better because it tends to keep in the cache objects with large retrieval cost and popularity, and small size. In large caches, where we have the ability to accommodate many objects, LRU performs better than CSP, as one would expect from the above results on hit ratio performance. As θ grows, for small caches we can see in figure 4 that CSP performs better than LRU, except in the case where θ=0.6 and the cache size is 1%. This observation agrees with the results on the hit ratio metric. For larger cache sizes, the LRU algorithm performs better than CSP, as explained above.
A key observation when examining our latency results is that the increase in the hit ratio of CSP versus that of LRU does not yield an analogous decrease in latency, especially for small caches. We saw for


example, for a 2-proxy environment, cache size 1%, and θ=0.8 (figure 1), an improvement of 75.3% in the hit ratio, while the corresponding decrease in latency (figure 3) is only 18.6%. We also saw that a very small difference in the hit ratios of CSP and LRU sometimes translates into a relatively big difference in latency. This is attributable to the dual role the ‘size’ term plays. It helps improve hit ratios, which can improve mean latencies. But it also results in CSP fetching larger objects from the web, a fact that hurts its latency (and network bandwidth requirements) performance.

[Figure 3. Latency with varying cache size. θ=0.8]

[Figure 4. Latency with varying θ. Cache size is 1% and 30% of the maximum required space]

It is not unusual for the web network backbone to experience high data traffic, in which case the network may become congested. In our experiments so far we assigned the maximum link utilization ρmax to be 0.8. However, we also examined the communication latency performance with maximum link utilization ρmax equal to 0.9 and 0.95, which corresponds to very heavily loaded networks. As you can see in figure 5, for small cache sizes the latency advantage of CSP over LRU grows as the link utilization increases. As the cache size increases, the performance difference between the two algorithms decreases. The reason for this is that at large cache sizes the communication bandwidth requirements are reduced for both algorithms, and thus they do not depend as much on the web network


backbone. For small cache sizes, where the algorithms generate heavy network traffic, even the slightest difference in hit ratio results in a large difference in mean latency, because more accesses are satisfied over the network.

4.3 Results on Network Bandwidth Requirements

We also tested how the cache replacement policies perform when network bandwidth is considered. The network bandwidth requirements are expressed as the number of Kbytes retrieved from the web. All algorithms perform better as the cache size grows: in large caches there is plenty of available space to accommodate many objects, and thus fewer objects are retrieved from the web. The same observation holds for the CSP policy when increasing the value of the parameter θ from 0.6 to 0.8. As we noted above, the hit ratio performance of CSP improves as θ grows; higher hit ratios imply more hits and thus fewer retrievals from the web. So, for the CSP policy, the network bandwidth requirements decrease as the workload becomes more skewed. For small cache sizes we observed a decrease of 5.5% in the total Kbytes retrieved from the Web, while the decrease for large cache sizes was 1.8%. The situation is reversed for the LRU policy: we observed a slight increase in network usage when θ goes from 0.6 to 0.8. This increase was 4.67% for small cache sizes and 2.89% for large ones, and can be intuitively attributed to the fact that LRU fails to capture the more skewed popularities. Recall from figure 2 that we had also observed a lower hit ratio for θ=0.8 compared to θ=0.6.
If we compare the two algorithms, we can see in figure 6 that for large cache sizes LRU performs better for every value of θ. This can be explained by the fact that the CSP policy is consistently worse for this configuration even on the latency and hit ratio metrics. A key conclusion is that the dramatic improvements enjoyed by CSP when examining the hit ratio metric do not exist for the network bandwidth metric. For example, for θ=0.8, cache size 1%, and 2 proxies, CSP enjoys a hit ratio that is 75% higher than that of LRU. However, this translates to only about a 4%


improvement in terms of network bandwidth requirements. The explanation for this is similar to that given for the low latency improvements above.

[Figure 5. Latency under different maximum link utilizations]

[Figure 6. Network bandwidth usage for large and small cache sizes and varying θ]

4.4 Comparing with Non-CSP Algorithms: The GDS Algorithm

In another thread, we have also compared the performance of another well-known algorithm, GDS [19], against that of CSP and LRU. We have found that in general GDS outperforms CSP and LRU in terms of hit ratio, with our results comparing GDS and LRU being similar to those reported by the GDS authors in [19]. However, our results also show that GDS performs poorly in terms of the latency and network bandwidth metrics. We have measured the performance of GDS(1), which tries to maximize the hit ratio, and GDS(lat), which tries to minimize overall latency.

Results on Hit Ratio Performance

As you can see in figure 9, GDS(1) outperforms the other policies in all cases, except in skewed workloads (θ=1.0) with small cache size (1%), where CSP is marginally better than GDS(1). This shows that


GDS(1) does its job well, as it tries to maximize the hit ratio. The main reason is that GDS(1) tries to keep small objects in the cache, and thus many more objects than the other policies, which yields a higher probability of a cache hit. GDS(lat) performs as well as LRU when considering the hit ratio.

[Figure 9. Hit ratio with varying θ. Cache size is 1% and 30% of the maximum required space.]

[Figure 10. Latency with varying θ. Cache size is 1% and 30% of the maximum required space.]

Results on Latency Performance

As you can see in figure 10, the performance of GDS(1) worsens as θ grows, for all cache sizes. It performs better than LRU and CSP for more uniform workloads, while for θ=1.0 the situation is reversed. The performance of GDS(lat) is worse than that of GDS(1) for all workloads and cache sizes, except in the case of a skewed workload and a large cache size. In general, for small caches and skewed workloads (θ=1.0), CSP performs better by 49.1% compared to GDS(lat) and by 23.8% compared to GDS(1), while in more uniform workloads (θ=0.6) GDS(1) outperforms the other policies, as mentioned in [19]. For large caches LRU performs better only in skewed workloads.


Results on Network Bandwidth Requirements

For small cache sizes and more skewed workloads, CSP is better than GDS(1) by 1.2% and than GDS(lat) by 4.6%, while for more uniform workloads and large caches LRU is better than GDS(1) by 18.1% and than GDS(lat) by 17.7%. GDS(1) is the worst even though it enjoys higher hit ratios, because it prefers to store small objects and fetch large ones from the web, which yields higher network congestion.

4.5 Charting the Problem Solution Space and Discussion of the Results

Throughout our study we observed that there is a disparity between the hit ratio results and the results concerning our two other metrics (latency and bandwidth). As you can see in Table 1, GDS(1) is the preferred policy when considering the hit ratio, while in the other cases, where the performance metric is latency or bandwidth requirements, GDS(1) is not as good as one would expect based on its hit ratio performance. An obvious conclusion is that the hit ratio metric is a very poor predictor of the performance of complex multi-term replacement algorithms. We also see that the performance results for the latency metric follow the same trends as the results for the bandwidth requirements metric. The fact that the bandwidth results were explicitly measured in our experiments leads us to believe that our analytically estimated latency results are valid.

θ Value (Zipf parameter) | Cache Size (% of max required space) | Hit Ratio    | Latency                | Bandwidth Requirements
0.6                      | 1%                                   | GDS(1)       | GDS(1), LRU, GDS(lat)  | LRU
0.6                      | 30%                                  | GDS(1)       | GDS(1), LRU, GDS(lat)  | LRU
0.8                      | 1%                                   | GDS(1), CSP  | CSP, GDS(1)            | CSP
0.8                      | 30%                                  | GDS(1)       | GDS(1), LRU, GDS(lat)  | LRU
1.0                      | 1%                                   | CSP, GDS(1)  | CSP                    | CSP
1.0                      | 30%                                  | GDS(1)       | LRU                    | LRU

Table 1. Overall performance based on hit ratio, latency, and bandwidth requirements metrics

Note also that the disparity between the hit ratio and latency results is smaller than that between the hit ratio and network bandwidth requirements results. This holds because the improvement in hit ratio comes partly from expelling larger objects from the cache. Fetching these larger objects from the web directly and adversely impacts network bandwidth requirements, whereas latency is only partially adversely affected (since latency also depends on other factors, such as the number of hops, link bandwidths, etc.). Table 1 summarizes the results for representative values of θ and the cache size, with respect to the performance metrics of interest.

5. Dynamic Replacement Algorithm

Having obtained the performance results for each studied replacement algorithm as a function of the access-skew distribution, the cache size, and the performance metric of interest, we can go one step further and develop a replacement algorithm which dynamically adjusts in order to continuously provide the best performance. This algorithm is briefly described in the following.
The main idea behind the dynamic replacement algorithm is the selection of the best algorithm based on the conditions under which the proxy is operating. The dynamic algorithm makes the right decision by looking up a table which summarizes the performance of the algorithms studied (i.e., GDS(1), GDS(lat), LRU, and the class of CSP algorithms presented earlier). This table shows the performance of each algorithm under different values of θ and required cache size, based on three different performance metrics: hit ratio, latency, and network bandwidth requirements. The performance table looks like Table 1 in section 4.5; the only difference is that there are many more values of θ and disk size. More precisely, θ ranges from 0.4 to 1.0 with a 0.05 step, and the disk size varies from 1% to 30% with a 2.5% step. The dynamic replacement algorithm monitors the request stream that arrives at the proxy, tries to estimate its properties, and, by consulting the performance table, chooses the appropriate algorithm.
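A sketch of the table-lookup step follows. The entries shown reproduce Table 1 (the full table has θ in steps of 0.05 and disk size in steps of 2.5%); resolving ties to the first listed policy and matching the nearest charted point are our assumptions:

```python
# (theta, cache size %) -> best policy per metric, condensed from Table 1.
PERF_TABLE = {
    (0.6, 1):  {"hit_ratio": "GDS(1)", "latency": "GDS(1)", "bandwidth": "LRU"},
    (0.6, 30): {"hit_ratio": "GDS(1)", "latency": "GDS(1)", "bandwidth": "LRU"},
    (0.8, 1):  {"hit_ratio": "GDS(1)", "latency": "CSP",    "bandwidth": "CSP"},
    (0.8, 30): {"hit_ratio": "GDS(1)", "latency": "GDS(1)", "bandwidth": "LRU"},
    (1.0, 1):  {"hit_ratio": "CSP",    "latency": "CSP",    "bandwidth": "CSP"},
    (1.0, 30): {"hit_ratio": "GDS(1)", "latency": "LRU",    "bandwidth": "LRU"},
}

def choose_policy(theta_est, cache_pct, metric="latency"):
    """Pick the charted (theta, cache size) point closest to the estimates
    and return the policy that the table ranks best for the chosen metric."""
    key = min(PERF_TABLE,
              key=lambda k: (abs(k[0] - theta_est), abs(k[1] - cache_pct)))
    return PERF_TABLE[key][metric]

# Example: a moderately skewed workload on a small cache, optimizing latency.
print(choose_policy(theta_est=0.78, cache_pct=2, metric="latency"))  # -> CSP
```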

Main Subsystems Description

The primary subsystems of a proxy server employing the dynamic replacement algorithm are shown in figure 11. The popularity monitoring subsystem monitors the request stream that arrives at the proxy server.


Meta-data information is stored for every requested object, in order to be able to closely approximate the value of the parameter θ. The disk size monitoring subsystem monitors the request stream in order to compute the percentage of the maximum required space which corresponds to the available disk size of the proxy server. In other words, if after a short time the aggregate size of all requested objects is, for example, 200 Mbytes and the proxy server employs a cache with a 100 Mbytes hard disk, then the disk size is 50% of the maximum required space needed to store all requested objects. The final input is given by the proxy administrator, who selects the appropriate performance metric: maximizing hit ratio, minimizing latency, or minimizing network bandwidth requirements. Having set the performance metric and estimated the value of the parameter θ and the required cache size, the dynamic algorithm looks up the performance table at regular intervals and, if it is worth doing so, changes the replacement policy, choosing the best one as determined from the results of section 4.

[Figure 11. Proxy server with dynamic replacement algorithm: the workload feeds a popularity distribution monitoring subsystem (θ estimator) and a disk size monitoring subsystem, which, together with the administrator-selected performance metric (hit ratio, latency, or bandwidth requirements), drive the replacement algorithm decision mechanism behind request processing.]
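The θ estimator can be realized, for example, by a least-squares fit on a log-log plot of observed request frequency against popularity rank. The following sketch is one such possibility, not the paper's implementation:

```python
import numpy as np
from collections import Counter

def estimate_theta(request_ids):
    """Fit P_i ~ k / i**theta: regress log(frequency) on log(rank);
    the fitted slope is -theta."""
    counts = sorted(Counter(request_ids).values(), reverse=True)
    ranks = np.arange(1, len(counts) + 1)
    slope, _intercept = np.polyfit(np.log(ranks), np.log(counts), 1)
    return -slope
```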

6. Contribution and Concluding Remarks

Web proxy cache replacement algorithms have received a great deal of attention from academia. The key knowledge accumulated by related research can be very briefly summarized as follows: Web proxy


caches, in order to yield high performance in terms of one or more of the metrics of hit ratio, communication latency, and network bandwidth requirements, must: i) employ a cache replacement policy that takes into account the communication latency to fetch objects from the Web, the size of the objects, and the (estimated) popularity of the objects; ii) employ an auxiliary cache, which holds meta-data information for the objects and acts as a filter, and an admission control policy, which further reduces the probability of unfortunate cache replacement decisions; and iii) exploit ‘nearby’ caches, building collaborative web proxy caches across organization-wide networks, which can offer additional performance improvement.
Despite the fact that the above techniques (which have been embodied in our CSP algorithm) were proposed by researchers long ago, most web proxies continue to employ the ‘good old’ LRU policy. Our results show LRU to be better than the sophisticated CSP algorithm in environments where the access distribution of objects is less skewed and/or in configurations where proxies enjoy large caches. Given that most web proxies use caches on magnetic disks and that several web traces show moderately skewed distributions (with θ values between 0.6 and 0.7) [14], the choice of LRU may seem justified, depending on the proxy configuration and application characteristics.
Our results have also shown that the hit ratios of complex replacement policies (involving size, popularity, and communication cost terms) are of little importance as a metric. In fact, the hit ratio results (and thus any results on simplistic latency metrics, which rely heavily on hit/miss ratios and average communication delays) can be very misleading. This happens because different terms in the multi-term replacement criteria may work in conflicting ways. Namely, including the ‘size’ term in the replacement function improves the hit ratio, since smaller objects are favored and, thus, more objects can be cached. In turn, higher hit ratios do improve latencies, in general. However, including the ‘size’ term also has the effect of fetching larger objects from the web, which hurts latency (and network bandwidth requirements). In this sense, the ‘size’ term conflicts with the ‘communication cost’ term, which tries to fetch objects with small latencies. Similarly, we have seen small hit ratio differences translate into large latency and network bandwidth requirements differences, a phenomenon occurring for large caches and attributable to the fact that the CSP replacement causes very large objects to be evicted and later fetched from the web.


Another major conclusion is that the collaboration benefits attainable in the system configurations we examined are rather small, both in terms of latency improvement and in terms of network bandwidth requirements improvement. This holds despite the fact that we, like other studies, noticed significant improvements in the total hit ratio, of up to 30%, due to the collaborative hits.
With respect to the key components of the efficient replacement policy, we have found that the auxiliary cache, the popularity component, and the communication cost component play an important role in the performance of CSP. However, the size term and the admission control seem to affect mostly the hit ratio performance and only in a minor way the latency and network bandwidth requirements performance.
Finally, we took advantage of the previous results, charting the problem solution space with respect to the algorithm with the best performance as a function of the popularity distribution, the required cache size, and the desired performance metric. Having charted the problem space, we developed a dynamic replacement algorithm which monitors the environment and chooses the right algorithm to apply, based on the characteristics of the proxy environment.
In the future we plan to extend the above results and utilize them to determine optimal proxy placement algorithms and to study web proxy designs for continuous media and mixed-media applications.

References

[1] Charu Aggarwal, Joel L. Wolf and Philip S. Yu, “Caching on the World Wide Web”, IEEE Transactions on Knowledge and Data Engineering, vol. 11, no. 1, January/February 1999.
[2] Junho Shim, Peter Scheuermann and Radek Vingralek, “Proxy Cache Algorithms: Design, Implementation and Performance”, IEEE Transactions on Knowledge and Data Engineering, vol. 11, no. 4, July/August 1999.
[3] Stephen Williams, Marc Abrams, Charles R. Standridge, Ghaleb Abdulla and Edward A. Fox, “Removal Policies in Network Caches for World-Wide Web Documents”, In Proceedings of ACM SIGCOMM, pp. 293-305, 1996.
[4] Luigi Rizzo and Lorenzo Vicisano, “Replacement Policies for a Proxy Cache”, Research Note RN/98/13, Department of Computer Science, University College London, 1998.
[5] Elizabeth J. O’Neil, Patrick E. O’Neil and Gerhard Weikum, “An Optimality Proof of the LRU-K Page Replacement Algorithm”, Journal of the ACM, vol. 46, no. 1, 1999.
[6] D. Neal, “The Harvest Object Cache in New Zealand”, 5th International WWW Conference, May 1996.
[7] Anawat Chankhunthod, Peter Danzig and Chuck Neerdaels, “A Hierarchical Internet Object Cache”, In Proceedings of the USENIX Technical Conference, San Diego, CA, January 1996.
[8] S. Gadde, J. Chase and M. Rabinovich, “A Taste of Crispy Squid”, In Workshop on Internet Server Performance (WISP'98), Madison, WI, June 1998.
[9] S. Gadde, J. Chase and M. Rabinovich, “Reduce, Reuse, Recycle: An Approach to Building Large Internet Caches”, Sixth Workshop on Hot Topics in Operating Systems (HotOS-VI), pp. 93-98, May 1997.
[10] Renu Tewari, Michael Dahlin, Harrick M. Vin and Jonathan S. Kay, “Design Considerations for Distributed Caching on the Internet”, Tech. Report TR98-04, Dept. of Computer Science, University of Texas at Austin, 1998.
[11] Mohammad S. Raunak, Prashant Shenoy, Pawan Goyal and Krithi Ramamritham, “Implications of Proxy Caching for Provisioning Networks and Servers”, In Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2000), pp. 66-77, Santa Clara, CA, June 2000.
[12] Martin Arlitt and Carey Williamson, “Web Server Workload Characterization: The Search for Invariants”, In the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, May 1996.
[13] Paul Barford and Mark Crovella, “Generating Representative Web Workloads for Network and Server Evaluation”, ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pp. 151-160, 1998.
[14] Lee Breslau, Pei Cao, Li Fan, Graham Phillips and Scott Shenker, “Web Caching and Zipf-like Distributions: Evidence and Implications”, IEEE INFOCOM, 1999.
[15] Md Ahsan Habib and Marc Abrams, “Analysis of Sources of Latency in Downloading Web Pages”, In Proceedings of WebNet 2000, San Antonio, USA, November 2000.
[16] Soam Acharya and Brian Smith, “MiddleMan: A Video Caching Proxy Server”, ACM NOSSDAV Conference, 2000.
[17] Weekly Access Logs at NLANR's Proxy Caches, available from ftp://ircache.nlanr.net/Traces/
[18] Renu Tewari, Harrick M. Vin, Asit Dan and Dinkar Sitaram, “Resource-based Caching for Web Servers”, In Proceedings of the SPIE/ACM Conference on Multimedia Computing and Networking, January 1998.
[19] Pei Cao and Sandy Irani, “Cost-Aware WWW Proxy Caching Algorithms”, USITS 1997.
[20] Li Fan, Pei Cao, Jussara Almeida and Andrei Broder, “Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol”, In Proceedings of ACM SIGCOMM'98, 1998.
[21] Shudong Jin and Azer Bestavros, “GreedyDual* Web Caching Algorithm: Exploiting the Two Sources of Temporal Locality in Web Request Streams”, In Proceedings of the 5th International Web Caching and Content Delivery Workshop, May 2000.
[22] M. Abrams, C. Standridge, G. Abdulla, S. Williams and E. Fox, “Caching Proxies: Limitations and Potentials”, In Proceedings of the 1995 World Wide Web Conference, December 1995.
[23] Balachander Krishnamurthy and Craig E. Wills, “Analyzing factors that influence end-to-end Web performance”, In Proceedings of the 2000 World Wide Web Conference / Computer Networks, May 2000.


Appendix A. Additional Performance Results

To be read at the discretion of the reviewers.

A.1 The Impact of the Auxiliary Cache and the Admission Control Policy

As we noted above, the auxiliary cache plays a role in cache performance, and combining it with an admission control policy may lead to better performance. It filters the workload and makes the cache less sensitive to transient workload changes. Thus, objects tend to stay longer in the cache, and objects enter the cache only after proving that they are worth caching (as they must become popular enough to pass the admission control). We ran a number of experiments in order to examine how the auxiliary cache and the admission control affect performance. We report our results for a 2-proxy cooperative environment and for θ=0.8.
First, we tested the CSP-NO-AUX algorithm, which differs from CSP in that it does not use an auxiliary cache. Since the auxiliary cache also helps in the computation of MTTR, we used the exact popularity pi of each object (assuming these are known in advance) instead of the MTTR in the caching gain. We have found that the MTTR mechanism approximates the popularity of each object well enough (yielding results that are very close, although slightly worse, to those computed using the pi values), and thus we can compare CSP-NO-AUX and CSP. The negative impact of using the auxiliary cache is the possible lost hit explained earlier. The performance is affected in a manner that is inversely proportional to the cache size. More precisely, as you can see in figure 12, for small caches (1%) and θ=0.8 the hit ratio increases by 47.9% when using the auxiliary cache, while for large caches it increases by 39.6%. The latency reduction observed for small caches is 17.3%, while for large caches this reduction is 36.3% (figure 13). This also agrees with the observation we made before: an increase in the hit ratio, especially for smaller caches, does not correspond to a proportional reduction in latency. These numbers, even if one removes the benefits of employing the exact popularities versus employing MTTR, show that employing an auxiliary cache is very beneficial.


We also examined the impact of the admission control policy under the same configuration (2 proxies, θ=0.8). The new algorithm CSP-NO-ADM has an auxiliary cache but no admission control policy.

[Figure 12. Comparison of CSP with CSP-NO-AUX based on the Hit Ratio metric, under varying cache size. (θ=0.8)]

[Figure 13. Comparison of CSP with CSP-NO-AUX based on the latency metric, under varying cache size. (θ=0.8)]

Having an admission control policy improves performance. More precisely, CSP performs better than CSP-NO-ADM when considering the hit ratio: the increase observed is 37.8% for small cache sizes and 3.6% for large caches. This indicates that admission control is important for achieving high hit ratios, especially for smaller cache sizes. The latency reduction observed when using admission control, though, is only 2.1% for small caches and 2.6% for large caches. The admission control plays a role in performance because it prevents overly frequent replacements, which degrade performance, and it caches objects only if it is profitable to do so.

A.2 The Impact of Collaboration

The overall performance of LRU and CSP in terms of the hit ratio metric improves as the number of cooperating caches increases. We found that the maximum increase in the hit ratio, of 29.6%, occurs when employing the CSP algorithm, between a 1-proxy and a 5-proxy configuration, for large cache sizes and θ


equal to 0.6. On the other hand, the overall performance in terms of communication latency is not improved as much by introducing collaboration: we found a decrease of 11.7% when employing the CSP algorithm for the same configuration. The same observation holds for the performance on network bandwidth requirements, where the corresponding decrease is 8.8%.
As we mentioned in section 3, in the traces driving each proxy, 60% of all accesses referred to common objects. In order to study the benefits of collaboration further, we increased this percentage to 80%. The results, in general, showed modest improvements in latency and bandwidth requirements. For instance, for the configuration of 2 proxies employing the CSP replacement algorithm, θ=0.8, and cache size 10%, the latency increased by 12% when the popularity of the common objects fell from 80% to 60%; the corresponding increase in network bandwidth requirements was 11%. Furthermore, for large cache sizes (30%) we found that the collaboration is not beneficial, because there is plenty of space in the local caches to store objects.
We also studied the effect of the bandwidth of the link connecting the collaborating caches. We increased the bandwidth from 2 Mbps to 100 Mbps and observed a small reduction in latency. For example, for cache size 10%, θ=0.8, and 2 proxies, the improvement in latency was 3.64% when the percentage of references to common objects was 80%.

A.3 The Impact of Size

We examined how the size component in the caching gain function affects performance. We degrade the influence of the object size by considering the three algorithms CP-SQRT(S), CP-LOG(S), and CP. We briefly report on the performance of these algorithms and of the CSP algorithm for small cache sizes, a 2-proxy environment, and a relatively skewed workload (θ=0.8). We concluded that it is worthwhile to include the size in the caching gain, because otherwise the hit ratio decreases by 80.4% in the case where we take into account only the communication cost and the popularity of the object (i.e., the CP algorithm). At the same time, we observed only a small increase in latency, by 10.96%, and in network bandwidth requirements, by only 3.3%. We expected a higher increase in latency (and


network bandwidth requirements) due to the large decrease observed in the hit ratio. The reason for this relatively small increase in latency (and network bandwidth requirements) is, again, the dual impact of the ‘size’ term on performance. Since it makes the replacement algorithm favor smaller objects, it induces higher hit ratios, which in turn induce, in general, lower mean latencies. However, favoring smaller objects implies that larger objects are fetched from the web, which has a negative impact on latency (and network bandwidth requirements) performance.

A.4 The Impact of MTTR

We have also studied the significance of the popularity (MTTR) term in the caching gain function through the study of the CS replacement policy. We observed that by including only the size and the cost of the object in the caching gain function (i.e., the CS algorithm), the performance degrades. More precisely, for small cache sizes the hit ratio decreases by 25.3%, while for large caches it decreases by 2.8%. The latency increases by 8.6% for small caches and by 27.8% for large ones. Thus, MTTR must be employed in the caching gain in order to obtain good performance. Note again that, for small caches, the latency differences are much smaller than the hit ratio differences. For larger caches, small hit ratio differences translate into much larger latency differences. This is due to the same reasons explained earlier (i.e., higher hit ratios are due to fetching larger objects, which adversely impacts latency); only now, with larger caches, some large objects are still cached at the expense of truly very large objects, which tend to affect latency much more significantly.

