Web Proxy Cache Placement, Replacement, and the ProxyTeller¹

Peter Triantafillou²    Ioannis Aekaterinides³

Abstract

With this paper our intention is first to study comprehensively some key issues in the efficient delivery of Internet content. In particular, we report the results of our study on the performance of prominent cache replacement algorithms and their key components, as they have been proposed in the literature. We chart the problem's solution space, identifying under which conditions each algorithm is preferable and for which performance metric, and the impact of components such as the size of cached objects, their communication cost, and their popularity. Subsequently, we focus on the performance of proxy-cache placement algorithms. The results of the above two studies are finally used to meet the main goal of this research: to develop a tool which helps decide on the number and placement of proxy caches required to achieve certain performance goals with respect to network bandwidth requirements, mean user latency, and/or hit ratio. Such a tool has been missing, and it can additionally be very useful in, for example, determining the required investment in order to meet certain performance goals, and in testing the performance of new proxy placement algorithms and how they interact with underlying replacement algorithms.

1. Introduction

Currently there are billions of data objects on the web. Typically, a fraction of these objects is requested much more frequently than the rest. A web proxy is a computer system that stands between web information servers and a community of users/clients. It stores web objects locally in its cache, so that later requests for the same objects can be served from the cache rather than from the remote web server. Since some objects are requested frequently, web traffic is reduced considerably, the server is off-loaded, and user-observed latencies are reduced [1]. Therefore, web proxies appropriately placed within a network can prove very beneficial and thus constitute a key part of the fundamental infrastructure of web-based information systems.

Motivation and Goals of this Research
During the past few years a great wealth of knowledge on efficient web proxy cache replacement [6,12,13,14,15,16,17] has been accumulated. This knowledge concerns the components that replacement functions must include, as well as auxiliary algorithms and resources. Briefly, the accumulated knowledge suggests employing a cache replacement function that takes into account the objects' popularities, sizes, and access costs, favoring small, remote, and popular objects. In addition, auxiliary caches and cache admission control can help further by eliminating unfortunate, eager replacements and by maintaining key statistics (e.g., for estimating popularities). In order to build powerful web proxies and understand their performance, one must be able to appreciate the true impact and significance of these contributions and how they can be integrated. Moving to the level of the network, a great deal of attention has also been paid to the replication of server content and the suitable placement of server replicas at nodal points of the network. A number of placement algorithms that can be applied to Content Delivery Networks have been developed [8]. For obvious reasons, the efficient placement of web proxy caches, as opposed to the placement of full server replicas, is also a central issue, for which the existing server-replica placement algorithms can be applied and combined with the results on the performance of individual proxies (and, in particular, of proxy cache replacement algorithms). With this work we study these algorithms and also address the interesting issue of the interaction of the proxy-cache placement algorithms with the replacement algorithms used at individual proxies.

¹ This research was partially funded by the IST-FET project DBGlobe, number IST-2001-32645.
² Department of Computer Engineering and Informatics, University of Patras, Rio Patras, 26500, Greece, [email protected]
³ Department of Electronic and Computer Engineering, Technical University of Crete, Chania, 73100, Greece, [email protected]


The central goal of this work is to conduct performance studies to test available solutions for the problems of proxy-cache replacement and placement and to combine them in order to produce a tool that can help reach an optimal decision with respect to the number of proxy-caches and their placement within the network, required to meet certain performance goals, in terms of the metrics of mean user-observed latency, network bandwidth requirements, and cache hit ratio. We believe that such a tool is very much lacking. We have implemented such a tool, for which a demo is available.

2. Performance Study of Cache Replacement Algorithms

We have conducted a performance study with the goal of evaluating the impact of several key components and mechanisms of a cache replacement scheme. With our CSP (Cost Size Popularity) replacement algorithm, we embody these components, as they have been proposed in the literature by other researchers. We will concentrate on the performance of a single proxy cache. For this reason, we will not take into account delays experienced during the interaction between the clients' browsers and the proxy; instead, all delays will be measured from the point of view of the proxy.

2.1 Study Setup

Modelling Web Object Access Costs
In general, the access cost (measured in time units) of accessing an object from a web server includes several components [7]. The total cost includes DNS resolution times, overheads due to the TCP protocol (e.g., connection establishment times), and (proxy) server RAM access and I/O times. In addition, the access cost depends on link bandwidths and router (processing and buffering) capacities. Since the Internet itself is a collection of different servers, routers, and links with very different performance characteristics, the task of modelling web object access costs is a formidable one and is outside the scope of this paper, especially given the trend of continuous infrastructure improvements (which can also explain why the literature contains conflicting data as to the contribution of some of the above components to the overall performance [4]). A recent study [4] identified the bottleneck to be within the Internet itself. So, we focus on the communication cost component of the total access cost. We assume the existence of well-configured and efficient DNS servers, proxies, web servers, and routers, as well as efficient transport protocols.

For our purposes we wish to employ a simple communication cost model that satisfies four requirements: i) it reflects the cost differences when fetching web objects of different sizes, ii) it is parametric and sensitive to the load in the Internet, iii) it reflects the fact that different proxy cache replacement algorithms have different impacts on the Internet load, due to the different hit ratios they achieve, and iv) it reflects the different characteristics of the links traversed in typical scenarios. We believe that a model satisfying these requirements can be simple enough yet powerful enough to allow the proper evaluation of different replacement algorithms with respect to their latency performance. For these reasons we have modelled the communication costs as follows.

Modelling Communication Cost
The communication time needed to retrieve an object is not directly proportional to its size. The time also depends on the distance (hops) and the link bandwidths between the server that holds the object and the proxy. To model network retrieval times we assume that requested objects belong to four categories. This categorization is based on observations we have made by examining real-world web accesses and counting how many and what type of links are required to retrieve an object from our own site. These categories are:
i) Local objects: 2 hops, each with a 2 Mbps link (reflecting, say, accesses within Crete).
ii) Nearby objects: 2 hops, each with a 2 Mbps link, and 2 hops, each with a 32 Mbps link (reflecting, say, accesses within Greece).
iii) Distant objects: 2 hops with a 2 Mbps link, 2 hops with a 32 Mbps link, and 15 hops with a 50 Mbps link (reflecting, say, accesses within Europe).
iv) Very distant objects: 2 hops with a 2 Mbps link, 2 hops with a 32 Mbps link, and 30 hops with a 50 Mbps link (reflecting, say, accesses outside Europe).


A certain percentage of the requests refers to each one of these categories; in our traces we have used 5% of all requests for local objects, 15% for nearby, 40% for distant, and 40% for very distant objects.

We model the delay at each hop by an M/M/1 queuing system, where the arrivals form a Poisson process at a constant average rate. The interarrival and service times are independent of each other and each has an exponential distribution. Since the service times depend on an object's size, the size distribution of objects must follow the exponential distribution. The (body of the) size distribution used in our workloads, as found in the literature [2] and in the traces we examined, obeys the lognormal distribution. We found that the particular lognormal distribution used exhibits only a slightly higher standard deviation than the one we should expect if the distribution were exponential. In fact, the value of sqrt(var(size)) / E(size), where E(size) denotes the expected value of the variable 'size' and var(size) denotes its variance, is 1.3 (which is very close to the value of 1 in the exponential distribution case). Thus, our assumption of an exponential distribution, which is implicit in the use of the M/M/1 queuing system, seems justified.

Now, for each hop, consider that the link utilization is ρ and the mean number of clients waiting in the queue is N. The communication latency at each hop is written as:

D = (N + 1) · (1/µ), where N = ρ / (1 − ρ) and 1/µ = size / BW

Here, size is the object's mean size in bytes and BW is the bandwidth of the link. The total communication latency (CL), for example, for the fourth category of objects (very distant) can be written as:

CL = (N + 1) · (size / BW_local) · 2 + (N + 1) · (size / BW_national) · 2 + (N + 1) · (size / BW_international) · 30

where BW_local = 2 Mbps, BW_national = 32 Mbps, and BW_international = 50 Mbps. (These link bandwidths are average bandwidths observed when accessing objects in Europe from Greece.) The communication cost for accessing objects in the other categories is derived in the same way. We assign the maximum value of link utilization, ρ_max (= 0.8), to the proxy configuration that generates the maximum network traffic (measured in Kbytes, Total Kbytes_max). For the remaining configurations, p, we use a different value ρ_p, computed as:

ρ_p = (Total Kbytes_p / Total Kbytes_max) · ρ_max
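For concreteness, the following is a minimal sketch (in Python) of the cost model above: an M/M/1 delay per hop, summed over the hops of each object category, with the link utilization scaled by the traffic a proxy configuration generates. The function names and the unit convention (sizes in bytes, converted to bits before dividing by the link bandwidth) are our own assumptions, not part of the original study.

BW_LOCAL = 2e6            # 2 Mbps
BW_NATIONAL = 32e6        # 32 Mbps
BW_INTERNATIONAL = 50e6   # 50 Mbps

# (number of hops, link bandwidth) pairs per object category
CATEGORIES = {
    "local":        [(2, BW_LOCAL)],
    "nearby":       [(2, BW_LOCAL), (2, BW_NATIONAL)],
    "distant":      [(2, BW_LOCAL), (2, BW_NATIONAL), (15, BW_INTERNATIONAL)],
    "very_distant": [(2, BW_LOCAL), (2, BW_NATIONAL), (30, BW_INTERNATIONAL)],
}

def hop_delay(size_bytes, bw_bps, rho):
    """Per-hop M/M/1 delay: D = (N + 1) * (1/mu), with N = rho / (1 - rho)
    and 1/mu = size / BW."""
    n = rho / (1.0 - rho)
    service_time = size_bytes * 8.0 / bw_bps
    return (n + 1.0) * service_time

def communication_latency(size_bytes, category, rho):
    """Total latency CL: per-hop delay times the number of hops, summed
    over the hop groups of the category."""
    return sum(hops * hop_delay(size_bytes, bw, rho)
               for hops, bw in CATEGORIES[category])

def utilization(total_kbytes_p, total_kbytes_max, rho_max=0.8):
    """rho_p = (Total Kbytes_p / Total Kbytes_max) * rho_max."""
    return (total_kbytes_p / total_kbytes_max) * rho_max

# Example: a 15 Kbyte object in the "very distant" category at rho = 0.8
print(communication_latency(15 * 1024, "very_distant", utilization(1.0, 1.0)))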

We believe that the above model offers definite advantages compared to related work, in which the latency is either taken from trace files (an unfortunate decision, since repeated requests give drastically different values) or calculated using oversimplified formulas (such as object_size / bandwidth).

We have used real web traces to drive our proxy servers. However, in this paper we report the performance results using a tool called SURGE [2] for generating web proxy workloads, since it gives us the flexibility to test the sensitivity of our results with respect to different values of system parameters (such as the skewness of the access distribution) which are found in different traces. At any rate, our results with the SURGE tool are very similar to those we obtained from the real traces.

Creating Synthetic Workloads
SURGE (Scalable URL Reference Generator) [2] was used to generate the workloads. In particular, SURGE's models can be varied to explore expected future demands or other alternative conditions. Each generated trace contains 140,000 requests and corresponds to the one-day-long workloads observed in real traces. There are approximately 35,000 unique objects in each trace. We used the following distributions for file size, popularity, and temporal locality.
i) Object Sizes. Size distributions can be heavy-tailed, meaning that a proxy server must deal with highly variable object sizes. The model used here is a lognormal distribution for the body and a Pareto distribution for the tail. The mean object size is 15 Kbytes.
ii) Popularity. This property of the workload reflects the probability of referencing particular objects. The popularity distribution of Web objects has been shown to follow Zipf's Law [3].


iii) Temporal Locality. Based on studies of real web traces, we have chosen to model the distribution of stack distances using the lognormal distribution.
The traces we used have very similar characteristics to real traces (not elaborated here for space reasons). In order to study the impact of varying the parameter θ of the Zipf distribution on proxy cache performance, we generated three sets of traces corresponding to three different values of θ: θ = 1.0, 0.8, and 0.6. Studies show that expected θ values range from 0.6 to 0.8 [3]. However, related work also considers θ values up to one [11,2].
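As an illustration of how such a synthetic request stream could be produced, the following sketch (in Python, using NumPy) draws object popularities from a Zipf-like distribution and object sizes from a lognormal body with a Pareto tail. The specific distribution parameters other than θ and the 15 Kbyte mean size are illustrative assumptions, not SURGE's actual settings, and the lognormal stack-distance (temporal locality) model is omitted for brevity.

import numpy as np

rng = np.random.default_rng(0)

def zipf_popularities(num_objects, theta):
    # P(rank i) proportional to 1 / i**theta, normalized over all objects
    ranks = np.arange(1, num_objects + 1)
    weights = 1.0 / ranks ** theta
    return weights / weights.sum()

def object_sizes_kb(num_objects, mean_kb=15.0, tail_fraction=0.07):
    # Lognormal body with a Pareto tail; the tail fraction and the shape
    # parameters are assumptions chosen to give roughly a 15 Kbyte mean.
    body = rng.lognormal(mean=np.log(mean_kb) - 0.5, sigma=1.0, size=num_objects)
    tail = mean_kb + mean_kb * rng.pareto(a=1.2, size=num_objects)
    return np.where(rng.random(num_objects) < tail_fraction, tail, body)

def generate_trace(num_requests=140_000, num_objects=35_000, theta=0.8):
    probs = zipf_popularities(num_objects, theta)
    sizes = object_sizes_kb(num_objects)
    ids = rng.choice(num_objects, size=num_requests, p=probs)
    return [(int(i), float(sizes[i])) for i in ids]   # (object id, size in Kbytes)

trace = generate_trace()
print(len(trace), trace[:3])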

2.2 Tested Cache Replacement Algorithms and Mechanisms

In our study we measured and compared the performance of several replacement policies under a cooperative environment of two proxies that share their content. These policies are based on the well-known LRU, the Cost Size Popularity (CSP) algorithm, and the CP-SQRT(S), CP-LOG(S), CP, and CS algorithms. In addition, we also tested the performance of the well-known GDS algorithm [6]. A detailed description of how these algorithms work follows after an explanation of the basic mechanisms used.

Caching Gain
For every object in the cache we compute its caching gain, which is a mathematical expression involving (for the case of the CSP replacement algorithm) the size, the popularity, and the communication cost to retrieve the object. Objects with smaller caching gains are more likely candidates for eviction.

Popularity Modelling/Estimation
Since an object's popularity is not known in advance, we approximate it by computing the object's request rate λi, which indicates how popular the object is. Since λi = 1/MTTRi, where MTTRi is the Mean Time To Reaccess the object, we can compute MTTR using a well-known technique found in the literature (such as the one in [5]). More precisely, MTTR is computed as a weighted sum of the interarrival times between consecutive accesses. Consequently, MTTR at time t1 is MTTR(t1) = (1 − α)(t1 − t0) + α·MTTR(t0), where t0 is the most recent access time and t1 is the current time. The averaging factor α can be tuned to 'forget' past accesses faster or more slowly. In our experiments we have chosen a value of 0.5 for α.

Admission Control and the Use of an Auxiliary Cache
The main task of an admission control policy is to decide which objects should be cached and which should not. Studies have shown that an admission control policy works well, especially if we combine it with a small auxiliary cache that acts as an additional filter. The auxiliary cache contains metadata information about each object and is also needed in order to compute the MTTR of each object.

Putting Everything Together
For objects fetched from the web for the first time, the proxy simply enters a record in the auxiliary cache with the object's id and its reference timestamp. When an object is referenced for the second time, its MTTR value is computed and the admission controller is called to determine whether the object should be cached. This decision is based on the caching gain function associated with the replacement algorithm. The replacement algorithm determines all the objects that would be evicted to make room for the new object. This action is taken only if the admission controller determines that the new object has a greater caching gain than the sum of the caching gains of the candidate objects for eviction. Note that when using an auxiliary cache, we may experience one lost hit (since on the first reference only metadata is cached).

The LRU Algorithm
LRU deletes as many of the least recently used objects as necessary to have sufficient space for the newly accessed object. LRU does not employ an auxiliary cache or admission control.

The CSP (Cost Size Popularity) Algorithm
This algorithm takes into account the size, the communication access cost, and the popularity of an object. For every object, i, in the cache we compute its caching gain, which can be written as:


CG_i = Cost_i / (Size_i · MTTR_i)

The replacement algorithm orders the objects by descending caching gain. If the admission controller permits it, it evicts from the cache the objects with the lowest caching gain values until enough space is free to accommodate the newly requested object. We also examined some extensions of CSP, changing the caching gain function in order to observe how (i) the size term and (ii) the popularity term influence the replacement algorithm's performance. These algorithms are CP-SQRT(S), which has the square root of the size in the denominator of the caching gain function; CP-LOG(S), which has the logarithm of the size in the denominator; CP, which does not include the size term at all; and CS, which does not take into account the popularity term.

Performance Metrics
The following are the key performance metrics:
i) The hit ratio, defined as the percentage of requests that hit the cache out of all requests.
ii) The client response time (latency), measured in milliseconds. Since we are concentrating on proxy server design issues, it does not include the network latency incurred for the communication between the client's browser and the proxy.
iii) The network bandwidth requirements (web traffic), defined as the amount of data (in Kbytes) retrieved over the web during the replay of the trace file.
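To make the interplay of these mechanisms concrete, the following is a minimal sketch (in Python) of the MTTR estimate, the caching gain CG_i = Cost_i / (Size_i · MTTR_i), and the admission-control rule that compares the new object's gain against the summed gains of the eviction candidates. The data structures and function names are our own; this is a sketch under stated assumptions, not the authors' implementation.

ALPHA = 0.5   # averaging factor for the MTTR estimate, as chosen in the text

class AuxEntry:
    """Auxiliary-cache record: last access time and current MTTR estimate."""
    def __init__(self, last_access):
        self.last_access = last_access
        self.mttr = None          # unknown until the second reference

def update_mttr(entry, now):
    # MTTR(t1) = (1 - alpha)*(t1 - t0) + alpha*MTTR(t0); on the second
    # reference the interarrival gap itself serves as the first estimate.
    gap = now - entry.last_access
    entry.mttr = gap if entry.mttr is None else (1 - ALPHA) * gap + ALPHA * entry.mttr
    entry.last_access = now
    return entry.mttr

def caching_gain(cost, size, mttr):
    # CG_i = Cost_i / (Size_i * MTTR_i)
    return cost / (size * mttr)

def admit_and_evict(cache, new_id, new_size, new_gain, free_space):
    """cache: dict object_id -> (size, gain). Evict the lowest-gain objects
    until the new object fits, but only if its gain exceeds the sum of the
    gains of the eviction candidates (the admission-control rule)."""
    need = new_size - free_space
    victims, reclaimed, victims_gain = [], 0.0, 0.0
    for oid, (size, gain) in sorted(cache.items(), key=lambda kv: kv[1][1]):
        if reclaimed >= need:
            break
        victims.append(oid)
        reclaimed += size
        victims_gain += gain
    if need > 0 and (reclaimed < need or new_gain <= victims_gain):
        return False              # admission controller rejects the object
    for oid in victims:
        del cache[oid]
    cache[new_id] = (new_size, new_gain)
    return True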

2.3 Performance Results

The detailed results are omitted for space reasons. However, they are available at [19]. In the following, we present the main results and conclusions from this study.

Results on Hit Ratio Performance
Comparing LRU and CSP on hit ratio, LRU outperforms CSP for large cache sizes and when the workload is more uniform. In more skewed workloads (θ ≥ 0.8) CSP is better, except for very large caches. More precisely, CSP performs better by 75.3% for small cache sizes, where space is at a premium. The main reason is that the popularity factor in CSP pays off, because the access distribution is more skewed. For very large cache sizes, though, where the available space is enough to accommodate many objects, even LRU can keep enough hot objects and derive higher hit ratios.

Results on Latency Performance
A key observation when examining our latency results is that the increase in the hit ratio of CSP versus that of LRU does not yield an analogous decrease in latency, especially for small caches. For example, for a 2-proxy environment, cache size 1%, and θ=0.8, we saw an improvement of 75.3% in the hit ratio, while the corresponding decrease in latency is only 18.6%. We also saw that a very small difference in the hit ratio of CSP and LRU sometimes translates into a relatively big difference in latency. This is attributable to the dual role the 'size' term is playing. It helps improve hit ratios, which can improve mean latencies. But it also results in CSP fetching larger objects from the web, a fact that hurts its latency (and network bandwidth requirements) performance. We also examined the communication latency performance with maximum link utilization ρmax equal to 0.9 and 0.95, which corresponds to very heavily loaded networks. For small cache sizes, the latency advantage of CSP over LRU grows as the link utilization increases. As the cache size increases, the performance difference between the two algorithms shrinks. The reason is that with large caches the communication bandwidth requirements of both algorithms are reduced, and thus they do not depend as much on the network backbone. For small cache sizes, where the algorithms generate heavy network traffic, even the slightest difference in hit ratio results in high mean latencies, because more accesses are satisfied from the network.


Results on Bandwidth Performance
A key conclusion is that the dramatic improvements enjoyed by CSP when examining the hit ratio metric do not exist for the network bandwidth metric. For example, for θ=0.8, cache size = 1%, and 2 proxies, CSP enjoys a hit ratio that is 75% higher than that of LRU. However, this translates to only about a 4% improvement in terms of network bandwidth requirements. The explanation is similar to that given for the low latency improvements above.

The Effect of the Auxiliary Cache
The auxiliary cache is necessary in order to reliably compute the MTTR for each object. Its drawback is that we lose one hit (the first one) compared to LRU. Suppose that we have a request for object i. It enters the auxiliary cache. On the second request for the same object, it tries to enter the main cache. The outcome depends on the admission control policy. On the third request for i we may have a hit. If we use the LRU policy, we will have a hit on the second request, because neither the auxiliary cache nor the admission control policy exists. So, with the CSP policy we lose at least one hit for every object that enters the cache. The performance is affected in a manner that is inversely proportional to the cache size. More precisely, for small caches (1%) and θ=0.8 the hit ratio increases by 47.9% when using the auxiliary cache, while for large caches it increases by 39.6%. The latency reduction observed for small caches is 17.3%, while for large caches this reduction is 36.3%.

The Effect of the Admission Control
Having an admission control policy improves performance. More precisely, CSP with admission control achieves a higher hit ratio than CSP without it: the increase observed is 37.8% for small cache sizes and 3.6% for large caches. This shows that admission control is important for achieving high hit ratios, especially for smaller cache sizes. The latency reduction observed when using admission control, though, is only 2.1% for small caches and 2.6% for large caches.

The Effect of Size
We examined how the size component in the caching gain function affects performance. We degrade the influence of the object size by considering the three algorithms CP-SQRT(S), CP-LOG(S), and CP. We concluded that it is worthwhile to include the size in the caching gain, because otherwise the hit ratio decreases by 80.4% when we take into account only the communication cost and the popularity of the object (i.e., the CP algorithm). At the same time, we observed only a small increase in latency, by 10.96%, and in network bandwidth requirements, by only 3.3%.

The Effect of MTTR
We have also studied the significance of the popularity (MTTR) term in the caching gain function through the study of the CS replacement policy. We observed that by including only the size and the cost of the object in the caching gain function (i.e., the CS algorithm), the performance degrades. More precisely, for small cache sizes the hit ratio decreases by 25.3%, while for large caches it decreases by 2.8%. The latency increases by 8.6% for small caches and by 27.8% for large ones. As a result, MTTR must be employed in the caching gain in order to achieve good performance.

Comparing LRU, CSP, and GDS
In another thread, we have also compared the performance of another well-known algorithm, GDS [6], against that of CSP and LRU. (Due to its peculiarities, GDS cannot be modeled by CSP, although it takes into account similar terms.)
We have found that in general GDS outperforms CSP and LRU in terms of hit ratio, with our results comparing GDS and LRU being similar to those reported by the GDS authors in [6]. However, our results also show that GDS performs poorly in terms of the latency and network bandwidth metrics. We have measured the performance of GDS(1), which tries to maximize the hit ratio, and GDS(lat), which tries to minimize overall latency. GDS(1) outperforms the other policies in all cases when considering hit ratio, except in skewed workloads (θ=1.0) and small cache sizes (1%), where CSP is marginally better than GDS(1). This shows that GDS(1) does its job well, as it tries to maximize the hit ratio.


The latency performance of GDS(1), for all cache sizes, gets worse as θ grows. It performs better than LRU and CSP for more uniform workloads, while for θ=1.0 the situation is reversed. The performance of GDS(lat) is worse than that of GDS(1) for all workloads and cache sizes, except for the case of a skewed workload and a large cache size. In general, for small caches and skewed workloads (θ = 1.0) CSP performs better by 49.1% compared to GDS(lat) and by 23.8% compared to GDS(1), while in more uniform workloads (θ = 0.6) GDS(1) outperforms the other policies, as mentioned in [6]. For large caches LRU performs better only in skewed workloads. For small cache sizes and more skewed workloads CSP is better than GDS(1) by 1.2% and than GDS(lat) by 4.6% when considering network bandwidth requirements, while for more uniform workloads and large caches LRU is better than GDS(1) by 18.1% and than GDS(lat) by 17.7%. The worst is GDS(1), even though it enjoys higher hit ratios, because it prefers to store small objects and fetch large ones from the web, which results in higher network congestion.

Problem Solution-Space Chart and Discussion
Throughout our study we observed that there is a disparity between the hit ratio results and the results concerning our two other metrics (latency and bandwidth requirements). As can be seen in Table 1, GDS(1) is the preferred policy when considering hit ratio, while in the cases where the performance metric is latency or bandwidth requirements, GDS(1) is not as good as we would expect. The obvious conclusion is that the hit ratio metric is a very poor predictor of the performance of complex multi-term replacement algorithms. We also see that the performance results regarding the latency metric follow the same trends as the results regarding the bandwidth requirements metric. The fact that the bandwidth requirements results were explicitly measured in our experiments leads us to believe that our analytically estimated latency results are valid. Note also that the disparity between the hit ratio and latency results is smaller than that between the hit ratio and the network bandwidth requirements results. This is due to the fact that the improvement in hit ratio comes partly from expelling larger objects from the cache. However, fetching these larger objects from the web adversely impacts network bandwidth requirements in a direct way. On the other hand, latency is only partially adversely affected (since latency is also affected by other factors, such as the number of hops, link bandwidths, etc.). Table 1 summarizes our findings and charts the problem's solution space.

θ (Zipf parameter) | Cache Size (% of maximum required space) | Hit Ratio     | Latency                | Bandwidth Requirements
0.6                | 1%                                        | GDS(1)        | GDS(1), LRU, GDS(lat)  | LRU
0.6                | 30%                                       | GDS(1)        | GDS(1), LRU, GDS(lat)  | LRU
0.8                | 1%                                        | GDS(1), CSP   | CSP, GDS(1)            | CSP
0.8                | 30%                                       | GDS(1)        | GDS(1), LRU, GDS(lat)  | LRU
1.0                | 1%                                        | CSP, GDS(1)   | CSP                    | CSP
1.0                | 30%                                       | GDS(1)        | LRU                    | LRU

Table 1. Overall performance based on the hit ratio, latency, and bandwidth requirements metrics.

3 Placement of Web Proxy Caches

3.1 Study Setup
We model the network in which the proxy placement takes place as a graph with nodes and edges connecting them. This graph could represent the backbone network of a large ISP. Each node covers a specific geographic region, with thousands of users connected to it.


Thus, supposing that the server is located in the geographical region covered by node 1 and a user connected to node 3 requests an object from the server, then the request will travel from node 3 to node 1, and vice versa. The edges are weighted. These weights represent the time (in msec) required to transmit a packet of information from one node to another. The edge weights therefore reflect how loaded the links connecting the nodes are. This weight depends on the geographical distance of the nodes, the delay at the routers, the delay due to packet processing, etc. Nodes also have weights. The node weight represents the quantity of information that all users connected to this node request from the web server. Suppose, therefore, that the web server is connected at node 1 and that a trace file, which represents the expected workload of the web server, is available. In order to calculate the node weights we follow this process: initially, users are grouped into clusters based on their geographical position, meaning that users found to be geographically close belong to the same cluster. The clustering is based on the user IP addresses (which exist in the trace file) and related information from BGP routing tables [10]. The N largest clusters are randomly assigned to the N nodes (where N is the total number of nodes). We then calculate the amount of information (in MBytes or GBytes) that each cluster requests from the web server, and this number constitutes the weight of the corresponding node. The proxy placement algorithm will use the edge weights, the node weights, and the network topology information in order to decide where and how many proxy servers to place.

3.1.1 Proxy Server Performance Estimation
Apart from node weights, edge weights, and the network topology information, the performance of the proxy servers plays an important role. The selected performance metric is the Byte Hit Ratio (BHR), which depends on the characteristics of the request stream, the available storage size, and the replacement algorithm. The estimation of the BHR (Byte Hit Ratio = 1 − Byte Miss Ratio (BMR)) of a proxy server that is going to be placed in the network is based on a performance table (like Table 1 of Section 2), which was derived through a number of experiments under various application environments and system configurations. More precisely, we measured the proxy server performance under different sizes of available cache space, different values of the parameter θ of the Zipf popularity distribution, and different replacement algorithms (GDS(1), GDS(lat), LRU, and CSP). Thus, for each combination of these values, we know which replacement algorithm is preferred and what its performance is, expressed in BHR. The methodology for assigning the right BHR to each proxy server is therefore the following: initially, for each cluster of users we calculate the characteristics of their request stream, namely the value of the parameter θ and the available cache size. The latter is expressed as the percentage of the maximum disk space required for storing all requested objects. (In other words, if a proxy server k is deployed with a 1 GB cache and the aggregate size of all objects requested by the cluster connected to proxy k is 2 GB, then the percentage is 50%.) Finally, we consult the performance table so as to find the BHR of the proxy server as well as the algorithm that is going to be used, in order to yield the best possible overall performance.
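The following is a hedged sketch (in Python) of the node-weight and BHR assignment steps just described. For illustration only, a user cluster is approximated by an IP prefix, whereas the paper's methodology uses BGP routing-table information [10]; the performance-table entries shown are placeholders, not the measured values.

from collections import defaultdict

def cluster_weights(trace):
    """trace: iterable of (client_ip, object_size_bytes) tuples.
    Returns the number of bytes requested per (approximate) cluster."""
    weights = defaultdict(int)
    for ip, size in trace:
        prefix = ".".join(ip.split(".")[:2])   # crude stand-in for a BGP prefix
        weights[prefix] += size
    return weights

def node_weights(weights, nodes):
    """Assign the N largest clusters to the N graph nodes (the paper assigns
    them randomly; any one-to-one assignment works for this illustration)."""
    largest = sorted(weights.values(), reverse=True)[:len(nodes)]
    return dict(zip(nodes, largest))

# (theta, cache-size percentage) -> (preferred replacement algorithm, BHR);
# the numbers below are placeholders, not the measured values.
PERFORMANCE_TABLE = {
    (0.8, "1%"):  ("CSP", 0.30),
    (0.8, "30%"): ("GDS(1)", 0.55),
}

def proxy_configuration(theta, cache_pct):
    return PERFORMANCE_TABLE.get((theta, cache_pct), ("LRU", 0.45))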
3.1.2 Modelling of the Total Proxy Placement Cost
Consider that n_wt_i is the weight of node i, e_wt_ij is the edge weight between nodes i and j, d(i,j) is the minimum distance between nodes i and j, N is the number of nodes in the graph, and Cost(G) is the total cost of a proxy placement in a network modelled by a graph G (we also refer to it as the total graph cost). The minimum distance d(i,j) is defined to be the minimum sum of edge weights along a path from i to j, computed after examining all possible paths. Note that d(i,i) = 0. Consider also that objects not served by a proxy server can be served by a nearby proxy with a specific probability. For example, suppose that there are two proxy servers, a web server, and one other node. Each proxy server has stored a quantity of information D. A percentage of D (we call it OL, for OverLapping) is common between the proxy servers. Therefore, the overlapped quantity of information is equal to OL*D. The remaining information (which is not common between the proxy servers) can produce a hit for user requests that have already generated a miss at the other proxy server. Thus, the quantity of information that was not served by the first proxy server (which is BMR_2 * n_wt_1) is served by the second proxy server with probability OL * BHR_3.


The algorithm for computing the graph cost is:
1. Set Cost(G) = 0 and start with node i = 1.
2. Find the closest proxy server to i. Call it k.
3. Cost(G) = Cost(G) + n_wt_i * d(i,k)
4. Find the closest proxy server to k. Call it j.
5. Cost(G) = Cost(G) + n_wt_i * d(k,j) * BMR_k, where BMR_k = 1 − BHR_k * OL
6. k = j
7. Go to step 4 if k ≠ server
8. i = i + 1
9. Go to step 2 if i ≤ N

With the above cost calculation in mind, the placement algorithm tries to place the proxy servers at the nodes that lead to the minimum total proxy placement cost. We also note that the distance d(i,j) between two nodes can model either the transmission delay of a data packet from one node to the other or the number of hops on the path from i to j.

3.1.3 Web Proxy Cache Placement Algorithms
Despite the fact that the setup of the system presented in this section is different from that of a CDN with full server replicas placed throughout the network, we can apply the existing placement algorithms to our problem as well. Note, however, that the results that emerge must be combined with the results of Section 2 on the performance of individual proxy caches, in order to understand the performance of the different cache placement algorithms. We examined the performance of the well-known Greedy, HotSpot, and Random algorithms. In the study that follows we do not present results for the tree-based algorithm [9], because its performance was found, as expected, to be comparatively poor when applied to a graph rather than a tree topology.

The Greedy Algorithm
The Greedy algorithm works as follows. Suppose that we want to place M replicas among N potential nodes. We choose one node at a time. In the first iteration all the nodes are evaluated to determine which is the most appropriate for placing a replica. The total cost of the system is calculated assuming that all user requests are directed to the node selected in that evaluation. At the end of the iteration we select the node that gives the smallest total cost. In the second iteration we try to select the second node that will store a replica (given that the first node is already selected), and again we select the node with the smallest total cost. The total cost is calculated assuming that user requests are directed to the replica node that is closest to them, that is to say, the one that leads to the smallest cost. This process is repeated until M nodes are found [8].

The Hot Spot Algorithm
The basic idea behind this algorithm is the placement of replicas near the communities of users that generate the greatest load. Thus, the N nodes are sorted according to the amount of traffic generated in their neighborhood. The replicas are placed at the nodes whose neighborhoods produce the greatest load. The neighborhood of a node A is defined to be the circle centered at A with some radius. The radius can vary from 0 to the maximum distance between two nodes [8].

The Random Algorithm
In this algorithm the nodes that will contain the replicas are selected randomly.
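The following is a runnable sketch (in Python) of the graph-cost computation above and of the Greedy placement just described. The graph representation, the function names, and the use of Floyd-Warshall for the minimum distances d(i,j) are our own choices, not part of the paper; as in the experiments of Section 3.2, a single BHR and OL value is assumed for all proxies, and successive misses are not compounded, exactly as in the listed steps.

def all_pairs_shortest(n, edges):
    """edges: dict (i, j) -> edge weight (msec); returns the matrix d(i, j)
    of minimum distances, computed with Floyd-Warshall."""
    INF = float("inf")
    d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for (i, j), w in edges.items():
        d[i][j] = min(d[i][j], w)
        d[j][i] = min(d[j][i], w)
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d

def placement_cost(d, node_wt, proxies, server, bhr=0.45, ol=0.55):
    """Total cost Cost(G) for a set of proxy nodes, following steps 1-9 above."""
    sites = set(proxies) | {server}
    cost = 0.0
    for i, wt in enumerate(node_wt):
        k = min(sites, key=lambda s: d[i][s])      # closest proxy (or server) to i
        cost += wt * d[i][k]                       # step 3
        visited = {k}
        while k != server:
            nxt = min(sites - visited, key=lambda s: d[k][s])   # step 4
            cost += wt * d[k][nxt] * (1.0 - bhr * ol)           # step 5
            visited.add(nxt)
            k = nxt                                             # steps 6-7
    return cost

def greedy_placement(d, node_wt, server, m, bhr=0.45, ol=0.55):
    """Pick M proxy sites one at a time, each time adding the node that
    minimizes the total cost given the nodes already chosen."""
    chosen = []
    candidates = set(range(len(node_wt))) - {server}
    for _ in range(m):
        best = min(candidates - set(chosen),
                   key=lambda c: placement_cost(d, node_wt, chosen + [c], server, bhr, ol))
        chosen.append(best)
    return chosen

# Example use (hypothetical 4-node graph, server at node 0):
# d = all_pairs_shortest(4, {(0, 1): 10, (1, 2): 5, (2, 3): 8, (0, 3): 20})
# print(greedy_placement(d, [0, 300, 500, 200], server=0, m=2))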

3.2 Performance Study

Network Topology
For the performance study of the three algorithms (HotSpot, Random, and Greedy), the backbone topologies of two major ISPs (AT&T and C&W) in North America were used.


In the experiments that follow we used the monthly average values of the edge weights, as collected from the ISP websites that offer real-time and monthly statistics of their network links.

Graph Cost Calculation
We compute the total cost of a proxy cache placement following the method described in Section 3.1.2. That means that a request which generated a miss at a proxy server can be served by the next closest proxy. The percentage of common data between the proxy servers is fixed at 55%, while the BHR is the same for all proxies and is equal to 45%. As reported in Section 3.1, the BHR (with the help of the described methodology) should be different for each node. However, for simplicity, in our study we consider it constant and equal to 45%.

3.3 Performance Results

The performance results were similar for both topologies, AT&T and C&W. The Greedy algorithm clearly appears to be the most efficient, while Random, as expected, is the worst. Due to lack of space we do not give detailed results. For example, placing 2 proxies on the AT&T topology achieves a 26% improvement with Greedy, while the corresponding improvement is around 12% for HotSpot and around 8% for Random. The corresponding percentages for the C&W topology are 24% for Greedy, 5.5% for HotSpot, and 4.5% for Random. We also observed that the same behavior holds even if we select different network nodes as servers. Our results agree with the performance results of other researchers [8].

4 ProxyTeller: The Proxy Placement Tool

Based on the above two studies on the placement of proxy servers in the Internet and on the performance results of cache replacement algorithms, we developed a tool [18] that aims to solve the placement problem using the Greedy placement algorithm. Its likely users are administrators who wish to appropriately place proxy servers at nodal points of the Internet. The tool could also be used for deciding on the required investments in order to enjoy certain desired performance improvements, as well as for testing different placement algorithms and how they interact with replacement algorithms. In the sections that follow we present the tool's basic components and functionality.

4.1 Development Platform

We used tools for developing interactive applications on the Internet, in order to make our placement tool easily accessible to a large number of users across the world. In fact, a demo of the tool is available [18]. More precisely, we used the PHP language along with the MySQL database. For the presentation of the tool and its interaction with the users we used HTML, while for validating the data entered by the user we used JavaScript. The proxy placement algorithms were developed in C++, while at certain points it was also necessary to use Java. The system supports a large number of users registering their data files and preferences in the database, after following a few simple steps to become members of the system.

4.2 A Short Description of ProxyTeller

Initially, the user enters the member section, either by following a few simple steps to become a member of the system or by providing a username and password. On the member web page, the user must specify certain parameters that are essential for the execution of the algorithms, by filling in the corresponding fields. These are:

Network Topology
The network topology must be described in a suitable form. We use a specific format that describes the number of nodes, the number of edges, the edge weights, and the connectivity of each node.


Expected Workload
As reported above, a trace file containing the request stream is essential in order to create the user clusters and the node weights. The characteristics of the workload are also measured so as to determine the expected proxy server performance. Thus, the user can either import a trace file which he believes reflects the future demand or use existing trace files generated by SURGE.

Available Disk Size
The cache size plays an important role in proxy server performance. Therefore, the user is asked to specify the size of the proxy caches that are going to be placed in the network.

Desired Performance Improvement
Here the user defines the percentage of performance improvement he wishes to achieve by placing proxy servers in the network. This percentage describes the reduction of the total cost achieved by the suitable placement, relative to the cost without any proxies in the network. Additionally, the user can define the number of proxies he wishes to place.

Performance Metrics
The user is asked to select the performance metric of interest for individual proxy servers. He has three choices: to minimize the bandwidth requirements, to minimize the response time, or to maximize the hit ratio. Depending on the selection, each proxy server is described by a certain Byte Hit Ratio value and a suitable replacement algorithm is proposed for it (one of LRU, GDS(1), GDS(lat), and CSP, based on our results in Section 2).

After specifying these essential parameters, the routine responsible for proxy placement is called. At the end, the user sees the results of the Greedy placement. More precisely, the user can see the network with the proxy servers placed, the percentage of improvement, the number of proxies placed, and the replacement algorithm that the tool proposes for each proxy server.

5 Contribution and Concluding Remarks

Web proxy cache replacement algorithms have received a great deal of attention from academia. The key knowledge accumulated by related research can be briefly summarized as follows. Web proxy caches, in order to yield high performance in terms of one or more of the metrics of hit ratio, communication latency, and network bandwidth requirements, must: i) employ a cache replacement policy that takes into account the communication latency to fetch objects from the Web, the size of the objects, and the (estimated) popularity of the objects; ii) employ an auxiliary cache, which holds metadata information for the objects and acts as a filter, and an admission control policy, which further reduces the probability of unfortunate cache replacement decisions; and iii) exploit 'nearby' caches, building collaborative web proxy caches across organization-wide networks, which can offer additional performance improvements. Despite the fact that the above techniques (which have been embodied in our CSP algorithm) were proposed by researchers long ago, most web proxies continue to employ the 'good old' LRU policy. Our results show LRU to be better than the sophisticated CSP algorithm in environments where the access distribution of objects is less skewed and/or in configurations where proxies enjoy large caches. Given that most web proxies use caches on magnetic disks and that there are several web traces showing moderately skewed distributions (with θ values between 0.6 and 0.7) [3], the choice of LRU may seem justified, depending on the proxy configuration and application characteristics. Our results have also shown that measuring the hit ratios of complex replacement policies (involving size, popularity, and communication cost terms) is of little value.


In fact, the hit ratio results (and thus any results based on simplistic latency metrics, which rely heavily on hit/miss ratios and average communication delays) can be very misleading. This happens because different terms in the multi-term replacement criteria may be working in conflicting ways. Based on the above results, we were able to estimate the expected performance of proxies employing the best replacement algorithm. This knowledge gave us the ability to study the performance of algorithms that try to efficiently place proxy caches (as opposed to full server replicas) at the nodes of a network. We developed a study platform and, after applying these placement algorithms, we found that the Greedy algorithm performs best. Finally, we developed a tool which gives users the ability to find the most suitable proxy placement on a given network, under various workload characteristics and proxy configurations. More precisely, given the network topology, the expected workload, the available cache size, and the performance goal (percentage of performance improvement in any one of the metrics of hit ratio, mean latency, and network bandwidth), the tool presents the most efficient placement by applying the Greedy algorithm. We believe that this tool could be very useful to researchers as well as to content delivery network administrators.

References

[1] Mohammad S. Raunak, Prashant Shenoy, Pawan Goyal, and Krithi Ramamritham, "Implications of Proxy Caching for Provisioning Networks and Servers", In Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2000), pages 66-77, Santa Clara, CA, June 2000.
[2] Paul Barford and Mark Crovella, "Generating Representative Web Workloads for Network and Server Evaluation", In Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, pages 151-160, 1998.
[3] Lee Breslau, Pei Cao, Li Fan, Graham Phillips, and Scott Shenker, "Web Caching and Zipf-like Distributions: Evidence and Implications", In Proceedings of IEEE INFOCOM, 1999.
[4] Md Ahsan Habib and Marc Abrams, "Analysis of Sources of Latency in Downloading Web Pages", In Proceedings of WebNet 2000, San Antonio, USA, November 2000.
[5] Renu Tewari, Harrick M. Vin, Asit Dan, and Dinkar Sitaram, "Resource-based Caching for Web Servers", In Proceedings of the SPIE/ACM Conference on Multimedia Computing and Networking, January 1998.
[6] Pei Cao and Sandy Irani, "Cost-Aware WWW Proxy Caching Algorithms", In Proceedings of USITS 1997.
[7] Balachander Krishnamurthy and Craig E. Wills, "Analyzing Factors That Influence End-to-End Web Performance", In Proceedings of the 2000 World Wide Web Conference / Computer Networks, May 2000.
[8] L. Qiu, V. N. Padmanabhan, and G. M. Voelker, "On the Placement of Web Server Replicas", In Proceedings of IEEE INFOCOM 2001, April 2001.
[9] B. Li, M. J. Golin, G. F. Italiano, and X. Deng, "On the Placement of Web Proxies in the Internet", In Proceedings of INFOCOM 2000, March 2000.
[10] B. Krishnamurthy and J. Wang, "On Network-Aware Clustering of Web Clients", In Proceedings of ACM SIGCOMM 2000, August 2000.
[11] Martin Arlitt and Carey Williamson, "Web Server Workload Characterization: The Search for Invariants", In Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, May 1996.
[12] Stephen Williams, Marc Abrams, Charles R. Standridge, Ghaleb Abdulla, and Edward A. Fox, "Removal Policies in Network Caches for World-Wide Web Documents", In Proceedings of ACM SIGCOMM, pages 293-305, 1996.
[13] M. Abrams, C. Standridge, G. Abdulla, S. Williams, and E. Fox, "Caching Proxies: Limitations and Potentials", In Proceedings of the 1995 World Wide Web Conference, December 1995.
[14] Elizabeth J. O'Neil, Patrick E. O'Neil, and G. Weikum, "An Optimality Proof of the LRU-K Page Replacement Algorithm", Journal of the ACM, Vol. 46, No. 1, 1999.
[15] Charu Aggarwal, Joel L. Wolf, and Philip S. Yu, "Caching on the World Wide Web", IEEE Transactions on Knowledge and Data Engineering, Vol. 11, No. 1, January/February 1999.
[16] Junho Shim, Peter Scheuermann, and Radek Vingralek, "Proxy Cache Algorithms: Design, Implementation and Performance", IEEE Transactions on Knowledge and Data Engineering, Vol. 11, No. 4, July/August 1999.
[17] Luigi Rizzo and Lorenzo Vicisano, "Replacement Policies for a Proxy Cache", Research Note RN/98/13, Department of Computer Science, University College London, 1998.
[18] ProxyTeller: A Proxy Cache Placement Tool. Demo available at http://pcp.softnet.tuc.gr
[19] Detailed Performance Results of Cache Replacement Algorithms. Available at http://pcp.softnet.tuc.gr/docs/pr.htm

