Cache Pollution in Web Proxy Servers
Rassul Ayani
Department of Microelectronics and Information Technology
Royal Institute of Technology
164 40 Kista, Stockholm, SWEDEN
email: [email protected]

Yong Meng Teo and Yean Seen Ng
Department of Computer Science
National University of Singapore
3 Science Drive 2, SINGAPORE 117543
email: [email protected]

Abstract

Caching has been used for decades as an effective performance-enhancing technique in computer systems. The Least Recently Used (LRU) cache replacement algorithm is a simple and widely used scheme, and proxy caching is a common approach to reducing network traffic and delay in many World Wide Web (WWW) applications. However, some characteristics of WWW workloads make LRU less attractive in proxy caching. In recent years, several more efficient replacement algorithms have been suggested, but these advanced algorithms require considerable knowledge about the workloads and are generally difficult to implement. The main attraction of LRU is its simplicity. In this paper we present two modified LRU algorithms and compare their performance with LRU. Our results indicate that the performance of the LRU algorithm can be improved substantially with very simple modifications.
1. Introduction

The use of the World Wide Web (WWW) has been expanding rapidly in recent years, and various techniques have been studied to make it faster. However, many Web users still experience unacceptable response times. Two categories of techniques have been investigated and partially deployed to reduce response time: (i) techniques that aim at making the communication between the customer and the source of data faster (e.g., by increasing the bandwidth of the Internet), and (ii) techniques that aim at reducing network traffic, e.g., by maintaining a copy of frequently accessed files at the customer side (so-called proxy caching).

Caching is an effective performance-enhancing technique that has been used in computer systems for decades. A cache is a medium used for temporary storage of frequently used data. Usually a proxy server is installed at the client side and may have several functions, including caching frequently accessed data. Several proxy caches have been developed in recent years, including CERN httpd [14], Harvest [15] and its successor Squid [16].

The Least Recently Used (LRU) and Least Frequently Used (LFU) cache replacement algorithms are two simple schemes that have been widely used in computer architecture and virtual memory. However, the characteristics of Web workloads, described in section 2 of this paper, make these two schemes less attractive in proxy caching. In recent years, several researchers have proposed more advanced proxy caching algorithms. For instance, the Greedy Dual Size (GDS) algorithm described in [2] has shown much better performance than LRU. However, most of these algorithms are more
0-7695-1926-1/03/$17.00 (C) 2003 IEEE
complicated, need more knowledge about the workload, and consume more processor cycles. The main attraction of LRU is its simplicity.

The term “cache pollution” means that a cache contains objects that will not be frequently used in the near future. One of the main weaknesses of LRU in WWW applications is that it suffers from “cold cache pollution”, which refers to unpopular objects that remain in the cache. For instance, in LRU, whenever a new object is inserted in the cache it is put on top of the cache stack. If the object is not popular (e.g., it is used only once), it takes some time before it is moved down to the bottom of the stack (marked as “least recently used”) and dropped from the cache. On the other hand, in LFU cache replacement, objects that were previously popular (or hot) may remain in the cache for some time even if they are no longer popular. We refer to this phenomenon as “hot cache pollution”.

In this paper, we discuss how to maintain the simplicity of LRU while improving its performance. We introduce two modified LRU algorithms to reduce cold cache pollution: LRU-Distance and Segmented LRU (SLRU). The LRU-Distance algorithm places a new object at a distance D from the bottom of the cache stack (as opposed to traditional LRU, which places new objects on top of the stack). This reduces cold cache pollution, since it takes a shorter time to drop a new object if it is not accessed again. In the SLRU algorithm, the cache is partitioned into a lower and an upper partition. An object is first placed in the lower partition and is promoted to the upper partition when it qualifies (e.g., when it is accessed k times). We have used trace-driven simulation to evaluate the effectiveness of these two algorithms in relieving cold cache pollution.
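The stack behavior described above can be sketched in a few lines of Python. This is only an illustration of the mechanism, not the authors' implementation; the byte-capacity interface and class name are our assumptions.

```python
from collections import OrderedDict

class LRUCache:
    """Baseline LRU: a new or re-accessed object goes to the MRU end of the
    stack; when space is needed, objects are evicted from the LRU end.
    Capacity and object sizes are in bytes (an assumption for this sketch)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.stack = OrderedDict()   # key -> size; MRU at the right end

    def access(self, key, size):
        if key in self.stack:        # hit: refresh to the MRU position
            self.stack.move_to_end(key)
            return True
        while self.used + size > self.capacity and self.stack:
            _, evicted = self.stack.popitem(last=False)   # drop the LRU object
            self.used -= evicted
        if size <= self.capacity:    # objects larger than the cache bypass it
            self.stack[key] = size
            self.used += size
        return False
```

Note that a one-timer inserted here starts at the MRU end and must traverse the whole stack before it is evicted, which is exactly the cold-pollution weakness discussed above.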
Our results indicate that both algorithms outperform the traditional LRU algorithm in terms of hit ratio and byte hit ratio. The rest of the paper is organized as follows: section 2 discusses Web workload characteristics for three different traces; section 3 introduces two algorithms for reducing cold cache pollution, namely LRU-Distance and the partitioned cache; section 4 presents performance results; and section 5 summarizes our observations.
2. Characteristics of Web Workloads

We selected three traces collected by different sources for this investigation. Since small and large traces exhibit different behavior, we chose a short 3-day trace and two longer 30-day traces. The 3-day trace (referred to as DEC) was collected by Digital Equipment Corp. [7]. The other two are 30-day traces downloaded from the National Laboratory for Applied Network Research, NLANR [8], referred to as SJ and BO respectively. These traces represent different types of workloads. The proxy server for the DEC trace is a corporate proxy with heavy Web traffic servicing over 10,000 clients; the original DEC trace contained four weeks of data, from which we extracted three days of requests. Unlike the DEC trace, the SJ trace was collected at a regional proxy server at the MAE-West Exchange Point in San Jose, California [8]. Table 1 summarizes the major characteristics of these three traces.
Property                                  DEC Trace      SJ Trace       BO Trace
Trace size                                262.4 MB       292.4 MB       658.5 MB
Duration                                  3 days         30 days        30 days
No. of all requests                       2,133,953      2,163,985      4,716,277
No. of distinct requests                  857,914        960,818        2,480,375
Size of all requests                      21.1 GB        41.6 GB        81.5 GB
Size of distinct requests                 11.1 GB        19.3 GB        37.8 GB
Average size of distinct requests         12,880 Bytes   20,071 Bytes   15,241 Bytes
Standard deviation of distinct requests   99,551 Bytes   215,887 Bytes  601,202 Bytes
Table 1: Major Characteristics of the Three Traces

The trace profile summarized in Table 2 conforms with the results reported by other researchers, e.g., [5,9,10]: Web access requests are non-uniform. Some objects are “popular” and accessed very frequently, while others are rarely accessed. As shown in Table 2, about 30% of the objects are accessed only once. The table also reveals that 45.5% of the requests are to objects that are accessed three times or less. On the other hand, some objects are very popular and accessed very frequently; for instance, the most popular object was accessed 4,932 times.
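The non-uniform popularity summarized in Table 2 is straightforward to measure from a raw request log. The sketch below is our own illustration (with a toy in-memory trace, not the real DEC log): it counts per-object access frequencies and reports the fraction of “one-timers”.

```python
from collections import Counter

def access_profile(urls):
    """Count how often each distinct object is requested and report the
    share of objects requested only once ("one-timers")."""
    freq = Counter(urls)                     # URL -> access count
    one_timers = sum(1 for c in freq.values() if c == 1)
    return {
        "distinct": len(freq),
        "one_timer_fraction": one_timers / len(freq),
        "most_popular": freq.most_common(1)[0],
    }

# Toy request stream: "a" is hot; "c" and "d" are one-timers.
profile = access_profile(["a", "b", "a", "c", "a", "b", "d"])
print(profile["one_timer_fraction"])  # 0.5 (2 of 4 distinct objects)
```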
                DEC Trace
                Access      No. of distinct   Percent of
                frequency   requests          all requests
Least popular   1           633,128           29.7
                2           110,266           10.3
                3            39,262            5.5
Most popular    4,932             1            0.2
                4,357             1            0.2
                4,146             1            0.2

Table 2: Access Pattern of DEC Trace

3. Reducing Cache Pollution

The trace profile summarized in Table 3 indicates that an object’s access frequency (or popularity) is non-uniform and varies over time. For instance, in the DEC trace (shown in Table 3), 45.9% of the objects are re-accessed within the first 10 minutes after they have been brought into the cache, whereas only 1.9% of them are re-accessed in the (50, 60] minute interval.

DEC Trace
Re-access interval   No. of re-accesses   Percent of all accesses
(0, 10 minutes)         585,878              45.9
(10, 20 minutes)         98,225               7.7
(20, 30 minutes)         52,058               4.1
(30, 40 minutes)         37,815               3.0
(40, 50 minutes)         29,574               2.3
(50, 60 minutes)         24,285               1.9
Total                 1,276,039             100

Table 3: Short-Term Temporal Locality

Consequently, if the least frequently used (LFU) objects are replaced whenever the cache is full, an object that was previously popular (or hot) may remain in the cache even if it is no longer accessed. Hence, using LFU may lead to “hot cache pollution”.

Now consider using LRU to replace cache objects. Traditional LRU implements the cache as a stack, where the most recently accessed object is placed at the top (Figure 1). If an object is rarely accessed, it is gradually pushed down to the least recently used position (the bottom of the stack) and then dropped. However, it may take a long time before an unwanted object (a rarely accessed one) is moved from the top of the stack to its bottom. Thus, one of the main weaknesses of LRU is that a rarely referenced object may remain in the cache for a long time and pollute it before it is dropped. This phenomenon is referred to as “cold cache pollution”. It is very likely to occur in Web caches, where the number of references to objects accessed only once is substantial (for instance, 29.7% in the DEC trace, as shown in Table 2). In this paper, we propose the following two modified LRU algorithms to reduce cold cache pollution.

3.1 LRU-Distance Algorithm

The LRU-Distance algorithm is a variant of LRU that attempts to relieve cold cache pollution in proxy caching. LRU-Distance modifies the behavior of the cache by placing new objects at a distance D from the bottom of the stack (instead of at the top), as shown in Figure 1. This allows an object that is accessed only once to be dropped from the cache faster, giving other, more useful objects a chance to stay in the cache. The distance D is measured as a number of objects from the bottom of the cache stack. For instance, if D equals 10, then an object that is accessed only once would be the tenth object to be removed from the cache. However, if the object is accessed a second time, it migrates to the most recently used position in the stack. Hence, LRU-Distance reduces the cold cache pollution caused by “one-timers” by removing objects that are accessed only once within a shorter period than traditional LRU would.
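A minimal sketch of this idea in Python follows. It is our own illustration under simplifying assumptions: capacity and D are counted in objects rather than bytes, and the class and parameter names are ours.

```python
class LRUDistanceCache:
    """Sketch of LRU-Distance: a first-time object is inserted D positions
    above the LRU end of the stack instead of at the MRU end; a re-accessed
    object moves to the MRU end exactly as in plain LRU."""
    def __init__(self, capacity, distance):
        self.capacity = capacity     # capacity in objects (simplification)
        self.distance = distance     # D, in objects from the bottom
        self.stack = []              # index 0 = LRU end, index -1 = MRU end

    def access(self, key):
        if key in self.stack:        # hit: promote to the MRU position
            self.stack.remove(key)
            self.stack.append(key)
            return True
        if len(self.stack) >= self.capacity:
            self.stack.pop(0)        # evict the LRU object
        slot = min(self.distance, len(self.stack))
        self.stack.insert(slot, key) # new object sits D objects above the bottom
        return False
```

With a small D, a one-timer is among the next objects evicted; with D equal to the cache size, the behavior degenerates to traditional LRU, matching the description above.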
3.2 Partitioned Cache

The idea of partitioning a cache has previously been employed in operating systems, where one partition contains data blocks and the other instructions. However, we apply the Segmented LRU (SLRU) algorithm to proxy caching in a different way. In our SLRU algorithm, the cache is divided into two partitions: an upper and a lower segment. Each segment uses LRU to replace cache objects. As depicted in Figure 2, when the upper segment is full, the object at the least recently used position of the upper segment is dropped to the most recently used position of the lower segment. On the other hand, when an object is accessed k times in the lower segment, it is promoted to the top of the upper segment.
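The two-segment scheme can be sketched as follows. Again this is our own illustration under assumptions the paper does not fix: segment capacities are counted in objects, and the class and parameter names are ours.

```python
from collections import OrderedDict

class SegmentedLRU:
    """Sketch of SLRU: new objects enter the lower segment; after k hits in
    the lower segment an object is promoted to the MRU end of the upper
    segment; upper-segment overflow demotes its LRU object to the lower
    segment's MRU end; lower-segment overflow evicts from the cache."""
    def __init__(self, lower_cap, upper_cap, k=1):
        self.lower = OrderedDict()   # key -> hit count; MRU at the right end
        self.upper = OrderedDict()
        self.lower_cap, self.upper_cap, self.k = lower_cap, upper_cap, k

    def access(self, key):
        if key in self.upper:                    # hit in the upper segment
            self.upper.move_to_end(key)
            return True
        if key in self.lower:                    # hit in the lower segment
            self.lower[key] += 1
            if self.lower[key] >= self.k:        # qualified: promote
                del self.lower[key]
                self._push_upper(key)
            else:
                self.lower.move_to_end(key)
            return True
        self._push_lower(key, hits=0)            # miss: insert in the lower segment
        return False

    def _push_upper(self, key):
        self.upper[key] = True
        if len(self.upper) > self.upper_cap:     # demote the upper LRU object
            demoted, _ = self.upper.popitem(last=False)
            self._push_lower(demoted, hits=0)

    def _push_lower(self, key, hits):
        self.lower[key] = hits
        if len(self.lower) > self.lower_cap:     # evict from the cache
            self.lower.popitem(last=False)
```

One-timers never leave the lower segment, so they are evicted quickly, while objects with proven popularity are protected in the upper segment.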
Figure 1: Logical flow of LRU

Figure 2: Logical flow of SLRU
4. Performance Results

In this study, we used a trace-driven simulator to compare the performance of the LRU, SLRU and LRU-Distance algorithms in terms of hit ratio and byte hit ratio. The hit ratio (HR) is defined as the number of requests found in the cache divided by the total number of requests. The byte hit ratio (BHR) is defined as the number of bytes found in the cache divided by the total number of bytes requested.

An important issue in LRU-Distance is how to determine a reasonable value for D. Figure 3 illustrates the simulation results for various values of D. In these experiments we used the BO trace and assumed a cache size of 180 MB (equivalent to 0.5% of the size of the total unique requests). As the figure suggests, the BHR increases to a certain level as D is increased, and then declines. If D is as big as the cache size, LRU-Distance is equivalent to traditional LRU.

Figure 3: BO Trace - BHR for Different Values of D in LRU-Distance

In SLRU, an obvious question is the size of each partition; another is the promotion count k, i.e., when an object should be promoted from the lower segment to the upper one. Figure 4 illustrates the impact of partition size on the BHR. In this experiment, we used the BO trace and varied the size of the lower partition from 10% to 60% of the total cache size. Furthermore, we assumed a promotion count of one, i.e., an object that is accessed again after it has been inserted in the cache is moved to the upper segment. Figure 5 illustrates the impact of k on the BHR. Here, we used the BO trace and assumed that both partitions are equal (i.e., the size of the lower segment is 50% of the total cache size).
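The two metrics can be expressed directly as a measurement loop. The sketch below is our own illustration; the (url, size) trace format and the infinite cache used as an upper bound are assumptions, not the authors' simulator.

```python
def replay(trace, cache):
    """Replay a trace of (url, size) pairs against a cache with an
    access(url, size) -> bool method; return (HR, BHR) in percent."""
    hits = requests = hit_bytes = total_bytes = 0
    for url, size in trace:
        requests += 1
        total_bytes += size
        if cache.access(url, size):
            hits += 1
            hit_bytes += size
    return 100.0 * hits / requests, 100.0 * hit_bytes / total_bytes

class InfiniteCache:
    """Upper-bound cache: never evicts, so HR and BHR equal their maxima."""
    def __init__(self):
        self.seen = set()
    def access(self, url, size):
        if url in self.seen:
            return True
        self.seen.add(url)
        return False

# Toy trace: 5 requests, 1200 bytes in total; only re-accesses of "a" hit.
trace = [("a", 100), ("b", 300), ("a", 100), ("c", 600), ("a", 100)]
hr, bhr = replay(trace, InfiniteCache())
print(round(hr, 1), round(bhr, 1))   # 40.0 16.7
```

The same loop, run with a bounded replacement policy instead of the infinite cache, yields the HR and BHR curves reported below.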
Figure 4: BO Trace - BHR for Different Lower Segment Sizes in SLRU

Figure 5: BO Trace - BHR for Various Promotion Counts in SLRU

In the next four figures we compare the performance of LRU with the SLRU and LRU-Distance algorithms on the BO and SJ traces. In this set of experiments, we assumed that in SLRU both partitions are equal and k = 1. Figures 6 and 7 illustrate the HR for various cache sizes using the BO and SJ traces. Note that the cache size refers to the total cache size, and in SLRU the two partitions are equal (i.e., each partition is half of the total cache size). The HR shown in these two figures can be compared with the maximum achievable HR, which is 49.3% for the BO trace and 56.9% for the SJ trace (these maxima were obtained assuming an infinite cache size). As the figures show, the performance of SLRU and LRU-Distance is very similar, and both perform better than LRU. On the SJ trace, the performance improvement is up to 10% for small cache sizes of 0.5% and 1%. In Figure 7, the difference in performance between the modified LRU algorithms and traditional LRU is greater for smaller cache sizes.

Figures 8 and 9 show the BHR for various cache sizes using the same two traces. The maximum achievable BHR (obtained for an infinite cache size) is 57.9% for the BO trace and 57.0% for the SJ trace. All three algorithms have the same maximum BHR when the cache size is unlimited, because no objects are replaced. It is interesting to note that on the BO trace, while both proposed algorithms outperform LRU, their performance peaks occur at different cache sizes: SLRU gives the best improvement at a cache size of 1% (a 10.1% improvement over LRU), whereas LRU-Distance peaks at a cache size of 5% (a 6.6% improvement over LRU). The results also indicate that it is more difficult to improve the BHR than the HR.

Both algorithms cause the cache to function as a partitioned cache and relieve cold cache pollution. The LRU-Distance algorithm uses a virtual partition, while the SLRU algorithm uses a physical partition. LRU-Distance is the simpler implementation of the partitioned cache because it requires only one parameter, the distance D. SLRU is the more complex implementation, as it requires two parameters: the size of the lower partition and the promotion count k.
Figure 6: BO Trace - HR of the Three Replacement Algorithms

Figure 7: SJ Trace - HR of the Three Replacement Algorithms

Figure 8: BO Trace - BHR of the Three Replacement Algorithms

Figure 9: SJ Trace - BHR of the Three Replacement Algorithms
5. Conclusions

LRU is a very simple and widely used cache replacement algorithm that suffers from cache pollution in proxy caching. In recent years, several more efficient replacement algorithms have been suggested, but these advanced algorithms require more knowledge about the workloads and are generally more difficult to implement. The main attraction of LRU is its simplicity. In this paper we presented two modified LRU algorithms, LRU-Distance and SLRU, to reduce cold cache pollution. We used trace-driven simulation to evaluate the performance of these two algorithms and to compare them with traditional LRU. Our results indicate that the performance of the LRU algorithm can be improved by up to 10% with a very simple modification.
Acknowledgements This project was supported by a joint research grant from Fujitsu Computers Singapore Ltd and the National University of Singapore.
References

1. A. Luotonen, “Web Proxy Servers”, Prentice Hall, December 1997.
2. P. Cao and S. Irani, “Cost-Aware Proxy Caching Algorithms”, Proceedings of the 1997 USENIX Symposium on Internet Technologies and Systems, December 1997.
3. A. Bestavros and S. Jin, “GreedyDual* Web Caching Algorithm: Exploiting the Two Sources of Temporal Locality in Web Request Streams”, 5th International Web Caching and Content Delivery Workshop, Lisbon, Portugal, 22-24 May 2000.
4. Digital Equipment Corporation, Digital Web Proxy Traces, ftp://ftp.digital.com/pub/DEC/traces/proxy/webtraces.html.
5. M. Arlitt and C. Williamson, “Web Server Workload Characterization: The Search for Invariants”, Proceedings of ACM SIGMETRICS ’96, May 1996, PA, USA.
6. M. Abrams and R. Wooster, “Proxy Caching that Estimates Page Load Delays”, Technical Paper, Virginia Tech Network Research Group, Computer Science Dept., Blacksburg, VA 24061-0106, December 1999.
7. M. Arlitt and J. Dilley, “Improving Proxy Cache Performance: Analysis of Three Replacement Policies”, IEEE Internet Computing, November/December 1999.
8. National Laboratory for Applied Network Research, Squid 2.0 Web Proxy Traces, ftp://ircache.nlanr.net/Traces/.
9. P. Barford, A. Bestavros, A. Bradley and M. Crovella, “Changes in Web Client Access Patterns: Characteristics and Caching Implications”, Technical Report BUCS-TR-1998-023, Boston University, November 1998.
10. E.J. O’Neil, P.E. O’Neil and G. Weikum, “The LRU-K Page Replacement Algorithm for Database Disk Buffering”, Proceedings of ACM SIGMOD, 1993.
11. A. Bestavros, M. Crovella and C. Cunha, “Characteristics of WWW Client-Based Traces”, Technical Report TR-95-010, Boston University, April 1995.
12. Z. Liu, N. Niclausse and P. Nain, “A New Efficient Caching Policy for WWW”, Proceedings of the 1998 Internet Server Performance Workshop (WISP ’98), Madison, WI, pp. 119-128, June 1998.
13. J. Shim, P. Scheuermann and R. Vingralek, “Proxy Cache Algorithms: Design, Implementation, and Performance”, IEEE Transactions on Knowledge and Data Engineering, Vol. 11, No. 4, July/August 1999.
14. A. Luotonen, H. Frystyk and T. Berners-Lee, W3C httpd, http://www.w3.org/hypertext/WWW/Daemon/Status.html.
15. A. Chankhunthod, P. Danzig, C. Neerdaels, M. Schwartz and K. Worrell, “A Hierarchical Internet Object Cache”, Proceedings of the 1996 USENIX Technical Conference, January 1996, San Diego, CA, USA.
16. Squid Internet Object Cache, http://www.nlanr.net/Squid.