Essence of an Effective Web Caching Algorithm
Annie P. Foong1, Yu-Hen Hu and Dennis M. Heisey2
Department of Electrical and Computer Engineering, University of Wisconsin, 1415 Engineering Drive, Madison, WI 53706, USA
E-mail: foong, [email protected], [email protected]
Abstract. We introduce the concept of a cache demand function and identify the size and lifespan of an object as important components of any web caching strategy. Because web accesses are dynamic in nature, the strategy must also be adaptive: a web-caching agent needs to acquire knowledge about the objects it encounters and deduce an effective strategy dynamically. A logistic regression model is used for this purpose. Results (based on object and byte hit ratios) from trace-driven simulations of six web and proxy servers confirm our hypotheses.
Keywords. Caching algorithms, admittance, eviction, demand function, logistic regression, adaptive, web model.
1 Foong is currently with the Server Architecture Lab at Intel Corp. This work was done while the author was a student at the University of Wisconsin.
2 Heisey is with the Department of Surgery, University of Wisconsin.
1 INTRODUCTION
Web caching offers an effective way to improve Web performance using current technologies. Caches can dramatically reduce response times and relieve traffic congestion, and the wave of commercial cache products is testimony to their importance [Strom99]. However, the common trend in these products is to use high-performance hardware and large memories to achieve their goals. Although this has allowed rapid development of products, such approaches aim for quick cures for symptoms without understanding the underlying issues. A web cache is under no obligation to cache every incoming object at least once. This unique aspect of the web architecture calls for a cache admittance strategy, as opposed to a strictly replacement one. In this paper, we offer a brief analysis of the web cache admittance problem. Even with perfect knowledge of web accesses, an optimal solution in terms of either object hit ratio or byte hit ratio is NP-hard [Hosseini97]. We propose two effective heuristic rules that often lead to nearly optimal solutions. For the practical case where future accesses are unknown, we employ a Logistic Regression (LR) model [Foong99a] to predict the likelihood of re-access of an object under consideration. Initial simulation results on several real-world traces are very encouraging.

2 THE CACHING PROBLEM
A cache essentially makes three decisions: admittance, eviction and pre-fetch (Figure 1). We shall not discuss pre-fetch here, and refer interested readers to [Foong99b].

Notation: Let ok denote the object requested at event k, so that a request stream of K requests is given by ρ = {o1, o2, ..., oK}. Ck is the state of the cache on arrival of object ok; S is the size of the cache; sk the size of the object; ck the cost of the object. Further, let pk be the index of the previous access of ok, with pk = k for a first access; and nk the index of the next access, with nk = k for a last access. The lifespan Lk of an object is the number of arrivals up to the next access of the object, i.e., Lk = nk − k (Figure 2).

The caching problem becomes one of selecting a cache strategy, consisting of {(Ak, Ek); 1 ≤ k ≤ K}, such that

    Ck = Ck−1 + Ak − Ek   if ok ∉ Ck−1
    Ck = Ck−1             if ok ∈ Ck−1

holds, and the cost function

    F(A, ρ, S) = Σ(k=1..K) fk ck

is minimized subject to the cache capacity restriction

    Σ(ok ∈ Ck) sk ≤ S
where

    fk = 0 if ok ∈ Ck
    fk = 1 if ok ∉ Ck

The total cost depends on the cache’s goal. A web cache has two major goals: reduce user response times by improving hit ratios, and/or reduce network traffic by improving byte hit ratios
[Foong99b]. We shall assume that hits incur zero cost. Faults incur a 1-unit cost when optimizing for object hit ratios, and an sk-unit cost for byte hit ratios. We divide the strategy into two policies:

Admittance policy. Given a request sequence ρ, for each object ok (1 ≤ k ≤ K), if nk > k, Ak = {ok}; otherwise, Ak = ∅ (strict admittance). Heuristic 1: Do not admit an object with an infinite lifespan (regardless of cache size constraints). This reduces turnover and reserves resources for more cacheworthy objects. For practical purposes, an object is considered to have an infinite lifespan if its next access lies beyond the last event in a trace, or beyond its known or expected expiry time. Researchers who have used a size threshold [Abrams95, Williams96] in their replacement policies are using a variation of a cache admittance policy.

Eviction policy. The cost function can be further divided into two parts:

    F = Σ(pk = k) ck + Σ(pk < k) fk ck = Fcompulsory + Fcapacity
where Fcompulsory is the cost of first-time accesses and cannot be reduced without pre-fetch. In this paper, we seek to minimize Fcapacity. We introduce the term aggressive eviction for an eviction policy that implements the following: (i) the current object ok can itself be included in the eviction set, i.e., Ek ∩ {ok} may be non-empty; (ii) at event k, for any object oj ∈ Ck, if ok = oj (i.e., a hit) and nk = k (i.e., the object is not re-accessed), oj will be evicted. We note that if an object is evicted within the interval [k, nk], a fault is created; there is thus a one-to-one correspondence between a capacity fault and the lifespan in which the eviction occurred. To simplify analysis, we are concerned only with successive accesses of an object. This observation gives rise to Heuristic 2: Keep objects with small lifespans, thereby reducing the chance of eviction before use.

In addition, we introduce the concept of a cache demand (resource) function (Figure 3) to account for the varying sizes of web objects. A cache of size S serving K requests has a resource of SK units. An object of size sk and lifespan Lk has a cache demand of skLk units.
• To improve object hit ratios, we want to fit as many such demand units as possible into a given cache resource. Together with Heuristic 2, the simplest heuristic to adopt is Heuristic 3a: Keep objects with the smallest skLk (weighted lifespan) units.
• Improving byte hit ratios is more complicated. Since byte faults incur a cost proportional to sk, we want to evict objects with small costs (sk). This conflicts with the heuristic to evict objects with large weighted lifespans. As such, for the purposes of byte hit ratios, use Heuristic 3b: Keep objects with small lifespans. In the event of a tie, the object with the smaller size will be evicted.

Our proposed heuristics reduce to the well-known Longest-Forward Distance (LFD) algorithm [Belady66] for fixed-size objects. In fact, it can be shown that the strict admittance/aggressive eviction variation leads to better performance [Foong99b].
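The admittance and eviction heuristics above can be sketched together as a single oracle policy. This is a minimal illustration, not the authors' simulator: the class and method names are ours, and lifespans are assumed known a priori (as in the LSIZE oracle discussed later), with an infinite lifespan encoded as `float("inf")`.

```python
class LSizeCache:
    """Sketch of strict admittance (Heuristic 1), aggressive eviction,
    and eviction by largest weighted lifespan s_k * L_k (Heuristic 3a)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = {}      # obj -> (size, lifespan)
        self.used = 0           # bytes currently cached

    def request(self, obj, size, lifespan):
        """Process one request; returns True on a hit, False on a fault.
        lifespan == float('inf') means the object is never accessed again."""
        if obj in self.resident:
            if lifespan == float("inf"):
                # aggressive eviction: last access, free the space immediately
                self.used -= self.resident.pop(obj)[0]
            else:
                self.resident[obj] = (size, lifespan)
            return True
        # strict admittance: never admit an infinite-lifespan object
        if lifespan == float("inf") or size > self.capacity:
            return False
        # Heuristic 3a: evict residents with the largest s*L until obj fits
        while self.used + size > self.capacity:
            victim = max(self.resident,
                         key=lambda o: self.resident[o][0] * self.resident[o][1])
            self.used -= self.resident.pop(victim)[0]
        self.resident[obj] = (size, lifespan)
        self.used += size
        return False
```

Swapping the `max` key for the plain lifespan (with size as tie-breaker) would give the Heuristic 3b variant for byte hit ratios.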
In reality, object lifespans must be estimated. The performance of any caching algorithm therefore depends on two factors: (a) whether it uses a lifespan or weighted lifespan in its policy, and (b) whether an accurate estimate is made of the lifespan. For example, one suggested policy evicts objects with the largest size [Abrams95, Williams96]. This is equivalent to half of our weighted lifespan policy. It performs relatively well for object hits, but poorly for byte hits, as expected from the discussion above. Another suggestion was to use LRU with a size limit [Abrams95]. Here, the last access time is used as a predictor of lifespan. Together with the size limitation, this is similar to implementing a weighted lifespan policy. Since the size of an object is known once it is retrieved, the major problem in implementing an effective strategy becomes one of accurately predicting lifespans.
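The size-limited LRU just mentioned can be sketched as follows; this is our own minimal reading of [Abrams95], not its actual implementation, with recency standing in for the unknown lifespan and the size threshold acting as the admittance policy.

```python
from collections import OrderedDict

class SizeLimitedLRU:
    """Sketch of LRU with a size limit: recency approximates lifespan,
    and a size threshold serves as an implicit admittance policy."""

    def __init__(self, capacity, max_object_size):
        self.capacity = capacity
        self.max_object_size = max_object_size
        self.store = OrderedDict()   # obj -> size, least recently used first
        self.used = 0

    def request(self, obj, size):
        """Process one request; returns True on a hit, False on a fault."""
        if obj in self.store:
            self.store.move_to_end(obj)          # refresh recency on a hit
            return True
        # admittance: reject objects above the size threshold
        if size <= self.max_object_size and size <= self.capacity:
            while self.used + size > self.capacity:
                _, vsize = self.store.popitem(last=False)   # evict LRU object
                self.used -= vsize
            self.store[obj] = size
            self.used += size
        return False
```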
3 PREDICTING LIFESPANS
The Logistic Regression (LR) model has been widely used by biostatisticians to model the outcome of disease development (the event) as a function of suspected risk factors (the predictors). Our goal is to express the outcome of the dependent variable Y in terms of its predictors (1, X1, ..., Xk) and their respective coefficients (β0, β1, ..., βk). The LR probability is given by

    PLR = P(Y = 1 | 1, X1, ..., Xk) = 1 / (1 + exp(−z))        Equation (1)

where z = Σ(j=0..k) βjXj (with X0 = 1) and −∞ < z < +∞.
Given a set of observed data, the coefficients βj are obtained using the method of maximum likelihood estimation [Hosmer89] (the learning phase). Once the coefficients are determined, the LR probability can be calculated for other objects (the prediction phase) (Figure 4). LR analyses are available in most commercially available statistical packages (e.g., SAS). In our context, the event of interest is whether a document is re-accessed at least once in the next N accesses. We have chosen predictors based on four proposed localities of web reference: temporal, spatial, functional and contextual. The LR analysis provides us with the significant predictors of web re-accesses (events). Once these are known, we can calculate the probability of re-access of each object given its set of predictors. The lifespan of an object is taken to be inversely proportional to the LR probability. We refer readers to [Foong99a, Foong99b] for details on locality and the LR implementation.
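The prediction phase of Equation (1) is simple enough to sketch directly. The function names are ours, and the coefficients below are illustrative placeholders, not the fitted values from Table 1.

```python
import math

def lr_probability(betas, xs):
    """Equation (1): P = 1 / (1 + exp(-z)) with z = sum_j beta_j * x_j.
    betas = (beta_0, ..., beta_k); xs = (1, X_1, ..., X_k), xs[0] being
    the constant 1 for the intercept term."""
    z = sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-z))

def predicted_lifespan(p, scale=1.0):
    """Lifespan is taken as inversely proportional to the LR probability;
    the proportionality constant is arbitrary since only the relative
    ranking of lifespans matters to the cache algorithm."""
    return scale / p
```

A higher re-access probability thus maps to a shorter predicted lifespan, which is all the ranking-based eviction heuristics require.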
4 RESULTS
We used web traces from three servers and three proxies to drive our cache simulator. The predictors used in our experiments include:
• time since last access (SINCE) and number of accesses of the same object (BHITS) in a backward window
• size (SIZE) and type (TYPE1 = html/text, TYPE2 = images, TYPE3 = others) of the object
• number of hypertext links (NUM_LINKS), images (NUM_IMAGES) and keywords that match a popular-word list (NUM_KEYWORDS) on an HTML page.
Results of the LR model are given in Table 1. There is a common trend for SINCE (the basis of LRU) and BHITS (the basis of LFU) to be significant predictors of re-access. Other predictors vary from site to site. Even before tackling the caching problem, we must first determine the effectiveness of the LR model as a predictive tool. We found that predicted lifespans correlate positively (0.42–0.55) with actual lifespans. In other words, objects with large predicted lifespans have large actual lifespans, and vice versa. Since we require only a relative ranking of lifespans to implement the cache algorithm, a positive correlation is sufficient. This holds for all six traces, and the correlation is statistically significant (p < 0.05). Finally, we implemented caching algorithms based on the LR model and compared their performance to the more common algorithms (Table 2). For our LR-based algorithms (LR-LIFESPAN and LR-LSIZE), objects with lifespans beyond the end of the trace are not admitted. To make way for newly arriving objects, the object with the largest lifespan or weighted lifespan is evicted. We also included the performance of algorithms with a priori knowledge (LIFESPAN, LSIZE) to see the effects of perfect prediction. Object hit ratios are presented in Figure 5 and byte hit ratios in Figure 6. The simulations were carried out with cache sizes of 1%, 5%, 10% and 20% of an “infinite” cache, i.e., the minimum cache size that produces no faults. For the a priori algorithms, LSIZE performs much better than LIFESPAN for object hit ratios. Among the predictive algorithms, LR-LSIZE is the best performer in object hit ratios; it also performs better than SIZE. This confirms that both lifespan and size must be taken into consideration simultaneously. LIFESPAN and LR-LIFESPAN have the best performance for byte hit ratios. Our strict admittance algorithms also have the added benefit of low turnover (results not reported here).
The selective nature of the algorithm reduces the need for constant cache maintenance.
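The two metrics reported in Figures 5 and 6 follow directly from a simulated request stream. A minimal sketch (the function and argument names are ours):

```python
def hit_ratios(hits, sizes):
    """hits: per-request booleans (True on a cache hit);
    sizes: per-request object sizes, aligned with hits.
    Returns (object hit ratio, byte hit ratio)."""
    object_hr = sum(hits) / len(hits)
    byte_hr = sum(s for h, s in zip(hits, sizes) if h) / sum(sizes)
    return object_hr, byte_hr
```

Optimizing for one ratio generally trades off against the other, which is why the paper evaluates Heuristics 3a and 3b separately.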
5 FUTURE WORK AND CONCLUSION
We have identified the importance of including both the lifespan and the size of a web object in maximizing hit ratios. We formalized the ideas of strict admittance and aggressive eviction, which are better suited to a web caching architecture, and showed the effectiveness of a simple LR model in predicting object lifespans. The existence of different predictors at different sites suggests that a cache algorithm must adapt to each site. In future models, we want to investigate predictors relating to contextual locality more deeply; this would enable what we hope to introduce as content-aware caching. Indeed, more sophisticated models can be pursued for better prediction. In addition, heuristics for fitting cache demand units into a given cache resource can also be improved. We have investigated but a small area of web caching. Specifically, we have made decisions based only on what a single cache sees; the effects of other caches are unknown. Our hope is that by understanding single caches, we lay the foundation for understanding the more complex relationships among cooperating caches.

ACKNOWLEDGMENT
We would like to thank Mike Gerdts, Darrell Schulte, Jinah Yun-Mitchell and Duane Wessels for providing the web traces used in this study.
REFERENCES
1. [Abrams95] M. Abrams, C.R. Standridge, G. Abdulla, S. Williams and E.A. Fox, “Caching Proxies: Limitations and Potentials,” Virginia Tech Report TR-95-12, 1995.
2. [Belady66] L.A. Belady, “A study of replacement algorithms for a virtual-storage computer,” IBM Systems Journal, vol. 5(2), 1966, pp. 78-101.
3. [Foong99a] A.P. Foong, Y. Hu and D.M. Heisey, “Logistic regression in an adaptive web cache,” IEEE Internet Computing, Sept 1999, pp. 27-36.
4. [Foong99b] A.P. Foong, “An adaptive cache strategy for the WWW,” Ph.D. thesis, University of Wisconsin-Madison, Dec 1999.
5. [Hosmer89] D.W. Hosmer and S. Lemeshow, Applied Logistic Regression, John Wiley & Sons, New York, 1989.
6. [Hosseini97] S. Hosseini, “Investigation of Generalized Caching,” Ph.D. thesis, Washington University, 1997.
7. [Strom99] D. Strom, “The caching question,” Internet World, Sept 1999.
8. [Williams96] S. Williams, M. Abrams, C.R. Standridge, G. Abdulla and E.A. Fox, “Removal Policies in Network Caches for WWW Documents,” in Proceedings of ACM SIGCOMM ’96, Stanford, Aug 1996.
Figure 1 Actions taken by a cache (the cache agent admits (A), evicts (E) and pre-fetches (ρpre) objects from the request stream ρ; hits are served from the cache, faults are not)
Figure 2 Lifespan of an object (in the stream A B C D C A, the first A has lifespan 5; consecutive accesses A A give lifespan 1)

Figure 3 Cache demand function (an object of size sk occupies sk × Lk units of the cache’s S × K resource)
Figure 4 Experimental setup (trace pre-processing and HTML parsing extract size, type, SINCE, BHITS, keywords, and link/image counts; LR learning produces coefficients used by the simulator, which plays back the trace against the WWW and reports performance metrics)
Table 1 Results of the LR model: coefficients associated with predictors, with p-values in parentheses*

Trace | Coefficients associated with predictors
BIO   | 0.92 × BHITS (0.0001), −5.0e-6 × SINCE (0.0001)
CAES  | 0.37 × BHITS (0.1), −7.8e-7 × SINCE (0.0001), −5.0e-5 × SIZE (0.1)
WHY   | 2.02 × BHITS (0.0001), −2.4e-6 × SINCE (0.0006), 2.0e-5 × SIZE (0.08), 1.12 × TYPE1 (0.0001)
CAEP  | 0.07 × BHITS (0.0003), −1.6e-6 × SINCE (0.0001), 1.17 × TYPE1 (0.0004), 0.011 × NUM_IMAGES (0.03)
SJP   | 0.63 × BHITS (0.0001), −1.5e-3 × SINCE (0.0001), −0.67 × TYPE2 (0.01)
LJP   | 0.45 × BHITS (0.006), −1.5e-9 × SINCE (0.0001)

* If β > 0, the predictor increases the probability of the event; if β < 0, it decreases it.