A Case for Dynamic Selection of Replication and Caching Strategies

Swaminathan Sivasubramanian    Guillaume Pierre    Maarten van Steen
Dept. of Mathematics and Computer Science
Vrije Universiteit, Amsterdam, The Netherlands
{swami,gpierre,steen}@cs.vu.nl

Abstract

Replication and caching strategies are being used to reduce user-perceived delay and wide-area network traffic. Numerous such strategies have been proposed to manage replication while maintaining consistency among the replicas. In earlier research, we demonstrated that no single strategy can perform optimally for all documents, and proposed a system where strategies are selected on a per-document basis using trace-driven simulation techniques. In this paper, we demonstrate the need for continuous dynamic adaptation of strategies using experiments conducted on our department Web traces. We also propose two heuristics, Simple and Transition, to perform this dynamic adaptation at reduced simulation cost. In our experiments, we find that the Transition heuristic reduces simulation cost by an order of magnitude while maintaining high accuracy in optimal strategy selection.

1 Introduction

Web users often experience slow transfers of Web documents. To reduce access time, many systems replicate or cache documents at servers close to the clients. This process allows load balancing among different servers and decreases the access latencies experienced by clients. However, if a document is updated, replicas must be updated to prevent clients from accessing a stale copy.

Many strategies have been proposed to achieve replication while maintaining consistency among replicas. A replication strategy dictates the number and location of replicas and the choice of a protocol governing the creation of replicas and consistency enforcement. Different strategies may offer various levels of performance and consistency, so a system designer should be careful when selecting a replication strategy.

We showed in earlier research that no single strategy can universally perform optimally for all documents [11]. An important gain in performance can be obtained by associating each document with the strategy that suits it best. Moreover, the document-to-strategy associations must be re-evaluated from time to time, since changes in access and update patterns are likely to affect the performance characteristics of strategies.

Adaptation is certainly beneficial when a Web server has to react to unpredictable events such as flash crowds. However, as we also show in this paper, adaptation is necessary even for servers dealing with relatively stable access patterns.

Detecting and handling drastic changes in access patterns, such as flash crowds, has been addressed in [7] and is not the focus of this paper. In this paper, we focus on adaptation for relatively stable access patterns. We employ an approach where the best strategy for each document is periodically selected among a set of candidate strategies. The choice of "best" strategy is made by simulating the performance that each strategy would have provided in the recent past. If necessary, the strategy for the concerned document is switched dynamically.

Each server adaptation requires d · s simulations, where d is the number of hosted documents and s is the number of candidate strategies. In our current system, the simulation of a single strategy takes several tens of milliseconds on a 1-GHz PIII machine. Considering that we apply traditional and well-known trace-driven simulation techniques, we expect comparable performance for systems similar to ours. Each adaptation may thus lead to a significant computational load if the number of objects or the number of candidate strategies is high. In particular, the latter can happen if we want to integrate parameterized strategies. Such strategies have a tunable parameter affecting their behavior, such as a "time-to-live" value or a "number of replicas."

The contributions of this paper are as follows: (i) we demonstrate the need for continuous dynamic adaptation of replication strategies for Web documents; and (ii) we present techniques for the selection of an optimal replication strategy from a number of candidate strategies, while keeping the selection costs low. Neither of these contributions is reported in our earlier work [9, 11].

Regarding the first contribution, we demonstrated the need for selecting strategies on a per-document basis in our earlier research [11]. However, to the best of our knowledge, no study has quantitatively shown the need for continuous dynamic adaptation of strategies for Web documents. In this paper, we quantitatively demonstrate this need through experiments performed with our department server's Web traces.
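As a rough back-of-the-envelope illustration of the per-adaptation load mentioned above (our own assumed round figures, not numbers from the paper):

    # Assumed figures: a few thousand popular documents, 30 candidate strategies,
    # and roughly 50 ms per trace-driven simulation.
    d, s, t_sim = 2000, 30, 0.05                  # documents, strategies, seconds per simulation
    print(d * s * t_sim / 60.0, "minutes")        # ~50 minutes per adaptation round

Under these assumptions, a single full adaptation round already costs on the order of an hour of simulation time, which is in the same range as the full-evaluation time reported in Section 4.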

For our second contribution, we present techniques that perform strategy selection at reduced simulation cost. Our approach consists of reducing the number of candidate strategies that must be simulated for each document. We propose two different heuristics for strategy selection. The first one, called Simple, re-evaluates strategies only for documents whose access or update patterns have significantly changed. The second heuristic, called Transition, tries to predict plausible transitions to alternative strategies and evaluates only the most promising ones. It does so by building a weighted graph whose nodes are the candidate strategies and whose weights are the number of observed transitions between strategies. The Transition heuristic uses this transition graph to select and evaluate only a subset of strategies that are likely to perform well.

We evaluate the accuracy and speedup offered by the two heuristics based on our traces. The results show that Transition performs better than Simple both in terms of speedup and accuracy. In our experiments, Transition provides a speedup of 12, while making correct decisions in 96% of the cases.

The rest of the paper is organized as follows: Section 2 describes our evaluation methodology. Section 3 demonstrates the need for continuous dynamic adaptation of replication strategies using our Web traces. Section 4 discusses our two strategy selection heuristics and presents their performance evaluation. Section 5 discusses related work and Section 6 concludes the paper.

2 Evaluation methodology

We set up an experiment that simulates a system capable of switching its strategies dynamically in response to changes in the access and update patterns of its documents. Our experiment consists of collecting access and update traces from our department Web server, simulating different strategies, and selecting the best one for a given period. With this setup, we observe the changes in the strategy adopted by the documents over the entire length of the traces. In this section, we present our simulation model in detail.

2.1 Simulation Model

We assume that documents have a single source of updates, called the primary server, which is responsible for sending updates to replicas located at intermediate servers. For the sake of simplicity, we consider only static documents in our evaluations, that is, documents that change only due to updates by the primary server.

We do not make a distinction between replication and caching strategies. We perceive them to be different only in the way replicas are created: caching strategies create a document copy only when it is accessed by at least one client, while replication strategies pro-actively create replicas. In our experiments, we use a list of 30 strategies to choose from:

• NR: No replication.

• CLV[p] (Cache with Limited Validation): Intermediate servers cache documents for a given time. Each document has a validity time (TTL) after which it is removed from the cache. Different strategies are derived with the TTL value fixed at p = 5%, 10%, 15%, and 20% of the age of the document. This family of strategies was first used in the Alex file system [3].

• SI (Server Invalidation): Intermediate servers cache the document for an unlimited time. The primary server invalidates the copies when a document is updated. This strategy has been used in AFS [13], and variants with leases in [2] and [5].

• SU[x] (Server Updates): The primary server for a document maintains copies at the x most popular intermediate servers for the document (the top x servers, sorted by the total number of clients they handle). When the document is updated, the primary server pushes the update to these intermediate servers. Different strategies are derived for x = 5, 10, 25, ..., 50.

• Hybrid[x] (SU[x] + CLV[10]): The primary server maintains copies at the x most popular servers, where x = 10, 15, 20, 25, 30, 40, 50, and the other intermediate servers follow the CLV strategy, with the TTL fixed as p = 10% of the age of the document.

In our simulations, we group clients based on the Autonomous Systems (ASes) to which they belong. We measure the available network bandwidth between the AS of the primary server and the other ASes as follows: we record the time t taken by our server to serve a document of b bytes to a client from an AS, and approximate the bandwidth between our AS and the client's AS as b/t.

We redirect the requests of a client to the replica located in the client's AS. If no such replica is available, the requests are redirected to the primary server. We understand that in the latter case clients could be redirected to a closer server instead of the primary server to obtain better performance. However, simulating such a request-routing policy requires inter-AS network measurements to decide which replica is closest to a given client; we could not adopt this policy in our simulations due to the lack of such measurements.

2.2 Adaptation Mechanisms

The relative performance of the various strategies is compared using a cost function. It is designed to capture the inherent tradeoff between the performance gained by replication and the performance lost due to consistency enforcement. In our experiments, we use a cost function that takes three parameters: (i) the access latency l, (ii) the number of stale documents returned c, and (iii) the network overhead b, that is, the bandwidth used by the primary server for maintaining consistency and serving clients from ASes without replicas. The cost function for a strategy s during a given period of time t is:

cost(t, s) = w1 ∗ l + w2 ∗ c + w3 ∗ b

where w1, w2 and w3 are constants determining the relative weight of each metric. The performance of a strategy s during a given period of time t is represented by the value of the cost function cost(t, s). This value is obtained by simulating strategy s with past traces. Each simulation outputs the values of latency, percentage of stale documents, and network overhead obtained for the given strategy, from which the cost value is derived.
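To make the selection mechanism concrete, here is a minimal sketch of cost-based strategy selection for one document. It is our own illustration, not the authors' implementation: the simulate() helper and the equal weights are assumptions.

    # Sketch of cost-based strategy selection for one document (illustrative only).
    # simulate(strategy, trace) is assumed to replay the past-period trace and
    # return the access latency l, the number of stale documents c, and the
    # network overhead b for that strategy.

    W1, W2, W3 = 1.0, 1.0, 1.0        # assumed weights w1, w2, w3 (not given in the paper)

    def cost(strategy, trace):
        l, c, b = simulate(strategy, trace)
        return W1 * l + W2 * c + W3 * b        # cost(t, s) = w1*l + w2*c + w3*b

    def select_best_strategy(candidates, trace):
        # The best strategy is the one with the smallest cost over the past period.
        return min(candidates, key=lambda s: cost(s, trace))

Running select_best_strategy once per document and per adaptation period corresponds to the d · s simulations discussed in the introduction.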

To select replication strategies, the primary server periodically evaluates the performance of a set of candidate strategies for each document, using traces collected during the previous adaptation period. Based on this evaluation, the primary server selects the best strategy for an object, defined as the one that had the smallest cost in the last period. Finally, the server dynamically changes the strategy for the object, if required. Implementation details on how strategies can be switched on the fly are available in [10].

The Web trace used in the experiments covers the requests and updates made to the documents hosted on our Web server from June 2002 to March 2003. Numerical details about the trace are given in Table 1. We perform our evaluations only for objects that receive more than 100 requests a week, reducing the total number of objects to be evaluated to the order of thousands. We adopt the no-replication (NR) strategy for the rest. We fix the adaptation period of the server to one day.

Table 1: Trace characteristics
Number of days              273
Number of GET requests      78,049,912
Number of objects           2,185,896
Number of updates           156,721
Number of unique clients    1,252,779
Number of different ASes    2853

3 The Need for Dynamic Adaptation

In this section, we demonstrate the need for continuous dynamic adaptation of replication strategies for Web documents. To do so, we measure the proportion of documents that change their strategy over the duration of the traces, expressed as the Adaptation Percentage (AP). The AP is defined as the ratio of the total number of adaptations observed to the total number of possible adaptations. For example, in a 7-day period, if a document makes 2 adaptations out of the maximum possible 6 (as the adaptation period is fixed to one day), then its AP is 33%. This metric shows the rate at which an object switches its strategy dynamically. A higher AP implies that objects switch their strategy more often, showing a clear need for continuous adaptation.

We performed a complete evaluation of the popular documents (those receiving at least 100 requests a week) in our traces and plotted the average AP over all documents, aggregated every week. The results are given in Figure 1. As seen in the figure, the AP varies from 5% (objects switch their strategy only 5% of the time) to 50% (objects switch their strategy half of the time). This shows that objects switch their strategy often, though the rate of adaptation varies over time. This clearly demonstrates the need for continuous dynamic adaptation of strategies if the adopted strategy for an object needs to remain optimal.
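As a small illustration of how the AP can be computed for one document (a sketch of our own; the strategy labels are made up):

    def adaptation_percentage(daily_strategies):
        # daily_strategies: the strategy selected for a document on each day of the
        # window (the adaptation period is one day).  AP = observed switches divided
        # by the maximum possible number of switches, as a percentage.
        possible = len(daily_strategies) - 1
        if possible <= 0:
            return 0.0
        observed = sum(1 for prev, cur in zip(daily_strategies, daily_strategies[1:])
                       if prev != cur)
        return 100.0 * observed / possible

    # The example from the text: 2 adaptations out of a maximum of 6 in a 7-day period.
    print(adaptation_percentage(["SU", "SU", "CLV", "CLV", "SI", "SI", "SI"]))  # ~33.3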

[Figure 1: Time plot of the AP of documents (aggregated every week) for the entire length of the traces. Y-axis: Average AP; x-axis: Time (weeks).]

Table 2: Performance of the Non-adaptive and Adaptive selection methods, in terms of (i) total client latency (TCL) for all requests, (ii) number of stale documents delivered to clients (NS), and (iii) network bandwidth usage (BW)

Strategy        TCL (hrs)   NS   BW (GB)
Non-adaptive    37.5        11   13.1
Adaptive        31.3        3    12.7

To elucidate the reasons for the varying AP, we performed a detailed study on a 9-day trace (starting June 17, 2002) comprising a significant change in request patterns. For this time period, we plotted the number of requests and the AP of the popular documents. The results are given in Figure 2. As seen in the figure, the AP increases drastically when the number of requests changes drastically, but stabilizes when the number of requests becomes stable as well. As it turned out, documents needed to be replicated more in order to maintain adequate performance as client demand increased. After some time, the proper adaptations had been made, leading to a drop in the AP.

We further evaluated the need for continuous dynamic adaptation by comparing the performance of a system that associates a strategy with each document once and never adapts (Non-adaptive) to one that periodically adapts to the current best strategy associations (Adaptive). Table 2 shows the performance of the two selection methods for the same time period in terms of the three cost function parameters: client latency, number of stale documents, and bandwidth usage. It can be seen that the Adaptive selection method outperforms its Non-adaptive counterpart according to all metrics, because of its ability to adapt to changes in client request and document update patterns.

From these simple experiments performed with our traces, it can be seen that documents need to continuously adapt their strategies to maintain optimal performance, even for a simple Web server like ours. Hence, we believe that continuous dynamic adaptation will be even more beneficial for replicated Web services handling more clients and hosting a higher number of popular objects.

4 Strategy Selection Heuristics

In this section, we propose two selection heuristics, Simple and Transition, that aim to reduce the simulation cost by reducing the number of candidate strategies evaluated during the selection process.

4.1 Performance Evaluation Metrics

To evaluate the performance of the strategy selection heuristics, we compare their selection accuracy and computational gain against the full evaluation method, which evaluates all candidate strategies every period. The goal of a selection heuristic is to provide the same decisions as the full evaluation method, but at a lower simulation cost.

[Figure 2: (a) Number of Web requests per day for the 9-day period starting June 17, 2002, and (b) the AP of the Web documents in our server for the same traces. Panel (a): No. of Requests vs. Time (days); panel (b): Average AP vs. Time (days).]

We evaluate the performance of the selection heuristics with the following metrics:

• Speedup: the ratio of the total number of candidate strategies evaluated by the full evaluation method to the number evaluated by the selection heuristic. This metric is a rough indicator of the speedup measured in time.

• Accuracy: the percentage of times the heuristic selects the same strategy as the full evaluation method.

• Average Worst Case Ratio (AWCR): this metric indicates how bad a strategy is when the heuristic has made a non-optimal selection. The AWCR of a heuristic is computed as the average of the WCRs over all non-optimal selections in an adaptation period, where the WCR is defined as follows:

WCR(t) = |cost(selected, t) − cost(best, t)| / |cost(worst, t) − cost(best, t)|

The AWCR can vary from 0 to 1. The greater its value, the worse the heuristic's choice of strategy.

4.2 Simple Heuristic

The basic assumption behind the Simple heuristic is that if the request and update patterns of a document have not significantly changed over the past adaptation period, then the strategy that was best for the past adaptation period will remain optimal for the current period as well. The heuristic works as follows: if the current request rate and update rate are within a cut-off threshold of x% of the values of the previous period, then the system retains the same strategy. Otherwise, it performs a full evaluation of all candidate strategies. Obviously, the higher the value of x, the greater the speedup, but at the price of lower accuracy.

We evaluated the performance of this heuristic with different values of x; the results are given in Table 3. As expected, the accuracy of the Simple heuristic decreases when x increases. An accuracy of 94% is obtained with x = 5, but with very little gain in speedup. For higher values of x, in addition to poor accuracy, the AWCR is also high, indicating that the selected strategy is far from optimal. Thus, this simple idea of reducing simulation cost yields neither a significant gain in speedup nor good accuracy.
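A minimal sketch of the Simple heuristic follows. It is our own illustration under assumed data structures; select_best_strategy is the hypothetical helper sketched in Section 2.2, and the period objects with request_rate, update_rate and trace attributes are assumptions.

    def simple_heuristic(doc, prev_period, cur_period, x, candidates):
        # Retain the current strategy if both the request rate and the update rate
        # changed by less than x percent since the previous period; otherwise fall
        # back to a full evaluation of all candidate strategies.
        def changed(prev, cur):
            return prev == 0 or abs(cur - prev) / prev * 100.0 > x

        if not changed(prev_period.request_rate, cur_period.request_rate) and \
           not changed(prev_period.update_rate, cur_period.update_rate):
            return doc.current_strategy
        return select_best_strategy(candidates, cur_period.trace)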

Table 3: Performance of the Simple heuristic for different values of the cut-off threshold
Threshold   Accuracy   Speedup   AWCR
5%          94%        1.2       0.11
10%         90%        1.4       0.18
20%         79%        1.7       0.205
25%         73%        2.1       0.29

4.3 Transition Heuristic

The Transition heuristic tries to predict the likely strategy transitions and evaluates only the most promising strategies. It aims to gain speedup by evaluating only this subset of likely strategies instead of the entire set of candidate strategies.

The basic intuition behind the Transition heuristic is to predict and evaluate only the strategies to which a document is likely to switch from its current strategy. Obviously, the size and constituents of the selected subset of strategies determine the speedup and accuracy of the heuristic. The smaller the size of this subset, the higher the speedup. On the other hand, the heuristic will find the optimal strategy only if it belongs to the subset of evaluated strategies.

The Transition heuristic works in two phases. In the first phase, full evaluations are performed on the traces to build the transition graph. This graph captures the history of transitions between different strategies. It is a weighted directed graph whose nodes represent the candidate strategies and whose edge weights are the number of observed transitions between strategies. No speedup is gained in this phase, as full evaluations are performed.

The second phase of the heuristic uses the transition graph built during the first phase to determine the likely subset of strategies that need to be evaluated to perform strategy selection. This subset is determined as the set of target strategies whose estimated transition probability is greater than y%.

The accuracy of this heuristic depends on two factors: (i) the value of y and (ii) the duration of phase 1.
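A sketch of the two phases is given below. This is again our own illustration: full_evaluation and select_best_strategy are the hypothetical helpers used in the earlier sketches, and keeping the current strategy in the evaluated subset is our assumption.

    from collections import defaultdict

    def build_transition_graph(docs, periods):
        # Phase 1: full evaluations only, recording how often documents moved
        # from one strategy to another.  graph[s1][s2] = number of observed
        # transitions from strategy s1 to strategy s2.
        graph = defaultdict(lambda: defaultdict(int))
        for doc in docs:
            prev = None
            for period in periods:
                best = full_evaluation(doc, period)
                if prev is not None:
                    graph[prev][best] += 1
                prev = best
        return graph

    def transition_heuristic(doc, period, graph, y, candidates):
        # Phase 2: evaluate only the strategies whose estimated transition
        # probability from the document's current strategy exceeds y percent.
        cur = doc.current_strategy
        total = sum(graph[cur].values())
        likely = [s for s in candidates
                  if total and 100.0 * graph[cur][s] / total > y]
        if cur not in likely:
            likely.append(cur)        # assumption: always reconsider the current strategy
        return select_best_strategy(likely, period.trace)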

4.3.1 Effect of y

We evaluated the accuracy and speedup of this heuristic for different values of y, with a transition graph built from a week-long trace (i.e., the length of the first phase is 7 days). The results are presented in Table 4. As could be expected, the speedup of the heuristic increases when y increases, since fewer strategies are evaluated. However, at the same time, the accuracy decreases and the AWCR increases. When y = 10%, we obtain a speedup of 12, with only a little loss in accuracy. Furthermore, even in the 4.5% of cases with non-optimal selections, the selected strategies are quite close to the optimal one, as indicated by an AWCR of 0.03 (recall that the AWCR is calculated only over non-optimal selections). These results are much more satisfying than those of the Simple heuristic. We conclude that, for our traces, evaluating strategies based on their past transition patterns is a good solution that yields high speedup with high accuracy. We observed that the full evaluation method takes about 55 minutes on a 1-GHz PIII machine to evaluate and select strategies for the objects hosted by our department server, while the Transition heuristic takes approximately 5 minutes. Thus, the Transition heuristic drastically reduces the simulation cost and makes the cost of continuous dynamic adaptation low.

Table 4: Performance of Transition for different values of y
y%     Accuracy   Speedup   AWCR
5%     97.8%      8         0.01
10%    95.5%      12        0.03
15%    93.4%      13        0.06
20%    92.2%      15        0.08
25%    88%        15.5      0.10

4.3.2 Effect of the length of Phase 1

The next question we addressed is: how long should phase 1 last? Recall that no speedup is gained during this phase, in which the transition graph is built. Hence, it is preferable to have a short phase 1. However, it must be long enough to gather sufficient transition data to perform selections with high accuracy in phase 2.

We evaluated the accuracy of the heuristic for different durations of phase 1, with y fixed at 10%. For example, a transition graph built out of a 3-day phase 1 is fed with details of 2 transitions for each document (as the adaptation period is fixed at 1 day). Table 5 shows the accuracy of the heuristic for different durations of phase 1. As seen from the table, a transition graph built from just 5 days of traces for each object already leads to high accuracy. An increase in the size of the transition graph leads to better accuracy; however, the gain stabilizes around 7 days. With 7 days, each document contributes 6 adaptation decisions, so over thousands of documents the graph covers thousands of transitions. Hence, a sufficient amount of transition data is obtained in a small period, thereby providing good accuracy.

Table 5: Length of phase 1 vs. accuracy
No. of days   Accuracy   AWCR
5             90.7%      0.09
7             95.5%      0.03
9             96%        0.03
14            96%        0.03

4.3.3 Discussion

An important issue in using Transition is to determine how often to rebuild the transition graph. In our experiments, we did not observe a degradation in accuracy over 9 months of traces with a transition graph built from one week. Hence, we feel that this transition graph needs to be rebuilt only rarely.

One shortcoming of this heuristic is its inability to handle emergencies like flash crowds, as the pre-built transition graph does not cover such drastic changes in access patterns. Such scenarios might call for fast detection of the onset of the emergency (including determining its access patterns) and triggering a strategy selection mechanism that evaluates only a small set of candidate strategies that can perform well in the given scenario.

5 Related work

A large number of proposals have been made in the past to improve the quality of Web services. Cache consistency protocols such as Alex [3] and TTL policies aim to improve the scalability of the Web. Another caching strategy is invalidation, which can maintain strong consistency at relatively low cost in terms of delay and traffic [2]. Invalidations can also be propagated through a hierarchy of caches [14] or multicast [15].

Several replication strategies have been proposed in the past. Radar uses a dynamic replication protocol that allows dynamic creation and deletion of replicas based on the clients' access patterns [12]. In [8], the authors propose replication protocols that determine the number and location of replicas to reduce the access latency while taking the server's storage constraints into account.

All these systems adopt a single strategy or a single family of strategies for all documents, possibly with a tunable parameter. However, our earlier work advocated the simultaneous use of multiple strategies.

A number of systems select strategies on a per-document basis. In [1], the authors propose a protocol that dynamically adapts between variants of push and pull strategies on a per-document basis. Similarly, [5] proposes an adaptive lease protocol that switches dynamically between push and pull strategies. Finally, in [6], the author proposes a protocol that chooses between different consistency mechanisms, invalidation or (update) propagation, on a per-document basis, based on the document's past access and update patterns. These protocols perform a per-document strategy selection similar to ours, but they are inherently limited to a small set within a single family of strategies (e.g., push or pull, invalidation or propagation) and cannot incorporate different families of strategies as our system does.

6 Conclusions and Future Work

The need for dynamically selecting a strategy on a per-document basis was shown in our earlier research. In this paper, we demonstrate the need for continuous dynamic adaptation of strategies with experiments performed on the traces of our department Web server. We find that continuous adaptation is beneficial even for seemingly relatively stable access patterns.

As a second contribution of this paper, we proposed two heuristics to perform this continuous adaptation at reduced simulation cost. The first heuristic, called Simple, performs a full evaluation only if the access and update patterns of a document have significantly changed; otherwise, it retains the current strategy. The second heuristic, called Transition, evaluates only a subset of the candidate strategies. This subset is determined based on the past transition behavior of the documents. In our experiments, we find that the Transition heuristic performs better than its Simple counterpart, both in terms of accuracy and speedup. We conclude that, in our traces, evaluating strategies based on their past transition patterns is a good solution that yields high speedup with high accuracy.

Another way of reducing the simulation overhead would be to perform object clustering, i.e., to group objects with similar access and update patterns. Such a grouping would reduce the number of simulations performed, as the simulations would be performed on a per-group basis instead of a per-document basis. We are planning to investigate the use of existing object clustering schemes such as [4]. Furthermore, we are also investigating schemes to handle emergency situations such as flash crowds in our system.

References

[1] M. Bhide, P. Deolasee, A. Katkar, A. Panchbudhe, K. Ramamritham, and P. Shenoy, Adaptive Push-Pull: Disseminating Dynamic Web Data, IEEE Transactions on Computers 51 (2002), no. 6, 652–668.

[2] P. Cao and C. Liu, Maintaining Strong Cache Consistency in the World Wide Web, IEEE Transactions on Computers 47 (1998), no. 4, 445–457.

[3] V. Cate, Alex – A Global File System, File Systems Workshop (Berkeley, CA), USENIX, May 1992, pp. 1–11.

[4] Y. Chen, L. Qiu, W. Chen, L. Nguyen, and R. H. Katz, Clustering Web Content for Efficient Replication, 10th International Conference on Network Protocols (Los Alamitos, CA), IEEE Computer Society Press, November 2002.

[5] V. Duvvuri, P. Shenoy, and R. Tewari, Adaptive Leases: A Strong Consistency Mechanism for the World Wide Web, 19th INFOCOM Conference, IEEE Computer Society Press, March 2000, pp. 834–843.

[6] Z. Fei, A Novel Approach to Managing Consistency in Content Distribution Networks, 6th Web Caching Workshop, June 2001, pp. 71–86.

[7] J. Jung, B. Krishnamurthy, and M. Rabinovich, Flash Crowds and Denial of Service Attacks: Characterization and Implications for CDNs and Web Sites, 11th International World Wide Web Conference, May 2002, pp. 252–262.

[8] J. Kangasharju, J. Roberts, and K. W. Ross, Object Replication Strategies in Content Distribution Networks, 6th Web Caching Workshop, June 2001.

[9] G. Pierre, I. Kuz, M. van Steen, and A. S. Tanenbaum, Differentiated Strategies for Replicating Web Documents, Computer Communications 24 (2001), no. 2, 232–240.

[10] G. Pierre and M. van Steen, Design and Implementation of a User-Centered Content Delivery Network, Proceedings of the Third IEEE Workshop on Internet Applications, June 2003.

[11] G. Pierre, M. van Steen, and A. S. Tanenbaum, Dynamically Selecting Optimal Distribution Strategies for Web Documents, IEEE Transactions on Computers 51 (2002), no. 6, 637–651.

[12] M. Rabinovich and A. Aggarwal, Radar: A Scalable Architecture for a Global Web Hosting Service, Computer Networks 31 (1999), no. 11–16, 1545–1561.

[13] M. Satyanarayanan, Scalable, Secure, and Highly Available Distributed File Access, Computer 23 (1990), no. 5, 9–21.

[14] J. Yin, L. Alvisi, M. Dahlin, and C. Lin, Hierarchical Cache Consistency in a WAN, USENIX Symposium on Internet Technologies and Systems, 1999.

[15] H. Yu, L. Breslau, and S. Shenker, A Scalable Web Cache Consistency Architecture, SIGCOMM (Cambridge, MA), ACM Press, August 1999, pp. 163–174.
