sends a list of objects to prefetch to a client and the client issues prefetch ...... a script to execute on a link's onClick event that checks this database before ...
A Non-interfering Deployable Web Prefetching System Ravi Kokku Praveen Yalagandula Arun Venkataramani Mike Dahlin Department of Computer Sciences, University of Texas at Austin {rkoku, ypraveen, arun, dahlin}@cs.utexas.edu
Abstract
spare resources on the servers and the network and (2) is deployable with no modifications to the HTTP protocol and the existing infrastructure. To avoid interference, Prefetch-Nice monitors the server load externally and tunes the prefetch aggressiveness of the clients accordingly, it uses TCP-Nice, a network-level solution specifically designed for low-priority data transfer [52], and it uses a set of heuristics to control the resource usage on the clients. To work with existing infrastructure, Prefetch-Nice is implemented by modifying html pages to include javascript code that issues prefetch requests and by augmenting the server infrastructure with several simple modules that require no knowledge of or modifications to the existing servers. We evaluate the prototype implementation under a trace based workload and find that on a WAN with a client on cable modem, our scheme reduces demand response times by 25% in comparison to a prefetching scheme that uses same prediction techniques but with no techniques to avoid interference.
We present Prefetch-Nice, a novel non-intrusive web prefetching system that (1) systematically avoids interference between prefetch and demand requests at the server as well as in the network by utilizing only spare resources, and (2) is deployable without any modifications to the browsers, the HTTP protocol and the network. Prefetch-Nice’s self-tuning architecture eliminates the necessity of the traditional “threshold” magic numbers typically used to limit intereference, thereby allowing applications to improve benefits and reduce the risk of aggressive prefetching. For example, we observe that on a real web server trace, our system reduces the average demand request response times by 25% in comparison to a prefetching scheme that uses same prediction techniques and traditional techniques for limiting interference.
1. Introduction
Second, we propose a novel self-tuning architecture for prefetching that eliminates the necessity of the traditional “threshold” magic number that is typically used to limit the interference prefetching inflicts on demand requests. This architecture separates prefetching into two different tasks:(i) prediction and (ii) resource management. The predictor proposes prioritized lists of high-value documents to prefetch. The resource manager decides how many of those documents can be prefetched and schedules the prefetch requests to avoid interference with demand requests and other applications. This separation of concerns has three advantages. First, it simplifies deployment and operation of prefetching systems by eliminating the need to select an appropriate threshold for an environment and update the threshold as conditions change. Second, it can reduce the interference caused by prefetching by throttling aggressiveness during periods of high demand load. Third, it can increase the benefits of prefetching by prefetching more aggressively than would otherwise be safe during periods of low and moderate load.
A number of studies have demonstrated the benefits of Web prefetching [13, 18, 27, 28, 35, 36, 43, 53]. And the attractiveness of prefetching appears likely to rise in the future as the falling prices of disk storage [15] and network bandwidth [8, 42] make it increasingly attractive to trade increased disk storage and network bandwidth consumption for improved response time and availability. Despite these benefits, prefetching systems have not been widely deployed because of two concerns: interference and deployability. First, if a prefetching system is too aggressive, it may interfere with demand requests to the same service (self-interference) or to other services (cross-interference) and hurt overall system performance. Second, if a system requires modifications to the existing HTTP protocol [21] , it may be impractical to deploy: the large number of deployed clients makes it difficult to change clients, and the increasing complexity of servers [25, 29, 31, 47, 56] makes it difficult to change servers. What is therefore needed is a prefetching protocol that (a) avoids interference at clients, networks, and servers and (b) does not require changes to the HTTP protocol and the existing infrastructure (client browsers, servers and networks). In this paper, we make three contributions. First, we present a novel prototype Web prefetching system – PrefetchNice – that (1) avoids interference by effectively utilizing only
Third, we explore the design space for avoiding interference in a prefetching system in light of the requirement of avoiding or minimizing changes to the existing infrastructure. We find that it is straightforward to deploy prefetching that ignores the problem of interference, and it is not much more difficult to augment such a system to avoid server interfer1
• Server: Prefetching consumes extra resources on the server such as CPU time, memory space and disk I/Os.
ence. Extending the system to also avoid network interference is more involved, but doing so appears feasible even under the constraint of not modifying current infrastructure. Unfortunately, we were unable to devise a method to completely eliminate prefetching’s interference at existing clients: in our system prefetched data may displace more valuable data in the cache. Instead, we develop several heuristics that reduce this interference to what appear to be acceptable levels. But, it appears that a complete solution may eventually require modifications at the client [7, 9, 45]. The rest of the paper is organized as follows. Section 2 discusses the requirements and architecture of a prefetching system. Sections 3, 4 and 5 present the building blocks for reducing interference at servers, networks and clients. Section 6 presents the prefetch mechanisms that we develop to realize the prefetching architecture. Section 7 discusses the details of our prototype and evaluation. Section 8 presents some related work and section 9 concludes.
2
• Network: Prefetching causes extra data packets to be transmitted over the Internet and hence consumes network bandwidth, potentially increasing router queue delays and network congestion. • Client: Prefetching leads to extra resource requirements even at the client. CPU, memory, and disk usage on clients could increase dramatically depending on how aggressively the client prefetches. A client browser’s memory and disk caches may be polluted by aggressive prefetching. A common way of achieving balance between the benefits and costs of prefetching is to select a threshold probability and fetch objects whose estimated probability of use before the object is modified or evicted from the cache exceeds that threshold [18, 32, 43, 53]. There are at least two problems with such “magic-number" based approaches. First, it is difficult for even an expert to set thresholds to optimum values to balance costs and benefits—although thresholds relate closely to the benefits of prefetching, they have little obvious relationship to the costs of prefetching [8, 23]. Second, appropriate thresholds to balance costs and benefits may vary over time as client, network, and server load conditions change over seconds (e.g., changing workloads or network congestion [31]), hours (e.g., diurnal patterns), and months (e.g., technology trends [4, 21]). Our goal is to construct a self-tuning resource management layer that prevents prefetching from interfering with demand requests. If successful, such an architecture would simplify the design of prefetching systems by separating the tasks of prediction and resource management. In such an architecture, at any given time prediction algorithms specify arbitrarily long lists of the most beneficial objects to prefetch sorted by benefit, and the resource management layer issues requests for these objects in sorted order and ensures that these requests do not interfere with demand requests or other system activities. In addition to simplifying system design, such an architecture could have two performance advantages over statically-set prefetch thresholds. First, such a system may reduce interference because when resources are scarce, it would reduce prefetching aggressiveness. Second, such a system may increase the benefits of prefetching when resources are plentiful by allowing more aggressive prefetching than would otherwise be possible.
Requirements and Architecture
There appears to be a consensus among researchers on a high level architecture for prefetching in which a server sends a list of objects to prefetch to a client and the client issues prefetch requests of objects from the list [10, 39, 43] . This division of labor allows servers to use global object access patterns and application-specific knowledge to determine what should be prefetched, and it allows clients to filter requests through their caches to avoid repeatedly fetching objects [10, 39]. In this paper, we develop a framework for prefething that follows this organization and which seeks to meet two other important requirements: self tuning resource management and deployability without modifying existing protocols, clients, proxies, or servers.
2.1
Resource Management
Services that prefetch should balance the benefits they get against the risk of interference. Interference can take the form of self-interference, where a prefetching service hurts its own performance by interfering with its demand requests, and cross-interference, where the service hurts the performance of other applications on the prefetching client, other clients, or both. Limiting interference is essential because many prefetching services has potentially unlimited bandwidth demand where incrementally more bandwidth consumption provides incrementally better service. For example, the prefetching system can improve its hit rate and hence the response times, by fetching objects from a virtually unlimited collection of objects that have non-zero probability of access [6, 9] or by updating cached copies more frequently as data change [11, 51, 53]. Interference can occur at any of the critical resources in the system.
2.2
Deployability
Many proposed prefetching mechanisms suggest modifying the HTTP/1.1 protocol [5, 16, 18, 43], to create a new request type for prefetching. An advantage of extending the protocol is that clients, proxies, and servers could then distinguish prefetch requests from demand requests and potentially schedule them separately to prevent prefetch requests 2
from interfering with demand requests [16]. However, such mechanisms are not easily deployable because modifying the protocol implies modifying the widely-deployed infrastructure that supports the current protocol. Furthermore, as web servers evolve and increase in their complexity by spanning multiple machines [25], CDNs [1], database servers, dynamic content generation subsystems [20, 33], etc., modifying CPU, memory, and disk scheduling to separate prefetch requests becomes increasingly complex. If interference were not a concern, a simple prefetching system could easily be built with the present infrastructure, where clients can be made to prefetch without any new modifications to the protocol. For example, servers can embed in the content, javascript code or a java applet [22] that when executed, fetches specified objects over the network, loading them into the browser cache. An alternate way to obviate the use of javascript and applets is to add invisible frames to the demand content that include and thereby preload the prefetch content. In this paper, we adapt these types of techniques to also avoid interference.
2.3
2.3.1
One Connection
In the one connection architecture (Figure 1(a)), a client fetches both demand and prefetch requests from the same content server. Since browsers multiplex requests over established connections to the servers, and since browsers do not differentiate between demand and prefetch requests, each TCP connection may have interleaved prefetch and demand requests and responses. Sharing connections causes prefetch requests to interfere with demand requests for network and server resources. If interference can be avoided, this system is easily deployable. In particular, objects fetched from the same server share the domain name of the server. So, unmodified client browsers can use cached prefetched objects to service demand requests. 2.3.2
Two Connection
In the two connection architecture(Figure 1(b)), a client fetches demand and prefetch requests from different servers. This architecture thus segregates demand and prefetch requests on separate network connections. This architecture also allows the demand and prefetch servers to be hosted either on different machines or on the same one. For example, two application servers listening on different TCP ports can be hosted on the same machine. This architecture also allows demand and prefetch requests to be segregated on separate network connections. Although, the two connection architecture provides these additional options for reducing interference, this solution appears to complicate the deployability of the system. Objects with the same names fetched from different servers are considered different by the browsers. So, browsers can not directly use the prefetched objects to service demand requests.
Architecture
In this subsection, we present two alternative architectures to build a prefetching system. The architectures are described here in order to provide a framework for discussing server, network and client resource management in Sections 3 through 5. In this subsection, we stay at a high level and do not discus in detail how to implement these architectures in a deployable way. These architectures and resource management strategies are relevant regardless of whether prefetching is implemented on a new protocol or by exploiting existing infrastructure. We describe how our implementation realizes one of these architectures in an easily deployable way in Section 6. The alternatives assume the following about the client browsers:
2.3.3
Comparison
In the sections that follow we show how to address the limitations of both architectures.
• To improve the deployability of the prefetching system, browsers should be unmodified.
• Browsers may multiplex multiple client requests to a given server on one or more persistent connections [21].
• Some of the techniques we develop for avoiding interference are useful for the one connection architecture, but some are less so. In particular, our best strategy for reducing interference at servers is based on end-to-end performance and is equally applicable to the one and two connection architectures. Conversely, the techniques we use to avoid network interference appear much easier to apply to the two-connection than the one-connection architecture.
As Figure 1 illustrates, in both these architectures, clients get the list of documents to prefetch, hint lists, from Hint Server. Clients send their access histories to the hint server and the hint server uses either online or offline prediction algorithms to compute hint lists consisting of most probable documents that client will request in the future.
• Despite the apparent deployability challenges to the two connection architecture discussed above we find that the same basic technique we use to make unmodified browsers prefetch data for the one connection architecture can be adapted to support the two connection architecture as well.
• Browsers match requests to documents in their caches based on (among other parameters) the server name and the file name of the object on the server. Thus files of the same name served by two different server names are considered to be different.
3
Demand Requests
Demand/Prefetch Requests
Content Server
Client
Hint Lists
Hint Lists
Hint Server
One second intervals
Requests per second
140 120 100 80 60 40
(b) Two Connection
120
One minute intervals
100
80
3.1
60
20
Local scheduling Server scheduling can help use the spare capacity of existing infrastructures for prefetching. In principle, existing schedulers for CPUs, memories [32, 34, 45], and disks [38] could prevent low-priority prefetch requests from interfering with high-priority demand requests. Furthermore, as these schedulers are intimately tied to the operating system, they should be highly efficient in delivering whatever spare capacity exists to prefetch requests even over fine time scales. Note that local scheduling is equally applicable to both oneand two-connection architectures. However, for many services server scheduling may not be easily deployable for two reasons. First, although several modern operating systems support CPU schedulers that can provide strict priority scheduling, few provide memory/cache or disk schedulers that isolate prefetch requests from demand requests. Second, even if an operating system provides the needed support, existing servers would have to be modified to associate prefetch and demand requests with scheduling priorities as they are serviced [4]. This second requirement appears particularly challenging given the increasing complexity of servers, in which requests may traverse not only a highly-tuned web server [44, 50, 55, 56] but also a number of other complex modules such as commercial databases, application servers or virtual machines for assembling dynamic content (e.g., Apache tomcat for executing Java Servlets and JavaServer pages), distributed cluster services [3, 26], and content delivery networks (CDNs).
0 0
10000 20000 30000 40000 50000 60000 70000 80000 90000 Interval
(a)
0
200
400
600
800
1000
1200
1400
1600
Interval
(b)
Figure 2. Server loads averaged over (a) 1second and (b) 1-minute intervals for the IBM sporting event workload. We conclude that both architectures are tenable in some circumstances. If server load is the primary concern, and network load is known not to be a major issue, then the one connection prototype may be simpler than the two connection prototype. At the same time, the two connection prototype is feasible and deployable and manages both network and server interference. Given that networks are a globally shared resource, we advise the use of two connection architecture in most circumstances.
3
Alternatives
40
20 0
Reference Lists
Figure 1. Design Alternatives for a Prefetching System There are a variety of ways to prevent prefetching from interfering with demand requests at servers. Average requests per second
160
Prefetch Server Hint Server
(a) One Connection
180
Prefetch Requests
Client
Reference Lists
Demand Server
Server Interference
An ideal system for avoiding server interference would cause no delay to demand requests in the system and utilize significant amounts of any spare resources on servers for prefetching. The system needs to cope with and take advantage of changing workload patterns over various timescales. HTTP request traffic arriving at a server often is bursty with the burstiness being observable at several scales of observation [14] and with peak rates exceeding the average rate by factors of 8 to 10 [40]. For example, Figure 2 shows the request load on an IBM server hosting a major sporting event during 1998 averaged over 1-second and 1-minute intervals. Though such burstiness favors prefetching when the servers are not loaded, to make use of the spare resources, the prefetching system should gracefully back off when the demand load on the servers increases. We use the term, pfrate, to denote the prefetch aggressiveness of a prefetching scheme. It represents the number of files prefetched for each demand file served.
Separate prefetch infrastructure An intuitively simple way of avoiding server interference is to use separate servers to achieve complete isolation of prefetch and demand requests. In addition to the obvious strategy of providing separate demand and prefetch machines in a centralized cluster, a natural use of this strategy might be for a third-party “prefetch distribution network” to supply geographically distributed prefetch servers in a manner analogous to existing content distribution networks. Note that this alternative is not available to the one-connection architecture. 4
However, separate infrastructure needs extra hardware and hence may not be an economically viable solution for many web sites.
probe objects so that different paths through the server can be probed. Second, increment/decrement rates must balance the risk of causing interference against the risk of not using available spare capacity. Furthermore, for stability, the system should limit the rate at which it adjusts pfrate so that the effects of previous adjustments, particularly the increments, are observed before new ones are made.
End-to-end monitoring End-to-end monitoring periodically measures the response time of servers and increases or decreases the pfrate when the measured time is low (indicating that serves have spare capacity) or high (indicating servers are heavily loaded) respectively. An advantage of this approach is that it requires no modifications to existing servers. Furthermore, it can be used by both one- and two-connection prefetching architectures. The disadvantage of such an approach is that it is likely to provide less precise scheduling than local schedulers that have access to the internal state of servers and operating systems. A particular concern is whether such an approach can react to changing loads at fine timescales. In the following subsections, we present a simple design of an end-to-end monitor and evaluate its efficacy in comparison to server scheduling.
3.2
Prototype Configuration Since interference is our main concern, we use a simple conservative monitor. Our monitor collects five response time samples spaced at 100 ± random(20) milliseconds (random returns a uniformly distributed random value between 0 and 20). If all the five samples lie below a threshold, the hint list size is incremented by one. While taking the five samples, if any sample exceeds the same threshold, the hint list size is reduced by one, and the sample count is reset, so that a new set of five samples are collected. In section 3.3, we show that our design choices work reasonably well in reducing interference, reaping significant spare bandwidth and maintaining stability. A limitation of our current prototype is that the threshold is hand-tuned for our configuration. Although selecting the threshold was straightforward, we plan to investigate policies for self-tuning thresholds. Because for many systems the response time v. load curve is nearly flat over modest loads and nearly vertical over high loads, there is reason to hope that appropriate thresholds may often be automatically detected.
End-to-end Monitor Design
Figure 3 illustrates the design of a monitor controlled prefetching system. The monitor controls the size of the hint lists given out by the hint server based on the server load. In what follows, we describe the functions that the monitor performs, and the design issues involved to realize the functions effectively. Demand/Prefetch Requests
Evaluation
A key challenge in studying web services is that services’ prefetch demands, prefetching strategy, and prefetching effectiveness may vary widely. It is therefore not practical to simulate application-specific prefetching and adaptation. The key observation that makes our analysis tractable is that for the purpose of evaluating resource management algorithms, it is not necessary to determine the impact of prefetching on the service. We therefore abstract away prediction policies used by services by prefetching sets of dummy data from arbitrary URLs at the server. Thus, the goal of the experiments is to compare the effectiveness of different resource management alternatives in avoiding server interference against the ideal case (when no prefetching is done) with respect to the following metrics: (i) cost: the amount of interference in terms of demand response times and (ii) benefit: the prefetch bandwidth. We consider the following algorithms for this set of experiments:
Request Samples
Hint Lists
Hint Server
3.3
Content Server
Client
Monitor Hint List Size
Figure 3. A Monitored Prefetching System Probing The monitor sends probe packets to the server at random times and measures the response times perceived by the probes. These probes are regular HTTP requests to representative objects on the server. A design challenge is to tune the rate at which probing is done. High rates make the monitor more reactive to server load, but also add extra load on the server. On the other hand, low probe rates make the monitor slow in reacting, and thus affecting demand requests. Low rates are particularly bad, given the bursty load patterns of servers. Setting the prefetch rate If the probes indicate response times exceeding a threshold, the monitor reduces the pfrate. Similarly, if the response times are under a different threshold, the monitor increases the prefetch rate. Two design issues must be addressed. First, appropriate thresholds must be selected. Different thresholds should be used for different
1. Ideal case, when no prefetching is done or when we use a separate prefetching infrastructure. 2. Prefetching with no interference avoidance with constant pfrate values of 1 and 5. We refer to this case as NoAvoidance scheme. 5
3. Prefetching with monitor control. Note that in this case the pfrate can vary from zero to a high maximum value (set to 100).
Average Response Time(sec)
0.6
4. Local server CPU scheduling. As a simple server scheduling policy, we choose Nice, the CPU scheduling utility in Unix. In this case, we use two different http servers on one machine. One server running at lower priority (+19) handles prefetch requests, and the other running at normal priority handles demand requests. This simple implementation of server scheduling only intended as a comparison for monitoring; more sophisticated local schedulers may closely approximate the ideal case.
No Prefetching, pf=0 No Avoidance, pf=1 No Avoidance, pf=5 Nice, pf=1 Nice, pf=5 Monitor
0.5 0.4 0.3 0.2 0.1 0 0
100
200
300
400
500
600
700
Rate (num of conns/sec)
For evaluating the first three algorithms, we set up one server serving both demand and prefetch requests. These schemes are applicable in both one connection and two connection architectures. The last algorithm requires that we service the demand and prefetch requests on two different connections. Because we used unmodified HTTP servers, our simple prototype is applicable only to two-connection architectures. Note that the general approach could be applied to a one-connection architecture too. We use two different workloads in our experiments. Our first workload generates demand requests to the server at a constant rate. The second workload is a one hour subset of the IBM sporting event server trace, whose characteristics are shown in Figure 2. We scaled up the trace in time by a factor of two, so that requests are generated at twice the rate than original trace, as the original trace barely loads our server. This scaling is likely to hurt the effectiveness of the monitor control approach. Our experimental setup includes the Apache HTTP server [2] on a 450MHz Pentium II, with 128MB of memory. The client load was generated using httperf [41] running on four different Pentium III 930MHz machines. All machines run the Linux operating system.
Figure 4. Effect of prefetching on demand response times with various interference avoidance policies 100
No Prefetching Demand: No Avoidance Prefetch: No Avoidance Demand: Nice Prefetch: Nice Demand: Monitor Prefetch: Monitor
Bandwidth (in Mbps)
80
60
40
20
0 0
100
200
300
400
500
600
700
800
Rate
Figure 5. Prefetch and demand bandwidths with pfrate = 1 Linux scheduler only prioritize processes on a coarse granularity. Even with the lowest possible priority values for prefetch requests, the CPU time allotted for demand requests can not be more than forty times the CPU time allotted for prefetch requests (with nice values of -20 and +19 for demand and prefetch processes respectively). More sophisticated CPU schedulers [4, 49] could avoid these difficulties. These experiments show that resource management is an important component of prefetching system because overly aggressive prefetching can significantly hurt demand response time and throughput while timid prefetching gives up significant bandwidth. It also illustrates a key problem with constant non-adaptive magic numbers such as the threshold approach that is commonly proposed. For a given service, a given threshold will correspond to some average pfrate.
Experimental Results with Constant Workload Figure 4 shows the demand response times with varying request arrival rate. The graph shows that both monitor and Nice schemes closely approximate the behavior of the ideal No-Prefetching control case in not affecting the demand response times. The No-Avoidance cases with fixed pfrate values can significantly damage both the demand response times and the maximum demand throughput. Figures 5 and 6 show the bandwidth achieved by prefetch requests and their effect on the demand bandwidth with varying demand request rate. For different values of pfrates, No-Avoidance case adversely effects the demand bandwidth. Whereas, both Nice and monitor schemes reap a significant bandwidth without large decreases in the demand bandwidth. Further, at higher request rates, monitor outperforms Nice by conservatively setting the pfrate to very low values (almost zero). Nice could not meet such good performance as the
Experimental Results with IBM Server Workload Similar to the constant load experiments, we compare the effectiveness of the schemes described above for the IBM sporting event server workload. The results of the runs are shown in 6
60
7600
9
7400
8
7200
7
110
6 100 5 90
4
80
5800
1 180
5600
60
80
100
120
140
160
0 200
300
400
500
600
700
4.5
4 0
5
10
15
20
25
30
Time (in minute intervals)
(b)
Figure 8. Monitor’s pfrate settings averaged over (a) 1-second and (b) 1-minute intervals, vs the demand request rates for the IBM workload 4.79, the monitor outperforms the No-Avoidance scheme with a fixed pfrate value of 5 in both metrics.
20
100
5
6200
2
(a)
0
6400
60 40
5.5
6600
70
Time (in second intervals)
40
6800
6000
20
6
7000
3
0
6.5
demand rate avg pfrate
Avg pfrate value
120
10
Demand Requests/min
demand rate avg pfrate
130 Demand Requests/sec
80 Bandwidth (in Mbps)
140
No Prefetching Demand: No Avoidance Prefetch: No Avoidance Demand: Nice Prefetch: Nice Demand: Monitor Prefetch: Monitor
Avg pfrate value
100
800
Rate
4 Network Interference
100 (0.61347749)
0.100 0.080
Avg Response Time Prefetch BW (in Mbps)
80
0.040
40
0.020
20
0.000
0
pf
pf
te
ch
in ra pf
et ef pr o N
ra = 1 te = ra 2 pf te = ra te 5 = 10 M on ito M r ( on 2m M itor s) on (8 ito m r(1 s) 5m N s) ic e, N ic pfra e, pf te = ra te 1 = 10
60
g
0.060
Mechanisms to avoid network interference can be deployed at clients, intermediate routers and servers. For example, clients can can reduce the rate at which they receive data from the servers using TCP flow control mechanisms [48]. However, it is not clear how to set the parameters to such mechanisms or how to deploy them given existing infrastructure. Router prioritization can avoid interference effectively, as routers have more information of the state of the network. However, router prioritization is not easily deployable in the foreseeable future. We focus on server based control because of the relative ease of deployability of server based mechanisms and their effectiveness in avoiding both self- and crossinterference. In particular, we use a server assisted technique called TCP-Nice [52]. TCP-Nice is a congestion control mechanism at the sender that is specifically designed to support background data transfers like prefetch data. Connections using Nice operate by utilizing only spare bandwidth in the network. They react more sensitively to congestion and backoff when a possibility of congestion is detected, giving way to foreground connections. In our previous study [52], we provably bound the network interference caused by Nice under a simple network model. Our experimental evidence under wide range of conditions and workloads showed that Nice causes little or no interference and at the same time reaps a large fraction of the spare capacity in the network. Nice is deployable in the two connection context without modifying the internals of servers by configuring systems to use Nice for all connections made to/from the prefetch server. A prototype of Nice runs on Linux currently, and it should be straight-forward to port Nice to other operating systems. The other way to use Nice in non-Linux environments is to put a Linux machine running Nice in front of the prefetch server and make the Linux machine serve as a reverse proxy or a gateway. It appears to be more challenging to use Nice in the one connection case. In principle, the Nice implementation allows flipping a connection’s congestion control algorithm between standard TCP (when serving demand requests) and
Prefetch BandWidth (Mbps)
Avg Demand Response Time (seconds)
Figure 6. Prefetch and demand bandwidths with pfrate = 5
Figure 7. Performance of No-avoidance, Nice and Monitor schemes on IBM Server Workload the Figure 7. In the case of No-Avoidance scheme, different bars show the experimental results corresponding to different pfrate settings. For the monitor, the performance is shown for various threshold values and for Nice, the performance is again shown with different pfrate settings. The graph shows that the No-Avoidance scheme is sensitive to the fixed value of pfrate. At a lower pfrate, it prefetches few bytes. As the pfrate increases, this mechanism achieves more prefetch bandwidth but starts interfering with demand requests. Also notice that if pfrate is fixed at a high value, then the demand response time interference rises sharply. The monitor is sensitive to the threshold value but it has wide sweet spot. In comparison to No-Avoidance scheme it achieves more prefetch bandwidth for a given response time cost. The Nice scheme also clearly outperformed the NoAvoidance method. Figure 8 illustrates the capability of our monitor to adapt in a timely manner to changes in server load. We choose a particular run of the monitor on the trace, with threshold set to 8ms. We plot the one second and one minute averages of the pfrate settings by the monitor against the changes in the demand load pattern, for the first three minutes of the workload and entire workload respectively. These plots clearly illustrate the ability of the monitor in controlling the pfrate. Also note that event hough the average pfrate setting in this case is 7
Nice (when serving prefetch requests). However, using this approach for prefetching has a number of challenges - (1) Flipping modes causes packets already queued in the TCP socket buffer to inherit the new mode. Thus, demand packets queued in the socket buffer may get sent out at low-priority while prefetch packets may get sent out at normal-priority, thus causing network interference. Ensuring that demand and prefetch documents get sent out in the appropriate modes requires an extension to Nice and a coordination between the application and the Nice implementation. (2) Nice is designed for long flows. It is not clear if flipping back and forth between congestion control algorithms will still avoid interference and gain significant spare bandwidth. (3) HTTP/1.1 pipelining requires replies to be sent in the order requests were received, so demand requests may be queued behind behind prefetch requests, causing demand requests to perceive increased latencies. One way to avoid such interference may be to quash all the prefetch requests queued in front of the demand request. For example, sending a small error message (eg. with HTTP response code 204 - indicating no content) having a short lifetime as a response to the quashed prefetch requests. (4) Servers need to be modified to tell the TCP layer when to use standard TCP and when to use Nice. Based on these challenges, it appears much simpler to use the two connection architecture when network is a potential bottleneck. As a part of our future work, we intend to explore these challenges and determine if a deployable one connection architecture that avoids network interference can be devised.
5
0.35 0.3
Hit Rate
0.25 0.2
Ideal LRU 1MB Ideal LRU 10MB Ideal LRU 30MB LRU 1MB LRU 10MB LRU 30MB LRU-expire24h 1MB LRU-expire24h 10MB LRU-expire24h 30MB
0.15 0.1 0.05 0 1
10 Prefetch Aggressiveness
Figure 9. Effect of prefetching on demand hit rate Because we assume that the client cannot be modified, we resort to two heuristics to limit the impact of prefetching’s cache pollution. First, in our system services place a limit on the ratio of prefetched bytes to demand bytes sent to a client. Second, services can set the Expires HTTP header to a value in the near future (e.g., one day in the future) to encourage some clients to evict prefetched document earlier than it may otherwise have done. These heuristics have an obvious disadvantage: they resort to magic numbers similar to those in current use, and they suffer from the same potential problems: if the magic numbers are too aggressive, prefetching services will interfere with other services, and if they are too timid, prefetching services will not gain the benefits they might otherwise gain. Fortunately, there is reason to hope that performance will not be too sensitive to this parameter. First, disks are large and growing larger at about 100% per year [15] and relatively modest-sized disks are effectively infinite for many client web cache workloads [53]. So, disk caches may have room to absorb relatively large amounts of prefetch data with little interference. Second, hit rates fall relatively slowly as disk capacities shrink [6, 53], which would suggest that relatively large amounts of polluting prefetch data will have relatively small effects on demand hit rate. Figure 9 illustrates the extent to which our heuristics can limit the interference of prefetching on hit rates. These experiments use the 28-day UCB trace of 8000 unique clients from 1996 [24] and simulate the hit rates of 1MB, 10MB and 30MB per-client caches. Note that these cache sizes are small given, for example, Internet Explorer’s default behavior of using 3% of a disk’s capacity (e.g., 300MB of a 10GB disk) for web caching. On the x-axis we vary the number of bytes of dummy prefetch data per byte of demand data that are fetched after each demand request. In this experiment, 20% of services use prefetching at the specified aggressiveness and the remainder do not, and we plot the demand hit rate of the nonprefetching services. Ideally, these hit rates should be unaffected by prefetching. As the graph shows, hit rates fall gradually as prefetching increases, and the effect shrinks as cache
Client Interference
Prefetching may interfere with client performance in at least two ways. First, processing prefetch requests may consume CPU cycles and, for instance, delay rendering of demand pages. Second, prefetched data may displace demand data from the cache and thus hurt demand hit rates for the prefetching service or other services. As with the server interference issues discussed above, client CPU interference could, in principle, easily be addressed by modifying the client browser (and, perhaps, the client operating system) to use a local CPU scheduler to ensure that prefetch processing never interferes with demand processing. Lacking that option, we resort to a simpler approach: as described in Section 6, we structure our prefetch mechanism to ensure that prefetch processing does not begin until after the loading and rendering of the demand page, including all inline images and recursive frames. Although this approach will not help reduce cross-interference with other applications at the client, it may avoid a potentially common case of self-interference of the prefetches triggered by a page delaying the rendering of that page. Similarly, a number of storage scheduling algorithms exist that balance caching prefetched data against caching demand data [7, 9, 34, 45]. Unfortunately, all of these algorithms require modifications to the cache replacement algorithm. 8
sizes get larger. If, for example, a client cache is 30MB and 20% of services prefetch aggressively enough so that each prefetches ten times as much prefetch data as the client references demand data, demand hit rates fall from 29.9% to 28.7%.
6
function pageOnLoad() { myiframe.getPFlist(document.referrer); } if(null == window.onload) { window.onload = pageOnLoad();} else { var origfn = window.onload; window.onload = function(){origfn();pageOnLoad();};}
Prefetching mechanism
Figure 10 illustrates the key components of our implmentation of the prefetching mechanism for the one and two connection architectures. The one connection prototype consists of an unmodified browser and a server that acts as the demand as well as prefetch server. The monitor controlls the size of hint lists given out by the hint server. The two connection prototype consists of the same components as above, but the prefetch server is a copy of the demand server (running either on a separate machine or different ports on the same machine). A simple front-end in front of the demand server is used to intercept certain requests and return appropriate redirection objects as described below, thereby obviating the need to make any modifications to the original demand server. As an optimization, the front-end could also be integrated with the demand server In the following we describe the prefetching mechanisms for the one and two connection architectures.
6.1
Figure 11. Augumentations to html pages function getPFList(var prevref) { document.location="hint-server/pflist.html+PCOOKIE="+ document.referrer + "+" + prevref + TURN=1; document.close(); }
Figure 12. pfalways.html been loaded, the myOnLoad() function calls getMore() and replaces pflist.html by fetching a new version with TURN=i+1, and step 4 is repeated. Thus a long list of prefetch suggestions can be “chained” as a series of short lists. When a hint server has sent everything it wants to, it returns a pflist.html with no getMore() call.
One-connection prefetching mechanism
Content modification Each HTML document is augmented with pieces of JavaScript and HTML code as shown in figure 11. Demand fetch
Demand fetch of prefetched document The client simply fetches it from the cache just like any other cache hit.
1. Client requests a demand document. 2. When an augmented HTML document finishes loading into the browser, the pageOnLoad function is called. This function calls getPfList(), a function defined in pfalways.html, shown in figure 12. The file pfalways.html is a small static file and is almost always found in the cache.
6.2
Two-connection prefetching mechanism
The two-connection design employs the same basic mechanism for prefetching described above. However, because browsers cache documents using the server name and document name, we need to add a level of indirection so that demand requests (to the demand server) for prefetched objects (from the prefetch server) can be serviced by the prefetched objects. Accomplishing this has proven to be somewhat nontrivial and involves the steps described below. In order to use the prefetched document in the cache when a demand request arrives for it, we make use of a redirection object fetched from the demand server. After getting a prefetched document from the prefetch server, a request for the same is sent to the demand server as well, which responds with a redirection object (wrapper) that just points to the corresponding document on the prefetch server. When a demand request arrives later for this document, the corresponding wrapper stored in the cache redirects the request to the prefetched document which is also found in the cache.
3. pfalways.html fetches the file pflist.html (figure 13) from the hint server with the name of the enclosing document, the name of the previous document in history (the enclosing document’s referrer) and TURN=1 as arguments. The prediction module in the hint server dynamically generates documents to be prefetched in pflist.html as explained in Section 6.4. 4. When pflist.html loads with TURN==i, the preload() function in its body fetches the current list of documents to be prefetched from the the prefetch server (which is same as demand in this case). After the current list of prefetch documents has 9
Demand/Prefetch Requests
Demand Requests
Content Server
Client
Demand Server
Munger Client
Fileset
FE
Fileset
Reference Lists
Hint Lists
Hint Lists
Hint Server
Munger
Prefetch Server Hint Server
Prefetch Requests
(a)
Fileset
(b)
Figure 10. Prefetching mechanisms for (a) one connection and (b) two connection architectures. sends a request for the corresponding HTML file to the demand server to obtain a suitable redirection object. The HTTP/1.1 “referrer” field for this request is set to the current file pflist.html, and thus distinguishes it from regular demand requests.
function myOnLoad() { preload("demand-server/c.html"); getMore() ; } function getMore() { document.location="hint-server/pflist.html+PCOOKIE="+ document.referrer + "+" + prevref+ "+" + "TURN=2"; document.close(); }
4. Front end: The front-end lets regular demand requests pass through to the demand server. However, when it gets a request for a wrapper, which it detects by observing the referrer field, it returns an appropriate redirection object. A redirection object is simply a short JavaScript file that sets its document.location property to the prefetched object’s URL.
var myfiles=new Array() function preload(){ for (i=0;i