INTELLIGENT PREFETCHING AT A PROXY SERVER
Sajid Hussain ([email protected]) and Robert D. McLeod ([email protected])
Internet Innovation Center Engineering, University of Manitoba.
Abstract-Intelligent prefetching helps to reduce the perceived delay of document downloading at the user's end. A monitor agent predicts possible future requests, which are then prefetched to a proxy server "near" the user. An independent cache is provided for each user at the proxy server, and documents are prefetched into the user's cache there. A separate cache for each user helps prevent a small number of users from exploiting network resources. Intelligent prefetching reduces document retrieval latency at the expense of a minimal increase in network traffic.
The following section discusses the personal proxy cache, Section III discusses prefetching, Section IV gives the implementation, and Section V concludes.
II. PERSONAL PROXY CACHE
Most home users suffer from communication delays. The cost of disk space and computation time is significantly less than the cost of communication. A personal proxy cache provides an extra user cache on the proxy server. The local web browser cache is quite beneficial for users, but they still suffer from communication delays. The Internet download time through a modem is very long compared with the download time of the same object from a proxy server.
I. INTRODUCTION
The Internet is becoming increasingly popular at home, work, school and all other institutions. The goal is to bring the source objects closer to the user so that less time is needed to download the data at the bandwidth bottleneck.
Figure 1 shows that there is a greater bandwidth channel between the proxy server and the Internet than between the home user client and the proxy server. Thus, while one object is being downloaded to the user from the proxy server, several other objects can be prefetched from the Internet to the proxy server in the same time.
Caching in local browsers helps to reduce the delay of reloading recently accessed objects. The retrieval time can be further reduced by prefetching objects that have not yet been requested by the user but are likely to be requested in the future. Since users at home have a bandwidth-constrained channel, pre-fetching objects to the home computer will not be effective [1]. Prefetching can be made effective by providing a personal cache for the user at the proxy server. Objects are pre-fetched to the personal proxy cache over the large-bandwidth channel between the proxy server and the source website. These prefetched objects accelerate the downloading process. The personal proxy cache adds an additional layer to the hierarchy. An intelligent hierarchy of caches improves the system's performance.
A personal caching agent can be used to determine the objects to be pre-fetched for the user. The agent monitors the history of the user and determines the desired objects. Search engines have improved and now return more closely related documents: an Internet search query gives a long list of closely related URLs. These closely related links help in estimating the next possible request of the user, and the personal caching agent prefetches these possible future requests. When the user selects any link from the current page, a cluster of other related links is also downloaded.
Fig. 1. Pre-fetching to Personal Proxy Cache. [Figure labels: Proxy Server, Personal Proxy Cache; more bandwidth, less time for downloading; less bandwidth, more time for downloading; pre-fetching: objects are downloaded before they are requested.]
0-7803-5957-7/00/$10.00 ©2000 IEEE
A hierarchy of caches improves the system's performance [2]. The personal proxy cache adds an additional layer to the hierarchy. Figure 2 shows the hierarchy of caches in the prototype implementation of the personal proxy cache. The first layer is the local browser cache, the second is the personal proxy cache and the third is the proxy server cache. An inefficient addition of layers to the hierarchy may increase the document retrieval time. When the proxy server cache is full and an object must be replaced, the usual least recently used (LRU) algorithms give poor performance [3]. Since the likelihood of finding an object in the personal proxy cache is greater than that of finding it in the proxy server cache, the extra layer reduces the object download time.
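The three-layer lookup described above can be illustrated with a minimal sketch. The class `CacheLayer`, the function `lookup`, and the policy of copying a found object into every nearer layer are assumptions made for illustration; the paper does not specify how the prototype fills the layers.

```python
from typing import Callable, Dict, List, Optional, Tuple


class CacheLayer:
    """One layer of the hierarchy: a named in-memory map from URL to body."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}

    def get(self, url: str) -> Optional[bytes]:
        return self.store.get(url)

    def put(self, url: str, body: bytes) -> None:
        self.store[url] = body


def lookup(
    url: str,
    layers: List[CacheLayer],
    fetch_from_internet: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Search the layers nearest-first and report where the object was found."""
    for depth, layer in enumerate(layers):
        body = layer.get(url)
        if body is not None:
            # Assumed policy: copy the object into the layers nearer the user.
            for nearer in layers[:depth]:
                nearer.put(url, body)
            return body, layer.name
    # Miss in every layer: go to the Internet and fill all layers.
    body = fetch_from_internet(url)
    for layer in layers:
        layer.put(url, body)
    return body, "internet"
```

With `layers` ordered as browser cache, personal proxy cache, proxy server cache, an object that was prefetched into the personal proxy cache is served from layer 2 without touching the shared cache or the Internet.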
closer to the user. Since web documents change very rapidly, the chance of finding a fresh document in the cache is limited; the upper bound on finding the document in the cache is less than 40% [3]. Prefetching is a technique to increase the chance of getting the document from the local cache. The possible future requests of the user are predicted and the corresponding web documents are stored in the local cache. When the user then requests a web document, it is more likely to be found in the local cache. The local cache can be at the client machine and/or at the server. In both cases retrieval will be faster than retrieval from the Internet.
The time of prefetching depends on several parameters. Personal smart organizers can be used to prefetch documents according to the needs of the user. If the user usually reads the CNN news at 8:30 am, then it is advisable to prefetch the CNN news website just before 8:30 am. Prefetching can be done similarly for the other regular needs of the user.
Another aspect of prefetching is real-time prefetching. The paths traversed by different users on a website have some similarities, and these path traversals can help in predicting future requests. Another approach is to prefetch the links that are spatially close to the selected link. If the user is visiting a well-structured website such as the Internet Movie Database, the IBM patent page, or an annotated list of URLs such as Blight’s Telecommunication page, then the next request is more likely to come from the neighborhood of the selected link. This technique works for these front-end websites and does not need any record of previous users' traversals. Since the contents of these front-end websites change rapidly, neighborhood prefetching is quite effective for them.
Fig. 2. Hierarchy for Personal Proxy Cache. [Figure labels: Client, Client; Layer 2; Proxy Server, Layer 3; Internet.]
While the user is browsing a document, prefetching can be done in the same time interval. The idle time of the channel is used for prefetching. The bandwidth available between the client and the proxy/web server is less than the bandwidth between the proxy/web server and the Internet. So in the same time interval more data can be prefetched from the Internet to the proxy server than to the client. To get the maximum efficiency, prefetching is done at both levels: the document is downloaded from the Internet to the proxy server and to the client cache too. Prefetching at the server can use a hierarchy of caches. The first level is the personal cache of the user at the server, the second level is the cache shared by several users, and at the third level several servers can act as siblings and retrieve documents from one another. It is helpful in using the
A personal caching agent determines the objects to be prefetched. The user's browsing history helps to determine possible candidates. Furthermore, simple heuristic rules may also help in determining the possible candidates. For instance, if the user visits a particular newspaper or entertainment site every day, then it may be worthwhile to pre-fetch its objects to the personal proxy cache. Likewise, if the user visits a list of links, then it is better to pre-fetch the other objects in the list while the first object is being downloaded.
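The daily-visit heuristic can be sketched as follows. This is only an illustration under stated assumptions: the function name `habitual_prefetch_times`, the thresholds `min_days` and `lead_minutes`, and the use of the median visit time are hypothetical choices, not details taken from the prototype.

```python
from collections import defaultdict
from datetime import datetime, time
from statistics import median
from typing import Dict, Iterable, Tuple


def habitual_prefetch_times(
    history: Iterable[Tuple[str, datetime]],
    min_days: int = 3,
    lead_minutes: int = 5,
) -> Dict[str, time]:
    """Map each habitually visited URL to a time of day at which to prefetch it."""
    # url -> {calendar day: minute of day of the first visit on that day}
    first_visit: Dict[str, Dict] = defaultdict(dict)
    for url, when in history:
        minute = when.hour * 60 + when.minute
        day = when.date()
        first_visit[url][day] = min(minute, first_visit[url].get(day, minute))

    schedule: Dict[str, time] = {}
    for url, days in first_visit.items():
        if len(days) < min_days:
            continue  # not visited on enough distinct days to count as a habit
        usual = int(median(days.values()))
        # Prefetch slightly before the usual time, wrapping around midnight.
        start = (usual - lead_minutes) % (24 * 60)
        schedule[url] = time(start // 60, start % 60)
    return schedule
```

For a site the user opens around 8:30 am on several days, the sketch schedules the prefetch a few minutes earlier; sites seen on fewer than `min_days` distinct days are ignored.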
III. PREFETCHING
Caching reduces the latency and brings the documents
Since most of the publicly available logs do not contain the information useful for prefetching [5], the experiment was run in the Internet Innovation Center for several users. The following table shows a sample of the typical results.
resources. For users in Winnipeg it will be faster to get a document from Saskatchewan than to get the same document from Europe. So there can be a hierarchy of caches from which to get the document. The scheduling of prefetching is also quite effective for queuing management. If prefetching is done at regular intervals, in low-priority threads and with a controlled transfer rate, then it helps to reduce the burstiness of Internet traffic [4].
IV. IMPLEMENTATION
The proxy server is implemented to prefetch the related documents for the user. The server uses the HTTP/1.1 protocol to manage cache control tags. The server keeps track of the user requests. Each user has an associated recently visited links buffer. The size of the buffer can be adjusted dynamically. The buffer is updated on a “First In First Out” basis. The buffer contains only the pages whose content-type is “text/html”. The server also extracts the embedded links in the pages. The list of these embedded links is stored with the link in the visited links buffer. The extraction of embedded links is done in separate threads so that it does not affect the downloading time or the system's performance. The time the user takes to browse the page is quite sufficient to extract this useful information.
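A minimal sketch of such a recently visited links buffer is given below. The names `VisitedLinksBuffer`, `LinkExtractor` and `extract_links` are hypothetical, only `<a href>` links are extracted, and the extraction runs inline rather than in a separate thread as in the prototype; all of these are simplifying assumptions.

```python
from collections import OrderedDict
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags, resolved against the page URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def extract_links(base_url: str, html: str) -> List[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


class VisitedLinksBuffer:
    """Per-user FIFO buffer mapping each visited HTML page to its embedded links."""

    def __init__(self, capacity: int = 8) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.pages: "OrderedDict[str, List[str]]" = OrderedDict()

    def record(self, url: str, content_type: str, html: str) -> bool:
        """Store the page if it is text/html; return True when it was stored."""
        if not content_type.lower().startswith("text/html"):
            return False
        if url not in self.pages:
            # First In First Out: drop the oldest page when the buffer is full.
            while len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)
        self.pages[url] = extract_links(url, html)
        return True

    def embedded_links(self, url: str) -> Optional[List[str]]:
        return self.pages.get(url)
```

Keeping the links in document order matters for the later step, since the agent selects the links that follow the current page in the referer's list.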
Method                     Direct    Prefetch
Visits                     340       335
Data (MB)                  14.11     14.06
Download Time (mins)       18.72     6.19
Connection Time (sec)      63.47     28.64
Prefetched Bytes (MB)      N/A       20.26
Transfer Rate (kb/sec)     12.57     37.88
The results show that prefetching increases the transfer rate roughly threefold, from 12.57 to 37.88 kb/sec, and reduces the download time from 18.72 to 6.19 minutes.
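The ratios behind this statement can be checked directly from the reported figures. The short sketch below only restates the table's numbers; the dictionary keys and the helper `improvement_ratios` are illustrative names.

```python
# Figures copied from the results table (Direct vs. Prefetch).
direct = {"download_min": 18.72, "connection_sec": 63.47, "rate": 12.57}
prefetch = {"download_min": 6.19, "connection_sec": 28.64, "rate": 37.88}


def improvement_ratios(direct: dict, prefetch: dict) -> dict:
    """Return how many times better the prefetch run is on each measure."""
    return {
        # Higher transfer rate is better, so divide prefetch by direct.
        "transfer_rate": prefetch["rate"] / direct["rate"],
        # Lower times are better, so divide direct by prefetch.
        "download_time": direct["download_min"] / prefetch["download_min"],
        "connection_time": direct["connection_sec"] / prefetch["connection_sec"],
    }


if __name__ == "__main__":
    for name, ratio in improvement_ratios(direct, prefetch).items():
        print(f"{name}: {ratio:.2f}x")
```

Both the transfer rate and the download time improve by a factor of about three, while the connection time improves by a factor of a little over two.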
V. CONCLUSION
Objects are pre-fetched to the personal cache at the proxy server to speed up downloading. Since a personal caching agent is used, the probability of finding the objects at the proxy server is greater and fewer objects are downloaded from the Internet.
REFERENCES
The personal caching agent searches the recently visited links buffer for the “referer” page of the current page. If the referer page is found, the agent locates the current page in that page's list of embedded links. A small cluster of the links following the current page is then selected for prefetching.
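The selection step just described can be sketched as a small function. The name `select_cluster`, the plain dictionary standing in for the recently visited links buffer, and the default `cluster_size` are assumptions for illustration only.

```python
from typing import Dict, List


def select_cluster(
    visited: Dict[str, List[str]],
    referer: str,
    current: str,
    cluster_size: int = 3,
) -> List[str]:
    """Pick the links that follow `current` in the referer page's link list.

    `visited` maps each recently visited page URL to its embedded links,
    kept in document order.
    """
    links = visited.get(referer)
    if links is None or current not in links:
        return []  # referer not in the buffer, or current page not linked from it
    start = links.index(current) + 1
    cluster: List[str] = []
    for link in links[start:]:
        if len(cluster) >= cluster_size:
            break
        # Skip repeats of the current page and duplicate links.
        if link != current and link not in cluster:
            cluster.append(link)
    return cluster
```

If the referer page lists links A, B, C, D, E and the user selects B, a cluster size of two yields C and D as prefetch candidates.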
[1] L. Fan, Q. Jacobson, P. Cao and W. Lin, “Web Prefetching Between Low-Bandwidth Clients and Proxies: Potential and Performance,” in Proceedings SIGMETRICS’99, 1999.
[2] A. Chankhunthod, P. Danzig, C. Neerdaels, M. F. Schwartz and K. Worrell, “A Hierarchical Internet Object Cache,” in Proc. USENIX 96, http://excalibur.usc.edu/cache-html, 1996.
[3] M. Abrams, C. R. Standridge, G. Abdulla, S. Williams and E. A. Fox, “Caching Proxies: Limitations and Potentials,” in Proceedings of the Fourth International Conference on the WWW, Boston, MA, December 1995.
[4] M. E. Crovella and P. Barford, “The Network Effects of Prefetching,” in Proceedings of IEEE Infocom’98, San Francisco, California, 1998.
[5] B. D. Davison, “Web Traffic Logs: An Imperfect Resource for Evaluation,” in Proceedings of the Ninth Annual Conference of the Internet Society (INET’99), San Jose, June 22-25, 1999.
The recently visited links buffer is needed because a web page that contains frames can cause two or more HTML files to be requested. Banners and advertisements are also quite common these days, so the user may have requested several HTML files, only one of which is the desired referer page. The prefetching agent starts separate low-priority threads to download the possible future requests. Prefetching starts after a short delay to ensure that the actual request has been served. The maximum number of simultaneous connections is controlled and can be determined dynamically. For instance, if five pages are to be prefetched and each of them contains more than ten image files, then 50 or more files have to be prefetched. The server will not start 50 parallel connections, since that would increase congestion. A limit on the maximum number of parallel threads helps in achieving effective queuing management.
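The delayed, bounded-parallelism prefetch can be sketched with a thread pool. The function `prefetch_all` and its parameters `max_parallel` and `start_delay` are hypothetical names, the `fetch` callable is supplied by the caller, and Python threads carry no priority, so the low-priority aspect of the prototype is not modelled here.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def prefetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    max_parallel: int = 4,
    start_delay: float = 0.0,
) -> Dict[str, bytes]:
    """Fetch every URL using at most `max_parallel` simultaneous workers."""
    # Wait briefly so that the request the user actually made is served first.
    time.sleep(start_delay)
    # Drop duplicate URLs while keeping their original order.
    unique = list(dict.fromkeys(urls))
    # The pool size is the cap on parallel connections: 50 queued files
    # never open more than `max_parallel` connections at once.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        bodies = list(pool.map(fetch, unique))
    return dict(zip(unique, bodies))
```

Files beyond the limit simply wait in the pool's queue, which is one simple way to obtain the queuing behaviour described above; a dynamically determined limit would replace the fixed `max_parallel` argument.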