Advances in Engineering Software 37 (2006) 11–19
www.elsevier.com/locate/advengsoft

Weblins: A scalable WWW cluster-based server

Ahmad Faour a,*, Nashat Mansour b

a Laboratoire de Physique des Matériaux (LPM), Lebanese University, Beirut, Lebanon
b Computer Science Division, Lebanese American University, P.O. Box 13-5053, Beirut, Lebanon

* Corresponding author. E-mail addresses: [email protected] (A. Faour), [email protected] (N. Mansour).

Received 27 July 2004; received in revised form 22 March 2005; accepted 4 April 2005. Available online 4 June 2005.
Abstract

With ever-growing web traffic, cluster-based web servers have become an important part of the Internet infrastructure. Thus, making the best use of all available resources in the cluster to achieve high performance is a significant research issue. In this paper, we present Weblins, a cluster-based web server that achieves good throughput. Weblins uses the Gobelins operating system as its platform. Gobelins is an efficient single system image operating system that transparently makes use of the resources available in the cluster. The architecture of Weblins is fully distributed. Weblins implements a content-aware request distribution policy via a new interface on top of Gobelins. Popular web files are dynamically replicated on all nodes via a cooperative caching mechanism. For non-popular files, requests are handed off to the corresponding nodes via the TCP handoff protocol. Simulation results show that the strategy used by Weblins is more suitable for cluster-based web servers than a pure content-aware strategy or a pure cooperative caching strategy. © 2005 Elsevier Ltd. All rights reserved.

Keywords: Content-aware request distribution; Cooperative file caching; Load balancing; Web server cluster
1. Introduction

Clusters of workstations are becoming an increasingly popular hardware platform for cost-effective, high-performance network servers. Typically, a cluster-based web server consists of a front-end server, responsible for request distribution, and a number of back-end servers, responsible for request processing. Back-end servers can handle several incoming requests concurrently. However, to ensure high performance, cluster-based web servers should satisfy two requirements: load balancing and a high cache hit rate. Load balancing solutions can be classified into DNS-based approaches and IP/TCP/HTTP redirection-based approaches. The second approach employs a specialized front-end node and a load balancer, which traditionally determines the least loaded server to which a packet has to be sent [12,13]. Previous request distribution methods, such as round robin, have focused mainly on load balancing to maximize the utilization of the cluster. When the front-end
server distributes incoming requests in this manner, each back-end server is likely to cache an identical set of data in its main memory. If the size of the working set exceeds the size of the main memory cache, back-end servers tend to suffer from expensive disk I/O. Recent work has focused on the content or type of requests sent to servers [8–11]. These request distribution methods aim to achieve both load balancing and a high cache hit rate. While taking load balancing into account, these methods attempt to dispatch the same kinds of requests to a single back-end server in order to reduce the number of disk accesses. A cluster configuration with a content-aware request distribution strategy comprises three main components: (a) a dispatcher, which specifies which web server will process a given request; (b) a distributor, which interfaces with the client and implements the mechanism that distributes client requests to a specific web server; and (c) a web server, which processes HTTP requests. In order to distribute requests based on the requested content, the distributor should implement a mechanism such as TCP handoff [9] or TCP splicing [7]. There are three typical cluster configurations: a single front-end distributor; co-located distributor and server; and co-located dispatcher, distributor and server [11]. Under a load balancing policy in a k-node cluster, each node will
statistically serve only 1/k of the incoming requests locally and will forward (k−1)/k of the requests to the other nodes using the TCP handoff mechanism; in an 8-node cluster, for example, 7/8 of all requests would incur a handoff. TCP handoff is an expensive operation, so this can lead to a significant forwarding overhead that decreases the potential performance benefits of the proposed solution [14]. On the other hand, when a back-end server becomes overloaded, it moves some content to an under-loaded back-end server. That is, some incoming requests are re-assigned to another, less busy back-end server, which then processes the subsequent requests. Locality-aware request distribution (LARD) [9] has been proposed as one framework for the content-based strategy. Other research has focused on achieving a high cache hit rate, where cooperative caching [5,6] may be applied to cluster-based web servers. Cooperative caching treats the main memories of all back-end servers as one large file cache. When a back-end server misses some data, it first searches the main memory caches of the other servers before it accesses the corresponding hard disk. One problem is that most existing cooperative caching algorithms have been developed to provide remote users with file sharing in traditional Unix network file systems and do not consider the specific characteristics of cluster web servers.

In this paper, we present a cluster-based web server system, called Weblins, which exploits the single system image properties of the Gobelins cluster operating system [4]. Gobelins provides global management of all resources. Higher-level operating system services, such as a distributed shared memory system, a distributed file system and a cooperative file cache, can easily be implemented in Gobelins. It was primarily designed to support the execution of high-performance parallel applications. However, its mechanisms are also suitable for implementing efficient cluster web servers, even though resource management policies designed for parallel applications are not adequate for data server applications. Weblins integrates new memory and file management policies to efficiently support the execution of a web server. Weblins incorporates a mixed strategy that combines a content-aware policy with a cooperative caching strategy. Our main goal is to minimize the overhead of TCP handoff on the one hand and, on the other hand, to take advantage of cooperative caching to obtain a high cache hit rate. Our specific contributions in this paper include: (a) adapting the traditional greedy dual-size frequency (GDSF) replacement algorithm to suit cluster environments, (b) developing a new request distribution policy for obtaining load balancing and a high cache hit rate, (c) developing a new distribution algorithm for web documents across the disks of all nodes to distribute the load uniformly, and (d) comparing the performance of Weblins with that of cooperative caching and content-aware servers for serving static web content.

This paper is organized as follows. Section 2 presents the overall architecture of Weblins. Sections 3–6 describe the cache replacement policy, the web database distribution, the request distribution policy, and the dispatcher organization. Section 7 presents the simulation results. Section 8 concludes the paper.
2. Weblins architecture

Weblins aims to provide: (a) a scalable distributed architecture, (b) a high request throughput by balancing the load on the cluster nodes, and (c) a high cache hit rate by exploiting Gobelins features. The main features of Weblins are:

† It can use a content-aware request distribution policy, via TCP handoff, whenever doing so is profitable.
† It can improve the cache hit rate by dynamically constructing a set of popular files. This set is replicated on all nodes using the cooperative caching mechanisms of Gobelins.
† It can support large web databases by splitting data across the disks of all nodes in the cluster. Every node can access any web object, regardless of its location, thanks to the Distributed File System mechanisms of Gobelins.
† It can survive failures thanks to its distributed architecture.

The Weblins web server consists of multiple distributed web servers that function as a single logical web server. A front-end interface directs web requests to a set of back-end nodes. Distributing requests is completely transparent to clients. The front-end can act as a TCP router or a simple round-robin switch. It presents a single IP address to clients regardless of the number of back-end servers. Thus, it is possible to add and remove back-end servers without making the clients aware of it. In our implementation, we use a simple LAN switch that directs web requests in a round-robin fashion. In Weblins, all back-end servers are identical, with the same configuration. The distributed operating system Gobelins provides the platform of the cluster.

Fig. 1. Architecture of Weblins. Each node runs an HTTP server, a dispatcher, a distributor, TCP/IP handoff and the G-API on top of Gobelins (cooperative cache manager, distributed file system, GIMLI, TCP/IP), interconnected by a LAN.

There are three main components in each server node (Fig. 1): (a) a dispatcher, which determines which web server (if any) holds the requested document in the cluster; (b) a distributor, which implements the mechanism for distributing client requests to specific server nodes in case of redirection; and (c) a web server, which processes HTTP requests. To distribute the requests, the distributor component should implement a mechanism such as TCP handoff. TCP handoff [9] enables back-end responses to be forwarded directly to the clients. To determine the location of a file in the cluster, the dispatcher contacts the Gobelins cooperative cache file system and cache manager. This information is passed between the dispatcher and the underlying Gobelins OS via the G-API interface. G-API is an intermediate interface between the HTTP server and the underlying cooperative cache file system. It contains library functions used by the server in place of the standard open, read and write functions, namely g_open, g_read, g_write, g_stat and g_find. Using these functions, the server can get information about a requested file, such as whether it is cached locally or elsewhere in the cluster,
as well as other information about cached files, such as size, popularity and type.

For request distribution, Weblins uses a strategy that combines the features of content-aware request distribution with those of cooperative caching. The former is used for non-popular files, which mostly have large sizes, while the latter is used for highly popular files, which are small and frequently accessed. These files are replicated dynamically on all nodes by the cooperative caching mechanisms, providing a high cache hit rate. Using this mixed strategy, we can assign some nodes to specific types of files (such as video streams), minimize the overhead of handoff requests, and maintain a 'core' of highly popular files replicated on all nodes. This combination increases the overall performance of the system.
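As an illustration, a dispatcher lookup through the G-API might look as follows. This is a minimal Python sketch: only the function names g_stat and g_find come from the paper; the bindings, the returned fields and LOCAL_NODE are our hypothetical assumptions.

    from gobelins import g_stat, g_find   # hypothetical Python bindings

    LOCAL_NODE = 0                        # identifier of this server node (assumed)

    def locate(path):
        # Ask Gobelins where a requested file currently lives and how
        # popular it is; used by the dispatcher before choosing a flow.
        info = g_stat(path)               # size, type, popularity (assumed fields)
        node = g_find(path)               # node whose cache holds the file, or None
        return {
            "cached_locally": node == LOCAL_NODE,
            "cached_remotely": node is not None and node != LOCAL_NODE,
            "size": info.size,
            "popularity": info.popularity,
        }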
3. Cache replacement policy

The cache replacement policy chooses the file(s) to be evicted in order to make room for newly inserted file(s). This choice is important for obtaining a high cache hit rate. The best choice is to evict the files with low file values. Several parameters make up the value of a file; among them are the size, the popularity, the time of reference and the cost of bringing the file into the cache. The GDSF algorithm appears to be the best in the literature for web caching purposes [2]. However, this algorithm is not designed for a cluster of web servers. We propose a modified version of it and integrate it into our Weblins prototype.

The GDSF algorithm maintains a priority value for each cached document. With each file f present in the cache, a frequency count Fr(f) is associated, representing how many times it has been accessed. A file f that is not in the cache but is about to be cached is assigned a frequency count of 1: Fr(f) = 1. To decide which files are replaced when the cache capacity is exceeded, a priority queue over the files is maintained. File f is inserted into the priority queue with a priority key Pr(f) computed as

Pr(f) = Clock + Fr(f) × Cost(f) / Size(f)

where: (a) Clock is a running queue 'clock' that starts at 0 and, on each eviction, is set to the priority key of the evicted file f_evicted: Clock = Pr(f_evicted); (b) Fr(f) is the frequency count of file f: if file f is a hit, Fr(f) is increased by one, Fr(f) = Fr(f) + 1; if file f is a miss, it is assigned a frequency count of 1, Fr(f) = 1; (c) Size(f) is the file size; (d) Cost(f) is the cost associated with bringing file f into the cache.

3.1. CGDSF: clustered greedy-dual-size-frequency algorithm

CGDSF is a cooperative replacement algorithm that enables busy caches to utilize idle caches. Each cache runs the local GDSF algorithm, using the implementation described above. Recall that this algorithm maintains a value for each object in the cache and, upon a cache miss, evicts the object with the minimum value. In our clustered generalization, we modified this algorithm in two ways:

† Global aging clock. The parameter 'clock' has a monotonically increasing value (it is increased whenever some document gets replaced). In a cluster, the local GDSF can lead to a different clock value on each node, depending on that node's rate of file eviction. In this case, a conflict might occur if the same document
is accessed on two nodes at the same time: the document would then have two different priorities. Also, as we describe for our prototype, if a node gets a copy of a highly popular document from a remote node, this copy must remain in the cache for a long time. With a local clock, this copy can receive a low priority if that node's clock is low, and it is then likely to be replaced by a less popular file when a file is injected into this node. We therefore propose a global clock that is broadcast to all nodes whenever it is modified.

† Incremented priority. When a replica is discarded from a cache, if its priority is higher than that of all the other copies, this priority is migrated to another copy before the discarding. This way the priority cannot be decremented and the popularity of any file is conserved.

When an object is chosen for eviction, there are two cases: (a) singlet, or (b) replica. If the evicted object is a replica, it is simply discarded, after passing its priority to another replica if this priority is the maximum among all replicas. If the evicted object is a singlet (i.e. it is the only copy in the cluster), it is transferred to another cache that has free space. Otherwise, we look for a node holding a set of files whose priorities are lower than that of the candidate object and whose total size is greater than the size of the evicted object; these files are discarded. If no such node is found, the object is discarded. A pseudo-code of CGDSF is shown in Fig. 2.

    if file f is a replica:
        max_pr = max priority over all replicas of f
        if Pr(f) = max_pr:
            update_priority(all replicas)
        discard(f)
    else:  // f is a singlet
        // search for a node Ny that has free space
        if found(Ny):
            inject f into Ny, keeping its old priority
        else:
            // search for a node Nz holding a minimal set of files {f_i} with
            // priorities less than Pr(f) and sum(size(f_i)) >= size(f)
            if found(Nz):
                discard all f_i
                inject f into Nz with its old priority
            else:
                discard(f)

Fig. 2. File eviction algorithm in CGDSF.
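To make the policy concrete, the following is a minimal Python sketch combining the GDSF priority key of Section 3 with the eviction cases of Fig. 2. The cluster-lookup helpers (replicas_of, node_with_free_space, node_with_cheaper_files), the uniform cost, and the frequency reset on injection are our illustrative assumptions, not the Gobelins interfaces themselves.

    class CGDSFNode:
        # One node's cache under CGDSF (sketch). The priority key is
        # Pr(f) = Clock + Fr(f) * Cost(f) / Size(f), with a global aging clock.

        def __init__(self, capacity, cluster):
            self.capacity, self.used = capacity, 0
            self.cluster = cluster                  # cluster directory (assumed)
            self.freq, self.prio, self.size = {}, {}, {}

        def access(self, f, size, cost=1.0):
            # Fr(f) += 1 on a hit, Fr(f) = 1 on a miss.
            self.freq[f] = self.freq[f] + 1 if f in self.prio else 1
            if f not in self.size:
                self.size[f] = size
                self.used += size
                while self.used > self.capacity and self.prio:
                    self._evict(min(self.prio, key=self.prio.get))
            self.prio[f] = self.cluster.clock + self.freq[f] * cost / size

        def drop(self, f):
            self.used -= self.size.pop(f)
            self.prio.pop(f); self.freq.pop(f)

        def inject(self, f, size, pr):
            # An injected singlet keeps its old priority (the frequency
            # restart is our assumption; the paper does not specify it).
            self.size[f], self.prio[f], self.freq[f] = size, pr, 1
            self.used += size

        def _evict(self, v):
            pr, sz = self.prio[v], self.size[v]
            self.drop(v)
            self.cluster.set_clock(pr)              # broadcast global clock update
            replicas = self.cluster.replicas_of(v, excluding=self)
            if replicas:                            # replica: conserve popularity
                if pr >= max(r.prio[v] for r in replicas):
                    for r in replicas:
                        r.prio[v] = pr
                return                              # local copy is just discarded
            target = self.cluster.node_with_free_space(sz)   # v is a singlet
            if target is None:
                target, victims = self.cluster.node_with_cheaper_files(pr, sz)
                if target is not None:
                    for w in victims:               # Fig. 2 discards these files
                        target.drop(w)
            if target is not None:
                target.inject(v, sz, pr)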
4. Web documents distribution

When we consider locally distributed web systems that do not use a content-aware dispatching mechanism, any server node should be able to respond to client requests for any part of the provided content tree. This means that each server owns, or can access, a replicated copy of the web site content, unless internal re-routing mechanisms are employed. There are essentially two mechanisms for distributing static information among the web servers of the cluster. One is to replicate the content tree across independent file systems running on the servers; the other is to share information by means of a distributed file system such as the Andrew File System (AFS) or the Network File System (NFS).

The first technique requires that each server in the cluster maintain a local copy of the web documents on its local disk. In this way, each server has to access only its own disk, without any extra communication with the other servers of the cluster. Such content replication has a high storage overhead and, even worse, it requires every content update to be propagated to all the nodes within short periods of time. An efficient mechanism for updating and controlling the documents has to be implemented to maintain consistency among the data stored on the servers. The second technique uses a distributed file system such as AFS or NFS: each document is divided into logical blocks, which are distributed across the servers' disks. The use of a distributed file system ensures the consistency of the information and does not require a large amount of disk space. On the other hand, it introduces a communication overhead between the servers and may increase the response time, as the server nodes first have to obtain the file information from the file server before sending it to the client.

In Weblins, we propose a third alternative: partitioning the content tree among the web server nodes, sharing information via the Distributed File System of Gobelins, and replicating the set of highly popular documents on all nodes. This technique has two main advantages. It increases secondary storage scalability. It also allows the use of specialized server nodes to improve responses for different file types (such as streaming content, CPU-intensive requests and disk-intensive requests), to minimize the communication overhead between servers, and to maintain high availability. However, content partitioning can lead to load imbalance produced by an uneven distribution of popular web documents, since the servers storing hot documents can be overwhelmed by client requests. We believe that suitable caching mechanisms can alleviate server overload due to hot spots, because frequently accessed documents are unlikely to require a disk access. Combining a uniform partitioning of the web database among the server nodes with an efficient cooperative caching mechanism can increase performance by raising the cache hit rate and distributing the load uniformly among the nodes.

4.1. Algorithm

It has been observed that 10% of the files in a web site account for 90% of server requests and 90% of the bytes transferred [3]. To improve storage utilization without
sacrificing performance, we propose new policies for document replication that reduce the number of copies of documents with a low request rate or a large file size. We propose an algorithm that classifies the web database into three main categories based on historical data (logs):

† Highly popular documents: the documents with a high request rate.
† Popular documents: the documents with an average request rate.
† Non-popular documents: the documents with a minimum request rate.

To classify documents, Weblins employs a heuristic procedure (sketched in code below). First, we take the set of documents accessed more than once and calculate the average frequency of this set; the documents with a frequency above this average are considered 'highly popular'. Then, we consider the set of all remaining documents and calculate the average frequency of this set; the documents with a frequency below this new average are considered 'non-popular', and the remaining documents are, therefore, 'popular'. The highly popular documents have the highest probability of being cached in the cluster memory and rarely require more than one disk access. Usually, such documents are small [3]. To minimize cache misses on accesses to highly popular documents, these documents are replicated on all nodes. The second and third categories are distributed among all disks in the following way: for each category, we compute the minimum, the average and the maximum file size. The files whose size exceeds the average constitute the first sub-category, and the others the second sub-category. The files of each sub-category are then distributed among the nodes in a round-robin fashion. With this technique, the files are likely to be evenly distributed among the nodes and the load will be shared between all disks.
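This heuristic translates into a few lines of Python; reducing the logs to per-document frequency and size maps, and all names below, are our assumptions. The minimum and maximum sizes are computed by the procedure above, but only the average is needed for the split, so the sketch uses the average alone.

    def classify(freq):
        # Two-pass average-frequency heuristic from Section 4.1.
        accessed = {d: f for d, f in freq.items() if f > 1}
        avg1 = sum(accessed.values()) / len(accessed)
        hot = {d for d, f in accessed.items() if f > avg1}
        rest = {d: f for d, f in freq.items() if d not in hot}
        avg2 = sum(rest.values()) / len(rest)
        cold = {d for d, f in rest.items() if f < avg2}
        return hot, set(rest) - cold, cold   # highly popular, popular, non-popular

    def place(docs, sizes, nodes):
        # Split one category around its average size, then deal each
        # sub-category to the nodes round-robin.
        avg = sum(sizes[d] for d in docs) / len(docs)
        big = sorted(d for d in docs if sizes[d] > avg)
        small = sorted(d for d in docs if sizes[d] <= avg)
        return {d: nodes[i % len(nodes)]
                for group in (big, small) for i, d in enumerate(group)}

Highly popular documents are replicated on every node rather than placed; place() would be applied separately to the popular and non-popular categories.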
5. Request flows through the system

A back-end server that initially receives a request from the front-end is referred to as the first member. A server member hit occurs when the first member receiving a request from the front-end caches a copy of the requested object. Likewise, a server member miss indicates the case where the first member does not contain a copy of the requested object. If no replication is used, the probability of a server member hit is roughly 1/n, where n is the number of servers in the cluster array. The exact probability depends on the way objects are partitioned across the server array, the request traffic, and the load and availability of server members. A server member hit is distinct from a server array hit, which occurs when the server array as a whole (the cluster) can satisfy
a request (i.e. at least one server member has a copy of the requested object in its cache). Note that it is possible to have a server member miss and a server array hit. This occurs when the first member receiving a request from the front-end does not contain a cached copy of the requested object but another server does. Conversely, it is possible to have a server member hit and a server array miss. This occurs when the first member receiving a request from the front-end is the owner of the requested object (i.e. has the object on its local disk). Also, it is possible to have a server member miss and a server array miss. This occurs when no member in the cluster has a cached copy and the member that received the request from the front-end is not the owner. The following summarizes the different request flows through the system: (a) server member hit, server array hit; (b) server member hit, server array miss; (c) server member miss, server array hit; (d) server member miss, server array miss.

Server member hit and server array hit. In this case, the server sends the object directly back to the client without going through the front-end.

Server member hit and server array miss. This is the case where the first member receiving the request from the TCP router is the owner of the requested object. The server fetches the object from its disk, caches a copy in its local cache and sends the object directly back to the client.

Server member miss and server array hit. This occurs when the first member receiving the request does not have a copy of the requested object in its cache, but a copy is cached elsewhere in the cluster. In this case, two alternatives are possible: getting a copy from the remote server, or handing the request off to the remote server. In Weblins, we aim to reduce the number of cache misses by caching the highly popular files on all nodes. Usually, these files constitute about 10% of requested documents [3]. Serving the highly popular files from all caches can improve performance and decrease the response time. The cornerstone of our algorithm is its capability to construct the set of highly popular files on all nodes dynamically, by combining the mechanisms offered by Gobelins for copying data between nodes with the cache replacement algorithm we proposed. For non-popular files, we use the TCP/IP handoff implemented on all nodes to hand the request off to another back-end node. Using TCP/IP handoff avoids cache pollution and reduces network overhead. In general, in the case of a server member miss and a server array hit, we propose to get a copy from the node that caches the file if the requested file is small. If this file belongs to the set of highly popular pages, it will be accessed frequently in the near future and it remains in the cache. Conversely, if this file is popular or non-popular, it will be evicted from the cache by the cache replacement policy. This way the algorithm constructs the core set of highly popular files dynamically
and with low overhead, using the underlying system services and libraries that provide high bandwidth and low latency, instead of using an HTTP interface (as a proxy would) or other communication protocols. For large files, we consider that using TCP handoff is more efficient than getting a copy, since it reduces the overhead on the network and the probability of cache pollution; such files would replace several small files if the cache saturates.

Server member miss and server array miss. This case occurs when the first member receiving the request is not the owner of the requested object. Since the requested object belongs to the set of popular or non-popular objects, we propose to hand the request off to the owner. In this way we avoid the network overhead and we maintain a copy in the cluster.

To optimize performance, our system implements a mixed strategy for server member misses, in the following steps:

† The first member contacts the dispatcher to get information about the requested object. This information is provided by the G-API interface.
† After gathering the information, if the requested object is not cached, the request is handed off to the owner of the object.
† If the requested object is cached and is small, the G-API interface is called to get a copy from the remote node, and the first member returns the object to the client.
† If the requested object is cached but large, the request is handed off to the node that caches the object. The remote node returns the object directly to the client and, asynchronously, informs the first member to clean up the connection state corresponding to the request.

A pseudo-code for the Weblins request distribution algorithm is illustrated in Fig. 3, where S refers to the size of the requested file and T refers to the server load, measured in number of active connections.

Fig. 3. Request distribution algorithm in Weblins.
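The decision logic of Fig. 3 can be summarized in a few lines. This Python sketch uses the size threshold S from Section 7, omits the load test on T (whose exact placement in Fig. 3 we do not reproduce), and the node and handoff primitives are hypothetical wrappers around the mechanisms described above.

    S = 100 * 1024   # file size threshold from Section 7 (100 KB)

    def handle(node, request):
        # Weblins request distribution for one request (sketch).
        info = node.dispatcher.lookup(request.path)   # via the G-API
        if info.cached_locally:                       # server member hit
            return node.serve_from_cache(request)
        if info.cached_remotely:                      # member miss, array hit
            if info.size <= S:
                node.import_copy(request.path)        # small file: import it;
                return node.serve_from_cache(request) # hot files stay cached
            return node.handoff(request, info.caching_node)  # large file
        if info.owner == node:                        # array miss, local owner
            return node.serve_from_disk(request)
        return node.handoff(request, info.owner)      # array miss: go to owner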
6. Distributed vs. centralized dispatcher

A real risk of a system bottleneck exists when the cluster's front-end implements the content-aware request distribution policy (a layer-7 web switch) [9,11]. Indeed, the additional overhead caused by such content-aware routing reduces the system scalability by one order of magnitude with respect to a load balancing request distribution policy (layer-4 web switches) [9,11]. To overcome this drawback, Weblins combines the two policies. A layer-4 web switch is implemented in the front-end node of the cluster, which interfaces with client requests. This switch receives all requests directed to the web cluster and distributes them to the back-end server that has the lowest number of active connections. Upon receiving a request from the front-end, a back-end server can handle it in one of two ways: (a) serve the request locally, or (b) redirect the request to another back-end server according to the Weblins request distribution policy described in Section 5.
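For completeness, the layer-4 front-end choice reduces to a least-connections selection; a trivial sketch, assuming the switch tracks per-server active connection counts:

    def pick_backend(backends):
        # Layer-4 dispatch is content-blind: the front-end never parses
        # HTTP, it only compares active connection counts.
        return min(backends, key=lambda b: b.active_connections)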
7. Simulation results
To study various request distribution policies for a range of cluster sizes, under different assumptions for CPU speed, amount of memory and other parameters, we developed a configurable, event-driven web server cluster simulator. The costs of the basic request processing steps used in our simulation were derived from the measurements used in [9]; in particular, we use a file size threshold S = 100 KB and a server load threshold T = 130 active connections. In addition to Weblins, we simulated two more prototypes:

Distributed LARD (content-aware request distribution strategy). This prototype is implemented based on the description given in [11]. The cluster is composed of a switch connected to a set of back-end nodes. The switch distributes the client requests to the back-end servers in a round-robin fashion. A specialized dispatcher is
dedicated to implementing the LARD policy. Every back-end node implements the distributor module.

Gobelins cooperative caching strategy (Web-CLRU). This is a simulation of a web server implemented on a cluster that has the Gobelins operating system as its platform. The resources (memories, disks and CPUs) are globally managed [4].

The simulator calculates overall throughput, hit rate, disk accesses, idle time and network traffic. Throughput is the number of requests in the trace that were served per second by the entire cluster, calculated as the number of requests in the trace divided by the simulated time it took to finish serving all the requests in the trace. The cache hit ratio is the number of requests that hit in a back-end node's main memory cache divided by the number of requests in the trace. The idle time is the fraction of simulated time during which a back-end node was idle, averaged over all back-end nodes. The throughput and the service time are calculated as a function of the number of nodes in the cluster. For some results, the number of nodes is fixed at 10 and the size of the cache is varied. We note that the simulator does not account for the communication overhead between the nodes in the implementation of CGDSF and the other competing algorithms.

In the remainder of this section, we present the results of executing the three prototypes on the log of the University of Saskatchewan [1] under the following assumptions: (a) the set of highly popular files is 8% of the total size and can be cached on all nodes at most, (b) there is no network congestion, and (c) the overheads of copying and transferring data between the application and kernel levels, and of document insertion into and removal from the cache, are negligible.
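The metric definitions above translate directly into code; a minimal sketch (variable names are ours):

    def metrics(n_requests, sim_time, cache_hits, idle, total):
        # idle / total: per-node idle and total simulated times.
        throughput = n_requests / sim_time        # requests served per second
        hit_ratio = cache_hits / n_requests       # main-memory hits per request
        idle_time = sum(i / t for i, t in zip(idle, total)) / len(idle)
        return throughput, hit_ratio, idle_time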
Figs. 4 and 5 show the results for the local and cooperative cache hit ratios. For the local cache, LARD produces the highest hit rate because every request is forwarded directly to the node that caches the requested document. Gobelins, by contrast, has no information about the content of a request, and requests are sent to the nodes in a round-robin fashion. This increases the probability of a cache miss when getting a copy from the local disk or from another node caching the document. In Weblins, the server has a third choice: it can redirect the request to another node and thus avoid the cache miss if the requested document is cached on a remote node or the document is large. In Fig. 5, LARD shows the lowest hit rate because it does not involve any cooperation between caches. Gobelins has the highest value because it caches all documents, while Weblins does not cache large documents, which leads to a cache miss on every access to such a document.

Figs. 6 and 7 show the number of handoff requests and files imported for the three prototypes. Gobelins does not use the handoff protocol, which increases the number of files imported between nodes. In contrast, LARD uses the handoff protocol, which increases the number of handoff requests. A handoff request is more expensive than getting a copy, especially for smaller files, but it is more adequate for bigger files. These characteristics influence the service time and the throughput shown in Figs. 8 and 9. In these figures, the mixed strategy used by Weblins gives the best service time and throughput.
Fig. 4. Local cache hit ratio (10 nodes).
Fig. 5. Cooperative cache hit ratio (10 nodes).
Fig. 6. Number of import files (10 nodes).
Fig. 7. Number of handoff requests (10 nodes).
Fig. 8. Throughput (cache 16 MB).
Fig. 9. Service time (cache 16 MB).
Figs. 10–13 compare the CGDSF cache replacement policy used by Weblins with the CLRU policy used by Gobelins [4]. We implemented both policies on the Weblins prototype. The results in Fig. 10 show that CGDSF gives a higher cache hit rate, because it retains in the cooperative cache the documents with the highest probability of future access, and because of its capability to construct the core files (i.e. the highly popular files) on all nodes. Fig. 11 shows the scalability of Weblins: the service time decreases as the number of nodes increases, which in turn increases the throughput of the cluster. Figs. 12 and 13 show the importance of CGDSF for the construction of the core files. Fig. 12 shows a big difference in the number of imported files between CGDSF and CLRU. For approximately 500,000 requests, the number of files imported between nodes can reach about 50% for the implementation of Weblins with CLRU, while this value is less than 10% with CGDSF. This means that CGDSF can dynamically
replicate and preserve the highly popular files on all nodes and thus reduce the network and communication overhead, increase the cache hit rate and evenly distribute the load among the nodes. Fig. 13 gives the number of handoff requests. CGDSF has the highest value because the files with low priority (usually large files) are not replicated on all nodes, and at most one copy of each such file is cached in the cluster; every access to these files may need a handoff request. The difference in the number of handoff requests between CGDSF and CLRU is about 1000 requests (i.e. 5% of total requests). The total number of network data transfers (handoffs and imported files) for Weblins using CGDSF on a cluster of 15 nodes, for a working set of 500,000 requests, was found to be about 70,400 (approximately 14% of total requests), while this number is about 219,400 (approximately 44% of total requests) for Weblins using CLRU. These results show that Weblins can reduce the network overhead by approximately 30% when it uses the CGDSF algorithm, and also increase the cache hit rate by about 30% (Fig. 10), in comparison with the CLRU replacement policy. We believe that the throughput and service time advantages of the Weblins prototype using CGDSF would be even more significant if the overheads of network communication and memory management were included in the simulation.

Fig. 10. Hit rate (cache 8 MB).
Fig. 11. Service time (cache 8 MB).
Fig. 12. Imported files (cache 8 MB).
Fig. 13. Handoff requests (cache 8 MB).
8. Conclusion

In this paper, we present Weblins, a scalable cluster-based WWW server. Weblins includes a new request distribution algorithm that combines the features of content-aware request distribution with the mechanisms of its underlying cooperative cache system. A new cache replacement policy and a new web database storage scheme are also integrated into Weblins. Simulation results show that Weblins gives better throughput and overall performance in comparison with a pure content-aware request distribution policy and a pure cooperative caching mechanism.
Acknowledgements Part of this work was done by the first author at the IRISA, University of Rennes-1. We thank Christine Morin for her support of this work.
References

[1] Arlitt M. A performance study of Internet web servers. Master's thesis, University of Saskatchewan; 1996.
[2] Cherkasova L. Improving WWW proxies performance with greedy-dual-size-frequency caching policy. Technical report, Hewlett-Packard Laboratories.
[3] Arlitt M, Williamson C. Web server workload characterization: the search for invariants. In: Proceedings of the ACM SIGMETRICS conference; May 1996. p. 126–137.
[4] Lottiaux R. Gestion globale de la mémoire physique d'une grappe pour un système à image unique. PhD thesis, Université de Rennes 1; 2001.
[5] Sarkar P, Hartman J. Efficient cooperative caching using hints. In: Second symposium on operating systems design and implementation; 1996. p. 35–46.
[6] Dahlin MD, Wang RY, Anderson TE, Patterson DA. Cooperative caching: using remote client memory to improve file system performance. In: First symposium on operating systems design and implementation; 1994. p. 267–280.
[7] Cohen A, Rangarajan S, Slye H. On the performance of TCP splicing for URL-aware redirection. In: Proceedings of the USENIX symposium on internet technologies and systems, Boulder, CO; October 1999.
[8] Zhang X, Barrientos M, Chen JB, Seltzer M. HACC: an architecture for cluster-based web servers. In: Proceedings of the third USENIX Windows NT symposium; July 1999. p. 155–164.
[9] Pai V, Aron M, Banga G, Svendsen M, Druschel P, Zwaenepoel W, Nahum E. Locality-aware request distribution in cluster-based network servers. In: Proceedings of the eighth international conference on architectural support for programming languages and operating systems (ASPLOS VIII); October 1998. p. 205–216.
[10] Cherkasova L. FLEX: load balancing and management strategy for scalable web hosting service. In: Proceedings of the fifth international symposium on computers and communications (ISCC'00); July 2000. p. 8–13.
[11] Aron M, Sanders D, Druschel P, Zwaenepoel W. Scalable content-aware request distribution in cluster-based network servers. In: Proceedings of the USENIX annual technical conference; 2000.
[12] Schroeder T, Goddard S, Ramamurthy B. Scalable web server clustering technologies. IEEE Netw 2000;14(3):38–45.
[13] Bryhni H, Klovning E, Kure O. A comparison of load balancing techniques for scalable web servers. IEEE Netw 2000;14(4):58–64.
[14] Cherkasova L, Karlsson M. Scalable web server cluster design with workload-aware request distribution strategy WARD. In: Third international workshop on advanced issues of e-commerce and web-based information systems (WECWIS); 2001.