THE IMPACT OF CLUSTER-BASED WEB SYSTEMS DESIGN ON USER-PERCEIVED PERFORMANCE Leszek Borzemski Institute of Control and Systems Engineering Wroclaw University of Technology Wybrzeze Wyspianskiego 27, 50-370 Wroclaw, Poland

ABSTRACT As the Internet has become more pervasive, many research works have come to rely on it. In most cases only individual access to information is required, and a best-effort way of providing such a service may be quite enough. However, in several computing projects established recently, there is an urgent need for good and scalable Internet performance as perceived by end users. Web users perceive good Internet performance as low latency, high throughput and high availability. Almost 40% of the latency perceived by users originates at the Web site; therefore, there is a general consensus that intensive research in Web site design is needed. This paper discusses the impact of cluster-based Web systems design on user-perceived performance and presents results from empirical performance studies carried out for cluster-based Web systems. KEYWORDS

Cluster-based web systems, load balancing, user-perceived performance.

1. INTRODUCTION Web users perceive good Internet performance as low latency, high throughput and high availability. Understanding and identifying the sources of performance problems are very important issues for Internet designers. It has never been easy to determine whether slow responses are due to network problems or to end-system problems on either side, i.e. the user or the server side. Actual improvements in the response time perceived by users come from a combination of network and server technologies. Obviously, the simplest and least questionable way of overcoming Internet performance problems is to increase network bandwidth. But because almost 40% of the latency perceived by users originates at the Web site, there is a general consensus that intensive research in Web site design is needed [Cardellini et al. 2002]. Web providers make efforts to solve the problem of “slow responses” by developing new Web site architectures that can scale Web site capacity dynamically to match aggregate user demand while ensuring continuous service availability. Recent studies show that this direction is the most usable and the most commonly accepted by end users, because better performance may be achieved without changes to their local infrastructures. Generally, two such architectures can be considered, namely geographically distributed and locally distributed. The former is mainly considered as the architecture for Internet-wide content providers. The latter is built around a cluster of Web servers distributed over a local area network and driven by a centrally located Web switch. Web servers in a cluster work collectively as a single Web resource in the network. The Web switch has the ability to distribute user requests among the Web servers in the cluster. It employs a dispatching algorithm to achieve the required level of performance and availability of the Web service.
This paper discusses the impact of cluster-based Web system design on user-perceived performance and presents results from empirical performance studies carried out for cluster-based Web systems with IBM’s SecureWay Network Dispatcher as the Web switch [IBM 2002]. We evaluate a cluster-based Web site infrastructure based on IBM RS/6000 43P computers. The rest of this paper is organized as follows. Background and related work are discussed in Section 2, where we also discuss how a user-standpoint performance view can be introduced into the design of a



cluster-based Web site. In Section 3 the testbed is presented. Section 4 presents the performance measurements we have made. Finally, concluding remarks appear in Section 5.

2. BACKGROUND Although a client issues one request at a time for a Web page, he or she usually causes multiple client-to-server interactions, because retrieving one Web page imposes, on average, 1+n accesses on the server (1 access to retrieve the HTML file and n accesses to retrieve its n embedded resources). All these interactions are referred to as URL requests, or simply as requests. The Web switch handles the requests and decides to which server they should be allocated, working either as a Layer-4 or a Layer-7 switch [Bourke 2001]. A Layer-4 switch knows only information about the TCP/IP connections used to transport user requests and responses, whereas a Layer-7 switch may use information such as the URL in addition to layer 2, 3, and 4 information. A Layer-7 switch is therefore content-aware and can effectively support user-perceived quality. With a Layer-4 Web switch we may use a virtual IP address to distribute Web traffic to Web servers based on IP addresses, whereas a Layer-7 load-balancing configuration enables the Web switch to use a virtual IP address to distribute Web traffic to Web servers based on Uniform Resource Locators (URLs). The Web switch can use different performance strategies. Among them the most popular are load balancing and load sharing. Both are deployed from the site’s point of view, i.e. they are aimed at optimal utilization of the site’s resources. Load balancing aims at equalizing the load among all servers in a cluster, while load-sharing algorithms attempt to smooth out transient peak overload periods on some cluster nodes. Undoubtedly, there is a need for strategies that are user-perceived in the sense that they differentiate requests, providing guaranteed service for some premium users while non-premium users are served only on a best-effort basis.
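As a concrete illustration of the Layer-4 versus Layer-7 distinction, the following Python sketch contrasts the two dispatching levels. All names (Server, dispatch_l4, dispatch_l7) and the image-server partitioning rule are hypothetical, chosen for illustration only; they do not reflect any particular switch product.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.active_connections = 0

def dispatch_l4(servers, _request):
    # Layer-4: only TCP/IP connection state is visible, so the switch
    # can do no better than, e.g., picking the server with the fewest
    # active connections.
    return min(servers, key=lambda s: s.active_connections)

def dispatch_l7(servers, request):
    # Layer-7: the URL is visible, so requests can be partitioned by
    # content (content-aware routing), e.g. images vs. HTML pages.
    if request["url"].endswith((".gif", ".jpg")):
        return servers[0]               # hypothetical dedicated image server
    return dispatch_l4(servers[1:], request)

servers = [Server("img"), Server("web1"), Server("web2")]
servers[2].active_connections = 3
target = dispatch_l7(servers, {"url": "/index.html"})
print(target.name)  # web1 (fewest active connections among non-image servers)
```

The Layer-7 variant is what makes content-aware, user-perceived policies possible: the routing decision can depend on what is being requested, not only on where the connection comes from.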
It is expected that in the near future such a Web service will be available, especially for the needs of mission-critical e-commerce users [Bhatti 2000]. Such a service is also eagerly awaited in the context of scientific Web computing, e.g. in grid projects. But such differentiated services might discard some user requests [Bhatti and Friedrich 1999]. A step further is to propose dispatching algorithms based on user-oriented performance measures such as the response time of a request, i.e. the interval that elapses from the arrival of the request until its completion at the system. Layer-7 switches give such a possibility. Since they use client-aware dispatching algorithms, they can dispatch requests so as to ensure the least response time for each individual request arriving at the Web switch. The request response times cannot be known a priori, but they can be estimated. For instance, the estimation of a request’s response time, and of the corresponding task’s service demand, may be based on the object’s type or size. Because the Web server’s performance has an important impact on the Web site, several researchers have studied factors that influence the performance of the Web, and different ways to improve it have been proposed. Methods, techniques and algorithms for request dispatching in cluster-based Web systems are described in the most recent survey in the area [Cardellini et al. 2002]. User-perceived performance Web architectures are extensively studied in [Casalicchio and Colajanni 2001], whereas [Bhatti and Friedrich 1999] discusses different algorithms for the scheduling of incoming requests by a Web server. The authors show that performance improves when an earliest-deadline-first policy is used.
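A minimal sketch of such size-based response-time estimation and least-response-time dispatching follows. The linear load model and the transfer-rate constant are illustrative assumptions, not measured values from any system described in this paper.

```python
def estimated_response_time(server_load, object_size, rate=10_000_000):
    # Crude assumed model: transfer time (bytes / bytes-per-second)
    # scaled by the server's current load as a queueing factor.
    return (object_size / rate) * (1 + server_load)

def least_response_time(loads, object_size):
    # loads: dict of server name -> current load (e.g. active requests);
    # pick the server with the smallest estimated response time.
    return min(loads, key=lambda s: estimated_response_time(loads[s], object_size))

loads = {"web1": 4, "web2": 1, "web3": 7}
print(least_response_time(loads, 50_000))  # web2
```

Any monotone load model gives the same ordering here; the point is only that a Layer-7 switch can rank servers per request once it can estimate each request's service demand.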

3. THE TESTBED 3.1 IBM SecureWay Network Dispatcher IBM SecureWay Network Dispatcher (or Network Dispatcher) is a software implementation of Layer-4 and Layer-7 Web switch technology [IBM 2002]. Network Dispatcher is a front-end balancer for cluster-based Web systems. It is implemented as software running on several operating platforms, including AIX on RS/6000s and Windows NT/2K on PCs. Network Dispatcher improves the performance of a Web site by distributing incoming requests among a group of servers forming a cluster. It can support locally distributed cluster-based Web systems as well as distributed Web systems. Network Dispatcher consists of the following key software components, which can be used individually or together: (i) Dispatcher, the main component, responsible for load balancing in local and wide-area networks; (ii) Interactive Session


IADIS International Conference WWW/Internet 2002

Support (ISS), which distributes requests locally and geographically using a DNS-based dispatching approach. ISS can also be exploited by the Dispatcher to build its knowledge about server loads; and (iii) Content Based Routing (CBR), which routes requests based on the requested URL. In this study we used mainly the Dispatcher component, which we briefly introduce here. When the Dispatcher receives a packet sent to the cluster, it checks which server is the next best server in the cluster to handle the load and sends the packet to that server, replacing the destination hardware address of the packet with the hardware address (i.e. MAC address) of the chosen server’s network adapter. The Web server then receives the packet and responds directly to the client. In the experiments we use the weighted round-robin load-balancing algorithm [Bourke 2001]. The round-robin policy exploits information on past dispatching decisions and uses a circular list and a pointer to the last selected server to make dispatching decisions; that is, if S_i was the last chosen node, the new request is assigned to S_j, where j = (i+1) mod S and S is the number of server nodes. The weighted round-robin policy uses dynamically evaluated weights that are proportional to the server load state. Therefore a server with, for example, weight 10 will be assigned twice as many requests as a server with weight 5. Typical load metrics are the number of new and active TCP connections for each server or the utilization of server resources (metrics supported by ISS). The Dispatcher supports rather a load-sharing solution and uses as simple a dispatching mechanism as possible for that purpose. But it also includes a Manager component that can be configured to provide load information to the Dispatcher. Without the Manager we have a static weighted round-robin policy: an administrator assigns weights to each server and can change them only manually.
No load information from the servers is given to the Dispatcher. In that configuration the Dispatcher also does not automatically recognize that a server is no longer available; this information must be given manually. Adding the Manager functionality allows information regarding the load of the servers to factor dynamically into the weighting mechanism, giving a more accurate load-balancing scheme. Then the least-loaded server is the most likely choice. In calculating the weights the Manager can take input from different sources, including so-called advisors and ISS. An advisor runs on the Dispatcher machine and sends requests to the servers to measure actual response time for a particular protocol. The results are used by the Manager to adjust the load-balancing weights. Through the smoothing-index option we can influence how quickly the weights change: based on the smoothing-index value, the Manager changes the servers’ weights more or less quickly. We use the recommended value of 5 seconds, which results in a rather slow weight-modification process and guards against possible oscillation in the way requests are load balanced. It is also possible to set the proportions of importance for each of the policies to suit the environment, as ndmp = (active connections, new connections, advisors, input from ISS). For example, the setting ndmp = (25 25 25 25) means that all policies contribute 25% to the weighting process. In calculating the new weights the Manager can use the information sent by the ISS component about the servers’ performance according to a specific metric. The ISS monitor collects server load information from the ISS agents running on the individual servers and forwards it to the Dispatcher. We can configure ISS to suit our server environment and evaluate CPU utilization, taking into account such factors as the total number of users and types of access.
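The weighted round-robin policy described above can be sketched in a few lines of Python. This is an illustrative sketch of the general technique, not IBM's implementation: a server with weight 10 receives twice as many requests per cycle as one with weight 5.

```python
import itertools

def weighted_round_robin(weights):
    """Yield server names cyclically; each server appears `weight`
    times per cycle, so request counts are proportional to weights."""
    cycle = [name for name, w in weights.items() for _ in range(w)]
    return itertools.cycle(cycle)

rr = weighted_round_robin({"A": 2, "B": 1})
print([next(rr) for _ in range(6)])  # ['A', 'A', 'B', 'A', 'A', 'B']
```

In the static configuration the `weights` mapping is fixed by the administrator; with the Manager enabled, it would be rebuilt from the dynamically calculated weights every smoothing interval.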

3.2 Methodology The testbed configuration is built around a switched LAN equipped with an IBM 8273 High Performance Switch (10/100 Mbps) and an IBM 8275 workgroup switch. Twelve identical Intel Celeron 300 MHz PCs with 32 MB SDRAM, a 3 GB disk drive, and a 3COM EtherLink XL (10/100) network card, running MS Windows NT 4.0 Workstation with Service Pack 5.0, are connected to IBM 8273 ports as the Web traffic generation and monitoring platform. As we use WebBench, one PC is used solely as the WebBench controller, while the other PCs run WebBench clients, each in three-threaded mode. The IBM 8273 is uplinked to the IBM 8275, which is the LAN switch used for forming the cluster. The cluster-based Web server is built from three IBM RS/6000 43P Model 260 workstations (1-way systems with a RISC PowerPC Power3 200 MHz processor, 256 MB RAM, an Ultra Wide SCSI 4.3 GB disk drive, and an integrated 10/100 network controller) running AIX 4.3 and Apache for AIX. The fourth computer connected to the IBM 8275 is an IBM RS/6000 43P Model 260 (2-way SMP system with RISC PowerPC Power3 200 MHz, 256 MB RAM) running AIX 4.3 and Network Dispatcher V2.1.



The synthetic workload generator in our study is Ziff Davis Media’s licensed-freeware Web benchmark WebBench 3.0 [eTesting Labs 2002]. WebBench is one of the most popular and industry-accepted free benchmark programs. We use a version available to the general public on the Web. We decided to use WebBench because we wanted to evaluate raw Web server performance under stress tests. WebBench is a complete benchmark program that measures the performance of Web servers (software and hardware). It uses client PCs to simulate Web browsers. However, unlike actual browsers, the clients don't display the files that the server sends in response to their requests. Instead, when a client receives a response from the server, it records the information associated with the response and then immediately (a think time can be set) sends another request to the server. The controller sets up, starts, monitors and stops the tests. When the tests end, the clients send their results to the controller, which provides a report focusing on two measures of performance: requests per second and bytes per second transmitted from the cluster-based Web server to the clients. The controller, unlike the clients, does not affect the server’s overall score. The clients issue HTTP GET requests to the server. WebBench supports the HTTP/1.1 features of persistent connections and pipelining. In our experiments WebBench uses static test workload profiles based only on HTML and GIF files. Our basic methodology is to measure, on cluster-based Web systems, the effect of different HTTP protocol options for the workload profiles generated by WebBench. We also evaluate the cluster performance for the different dispatching policies that may be configured by an administrator in the Network Dispatcher setup. We show as well how cluster operation can fail when a bad configuration setup is used.
In addition to the measurements made by the WebBench benchmarking platform, we evaluate the CPU idle times of the servers (using the Unix command vmstat). Tests are carried out according to WebBench’s test methodology. WebBench provides two overall server scores: requests per second and throughput measured in bytes per second. The workload is generated by a maximum of 11 clients. We also monitored network usage to make sure the network did not cause a bottleneck; fortunately, the network was not a bottleneck during the measurement experiments. We examine the following testbed configurations: (i) Web server based on a single 1-way workstation; (ii) Web server based on a single 2-way workstation; (iii) cluster-based Web server on two 1-way workstations; (iv) cluster-based Web server on three 1-way workstations. Each Web server in a cluster configuration has its own replica of the standard WebBench workload tree. We use the standard test suite ZD_STATIC_V30.TST to test HTTP/1.0 traffic. In testing HTTP/1.1 traffic we examine the HTTP/1.1 persistent-connections feature. With persistent connections enabled, the client maintains its connection to the server for multiple requests instead of disconnecting each time it receives a response from the server. The advantage of using persistent connections is that there is less overhead associated with each request, because the client doesn't connect or disconnect on every request. Another HTTP/1.1 feature that has an effect on requests is pipelining, in which the client bundles several requests together and sends them to the server in a batch instead of one request at a time. We do not test this feature. For HTTP/1.1 and mixed traffic the standard test suite ZD_STATIC_V30.TST is modified to allow 100% or 50% HTTP/1.1 traffic. In both cases, the minimum and maximum numbers of requests per persistent connection are set to 1 and 20, respectively.
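For reference, CPU idle time can be extracted from a vmstat sample line as sketched below. The column layout assumed here (the us/sy/id/wa CPU percentages as the last four fields of each sample line, as on AIX-style vmstat) is an assumption and should be verified on the target system; the sample line itself is fabricated for illustration.

```python
def parse_idle(vmstat_line):
    # Assumes the line ends with the us/sy/id/wa CPU percentage
    # columns, so the idle percentage is the second-to-last field.
    fields = vmstat_line.split()
    return int(fields[-2])

sample = "1 0 22050 12000 0 0 0 0 0 0 120 340 210 12 8 75 5"
print(parse_idle(sample))  # 75
```

Sampling this value once per second per server is enough to reproduce the per-server CPU utilization logs discussed in Section 4.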
HTTP/1.0 and HTTP/1.1 tests are performed for the following settings: (i) HTTP/1.0 traffic only; (ii) HTTP/1.1 traffic only; and (iii) 50% HTTP/1.0 and 50% HTTP/1.1 mixed traffic. The impact of the Dispatcher setup on Web cluster performance is examined for the following configurations: (i) Standard – weights are dynamically calculated on the basis of open and new TCP connections, ndmp = (50 50 0 0); (ii) Static – all servers have the same permanently set weight, ndmp is not defined; (iii) Advisor-aware – weights are calculated solely from server information obtained from the HTTP advisor and from ISS, ndmp = (0 0 50 50); and (iv) Advanced – weights are calculated from all information available to the Dispatcher, ndmp = (25 25 25 25). In the advisor-aware and advanced configurations, the ISS agents (daemons) collect information about the percentage of CPU being utilized and the number of active processes on the system. The CPU usage measurement is provided by the system, whereas the number of processes running on the server is obtained with the system command ps. Both metric functions are to be minimized, and the monitor uses them with importance weights 2.0 and 1.0, respectively. ISS contacts all servers every 5 seconds to collect this information.
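To make the role of the ndmp proportions concrete, here is a hedged sketch of how the four inputs might be combined into a per-server weight. The normalization and the per-source "goodness" scores are assumptions made purely for illustration; the Manager's actual weighting algorithm is not documented in this paper.

```python
def combine(ndmp, scores):
    """ndmp: proportions for (active connections, new connections,
    advisors, ISS), summing to 100. scores: assumed per-source
    goodness values in [0, 1] (higher = less loaded / faster).
    Returns a combined weight in [0, 1]."""
    active, new, advisor, iss = ndmp
    a, n, ad, i = scores
    return (active * a + new * n + advisor * ad + iss * i) / 100

# Advisor-aware configuration: only advisor and ISS inputs count.
w = combine((0, 0, 50, 50), (0.2, 0.9, 0.8, 0.6))
print(round(w, 2))  # 0.7
```

With ndmp = (0 0 50 50), the TCP-connection scores (0.2 and 0.9 above) are ignored entirely, which matches the advisor-aware behavior analyzed in Section 4.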



4. PERFORMANCE RESULTS


We carried out a number of performance measurement experiments. Due to space limitations, only a fraction is presented here. Figure 1 summarizes the performance of the testbed Web server configurations under the HTTP/1.1 workload. The cluster-based Web servers are configured in the standard Dispatcher configuration. The maximum request rate is about 1616 requests/second at almost 10 MB/second throughput, achieved for the 3-server Web cluster. HTTP/1.1 yields a significant improvement in performance, 38% on average, in comparison with HTTP/1.0. The results confirm the performance improvement expected from applying persistent connections.


Figure 1. Performance of testbed configurations for HTTP/1.1 workload


Figure 2. Performance of Dispatcher configurations for HTTP/1.0 workload


Figure 3. Performance of Dispatcher configurations for HTTP/1.1 workload


Figure 4. Average %CPU utilization v. number of clients for 3-server cluster and HTTP/1.0 workload in Advisor-aware configuration

It is quite obvious that the Web server based on a single 1-way workstation achieves the worst performance, but it is worth noticing that the 2-way Web server outperforms the others for all HTTP/1.0 loads, even compared with the 3-server cluster. In the case of the HTTP/1.1 workload, the 3-server cluster outperforms the 2-way machine, but only when it is heavily loaded (i.e. for more than 5 clients). However, in the case of the HTTP/1.0 protocol, the 2-way Web server is fully utilized (100% CPU utilization) for all loads greater than 8 clients, whereas in the case of HTTP/1.1 this saturation state is already observed for 6 or more clients. In contrast, in both the 2-server and 3-server Web clusters none of the servers exceeds 85% CPU utilization. The clusters thus reach a saturation state earlier, without making the most of the aggregate theoretical cluster capacity. This drawback of the standard configuration can be explained by studying the detailed log information showing how CPU usage changes over time. The CPU utilization is measured every second for each server. The log shows that every 5 seconds one of the servers is fully loaded whereas the other one or two are underloaded. This is a consequence of the dispatching mechanism used by the Dispatcher in the standard configuration: it calculates new weights every 5 seconds based only on information about TCP connections, and is not fully server-load aware. This effect is less



significant for HTTP/1.1, because then the Dispatcher may assign requests along long-lasting active connections and the overhead of the whole load-balancing process is kept to a minimum. Figures 2 and 3 show the results for the various Dispatcher configurations on the 3-server cluster. The advanced configuration outperforms every other configuration, especially when it is used for servicing HTTP/1.1 traffic; then the projected request rate is about 1800 requests/second and the throughput more than 11 MB/sec. Another interesting result is the following. The measurements show that the static policy is much better than the standard default policy, and performs only a bit worse than the sophisticated advanced configuration. We suppose that this is an effect of how WebBench tests the Web site: the biggest file is only 529 KB, too small to create the long-lasting connections that are better supported by the Dispatcher in the standard than in the static configuration. In our experiments only the advantages of the static policy are visible. It is a simple and fast policy, but it has drawbacks: for instance, the static policy can assign further connections to a target server that is still handling a long file transfer requested earlier, or assign a request to a failed server. The effect of incorrect system settings is shown in the case of the advisor-aware configuration, which works badly when the workload increases above a certain level, a kind of misleading cluster saturation point. Usually, when the workload gradually increases, a saturation phenomenon occurs that reduces system throughput due to excessive system overhead or resource exhaustion. In our case this behavior is observed at about 50% average CPU utilization: as the workload gradually increases further, performance decreases in the case of HTTP/1.1 traffic, and oscillates heavily for the HTTP/1.0 load. There can be several reasons for this. We examined the servers’ CPU utilization logs.
We have found that the load is evenly balanced among the servers up to the 50% CPU utilization level (5 or 7 clients for HTTP/1.1 or HTTP/1.0, respectively) and the cluster works well, reaching its best performance at this point (Figure 4 shows that behavior for the 3-server cluster and HTTP/1.0 traffic). But after this point individual server loads vary greatly, and the cluster cannot achieve well-balanced load sharing. Probably the Dispatcher receives incoherent or even contradictory load information and is not able to decide correctly how to share the load. Obviously, 50% CPU utilization does not mean that a server is overloaded, but due to bad information from the HTTP advisor the Dispatcher perceives it as such. This misjudgment causes the cluster to behave as if saturated beyond that point. As an antidote, we can lower the advisor proportion, as recommended in [Sadtler et al. 1999].

5. CONCLUSIONS In this paper we studied how design alternatives for cluster-based Web systems may influence user-perceived performance. We experimented with different settings of IBM SecureWay Network Dispatcher. Any performance data contained herein was determined in a controlled environment; actual results in other environments may vary. However, the specific implementation examples and measurements studied here can give site administrators a better understanding of the technology involved.

REFERENCES
Bhatti, N. and Friedrich, R., 1999. Web server support for tiered services. IEEE Network, Vol. 13, No. 5, pp. 64-71.
Bhatti, N. et al., 2000. Integrating user-perceived quality into Web server design. Computer Networks, Vol. 33, No. 16, pp. 1-16.
Bourke, T., 2001. Server Load Balancing. O’Reilly & Associates, Inc., Sebastopol, CA, USA.
Cardellini, V. et al., 2002. The state of the art in locally distributed Web-server systems. ACM Computing Surveys, Vol. 34, No. 2, pp. 263-311.
Casalicchio, E. and Colajanni, M., 2001. A client-aware dispatching algorithm for Web clusters providing multiple services. In: Proceedings of the 10th International World Wide Web Conference, Hong Kong, pp. 535-544.
eTesting Labs, 2002. WebBench. http://www.etestinglabs.com. WebBench is a trademark of ZD Publishing Holdings Inc., an affiliate of eTesting Labs Inc. in the U.S. and other countries.
IBM, 2002. IBM Network Dispatcher. http://www.ibm.com.
Sadtler, C. et al., 1999. Load balancing for eNetwork communications servers. IBM Redbook SG24-5305-00, IBM Corp. ITSO, Research Triangle Park, NC, USA.

