IJCST Vol. 2, Issue 1, March 2011

ISSN : 2229-4333(Print) | ISSN : 0976-8491(Online)

Performance Evaluation of DNS Based Load Balancing Techniques for Web Servers

1Baldev Singh, 2O.P. Gupta, 3Simar Preet Singh
1Dept. of Computer Science, Lyallpur Khalsa College, Jalandhar, Punjab, India
2SIT, PAU, Ludhiana, Punjab, India
3University College of Engineering, Punjabi University, Patiala, India

Abstract
The rapid growth of World Wide Web traffic has produced great interest in distributed web server systems. Among the architectures for distributed web servers, the DNS-based distributed system is a promising solution in the context of scalability, performance and availability. DNS (Domain Name Server) name caching, the Random Early Detection Method and the Load Buffer Range Method have considerable effects on the load balance of distributed web server systems. In this paper, we examine various load balancing techniques for web servers and evaluate the performance of the distributed Web-server system.

Keywords
DNS, Load balancing, Load buffer, Caching.

I. Introduction
Many web sites replicate information across independent or coordinated servers to cope with the rapid growth of Internet web traffic. Among the available solutions, the distributed web server system is one of the most significant [8]. Through increased capacity, distributed processing and information replication, a cluster of web servers can significantly increase the processing throughput of the system and reduce the response delay. There are basically two architectures for distributed web server systems: dispatcher-based systems [7,3] and DNS-based systems. In a dispatcher-based system, HTTP requests from clients must pass through a hardware point known as the dispatcher, which may be a special switch (IP-dispatcher) or a front-end of the distributed web server. The dispatcher redirects each request to one of the web servers, either by changing the destination IP address in the packet header or by establishing an additional connection between the dispatcher and the web server. The selected web server then sends the response to the client, either directly or through the dispatcher. In a DNS-based web server system, the DNS translates the logical site name into the IP address of one of the web servers, and the clients then communicate with the web servers directly. This design improves system scalability and is appropriate for geographically distributed web server systems. However, the DNS-based system has a load balancing problem.

II. Distributed Web-server System
The Web-server system architecture consists of three entities: the client, the domain name server (DNS) and the Web-server. The distributed Web-server system can be organized as several Web-servers and a cluster DNS that resolves all initial address resolution requests from local gateways. Each client session is characterized by one address resolution and several Web page requests. First, the client receives the address of one Web-server of the cluster through the DNS address resolution. Subsequently, the client submits several HTTP requests to that Web-server. In addition to resolving the URL-name to the IP address of a Web-server, the DNS of a distributed Web-server system can collect information from the Web-servers for various statistics [22]. The DNS can then select the address of a suitable Web-server based on the collected information, using a scheduling policy that balances the load among the Web-servers so that none of them becomes overloaded.
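The session pattern described above can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the server IPs, the `resolve` function and the round-robin rotation are assumptions made for the example.

```python
import itertools

# Hypothetical sketch of the session pattern described above: a client
# resolves the site name once through the cluster DNS, then issues all
# subsequent HTTP requests directly to the Web-server it was given.
# The server IPs and the rotation policy are illustrative assumptions.

SERVER_IPS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
_rotation = itertools.cycle(SERVER_IPS)

def resolve(site_name: str) -> str:
    """Cluster DNS: map the logical site name to one Web-server IP."""
    return next(_rotation)

def client_session(site_name: str, num_requests: int) -> list[str]:
    """One address resolution, then several page requests to that server."""
    ip = resolve(site_name)          # single DNS lookup per session
    return [f"GET page {i} from {ip}" for i in range(num_requests)]

session = client_session("www.example.com", 3)
# Every request in the session goes to the same server (address caching).
assert len({line.split()[-1] for line in session}) == 1
```

Note that because the client caches the resolved address, the DNS influences load only at session granularity, which is the root of the load balancing problem discussed below.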

  International Journal of Computer Science and Technology

Fig. 1: Structure of distributed Web-server system

Many existing distributed Web-server systems assign the client requests arriving at the DNS among the Web-servers in a round-robin manner. The Round-Robin DNS policy is efficient only when the client requests from local gateways are uniformly distributed, because of the IP address caching mechanism at the client. Another approach is to let the DNS select a Web-server from the cluster based on load information from the Web-servers. The DNS can collect various kinds of data from the Web-servers, such as the history of server state, the number of active server connections or detailed processor loads. Most conventional load balancing schemes use this kind of approach, relying on the load information from the servers [6]. Some simple strategies are discussed here to improve the performance of the distributed Web-server cluster system. In this paper, we focus on a simple scheme for collecting the load information from Web-servers that are arranged in multiple logical ring connections. The server load information is the CPU utilization over a short interval.

III. DNS-based Load Balancing
Load balancing describes a method to distribute incoming socket connections to different servers. Load balancing algorithms can be classified into three main classes: static algorithms, dynamic algorithms and adaptive algorithms [4]. Static algorithms decide how to distribute the workload according to prior knowledge of the problem and the system characteristics. Dynamic algorithms use state information to make decisions during program execution.
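The two scheduling policies just described can be contrasted in a short sketch. This is an illustrative example, not the paper's implementation; the server names and the load-report format are assumptions.

```python
import itertools

# Illustrative sketch (assumed names, not the paper's code) contrasting
# the two DNS scheduling policies described above: plain Round-Robin
# rotation versus selection based on load reports from the Web-servers.

servers = ["ws1", "ws2", "ws3"]
_rr = itertools.cycle(servers)

def round_robin_select() -> str:
    """Round-Robin DNS: rotate through the servers regardless of load."""
    return next(_rr)

def least_load_select(cpu_load: dict[str, float]) -> str:
    """Load-informed DNS: pick the server reporting the lowest CPU
    utilization over the last measurement interval."""
    return min(cpu_load, key=cpu_load.get)

# Round-Robin ignores load; the load-informed policy avoids the busy server.
assert [round_robin_select() for _ in range(4)] == ["ws1", "ws2", "ws3", "ws1"]
assert least_load_select({"ws1": 0.92, "ws2": 0.35, "ws3": 0.60}) == "ws2"
```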


Adaptive algorithms are a special case of dynamic algorithms: they dynamically change their parameters in order to adapt their behavior to the load balancing requirements [10].

A. Logical Ring Redirection
For collecting the server load information, we arrange the Web-servers in multiple logical rings as shown in Fig. 1. In this system, a new Web-server can easily be included in a logical ring by changing the IP address of its neighbor server. The Web-servers cooperate to track and collect the server load across the distributed Web-server cluster system through a log file of the average CPU load. The main concern is to identify the server with the maximum CPU load and the server with the minimum CPU load [20]. Fig. 2 shows the main software components needed to implement the distributed Web-server cluster system: the server load monitor, the load collector and the redirection module. The server load monitor tracks the CPU load of the server and sends this information to the neighbor server. The server load collector, located on the local name server, initiates the load collection process, determines the most heavily loaded server and the server with the least load, and periodically provides this load information to the Web-servers. The redirection module decides whether a client request has to be redirected and selects the destination Web-server, based on the load information received from the load collector [21].
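The ring-based collection can be sketched as a single pass of a load-collecting message around the logical ring. This is a minimal sketch under assumed names; the real system exchanges network messages between neighbor servers rather than iterating over a dictionary.

```python
# Minimal sketch (assumed names) of the ring-based collection described
# above: the message visits each Web-server in ring order, carrying the
# running maximum and minimum CPU load, so the collector learns the most
# and the least loaded server in one pass around the ring.

def ring_collect(loads: dict[str, float]) -> tuple[str, str]:
    """Pass a load-collecting message once around the logical ring.

    `loads` maps each server's IP to its recent average CPU load, as
    reported by that server's load monitor.  Returns the IPs of the
    most loaded and the least loaded server.
    """
    servers = list(loads)                      # ring order
    max_ip = min_ip = servers[0]               # collector seeds the message
    for ip in servers[1:]:                     # message travels the ring
        if loads[ip] > loads[max_ip]:
            max_ip = ip                        # new most-loaded server
        if loads[ip] < loads[min_ip]:
            min_ip = ip                        # new least-loaded server
    return max_ip, min_ip                      # broadcast by the collector

hot, cold = ring_collect({"10.0.0.1": 0.85, "10.0.0.2": 0.20, "10.0.0.3": 0.55})
assert (hot, cold) == ("10.0.0.1", "10.0.0.2")
```

Because each server only talks to its neighbor, the number of messages grows linearly with the ring size instead of requiring every server to report to the collector directly.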

Fig. 2: Collection of Load Information from Web-Servers

The main objective of this load collection process is to determine the most heavily loaded server and the server with the least load. The server load collector begins the load collection process by placing initial load values and IP-addresses in a load collecting message and sending it to its neighbors in the logical rings. When a Web-server receives a load collecting message from its neighbor, it compares the current maximum and minimum load values in the message with its own load value. If the current maximum value is less than its own load, the server replaces the current maximum with its own load and sets the corresponding IP-address in the message. If the current minimum value is greater than its own load, it likewise replaces the current minimum with its own load and IP-address. When the load collector receives the message back from its neighbor server, it broadcasts the result to all the Web-servers to announce the most heavily loaded server and the server with the least load. The load collector sends a load collecting message to its neighbors periodically. By arranging the Web-servers and the local name server in multiple rings, the load collection process can continue even if a single Web-server fails. A client request can be redirected with HTTP redirection, which allows a Web-server to respond to a client request with a 301 or 302 status code in the response message. We use HTTP redirection to reassign the client request to the Web-server with the least CPU load. This server load collecting technique reduces the number of messages between the load collector and the Web-servers.

B. Load Buffer Range Method
In DNS-based load balancing, the DNS server divides the load among the servers in a round-robin manner, and each service server periodically sends its load status to the DNS server. Based on the load data collected from the web servers, the DNS server can skip the overloaded ones when dispatching requests [1]. Since there is basically no direct geographical relationship between the DNS server and the web servers, a web server should not send its state information to the DNS server too often, so as to avoid congesting the network or wasting bandwidth [5]. For this reason, a conventional method defines a load buffer range (LBR) with low and high thresholds for each web server. Fig. 2 shows the state transition diagram of an LBR example. In the figure, until the load of a web server exceeds 90% (the high threshold), the server is not overloaded; that is, the DNS server can assign new client requests to it. Once the load of the web server exceeds 90%, it enters the overloaded state. A web server in the overloaded state notifies the DNS server not to assign new client requests to it until its utilization returns below 70% (the low threshold).
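The hysteresis between the two thresholds can be sketched as a small state machine. The class and method names are assumptions for illustration; the 90%/70% thresholds come from the example in the text.

```python
# Sketch of the conventional Load Buffer Range (LBR) behaviour described
# above: a server flips to "overloaded" when its utilization crosses the
# high threshold and only returns to "normal" once it falls below the low
# threshold, so the DNS is not notified on every small fluctuation.

HIGH, LOW = 0.90, 0.70   # thresholds from the example in the text

class LBRServer:
    def __init__(self) -> None:
        self.overloaded = False   # normal state: may receive new requests

    def report(self, utilization: float) -> bool:
        """Update state from a utilization sample; True means the DNS
        should stop assigning new client requests to this server."""
        if not self.overloaded and utilization > HIGH:
            self.overloaded = True            # crossed the high threshold
        elif self.overloaded and utilization < LOW:
            self.overloaded = False           # recovered below low threshold
        return self.overloaded

s = LBRServer()
assert s.report(0.85) is False   # inside the buffer range, still normal
assert s.report(0.93) is True    # above 90%: overloaded
assert s.report(0.80) is True    # stays overloaded until load drops below 70%
assert s.report(0.65) is False   # back to normal
```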

Fig. 2: State transition diagram of Conventional LBR Method.

Fig. 3: State change of Conventional LBR Method


The probability of the overloaded state as a function of the server load is shown in Fig. 3. In this method, when there are few service servers and the volume of requests is high, once one of the service servers becomes overloaded, it must keep its overloaded state until its load falls below 70%, and only then notify the DNS server to resume assigning new client requests to it. During this period, the other web servers may have to absorb the additional 20% (90% - 70%) of load shed by the overloaded server. This may in turn cause other web servers to become overloaded, and so on, resulting in unstable service quality.

C. Random Early Detection Method
To resolve the load oscillation phenomenon of web servers described above, we consider that the overload or under-load state of a web server within the load buffer range should be probabilistic rather than definite, in order to avoid burdening the other web servers with too much load. Hence, we use the concept of the random early detection (RED) method to determine the overload status of web servers probabilistically. The RED idea was first presented in [11] for congestion avoidance in packet-switched networks. When the average queue size exceeds a preset threshold, the gateway drops or marks each arriving packet with a certain probability, where the probability is a function of the average queue length. The emphasis is on avoiding TCP global synchronization, which occurs when every connection simultaneously reduces its window to one and goes through Slow-Start in response to a dropped packet. In [11], the RED gateway calculates the average queue size and compares it to a minimum and a maximum threshold. When the average queue size is less than the minimum threshold, no packets are dropped. When the average queue size is greater than the maximum threshold, every arriving packet is dropped. When the average queue size is between the minimum and maximum thresholds, each arriving packet is dropped with probability pa, where pa is a function of the average queue length. Applying the RED idea in the context of DNS-based load balancing, the probability of a web server becoming overloaded is directly proportional to its current load. A line chart example of the probability of a web server becoming overloaded is shown in Fig. 4.
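The RED-style rule described above can be sketched with a linear ramp between the two thresholds. The function names are assumptions for illustration; the 70%/90% thresholds come from the example in the text.

```python
import random

# Sketch of applying the RED idea to load reporting, as described above:
# between the thresholds a server declares itself overloaded only with a
# probability that rises linearly with its current load, rather than
# flipping deterministically at the high threshold.

MIN_TH, MAX_TH = 0.70, 0.90   # thresholds from the example in the text

def overload_probability(load: float) -> float:
    """Probability that the server reports itself as overloaded."""
    if load < MIN_TH:
        return 0.0                              # clearly under-loaded
    if load > MAX_TH:
        return 1.0                              # clearly overloaded
    return (load - MIN_TH) / (MAX_TH - MIN_TH)  # linear ramp in between

def is_overloaded(load: float, rng: random.Random) -> bool:
    """Draw the probabilistic overload decision for one report."""
    return rng.random() < overload_probability(load)

assert overload_probability(0.50) == 0.0
assert overload_probability(0.95) == 1.0
assert abs(overload_probability(0.80) - 0.5) < 1e-9   # midpoint of the ramp
```

Because only a fraction of the servers inside the buffer range declare themselves overloaded at any moment, the shed load is spread gradually instead of being dumped on the other servers all at once.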


Fig. 4: Change of State of RED Method

In the above example, the minimum threshold is 70% and the maximum threshold is 90%. When the load of a service server is less than 70%, its state is under-loaded. When the load of a service server is greater than 90%, its state is overloaded. Finally, when the load of a service server is between 70% and 90%, the probability of its state becoming overloaded is proportional to its current load.

D. Page Caching
M. Colajanni et al. considered DNS name caching and compared a number of load balancing algorithms for DNS-based distributed web server systems [12]. They concluded that taking advantage of both domain information and web server load information leads to the best performance among the strategies considered. However, they did not consider the effects of page caching on the performance of web server systems. Besides name caching, there is also page caching in the Internet. Page caching takes advantage of reference locality, and can reduce network traffic and possibly latency [2]. Moreover, it is possible to further improve the performance of load balancing in a DNS-based distributed web server system by using suitable page caching schemes. In this paper, we investigate the effects of some page caching strategies on the load balancing of DNS-based systems. Although page caching can reduce the traffic load of web server systems, it is unlikely that simple page caching alone can greatly improve the load balance, since statistically the hit ratios for requests from different clients should be roughly equal [11]. There are two ways to further improve page cache performance. One is to find a better caching policy, and the other is to use prefetching. Prefetching is a technique by which browsers, proxies, caches or servers predict user preferences and request pages before the user actually requests them. However, straightforward prefetching approaches may have serious negative effects on networks, such as severe traffic burstiness and queueing effects [13]. Moreover, several negative factors may affect log-based prefetching and can cause wrong predictions [14,15]. Due to these difficulties with prefetching, there is interest in taking advantage of page access statistics instead of page access patterns [16]. Studies [17,18] show that page accesses are non-uniformly distributed, so caching frequently accessed pages is more efficient than caching seldom accessed pages.

IV. Conclusion
In a DNS-based load balancing architecture, Web-servers can be placed in geographically decentralized areas. In this paper we examined various techniques for a DNS-based distributed Web-server system and summarized their load balancing performance. We conclude that load balancing techniques for distributed web-server systems such as name caching, the Random Early Detection Method and the Load Buffer Range Method have significant effects on the performance of distributed web server systems.


References
[1] Chih-Chiang Yang, Chien Chen, Jing-Ying Chen, "Random Early Detection Web Servers for Dynamic Load Balancing", IEEE, 2009.
[2] M. Abrams, C. R. Standridge, G. Abdulla, S. Williams, E. A. Fox, "Caching proxies: Limitations and potentials", Proceedings of the 4th International WWW Conference, Boston, MA, Dec. 1995.
[3] G. Apostolopoulos, D. Aubespin, V. Peris, P. Pradhan, D. Saha, "Design, implementation and performance of a content-based switch", Proceedings of INFOCOM'00, Tel Aviv, March 2000.
[4] M. F. Arlitt, C. L. Williamson, "Web server workload characterization: The search for invariants", IEEE/ACM Transactions on Networking, 5(5), Oct. 1997.
[5] M. Colajanni, P. S. Yu, V. Cardellini, "Dynamic Load Balancing in Geographically Distributed Heterogeneous Web Servers", Proceedings of the Int'l Conf. on Distributed Computing Systems, pp. 295-302, May 1998.
[6] V. Cardellini, M. Colajanni, P. S. Yu, "Dynamic load balancing on web-server systems", IEEE Internet Computing, 3(3):28-39, May-June 1999.
[7] E. Casalicchio, V. Cardellini, M. Colajanni, "Content-aware dispatching algorithms for cluster-based web servers", Cluster Computing, Kluwer Academic Publishers, 5(1):65-74, Jan. 2002.
[8] M. Colajanni, P. S. Yu, D. M. Dias, "Analysis of task assignment policies in scalable distributed web-server systems", IEEE Transactions on Parallel and Distributed Systems, 9(6), June 1998.
[9] W. Shi, M. H. MacGregor, P. Gburzynski, "Load Balancing for Parallel Forwarding", IEEE/ACM Transactions on Networking, Aug. 2005.
[10] Mohammed Aldasht, Julio Ortega, Carlos G. Puntonet, Antonio F. Diaz, "A Genetic Exploration of Dynamic Load Balancing Algorithms", IEEE, 2004.
[11] Zhong Xu, Rong Huang, Laxmi N. Bhuyan, "Load Balancing of DNS-Based Distributed Web Server Systems with Page Caching", IEEE.
[12] M. Colajanni, P. S. Yu, D. M. Dias, "Analysis of task assignment policies in scalable distributed web-server systems", IEEE Transactions on Parallel and Distributed Systems, 9(6), June 1998.
[13] M. Crovella, P. Barford, "The network effects of prefetching", Proceedings of INFOCOM'98, San Francisco, CA, March 1998.
[14] B. D. Davison, "Web traffic logs: An imperfect resource for evaluation", Proceedings of the Ninth Annual Conference of the Internet Society (INET'99), San Jose, CA, June 1999.
[15] A. Nanopoulos, D. Katsaros, Y. Manolopoulos, "Effective prediction of web user accesses: A data mining approach", Proceedings of the Workshop WEBKDD 2001: Mining Log Data Across All Customer Touch-Points, San Francisco, CA, Aug. 2001.
[16] A. Venkataramani, P. Yalagandula, R. Kokku, S. Sharif, M. Dahlin, "The potential costs and benefits of long-term prefetching for content distribution", Technical Report TR-01-13, Dept. of Computer Science, Univ. of Texas at Austin, USA, June 2001.
[17] M. F. Arlitt, C. L. Williamson, "Web server workload characterization: The search for invariants", IEEE/ACM Transactions on Networking, 5(5), Oct. 1997.
[18] L. Breslau, Pei Cao, Li Fan, G. Phillips, S. Shenker, "Web caching and Zipf-like distributions: Evidence and implications", Proceedings of INFOCOM'99, March 1999.
[19] Chih-Chiang Yang, Chien Chen, Jing-Ying Chen, "Random Early Detection Web Servers for Dynamic Load Balancing", IEEE, 2009.
[20] Y. S. Hong, J. H. No, S. Y. Kim, "DNS-Based Load Balancing in Distributed Web-server Systems", IEEE, 2006.
[21] Hiroshi Yokota, Shigetomo Kimura, Yoshihiko Ebihara, "A Proposal of DNS-Based Adaptive Load Balancing Method for Mirror Server Systems and Its Implementation", IEEE, 2004.
[22] A. Shaikh, R. Tewari, M. Agrawal, "On the Effectiveness of DNS-based Server Selection", Proceedings of IEEE INFOCOM 2001, pp. 1801-1810, Anchorage, USA, April 2001.

Baldev Singh is currently working as Assistant Professor in the Post Graduate Dept. of Computer Science & IT, Lyallpur Khalsa College, Jalandhar, Punjab. He has more than 14 years of teaching experience and has worked on a UGC-sponsored minor research project. He has presented more than 15 papers at various national and international conferences in India and abroad. His areas of interest are Parallel and Distributed Computing.

Dr. O. P. Gupta, an alumnus of PAU, Ludhiana, Thapar University, Patiala and GNDU, Amritsar, has demonstrated his intellectual, interpersonal and managerial skills in various domains. He is the winner of the PAU Meritorious Teacher Award for 2009-10. With vast industrial experience in the IT industry in the roles of Project Leader and Project Manager, he is currently Associate Professor of Computer Science and Deputy Director, School of Information Technology, at PAU, Ludhiana. His areas of interest include Parallel and Distributed Computing, Grid Computing for Bioinformatics, Network Testing and Network Management. Along with being a committed teacher and a passionate researcher, he is actively involved in social activities.

Simar Preet Singh, a student of M.Tech (Computer Engineering) at University College of Engineering, Punjabi University, Patiala, is also a Microsoft Professional. He holds certifications including Microsoft Certified Systems Engineer (MCSE) and Core Java, and has undergone training programmes for VB.Net and Cisco Certified Network Associate (CCNA). He has presented many research papers at various national and international conferences in India and abroad. His areas of interest include Databases, Network Security and Network Management.
