A Modified Genetic Algorithm based Load Distribution Approach towards Web Hotspot Rescue
Debashree Devi1, Y. Jayanta Singh2
1 Department of Computer Science and Information Technology, Assam DonBosco University, Guwahati, India Email: [email protected]
2 Department of Computer Science and Information Technology, Assam DonBosco University, Guwahati, India Email:
Abstract— A web hotspot is a serious problem often experienced by popular websites. It is a dramatic load spike that occurs when a huge number of users access the same website at the same time. A prominent solution to this problem is server load balancing. Dynamic load balancing allocates requests to servers or processors dynamically as they arrive. For effective load balancing, a near-optimal schedule of incoming requests or processes must be determined “on-the-fly”, so that the requests can be completed in the shortest possible time. We therefore propose a Genetic Algorithm based load balancing scheme that relies on a process scheduling policy. A Genetic Algorithm searches for the optimal solution within a space of candidate solutions. It follows the survival-of-the-fittest principle to approach the optimal solution over a number of generations. The proposed algorithm is evaluated for various population sizes and numbers of generations, with the aim of maximizing the utilization of the nodes/processors in the system.
Index Terms— Dynamic Load Balancing, Genetic Algorithm, Server load balancing, Web hotspot.
I. INTRODUCTION
With the rapid increase in the number of internet users, it is common for a website to receive millions of hits per day. For popular websites, e.g. social networking and online audio/video websites, this rapid increase in load may cause serious problems. In addition, the rapid development of internet applications diversifies the services offered by popular websites. These services are real-time and dynamic, so handling all of these requests on one single server leads to overloading. Technically, such a situation is termed a “web hotspot”. A web hotspot generally lasts for a very short period of time [1], but it can seriously degrade the performance of a website. Using a high performance system as a solution would be very costly. A flexible web server system that scales with the changing load on the website could be used instead, but it is also expensive, as it requires more hardware.
The concept of load balancing is not very old. Until about 1995, the Internet was used mainly for academic purposes. Once it was introduced into the business world, people started to use it for a variety of tasks. With an increasing number of people accessing the internet, a number of issues have to be taken care of in order to provide good service to customers. This is where load balancing comes in.
DOI: © Association of Computer Electronics and Electrical Engineers, 2013
Load balancing can be defined as a form of system performance evaluation, analysis and optimization, in which the load that would otherwise be assigned to a single server is distributed across a network of processing elements or servers, so as to equalize the load among the servers at any point in time. A commonly used load balancing technique is DNS Round Robin, a DNS-based load balancing process. This technique associates more than one IP address with a single hostname, as shown in Fig. 1 [17]. For example, the hostname www.vegan.net is associated with multiple IP addresses so that traffic is distributed evenly among them. However, DNS Round Robin has limitations in areas such as caching and traffic distribution. Nowadays, the SLB (server load balancing) process is quite effective at addressing problems such as redundancy, scalability and server management. SLB generally comes with components such as a VIP (virtual IP address), servers, user access levels, redundancy, persistence, service checking and load balancing algorithms. Load balancing algorithms are programmed into the SLB device and assigned to individual VIPs. There are a number of load balancing algorithms, which can be categorized as global or local, static or dynamic, centralized or distributed, etc.
In our proposed system, the concept of SLB is implemented with the application of an optimization algorithm, namely the Genetic Algorithm. A Genetic Algorithm combines the exploitation of previous results with the exploration of new regions of the search space. It generally follows the survival-of-the-fittest principle [4]. A Genetic Algorithm maintains a population of candidate solutions that evolves over time and ultimately converges towards the optimal solution. In a population, each individual is represented by a chromosome, which is encoded as a string of bits.
To evolve the best solution and to implement natural selection, an objective function is defined, which measures a candidate solution’s relative fitness.
The domain of our key problem is the distributed system. Generally, a distributed system comprises a number of computers acting as clients, which access services from another set of computers acting as servers. The most common example of a distributed system is the World Wide Web (WWW). Every day the WWW carries a large volume of traffic, which is directed to web server systems. The purpose of a web server is to store information and serve client requests. A web server system consists of multiple web server hosts running a number of web applications simultaneously.
Dynamic load balancing arises from the need to allocate servers/resources to client requests at the moment they arrive. It is “mission-critical” because the incoming load cannot be predicted. It involves key issues such as task migration and load sharing. According to Ref. [3], load sharing manages the tasks in the system in such a way that no processor in the system is idle. Generally, a process is migrated to another processor if the migration cost or overhead is less than some predetermined metric, in order to improve processor utilization. Migration of processes generally requires additional hardware, which in turn increases the cost of execution. The load balancing strategy tries to ensure that the processors or servers in the system are equally loaded and that every processor or server processes the same number of requests. Once requests are received, a good scheduling policy should assign them to appropriate servers so that they complete within the shortest execution time. In this paper, we treat load balancing as a process scheduling problem, in which every incoming request is taken as one process and assigned to a processor or server for processing.
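To make the scheduling view concrete, the following is a minimal Python sketch, not the implementation evaluated in this paper. It assumes that a schedule is a list whose i-th entry is the index of the processor assigned to request i, and that each request has a known execution time. The names processor_loads, makespan and average_utilization, and the sample execution times, are hypothetical and chosen only for illustration.

```python
def processor_loads(schedule, exec_times, n_processors):
    """Total execution time assigned to each processor (assumed load measure)."""
    loads = [0.0] * n_processors
    for request, processor in enumerate(schedule):
        loads[processor] += exec_times[request]
    return loads


def makespan(loads):
    """Time at which the most heavily loaded processor finishes."""
    return max(loads)


def average_utilization(loads):
    """Fraction of the makespan during which processors are busy, on average."""
    span = makespan(loads)
    if span == 0:
        return 0.0
    return sum(loads) / (len(loads) * span)


# Hypothetical execution times for five requests on two processors.
exec_times = [4.0, 2.0, 3.0, 1.0, 2.0]
unbalanced = [0, 0, 0, 0, 1]   # loads: [10.0, 2.0]
balanced = [0, 0, 1, 1, 1]     # loads: [6.0, 6.0]

unbalanced_loads = processor_loads(unbalanced, exec_times, 2)
balanced_loads = processor_loads(balanced, exec_times, 2)
```

In this toy case the balanced schedule finishes in 6 time units instead of 10, and both processors are busy for the whole interval, which is the kind of outcome a scheduling policy aims for.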
The rest of the paper is organized as follows. Section 2 gives a brief description of related work. Section 3 describes the theoretical details of the Genetic Algorithm. Section 4 introduces the system and process model. Section 5 presents the proposed Genetic Algorithm based load balancing approach. The implementation and results are discussed in Section 6. Section 7 concludes the paper.
II. RELATED WORKS
A web hotspot is a serious problem, as it degrades the quality of the website. Manual control of this whole process would surely affect website quality. Ref. [1] describes the DotSlash autonomic rescue system, proposed by Weibin Zhao, which provides a solution to this problem.
Figure 1: DNS Round Robin mechanism
To solve the problem, DotSlash enables a web site to build a distributed web server system on the fly, adapting to the changing environment. In the design of the DotSlash autonomic rescue system, a cost-effective mechanism is applied to handle the increased request load. Different web sites form a mutual-aid community of web servers, so that during a critical period a site can use the spare capacity of other web sites in the community. The working of the DotSlash rescue system can be described by the following steps:
• Dynamic Virtual Hosting
• Request Redirection
• Workload Monitoring
• Rescue Control
• Service Recovery
A. Dynamic load balancing Approaches
In the client based approach, requests for documents can be routed to any replicated web server, even when the nodes are loosely coordinated or not coordinated at all. Requests can be routed to the web clusters either by Web clients or by client-side proxy servers [5].
The DNS-based approach overcomes the limitations of the client based approach by using a request routing mechanism on the cluster side. The cluster DNS, i.e. the authoritative DNS server for the nodes of the distributed Web server system, translates the URL to the IP address of one server, and thus provides architecture transparency at the URL level [5] [6]. Depending on the scheduling algorithm used by the cluster DNS to balance the load on the Web server nodes, DNS-based approaches can be categorized into Constant TTL algorithms and Adaptive (or Dynamic) TTL algorithms.
Among cluster based approaches for peer-to-peer (P2P) systems, B. Mortazavi and G. Kesidis [7] used a reputation framework, on which they designed a game in which players play in order to receive the maximum number of files from the system. Brighten Godfrey et al. [8] proposed a load balancing algorithm for heterogeneous and dynamic P2P systems. Kalman Graffi et al. [9] used a DHT-based information gathering and system analysis technique. Ananth Rao et al. [10] addressed the load balancing problem in P2P systems with an algorithm built on the idea of virtual servers. Song Fu et al. [11] characterized the behavior of randomized search algorithms in the general P2P environment.
In the dispatcher based approach, Harikesh Singh et al. [6] presented an advanced DNS dispatching technique that distributes HTTP requests from clients using Round Robin and proximity based scheduling algorithms. Many load balancing approaches also involve optimization techniques such as fuzzy logic and Genetic Algorithms.
The load balancing problem is known to be NP-hard with respect to the number of requests versus the number of machines/servers. This motivates the search for an optimal or near-optimal solution. Yu-Kwong Kwok et al. [15] defined a new dynamic fuzzy-decision-based load balancing system incorporated in a distributed object computing environment. Using conventional control theory, a sudden increase in load is treated as an external force on the system, and a feedback mechanism is maintained to minimize the effect of this force. A Genetic Algorithm based approach was introduced by Bibhdatta Sahoo et al. [16] for dynamic load distribution in heterogeneous distributed systems. It defines load balancing as a job scheduling mechanism and compares the proposed system with two scheduling policies, LERT-MW and LERT-MWM. Priyanka Gonade et al. [4] defined a modified Genetic Algorithm approach with an objective function for minimum load deviation of a node.
III. GENETIC ALGORITHM- THEORETICAL CONCEPT
The Genetic Algorithm (GA) is a search method based on the principles of natural selection and genetics. It seeks the optimal solution within a search space that consists of a population of potential solutions. The algorithm follows the principle of survival of the fittest, where each individual represents a point in the search space of the problem. An individual, which represents a candidate solution, can be expressed as a string of bits, referred to as a chromosome. Each chromosome is composed of variables called genes, and the values associated with the genes are termed alleles. To evolve the best solution and to implement natural selection, an objective function is defined, which measures a candidate solution’s relative fitness. The objective function is an important concept, as it is subsequently used by the GA to guide the evolution of the best solutions. After the problem has been encoded in a chromosomal manner and an objective function has been chosen, a solution to the search problem can be evolved using the following steps [12]:
• INITIALIZATION:
The initial population of candidate solutions is usually generated randomly across the search space.
• EVALUATION:
After initialization of the population, the fitness values of all the candidate solutions are evaluated using the objective function.
• SELECTION:
Selection passes the solutions with higher fitness values on to the next generation and thus imposes the survival-of-the-fittest mechanism on the candidate solutions. The main idea of selection is to prefer better solutions to worse ones, and many selection procedures have been proposed to accomplish this idea. Some of the selection techniques are roulette-wheel selection, stochastic universal selection, ranking selection and tournament selection.
• RECOMBINATION:
Recombination combines parts of two or more parental candidate solutions to create new, possibly better solutions, termed offspring. The offspring produced by recombination will not be identical to any particular parent and will instead combine parental traits in a novel manner [14].
• MUTATION:
The task of mutation is to locally but randomly modify a solution. It generally involves changing one or more traits of an individual. In other words, mutation performs a random walk in the space of candidate solutions.
• REPLACEMENT:
The offspring population created by selection, recombination, and mutation replaces the original parental population. Many replacement techniques, such as elitist replacement, generation-wise replacement and steady-state replacement, are used in GAs.
• TERMINATION CONDITIONS OR STOPPING CONDITIONS:
Termination conditions are generally problem dependent. Some general stopping conditions are that the optimal solution has been obtained, or that the best fitness value has remained the same for several consecutive generations.
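The steps above can be sketched as a short Python program. This is a minimal sketch under stated assumptions, not the algorithm proposed in this paper: the objective is a toy function that counts 1-bits (a stand-in for a real fitness function), chromosomes are lists of 0/1, recombination is one-point crossover applied with probability pc, mutation flips each bit with probability pm, and replacement is generational with one elite individual. All function names and parameter defaults are hypothetical.

```python
import random


def onemax(chromosome):
    """Toy objective (assumption): number of 1-bits in the chromosome."""
    return sum(chromosome)


def roulette_wheel_select(population, fitnesses, rng):
    """Fitness-proportionate selection: slot size is proportional to fitness."""
    total = sum(fitnesses)
    if total == 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick <= running:
            return individual
    return population[-1]  # guard against floating-point round-off


def tournament_select(population, fitnesses, rng, s=2):
    """Ordinal selection: the fittest of s randomly drawn individuals wins."""
    contenders = rng.sample(range(len(population)), s)
    winner = max(contenders, key=lambda i: fitnesses[i])
    return population[winner]


def crossover(parent_a, parent_b, pc, rng):
    """One-point crossover with probability pc; otherwise copy the parents."""
    if rng.random() <= pc and len(parent_a) > 1:
        point = rng.randint(1, len(parent_a) - 1)
        return (parent_a[:point] + parent_b[point:],
                parent_b[:point] + parent_a[point:])
    return parent_a[:], parent_b[:]


def mutate(chromosome, pm, rng):
    """Bit-flip mutation: each gene flips independently with probability pm."""
    return [1 - g if rng.random() < pm else g for g in chromosome]


def run_ga(n_bits=20, pop_size=30, generations=50, pc=0.9, pm=0.02,
           select=tournament_select, seed=0):
    rng = random.Random(seed)
    # INITIALIZATION: random bit strings
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=onemax)
    for _ in range(generations):
        # EVALUATION
        fitnesses = [onemax(ind) for ind in population]
        # TERMINATION: stop once the known optimum of the toy problem is found
        if onemax(best) == n_bits:
            break
        # REPLACEMENT: generational, keeping one elite copy of the best
        offspring = [best[:]]
        while len(offspring) < pop_size:
            # SELECTION, RECOMBINATION, MUTATION
            a = select(population, fitnesses, rng)
            b = select(population, fitnesses, rng)
            child_1, child_2 = crossover(a, b, pc, rng)
            offspring.append(mutate(child_1, pm, rng))
            if len(offspring) < pop_size:
                offspring.append(mutate(child_2, pm, rng))
        population = offspring
        best = max(population, key=onemax)
    return best, onemax(best)
```

Calling run_ga() returns the best chromosome found and its fitness. Passing select=roulette_wheel_select switches from tournament selection to roulette-wheel selection without changing the rest of the loop.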
A. Basic Genetic Algorithm Operators
SELECTION OPERATOR:
The basic selection techniques fall into two categories:
• FITNESS PROPORTIONATE SELECTION
This includes methods such as roulette-wheel selection and stochastic universal selection [12]. In roulette-wheel selection, each individual in the population is assigned a roulette wheel slot sized in proportion to its fitness value. Thus a fitter solution has a larger slot than a less fit one.
• ORDINAL SELECTION
This includes methods such as tournament selection and truncation selection [12]. In tournament selection, s chromosomes are selected at random and entered into a tournament against each other. The fittest individual in the group wins the tournament and is selected as a parent.
RECOMBINATION OPERATOR:
After selection, the selected individuals are recombined (or crossed over) to create new, hopefully better, offspring. In the recombination process, two individuals are selected randomly and recombined with a predefined probability, pc, termed the crossover probability. A uniform random number, r, is generated and compared with pc. If r ≤ pc, the two individuals are recombined; otherwise (r > pc), the offspring are simply taken to be copies of their parents. Pseudocode for this mechanism is given below:
Pseudo Code:
[1] Start
[2] Define r, any random number
[3] Define pc, pc = crossover probability
[4] If r