A Decentralized and Efficient Algorithm for Load Sharing in Networks of Workstations

Guy Bernard
Institut National des Télécommunications
9 rue Charles Fourier, 91011 EVRY Cedex, France
Phone: (33) 1 - 60 76 45 67
Fax: (33) 1 - 60 76 47 80
e-mail: [email protected]

Michel Simatic
Alcatel TITN ANSWARE
1 rue Galvani, 91301 MASSY Cedex, France
Phone: (33) 1 - 30 67 92 20 ext. 34 405
ABSTRACT

This paper presents the design and evaluation of RADIO, a decentralized load sharing algorithm for networks of workstations. Compared with general distributed computing environments, networks of workstations have some peculiarities. First, the global computing power is underutilized most of the time. Second, workstation users occasionally need a peak of computing power. Third, workstations are often diskless, so that running a process on one workstation or another does not add file migration overhead. Fourth, network interfaces often provide a broadcast capability, which may be used to reach several destinations with a single message. Last, workstations are often dedicated to an "owner", so that a workstation may be used for running foreign processes only when its owner does not use it (or at least when running foreign processes would not significantly increase the response time of the owner’s programs). The first three points make load sharing very attractive for a network of workstations. The fourth point may be used to simplify the design of load sharing algorithms, but broadcasting is expensive.

The goal of RADIO is to provide the benefits of a decentralized load sharing algorithm while preserving the personal character of workstations and providing good performance, in particular with respect to extensibility. The key feature of the RADIO load sharing algorithm is that it is decentralized but involves expensive broadcast messages only occasionally. The design choices for the information policy, location policy and transfer policy are described and motivated. RADIO has been implemented on a network of Sun workstations, and runs entirely outside of the kernel. Experimental results show that the extensibility of RADIO is better than that of previous decentralized algorithms based on broadcast messages.
1. Introduction

There are basically three ways of improving performance in loosely coupled distributed computing systems. The first one consists of implementing a file location/migration policy. The second one consists of exploiting the parallelism inherent in some applications by running the different tasks that constitute the application program in parallel on different processors. The third one, usually referred to as "load sharing", consists of exploiting the fact that, in the network, some machines are less loaded than
others (or even totally inactive), by running some processes on a less loaded machine. This paper focuses on a load sharing policy.

Among loosely coupled distributed computing environments, a network of workstations has some peculiarities. First, the global computing power is underutilized most of the time [Mut87a, Stu88a, The89a]. Second, the owner of a workstation occasionally needs a peak of computing power, high enough to lead to slow response times if all the processes are run simultaneously on his/her workstation. Third, workstations are often diskless, so that running a process on one workstation or another does not add file migration overhead. Fourth, network interfaces often provide a broadcast capability, which may be used to reach several destinations with a single message. Last, workstations are often dedicated to an "owner", so that a workstation may be used for running foreign processes only when its owner does not use it (or at least when running foreign processes would not significantly increase the response time of the owner’s programs).

This paper focuses on a load sharing policy in a network of personal workstations. The goal is to provide the benefits of load sharing (which may be large for underutilized systems) while preserving the personal character of workstations and providing good performance, in particular with respect to extensibility (defined as the maximum number of workstations that may be part of the system without consuming too much CPU or network bandwidth).

The problem of performance and extensibility of load sharing algorithms has been addressed in several papers [Zho88a, The89a]. The main conclusions are: (i) broadcasts on a local area network are expensive; (ii) centralized algorithms are, surprisingly, the most extensible; (iii) failure detection and recovery are expensive for centralized algorithms.
The goal of this paper is to describe and evaluate a decentralized algorithm that involves broadcast messages only very occasionally, and thus provides large extensibility and good performance.

2. The RADIO Algorithm

A load sharing algorithm is composed of three parts [Zho88a]. The information policy specifies what information is used in deciding a process migration, and how this information is distributed in the system. The location policy decides to which machine an eligible process should be migrated. The transfer policy determines the eligibility of a process for migration. We now describe these three parts for the RADIO algorithm and then the broadcast cases.

2.1. Information Policy

In RADIO, information and decision are both decentralized and centralized. A workstation is said to be available when it can accept to run remote processes. At any time, RADIO handles the following data structures:

1. Every workstation Wi keeps in memory the identity of the last workstation Li that accepted a process from Wi, i.e. that was available at that time.
2. The available workstations are linked in a distributed ordered list, the "available list" (each available machine Ai knows only the identity of its predecessor Ai−1 and successor Ai+1 in the list).
3. A machine called the "manager" plays a special role. It knows the identity of the first workstation A1 of the available list. The identity of the manager is known by all the workstations of the network. Workstation A1 knows that it is the first workstation of the available list because its predecessor is the manager itself.
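As an illustration only, the three data structures above, together with the list updates that Section 2.1 goes on to describe (removal of a workstation that becomes non-available, insertion of a newly available workstation at the head), might be sketched as follows. This is a single-process simulation, not the RADIO implementation: the names `Workstation`, `Network`, `join`, `leave` and the `MANAGER` constant are hypothetical, and the messages exchanged between machines are replaced by direct field updates.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

MANAGER = "manager"  # hypothetical name standing for the manager machine


@dataclass
class Workstation:
    """Local state of one workstation (field names are illustrative)."""
    last_partner: Optional[str] = None  # L_i: last workstation that accepted a process
    pred: Optional[str] = None          # predecessor in the available list
    succ: Optional[str] = None          # successor in the available list


class Network:
    """Simulates the distributed available list inside one process."""

    def __init__(self, names: Iterable[str]) -> None:
        self.nodes: Dict[str, Workstation] = {n: Workstation() for n in names}
        self.head: Optional[str] = None  # all the manager knows about the list: A_1

    def join(self, name: str) -> None:
        """A workstation becomes available: it is inserted at the head."""
        old_head = self.head                  # the manager replies with A_1
        node = self.nodes[name]
        node.pred, node.succ = MANAGER, old_head
        self.head = name
        if old_head is not None:
            self.nodes[old_head].pred = name  # the old A_1 learns its new predecessor

    def leave(self, name: str) -> None:
        """A workstation becomes non-available: its neighbours are relinked."""
        node = self.nodes[name]
        if node.pred is None:                 # not in the list, nothing to do
            return
        if node.pred == MANAGER:
            self.head = node.succ             # A_i tells its predecessor (the manager)
        else:
            self.nodes[node.pred].succ = node.succ  # A_i tells A_{i-1} about A_{i+1}
        if node.succ is not None:
            self.nodes[node.succ].pred = node.pred  # A_{i-1} tells A_{i+1}
        node.pred = node.succ = None

    def available_list(self) -> List[str]:
        """Walk the list from the head (for inspection in the simulation only)."""
        out, cur = [], self.head
        while cur is not None:
            out.append(cur)
            cur = self.nodes[cur].succ
        return out
```

In this sketch no machine holds the whole list: the manager stores only the head, and each workstation stores only its two neighbours, which is why `available_list` has to follow the `succ` links one by one.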
When workstation Wi wishes to run a process P on a remote machine:

1. It first polls its (supposed) available partner Li. This relies on the assumption that Li, which was available before, is still available now. If Li is indeed available, it accepts to run process P (see Figure 1).
2. Of course, this assumption does not always hold. If Li is not available, it forwards the request to the manager (see Figure 2). If the available list is not empty, the manager sends the identity of the first workstation of the available list, A1, to the requesting workstation Wi. If the available list is empty, the manager notifies Wi that no workstation is currently available, so that Wi executes process P locally.
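The two steps above can be sketched as a small function that returns the chosen host and the number of messages exchanged. This is a hedged illustration, not the authors' code: `find_host` and its parameters are hypothetical names, the network is replaced by plain Python values, and the sketch assumes that Wi already has a partner Li and that A1 accepts the process once the manager has named it (so that it becomes the new Li).

```python
from typing import Dict, Optional, Set, Tuple


def find_host(
    requester: str,
    last_partner: Dict[str, str],
    available: Set[str],
    head: Optional[str],
) -> Tuple[Optional[str], int]:
    """Return (host, messages) for one remote execution request.

    last_partner maps each workstation W_i to its partner L_i,
    available is the set of currently available workstations,
    head is A_1 as known by the manager (None if the list is empty).
    A host of None means that the process is executed locally.
    """
    partner = last_partner[requester]
    if partner in available:
        # (1) req from W_i to L_i, (2) ack from L_i
        return partner, 2
    # (1) req from W_i to L_i, (2) req forwarded by L_i to the manager,
    # (3) reply from the manager to W_i
    if head is not None:
        # Assumption: A_1 accepts the process and becomes the new partner.
        last_partner[requester] = head
        return head, 3
    return None, 3


if __name__ == "__main__":
    partners = {"w1": "w2"}
    print(find_host("w1", partners, {"w2", "w3"}, "w3"))  # partner still available
    print(find_host("w1", partners, {"w3"}, "w3"))        # falls back to the manager
    print(find_host("w1", partners, set(), None))         # nobody available
```

No broadcast appears in either path: the requester contacts one machine, and at most one further machine (the manager) is involved.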
[Figure 1: Finding an available workstation in two messages. (1) req from Wi to Li; (2) ack.]
[Figure 2: Finding an available workstation in three messages. (1) req from Wi to Li; (2) req from Li to Mgr; (3) A1 from Mgr to Wi. The available list shown is A1, A2.]
When a workstation Ai switches from available state to non-available state, it notifies its predecessor Ai−1 in the available list that the successor of Ai−1 is now Ai+1. Afterwards, Ai−1 notifies Ai+1 that it is now its predecessor (see Figure 3).

When a workstation Ni switches from non-available state to available state, it notifies the manager, which sends back the identity of the first available workstation A1. Then, Ni notifies A1 that it is now its predecessor. Thus, Ni is inserted at the head of the available list, in location A1 (see Figure 4).

2.2. Location Policy

The goal of a load sharing policy may be to attempt to balance machine loads over the network at any time ("load balancing"), or simply to ensure that no machine is idle when others are overloaded ("load sharing" strictly speaking) [Kru87a]. For networks of personal workstations, load balancing is not only useless, but undesirable, for two reasons. First, since the workstations are underloaded most of the time, load balancing would most of the time migrate a process from a lightly loaded workstation to an even less loaded one, so that migration overhead would outweigh the gain in execution time [The89a]. Second, since workstations are personal, the owner would not tolerate a significant degradation of his/her response times because another user started many CPU-intensive computations.

Several ways of taking into account the personal nature of a workstation have been proposed. In Butler [Nic90a], a workstation is unavailable for foreign processes as soon as the number of logged-in users is greater than some threshold. This criterion is very restrictive, because if users neglect to log out when they do not use the workstation, it will appear busy whereas it is in fact available.
In Condor [Lit88a], a workstation is declared available when there has been no keyboard or mouse activity for some duration, and the average CPU utilization has been less than some value for a certain amount of time. This criterion is very restrictive too, because if the workstation is used only to run an editor, it will appear busy, while running a foreign process would not slow down the editing process.

In RADIO, as in [Alo88a], a workstation is declared available when its current load is less than a threshold Tlow. The load index used is the UNIX 4.3BSD index, namely a mix of averaged CPU and I/O queue lengths. The value of Tlow need not be the same on every workstation, and it may be changed dynamically by the system administrator if needed. Empirically, a value of 0.8 appeared to be suitable: in our experiments, users did not notice that their workstation was executing foreign processes with this value.

[Figure 3: Switching from available to non-available state. (a) before, with messages (1), (2) and (3) ack; (b) after.]

2.3. Transfer Policy

As stated before, the transfer policy determines the eligibility of a process for migration. In RADIO, a process may be migrated to an available workstation (if such a workstation exists) only when the local workstation becomes overloaded, namely when its current load is above a second threshold, Thigh. In order to prevent boomerang effects, the value of Thigh must be at least Tlow + 1. A value of 1.8 appeared to be suitable. However, as for Tlow, Thigh need not be the same on every workstation. To summarize the influence of Tlow and Thigh, a workstation may be in one of three states, according to the value of its current load:

1. load