Load Balancing in Replicated Databases with Loose Consistency Requirements

D.H.J. Epema
Faculty of Information Technology and Systems, Delft University of Technology
P.O. Box 356, 2600 AJ Delft, The Netherlands
[email protected]
Abstract
An important method for improving the performance of distributed databases is load balancing. Because a database access has to go to a server on which the relevant data reside, load balancing in distributed database systems is a matter of judiciously placing one or multiple copies of the data in the system, and of judiciously selecting a server for a database access. Depending on the application requirements, databases may have to enforce the serializability of transactions and the consistency of replicas. We present a simulation study of different policies for determining the numbers of replicas and for placing the replicas in distributed databases without serializability guarantees and with only loose consistency guarantees.
1 Introduction

Currently, many companies and institutes use distributed database systems (DDSs) for their daily office operation, which integrate e-mail databases, discussion databases, databases for collaborative work, etc., into a single system. This type of application usually does not pose strict requirements on the serializability of transactions or on the consistency of replicas of databases. In large companies with tens or hundreds of thousands of employees, managing such DDSs is a nontrivial task. One aspect of this management is the performance of such a system, which relies on the judicious placement and replication of databases, and on the policy for selecting a server in case the database used by a transaction is replicated. In this paper we study by means of simulations the performance of different policies for deciding on the numbers of replicas and for assigning replicas to servers in DDSs without concurrency-control mechanisms and with only a simple replica-consistency mechanism. This study, of which we present here some initial results, is being performed in the context of a Lotus Notes environment (see http://www.lotus.com), which offers the kind of DDS explained above. Lotus Notes does not include concurrency control, and one of its ways of keeping replicas consistent is by means of schedules which define the times at which the replicas are to be made identical.

The performance of (distributed) databases has been studied from different perspectives and in different settings. Much of this work has concentrated on the cost of storing and transmitting data (rather than on response time), leading to variations of the File-Allocation Problem [2], and on the performance penalty due to concurrency control [4]. Much work is also being performed in the context of the World-Wide Web, but there write accesses only play a minor role, and the focus is often on devising suitable metrics for server selection that capture the congestion on the Internet [3].
This work was performed while the author was visiting G. Bozman's group at the IBM T.J. Watson Research Center in Hawthorne, NY, USA.
2 The Model

In this section we describe our simulation model of a DDS.
2.1 Components of the Model
Our model of a DDS consists of a network of servers on which databases are stored and of clients that access these databases through transactions. There are D separate databases, each of which may be replicated. Transactions are assumed to access a single database. Rather than consisting of a single request, they consist of a number of RPCs. When a client generates a transaction for some database, it sends the first RPC across the network to a server that stores the database, and awaits its results before sending the next RPC. A transaction consists of a uniformly distributed number of RPCs, and RPCs have an exponentially distributed service time at the server side; the processing time at the client side is assumed to be negligible, and so is ignored. During sessions (see 2.2), clients continuously go through the cycle of generating a transaction, waiting for it to finish and return its final results, and then thinking for some time before generating the next. The think time is also exponentially distributed. The number of clients is denoted by C.

Each of the servers stores a subset of the databases. The servers use the FCFS policy for serving the RPCs of transactions. The capacities of the servers may be different, and are defined relative to some unit-capacity server. We model the network as an infinite server with a constant service time. So we assume that there is no congestion, and that the time needed to send an RPC and to return its results is independent of the amount of data transmitted.
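As a concrete illustration of the workload model, the service demand of a single transaction can be sampled as follows. This is a minimal Python sketch, not part of the paper's CSIM implementation; the function name is ours, and the default parameter values are the ones used later in Section 3.

```python
import random

def transaction_rpc_demands(rng, n_min=10, n_max=20, mean_service=0.05):
    """Sample the per-RPC service demands (in seconds) of one transaction:
    a uniformly distributed number of RPCs, each with an exponentially
    distributed service time on a unit-capacity server."""
    n = rng.randint(n_min, n_max)  # inclusive on both ends
    return [rng.expovariate(1.0 / mean_service) for _ in range(n)]
```

On a server of capacity c, each demand would simply be divided by c before being served FCFS.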
2.2 Transaction Rates
We define the access pattern for all client-database pairs by creating in a random way a two-dimensional matrix (a_ij) of relative transaction rates, indexed by databases (i = 1, ..., D) and by clients (j = 1, ..., C), which is biased in both dimensions. The total relative transaction rates of client j and database i are defined as sum_i a_ij and sum_j a_ij, respectively. To be more precise, we first create a matrix (a'_ij) in which the elements along each row and column increase exponentially such that a'_iC / a'_i1 = K_C for i = 1, ..., D, and a'_Dj / a'_1j = K_D for j = 1, ..., C, for some constants K_C, K_D > 1, and then we take each of the a_ij to be uniformly distributed between 0 and a'_ij with probability u, and a_ij = 0 with probability 1 − u, for some u with 0 < u ≤ 1.

Clients alternate between sessions, during which they generate transactions, and session intervals, during which they do not. The session length of client j has a uniform distribution with mean S_j, and the session-interval length of all clients has an exponential distribution with mean I. So client j generates transactions during a fraction S_j / (S_j + I) of the time. We take the S_j and I such that the ratios of these fractions of time are equal to the ratios of the clients' total relative transaction rates.

In reality, users often generate some number of consecutive transactions for the same database before moving on to the next. Therefore, we define a fixed probability p of any client generating a transaction for the same database as its previous transaction (also across different sessions). In the remaining fraction 1 − p of the cases, the client picks a database based on all of its relative transaction rates.
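The construction of the biased rate matrix can be sketched as follows. This is an illustrative Python fragment under the definitions above; the function name, the seeding convention, and the exact interpolation of the base matrix between its corner ratios are our own choices.

```python
import random

def make_rate_matrix(D, C, K_C, K_D, u, seed=0):
    """Biased D x C matrix (a_ij) of relative transaction rates.

    The base matrix a'_ij grows exponentially along rows and columns so
    that a'_iC / a'_i1 == K_C and a'_Dj / a'_1j == K_D.  Each entry a_ij
    is then drawn uniformly from (0, a'_ij) with probability u, and set
    to 0 with probability 1 - u.  Assumes D > 1 and C > 1.
    """
    rng = random.Random(seed)
    a = [[0.0] * C for _ in range(D)]
    for i in range(D):
        for j in range(C):
            a_prime = K_D ** (i / (D - 1)) * K_C ** (j / (C - 1))
            if rng.random() < u:
                a[i][j] = rng.uniform(0.0, a_prime)
    return a
```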
2.3 Replication
We will study two policies according to which the numbers of replicas of the databases are determined. In fixed replication, r − 1 fractions f_1, ..., f_{r−1} with 0 < f_1 < ... < f_{r−1} < 1 and r positive integers n_1, ..., n_r are given. Assume the databases to be numbered in descending order of their total relative transaction rates. Let j_k = f_k D (rounded down) for k = 1, ..., r − 1, and let j_0 = 0 and j_r = D. Then databases j_{k−1} + 1, ..., j_k have n_k replicas, k = 1, ..., r.

In uniform replication, the aim is to make the ratios of the total relative transaction rates and the numbers of replicas of all databases as close as possible. Assume that the databases are initially in a list in descending order of their total relative transaction rates, and that they all have one replica. In this replication policy, a fraction f ≥ 1 is given, and the following procedure is followed until the total number of replicas of all databases together is equal to f D (rounded down): delete the database at the head of the list, increase its number of replicas by one, and re-enter the database into the list at the appropriate place, i.e., according to its total relative transaction rate divided by its new number of replicas. In either replication policy, we define the relative replica transaction rate of a database as the ratio of its total relative transaction rate and its number of replicas.

For each of the replication policies, we assume that there is a fixed, deterministic replication interval, at the end of which the replicas of the same database are made up to date. We do not model the overhead of this replica updating.

We have two ways of assigning replicas of databases to servers. In the random assignment policy, the replicas of each database are put on randomly chosen (different) servers. For the balanced assignment policy, we assume the databases to be sorted in a list in descending order of their relative replica transaction rates. Now, the replicas of every next database on this list are assigned to those servers that will be the most lightly loaded (in terms of the sum of the relative replica transaction rates of the databases assigned to the servers, divided by the servers' capacities) after the replicas of this database have been assigned to them.
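The greedy procedure for uniform replication and the balanced assignment policy can be sketched as follows. This is a minimal Python rendering of the descriptions above, not the paper's CSIM code; the function and variable names are ours.

```python
import heapq
import math

def uniform_replication(rates, f):
    """Given total relative transaction rates in descending order and a
    fraction f >= 1, repeatedly give an extra replica to the database
    with the largest rate-per-replica until there are floor(f * D)
    replicas in total.  Returns the number of replicas per database."""
    D = len(rates)
    replicas = [1] * D
    # min-heap keyed on the negated rate-per-replica
    heap = [(-rates[i], i) for i in range(D)]
    heapq.heapify(heap)
    total = D
    while total < math.floor(f * D):
        _, i = heapq.heappop(heap)
        replicas[i] += 1
        heapq.heappush(heap, (-rates[i] / replicas[i], i))
        total += 1
    return replicas

def balanced_assignment(replica_rates, replicas, capacities):
    """Assign the replicas of each database (taken in the given order,
    i.e., descending relative replica transaction rate) to the servers
    that will be most lightly loaded after receiving them.  Load is the
    sum of assigned replica rates divided by the server's capacity.
    Returns, per database, the list of chosen server indices."""
    load = [0.0] * len(capacities)
    placement = []
    for rate, n in zip(replica_rates, replicas):
        order = sorted(range(len(capacities)),
                       key=lambda s: (load[s] + rate) / capacities[s])
        chosen = order[:n]  # n distinct servers
        for s in chosen:
            load[s] += rate
        placement.append(chosen)
    return placement
```

With 20 databases and f = 1.5, `uniform_replication` produces 30 replicas in total, as in Section 3.2; which databases receive the extra copies depends on the sampled rates.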
When issuing a transaction for a particular database, a client is only allowed to change servers when at least a replication interval has expired since its last transaction to the same database, so that it will always see its own changes to the database. The server then selected is the one with the shortest queue of RPCs to be executed, possibly smoothed over some time interval or some number of observations. (Lotus Notes uses another criterion for server selection, based on RPC response times.) Polling a server for its queue length is modelled as a single RPC.
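The server-selection rule can be sketched as follows. This is a hypothetical Python fragment: the dictionaries tracking each client's last access are our own bookkeeping, and the smoothing of the observed queue lengths is omitted for brevity.

```python
def select_server(client, db, candidates, queue_len, now,
                  last_access, last_server, replication_interval):
    """Pick a server for a transaction of `client` on database `db`.

    The client sticks to its previous server for this database unless at
    least one replication interval has passed since its last transaction
    on it, so that it always sees its own changes; otherwise it polls the
    candidate servers and picks the one with the shortest RPC queue."""
    key = (client, db)
    if key in last_server and now - last_access[key] < replication_interval:
        choice = last_server[key]
    else:
        choice = min(candidates, key=lambda s: queue_len[s])
    last_access[key] = now
    last_server[key] = choice
    return choice
```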
3 Simulation Results

In this section we present some preliminary simulation results, which were obtained with the CSIM-18 package [1]. The model parameters have the following values in this section. There are 200 clients, 3 servers of capacity 1 each, and 20 databases. The mean client think time is 3 s, the mean RPC service time is 0.05 s on a unit-capacity server, and the number of RPCs per transaction is uniformly distributed between 10 and 20. The session length of the largest client is uniformly distributed between 15 and 30 minutes (the distributions of the session lengths of the other clients can then be derived, see 2.2), and the session-interval length of all clients has a mean of 15 minutes. The ratios K_C and K_D of the client and database relative transaction rates are both set to 10, the probability u of a client-database pair having transactions (see 2.2) is set to 0.5, and the probability p of sticking to the same database is set to 0.8. The queue length used for server selection is smoothed over 100 observations. Finally, the constant delay in the network is 0.002 s. In all simulations, the simulated time is 10 hours, and the results are reported over the last 8 hours.

Of course, the most important performance metric from a user point of view is the transaction response time at the client side, but the overall distribution of the response times for all clients together may not be very meaningful, because poor responses at certain clients or for certain client-database pairs may be hidden in it. Therefore, we will present the density of the mean transaction response time for client-database pairs in the form of histograms giving, for each occurring mean response time, the number of client-database pairs experiencing this mean response time. For lack of space, we can only show a few of our simulation results.
3.1 Database Assignment
We first study the effect of the random and balanced database-assignment policies without replication. The simulation results are shown in Figure 1 and in Table 1. Of course, in this case all load balancing is due to the database assignment, clients having no choice as to which server to send a transaction to. This means that for both policies, we can expect the histograms in Figure 1 to be trimodal (and in general, for S servers, S-modal). As can be seen in Table 1, the system with the balanced assignment is very well balanced indeed, with all three servers running close to their maximal capacity, and the histograms for the separate servers blend into a unimodal distribution in Figure 1(b). With the random-assignment policy, servers 0 and 2 have low loads, and their histograms blend but are completely separated from the histogram of the heavily loaded server.
[Figure 1: The numbers of client-database pairs per mean transaction response time for (a) the random and (b) the balanced database-assignment policy without replication. Axes: mean transaction response time (0-25) versus number of client-database pairs (0-800).]

policy    server  utilization  RPC throughput  queue length  RPC response time
random    0       0.539        10.76536         1.12016      0.10405
          1       0.989        19.82096        23.05426      1.16312
          2       0.695        13.90040         2.07952      0.14960
balanced  0       0.920        18.41160         6.63386      0.36031
          1       0.951        19.04651         8.27872      0.43466
          2       0.958        19.15223         8.50309      0.44397
Table 1: Performance of the random and balanced database-assignment policies without replication.
3.2 Database Replication
We now introduce our two replication policies. For fixed replication, we specify that the fraction 0.5 of the databases with the largest total relative transaction rates get two replicas and the others only one, and for uniform replication, we specify that the total number of replicas is 1.5 times the number of databases. In the latter case, it turns out that there are three copies of the two databases with the largest total relative transaction rates, and two copies of the next six. In either case, we get a total of 30 database replicas. The replication interval is 15 minutes. The throughputs of RPCs in the four combinations of the two assignment policies and the two replication policies are nearly equal, around 58 (out of a maximum of 60). As is shown in Figure 2, either replication policy gives a big improvement over the results without replication presented in Section 3.1, with the combination of balanced assignment and fixed replication giving the best results.
[Figure 2: The numbers of client-database pairs per mean transaction response time with replication and with a 15-minute replication interval. The four panels (a)-(d) show the combinations of random and balanced assignment with fixed and uniform replication; axes as in Figure 1.]
4 Conclusions and Future Work

We can draw a few preliminary conclusions (not all simulation results supporting these could be shown). First, if there is no replication or if the replication interval is long, random assignment is not a good idea. Second, in the case of balanced assignment, a long replication interval does not decrease performance very much. Third, we expected uniform replication to be the better of our two replication policies, but this is not uniformly so. Apparently, a choice of fewer servers for more databases is sometimes better than a choice of more servers for fewer databases.

Elements that we want to include in our simulation model are (1) the traffic due to making replicas consistent, (2) the sizes of the databases versus the storage capacities of the servers, (3) server-initiated load balancing, and (4) load balancing at different levels, e.g., not only in a single cluster of servers, but also among sets of clusters. Finally, we want to run simulations with larger numbers of clients, servers, and databases that better match real environments, and we need to better validate our model assumptions against Lotus Notes measurement data.
References

[1] Mesquite Software, Inc. User's Guide: CSIM18 Simulation Engine, 1998.
[2] P.B. Mirchandani and R.L. Francis. Discrete Location Theory. Wiley-Interscience, 1990.
[3] M. Sayal, Y. Breitbart, P. Scheuermann, and R. Vingralek. Selection algorithms for replicated web servers. Perf. Eval. Rev., 26(3):44-50, 1998.
[4] A. Thomasian. Concurrency control: Methods, performance, and analysis. ACM Computing Surveys, 30:70-119, 1998.