
Approximation Algorithms for Data Distribution with Load Balancing of Web Servers

Li-Chuan Chen
Networking and Communications Department
The MITRE Corporation
McLean, VA 22102
[email protected]

Hyeong-Ah Choi
Department of Computer Science
The George Washington University
Washington, DC 20052
[email protected]

Abstract

Given the increasing traffic on the World Wide Web (Web), it is difficult for a single popular Web server to handle the demand from its many clients. By clustering a group of Web servers, it is possible to reduce the origin Web server's load significantly and to reduce users' response times when accessing Web documents. A fundamental question is how to allocate Web documents among these servers so that the load is balanced. In this paper, we are given a collection of documents to be stored on a cluster of Web servers. Each server has resource limits on its memory and its number of HTTP connections, and each document has an associated size and access cost. The problem is to allocate the documents among the servers so that no server's memory size is exceeded and the load is balanced as evenly as possible. We show that most simple formulations of this problem are NP-hard, and we establish lower bounds on the value of the optimal load. We show that if the servers have no memory constraints, then there is an allocation algorithm that is within a factor 2 of the optimal solution. We also show that if all servers have the same number of HTTP connections and the same memory size, then a feasible allocation within a factor 4 of the optimal load can be achieved using at most 4 times the optimal memory size. We also provide improved approximation results for the case where documents are relatively small.

1. Introduction

Internet and World Wide Web (WWW) traffic has grown explosively, and this growth is expected to continue. For a popular Web site, network congestion and server overloading may become serious problems, resulting in increased delays for Web services. A number of approaches to this problem have been proposed recently, including mirroring, document caching, and clusters of Web servers.

The first approach is to mirror (replicate) popular Web sites in different locations throughout the world. The original Web site's homepage contains a list of mirror sites, allowing users to choose a site based upon their location. One drawback of mirroring a Web site is that the user typically has no information about the underlying network and server load. This issue was considered in a number of papers [16, 11, 1, 14, 9] by taking network latency and server load into account, basing decisions on prior performance, or using erasure codes.

Web caching places copies of frequently accessed Web objects closer to the users. One difficulty in Web caching is the possibility of accessing stale Web objects; cache coherence [10, 4] deals with the problem of keeping cached objects consistent with the original copy. Unlike traditional caching in computer memory systems, where a cache line has a fixed size, Web objects vary in size. Replacement algorithms deal with this issue, since more than one object might have to be removed to make room for the current object [13, 6].

In the clustering approach, a group of servers acts as a single Web server: Web documents are distributed among the servers, and only one Universal Resource Locator (URL) is published to the clients. Since many servers work together, load balance is the main issue; this has been studied elsewhere [2, 12, 15, 5, 8]. We focus on this approach in this paper.

We present a number of results on the complexity of solving the allocation optimization and decision problems, either exactly or approximately. We consider several different formulations of the problem, since some seem to be easier to approximate than others. In addition to establishing that simple formulations of the allocation problem are NP-hard, we present approximation algorithms for the cases of no memory constraints, equal memory and HTTP constraints, and allocations involving small document sizes.

2. Previous Work

There have been several studies of load balancing among a cluster of Web servers. These are usually broken into two broad categories: client-based load balancing and server-based load balancing.

Lewontin and Martin [9] implemented a client-side load balancing algorithm. Their method uses the past performance of requests to minimize response latency, where performance is measured by the number of bytes transmitted divided by the total transfer time. A list of the replicated servers' performances is maintained at the client's proxy server, which then uses a directory service to map a URL onto one of the servers.

Many of the server-based load balancing systems are based on a two-tier architecture: a front-end server is responsible for dispatching each incoming Web document request to one of the back-end document servers. In the NCSA service [7], a round-robin Domain Name Service (DNS) is used to distribute client requests among the Web servers. The drawback of using DNS is that it does not balance the load among the servers, due to non-uniform document sizes and DNS name caching. Moreover, DNS does not know the status of the Web servers: when a server is down or busy (for example, because it has been receiving all the requests for large documents), DNS might still rotate requests to that server. Garland et al. [5] overcome NCSA's uneven load balancing by implementing a mechanism that monitors server load and selects the least loaded server for each incoming request. Their server load metric is the number of Web document requests for the server plus the number of processes currently active on the server. Narendran et al. [12] implemented a distributed Web server system that combines DNS round-robin, HTTP redirection, and documents' access rates to balance the load. Our model is closely related to theirs, but includes server memory size limits.

Existing research has stressed practical approaches to achieving load balance, but there has been no theoretical analysis of the performance of these algorithms, and very little work has addressed how Web documents should be allocated among servers. In this paper we approach this load balancing problem from a more theoretical direction. We consider the allocation of Web documents among a cluster of Web servers in order to achieve load balance. Each server has resource limits on its memory and its number of HTTP connections. Each document has an associated size, $s_j$, and access rate, $r_j$. Following [12], we define the access rate to be the product of the time needed to access the document and the probability that the document is requested.

We prove that even simple formulations of this problem are NP-hard. We also provide simple approximation algorithms for a number of formulations of this problem. In each case we show that the approximation algorithm achieves a fixed performance ratio with respect to the optimum solution. Before presenting our results, we define our model in greater detail.

3. Problem Formulation

Our model is a generalization of one proposed by Narendran et al. [12]. It consists of $M$ servers and $N$ documents. Throughout we use the index $i$ when referring to servers and $j$ when referring to documents. Each server $i$ is associated with a memory size $m_i$ and a number of simultaneous HTTP connections $l_i$. Each document $j$ is associated with a document size $s_j$ and an access cost $r_j$ as defined earlier. The total access cost $\hat{r}$ is the sum of all documents' access costs, $\hat{r} = \sum_{j=1}^{N} r_j$, and the total number of HTTP connections $\hat{l}$ is the sum of all servers' HTTP connections, $\hat{l} = \sum_{i=1}^{M} l_i$. Let $r = (r_1, r_2, \ldots, r_N)$, $l = (l_1, l_2, \ldots, l_M)$, $s = (s_1, s_2, \ldots, s_N)$, and $m = (m_1, m_2, \ldots, m_M)$.

The input to the allocation problem is a quadruple $I = \langle r, l, s, m \rangle$. The output is an allocation (or access) matrix, an $M \times N$ matrix $a_{ij}$ with $0 \le a_{ij} \le 1$. If $a_{ij} \ne 0$, then document $j$ is allocated to server $i$. We permit a document to be allocated to more than one server, and we interpret $a_{ij}$ as the probability that a request for document $j$ is processed by server $i$. Any allocation must satisfy the following allocation constraint:
$$\sum_{i=1}^{M} a_{ij} = 1, \quad \text{for } 1 \le j \le N.$$

A special case, called a 0-1 allocation, is one in which $a_{ij} \in \{0, 1\}$. In such an allocation each document appears in exactly one server. Let $D_i$ denote the set of documents allocated to server $i$, that is, $D_i = \{j \mid a_{ij} \ne 0\}$. The sum of the document sizes in server $i$ cannot exceed the memory of that server. From this we have the following memory constraint:
$$\sum_{j \in D_i} s_j \le m_i, \quad \text{for } 1 \le i \le M.$$

An allocation satisfying these constraints is called a feasible allocation. Let $R_i$ denote the total access cost for server $i$, that is,
$$R_i = \sum_{j=1}^{N} a_{ij} r_j.$$

A server's ability to respond to document requests is affected by two quantities. The total number of bytes this server must send is proportional to the server's total access cost. As the number of HTTP connections increases, the server's ability to satisfy multiple requests increases. Hence, we define the load of server $i$ per HTTP connection to be $R_i / l_i$. Define the objective function $f(a)$ to be
$$f(a) = \max_{1 \le i \le M} \frac{R_i}{l_i}.$$

Our goal is to balance the load by minimizing the maximum load over all servers.

Allocation Optimization Problem: Given an input quadruple $I$, find a feasible allocation $a$ that minimizes $f(a)$. Call this optimum allocation $a^*_I$, and let $f^*_I = f(a^*_I)$ be its optimum value. When $I$ is clear from context, we will simply write $f^*$.

Allocation Decision Problem: Given an input $I$ and a value $f_0$, is $f^*_I \le f_0$?

Our interest in the decision problem is that, given an algorithm for the decision problem, we may use it within a binary search to find the optimum value for the optimization problem.
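To make the formulation concrete, the following is a minimal sketch, under our own choice of data layout (plain Python lists for $r, l, s, m$ and a nested list for $a_{ij}$; names are hypothetical), of how the feasibility constraints and the objective $f(a)$ could be evaluated. The paper itself does not prescribe any particular representation.

```python
def is_feasible(a, r, l, s, m, eps=1e-9):
    """Check the allocation, memory, and range constraints for an M x N matrix a."""
    M, N = len(l), len(r)
    for j in range(N):                      # allocation constraint: each column sums to 1
        if abs(sum(a[i][j] for i in range(M)) - 1.0) > eps:
            return False
    for i in range(M):                      # memory constraint over D_i = {j : a_ij != 0}
        if sum(s[j] for j in range(N) if a[i][j] > 0) > m[i] + eps:
            return False
    return all(0.0 <= a[i][j] <= 1.0 for i in range(M) for j in range(N))

def load(a, r, l):
    """Objective f(a) = max_i R_i / l_i, where R_i = sum_j a_ij * r_j."""
    M, N = len(l), len(r)
    return max(sum(a[i][j] * r[j] for j in range(N)) / l[i] for i in range(M))
```

An allocation algorithm then simply has to produce a matrix for which `is_feasible` holds while keeping `load` as small as possible.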

4. Contributions

First we establish a lower bound of $\hat{r}/\hat{l}$ on the value of the optimum load $f^*$. This bound can be achieved when memory is not a constraint, by allocating every document to every server. This immediately improves the results of Narendran et al. [12], since their results did not consider bounds on memory. Our remaining results all involve 0-1 allocations (in which each document is assigned to exactly one server).

Hardness: We show that even without memory constraints and with all servers having an equal number of HTTP connections, the allocation optimization problem is NP-hard. We also show that if memory constraints are present, then even determining the existence of a feasible 0-1 allocation is NP-hard. This remains true even if all servers have the same memory size.

No memory constraints: We show that if there are no memory constraints on the servers, then there is a simple and efficient greedy allocation algorithm that is within a factor 2 of the optimal solution.

Equal memory and HTTP constraints: We show that if all servers have the same number of HTTP connections and the same memory size, then a feasible allocation within a factor 4 of the optimal load can be achieved using 4 times the optimal memory size.

Small document sizes: The above hardness results rely on the fact that documents can be nearly as large as the memory sizes of the servers. However, in practice, document sizes are typically much smaller than the servers’ memory sizes. We show that if the memory sizes for all servers are equal to some value m, and the size of the largest document is at most m/k, then we can compute an allocation whose load is at most a factor of 2(1 + 1/k) times optimal. All of our approximation algorithms are based on simple greedy approaches, and are easy to implement.

5. Lower Bounds

Consider an input $I = \langle r, l, s, m \rangle$ where there are no memory constraints, that is, $m_i = \infty$. Recall that $\hat{r} = \sum_{j=1}^{N} r_j$ and $\hat{l} = \sum_{i=1}^{M} l_i$. We begin by providing a lower bound on the optimal allocation cost $f^*$.

Lemma 1 Let $r_{\max} = \max_{1 \le j \le N} r_j$ and $l_{\max} = \max_{1 \le i \le M} l_i$. Then
$$f^* \ge \max\left( \frac{r_{\max}}{l_{\max}}, \frac{\hat{r}}{\hat{l}} \right).$$

Proof: Assume memory space is large enough to allocate all documents to each server, so that the memory constraint is trivially satisfied. The total access cost $\hat{r}$ must be served by a total of $\hat{l}$ HTTP connections, so by the pigeonhole principle some server's load per HTTP connection is at least $\hat{r}/\hat{l}$; that is, $f^* \ge \hat{r}/\hat{l}$. The document with the largest access rate must be assigned to some server, and in the best case it is assigned to the server with the largest number of HTTP connections, implying a cost of at least $r_{\max}/l_{\max}$. $\square$

We will use the following alternative lower bound in the proof of Theorem 2 below.

Lemma 2 Assume $r_1 \ge r_2 \ge \ldots \ge r_N$ and $l_1 \ge l_2 \ge \ldots \ge l_M$. Then
$$f^* \ge \max_{1 \le j' \le \min(N, M)} \frac{\sum_{j=1}^{j'} r_j}{\sum_{i=1}^{j'} l_i}.$$
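As an illustration, here is a small sketch (our own, with hypothetical helper names) that evaluates the two lower bounds of Lemmas 1 and 2 for a given instance; it assumes the access costs and connection counts have already been read into lists.

```python
def lower_bounds(r, l):
    """Return the Lemma 1 bound max(r_max/l_max, r_hat/l_hat)
    and the Lemma 2 prefix-ratio bound."""
    r_hat, l_hat = sum(r), sum(l)
    lemma1 = max(max(r) / max(l), r_hat / l_hat)

    # Lemma 2: sort access costs and connection counts in decreasing order,
    # then take the best prefix ratio over j' = 1 .. min(N, M).
    rs = sorted(r, reverse=True)
    ls = sorted(l, reverse=True)
    pref_r = pref_l = 0.0
    lemma2 = 0.0
    for jp in range(min(len(rs), len(ls))):
        pref_r += rs[jp]
        pref_l += ls[jp]
        lemma2 = max(lemma2, pref_r / pref_l)
    return lemma1, lemma2
```

On any instance, the optimum load $f^*$ is at least the larger of the two returned values.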

Proof: Consider the optimal allocation of the first $j'$ documents to the $M$ servers. Let $S_{j'}$ denote the set of servers used in this allocation and let $f^*_{j'}$ denote the cost of this allocation. Clearly $|S_{j'}| \le j'$. Since the first $j'$ documents are a subset of all the documents and we are free to use all $M$ servers, we have $f^*_{j'} \le f^*$.

We claim that if $i \in S_{j'}$ then we may assume that $i - 1 \in S_{j'}$. If not, then move all the documents from server $i$ to server $i - 1$: since $i - 1$ is not in $S_{j'}$ it contains no documents, and since $l_{i-1} \ge l_i$, this move can only decrease the overall cost. From this and the fact that $|S_{j'}| \le j'$, it follows that we may assume $S_{j'} \subseteq \{1, 2, \ldots, j'\}$.

For $1 \le i \le j'$, let $R_i$ be the sum of $r_j$ over the documents assigned to server $i$. By definition,
$$f^*_{j'} = \max_{1 \le i \le j'} \frac{R_i}{l_i}.$$
Thus $f^*_{j'} l_i \ge R_i$ for $1 \le i \le j'$, and summing over $i$,
$$f^*_{j'} \sum_{i=1}^{j'} l_i \ge \sum_{i=1}^{j'} R_i = \sum_{j=1}^{j'} r_j.$$
Therefore,
$$f^* \ge f^*_{j'} \ge \frac{\sum_{j=1}^{j'} r_j}{\sum_{i=1}^{j'} l_i}. \qquad \square$$

Narendran et al. [12] present an allocation algorithm under a model similar to ours, but without memory constraints. We now show that in this case it is trivial to achieve an optimal allocation by selecting the $a_{ij}$ appropriately.

Theorem 1 If $m_i \ge \sum_{j=1}^{N} s_j$ for all $i$, then an optimal allocation is achieved by setting $a_{ij} = l_i / \hat{l}$ for all $i, j$.

Proof: Since $a_{ij} > 0$ for all $i, j$, each server holds copies of all documents, and the memory constraint is satisfied by assumption. Thus
$$f(a) = \max_{1 \le i \le M} \frac{R_i}{l_i} = \max_{1 \le i \le M} \frac{\sum_{j=1}^{N} r_j a_{ij}}{l_i} = \max_{1 \le i \le M} \frac{(l_i/\hat{l}) \sum_{j=1}^{N} r_j}{l_i} = \frac{\hat{r}}{\hat{l}} \le f^*.$$
Therefore $a$ is an optimal allocation. $\square$
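For intuition, the proportional allocation of Theorem 1 can be written in a couple of lines. The sketch below (a hypothetical helper, not the authors' code) builds $a_{ij} = l_i/\hat{l}$; every server's load per HTTP connection then evaluates to exactly $\hat{r}/\hat{l}$.

```python
def proportional_allocation(r, l):
    """Theorem 1 allocation: every server stores every document and
    a_ij = l_i / l_hat, so each server's per-connection load is r_hat / l_hat."""
    l_hat = sum(l)
    return [[l_i / l_hat for _ in r] for l_i in l]
```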

6. NP-Completeness

We observe that even simple formulations of this optimization problem are NP-hard. Consider the following problems:

0-1 Allocation: Given an input quadruple $I = \langle r, l, s, m \rangle$, does there exist a feasible 0-1 allocation?

This problem is NP-complete even if all memory sizes are equal, $m_1 = m_2 = \ldots = m_M = m$, because satisfying the memory constraints is exactly the bin packing problem in which $s$ gives the sizes of the objects and the bins have size $m$. If we choose to ignore memory constraints altogether, the problem is still NP-hard for 0-1 allocations, as we show next.

0-1 Allocation with No Memory Constraints: Given an input quadruple $I = \langle r, l, s, m \rangle$ with $m_i = \infty$ for $1 \le i \le M$, does there exist an allocation with load value $f \le 1$?

We show that this problem is NP-complete even if all servers have an equal number of HTTP connections, $l_1 = l_2 = \ldots = l_M = l$. As before, we may reduce the bin packing problem to this problem by letting $l$ be the bin size and letting $r$ give the sizes of the objects to be packed. A 0-1 allocation of value at most 1 is equivalent to a packing into $M$ bins, since for each server $1 \le i \le M$ we have $R_i / l \le 1$, implying that the total size $R_i$ of the objects assigned to bin $i$ is at most the bin size $l$.

These results imply that the problem is interesting only when there are memory constraints or limits on the number of servers to which a document can be allocated. Henceforth we consider only 0-1 allocations.
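The second reduction is easy to state in code. The sketch below (our own illustration; the names are hypothetical) maps a bin packing instance to an instance of 0-1 Allocation with No Memory Constraints, so a polynomial-time decision procedure for the allocation problem would also decide bin packing.

```python
def bin_packing_to_allocation(object_sizes, num_bins, bin_size):
    """Reduce bin packing to the 0-1 allocation decision problem with f0 = 1.

    Each bin becomes a server with bin_size HTTP connections and unlimited
    memory; each object becomes a document whose access cost is its size.
    A 0-1 allocation with load f <= 1 exists iff the objects fit in num_bins bins.
    """
    r = list(object_sizes)                  # document access costs
    l = [bin_size] * num_bins               # HTTP connections per server
    s = [0] * len(object_sizes)             # document sizes are irrelevant here
    m = [float("inf")] * num_bins           # no memory constraints
    f0 = 1.0                                # target load value
    return (r, l, s, m), f0
```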

7. Approximation Algorithms

Throughout the remainder of the paper we consider only 0-1 allocations.

7.1. No Memory Constraint

Consider an instance of the document allocation problem in which there are no memory constraints, that is, $m_i = \infty$ for $1 \le i \le M$. Consider Algorithm 1, shown in Fig. 1; we will show that it produces an allocation that is within a factor 2 of the optimal solution. Note that for a 0-1 allocation with no memory constraints we may assume that $N \ge M$, since otherwise an optimal assignment is obtained by placing one document on each of the $N$ servers with the largest values of $l_i$.

Algorithm 1
Input: A quadruple $I = \langle r, l, s, m \rangle$, where $m_i = \infty$ for $1 \le i \le M$.
Output: A 0-1 allocation of documents to servers.
1. Sort documents by decreasing access cost $r_j$.
2. Sort servers by decreasing port connections $l_i$.
3. for $1 \le i \le M$ do {
4.     set $R_i = 0$; }
5. for $1 \le j \le N$ do {
6.     Choose the $i$ that minimizes $(R_i + r_j)/l_i$ over $1 \le i \le M$.
7.     Allocate document $j$ to server $i$.
8.     $R_i$ += $r_j$; }
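A direct implementation of Algorithm 1 is straightforward; the following Python sketch (our own rendering, not the authors' code) follows the pseudocode line by line and returns, for each document, the index of the server it is assigned to.

```python
def greedy_allocate(r, l):
    """Algorithm 1: greedy 0-1 allocation with no memory constraints.

    r[j] is the access cost of document j, l[i] the number of HTTP
    connections of server i.  Returns assign[j] = server index, using the
    original (unsorted) indices of documents and servers.
    """
    docs = sorted(range(len(r)), key=lambda j: -r[j])          # line 1
    servers = sorted(range(len(l)), key=lambda i: -l[i])       # line 2
    R = [0.0] * len(l)                                         # lines 3-4
    assign = [None] * len(r)
    for j in docs:                                             # line 5
        i = min(servers, key=lambda i: (R[i] + r[j]) / l[i])   # line 6
        assign[j] = i                                          # line 7
        R[i] += r[j]                                           # line 8
    return assign
```

Theorem 2 below shows that the maximum per-connection load of the resulting allocation is within a factor 2 of optimal.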

Theorem 2 Let $f_1$ be the objective function value produced by Algorithm 1 when there are no memory constraints. Then $f_1 \le 2 f^*$.

Proof: Suppose not. Let $j'$ be the first document whose allocation, to some server $i'$, results in $R_{i'}/l_{i'} > 2f^*$; here $R_i$ denotes the load of server $i$ just after document $j'$ has been allocated. Let $S$ denote the set of servers holding at least one of the first $j'$ documents; clearly $|S| \le j'$.

We first claim that $S \subseteq \{1, 2, \ldots, j'\}$. It suffices to show that if $i \in S$ then $i - 1 \in S$. Suppose instead that $i \in S$ but $i - 1 \notin S$, and let $j''$ be the first document allocated to server $i$. By line 8 of the algorithm, if $i - 1 \notin S$ then $R_{i-1} = 0$, and since $j''$ is the first document allocated to server $i$, $R_i = 0$ at that moment as well. For line 6 to choose server $i$ over server $i - 1$ (with ties broken in favor of the smaller index) we would need
$$\frac{r_{j''}}{l_i} < \frac{r_{j''}}{l_{i-1}},$$
which implies $l_i > l_{i-1}$, contradicting the sorted order of the servers. Thus $S \subseteq \{1, 2, \ldots, j'\}$. Because each of the first $j'$ documents has been allocated to one server in $S$, we have
$$\sum_{i=1}^{j'} R_i = \sum_{i=1}^{j'} \sum_{j=1}^{j'} a_{ij} r_j = \sum_{j=1}^{j'} r_j.$$

Consider the situation just after the allocation of document $j'$. By the choice of $i'$ in line 6 of the algorithm (and since server loads never decrease), we have for $1 \le i \le M$
$$\frac{R_i + r_{j'}}{l_i} \ge \frac{R_{i'}}{l_{i'}} > 2f^*,$$
which implies $R_i + r_{j'} > 2 l_i f^*$. Summing over $1 \le i \le j'$ we have
$$\sum_{i=1}^{j'} R_i + j' r_{j'} > 2 f^* \sum_{i=1}^{j'} l_i,$$
and therefore
$$\sum_{j=1}^{j'} r_j + j' r_{j'} > 2 f^* \sum_{i=1}^{j'} l_i.$$
Since $r_{j'} \le r_j$ for $1 \le j \le j'$, we have $\sum_{j=1}^{j'} r_j + j' r_{j'} \le 2 \sum_{j=1}^{j'} r_j$. This implies
$$2 \sum_{j=1}^{j'} r_j > 2 f^* \sum_{i=1}^{j'} l_i, \qquad \text{that is,} \qquad \frac{\sum_{j=1}^{j'} r_j}{\sum_{i=1}^{j'} l_i} > f^*.$$
However, this contradicts the lower bound of Lemma 2. $\square$

It is easy to see that a straightforward implementation of Algorithm 1 runs in $O(N \log N + NM)$ time, where lines 1 and 6 dominate the total time. If there are $L$ distinct values of $l_i$, it is possible to achieve a running time of $O(N \log N + NL)$, which is never worse since $L \le M$. To do this we partition the servers into $L$ groups according to the value of $l_i$. For each group we maintain a binary heap [3] keyed on the value $R_i$. For each group we can determine the minimum $R_i$ value in $O(1)$ time, and hence can determine the server $i$ of line 6 in $O(L)$ time by inspecting each heap. For the selected heap we update the value of $R_i$ in $O(\log N)$ time. Thus each iteration of the loop of line 5 takes $O(\log N + L)$ time, for a total running time of $O(N \log N + LN)$.
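The grouped-heap variant can be sketched as follows (again our own illustration; `heapq` is Python's standard library heap). Servers with equal $l_i$ share one heap keyed on $R_i$, so line 6 only has to compare the $L$ heap minima.

```python
import heapq
from collections import defaultdict

def greedy_allocate_grouped(r, l):
    """Algorithm 1 with servers grouped by their l_i value (L distinct values).

    Each group keeps a min-heap of (R_i, i); the best server overall is the
    best of the L group minima, since within a group (R_i + r_j)/l_i is
    minimized by the smallest R_i.
    """
    groups = defaultdict(list)                 # l value -> heap of (R_i, i)
    for i, li in enumerate(l):
        heapq.heappush(groups[li], (0.0, i))
    assign = [None] * len(r)
    for j in sorted(range(len(r)), key=lambda j: -r[j]):
        best = None
        for li, heap in groups.items():        # inspect the L heap minima
            Ri, i = heap[0]
            cost = (Ri + r[j]) / li
            if best is None or cost < best[0]:
                best = (cost, li, Ri, i)
        _, li, Ri, i = best
        heapq.heapreplace(groups[li], (Ri + r[j], i))   # update R_i for server i
        assign[j] = i
    return assign
```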

7.2. Equal Memory and Load Constraints

In this section we show how to relax the assumption on memory size made in the previous section. Recall that we are given an input quadruple $I = \langle r, l, s, m \rangle$ with $N = |r| = |s|$ and $M = |l| = |m|$. Assume that the servers are homogeneous, all having the same number of HTTP connections and the same memory size; that is, $l_i = l$ and $m_i = m$ for $1 \le i \le M$. Assume there exists a 0-1 allocation $a^*$ with objective function value $f^*$ such that both the memory and the load balancing constraints are satisfied for all $i$, that is,
$$\sum_{j=1}^{N} s_j a^*_{ij} \le m, \qquad \frac{\sum_{j=1}^{N} r_j a^*_{ij}}{l} \le f^*.$$

For $1 \le j \le N$, we normalize each document's access cost $r_j$ and size $s_j$ as follows:
$$r'_j = \frac{r_j}{l f^*}, \qquad s'_j = \frac{s_j}{m}.$$
This implies that $\sum_{j=1}^{N} r'_j a^*_{ij} \le 1$ and $\sum_{j=1}^{N} s'_j a^*_{ij} \le 1$. In general we do not know $f^*$, so our approach is to conduct a binary search for the smallest value of $f^*$ such that we can allocate all the documents to servers with memory $4m$ and load at most $4f^*$ per server. This provides the desired approximation bounds.

Algorithm 2
Input: A quadruple $I = \langle r, l, s, m \rangle$, where $l_i = l$ and $m_i = m$ for $1 \le i \le M$, and a target cost $f^*$.
Output: A 0-1 allocation of documents to servers, and an indication of success.
1. /* Initialization */ Set $L^1_i, L^2_i, M^1_i, M^2_i = 0$ for $1 \le i \le M$; normalize $r_j, s_j$: $r'_j = r_j/(l f^*)$, $s'_j = s_j/m$, for all $j$.
2. Split the documents into two sets $D^1, D^2$, where $D^1 = \{j \mid r'_j \ge s'_j\}$ and $D^2 = \{j \mid r'_j < s'_j\}$.
3. Call Algorithm 3.
4. If all documents have been assigned to some server, then output yes; else output no.

Figure 2. The 0/1 approximation algorithm for both memory and load constraints.

Algorithm 3 (subroutine used in Algorithm 2)
// Phase 1: Assign documents of $D^1$ to servers such that $R_i \le l f^*$ and $M_i \le m$.
1. $j = 1$;
2. for ($i = 1$ to $M$ and $j \le N$) do {
3.     while ($j \le |D^1|$ and $L^1_i < 1$) do {
4.         Allocate document $j$ to server $i$.
5.         $L^1_i$ += $r'_j$;
6.         $M^1_i$ += $s'_j$;
7.         $j$++; } }
// Phase 2: Assign documents of $D^2$ to servers such that $R_i \le l f^*$ and $M_i \le m$.
1. $j = 1$;
2. for ($i = 1$ to $M$ and $j \le N$) do {
3.     while ($j \le |D^2|$ and $M^2_i < 1$) do {
4.         Allocate document $j$ to server $i$.
5.         $L^2_i$ += $r'_j$;
6.         $M^2_i$ += $s'_j$;
7.         $j$++; } }

Figure 3. The 0/1 approximation algorithm for both memory and load constraints (cont.).
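For concreteness, here is a compact sketch of Algorithms 2 and 3 together with the binary-search driver described above. This is our own illustrative rendering, not the authors' code; the function names and the choice of search bounds are assumptions.

```python
def try_allocate(r, s, M, l, m, f_target):
    """Algorithms 2 and 3 (sketch): split documents into D1/D2 by comparing the
    normalized access cost r'_j = r_j/(l*f_target) with the normalized size
    s'_j = s_j/m, then fill servers greedily, advancing to the next server once
    the phase counter (L^1_i in phase 1, M^2_i in phase 2) reaches 1."""
    rp = [rj / (l * f_target) for rj in r]
    sp = [sj / m for sj in s]
    D1 = [j for j in range(len(r)) if rp[j] >= sp[j]]
    D2 = [j for j in range(len(r)) if rp[j] < sp[j]]
    placed = {}
    for docs, counter_key in ((D1, rp), (D2, sp)):   # phase 1, then phase 2
        i, counter = 0, 0.0
        for j in docs:
            if i >= M:
                return None                          # ran out of servers: failure
            placed[j] = i
            counter += counter_key[j]
            if counter >= 1.0:                       # server "full" for this phase
                i, counter = i + 1, 0.0
    return placed

def allocate_with_binary_search(r, s, M, l, m, iters=50):
    """Binary search (assumed bounds) for a small target cost at which
    try_allocate succeeds; returns None if even the largest target fails.
    By the claims that follow, a successful allocation uses load at most
    4*l*f and memory at most 4*m per server."""
    lo, hi = sum(r) / (M * l), sum(r) / l
    best = try_allocate(r, s, M, l, m, hi)
    for _ in range(iters):
        mid = (lo + hi) / 2
        result = try_allocate(r, s, M, l, m, mid)
        if result is not None:
            best, hi = result, mid
        else:
            lo = mid
    return best
```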

We split the documents into two sets, $D^1$ and $D^2$: $D^1$ consists of the documents whose (normalized) access cost is at least as large as their (normalized) size, and $D^2$ consists of the documents whose (normalized) size exceeds their (normalized) access cost. Let $L^1_i$ denote the cumulative load of the documents in $D^1$ assigned to server $i$, and let $M^1_i$ denote the cumulative memory of the documents in $D^1$ assigned to server $i$; define $L^2_i$ and $M^2_i$ similarly for $D^2$. Algorithm 3, shown in Figure 3, consists of two phases: phase 1 assigns as many of the documents in $D^1$ as possible, and phase 2 then assigns the remaining documents, those in $D^2$. The first phase guarantees that the servers are well utilized with respect to access cost, and the second phase guarantees utilization with respect to size.

Claim 1 At any time in the execution of Algorithm 2, $M^1_i \le L^1_i$ and $L^2_i \le M^2_i$.

Proof: This follows from the definitions of $D^1$ and $D^2$. $\square$

Claim 2 At any point in the algorithm,
$$\max_{1 \le i \le M} \max(L^1_i, L^2_i, M^1_i, M^2_i) \le 2.$$

Proof: The proof is by induction on the number of documents. Initially the claim clearly holds. Suppose that it holds just prior to the insertion of document $j_0$, and let $i$ be the server to which $j_0$ is allocated.

Case 1: $j_0 \in D^1$. Prior to insertion, $L^1_i \le 1$ (for otherwise $j_0$ would not be placed here). So after insertion the load is $L^1_i + r'_{j_0} \le 1 + r'_{j_0} \le 2$ (since $r'_j \le 1$ for all $j$). Thus after inserting $j_0$ we have $L^1_i \le 2$, and by the previous claim, $M^1_i \le 2$.

Case 2: $j_0 \in D^2$. Prior to insertion, $M^2_i \le 1$ (for otherwise $j_0$ would not be placed here). So after insertion the memory is $M^2_i + s'_{j_0} \le 1 + s'_{j_0} \le 2$ (since $s'_j \le 1$ for all $j$). Thus after inserting $j_0$ we have $M^2_i \le 2$, and by the previous claim, $L^2_i \le 2$. $\square$

Claim 3 If there exists an optimal allocation $a^*$ with value $f^*$ satisfying both the memory constraint $\sum_{j=1}^{N} s_j a^*_{ij} \le m$ and the load balance constraint $\sum_{j=1}^{N} r_j a^*_{ij} / l \le f^*$, then Algorithm 2 succeeds in assigning all documents.

Proof: Suppose not. Let $j_0$ be the first document that fails to fit, and consider the counters $L_i, M_i$ just prior to the insertion of $j_0$.

Case 1: $j_0 \in D^1$. Just prior to the insertion of $j_0$, we claim that $L^1_i \ge 1$ for all $i$: if this were not so for some $i$, that is, $L^1_i < 1$, then we would have assigned $j_0$ to server $i$. From this, we have
