IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 21, NO. 1, FEBRUARY 2013


A Distributed Control Law for Load Balancing in Content Delivery Networks Sabato Manfredi, Member, IEEE, Francesco Oliviero, Member, IEEE, and Simon Pietro Romano, Member, IEEE

Abstract—In this paper, we face the challenging issue of defining and implementing an effective law for load balancing in Content Delivery Networks (CDNs). We base our proposal on a formal study of a CDN system, carried out through the exploitation of a fluid flow model characterization of the network of servers. Starting from such a characterization, we derive and prove a lemma about the network queues equilibrium. This result is then leveraged in order to devise a novel distributed and time-continuous algorithm for load balancing, which is also reformulated in a time-discrete version. The discrete formulation of the proposed balancing law is eventually discussed in terms of its actual implementation in a real-world scenario. Finally, the overall approach is validated by means of simulations.

Index Terms—Content Delivery Network (CDN), control theory, request balancing.

I. INTRODUCTION

A Content Delivery Network (CDN) represents a popular and effective solution to support emerging Web applications by adopting a distributed overlay of servers [2]–[4]. By replicating content on several servers, a CDN is capable of partially solving the congestion issues due to high client request rates, thus reducing latency while at the same time increasing content availability. Usually, a CDN (see Fig. 1) consists of an origin server (called the back-end server) containing the new data to be diffused, together with one or more distribution servers, called surrogate servers. Periodically, the surrogate servers are actively updated by the back-end server. Surrogate servers are typically used to store static data, while dynamic information (i.e., data that change over time) is stored in just a small number of back-end servers. In some typical scenarios, there is a server called the redirector, which dynamically redirects client requests based on selected policies. The most important performance improvements deriving from the adoption of such a network concern two aspects: 1) the overall system throughput, that is, the average number of requests served in a time unit (optimized also on the basis of the processing capabilities of the available servers); 2) the response time experienced by clients after issuing a request. The decision process about these two aspects can be conflicting. As an example, a "better response time" server is usually chosen based on geographical distance from the client, i.e., network proximity; on the other hand, the overall system throughput is typically optimized through load balancing across a set of servers. Although the exact combination of factors employed by commercial systems is not clearly defined in the literature, evidence suggests that the scale is tipped in favor of reducing response time.

Akamai [5], LimeLight [6], and CDNetworks [7] are well-known commercial CDN projects that provide support to the most popular Internet and media companies, including BBC, Microsoft, DreamWorks, EA, and Yahoo!. Several academic projects have also been proposed, like CoralCDN [8] at New York University and CoDeeN [9] at Princeton University, both running on the PlanetLab testbed.

A critical component of a CDN architecture is the request routing mechanism. It directs users' requests for a content to the appropriate server based on a specified set of parameters. The proximity principle, by means of which a request is always served by the server that is closest to the client, can sometimes fail. Indeed, the routing process associated with a request might take into account several parameters (like traffic load, bandwidth, and servers' computational capabilities) in order to provide the best performance in terms of time of service, delay, etc. Furthermore, an effective request routing mechanism should be able to face temporary, and potentially localized, high request rates (the so-called flash crowds) in order to avoid affecting the quality of service perceived by other users.

Manuscript received September 22, 2010; revised May 30, 2011; accepted February 22, 2012; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor Z. M. Mao. Date of publication April 03, 2012; date of current version February 12, 2013. This work is based on seminal work that appears in the Proceedings of the IEEE Global Communications Conference (GLOBECOM) Workshops, Miami, FL, December 6–10, 2010. The authors are with the Dipartimento di Informatica e Sistemistica, Federico II University of Napoli, Naples 80138, Italy (e-mail: [email protected]; [email protected]; [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TNET.2012.2190297
Depending on the network layers and mechanisms involved in the process, request routing techniques can generally be classified into DNS request routing, transport-layer request routing, and application-layer request routing [10]. With a DNS-based approach, a specialized DNS server provides a request-balancing mechanism based on well-defined policies and metrics [11]–[13]. For every address resolution request received, the DNS server selects the most appropriate surrogate server in a cluster of available servers and replies to the client with both the selected IP address and a time-to-live (TTL). The latter defines a period of validity for the mapping. Typical implementations of this approach can provide either a single surrogate address or a record of multiple surrogate addresses, in the latter case leaving to the client the choice of the server to contact (e.g., in a round-robin fashion).

http://www.planet-lab.org
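As a sketch of the DNS-based scheme just described, the following Python fragment models a specialized DNS server that answers each resolution request with a surrogate address plus a TTL, cycling round-robin through the record of available surrogates. All class, field, and address names here are our own illustrative choices, not part of the paper:

```python
import itertools
from dataclasses import dataclass

@dataclass
class DnsAnswer:
    ip: str    # address of the selected surrogate
    ttl: int   # validity period (seconds) of the name-to-surrogate mapping

class RoundRobinDnsRedirector:
    """Toy DNS-level request router: cycles through the surrogate record."""

    def __init__(self, surrogates, ttl=30):
        self._cycle = itertools.cycle(surrogates)
        self._ttl = ttl

    def resolve(self, hostname):
        # This toy policy ignores the hostname and the client; a production
        # redirector would also weigh proximity, load, and server capabilities.
        return DnsAnswer(ip=next(self._cycle), ttl=self._ttl)

resolver = RoundRobinDnsRedirector(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
answers = [resolver.resolve("cdn.example.com").ip for _ in range(4)]
# the fourth answer wraps around to the first surrogate again
```

Until the TTL expires, the client keeps using the returned mapping, so the effective balancing granularity is bounded by the TTL choice.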



Fig. 1. Content Delivery Network.

With transport-layer request routing, a layer-4 switch usually inspects information contained in the request header in order to select the most appropriate surrogate server. Information about the client's IP address and port (and, more generally, all layer-4 protocol data) can be analyzed. Specific policies and traffic metrics have been defined for a correct server selection. Generally, the routing to the selected server is achieved either by rewriting the IP destination address of each incoming packet, by a packet-tunneling mechanism, or by a forwarding mechanism at the MAC layer. With application-layer request routing, the task of selecting the surrogate server is typically carried out by a layer-7 application, or by the contacted Web server itself. In particular, in the presence of a Web-server routing mechanism, the server can decide to either serve a client request or redirect it to a remote node. Differently from the previous mechanisms, which usually need a centralized element, a Web-server routing solution is usually designed in a distributed fashion. URL rewriting and HTTP redirection are typical solutions based on this approach. In the former case, a contacted server can dynamically change the links of embedded objects in a requested page in order to let them point to other nodes. The latter technique instead exploits the redirection mechanism of the HTTP protocol to appropriately balance the load on several nodes.

In this paper, we will focus our attention on the application-layer request routing mechanism. More precisely, we will provide a solution for load balancing in the context of HTTP redirection approaches. In most of the papers available in the literature, the design of a proper network management law is carried out by assuming a continuous fluid flow model of the network. Validation and testing are then performed by exploiting a discrete packet simulator (e.g., ns-2, Opnet, etc.)
in order to take into account the discretization effects and nonlinearities occurring in practice. This approach is widely used in the communication and control communities (see, for example, [14]–[19] and references therein). In a similar way, in this paper we first design a suitable load-balancing law that ensures equilibrium of the queues in a balanced CDN, by using a fluid flow model of the network of servers. Then, we discuss the most notable implementation issues associated with the proposed load-balancing strategy. Finally, we validate our model in more realistic scenarios by means of ns-2 simulations.

We present a new mechanism for redirecting incoming client requests to the most appropriate server, thus balancing the overall system request load. Our mechanism leverages local balancing in order to achieve global balancing. This is carried out through a periodic interaction among the system nodes.

The rest of this paper is organized as follows. We briefly describe some interesting solutions for load balancing in Section II. Section III discusses the reference model for a load-balanced CDN, while Section IV focuses on our proposal of a novel distributed load-balancing algorithm suitable for this kind of architecture. Section V presents the results of a thorough experimental campaign, based on simulations and aimed at assessing the performance of the proposed solution. Section VI proposes an in-depth discussion of the most critical features of our solution. Finally, Section VII concludes the paper by providing final remarks, as well as some pointers to open issues and related directions of future investigation.

II. RELATED WORK

Request routing in a CDN is usually concerned with the issue of properly distributing client requests in order to achieve load balancing among the servers involved in the distribution network. Several mechanisms have been proposed in the literature. They can usually be classified as either static or dynamic, depending on the policy adopted for server selection [20]. Static algorithms select a server without relying on any information about the status of the system at decision time. Static algorithms do not need any data retrieval mechanism in the system, which means no communication overhead is introduced. These algorithms definitely represent the fastest solution, since they do not adopt any sophisticated selection process. However, they are not able to effectively face anomalous events like flash crowds. Dynamic load-balancing strategies represent a valid alternative to static algorithms.
Such approaches make use of information coming either from the network or from the servers in order to improve the request assignment process. The selection of the appropriate server is performed through the collection and subsequent analysis of several parameters extracted from the network elements. Hence, a data exchange process among the servers is needed, which unavoidably incurs a communication overhead.

MANFREDI et al.: DISTRIBUTED CONTROL LAW FOR LOAD BALANCING IN CDNs


Fig. 2. Local load-balancing strategies. (a) Queue-adjustment. (b) Rate-adjustment. (c) Hybrid-adjustment.

The redirection mechanisms can be implemented either in a centralized or in a distributed way [20]. In the former case, a centralized element, usually called the dispatcher, intercepts all the requests generated in a well-known domain, for example an autonomous system, and redirects them to the appropriate server in the network by means of either a static or a dynamic algorithm. Such an approach is usually adopted by commercial CDN solutions. With a distributed redirection mechanism, instead, any server receiving a request can either serve it or redistribute it to another server based on an appropriate (static or dynamic) load-balancing solution. Depending on how the scheduler interacts with the other components of the node, it is possible to classify the balancing algorithms into three fundamental models [21] (see Fig. 2): a queue-adjustment model, a rate-adjustment model, and a hybrid-adjustment model. In a queue-adjustment strategy, the scheduler is located after the queue and just before the server. The scheduler might assign the request pulled out of the queue to either the local server or a remote server, depending on the status of the system queues: If an imbalance exists in the network with respect to the local server, it might assign part of the queued requests to the most unloaded remote server. In this way, the algorithm tries to equally balance the requests in the system queues. It is clear that, in order to achieve effective load balancing, the scheduler needs to periodically retrieve information about remote queue lengths. In a rate-adjustment model, instead, the scheduler is located just before the local queue: Upon arrival of a new request, the scheduler decides whether to assign it to the local queue or send it to a remote server. Once a request is assigned to a local queue, no remote rescheduling is allowed. Such a strategy usually balances the request rate arriving at every node independently of the current state of the queue.
Indeed, no periodic information exchange is required. In a hybrid-adjustment strategy for load balancing, the scheduler is allowed to control both the incoming request rate at a node and the local queue length. Such an approach allows more efficient load balancing in a very dynamic scenario, but at the same time it requires a more complex algorithm. In the context of a hybrid-adjustment mechanism, the queue-adjustment and the rate-adjustment might be considered, respectively, as a fine-grained and a coarse-grained process. Both centralized and distributed solutions present pros and cons, depending on the considered scenario and the specific performance parameters evaluated. As stated in [22], although in some cases the centralized solution achieves lower response times, a fully distributed mechanism is much more scalable. It is also robust in case of dispatcher faults, as well as easier to

implement. Finally, it imposes much lower computational and communication overhead. In the following, we describe the most common algorithms used for load balancing in a CDN. Such algorithms will be considered as benchmarks for the evaluation of the solution we propose in this paper. The simplest static algorithm is the Random balancing mechanism (RAND). In such a policy, the incoming requests are distributed to the servers in the network with uniform probability. Another well-known static solution is the Round Robin algorithm (RR). This algorithm selects a different server for each incoming request in a cyclic fashion. Each server is loaded with the same number of requests, without making any assumption on the state of either the network or the servers. The Least-Loaded algorithm (LL) is a well-known dynamic strategy for load balancing. It assigns the incoming client request to the currently least loaded server. Such an approach is adopted in several commercial solutions. Unfortunately, it tends to rapidly saturate the least loaded server until a new status message is propagated [23]. Alternative solutions can rely on the response time to select the server: The request is assigned to the server that shows the fastest response time [24]. The Two Random Choices algorithm (2RC) [25] randomly chooses two servers and assigns the request to the less loaded of the two. A modified version of such an algorithm is the Next-Neighbor Load Sharing algorithm [26]. Instead of selecting two random servers, this algorithm randomly selects just one server and assigns the request to either that server or its neighbor, based on their respective loads (the less loaded server is chosen). In Section III, we will present an alternative solution for load balancing, falling in the class of rate-adjustment approaches. We propose a highly dynamic distributed strategy based on the periodic exchange of information about the load status of the nodes.
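The benchmark policies recalled above (RAND, RR, LL, 2RC) can be sketched in a few lines of Python. This is an illustrative rendition with simplified interfaces of our own devising (each policy receives the current queue lengths and a random source, and returns the index of the chosen server), not the implementations evaluated in the paper:

```python
import random

def rand_policy(queues, rng):
    # RAND: uniform random choice, oblivious to any server state
    return rng.randrange(len(queues))

def make_rr_policy():
    # RR: cyclic choice, also state-oblivious; a closure keeps the cursor
    state = {"next": 0}
    def rr(queues, rng):
        i = state["next"]
        state["next"] = (i + 1) % len(queues)
        return i
    return rr

def ll_policy(queues, rng):
    # LL: always pick the currently least loaded server
    return min(range(len(queues)), key=queues.__getitem__)

def two_rc_policy(queues, rng):
    # 2RC: sample two distinct servers, keep the less loaded of the two
    i, j = rng.sample(range(len(queues)), 2)
    return i if queues[i] <= queues[j] else j

rng = random.Random(42)
queues = [5, 1, 7, 3]
assert ll_policy(queues, rng) == 1                        # shortest queue
rr = make_rr_policy()
assert [rr(queues, rng) for _ in range(5)] == [0, 1, 2, 3, 0]
assert two_rc_policy(queues, rng) in range(4)
```

Note how LL concentrates all new arrivals on one server between two status updates, which is exactly the saturation effect reported for it above.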
By exploiting the multiple-redirection mechanism offered by HTTP, our algorithm tries to achieve global balancing through a local request redistribution process. Upon arrival of a new request, a CDN server can either process the request locally or redirect it to other servers according to a certain decision rule, which is based on the state information exchanged by the servers. Such an approach limits the state-exchange overhead to local (neighboring) servers.

III. LOAD-BALANCED CDN: MODEL FORMULATION

In this section, we introduce a continuous model of a CDN infrastructure, used to design a novel load-balancing law. The CDN can be considered as a set of servers, each with its own queue. We assume a fluid model approximation for the dynamic behavior of each queue, and we extend this model to the overall CDN system. Such an approximation of a stochastic system


Fig. 3. Fluid queue model.

by means of a deterministic process is widely adopted in the literature [14], [15], [18]. Usually, a CDN is designed with adequate resources in order to satisfy the traffic volume generated by end-users. In general, a wise provisioning of resources can ensure that the input rate is always lower than the service rate. In such a case, the system is capable of efficiently serving all users' requests. However, in this paper we focus exclusively on critical conditions where the global resources of the network are close to saturation. This is a realistic assumption, since an unusual traffic condition characterized by a high volume of requests, i.e., a flash crowd, can always overfill the available system capacity. In such a situation, not all the servers are overloaded. Rather, we typically have local instability conditions where the input rate is greater than the service rate. In this case, the balancing algorithm helps prevent local instability by redistributing the excess load to less loaded servers. As anticipated, in this section we formulate a new load-balancing algorithm for CDNs based on a continuous fluid flow model. In Section IV-B, we will consider a discrete version of the load-balancing law, specifically derived for implementation purposes. We will eventually demonstrate the effectiveness of such a discrete version with the help of simulations.

Let $q_i(t)$ be the queue occupancy of server $i$ at time $t$, and let $\lambda_i(t)$ and $\mu_i(t)$ be its instant arrival rate and instant service rate, respectively. The fluid model (Fig. 3) of the CDN servers' queues is given by

$$\dot{q}_i(t) = \lambda_i(t) - \mu_i(t) \qquad (1)$$

for $i = 1, \dots, N$. Equation (1) represents the queue dynamics over time. In particular, if the arrival rate is lower than the service rate, we observe a decrease in queue length; conversely, the queue grows whenever the arrival rate is greater than the service rate. In the latter case, the difference in (1) represents the amount of traffic exceeding the available system's serving rate.
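As an illustration of (1), the short sketch below integrates the fluid model with forward Euler while clamping the queue at zero; the rates, step size, and horizon are arbitrary choices of ours, used only to show the two regimes just described:

```python
def simulate_queue(lam, mu, q0=0.0, dt=0.01, steps=1000):
    """Forward-Euler integration of dq/dt = lambda(t) - mu(t), keeping q >= 0."""
    q = q0
    for k in range(steps):
        t = k * dt
        q = max(0.0, q + dt * (lam(t) - mu(t)))
    return q

# Overloaded server: 12 req/s arriving against a 10 req/s service rate;
# over the 10-s horizon the backlog grows at ~2 req/s.
q_over = simulate_queue(lam=lambda t: 12.0, mu=lambda t: 10.0)

# Underloaded server with an initial backlog of 5 requests: the queue drains.
q_under = simulate_queue(lam=lambda t: 8.0, mu=lambda t: 10.0, q0=5.0)
```

The overloaded run ends with a backlog of about 20 requests, while the underloaded run drains to an empty queue, matching the sign analysis of (1).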
The model described above nicely fits a system in which there is no cooperation among nodes. In such a case, in fact, a node that receives more traffic than it is able to handle will not be able to serve all incoming requests due to an overload condition. It is clear, though, that such a critical condition might be alleviated if the node in question were allowed to redirect the exceeding incoming traffic to other nodes in the network. Indeed, if we look at the overall system's behavior, we are interested in guaranteeing that the following condition holds:

$$\bar{\lambda} \le \bar{\mu} \qquad (2)$$

In the above formula, $\bar{\lambda}$ and $\bar{\mu}$ represent, respectively, the overall average incoming rate and the overall average service rate of the system once equilibrium is reached.
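A small numeric example (values of our own choosing, purely illustrative) shows why the aggregate condition (2) is not sufficient on its own: the system-wide capacity can cover the system-wide demand while one server remains locally unstable, which is precisely the situation that the per-server condition (3) below rules out:

```python
# Average arrival and service rates (req/s) for a three-server system.
lam = [9.0, 2.0, 1.0]   # server 0 is hit by a flash crowd
mu = [5.0, 5.0, 5.0]

globally_ok = sum(lam) <= sum(mu)                   # condition (2): 12 <= 15
locally_ok = all(l <= m for l, m in zip(lam, mu))   # per-server condition (3)

# globally_ok holds, yet locally_ok fails: server 0's queue diverges unless
# its excess load (9 - 5 = 4 req/s) is redistributed to the idle capacity
# of its neighbors.
```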

As already stated, the objective in (2) is a natural goal for any appropriately designed CDN. However, such a general goal can be achieved in many different ways, not all of which provide local stability guarantees, as well as balancing of the servers' queues. Indeed, it might happen that the overall condition is met, but one or more local servers' queues overflow, thus leading to packet losses and unavailability of the overloaded servers (with an impact on the perceived service time). In order to meet the requirement in (2), while at the same time avoiding local instability, we should be able to guarantee that the following condition holds for all of the servers in the network:

$$\bar{\lambda}_i \le \bar{\mu}_i \qquad (3)$$

for $i = 1, \dots, N$. We hence propose to introduce cooperation in the system by allowing each single node to undertake proper actions aimed at satisfying condition (3). Namely, in order to control the dynamics of the queue length and prevent any critical congestion situation, we can operate directly on the fraction of traffic exceeding the server's capacity. Such excess traffic can be accommodated by redistributing it to the server's neighbors on the basis of an appropriate control law. As will become apparent in the following, the main idea of the control law we propose relies on properly redistributing server $i$'s excess traffic to one or more neighboring servers whose queues are less loaded than the local queue at server $i$. To this aim, we focus on the dynamics of the queue length variations with respect to the desired equilibrium point deriving from the application of a proper load-balancing law.

By better formalizing the above considerations, and focusing on the impact that the exchanges among nodes have on their respective queue lengths, we arrive at the following formulation:

$$\dot{q}_i(t) = \lambda_i(t) - \mu_i(t) + \sum_{j \in N_i} u_{ij}(t) \qquad (4)$$

In (4), $N_i$ denotes the set of neighbors of node $i$, and $u_{ij}(t)$ takes into account the portion of requests injected from node $i$ into node $j$ if it is negative, while it accounts for the portion of requests injected into node $i$ from node $j$ if it is positive. In fact, the strategy we devised balances the exceeding local load proportionally to the load of the other nodes. In order to take the balancing strategy into account in the fluid flow model, we introduce the following balancing law:

$$u_{ij}(t) = k_{ij}(t)\,[q_j(t) - q_i(t)] \qquad (5)$$

where $k_{ij}(t)$ represents a nonnegative gain. In doing so, the amount of requests redistributed from (resp. to) server $i$ to (resp. from) neighbor $j$ with $q_j(t) < q_i(t)$ (resp. $q_j(t) > q_i(t)$) at time $t$ is proportional to the difference between their respective queue occupancies: The greater the difference, the greater the portion of requests assigned to the receiving server. By using (5), (4) in compact form becomes

$$\dot{q}(t) = \lambda(t) - \mu(t) - K(t)\,q(t) \qquad (6)$$


where $q(t) = [q_1(t), \dots, q_N(t)]^T$, $\lambda(t)$ and $\mu(t)$ are the corresponding vectors of arrival and service rates, and $K(t) = [K_{ij}(t)]$ is the matrix with entries $K_{ij}(t) = -k_{ij}(t)$ for $j \in N_i$, $K_{ij}(t) = 0$ for $j \notin N_i$ with $j \neq i$, and $K_{ii}(t) = \sum_{j \in N_i} k_{ij}(t)$.

The above expression represents our model for a balanced CDN system. If we suppose lossless communications among network nodes, the portion of requests redirected by node $i$ to node $j$ is equal to the portion of requests received by node $j$ from node $i$:

$$u_{ij}(t) = -u_{ji}(t) \qquad (7)$$

Indeed, if server $i$ is sending a portion of its requests to server $j$ (i.e., $q_j(t) < q_i(t)$ and so $u_{ij}(t) < 0$), then the term $u_{ji}(t) > 0$. We observe that coefficient $k_{ij}(t)$ in (5) is symmetric with respect to the indexes $i$ and $j$. From (7), it follows that

$$k_{ij}(t) = k_{ji}(t) \qquad (8)$$

and so

$$K_{ij}(t) = K_{ji}(t), \qquad \sum_{j=1}^{N} K_{ij}(t) = 0 \qquad (9)$$

From this, it follows that matrix $K(t)$ is a row-sum-zero symmetric Laplacian matrix with nonpositive off-diagonal elements. In what follows, we will show that the network of servers at the equilibrium can be balanced by the law expressed in (5).

Lemma 3.1: If the network of servers under the load-balancing law (5) is strongly connected for all $t$, then the network queues equilibrium corresponds to the state of balanced queues.

Proof: Since the network under the law (5) is strongly connected and $K(t)$ satisfies condition (9), the matrix $K(t)$ instantaneously satisfies the properties of a weighted Laplacian matrix, i.e., it is a symmetric and irreducible positive semidefinite matrix with a simple eigenvalue at zero [27]. Therefore, any equilibrium manifold of the system (6) is a right eigenvector of $K(t)$ associated with the zero eigenvalue, with all equal components $q_1 = q_2 = \dots = q_N = \bar{q}$, with $\bar{q} \ge 0$. Thus, the equilibrium of the network under the balancing law (5) corresponds to the condition of balanced queue lengths (i.e., $q_i = q_j$ for all $i, j = 1, \dots, N$).

Note that the above result is quite general, in the sense that every balancing law of the form (5), with gain factor satisfying (9), that preserves network connectivity guarantees that all the queues are balanced at the equilibrium. In Section IV, we will present a possible choice of a balancing law.

IV. DISTRIBUTED LOAD-BALANCING ALGORITHM

In this section, we derive a new distributed algorithm for request balancing that exploits the results presented in Section III. First of all, we observe that it is a hard task to define a strategy in a real CDN environment that is completely compliant with the proposed model. As a first consideration, such a model deals with continuous-time systems, which is not exactly the


case in a real packet network, where the processing of arriving requests is not continuous over time. For this reason, in the remainder of this section we focus on the control law described by (5). The objective is to derive an algorithm that presents the main features of the proposed load-balancing law and arrives at the same results in terms of system equilibrium through proper balancing of the servers' loads, as assessed by Lemma 3.1. Equation (4) can be rewritten by considering separately the term associated with the incoming requests at node $i$ and the term associated with the requests redirected from node $i$ to its neighbors:

$$\dot{q}_i(t) = \lambda_i(t) - \mu_i(t) + \sum_{j \in \overline{N}_i} u_{ij}(t) + \sum_{j \in \underline{N}_i} u_{ij}(t) \qquad (10)$$

where $\underline{N}_i$ and $\overline{N}_i$ are the sets of neighbors of node $i$ whose queue loads are, respectively, lower and greater than the queue load at node $i$. Clearly, since we want to define a local strategy for balancing, each node can act exclusively on the amount of requests redirected to its neighbors, without controlling the incoming requests. For this reason, we will disregard the amount of requests redirected by the neighbors. This does not violate the control law, since the incoming requests are provided by the neighbors, all of which use the same balancing algorithm. In this way, we act on the term

$$u_i(t) = \sum_{j \in \underline{N}_i} u_{ij}(t) \qquad (11)$$

and from (5)

$$u_{ij}(t) = k_{ij}(t)\,[q_j(t) - q_i(t)], \quad j \in \underline{N}_i \qquad (12)$$

where $k_{ij}(t)$ is the gain to be designed. Let $T_i(t)$ be the amount of requests to be redirected from node $i$ to its neighbors at time instant $t$. From (12), we can design the gain as follows:

$$k_{ij}(t) = \frac{T_i(t)}{\sum_{l \in \underline{N}_i} [q_i(t) - q_l(t)]} \qquad (13)$$

in order to divide the load $T_i(t)$ into the output flows $u_{ij}(t)$ with $j \in \underline{N}_i$. Such a coefficient is symmetric with respect to the indexes $i$ and $j$, since the amount of traffic redirected from node $i$ to node $j$ is equal to the traffic received at node $j$ from node $i$ if no requests are lost during the redirection process. Formula (5) presents some feasibility problems in a real environment, since it requires an instantaneous update of the neighbors' loads $q_j(t)$. Furthermore, we have supposed a continuous fluid queue model for requests arriving at and leaving the server. Actually, this approximation cannot be exploited in a real scenario: The requests arrive at and leave the server at discrete times (Fig. 4); hence, in a given time interval, a discrete number of requests arrives at and departs from each server in the system.
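Under the gain choice (13), the load redirected by a node is split among its less-loaded neighbors proportionally to the queue differences. A minimal Python sketch of this deterministic split (helper name and interface are ours, for illustration only):

```python
def split_load(total, q_local, q_neighbors):
    """Divide `total` redirected requests among neighbors proportionally to
    q_local - q_j, restricted to neighbors with q_j < q_local (cf. (13))."""
    diffs = {j: q_local - q for j, q in q_neighbors.items() if q < q_local}
    denom = sum(diffs.values())
    if denom == 0:
        return {}          # no less-loaded neighbor: keep everything local
    return {j: total * d / denom for j, d in diffs.items()}

# A node with queue length 10 redirects 6 requests; neighbors hold 4, 7, 12.
shares = split_load(6, 10, {"a": 4, "b": 7, "c": 12})
# differences: a -> 6, b -> 3 (c is more loaded and is excluded);
# denominator 9, so a receives 4.0 requests and b receives 2.0
```

The shares always sum to the redirected load, and the neighbor with the emptiest queue receives the largest fraction, as intended by (13).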


in the interval is available, which prevents distributing them in accordance with the balancing formula (14). Alternatively, we propose to assign a new request to a neighbor based on a probability that is proportional to the load difference. More precisely, a new request is assigned from node to node with probability equal to

Fig. 4. Discrete time notation of arriving requests.

In order to get around such limits, we introduce a new timediscrete version of (5) with given by (13). Let us consider a sample period and assume to be the number of requests arriving in the interval at node . The algorithm redirects an amount of such requests to neighbor based on the following formula:

(15) This approach does not alter the principle of the mechanism proposed by (14), i.e., to assign requests in a way that is proportional to the load differences. Indeed, we consider the random variable with Bernoulli distribution if request is assigned from node to node otherwise

(14) which still satisfies (5). Indeed, every seconds, the algorithm assigns to neighbor an amount of all the requests arrived in interval proportional to the difference of their request loads. Naturally, for approaching zero, we again obtain the timecontinuous algorithm. A. Implementation Issues The balancing mechanism described by formula (14) still presents several implementation challenges, which we will briefly discuss in the following. The proposed algorithm represents a discrete synchronous system that every seconds redistributes all requests arrived at a server during the previous interval, based on the current state of the neighbor’s queues. Naturally, this mechanism introduces two main feasibility issues: 1) a buffer is needed to store the incoming requests during the interval ; 2) a delay in the server replies is experienced by clients due both to the requests buffering and to the status update process at time just before the redistribution. The latter problem can be easily solved by adopting the status information at time rather than at time : If the interval approaches zero, such a solution is irrelevant since no requests are redistributed at each node before the end of the interval. The buffering problem, instead, can severely affect the overall performance of the system. Based on the update interval , a client can experience a long response delay that can make the balancing strategy itself fruitless. In order to overcome such an issue, we switch from a synchronous to an asynchronous mechanism: Instead of buffering all requests and redistributing them at regular intervals, we suppose that the system reacts as soon as a request arrives. At the beginning of each interval, indeed, the system updates the neighbors’ status, and by means of such information, it redistributes the upcoming requests based on the respective load differences. 
In order to be completely compliant with the model, every request should be split among all the neighbors in a way that is proportional to the difference . On the other hand, no a priori knowledge about the total number of requests arriving

(16) with success and unsuccess probability, respectively, and , and expected value . (for ) the result of the If we indicate with th experiment, the relative frequency associated with request redirections from node to node is (17) where represents the number of experiments in a time interval of seconds and in our case corresponds to the total number of requests arriving at node . Based on the law of converges to the expected value as aplarge numbers, proaches infinity, which is exactly the fraction of requests to be redirected to node according to formula (14). In Section IV-B, we will provide a detailed description of the implemented algorithm based on the above assumptions. B. Algorithm Description The implemented algorithm consists of two independent parts: a procedure that is in charge of updating the status of the neighbors’ load, and a mechanism representing the core of the algorithm, which is in charge of distributing requests to a node’s neighbors based on (15). In Fig. 5, the pseudocode of the algorithm is reported. Even though the communication protocol used for status information exchange is fundamental for the balancing process, in this paper we will not focus on it. Indeed, for our simulation tests, we implemented a specific mechanism: We extended the HTTP protocol with a new message, called CDN, which is periodically exchanged among neighboring peers to carry information about the current load status of the sending node. Naturally, a common update interval should be adopted to guarantee synchronization among all interacting peers. For this purpose, a number of alternative solutions can be put into place, which are nonetheless out of the scope of the present work. Every seconds, the server sends its status information to its neighbors and, at the same time, waits for their information. After a well-defined interval, the server launches the status update process. 
Fig. 5. Pseudocode description of the proposed algorithm.

Fig. 6. Probability space for requests assignment.

We suppose that all the information about the peers' load is already available during such a process. Actually, status data inconsistencies among peers might occur, due to delays in the status exchange process. In order to get around this issue, we consider the most recent peer information available for the balancing process; although this might affect the performance of the algorithm, such effects can be mitigated by reducing the update interval.

The status update process analyzes all the information provided by the peers: for each peer whose load is lower than the current local load, the load difference is used to set up the probability space by means of (15). In particular, at every update interval, we build a new vector

(19)

which is exploited by the balancing process and whose representation is sketched in Fig. 6.

Any time a new request arrives at a server, the server verifies the presence of neighbors with a lower load. If no such neighbor exists, the server processes and serves the request locally; otherwise, the balancing strategy is adopted. For request redistribution, we adopt a random number generator with uniform distribution between 0 and 1. Depending on which interval the generated number falls in, the algorithm selects the corresponding peer for redirecting the incoming request. For example, with reference to Fig. 6, if the random number falls in the first interval, the request is redirected to peer 1. By adopting such a mechanism, we ensure that the probability of selecting a peer is in accordance with (15): considering a random variable with uniform probability density function, the probability at a given time that the algorithm selects the i-th peer is

(18)

which exactly corresponds to the probability in (15).

Fig. 7. Simulation topology.
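The interval-based selection mechanism of Fig. 6 can be sketched in a few lines. This is an illustrative reading of (15), assuming each less-loaded neighbor receives a probability proportional to its load difference; the exact normalization used in the paper may differ, and the function names are ours.

```python
import bisect
import random

def build_intervals(local_load, neighbor_loads):
    """Build the cumulative probability intervals of Fig. 6.
    Each neighbor with load lower than the local one gets a share
    proportional to the load difference (sketch of (15))."""
    diffs = [max(local_load - l, 0.0) for l in neighbor_loads]
    total = sum(diffs)
    if total == 0:
        return None  # no less-loaded neighbor: serve locally
    bounds, acc = [], 0.0
    for d in diffs:
        acc += d / total
        bounds.append(acc)
    return bounds

def select_peer(bounds, rng=random):
    """Map a uniform random number in [0, 1) to the interval it
    falls in, returning the index of the selected peer."""
    return bisect.bisect_right(bounds, rng.random())

bounds = build_intervals(10.0, [4.0, 7.0, 12.0])
print(bounds)  # cumulative upper bounds; the last entry is 1.0
```

Note that a neighbor with a higher load than the local one (here the third, with load 12.0) gets a zero-width interval and is therefore never selected.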

V. SYSTEM EVALUATION

The effectiveness of our algorithm is evaluated through a simulation-based comparison with the most relevant existing techniques (both static and dynamic). We provide extensive simulation tests using the ns2 network simulator.2 Since no suitable tool for CDN simulation is provided with the standard simulator package, we introduced a new library to support such a scenario [28].

A. Balancing Performance

The simulations for the comparative analysis have been carried out using the network topology of Fig. 7. We assume 10 servers connected in the overlay, as well as 10 clients, each connected to a single server. We model each server as an M/M/1 queue with a given service rate, and the request generation from each client as a Poisson process with a given arrival rate.

2http://www.isi.edu/nsnam/ns/
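The M/M/1 server model used in the evaluation can be sketched with a standard discrete-event loop: Poisson arrivals (exponential inter-arrival times) and exponential service times. The parameter values below are illustrative, not the ones of Table I.

```python
import random

def mm1_queue_lengths(lam, mu, horizon, seed=1):
    """Minimal single-server M/M/1 simulation: Poisson arrivals at
    rate lam, exponential service at rate mu. Returns (time, queue
    length) sampled at each event. The paper's topology connects
    several such servers in an overlay; this is one server only."""
    rng = random.Random(seed)
    t, q = 0.0, 0
    next_arrival = rng.expovariate(lam)
    next_departure = float("inf")
    samples = []
    while t < horizon:
        if next_arrival <= next_departure:
            t = next_arrival
            q += 1
            if q == 1:  # server was idle: schedule a completion
                next_departure = t + rng.expovariate(mu)
            next_arrival = t + rng.expovariate(lam)
        else:
            t = next_departure
            q -= 1
            next_departure = (t + rng.expovariate(mu)) if q > 0 else float("inf")
        samples.append((t, q))
    return samples

samples = mm1_queue_lengths(lam=0.8, mu=1.0, horizon=1000.0)
```

With load 0.8, the queue stays finite but exhibits the sizeable fluctuations the balancing law is meant to smooth out across servers.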

Fig. 8. Queue length behavior (update interval of 0.5 s). (a) RAND. (b) RR. (c) LL. (d) 2RC. (e) CLB.

TABLE I
SERVERS PARAMETERS

In order to correctly evaluate the capability of the algorithms to balance the load among the servers, we assume different initial conditions for each of them by setting different initial queue lengths. Furthermore, we consider the critical scenario of a saturated network, characterized by

(20)

Such an assumption allows us to test the balancing algorithms under a critical network condition. In Table I, we report the values used in the simulations. Moreover, in a scenario characterized by initial load conditions similar to those indicated in Table I, and with given arrival and service rates (in req/s), we also simulate a flash-crowd phenomenon by increasing the arrival rate to 200 req/s within a given time interval.

We observe that the above scenarios are simplified deployments when compared to a realistic CDN topology; in this section, however, we only aim at a qualitative evaluation of the proposed solution with respect to the existing algorithms. In Section V-B, we will show that the results achieved here extend to larger-scale topologies, thanks to the high scalability of our solution.

We implemented both the Random (RAND) and the Round-Robin (RR) static algorithms, as well as the Least Loaded (LL)

and the Two Random Choices (2RC) dynamic algorithms, for comparison with our solution [Control-Law Balancing (CLB)]. For each algorithm, we first evaluated each server's queue length behavior over time, together with the average value across all servers; this parameter is an excellent indicator of the request distribution achieved by the CDN. Another important parameter is the Response Time (RT), which measures the efficiency of the algorithm in terms of end-user satisfaction; for it, we evaluated both the average value and the standard deviation. We also introduce an Unbalancing Index to estimate the capability of the algorithms to effectively balance requests among the available servers. This index is computed as the standard deviation of the queue lengths of all the servers over time; clearly, the lower its value, the better the balancing result. Finally, since some of the analyzed mechanisms allow multiple redirections, we also considered a parameter associated with the communication overhead due to redirections, computed as the ratio of the number of redirected requests to the overall number of requests injected into the system.

Fig. 8 shows the simulation results related to the profiles of each server's queue length, with an update interval equal to 0.5 s. As expected, the static mechanisms provide worse performance, since the servers' queue lengths exhibit unpredictable behaviors due to the lack of knowledge about the real status of the server loads. The dynamic mechanisms behave better and, in particular, our solution clearly achieves the best performance, since it limits both the number of enqueued requests and their oscillations over time, thus reducing the impact on delay jitter. This confirms the effectiveness of the proposed mechanism, as well as its capability to fairly distribute load among the servers.
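The two metrics just defined can be computed directly from the simulation traces; the sketch below uses function names of our choosing, with per-snapshot queue lengths as input.

```python
import statistics

def unbalancing_index(queue_samples):
    """Unbalancing Index as defined above: the standard deviation
    of the servers' queue lengths, averaged over time.
    queue_samples: one snapshot per sampling instant, each snapshot
    listing one queue length per server."""
    return statistics.mean(
        statistics.pstdev(snapshot) for snapshot in queue_samples
    )

def redirection_overhead(redirected, total):
    """Overhead as the ratio of redirected requests to the overall
    number of requests injected into the system."""
    return redirected / total

# A perfectly balanced snapshot contributes zero to the index.
print(unbalancing_index([[5, 5, 5], [2, 4, 6]]))
```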

Fig. 9. Queue length behavior (update interval of 1 s). (a) LL. (b) 2RC. (c) CLB.

Fig. 10. Queue length behavior in the presence of a flash crowd. (a) LL. (b) 2RC. (c) CLB.

TABLE II
PERFORMANCE EVALUATION (UPDATE INTERVAL OF 0.5 s)

TABLE III
PERFORMANCE EVALUATION (UPDATE INTERVAL OF 1 s)

TABLE IV
PERFORMANCE EVALUATION WITH FLASH CROWD

The quality of our solution can be further appreciated by analyzing the performance parameters reported in Table II. As the table shows, our average queue length is the lowest among the analyzed algorithms. The proposed mechanism also exhibits an excellent average Response Time, comparable only to the value obtained by the 2RC algorithm, together with a fairly low standard deviation. The excellent performance of our mechanism might be paid for in terms of a significant number of redirections. Since the redirection process is common to all the analyzed algorithms, we exclusively evaluate the percentage of requests redirected more than once over the total number of requests generated. Through simulation, we verified that multiple redirections happen for about 33% of the total requests served by the CDN in a normal operational scenario. This can clearly be mitigated by limiting the number of redirections for each request; in Section VI-B, we show that the multiple-redirection phenomenon does not significantly affect the actual performance of our algorithm.

In Fig. 9, we report the queue dynamics for an update interval of 1 s. We exclusively considered the dynamic algorithms, since the variation of the update interval does not affect the overall


performance of the static mechanisms. Also in this case, our solution shows excellent results when compared with the other proposals. This is confirmed by the values in Table III, where we again observe the advantages deriving from the adoption of our algorithm.

Coming to the flash-crowd scenario, the simulations once again demonstrate that our solution outperforms the analyzed competitors in terms of both response time and average queue length, as can be appreciated by looking at the data reported in Table IV. Even more interestingly, Fig. 10 clearly shows that CLB is the best algorithm in terms of its capability to recover from the overload caused by the excess traffic generated during the flash-crowd interval. In fact, the 2RC algorithm hardly succeeds in re-attaining the steady-state condition that held prior to the flash crowd. On the other hand, the LL and CLB approaches both react quite effectively to the transient abnormal conditions by quickly bringing queue occupancies back to their steady-state levels. CLB, however, achieves this with a fairer balancing among the available servers, as further confirmed by the analysis of the unbalancing index in Table V, where we report the values of the index for both the normal and the flash-crowd scenarios. We point out once again the low degree of unbalancing exhibited by our solution with respect to the evaluated counterparts; such a result confirms that the algorithm provides an effective balancing mechanism.

TABLE V
UNBALANCING INDEX

TABLE VI
NETWORK DEGREE

Fig. 11. Unbalancing index behavior.

A thorough analysis of the unbalancing index as a function of the update interval is carried out in Fig. 11, which reports the results achieved with the 10-node topology of Fig. 7 when employing either the 2RC, the LL, or our CLB algorithm. We observe that the performance of CLB is definitely better than that of 2RC, and always better than that of LL, which nonetheless represents the best-performing competitor. As a final consideration, we point out that our solution also outperforms LL with respect to queue stabilization, thus reducing delay jitter in the CDN, as already remarked when commenting on the results in Figs. 8 and 9. The selection of an appropriate update interval can then be made by considering the trend of the unbalancing index over time.

B. Scalability Analysis

Before presenting the testing results, we briefly discuss the scalability properties of the algorithm in terms of the overhead introduced by the status update process. By adopting a local data exchange, we can considerably reduce the amount of overhead traffic produced with respect to a solution requiring data exchange across the whole network. Considering a fixed packet size, the amount of data exchanged by the implemented algorithm exclusively depends on the average number of neighbors of each server, which can be estimated through the "degree" of the network [29]

k = 2E/N    (21)

where E is the number of links connecting the servers and N is the total number of servers. For example, in Table VI, we observe a constant degree for ring topologies and an asymptotic behavior for the chain. We remark that the average control data rate can be analytically estimated through (21), assuming E to be the number of logical links connecting the cooperating servers. In the case of the proposed algorithm, the relevant quantity is the average degree of the network (corresponding to the average number of logical neighbors), which in a realistic network topology scales as a power law, with values between 3 and 4 [30], [31]. On the other side, the global balancing approach, which requires the exchange of status information with all the nodes, is logically assimilable to a full-mesh topology; in this scenario, the worst case occurs, with the average number of logical neighbors linearly dependent on the number of nodes N, i.e., equal to N - 1. In any case, even if more sophisticated mechanisms, such as flooding, might be implemented for the global data dissemination process, the total number of packets exchanged, as well as their size, increases as the network grows. These considerations show the advantage of a local information exchange over providing all nodes with status information.

In order to confirm these theoretical results, we generated several scale-free network topologies by using the BRITE topology generator. We evaluated by simulation the scalability of our solution by adopting the Barabási–Albert model [32] to generate topologies with an increasing number of nodes (from 5 to 25 nodes, with a step of 5 nodes between subsequent topologies). Furthermore, we considered several update interval values (in seconds), and made sure that the traffic parameters varied according to (20). First, we evaluated the rate of control packets at every node due to the server status update process.
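Formula (21) and the trends reported in Table VI can be checked numerically; the helper below is a sketch of ours, with E the number of links and N the number of servers.

```python
def average_degree(num_links, num_servers):
    """Average network degree from (21): k = 2E/N."""
    return 2 * num_links / num_servers

# Ring of N nodes: E = N, so k = 2 for any N (a constant row in
# Table VI); chain: E = N - 1, so k tends to 2 asymptotically;
# full mesh (global status exchange): E = N(N-1)/2, so k = N - 1,
# i.e., control overhead grows linearly with the network size.
for n in (5, 10, 25):
    ring = average_degree(n, n)
    chain = average_degree(n - 1, n)
    mesh = average_degree(n * (n - 1) // 2, n)
    print(n, ring, chain, mesh)
```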
Clearly, such a rate decreases as the update interval increases. In Fig. 12, we report the average control rate corresponding to different network sizes, together with the expected value based on (21) for each generated topology.

Fig. 12. Control data rate.

We observe a limited increase in the rate, for each update interval, as the number of nodes increases. In particular, for one of the considered update intervals the results exactly match the value estimated by formula (21). Such results confirm that the traffic due to algorithm control data has no strong dependence on the size of the network.

Furthermore, the capability of our solution to scale properly is also evaluated by analyzing the impact of an increasing request load on the CDN in terms of response time, which, as already noted, represents a very good measure of the Quality of Experience of the CDN users. In particular, we progressively increased the request rate while maintaining a fixed service rate at all servers in the network, also considering increasing network topology sizes and starting from given initial request and service rates (in req/s). The simulations consider update intervals of 0.5 and 2.0 s, respectively. The results are depicted in Fig. 13, showing the Response Time as a function of the arrival rate, which has been properly scaled in subsequent simulations. In both graphs, we observe a very limited increase in the average response time for each request load value, except when the request rate approaches the service rate, thus bringing the overall system towards instability. We also observe that the response time is unaffected by increases in network size, which confirms the soundness of our solution.

Finally, in Fig. 14, we present an analysis of the Response Time as a function of a Scale Factor, obtained by scaling with equal factors both the arrival rate (whose starting value has been set to 0.8 req/s) and the network size in terms of number of nodes (whose initial value has instead been set to 5 nodes).
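A back-of-the-envelope check of why this scaling works, assuming each server behaves as an independent M/M/1 queue once load is balanced; the closed-form response time below is the standard M/M/1 result, not a formula from the paper.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue: W = 1 / (mu - lambda)."""
    assert arrival_rate < service_rate, "queue is unstable"
    return 1.0 / (service_rate - arrival_rate)

# Scaling the total arrival rate together with the number of servers
# keeps the per-server load constant, hence the response time too.
mu = 1.0
for scale in (1, 2, 4):
    n_servers = 5 * scale
    total_rate = 0.8 * 5 * scale  # 0.8 req/s per server at any scale
    per_server = total_rate / n_servers
    print(n_servers, mm1_response_time(per_server, mu))
```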
The figure shows that if we keep pace with the increase of the arrival rate by properly scaling the number of nodes in the CDN, the average Response Time remains almost constant, which is by definition a scalability property of the overall system designed and presented in this paper.

VI. DISCUSSION ON POTENTIAL TUNING STRATEGIES

A. Effects of a Queue Threshold on Algorithm Performance

The algorithm we devised tends to balance load in the CDN independently of whether a specific server is overloaded at a certain point in time. Simulation results have shown that its response time figures always outperform the other

Fig. 13. Scalability with respect to load. (a)

s (b)

s.

Fig. 14. Overall system scalability.

algorithms we analyzed. Nonetheless, with our approach, as long as a server has neighbors with a lower load, incoming requests are redirected to them even when the server itself is underloaded. Redirections can therefore happen very frequently, which might have an impact on response time. We hence decided to evaluate the possibility of striking a better balance between equalizing queue occupancies at the servers, on the one hand, and reducing the number of redirections, on the other. With this aim in mind, we configured our simulator to impose a lower limit on the queue length, below which no redirection mechanism is applied. With this configuration in place, we ran a whole new set of simulations and derived the main performance figures. Results are shown in Fig. 15 for the Response Time and in Fig. 16 for the Unbalancing Index. The figures report three


Fig. 15. Response time as a function of the lower queue threshold.


Fig. 17. Quota of requests receiving a specified number of redirections.

TABLE VII REDIRECTION THRESHOLD ANALYSIS

Fig. 16. Unbalancing index as a function of the lower queue threshold.

different load conditions, associated respectively with an almost unloaded network (where the overall arrival rate is 33% of the total service rate of the CDN), a medium-load scenario (66% of the total service rate), and a heavily loaded network (saturation). As the graphs show, our algorithm (which corresponds to a queue threshold value of 0) always outperforms the alternative strategies that suppress redirections when the server's load is below the specified threshold. This holds both for the Response Time and for the Unbalancing Index. While the latter result is obvious (since always redirecting requests guarantees a balanced queue load distribution), the former is less intuitive. It is mainly due to the fact that the increased number of redirections we incur is repaid by a lower waiting time in the servers' queues (whose lengths, as the simulations show, are always shorter than in the other threshold configurations).
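The lower-threshold strategy evaluated here can be sketched as a simple gating rule; the function name and signature below are ours, for illustration only.

```python
def should_redirect(local_queue, neighbor_queues, queue_threshold=0):
    """Skip the balancing step entirely while the local queue is
    below a lower threshold; otherwise redirect only if some
    neighbor is less loaded. A threshold of 0 reproduces the
    native algorithm discussed in the paper."""
    if local_queue <= queue_threshold:
        return False  # underloaded: serve locally, no redirection cost
    return any(q < local_queue for q in neighbor_queues)

print(should_redirect(3, [1, 5], queue_threshold=5))  # below threshold
print(should_redirect(8, [1, 5], queue_threshold=5))  # balancing applies
```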

B. Imposing an Upper Bound on the Number of Redirections

As already stated, a potential drawback of our algorithm is the overhead due to the potentially significant number of redirections. In order to analyze the impact of this redirection overhead on performance, we first performed a deeper analysis of the request redirection frequency in the scenario associated with our simulations. The results of such analysis are shown in Fig. 17.

As the figure clearly indicates, more than 95% of the requests receive fewer than eight redirections. We then carried out a whole new set of simulations after introducing the possibility to explicitly impose a limit on the overall number of redirections that each server can perform. Based on the above consideration about the request redirection frequency, we expect a redirection threshold above the detected bound of 8 to prove almost useless in the scenario analyzed in this paper. The results of this new experimental campaign are presented in Table VII, which shows that the upper bound on the number of redirections leads to performance figures slightly worse than those achieved by our native algorithm. Once again, this is because, on average (and for each request), the reduced redirection time is paid for with an increased waiting time inside the (more unbalanced) server queues.

C. Settling Time Analysis

In this part of the paper, we evaluate the time it takes before our algorithm successfully balances all server queues. First, we define the following Convergence Index parameter:

which provides an indication of the queue errors (with respect to the final equilibrium point) over time. We then introduce the Settling Time as the time elapsing before the network reaches the balanced equilibrium condition, i.e., the time needed for the following condition to be met, where the threshold involved is defined as follows:


TABLE VIII SETTLING TIME

Fig. 18. Dynamic evolution of the Convergence Index in a saturated scenario.

and represents a properly set threshold that takes into account the intrinsic oscillations of the queues once they have reached the steady-state condition. In the equation above, one parameter represents the maximum number of requests arriving in a single sampling interval, and the other is the chosen sampling period. Given the definitions above, we report in Table VIII the results of an extensive set of simulations aimed at evaluating the settling time under different configuration scenarios (in terms of queue occupancies at time 0) and different load conditions (as before, 33% of the overall capacity, 66% of the overall capacity, and saturation). In the table, we set the initial conditions as specified. As the data show, the settling time depends only on the initial conditions and is always contained within a very short time frame. Interestingly enough, this parameter decreases as the overall network load increases. This can be explained by the fact that, under heavy load conditions, the flow of requests almost immediately fills up the server queues, which quickly reach the balanced equilibrium condition. To confirm this, we report in Fig. 18 the temporal behavior of the Convergence Index for the case of a saturated network.

VII. CONCLUSION AND FUTURE WORK

In this paper, we presented a novel load-balancing law for cooperative CDN networks. We first defined a model of such networks based on a fluid-flow characterization. We hence moved

to the definition of an algorithm that aims at achieving load balancing in the network by removing local queue instability conditions through the redistribution of potential excess traffic to the neighbors of the congested server. The algorithm is first introduced in its time-continuous formulation and then recast in a discrete version specifically conceived for actual implementation and deployment in an operational scenario. Through simulations, we demonstrated both the scalability and the effectiveness of our proposal, which outperforms most of the alternatives that have been proposed in the past.

The present work represents a first step toward the realization of a complete solution for load balancing in a cooperative, distributed environment. Our future work will be devoted to the actual implementation of our solution in a real system, so as to arrive at a first prototype of a load-balanced, cooperative CDN network, to be used both as a proof of concept of the results obtained through simulations and as a playground for further research in the more general field of content-centric network management.

REFERENCES

[1] S. Manfredi, F. Oliviero, and S. P. Romano, "Distributed management for load balancing in content delivery networks," in Proc. IEEE GLOBECOM Workshop, Miami, FL, Dec. 2010, pp. 579–583.
[2] H. Yin, X. Liu, G. Min, and C. Lin, "Content delivery networks: A bridge between emerging applications and future IP networks," IEEE Netw., vol. 24, no. 4, pp. 52–56, Jul.–Aug. 2010.
[3] J. D. Pineda and C. P. Salvador, "On using content delivery networks to improve MOG performance," Int. J. Adv. Media Commun., vol. 4, no. 2, pp. 182–201, Mar. 2010.
[4] D. D. Sorte, M. Femminella, A. Parisi, and G. Reali, "Network delivery of live events in a digital cinema scenario," in Proc. ONDM, Mar. 2008, pp. 1–6.
[5] Akamai, "Akamai," 2011 [Online]. Available: http://www.akamai.com/index.html
[6] Limelight Networks, "Limelight Networks," 2011 [Online]. Available: http://uk.llnw.com
[7] CDNetworks, "CDNetworks," 2011 [Online]. Available: http://www.us.cdnetworks.com/index.php
[8] Coral, "The Coral Content Distribution Network," 2004 [Online]. Available: http://www.coralcdn.org
[9] Network Systems Group, "Projects," Princeton University, Princeton, NJ, 2008 [Online]. Available: http://nsg.cs.princeton.edu/projects
[10] A. Barbir, B. Cain, and R. Nair, "Known content network (CN) request-routing mechanisms," IETF, RFC 3568, Jul. 2003 [Online]. Available: http://tools.ietf.org/html/rfc3568
[11] T. Brisco, "DNS support for load balancing," IETF, RFC 1794, Apr. 1995 [Online]. Available: http://www.faqs.org/rfcs/rfc1794.html
[12] M. Colajanni, P. S. Yu, and D. M. Dias, "Analysis of task assignment policies in scalable distributed Web-server systems," IEEE Trans. Parallel Distrib. Syst., vol. 9, no. 6, pp. 585–600, Jun. 1998.
[13] D. M. Dias, W. Kish, R. Mukherjee, and R. Tewari, "A scalable and highly available Web server," in Proc. IEEE Comput. Conf., Feb. 1996, pp. 85–92.


[14] C. V. Hollot, V. Misra, D. Towsley, and W. Gong, "Analysis and design of controllers for AQM routers supporting TCP flows," IEEE Trans. Autom. Control, vol. 47, no. 6, pp. 945–959, Jun. 2002.
[15] C. V. Hollot, V. Misra, D. Towsley, and W. Gong, "A control theoretic analysis of RED," in Proc. IEEE INFOCOM, 2001, pp. 1510–1519.
[16] J. Aweya, M. Ouellette, and D. Y. Montuno, "A control theoretic approach to active queue management," Comput. Netw., vol. 36, no. 2–3, pp. 203–235, Jul. 2001.
[17] F. Blanchini, R. L. Cigno, and R. Tempo, "Robust rate control for integrated services packet networks," IEEE/ACM Trans. Netw., vol. 10, no. 5, pp. 644–652, Oct. 2002.
[18] V. Misra, W. Gong, and D. Towsley, "Fluid-based analysis of a network of AQM routers supporting TCP flows with an application to RED," in Proc. ACM SIGCOMM, 2000, pp. 151–160.
[19] D. Cavendish, M. Gerla, and S. Mascolo, "A control theoretical approach to congestion control in packet networks," IEEE/ACM Trans. Netw., vol. 12, no. 5, pp. 893–906, Oct. 2004.
[20] V. Cardellini, E. Casalicchio, M. Colajanni, and P. S. Yu, "The state of the art in locally distributed Web-server systems," ACM Comput. Surveys, vol. 34, no. 2, pp. 263–311, Jun. 2002.
[21] Z. Zeng and B. Veeravalli, "Design and performance evaluation of queue-and-rate-adjustment dynamic load balancing policies for distributed networks," IEEE Trans. Comput., vol. 55, no. 11, pp. 1410–1422, Nov. 2006.
[22] V. Cardellini, M. Colajanni, and P. S. Yu, "Request redirection algorithms for distributed Web systems," IEEE Trans. Parallel Distrib. Syst., vol. 14, no. 4, pp. 355–368, Apr. 2003.
[23] M. Dahlin, "Interpreting stale load information," IEEE Trans. Parallel Distrib. Syst., vol. 11, no. 10, pp. 1033–1047, Oct. 2000.
[24] R. L. Carter and M. E. Crovella, "Server selection using dynamic path characterization in wide-area networks," in Proc. IEEE INFOCOM, Apr. 1997, vol. 3, pp. 1014–1021.
[25] M. D. Mitzenmacher, "The power of two choices in randomized load balancing," IEEE Trans. Parallel Distrib. Syst., vol. 12, no. 10, pp. 1094–1104, Oct. 2001.
[26] C.-M. Chen, Y. Ling, M. Pang, W. Chen, S. Cai, Y. Suwa, and O. Altintas, "Scalable request routing with next-neighbor load sharing in multi-server environments," in Proc. IEEE Int. Conf. Adv. Inf. Netw. Appl., Mar. 2005, vol. 1, pp. 441–446.
[27] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1995.
[28] F. Cece, V. Formicola, F. Oliviero, and S. P. Romano, "An extended ns-2 for validation of load balancing algorithms in content delivery networks," in Proc. 3rd ICST SIMUTools, Malaga, Spain, Mar. 2010, pp. 32:1–32:6.
[29] P. Erdős and A. Rényi, "On the evolution of random graphs," A Matematikai Kutató Intézet Közleményei, vol. 5, pp. 17–61, 1960.
[30] M. Faloutsos, P. Faloutsos, and C. Faloutsos, "On power-law relationships of the Internet topology," in Proc. ACM SIGCOMM, 1999, pp. 251–262.
[31] L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman, "Search in power-law networks," Phys. Rev. E, vol. 64, p. 046135, 2001.

[32] A.-L. Barabási and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, no. 5439, pp. 509–512, Oct. 1999.

Sabato Manfredi (M’11) received the Ph.D. degree in control engineering from the University of Napoli Federico II, Naples, Italy, in 2005. He is currently an Assistant Professor with the University of Napoli Federico II. His research interests fall in the field of automatic control systems, with special regard to the design and implementation of novel modeling techniques, control and identification of underwater breathing systems, communication networks, and complex systems. Areas of application include congestion control in computer networks, active queue control, and routing. He works on prototyping control strategies through microcontroller-based embedded systems.

Francesco Oliviero (M’09) received the M.Sc. degree in telecommunication engineering and Ph.D. degree in computer engineering from Federico II University of Napoli, Naples, Italy, in 2004 and 2007, respectively. He is an Adjunct Researcher with the Department of Computer Engineering and Systems, Federico II University of Napoli. He is involved in several Italian and European projects, mainly concerning the development of frameworks for security in networked infrastructures. His research interests are in the areas of network security systems, network resources optimization, and green networking.

Simon Pietro Romano (M’05) received the degree in computer engineering and Ph.D. degree in computer networks from the University of Napoli Federico II, Naples, Italy, in 1998 and 2001, respectively. He is currently an Assistant Professor with the University of Napoli Federico II. He is involved in a number of European research projects, dealing with critical infrastructure protection. He actively participates in IETF standardization activities, where he chairs the SPLICES Working Group on loosely coupled SIP devices. His research interests fall in the field of networking, with special regard to real-time multimedia applications, network security, and autonomic network management.