FLEXIBLE DISTRIBUTED CAPACITY ALLOCATION AND LOAD REDIRECT ALGORITHMS FOR CLOUD SYSTEMS Danilo Ardagna, Sara Casolari, Barbara Panicucci
Report n. 2011.3
Danilo Ardagna∗, Sara Casolari∗∗, Barbara Panicucci∗
∗ Politecnico di Milano, Dipartimento di Elettronica e Informazione
∗∗ Università di Modena e Reggio Emilia, Dipartimento di Ingegneria dell'Informazione
Email: {ardagna,panicucci}@elet.polimi.it, sara.casolari@unimore.it
Abstract—In Cloud computing systems, resource management is one of the main issues. Indeed, at any time instant resources have to be allocated to handle workload fluctuations effectively, while providing Quality of Service (QoS) guarantees to the end users. For such systems, workload prediction-based autonomic computing techniques have been developed. In this paper we propose capacity allocation techniques able to coordinate multiple distributed resource controllers working in geographically distributed cloud sites. Furthermore, the capacity allocation solutions are integrated with a load redirection mechanism which forwards incoming requests between different domains. The overall goal is to minimize the cost of the allocated virtual machine instances, while guaranteeing QoS constraints expressed as a threshold on the average response time. We compare multiple heuristics which integrate workload prediction and distributed non-linear optimization techniques. Experimental results show how our solutions significantly improve on other heuristics proposed in the literature (5-35% on average), without introducing significant QoS violations.
Keywords: Infrastructure-as-a-Service Clouds, Performance Modeling and Management, Capacity Allocation, Load Balancing, QoS.

I. INTRODUCTION

Cloud computing is an emerging paradigm that aims at streamlining the on-demand provisioning of software, hardware, and data as services, providing end users with flexible and scalable services accessible through the Internet [15]. Modern cloud infrastructures live in an open world, characterized by continuous changes in the environment and in the requirements they have to meet. Continuous changes occur autonomously and unpredictably, and they are outside the control of the cloud provider. Therefore, in order to provide infrastructure or software as a service, advanced solutions have to be developed that dynamically adapt the cloud infrastructure, while providing continuous service and performance guarantees. In this paper we propose workload prediction-based capacity allocation techniques able to coordinate multiple distributed resource controllers working in geographically distributed cloud sites. We also propose a dynamic load redirection mechanism which makes near-instantaneous and intelligent decisions on the requests that have to be redirected during peak loads from heavily loaded sites to other sites. Request distribution is optimized according to the average response time of incoming requests and the QoS requirements of end users. In cloud systems, centralized approaches to capacity allocation and load balancing have several critical design limitations,
including lack of scalability and high network communication costs (such as network bottleneck congestion) [5], [25]. Centralized solutions are not suitable for geographically distributed systems, such as the cloud or, more in general, massively distributed systems [3], [19], [16], since no entity has global information about all of the system resources. Therefore, efficient decentralized solutions are mandatory. Distributed resource management policies have been proposed to efficiently govern geographically distributed systems that cannot implement centralized decisions and support strong interactions among the remote nodes [3]. Sometimes, local decisions could even lead the system to unstable oscillations [18]. It is, thus, difficult to determine the best control mechanism at each node in isolation, such that the overall system performance is optimized. Dynamically choosing when, where, and how to allocate resources, and coordinating the resource allocation accordingly, is an open problem and is becoming more and more relevant with the advances of clouds [16]. One of the first contributions to resource management in geographically distributed systems has been proposed in [3], where novel autonomic distributed load balancing algorithms were introduced. For distributed streaming networks, the authors in [19] have proposed a joint admission control and resource allocation scheme. In our work, the capacity allocation and load redirect of multiple classes of requests are modeled as non-linear programming problems and solved with decomposition techniques exploiting predictive models of the incoming workload at each physical site. We compare our approach with other heuristics proposed in the literature [13], [27], [26], obtaining 5-35% cost savings without incurring significant Service Level Agreement (SLA) violations.
To the best of our knowledge, this paper is the first contribution proposing an analytical solution to capacity allocation and load redirection for cloud systems. The remainder of the paper is organized as follows. The next Section introduces the problem under study, while Section III describes our main design assumptions. The prediction techniques used in our work are introduced in Section IV. The optimization problem formulation is presented in Section V. The experimental results demonstrating the quality and efficiency of our solutions are reported in Section VI. Conclusions are finally drawn in Section VII.
[Figure 1: An IaaS provider hosts virtualized servers at multiple sites (i = 1, ..., 4). Each site runs a local workload manager and a local CA and LR manager, and is annotated with the local WS arrival rates, the execution rate of local arrivals, and the redirect rate of local arrivals.]

Fig. 1. Cloud System Reference Framework.
II. PROBLEM STATEMENT

In this paper we take the perspective of a Web service provider which offers multiple transactional Web Services (WSs) hosted at multiple sites of an Infrastructure as a Service (IaaS) provider. The hosted WSs represent different applications which can be heterogeneous with respect to resource demands, workload intensities, and QoS requirements. Services with different QoS and workload profiles are categorized into independent WS classes. An SLA contract, associated with each WS class k, is established between the WS provider and its end users. It specifies the QoS levels, expressed in terms of average response time, the WS provider must meet while responding to end users' requests for a given service class. Overall, the system serves a set K of WS classes, and average response time thresholds are denoted by R̄_k. Applications are hosted in virtual machines (VMs) which are provided on demand by the IaaS provider. For the sake of simplicity, we assume that each VM hosts a single Web service application. Multiple VMs implementing the same WS class can run in parallel at each physical location. In that case, we assume that the running VMs are homogeneous in terms of RAM and CPU capacity and evenly share the incoming workload (this corresponds to the solution currently implemented by IaaS providers [2]). Furthermore, services can be located at multiple sites (see Figure 1). For example, if we consider Amazon Inc. with its Elastic Compute Cloud (EC2) [2] as IaaS provider, EC2 allows software providers to dynamically deploy VMs in four regions located around the world, which are further spread over multiple availability zones. IaaS providers usually charge software providers on an hourly basis [2].
Hence, the WS provider has to face the Capacity Allocation (CA) problem, which consists in determining every hour the optimal number of VMs for each WS class in each IaaS site according to the average load predicted on an hourly basis, while guaranteeing SLA constraints. In the following we will denote by T1 the mid-long time scale adopted for VM provisioning. On the other hand, if a site's resources are insufficient (e.g., because of an unpredictable workload fluctuation) and the computing conditions become critical, incoming requests can be redirected to other sites. As in other approaches, dynamic Load Redirection (LR) [5], [27] is performed periodically every T2 ≪ T1
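As an illustration, the two time-scale scheme just described (capacity allocation every T1, load redirection every T2 ≪ T1) can be sketched as a simple control loop. This is a sketch under assumptions only: the function names, the prediction interface, and the period values are illustrative placeholders, not the paper's implementation.

```python
# Illustrative two time-scale control loop: CA runs every T1 (mid-long time
# scale), LR runs every T2 << T1. solve_ca, solve_lr and predict stand in
# for the optimization problems of Section V and the predictor of Section IV.

T1, T2 = 3600, 300   # seconds: 1 hour and 5 minutes, illustrative values

def control_loop(horizon, solve_ca, solve_lr, predict):
    allocation, redirects = None, None
    for t in range(0, horizon, T2):
        if t % T1 == 0:
            # mid-long time scale: re-provision VMs from the hourly forecast
            allocation = solve_ca(predict(t, T1))
        # short time scale: redirect load given the current allocation
        redirects = solve_lr(allocation, predict(t, T2))
    return allocation, redirects
```

Over a two-hour horizon this loop would trigger CA twice and LR 24 times.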
where Λ̂_k^i(T1) is the initial predicted value and 0 < γ_k^i(t) < 1 is the smoothing factor at the current sample t, related to site i and class k, that determines how much weight is given to each sample. We obtain a dynamic ES model by re-evaluating the smoothing factor γ_k^i(t) at each prediction sample t. There are different proposals for the dynamic estimation of γ_k^i(t) (e.g., [24], [17]). Although there is no consensus, a widely used procedure is the one proposed by Trigg and Leach [24]. They define the smoothing parameter as the absolute value of the ratio of the smoothed error A_k^i(t) to the absolute error E_k^i(t), γ_k^i(t) = |A_k^i(t) / E_k^i(t)|. The smoothed and absolute errors are equal to:

A_k^i(t) = φ ε_k^i(t) + (1 − φ) A_k^i(t − T1),
E_k^i(t) = φ |ε_k^i(t)| + (1 − φ) E_k^i(t − T1),
where ε_k^i(t) is the forecast error at sample t, ε_k^i(t) = Λ_k^i(t) − Λ̂_k^i(t), and φ is set arbitrarily, with 0.2 being a common choice [24]. This dynamic choice of γ_k^i(t) should improve the prediction quality and should limit the delay problem affecting the traditional ES model based on a static choice of the γ_k^i parameter. The considered model is expected to be useful in contexts characterized by time series with non-stationary behaviour and a variable noise component. We use an analogous implementation of the ES prediction model to predict the local arrival rate at time granularity T2, denoted by Λ̂̂_k^i. For the sake of simplicity, in the remainder of the paper the t sample index will be omitted.

V. OPTIMIZATION PROBLEM FORMULATION

As discussed in Section II, Capacity Allocation and Load Redirect are performed at different time scales. The Capacity Allocation problem is formulated in the next Section, while our Load Redirect mechanism is presented in Section V-B.

A. Capacity Allocation problem

The CA problem is solved with time period T1 and aims at minimizing the overall costs for flat and on-demand VM instances of multiple distributed IaaS sites, while guaranteeing that the average response time of each class is lower than the SLA threshold. The CA determines the numbers of VMs N_k^i and M_k^i required to serve the arrival rate Λ̂_k^i. In this phase the LR mechanism is neglected. Preliminary results, indeed, have shown that the LR mechanism, even if significant at the lower time scale T2, introduces a limited increment to each class local incoming workload Λ̂_k^i, which is comparable with the workload prediction accuracy obtained in practice. If we denote with µ_k the maximum service rate of a capacity 1 VM for executing WS class k requests, the response time for executing WS class k locally at site i is given by:

R_k^i = 1 / ( C^i µ_k − Λ̂_k^i / (N_k^i + M_k^i) ).

In particular, the M/G/1 equilibrium condition Λ̂_k^i < C^i µ_k (N_k^i + M_k^i) must hold, and the total response time for class k requests over all sites is:

R_k = Σ_i Λ̂_k^i R_k^i / Σ_j Λ̂_k^j.    (2)
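Stepping back to the predictor of Section IV, the Trigg-Leach adaptive exponential smoothing scheme admits a compact implementation. The sketch below uses illustrative names, and the small constant guarding against division by zero is an implementation detail not taken from the paper.

```python
# Sketch of the dynamic ES predictor with Trigg-Leach adaptive smoothing:
# gamma(t) = |A(t)/E(t)|, A and E being the smoothed and smoothed-absolute
# forecast errors with error-smoothing constant phi (0.2 as in [24]).

class AdaptiveES:
    def __init__(self, initial_forecast, phi=0.2):
        self.forecast = initial_forecast  # current one-step-ahead prediction
        self.phi = phi
        self.A = 0.0                      # smoothed error A(t)
        self.E = 1e-9                     # smoothed absolute error E(t)

    def update(self, observed):
        err = observed - self.forecast    # forecast error eps(t)
        self.A = self.phi * err + (1.0 - self.phi) * self.A
        self.E = self.phi * abs(err) + (1.0 - self.phi) * self.E
        gamma = min(abs(self.A / self.E), 1.0)   # adaptive smoothing factor
        self.forecast = gamma * observed + (1.0 - gamma) * self.forecast
        return self.forecast
```

A large, persistent error drives gamma toward 1 (fast tracking), while noisy zero-mean errors drive it toward 0 (heavy smoothing), which is exactly the delay-limiting behaviour the dynamic model is meant to provide.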
Hence, after some basic algebra, the CA problem can be formulated as:

(CA)   min_{N_k^i, M_k^i}  Σ_i Σ_k ( c^i N_k^i + c̃^i M_k^i )

subject to:

Λ̂_k^i < C^i µ_k (N_k^i + M_k^i),   ∀k ∈ K, ∀i ∈ I,

Σ_i [ Λ̂_k^i (N_k^i + M_k^i) / ( C^i µ_k (N_k^i + M_k^i) − Λ̂_k^i ) ] / Σ_j Λ̂_k^j ≤ R̄_k,   ∀k ∈ K,

Σ_{k∈K} N_k^i ≤ N̄^i,   ∀i ∈ I,
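For a single class at a single site, the response-time constraint of (CA) gives a closed-form sizing rule: from R = 1/(C µ − Λ̂/n) ≤ R̄ it follows that n ≥ Λ̂/(C µ − 1/R̄). A minimal single-class sketch follows; the function and variable names are illustrative, not part of the paper.

```python
import math

# Smallest integer VM count n satisfying the M/G/1 response-time constraint
# 1/(C*mu - lam/n) <= r_max, i.e. n >= lam / (C*mu - 1/r_max).

def min_vms(lam, mu, C, r_max):
    if C * mu * r_max <= 1.0:
        raise ValueError("threshold unreachable even at vanishing load")
    n = math.ceil(lam / (C * mu - 1.0 / r_max))
    assert lam < C * mu * n   # M/G/1 equilibrium condition of (CA)
    return n
```

For instance, with µ = 10 req/s, C = 1, R̄ = 0.5 s and Λ̂ = 100 req/s, 13 VMs are needed.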
where the last constraint family guarantees that the number of VMs allocated to the whole set of classes at site i is at most equal to the number of flat VMs available at that site. Note that in the problem formulation we have not imposed the variables N_k^i and M_k^i to be integer, as in reality they are. In fact, requiring the variables to be integer makes the solution much more difficult, since non-linear constraints are introduced. We therefore decided to deal with continuous variables, actually considering a relaxation of the real problem. However, preliminary experimental results have shown that if the optimal values of the variables are fractional and they are rounded to the closest integer solution, the gap between the solution of the real integer problem and the relaxed one is very small, justifying the use of a relaxed model. Furthermore, we can always choose a rounding to an integer solution which preserves feasibility, and the corresponding gap in the objective function is a few percentage points. The CA problem has a linear objective function over a convex set. Hence, the global optimum can be obtained by solving CA in parallel at each site adopting standard non-linear solvers. This requires that each site broadcasts its Λ̂_k^i predictions, which can however be obtained considering only local information. Since this broadcast is performed every T1 time instants, the network overhead for the CA solution is very limited.

B. Load Redirect problem

Once the number of on-demand instances has been determined, local requests can be dynamically redirected to other sites with time granularity T2 in order to, e.g., avoid episodic local congestion due to the variability of the incoming workload at time granularity T2 around its hourly average prediction (see Figure 2). According to equation (1), the response time for executing WS class k locally at site i (i.e., without considering the network delay due to redirects) is given by:

R̂_k^i = 1 / ( C^i µ_k − ( x_k^i + Σ_{j≠i} g^{j,i} z_k^j / G^j ) / (N_k^i + M_k^i) ).
During time interval T2, the number of executions of class k requests at site i is T2 ( x_k^i + Σ_{j≠i} g^{j,i} z_k^j / G^j ), and the response time for remote requests is given by both R̂_k^i and the network delay Σ_{j≠i} d^{j,i} g^{j,i} z_k^j / G^j. Therefore, the total response time for executing class k requests at site i is:

R_k^i = R̂_k^i + ( Σ_{j≠i} d^{j,i} g^{j,i} z_k^j / G^j ) / ( x_k^i + Σ_{j≠i} g^{j,i} z_k^j / G^j ),

and the total response time for class k requests over all sites is:

R_k = Σ_i R_k^i ( x_k^i + Σ_{j≠i} g^{j,i} z_k^j / G^j ) / Σ_j Λ̂̂_k^j.
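The per-site response time R_k^i defined above can be evaluated numerically as in the sketch below (single class, illustrative names); the delay term follows the reconstruction with d^{j,i} g^{j,i}/G^j weights given above.

```python
# Response time at site i with redirects (single class): local M/G/1 term
# plus network delay of inbound redirected requests, averaged over all
# requests executed at i. g[j][i]/G[j] is the fraction of site j's redirect
# rate z[j] sent to site i; d[j][i] is the j->i network delay.

def site_response_time(i, x, z, g, G, d, mu, C, n_vms):
    others = [j for j in range(len(x)) if j != i]
    inbound = sum(g[j][i] * z[j] / G[j] for j in others)
    load = x[i] + inbound                        # total rate executed at i
    r_hat = 1.0 / (C[i] * mu - load / n_vms[i])  # local response time
    delay = sum(d[j][i] * g[j][i] * z[j] / G[j] for j in others)
    return r_hat + delay / load
```

For example, with two sites where site 1 redirects 20 req/s to site 0 (µ = 10, C = 1, 10 VMs each, x = [50, 40], d = 0.05 s), site 0 sees 1/3 + 1/70 ≈ 0.348 s.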
The goal of our load redirect scheme is to cooperatively minimize request average response times. Formally, the LR could be formulated as a constraint programming problem, since R_k ≤ R̄_k must hold and the cost for request execution is determined by the CA solution and is not influenced by the LR decision variables. However, in order to provide an efficient distributed solution, in our LR problem formulation we consider the total request response time as the metric to be minimized. Preliminary experimental results have shown, indeed, that introducing an objective function speeds up the convergence of the distributed algorithm relying on standard non-linear solvers. The LR problem can be formulated as follows:

(LR)   min_{x_k^i, z_k^i}  Σ_k Σ_i [ (N_k^i + M_k^i) ( x_k^i + Σ_{j≠i} g^{j,i} z_k^j / G^j ) / ( C^i µ_k (N_k^i + M_k^i) − ( x_k^i + Σ_{j≠i} g^{j,i} z_k^j / G^j ) ) + Σ_{j≠i} d^{j,i} g^{j,i} z_k^j / G^j ]

subject to:

x_k^i + z_k^i = Λ̂̂_k^i,   ∀k ∈ K, i ∈ I,   (3)

x_k^i + Σ_{j≠i} g^{j,i} z_k^j / G^j < C^i µ_k (N_k^i + M_k^i),   ∀k ∈ K, i ∈ I,   (4)

x_k^i, z_k^i ≥ 0,   ∀k ∈ K, i ∈ I.

Constraints (3) ensure that the overall class k requests at node i are either locally executed or redirected toward the other sites, while constraints (4) guarantee that VM saturation conditions are avoided. (LR) defines a centralized load balancing problem: All the system information (i.e., the local incoming workload predictions Λ̂̂_k^i) has to be gathered together and used to compute the optimal workload balancing. However, for large scale cloud systems this centralized load balancing scheme is not suitable. Even assuming that the broadcast of the Λ̂̂_k^i values does not add a significant network overhead to the system (indeed T2 is around 5-10 minutes), the solution of the (LR) problem for large systems cannot be obtained within the T2 time limit with the non-linear solvers currently available. For this reason, we have devised a distributed decomposable solution for problem (LR) relying on Lagrangian techniques and obtaining closed formulas for elementary problems to be solved by applying the Karush-Kuhn-Tucker (KKT) conditions. Our implementation supports a distributed protocol for the (LR) solution in which each site solves its problem using both local information and information received from other sites. In particular, as discussed in the following, we develop an iterative method, and at each iteration z_k^i is the only information that will be shared among sites. Our decomposition technique is founded on duality theory in optimization [9]. First of all, we observe that the optimization problem is convex (see [4]), the duality gap is zero, and hence the global optimum solution can be identified [23] by solving the primal via the dual. Secondly, (LR) can be decomposed into |K| independent sub-problems (one for every class), which can be obtained from (LR) by simply omitting the k index. Furthermore, LR is characterized by two types of coupling: Coupling constraints, and coupled utilities [23]. Indeed, each term in the objective function not only depends on the local variable x_k^i (x^i in the following) but also on the variables of the other sites z_k^i (z^i). The key idea to address coupled utilities is to introduce auxiliary variables and additional equality constraints. The (LR) problem solution then can be obtained by solving |K| problems as:

(LR_k)   min_{x^i, y^i, z^i, w^i}  Σ_i [ (N^i + M^i)(x^i + y^i) / ( C^i µ (N^i + M^i) − (x^i + y^i) ) + w^i ]

subject to:

x^i + z^i = Λ̂̂^i,   ∀i ∈ I,   (5)

x^i + y^i < C^i µ (N^i + M^i),   ∀i ∈ I,   (6)

y^i = Σ_{j≠i} g^{j,i} z^j / G^j,   ∀i ∈ I,   (7)

w^i = Σ_{j≠i} d^{j,i} g^{j,i} z^j / G^j,   ∀i ∈ I,   (8)

x^i, y^i, z^i, w^i ≥ 0,   ∀i ∈ I,   (9)

where x^i, y^i, and w^i are local variables at site i. Next, we consider the Lagrangian:

(RP)   min_{x^i, y^i, z^i, w^i}  Σ_i [ (N^i + M^i)(x^i + y^i) / ( C^i µ (N^i + M^i) − (x^i + y^i) ) + w^i + Θ_i ( y^i − Σ_{j≠i} g^{j,i} z^j / G^j ) + η_i ( w^i − Σ_{j≠i} d^{j,i} g^{j,i} z^j / G^j ) ]

subject to constraints (5), (6), and (9), where the Θ_i's and η_i's are the consistency prices [23]. By exploiting the decomposable structure of the Lagrangian, the relaxed problem (RP) further separates into |I| subproblems:

(SUB_i)   min_{x^i, y^i, z^i, w^i}  (N^i + M^i)(x^i + y^i) / ( C^i µ (N^i + M^i) − (x^i + y^i) ) + w^i + Θ_i ( y^i − Σ_{j≠i} g^{j,i} z^j / G^j ) + η_i ( w^i − Σ_{j≠i} d^{j,i} g^{j,i} z^j / G^j )

subject to constraints (5), (6), and (9). The optimal value of (RP) for a given set of Θ_i's and η_i's defines the dual function L(Θ, η), and the dual problem is then given by:

(D)   max_{Θ, η} L(Θ, η).

The dual problem can be solved by using a subgradient method: Given initial values Θ_i(0) and η_i(0), the iterates are generated by

Θ_i(t+1) = Θ_i(t) + α_t ( y^i − Σ_{j≠i} g^{j,i} z^j / G^j ),   (10)

η_i(t+1) = η_i(t) + β_t ( w^i − Σ_{j≠i} d^{j,i} g^{j,i} z^j / G^j ),   (11)

where t is the iteration index and α_t, β_t are sufficiently small positive parameters (see [23]). Note that each update step in this approach uses data from all of the sites. This method naturally lends itself to a distributed implementation: Each site i updates its primal variables z^i, which in turn are broadcast toward the other sites. Then, each site i updates its dual variables Θ_i, η_i using only local (subgradient) information. The scheme for the solution of each subproblem (LR_k) is reported in Algorithm 1. The solution of each subproblem (SUB_i) can also be obtained very efficiently by closed formulas derived from the KKT conditions. Details are reported in Appendix B. The procedure reported in Algorithm 1 is stopped when the percentage difference of the objective function between two consecutive iterations is within a given precision.

Algorithm 1: Lagrangian Distributed Optimization Procedure
1) Initialization: Set t = 0, and Θ(0) and η(0) equal to some initial values;
2) Each site solves its problem (SUB_i) and broadcasts the solution z^i (not the auxiliary variables y^i and w^i);
3) Price updating: Each site updates the consistency prices according to (10) and (11);
4) Set t ← t + 1 and go to Step 2 (until the stopping criterion is satisfied).
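Algorithm 1 follows a standard dual-decomposition pattern. The toy below applies that same pattern to a much simpler coupled problem than (LR), namely min x1² + (x2 − 4)² subject to the coupling constraint x1 = x2 (optimum x1 = x2 = 2): the constraint is relaxed with a consistency price, each "site" minimizes its subproblem in closed form, and the price is updated with a subgradient step mirroring (10)-(11). All names and the example problem are illustrative.

```python
# Toy dual decomposition: relax x1 = x2 with price lam, solve the two
# separable subproblems in closed form, and update lam by a subgradient
# step on the coupling residual, the same structure as Algorithm 1.

def dual_decomposition(alpha=0.5, iters=100):
    lam = 0.0                     # consistency price (cf. Theta_i, eta_i)
    x1 = x2 = 0.0
    for _ in range(iters):
        x1 = -lam / 2.0           # site 1: argmin_x x^2 + lam * x
        x2 = 4.0 + lam / 2.0      # site 2: argmin_x (x - 4)^2 - lam * x
        lam += alpha * (x1 - x2)  # subgradient step on the residual x1 - x2
    return x1, x2
```

With α = 0.5 the price converges geometrically to λ* = −4, at which point both subproblems return x1 = x2 = 2, the optimum of the coupled problem.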
VI. EXPERIMENTAL RESULTS

The proposed resource management algorithms have been evaluated for a variety of system and workload configurations. Section VI-A presents the experimental settings and the results on the scalability of our algorithms. Section VI-B presents a cost-benefit evaluation of our solution compared with other heuristics and state-of-the-art techniques [13], [27], [26]. Finally, Section VI-C shows the results of the application of our resource management algorithms in a real prototype environment deployed on Amazon EC2.

A. Algorithm Performance

To evaluate the efficiency of the proposed algorithms, we have used a large set of randomly generated instances. All tests have been performed on a VMware virtual machine running Ubuntu 9.10 Server on an Intel Nehalem dual-socket quad-core system with 32 GB of RAM. The virtual machine has a dedicated physical core with guaranteed performance and 4 GB of reserved memory. We used SNOPT 7.2.4 as non-linear solver [20]. The number of cloud sites |I| has been varied between 20 and 60, and the number of request classes |K| between 100 and 1000. We would like to remark that, even if the number of cloud sites is small in reality (e.g., Amazon owns 11 availability zones spread over four different regions worldwide), we consider up to 60 sites. Recall from Section II that in our approach a site with S VM configurations is modelled by S sites with a single VM configuration. The maximum service rate of a capacity one VM for executing class k requests, µ_k, is set such that R̄_k = 3/µ_k, as in [26]. Experimental results (see Appendix A for details) have shown that the average execution time required to solve instances of maximum size is lower than 3 minutes and one minute for the CA and LR problems, respectively. Hence, both the CA and LR mechanisms can be adopted at the considered time scales.

B. Comparison with alternative literature proposals

We performed a cost-benefit evaluation of our approach considering other heuristics, with a twofold aim: On the one hand, we compared our solution with other state-of-the-art techniques which exploit the utilization principle and determine the number of VM instances according to a utilization threshold upper bound [13], [27], [26], [2]. On the other hand, the research question we addressed concerned the effectiveness of the LR mechanism in the cloud. Indeed, in cloud systems resource provisioning can be performed in very few minutes, and hence, instead of redirecting load to other sites, one could argue that allocating additional VMs to manage traffic peaks is more effective. In this Section we report the results of the comparison of our CA+LR mechanism with a set of solutions which perform a more fine-grained CA at multiple time scales.
In the remainder of this Section the following alternative solutions will be considered:

• Heuristic 1: The CA is performed on a 5-minute time horizon and the number of VMs is determined according to utilization thresholds, as in other approaches proposed in the literature [13], [27], [26] and currently implemented also by IaaS providers (see, e.g., the very recent release of Amazon AWS Elastic Beanstalk [2]). In the evaluation, a life span of one hour for each instantiated VM has been considered. The number of VMs is determined such that the utilization of the VMs is equal to a given threshold τ1. Further VM provisioning is triggered if the predicted VM utilization is higher than a second threshold τ2 > τ1. Multiple analyses have been performed by adopting different thresholds: (τ1, τ2) = (40%, 50%), (50%, 60%), and (60%, 80%).

• Heuristic 2: Same as Heuristic 1, but the number of VMs is determined by optimally solving the CA problem reported in Section V-A every 5 minutes.

• Heuristic 3: Same as Heuristic 2, but with a 10-minute time horizon.

The performance parameters of the request classes have been randomly generated as in Section VI-A, while the local incoming workload has been obtained from the traces of a very large dynamic Web-based system implementing a multi-tier logical architecture described in [11]. In our experiments the following daily traces have been considered, with a 5-minute sample time interval:

• Normal day scenario: It describes the baseline workload, where the number of client requests changes following the bi-modal request profile shown in [7].

• Heavy day scenario: It exhibits a 40% increment in the number of client requests with respect to the baseline workload.

• Noisy day scenario: It is characterized by the same request profile as the heavy day scenario with an additional noise component (we added white noise with zero mean and standard deviation equal to 10% of the heavy day peak). In this way we increase the system variability, in order to prove the accuracy of the prediction model and the robustness of our overall solution also in highly variable contexts.

All scenarios are representative of the typical Web-based workload, which is characterized by heavy-tailed distributions [14], [6]. Moreover, the heavy scenarios add burst arrivals and flash crowds [21] that contribute to augment the request skew, and they represent a more stressful testbed for prediction models. The motivation behind this choice is to demonstrate that our prediction algorithm works even in critical scenarios and that our CA+LR mechanisms are robust to workload variability, although the tougher goal of predicting hot spot events remains an open issue beyond the scope of this paper. In particular, the prediction model considered in this paper is able to provide an accurate prediction quality that, in terms of mean square error [12], is always lower than 10%.
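The threshold rule of Heuristic 1 can be sketched as below. The simple per-VM utilization model λ/(n C µ) and the function names are assumptions made for illustration, not the exact rule of [13], [27], [26].

```python
import math

# Threshold-based provisioning sketch: size the cluster so that per-VM
# utilization lam/(n*C*mu) does not exceed the target tau1, and trigger
# extra provisioning when predicted utilization exceeds tau2 > tau1.

def vms_for_utilization(lam, mu, C, tau1):
    return math.ceil(lam / (tau1 * C * mu))

def needs_scale_up(lam_pred, n, mu, C, tau2):
    return lam_pred / (n * C * mu) > tau2
```

For example, with λ = 100 req/s, µ = 10, C = 1 and τ1 = 50%, 20 VMs are provisioned; a predicted rate of 120 req/s (utilization 60%) would then trip a τ2 = 50% trigger.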
Overall, we have considered 12 sites, which we assume are located in 12 different time zones with a one-hour time lag, and the normal, heavy, and noisy traces have been skewed accordingly. In the following quantitative analysis we set T1 = 1 hour and T2 = 5 minutes. Figure 3 plots, as an example, the VM costs over the 24 hours for the noisy day scenario (the normal and heavy cases are very similar), while Table II reports the percentage savings of our approach with respect to the other heuristics, considering the total costs over the whole day. Figures 4 and 5 report, as an example, the plot of the ratio R_k/R̄_k of the average response time to the response time threshold for a class considered as a reference example at site 1. The plot shape is quite general and is independent of the considered site and request class. As the results show, Heuristic 1 is very sensitive to the thresholds adopted. The (40%, 50%) case is very conservative: It is around 35% more expensive than our approach, but it always guarantees the response time threshold (the ratio is strictly lower than 1). Vice versa, the (60%, 80%) case provides costs close to
Alternative solution        % Savings
                            Normal day   Heavy day   Noisy day
Heuristic 1 - (40%, 50%)    35.47        34.86       36.84
Heuristic 1 - (50%, 60%)    19.53        18.83       21.4
Heuristic 1 - (60%, 80%)     3.12         2.25        4.93
Heuristic 2                  4.3          3.26        4.44
Heuristic 3                 11.56        10.27        6.98

TABLE II. VM percentage cost savings over the 24 hours obtained by our approach.
our solution (only 2-4% higher), but it introduces a very large number of SLA violations, especially in the noisy day scenario (see Figure 5). Vice versa, our solution introduced overall only 37 violations over the 3456 time intervals considered across the 12 sites over the whole day. Furthermore, Heuristic 1 is more sensitive to traffic variability. Heuristics 2 and 3 perform better than Heuristic 1, since the number of VMs is optimally determined by the (CA) problem solutions. However, the LR mechanism is still effective, since it allows reducing costs by 4-12%. The fine-grained resource allocation introduced by Heuristics 2 and 3 indeed results in over-provisioning and better performance (see Figures 4 and 5), while the LR mechanism allows forwarding traffic spikes to other locations without incurring any additional capacity allocation or significant SLA violations.

C. Amazon EC2 Test

The effectiveness of our resource management algorithms has also been evaluated on Amazon EC2, performing experiments running the JSP implementation of the SPECweb2005 (http://www.spec.org/web2005/) benchmark. In particular, we have considered the banking workload, which simulates the access to an online banking Web site implementing a full HTTPS load. The Web server (Apache Tomcat 5.5.27 in our setup) has been deployed on a large instance, while the load generators, the client coordinator, and the back-end simulator have been hosted by extra-large Amazon instances (in this way we are guaranteed that they are not the system bottleneck). The test is performed by deploying VM instances in the Virginia and North California Amazon regions. We have obtained an estimate of the maximum service rate parameters and of the network delay among different Amazon sites by performing extensive off-line profiling along the lines of [22]. We set R̄ = 0.7 seconds as the threshold for the average response time, and the overall test lasts one hour.
We have generated an appropriate traffic profile and run the CA algorithm at time 0 and at time 40 minutes. The LR algorithm is run every 10 minutes. During the first 40 minutes, the CA solution allocates two on-demand Web server instances at the two Amazon sites. During the last 20 minutes, the load is evenly shared, by introducing the Amazon Elastic Load Balancer, among three on-demand Web server instances in the Virginia region and five in North California. The Virginia local incoming workload is redirected to North California from minute 30 to 40, while it is redirected from North California
Fig. 3. VM instances costs for the noisy day scenario.
Fig. 4. Response time threshold ratio for a reference class, normal day scenario.
Fig. 5. Response time threshold ratio for a reference class, noisy day scenario.
Fig. 6. Overall traffic served at Virginia EC2 site.
Fig. 7. Overall traffic served at North California EC2 site.
Fig. 8. Average response time measured for the SPECweb2005 banking workload.
to Virginia during the last 20 minutes. Figures 6 and 7 show the overall traffic served at the two sites. Figure 8 reports the end users' average response time and shows that our CA+LR algorithms are effective, since the system provides performance according to the SLA for most of the time and is able to react to abrupt workload variations.

VII. CONCLUSIONS

We proposed prediction-based distributed CA and LR algorithms for IaaS cloud systems minimizing the cost of the running VMs. Experimental results have shown that our solutions significantly improve on other heuristics proposed in the literature (5-35% on average), without introducing significant QoS violations. Future work will extend the validation of our solution considering a larger experimental setup.

REFERENCES
[1] B. Abraham and J. Ledolter. Statistical Methods for Forecasting. John Wiley and Sons, 1983.
[2] Amazon Inc. Amazon Elastic Compute Cloud. http://aws.amazon.com/ec2/.
[3] M. Andreolini, S. Casolari, and M. Colajanni. Autonomic request management algorithms for geographically distributed internet-based systems. In SASO, 2008.
[4] D. Ardagna, S. Casolari, and B. Panicucci. Flexible distributed capacity allocation and load redirect algorithms for cloud systems. Politecnico di Milano, Tech. Report 2011.3. http://home.dei.polimi.it/ardagna/CloudCALR2011.pdf, 2011.
[5] D. Ardagna, B. Panicucci, M. Trubian, and L. Zhang. Energy-Aware Autonomic Resource Allocation in Multi-tier Virtualized Environments. IEEE Trans. on Services Computing, available online.
[6] M. Arlitt, D. Krishnamurthy, and J. Rolia. Characterizing the scalability of a large Web-based shopping system. ACM Trans. Internet Technol., 1(1):44–69, Aug. 2001.
[7] Y. Baryshnikov, E. Coffman, G. Pierre, D. Rubenstein, M. Squillante, and Y. Yimwadsana. Predictability of web server traffic congestion. In WCW Proc., 2005.
[8] M. Bennani and D. Menascé. Resource Allocation for Autonomic Data Centers Using Analytic Performance Models. In IEEE Int'l Conf. Autonomic Computing Proc., 2005.
[9] D. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
[10] G. Bolch, S. Greiner, H. de Meer, and K. Trivedi. Queueing Networks and Markov Chains. J. Wiley, 1998.
[11] H. W. Cain, R. Rajwar, M. Marden, and M. H. Lipasti. An architectural evaluation of Java TPC-W. In HPCA Proc., 2001.
[12] S. Casolari and M. Colajanni. On the selection of models for runtime prediction of system resources. In Autonomic Systems, Springer (Eds. Danilo Ardagna, Li Zhang), 2010.
[13] L. Cherkasova and P. Phaal. Session-Based Admission Control: A Mechanism for Peak Load Management of Commercial Web Sites. IEEE Transactions on Computers, 51(6), June 2002.
[14] M. E. Crovella, M. S. Taqqu, and A. Bestavros. Heavy-tailed probability distributions in the World Wide Web. In A Practical Guide To Heavy Tails, pages 3-26. Chapman and Hall, New York, 1998.
[15] M. D. Dikaiakos, D. Katsaros, P. Mehra, G. Pallis, and A. Vakali. Cloud Computing: Distributed Internet Computing for IT and Scientific Research. IEEE Internet Computing, 13(5):10-13, 2009.
[16] H. Erdogmus. Cloud computing: Does nirvana hide behind the nebula? IEEE Softw., 26(2):4-6, 2009.
[17] E. S. Gardner, Jr. Exponential smoothing: The state of the art. Journal of Forecasting, 4, 1985.
[18] P. Felber, T. Kaldewey, and S. Weiss. Proactive hot spot avoidance for web server dependability. In IEEE Symposium on Reliable Distributed Systems, pages 309-318, 2004.
[19] H. Feng, Z. Liu, C. H. Xia, and L. Zhang. Load shedding and distributed resource control of stream processing networks. Perform. Eval., 64(9-12):1102-1120, 2007.
[20] P. E. Gill, W. Murray, and M. A. Saunders. SNOPT: An SQP algorithm for large-scale constrained optimization. SIAM Journal of Optimization, 12:979-1006, 2002.
[21] J. Jung, B. Krishnamurthy, and M. Rabinovich. Flash crowds and denial of service attacks: Characterization and implications for CDNs and Web sites. In WWW2002 Proc., Honolulu, HW, May 2002.
[22] G. Pacifici, W. Segmuller, M. Spreitzer, and A. Tantawi. CPU demand for web serving: Measurement analysis and dynamic estimation. Perform. Eval., 65(6-7):531-553, 2008.
[23] D. P. Palomar and M. Chiang. A tutorial on decomposition methods for network utility maximization. IEEE J. Sel. Areas Commun., 24:1439-1451, 2006.
[24] D. Trigg and A. Leach. Exponential smoothing with an adaptive response rate. Operational Research Quarterly, 18, 1967.
[25] B. Urgaonkar, G. Pacifici, P. J. Shenoy, M. Spreitzer, and A. N. Tantawi. Analytic modeling of multitier Internet applications. ACM Transactions on the Web, 1(1), January 2007.
[26] A. Wolke and G. Meixner. TwoSpot: A cloud platform for scaling out web applications dynamically. In ServiceWave, 2010.
[27] X. Zhu, D. Young, B. Watson, Z. Wang, J. Rolia, S. Singhal, B. McKee, C. Hyser, D. Gmach, R. Gardner, T. Christian, and L. Cherkasova. 1000 islands: An integrated approach to resource management for virtualized data centers. Journal of Cluster Computing, 12(1):45-57, 2009.
APPENDIX A - CA AND LR PROBLEM SOLUTION PERFORMANCE
Table III reports, for problem instances of different sizes, the computational time in seconds required to solve the CA problem (figures are means computed over ten different runs). Table IV reports the performance of Algorithm 1 solving a single-class problem for an increasing number of sites (recall that the LR problem is separable), including the computational time in seconds, the overall network time needed to broadcast z^i, and the average number of iterations (also in this case, figures are means computed over ten different runs). As the stopping criterion of Algorithm 1 we have used the relative gap between two subsequent iterations; the experimental tests have been stopped when the gap was lower than 2%, 3%, and 4%. The computational time assumes that at each iteration the sub-problems are solved in parallel at the IaaS provider sites, while the network time considers an average end-to-end delay for data transfers among IaaS sites equal to 300 ms, which is the worst-case delay we measured among the four Amazon EC2 regions during a one-week experiment. Note that the computational time is negligible with respect to the network time. According to these computational results, both the CA and LR mechanisms can be adopted at the considered time scales to solve problem instances of maximum size at the highest accuracy level.

TABLE III
CAPACITY ALLOCATION PROBLEM SOLUTION EXECUTION TIME (SEC).

|K|,|I|   Time     |K|,|I|   Time     |K|,|I|    Time
100,20     2.3     500,20    29.3     1000,20    34.5
100,40     6.4     500,40    33.3     1000,40    98.6
100,60     9.3     500,60    65.6     1000,60   160.9

APPENDIX B - ON SOLVING THE (SUB_i) PROBLEM

At step 2 of Algorithm 1 we need to solve (SUB_i):

min_{x^i, y^i, z^i}  −η_i Σ_{j≠i} z^j/G^j − Θ^i Σ_{j≠i} g^{j,i} z^j/G^j
    + (N^i + M^i)(x^i + y^i) / (C^i µ (N^i + M^i) − (x^i + y^i)) + Θ^i y^i − π^i (x^i + y^i)
    + min_{w^i} (1 + η_i) w^i

s.t.  x^i + z^i = Λ^i,
      x^i + y^i < C^i µ (N^i + M^i),
      x^i ≤ min{Λ^i, C^i µ (N^i + M^i)},
      y^i ≤ C^i µ (N^i + M^i),
      w^i ≤ Σ_{j≠i} Λ^j/G^j,
      x^i, y^i, z^i, w^i ≥ 0.

Here we have added some redundant constraints: in fact, we can observe that, since w^i = Σ_{j≠i} z^j/G^j, we have 0 ≤ w^i ≤ Σ_{j≠i} Λ^j/G^j. It follows that

argmin_{w^i} (1 + η_i) w^i  =  0                                   if 1 + η_i > 0;
                               Σ_{j≠i} Λ^j/G^j                     if 1 + η_i < 0;
                               any value in [0, Σ_{j≠i} Λ^j/G^j]   if 1 + η_i = 0.

It remains to solve

min_{x^i, y^i, z^i}  (N^i + M^i)(x^i + y^i) / (C^i µ (N^i + M^i) − (x^i + y^i)) + Θ^i y^i − π^i (x^i + y^i)

s.t.  x^i + z^i = Λ^i,
      x^i + y^i < C^i µ (N^i + M^i),
      x^i ≤ min{Λ^i, C^i µ (N^i + M^i)},
      y^i ≤ C^i µ (N^i + M^i),
      x^i, y^i, z^i ≥ 0.
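The term min (1 + η_i) w^i is linear in w^i, so, as derived above, its minimizer over [0, Σ_{j≠i} Λ^j/G^j] depends only on the sign of 1 + η_i. A minimal sketch of this closed-form rule (the function name is ours, not from the paper's implementation):

```python
def minimize_linear_w(eta_i: float, w_max: float) -> float:
    """Closed-form minimizer of (1 + eta_i) * w over [0, w_max],
    where w_max stands for sum_{j != i} Lambda^j / G^j.
    When 1 + eta_i == 0 every point of the interval is optimal;
    we return 0 as a representative."""
    slope = 1.0 + eta_i
    if slope > 0:
        return 0.0       # increasing objective: left endpoint
    if slope < 0:
        return w_max     # decreasing objective: right endpoint
    return 0.0           # flat objective: any feasible value is optimal
```

This is exactly why the w^i term separates from the rest of (SUB_i): a linear objective over an interval is always minimized at an endpoint (or everywhere, when flat).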
TABLE IV
ALGORITHM 1 PERFORMANCE (TIMES IN SEC).

             2%                        3%                        4%
|I|   Comp.  Netw.  #Iter.   Comp.  Netw.  #Iter.   Comp.  Netw.  #Iter.
      Time   Time            Time   Time            Time   Time
20    0.46    5.7    19.2    0.32    3.9    13.1    0.20    3.0     9.6
30    1.54   12.3    41.3    1.19    9.6    32.9    0.91    7.5    25.7
40    2.94   17.4    58.7    2.20   12.9    43.3    1.68    9.9    33.5
50    3.86   18.6    62.1    2.78   13.5    45.2    1.99    9.9    33.3
60    6.38   26.1    87.3    4.96   20.1    67.4    3.98   16.2    54.2
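The network-time figures in Table IV are consistent with a simple cost model: one broadcast of z^i per iteration, each paying the 300 ms worst-case inter-site delay, with computation running in parallel at the sites and therefore negligible. A quick sanity check of this reading (the explicit formula is our reconstruction, not stated as an equation in the text):

```python
RTT = 0.3  # seconds: worst-case delay measured among the four EC2 regions

def network_time(avg_iterations: float, rtt: float = RTT) -> float:
    """Estimated overall network time of Algorithm 1: one z^i broadcast
    per iteration, each costing one end-to-end delay."""
    return avg_iterations * rtt

# 2%-gap column of Table IV: 19.2 iterations for |I| = 20 and
# 87.3 iterations for |I| = 60 give roughly the reported
# 5.7 s and 26.1 s of network time.
print(network_time(19.2), network_time(87.3))
```

The match (within rounding) across all entries supports the claim in Appendix A that network delay, not local computation, dominates the time to convergence.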
Therefore we have to solve problems of the type

min_{x,y}  α(x + y) / (β − (x + y)) − δx + γy        (12)

s.t.  x + y < β,
      x ≤ x_max = min{Λ, β},
      y < β,
      x, y ≥ 0,

where α, β and δ are positive parameters, and γ ∈ R. It can easily be shown that this problem is convex, and hence the KKT conditions are necessary and sufficient for optimality. Let us introduce multipliers a for the constraint guaranteeing the equilibrium condition of the queue, b and c for the upper bound constraints on x and y, respectively, and, finally, d and e for the non-negativity constraints on x and y, respectively. The Lagrangian for the above minimization problem is

L(x, y, a, b, c, d, e) = α(x + y) / (β − (x + y)) − δx + γy + b(x − x_max) − dx − ey.

The KKT conditions provide us with a set of equations that can be solved:

∂L/∂x = αβ / (β − (x + y))² − δ + b − d = 0
∂L/∂y = αβ / (β − (x + y))² + γ − e = 0
a = 0
c = 0
b(x − x_max) = 0
dx = 0
ey = 0

We must solve this system for x, y, a, b, c, d, e, with the condition that x and y must also be feasible (i.e., they must satisfy the constraints of the optimization problem (12)). Let us now examine the possible cases.

Case 1: x = x_max, y = 0. It follows that δ ≥ αβ / (β − x_max)².
Proof: x = x_max, y = 0. It follows that d = 0 and
    αβ / (β − x_max)² − δ + b = 0,
    αβ / (β − x_max)² + γ − e = 0,
therefore −δ + b = γ − e, i.e., b + e = γ + δ. From b ≥ 0 we get δ ≥ αβ / (β − x_max)².

Case 2: x = x_max, 0 < y < β. It follows that γ = −αβ / (β − (x_max + y))², from which we can derive y in closed form. Moreover, γ ≥ −δ.
Proof: x = x_max, 0 < y < β. It follows that d = e = 0 and
    γ = −αβ / (β − (x_max + y))² = −δ + b,
from which we can derive y. Moreover, b = γ + δ ≥ 0, hence γ ≥ −δ.

Case 3: 0 < x < x_max, y = 0. It follows that γ ≥ −δ, and we can derive x = β − √(αβ/δ).
Proof: 0 < x < x_max, y = 0. It follows that d = b = 0 and
    αβ / (β − x)² = δ,
    αβ / (β − x)² + γ − e = 0,
therefore e = δ + γ ≥ 0 and hence γ ≥ −δ. From the first equation we can derive x = β − √(αβ/δ).

Case 4: 0 < x < x_max, 0 < y < β. It follows that δ = −γ.
Proof: 0 < x < x_max, 0 < y < β. It follows that d = b = e = 0 and
    αβ / (β − (x + y))² = δ = −γ,
hence δ = −γ.

Case 5: x = 0, y = 0. It follows that γ + δ = e − d. This is the case in which the request classes are not sensitive to response times, and it is therefore convenient for the provider to redirect the requests toward sites with lower costs.
Proof: x = 0, y = 0. It follows that b = 0 and
    α/β = δ + d,
    α/β = −γ + e,
therefore e − γ = δ + d, i.e., γ + δ = e − d.

Case 6: x = 0, 0 < y < β. It follows that δ ≤ −γ. Moreover, we can derive γ < 0 and y = β − √(−αβ/γ).
Proof: x = 0, 0 < y < β. It follows that b = e = 0 and
    αβ / (β − y)² = δ + d,
    αβ / (β − y)² = −γ,
therefore γ = −δ − d, so d = −δ − γ ≥ 0, and then δ ≤ −γ. Moreover, from αβ / (β − y)² = −γ we obtain (β − y)² = −αβ/γ, so γ < 0 and y = β − √(−αβ/γ).

ACKNOWLEDGEMENT
The work of Danilo Ardagna and Barbara Panicucci has been partially supported by the GAME-IT and IDEAS-ERC Project 227977-SMScom research projects. Thanks are expressed to Prof. Michele Colajanni for his fruitful comments on preliminary versions of this paper.
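The six cases of the Appendix B analysis can be collected into a small solver for problem (12): since the problem is convex, it suffices to evaluate the candidate points the cases produce and keep the feasible one with the smallest objective (Case 4 optima lie on a line x + y = const that always contains a Case 2 or Case 3 candidate). This is a sketch under the stated assumptions (α, β, δ > 0, γ ∈ R); the function name is ours, not from the paper:

```python
import math

def solve_problem_12(alpha, beta, delta, gamma, x_max, eps=1e-9):
    """Minimize alpha*(x+y)/(beta-(x+y)) - delta*x + gamma*y subject to
    x + y < beta, 0 <= x <= x_max, y >= 0, by enumerating the candidate
    points of the six KKT cases. Returns (x, y, objective value)."""
    def f(x, y):
        return alpha * (x + y) / (beta - (x + y)) - delta * x + gamma * y

    candidates = [(0.0, 0.0),      # Case 5: x = 0, y = 0
                  (x_max, 0.0)]    # Case 1: x = x_max, y = 0
    # Case 3: 0 < x < x_max, y = 0  ->  x = beta - sqrt(alpha*beta/delta)
    candidates.append((beta - math.sqrt(alpha * beta / delta), 0.0))
    if gamma < 0:
        s = math.sqrt(-alpha * beta / gamma)   # beta - (x + y) at the optimum
        candidates.append((0.0, beta - s))             # Case 6
        candidates.append((x_max, beta - x_max - s))   # Case 2
    best = None
    for x, y in candidates:
        # keep only candidates satisfying the constraints of (12)
        if 0.0 <= x <= x_max and y >= 0.0 and x + y < beta - eps:
            value = f(x, y)
            if best is None or value < best[2]:
                best = (x, y, value)
    return best
```

For instance, with α = 1, β = 10, δ = 1, γ = −2 and x_max = 5, the solver selects the Case 6 point x = 0, y = 10 − √5: the reward for serving redirected load (−γ) exceeds δ, so all capacity goes to y.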