Resource Management in the Autonomic Service-Oriented Architecture

Jussara Almeida and Virgilio Almeida
Departamento de Ciência da Computação, Universidade Federal de Minas Gerais
Belo Horizonte, MG 30161, Brazil
{jussara, virgilio}@dcc.ufmg.br

Danilo Ardagna and Chiara Francalanci
Dipartimento di Elettronica e Informazione, Politecnico di Milano
Via Ponzio 34/5, 20133 Milan, Italy
{ardagna, francala}@elet.polimi.it

Marco Trubian
Dipartimento di Scienze dell'Informazione, Università degli Studi di Milano
Via Comelico 39, 20135 Milan, Italy
[email protected]

1-4244-0175-5/06/$20.00 ©2006 IEEE.

Abstract- In service-oriented systems, Quality of Service (QoS) is a service selection driver. Users evaluate QoS at run time to direct their service invocations to the most suitable provider. Thus, QoS has a direct impact on providers' revenues. However, QoS requirements are difficult to satisfy because of the high variability of Internet workloads. Workload variability cannot be accommodated with traditional capacity planning and allocation practices, but requires autonomic computing techniques. Autonomic computing involves two tightly inter-related problems, namely, a short-term resource allocation problem and a long-term capacity planning problem. Capacity planning requires an investment that should be balanced by the revenues obtained through resource allocation. In this paper, we provide a comprehensive framework modelling both problems. The short-term resource allocation problem is analyzed in depth. The paper proposes an optimization model that identifies the optimal resource allocation by maximizing a provider's revenues while satisfying customers' QoS constraints and minimizing resource usage costs. Preliminary computational experiments are presented to support the effectiveness of our approach.

I. INTRODUCTION

In the Service-Oriented Architecture (SOA), Quality of Service (QoS) is no longer statically associated with users or services, but can vary across different invocations of the same Web service. In other words, QoS becomes a service selection driver, which is assessed by users at run time to direct their service invocations to the most suitable provider [19]. Providers, on the other hand, can offer the same service with different quality profiles, which can be dynamically selected by users according to their needs. Thus, in the SOA, QoS requirements are dynamic, and are ruled by Service Level Agreement (SLA) contracts, which specify the unit price of a service invocation and the corresponding QoS level [20]. As a consequence, QoS has a direct impact on providers' revenues. These dynamic requirements are difficult to satisfy, especially due to the high variability of Internet application workloads. Internet workloads can vary by orders of magnitude within the same business day [14]. Such variations cannot be accommodated with traditional capacity planning and allocation practices, but require autonomic computing self-managing techniques [7], which dynamically allocate resources among different services on the basis of short-term demand estimates.

This dynamic fulfillment of varying QoS requirements is further enhanced by the virtualization of resources [4], [6]. A virtualization mechanism manages the overall infrastructure of the service center, providing service differentiation and performance isolation for multiple Web services sharing the same physical resources. This simplifies load balancing and the dynamic allocation of capacity to different Web service invocations. In this scenario, providers must solve two types of problems: 1) a short-term resource allocation problem, called SLA Management, i.e., how to allocate resources to different service invocations in order to maximize the revenues from SLAs, while minimizing resource management costs, and 2) a long-term capacity planning problem, i.e., how to size the service center in order to maximize the long-term net revenue from SLA contracts, while minimizing the total cost of ownership (TCO) of resources. This paper addresses both issues from a theoretical standpoint by providing a comprehensive framework which represents the authors' research agenda and highlights the interrelations between resource allocation and capacity planning issues. The short-term resource allocation problem is analyzed in depth. In particular, the paper presents an optimization model that identifies the optimal resource allocation across a set of service invocations by maximizing a provider's revenues, while satisfying customers' QoS constraints and minimizing the cost of resource utilization. The minimization of resource usage costs represents the main novelty of the proposed model with respect to previous literature [5], [16], as further discussed in Section VII. Experimental results show that our model solves reasonably large problem sizes typically under 15 seconds, which makes it practical for online implementations. Moreover, our results show that taking resource usage costs explicitly into account in the optimization model can yield total cost savings for the provider of as much as 39%.

The paper is organized as follows. Section II discusses the autonomic computing environment, defining the infrastructural characteristics of the reference system considered in this paper. Section III provides the overall framework encompassing both short-term and long-term issues. Section IV introduces the performance model designed to evaluate QoS in the considered system. The optimization techniques designed to solve the short-term resource allocation problem are discussed in Section V. Section VI demonstrates the efficiency of our solutions. Section VII discusses related work. Conclusions are finally drawn in Section VIII.
II. AUTONOMIC COMPUTING ENVIRONMENT

We consider the case of multiple transactional Web services sharing the same service center. The center provider may be hosting different third-party Internet applications (e.g., e-tourism services, financial services), or offering multiple instances of the same service with different quality profiles (e.g., stock quotes can be provided with different response times). Moreover, since typical Web services are very heterogeneous with respect to their resource demands, workload intensities and QoS requirements, we categorize the hosted services (or service instances) into independent Web service (WS) classes.

A key feature of the environment under consideration is its virtualization scheme [4], [6], which partitions the physical resources (i.e., CPU, disks, communication network) into multiple virtual ones, creating isolated virtual machines (VMs), each running at a fraction of the total (physical) system capacity and dedicated to serving a single WS class. In particular, the virtualization mechanism provides performance isolation, preventing contention for resources between the hosted WS classes. This feature greatly facilitates the performance analysis and modeling of multiple WS classes running on dedicated VMs on top of the same physical infrastructure, as will be shown in Section IV. Instead of accessing the physical resources directly, the hosted WS classes demand service from a pool of virtual resources, created and maintained by a virtualization layer, as illustrated in Figure 1. Virtualization enables the flexible reduction or expansion of the resource capacity assigned to each VM (and, thus, to its hosted WS class). Once a capacity assignment has been performed, a WS class is guaranteed to receive as much resource capacity as it has been assigned, regardless of the load imposed by other classes. Finally, we assume that the VMs employ an admission control scheme [13] that may reject requests in order to guarantee that the QoS requirements of hosted WS classes are met, or to avoid service instability caused by capacity restrictions.

Fig. 1. Virtualization in the Autonomic Computing Environment (WS class invocations are served by VM1, ..., VMN, one VM per WS class, on top of the shared physical infrastructure).
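As a concrete rendering of this environment, the following minimal Python sketch (ours, not part of the paper; all names and values are illustrative) captures the invariant behind the virtualization scheme: each WS class is pinned to a VM that receives a fraction fi of the total capacity P, with the fractions summing to at most 1.

from dataclasses import dataclass

@dataclass
class WSClass:
    """One hosted Web service class and the capacity share of its dedicated VM."""
    name: str
    capacity_fraction: float  # f_i: share of the total physical capacity

def effective_capacities(total_capacity: float, classes: list) -> dict:
    """Capacity guaranteed to each WS class under performance isolation.

    Class i receives f_i * P regardless of the load imposed by other classes.
    """
    assert sum(c.capacity_fraction for c in classes) <= 1.0, \
        "the VMs cannot exceed the physical capacity"
    return {c.name: c.capacity_fraction * total_capacity for c in classes}

# Illustrative example: three WS classes sharing a center of capacity P = 10
classes = [WSClass("stock-quotes-gold", 0.5),
           WSClass("stock-quotes-silver", 0.3),
           WSClass("e-tourism", 0.2)]
print(effective_capacities(10.0, classes))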
III. A MULTI-SCALE RESOURCE MANAGEMENT APPROACH FOR AUTONOMIC COMPUTING INFRASTRUCTURE

Web service providers must respond to incoming requests by satisfying the QoS levels specified in SLA contracts with customers. If QoS requirements are satisfied, providers gain full revenue. Otherwise, they incur penalties. A provider's ability to satisfy incoming requests directly depends on its available infrastructure, which in turn also represents a cost. In particular, the total cost of ownership (TCO) of a service center constitutes the most complete indicator of costs, since it accounts for both acquisition and management costs [3]. On the other hand, the incoming service workload typically varies over time, and we assume it increases in the long run. In such a scenario, providers must periodically adjust their infrastructure in order to accommodate increasing demands. Otherwise, if the available resources remain unchanged, requests have to be rejected, and the corresponding revenues are lost. Moreover, the reputation of a provider is predicated upon its ability to satisfy requests according to SLA contracts. The rejection of a request decreases a provider's reputation, and frequent service rejections cause additional revenue losses due to a long-term workload decrease. On the other hand, overprovisioning in order to satisfy all requests is extremely expensive. A 100% request satisfaction is obtained only by accommodating accidental load peaks and, thus, causes an exponential growth of costs with a limited revenue increase. Hence, there exists a trade-off between the provided service level and the TCO of a service center. This trade-off varies with the workload and, thus, changes over time. Capacity must be planned before the SLA-revenues-to-TCO trade-off becomes unfavorable. Traditional static resource allocation models can underestimate revenues by an order of magnitude [20], thus triggering early capacity planning and, possibly, causing unnecessary expenses. On the other hand, the autonomic architecture optimizes the use of current resources and postpones the time for capacity planning. Thus, a dynamic autonomic model of a service center, together with long-term workload forecasts, supports a more realistic analysis of revenue trends over time. The goal of the proposed self-adaptive capacity management mechanism is to lower the total cost of ownership of virtualized service centers and to increase their revenues, by optimizing resource allocation while minimizing SLA penalties.

Figure 2 shows a three-step resource management and capacity planning methodology. The first step is the dynamic allocation of system resources, whose goal is to make the best use of the available resources (including energy, hardware and software costs on demand), that is, to maximize revenues from SLAs while minimizing the costs associated with the use of the available resources. Note that, to achieve this goal, the provider may choose to dynamically adjust the fraction of capacity assigned to each VM, relying on the virtualization mechanism for quick reconfiguration and performance isolation, and/or to limit the incoming workload by serving only the set of requests that maximizes its revenues. In particular, it may choose to reject requests from a given WS class to favor service to other classes. Thus, this step requires an optimization model to determine the optimal configuration, and a performance model to provide performance estimates, and thus revenue estimates, for each possible configuration. The resource allocation is performed periodically, and the period Δ1 depends on the type of the hosted WS classes (e.g., 10-30 minutes [17]). At the end of each period, the short-term workload predictor uses one of the existing workload forecasting methods [1] to output the predicted incoming workload of each hosted WS class for the next period. The performance model produces estimates of the future performance of each VM (i.e., estimates of future SLA violations for each WS class). Optimization model 1 uses these estimates to determine the fraction of capacity to be assigned to each VM, as well as the number of service invocations effectively served from each class, in order to maximize revenues.

Fig. 2. SLA Management and Long-term Capacity Planning (the dynamic allocation step reports short-term percentages of losses and violations to the analysis module, which runs with period Δ2, tracks reputation and long-term revenues, and identifies the time for long-term capacity planning).

The dynamic allocation model triggers the analysis step when the number of SLA violations and service invocation rejections grows above an empirically determined threshold. The analysis can also be performed periodically with period Δ2 > Δ1, e.g., on a monthly basis. The analysis model embeds the short-term resource allocation model to obtain an estimate of revenues (the performance model and optimization model boxes in Figure 2). It also includes a long-term workload predictor based on the concept of reputation. Each provider is associated with a reputation, which may vary over time depending on the provider's ability to satisfy incoming requests. If SLAs are violated or service invocations are rejected, the reputation of the provider decreases, and the (predicted) long-term workload decreases accordingly. On the other hand, the provider may choose to reject some of the incoming requests and to reduce the capacity assigned to some VMs, thus incurring SLA violations, as discussed above. This trade-off identifies the correct time for capacity planning. The capacity-planning step is based on decision variables such as the number and configurations of new servers, possible upgrades of existing servers, and a corresponding model of TCO. The optimization model is extended accordingly (optimization model 2 in Figure 2).

This paper focuses on the first step of the methodology; a schematic sketch of its control loop is given below. The next section presents the performance model adopted by the dynamic resource allocation and analysis models. Section V formalizes the dynamic resource allocation problem by means of a non-linear optimization model.
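The sketch below is a schematic rendering of this control loop under our own assumptions: a plain moving average stands in for the workload forecasting methods of [1], the optimization model is stubbed with a naive equal-share policy, and the trigger threshold is an arbitrary placeholder.

import statistics

def predict_arrival_rates(history):
    """Short-term workload predictor: a per-class moving average, standing in
    for the forecasting methods of [1]."""
    return [statistics.mean(h) for h in history]

def optimize_allocation(arrival_rates):
    """Stub for optimization model 1 (Section V): capacity fractions f_i and
    admitted throughputs X_i for the next period. Naive equal-share policy."""
    n = len(arrival_rates)
    return [1.0 / n] * n, list(arrival_rates)

def control_loop_step(history, violation_ratio, threshold=0.05):
    """One period Delta_1 of the dynamic resource allocation step."""
    lam = predict_arrival_rates(history)        # predicted Lambda_i
    f, x = optimize_allocation(lam)             # fractions to assign, requests to admit
    run_analysis = violation_ratio > threshold  # trigger the analysis step when SLA
    return f, x, run_analysis                   # violations/rejections grow too frequent

# Illustrative example: two WS classes, three past periods of arrival rates
print(control_loop_step([[9.0, 11.0, 10.0], [4.0, 6.0, 5.0]], violation_ratio=0.08))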
IV. SYSTEM PERFORMANCE MODEL

This section presents an analytical queuing model to predict the performance metrics of each virtual machine. This model is a core component of the optimization solution presented in Section V to solve the short-term dynamic resource allocation problem. More precisely, the goal is to estimate the probability that a service invocation response time violates the SLA contract of the corresponding WS class, given the (accepted) service load (i.e., the WS class throughput) and the fraction of the total physical capacity assigned to the VM hosting the WS class.

As a first level of abstraction, we model each VM as a single queue. This scenario provides a proof of concept, allowing us to evaluate the applicability and effectiveness of our mechanism in several configurations. We leave the extension to managing each of the VM's devices (CPU, disks, etc.) independently for future work. Moreover, the only assumption we make regarding the specific applications running on the system is that WS invocations for each class arrive at the system according to a Poisson process, as in most previous work [10], [15]. We leave for future work the evaluation of other traffic patterns that are specific to different applications. Thus, each VM is modeled as an M/G/1 open queue. We also do not assume any specific scheduling discipline. Either FCFS or processor sharing (PS) could be used, as both disciplines have frequently been considered reasonable abstractions for transactional service centers [11], [15].

The main notation used in the performance model (as well as in the optimization model) is shown in Table I. We assume that invocations from the same WS class i (i = 1..N) are statistically indistinguishable, thus having the same average service demand on a physical reference system, Di^P. Note that Di^P can be estimated in a pre-production environment, which we assume has capacity 1. We denote with P the ratio between the total physical capacity of the service center environment and the pre-production environment. For the sake of simplicity, in the following we refer to P as the total physical capacity of the service center. We use the term Di^V(1) to refer to the average service demand on the virtualized environment when WS class i is the only one running on it. Di^V(1) can be estimated by inflating Di^P by the overhead OH introduced by the virtualization layer. This overhead is evaluated by continuously monitoring the system, and is assumed to be one of the outputs of the short-term workload predictor. The average service demand of class i WS invocations on the virtualized environment with multiple VMs can then be computed by inflating Di^V(1) by the capacity assigned to VM i, that is:

Di^V(N) = (Di^P · OH) / (fi P) = Di^V(1) / (fi P)    (1)

TABLE I
NOTATION

SLA Parameters
  wi        price for class i WS invocations
  Ri^SLA    response time threshold guarantee for class i WS invocations

System Parameters
  N         number of hosted WS classes (as well as of VMs)
  νi        maximum utilization planned for VM i (0 < νi < 1)
  Di^P      average service demand for class i WS invocations on a pre-production system
  Di^V(1)   average service demand for class i WS invocations on the virtualized environment with a single VM
  Di^V(N)   average service demand for class i WS invocations on the virtualized environment with multiple VMs
  C         cost per time unit associated with the use of the total system resources
  P         service center environment total physical capacity
  OH        virtualization overhead (OH > 1)

Outputs from the Short-Term Workload Predictor
  Λi        predicted arrival rate of class i WS invocations for the next period

Decision Variables
  Xi        class i WS invocations throughput for the next period
  fi        capacity fraction assigned to VM i for the next period

Estimating the probability that a service invocation violates its response time threshold requires knowledge of the distribution of class i response times. Exact expressions for the response time distribution exist only for some types of queues that do not directly apply to the system considered. Moreover, some of the available expressions are quite complex, limiting the real-time applicability of our mechanism. On the other hand, Markov's Inequality [12], [8] provides an upper bound on the probability that the response time of a class i WS invocation, Ri, exceeds a threshold r. This upper bound depends only on the average response time experienced by class i WS invocations, E[Ri], and can be easily computed as P(Ri > r) ≤ E[Ri]/r. Although it might provide somewhat loose upper bounds [2], we choose the simple and computationally efficient Markov's Inequality as a first approximation of the response time probability distribution. We are currently investigating alternative approximations (e.g., Chebyshev's Inequality [12]), which depend on more complex calculations (e.g., the response time variance) or impose restrictions on the type of queuing model used. In order to use Markov's Inequality, we first compute the average response time for class i WS invocations as follows [11]:

E[Ri] = Di^V(N) / (1 − Di^V(N) Xi) = Di^V(1) / (fi P − Di^V(1) Xi)    (2)
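For illustration, equations (1) and (2) translate directly into code; the sketch below is ours, with made-up parameter values rather than data from the paper's experiments.

def demand_multi_vm(d_phys: float, oh: float, f: float, p: float) -> float:
    """Equation (1): D_i^V(N) = (D_i^P * OH) / (f_i * P)."""
    return d_phys * oh / (f * p)

def mean_response_time(d_vn: float, x: float) -> float:
    """Equation (2): E[R_i] = D_i^V(N) / (1 - D_i^V(N) * X_i), valid while the
    utilization D_i^V(N) * X_i stays below 1 (queue in equilibrium)."""
    utilization = d_vn * x
    assert utilization < 1.0, "VM would saturate"
    return d_vn / (1.0 - utilization)

# Illustrative values: D_i^P = 0.01 s, OH = 1.3, f_i = 0.25, P = 10, X_i = 20 req/s
d_vn = demand_multi_vm(0.01, 1.3, 0.25, 10.0)
print(d_vn, mean_response_time(d_vn, 20.0))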
We note that average response times for different WS classes are independent and separately computed. This is only possible because the virtualization mechanism prevents contention for resources, providing performance isolation for hosted services. We then use expression (2) to estimate the probability that a class i WS invocation violates its response time SLA by applying Markov's Inequality as follows:
P(Ri > Ri^SLA) ≤ min( E[Ri] / Ri^SLA, 1 )    (3)

Note that the average response time for WS class i and, thus, our estimate of its response time violation probability depend on the capacity fraction fi assigned to VM i as well as on its achieved throughput Xi, the two decision variables in our optimization model. The throughput of VM i is constrained not only by the (predicted) arrival rate of class i WS invocations, Λi, but also by the maximum arrival rate VM i can support before saturation (utilization equal to 100%), i.e., Xi < 1/Di^V(N) (or, equivalently, Xi < fi P / Di^V(1)). In order to avoid the performance instability common to systems running close to saturation, we introduce an upper bound νi (0 < νi < 1), up to which the utilization of VM i is allowed to grow, i.e., Di^V(N) Xi ≤ νi.
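A small sketch (ours, with illustrative numbers) of the violation probability estimate of equation (3) and of the utilization-bounded throughput cap:

def violation_probability_bound(mean_rt: float, r_sla: float) -> float:
    """Equation (3): P(R_i > R_i^SLA) <= min(E[R_i] / R_i^SLA, 1)."""
    return min(mean_rt / r_sla, 1.0)

def max_admissible_throughput(d_v1: float, f: float, p: float, nu: float) -> float:
    """Largest X_i keeping VM i utilization at or below nu_i:
    X_i <= nu_i * f_i * P / D_i^V(1)."""
    return nu * f * p / d_v1

# Illustrative: D_i^V(1) = 0.013 s, f_i = 0.25, P = 10, nu_i = 0.6, R_i^SLA = 0.02 s
mean_rt = 0.0052 / (1.0 - 0.0052 * 20.0)  # E[R_i] from equation (2) at X_i = 20
print(violation_probability_bound(mean_rt, 0.02))         # ~0.29
print(max_admissible_throughput(0.013, 0.25, 10.0, 0.6))  # ~115.4 req/s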
V. THE SHORT-TERM RESOURCE ALLOCATION PROBLEM

The short-term resource allocation problem maximizes the provider's net revenue over a control horizon of length T or, equivalently, minimizes the sum of revenue losses and resource costs:

min Σ_{i=1..N} T { [wi (Λi − Xi) + wi P(Ri > Ri^SLA) Xi] + C fi }

The quantities (Λi − Xi)T and P(Ri > Ri^SLA) Xi T are equal to the numbers of SLA violations due to the admission control system and to response time violations, respectively, in the control horizon. The term C fi T indicates the cost associated with serving class i WS invocations during the next control interval. After substituting equations (2) and (3), we obtain

Σ_{i=1..N} wi Λi T + Σ_{i=1..N} T { −wi Xi + C fi + wi Xi min( Di^V(1) / [Ri^SLA (fi P − Di^V(1) Xi)], 1 ) }.

Since the term Σ_{i=1..N} wi Λi T is a constant with respect to the decision variables fi and Xi, the optimization problem can be formulated as:

P1)  min Σ_{i=1..N} { wi ( min( Di^V(1) / [Ri^SLA (fi P − Di^V(1) Xi)], 1 ) − 1 ) Xi + C fi }

subject to:

  Di^V(1) Xi ≤ νi fi P,   ∀i    (5)
  Xi ≤ Λi,                ∀i    (6)
  Σ_{i=1..N} fi ≤ 1             (7)
  Xi ≥ 0, fi ≥ 0,         ∀i

Constraint family (5) entails that the overall utilization of the system resources dedicated to serving class i WS invocations is below the planned threshold νi, and guarantees that the queuing network model is in equilibrium. Constraint family (6) entails that class i throughput is less than or equal to the incoming workload, while constraint (7) guarantees that at most 100% of the capacity of the data center is used in the next period.

When Xi > 0, i.e., some WS class i invocations are served, and the class i average response time is greater than or equal to its response time SLA (i.e., E[Ri] ≥ Ri^SLA), Markov's Inequality may yield a high probability of a constraint violation for WS class i. Furthermore, since service provision requires the allocation of system resources to the corresponding VM (i.e., Xi > 0 implies fi > 0), those resources are wasted in serving class i WS invocations with no profit for the provider. Hence, if Xi > 0, we also have to guarantee that the average response time of class i is strictly lower than the SLA threshold. By equation (2), this requirement can be expressed by the conditional constraints (11) below; under these constraints the min term in P1) always evaluates to its first argument, and the problem can be restated as:

P2)  min Σ_{i=1..N} { wi ( Di^V(1) / [Ri^SLA (fi P − Di^V(1) Xi)] − 1 ) Xi + C fi }

subject to:

  Di^V(1) Xi ≤ νi fi P,   ∀i    (8)
  Xi ≤ Λi,                ∀i    (9)
  Σ_{i=1..N} fi ≤ 1             (10)
  Xi > 0 ⟹ fi ≥ Di^V(1)/(Ri^SLA P) + Di^V(1) Xi / P,   ∀i    (11)
  Xi ≥ 0, fi ≥ 0,         ∀i

Observation 1. Problem P2) has a nonlinear objective function and linear constraints linked by logical conditions. The joint capacity allocation and admission control problem is difficult since the objective function is neither concave nor convex. In fact, the Hessian is block diagonal and, for each class i, the 2×2 block associated with (Xi, fi) has the form:

  [ 2 wi Di^V(1) P / (Ri^SLA (fi P − Di^V(1) Xi)^3) ] · [ Di^V(1) fi              −(fi P + Di^V(1) Xi)/2 ]
                                                        [ −(fi P + Di^V(1) Xi)/2   P Xi                  ]

whose determinant is strictly negative in the interior of the feasible region, so that each block is indefinite.

Problem P2) is therefore tackled heuristically, by iteratively alternating between an admission control sub-problem and a capacity allocation sub-problem, starting from an initial solution. The conditional constraints are managed as follows. If in the solution of the admission control problem we have determined Xi > 0 (Xi = 0), then the condition on fi which follows from constraints (11) is enforced (discarded) in the solution of the capacity allocation problem.
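For a small instance, problem P2) can also be prototyped with an off-the-shelf NLP solver once the set of served classes is tentatively fixed, which resolves the logical conditions in (11). The sketch below does this for a hypothetical two-class instance with SciPy's SLSQP; it is a proof of concept under our own assumptions (all data are made up), not the solution procedure of the paper.

import numpy as np
from scipy.optimize import minimize

# Hypothetical two-class instance (all data made up for illustration)
N, P, C = 2, 10.0, 2.0
w    = np.array([1.0, 0.5])      # w_i: price per served invocation
d    = np.array([0.013, 0.020])  # D_i^V(1): single-VM service demands
lam  = np.array([80.0, 40.0])    # Lambda_i: predicted arrival rates
rsla = np.array([0.05, 0.10])    # R_i^SLA: response time thresholds
nu   = np.array([0.6, 0.6])      # nu_i: utilization caps

def objective(z):
    x, f = z[:N], z[N:]
    bound = d / (rsla * (f * P - d * x))  # equation (3) estimate, first min argument
    return np.sum(w * (bound - 1.0) * x + C * f)

constraints = [
    {'type': 'ineq', 'fun': lambda z: nu * z[N:] * P - d * z[:N]},              # (8)
    {'type': 'ineq', 'fun': lambda z: 1.0 - np.sum(z[N:])},                     # (10)
    {'type': 'ineq', 'fun': lambda z: z[N:] - d / (rsla * P) - d * z[:N] / P},  # (11), both classes assumed served
]
bounds = [(0.0, lam[i]) for i in range(N)] + [(1e-6, 1.0)] * N                  # (9) and nonnegativity

z0 = np.concatenate([0.5 * lam, np.full(N, 1.0 / N)])  # feasible starting point
res = minimize(objective, z0, method='SLSQP', bounds=bounds, constraints=constraints)
print("X:", res.x[:N], "f:", res.x[N:], "objective:", res.fun)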
1) The Admission Control Sub-Problem: If the capacity allocation of system resources in the virtualized environment is fixed, i.e., the fi variables are fixed to values f̄i, then the admission control sub-problem, derived from the general problem P2) with fixed fi, can be formulated as follows:

P3)  min Σ_{i=1..N} { wi ( Di^V(1) / [Ri^SLA (f̄i P − Di^V(1) Xi)] − 1 ) Xi }

subject to constraints (8) and (9) with fi = f̄i, and Xi ≥ 0 for all i (the resource cost term Σ C f̄i is constant and can be dropped).
The partial derivatives of the objective function are given by:

  ∂/∂Xi [ wi ( Di^V(1) / (Ri^SLA (f̄i P − Di^V(1) Xi)) − 1 ) Xi ] = wi ( Di^V(1) f̄i P / [Ri^SLA (f̄i P − Di^V(1) Xi)^2] − 1 )
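Since P3) separates by class and each term is convex in Xi on the feasible interval, setting the derivative above to zero and clipping to the admissible range yields a closed-form candidate per class. The following sketch is our reading of that observation, with illustrative numbers; it is not code from the paper.

import math

def p3_throughput(d_i, f_bar, p, r_sla, lam_i, nu_i):
    """Per-class minimizer of w_i*(d_i/(r_sla*(f_bar*p - d_i*x)) - 1)*x.

    Setting the derivative w_i*(d_i*f_bar*p/(r_sla*(f_bar*p - d_i*x)**2) - 1)
    to zero (w_i > 0 cancels) gives x_star; the term is convex in x, so x_star
    is clipped to the admissible interval [0, min(lam_i, nu_i*f_bar*p/d_i)].
    """
    x_star = (f_bar * p - math.sqrt(d_i * f_bar * p / r_sla)) / d_i
    x_max = min(lam_i, nu_i * f_bar * p / d_i)
    return min(max(x_star, 0.0), x_max)

# Illustrative: D_i^V(1)=0.013, f_bar=0.5, P=10, R_SLA=0.05, Lambda=80, nu=0.6
print(p3_throughput(0.013, 0.5, 10.0, 0.05, 80.0, 0.6))  # admits all 80 req/s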