6th Annual IEEE Conference on Automation Science and Engineering, Marriott Eaton Centre Hotel, Toronto, Ontario, Canada, August 21-24, 2010

SuC2.5

Energy Optimization Policies for Server Clusters

Nidhi Singh∗ and Shrisha Rao†
∗ IBM India Private Limited
† International Institute of Information Technology - Bangalore

Abstract— We construct energy optimization policies for a server cluster, using statistical data analyses and demand prediction methodologies, with the aim of reducing the power consumption of the server cluster. In doing so, we monitor and analyze the historical time-series utilization data of a server cluster. Based on the analysis results, we develop predictions about the utilization of servers for future time periods. Using this predictive analysis, we formulate energy optimization rules, or policies, for the server cluster. These policies are then evaluated to determine whether they result in energy savings in the server cluster. A high-level implementation of this entire mechanism is provided, and a strategy for the inclusion of this mechanism in existing data center automation products is discussed.

Index Terms— energy optimization, data center automation, server clusters, policy creation, workload prediction

I. INTRODUCTION

The computing infrastructure of large organizations is growing day by day. The total number of installed servers in the U.S. was expected to be around 15.8 million by 2010, which is nearly three times the number of installed servers in 2000 [1]. An increasing need has been felt by business organizations and researchers alike to come up with scientific techniques to curb the increasing energy consumption of this computing infrastructure. Towards this end, many data monitoring and analysis techniques using data mining and machine learning algorithms have been proposed to extract meaningful information from power consumption data. Significant work has been done in the area of energy demand prediction as well. However, there is a lack of a coherent mechanism that glues together these energy-saving techniques, at a certain level of abstraction, with the purpose of generating energy-saving policies for server farms, clusters, and data centers. These policies, if formulated and enforced in IT systems, would help control the power consumption of these systems. Simultaneously, they would lead to optimization in the operations of the IT systems, which evolve as the policies are refined with time, resulting in short-term as well as long-term financial gains. In this paper, we address this gap by formulating a mechanism for constructing energy optimization policies for a server farm or cluster, and providing an automation framework/methodology for implementing this mechanism in real-life scenarios. The proposed mechanism to construct energy optimization policies can be decomposed into four stages: monitor, analyze, predict, and frame policies. In the first stage, we monitor the workload of servers in a cluster, and

978-1-4244-5449-5/10/$26.00 ©2010 IEEE

gather the workload monitoring observations periodically at fixed time intervals (e.g., every hour). In the second stage, we reduce this data set and perform cleansing routines so as to be able to extract distinct workload demand patterns using data mining techniques. Using these server utilization patterns, and the Potluck Problem [2] solution strategy, we predict the demand of workloads in the cluster for future time periods in the third stage. Given the workload demand patterns and their respective properties for the tth time instance, the predictors estimate the workload demand in the cluster for the (t + 1)th time instance. The predictions, if reasonably accurate, help in the optimal resource provisioning in a server cluster, and the predictions that consistently repeat themselves for a significant period of time are used as a basis for framing systematic energy optimization policies in the final stage. We briefly describe the syntax and semantics of these energy optimization policies, and validate them using data from an actual server cluster, to determine energy savings that could have accrued had these policies been enforced in the cluster. In our study, we found that the enforcement of these policies could have yielded 45–58% energy savings in the server cluster. In order to be suited for practical use, the proposed mechanism to create these energy-saving policies must be able to take into consideration any change in the workload demand pattern, and refine the policies accordingly. Hence, the process of monitoring, analyzing data, predicting workload demand, and framing policies is repeated over fixed time intervals (i.e., at each potluck dinner instance) such that the policies are refined in every iteration to reflect any changes in the workload demand patterns in the cluster. We augment the usefulness of the proposed mechanism by providing an automation framework for the implementation of the mechanism. 
We consider a high-level implementation of this framework and discuss how it can be incorporated in existing IT infrastructure automation products (e.g., Novell PlateSpin Orchestrate [3] and the Cisco VFrame Data Center solution [4]). This paper is structured as follows. Section II provides an overview of the existing literature related to this work. Section III describes the system model used in the data analysis, server utilization/workload prediction, and construction of energy optimization policies. Section IV describes the Monitor, Analyze, and Predict stages of the proposed mechanism. Section V starts with a description of the energy optimization policies, followed by construction of the policies, which are then validated for their efficacy in terms of energy savings. Section VI discusses how the proposed mechanism can be incorporated in existing software products, and Section VII provides an automated framework for the implementation of the proposed mechanism.

II. LITERATURE REVIEW

A significant number of papers have appeared in the literature examining server workload management and prediction strategies to reduce the power consumption of computing infrastructure. Gmach et al. [5] propose a methodology to characterize and predict workload demand patterns, based on which workload placement can be optimized. Moore et al. [6] present methods for automated analysis of workload data in data centers. They also propose workload playback methods which allow for emulation of sophisticated workloads in data centers. In [7], the same authors discuss temperature-aware placement of workloads in data centers, which can lead to a reduction in energy consumption. Heath et al. [8] illustrate the design of a heterogeneous server cluster that can adjust its configuration and request distribution so as to reduce its power consumption. Research has also been done in the area of policy-based management of servers and other IT infrastructure. Xiao et al. [9] describe a policy-based wireless sensor network management structure and propose energy-efficient policies for clustering and cluster routing. Xu et al. [10] propose policies that minimize the aggregate energy consumption of clusters deployed for executing embedded applications. Elnozahy et al. [11] present policies for cluster-wide power management in server farms, using voltage scaling and node vary-on/vary-off techniques. Along similar lines, Rusu et al. [12] develop and evaluate power management policies for systems with unpredictable workloads, using dynamic voltage scaling. Paleologo et al. [13] describe a methodology for identifying optimal power management policies for electronic systems, using finite-state stochastic models. These existing policy-construction methodologies result in ∼30% energy savings for actual workloads in IT systems.

In our work, we provide a syntax for energy optimization policies and, using better analyses and prediction techniques, frame policies for server clusters that yield 45–58% energy savings in an actual cluster. We also provide an automation framework using which the proposed mechanism can be implemented as a pluggable component in existing software products. This work is also an improvement over our earlier work [14], wherein we proposed a power reduction methodology that showed 20–35% energy savings in an IT system.

III. THE SYSTEM MODEL

A. Modeling Server Workload

We model the server workload based on three physical attributes of the servers: CPU, memory, and disk. Within the confines of these three components, server workload can be defined as the amount of work assigned to, or done by, a server in a given time period. Mathematically, we represent the server workload as a 3-tuple:

Wi = ⟨ci, di, mi⟩

where Wi = total workload on the ith server, ci = range of values which denote work assigned to, or done by, the CPU of the ith server, di = range of values which denote work assigned to, or done by, the disk of the ith server, and mi = range of values which denote work assigned to, or done by, the memory of the ith server. The aggregate workload on the server cluster can be represented by Σ(i=1 to n) Wi, where n is the number of servers in the cluster, and Wi is the workload on the ith server.

B. Modeling Server Utilization Pattern

We call the workload on the ith server for a specific time period the utilization pattern of the ith server. Formally, it can be defined as Ui = ⟨Wi, start, end⟩, where Ui is the utilization pattern of the ith server, Wi is the total workload on the ith server, start is the start date and time of the utilization pattern, and end is the end date and time of the utilization pattern. For instance, if the workload on the ith server happens to be ⟨0–20.5, 20–41, 0–75⟩ between 0200 hours and 0500 hours on February 1, 2010, we denote the utilization pattern as ⟨⟨0–20.5, 20–41, 0–75⟩, 201002010200, 201002010500⟩.

C. Modeling Power Consumption

We use the power model of Heath et al. [8], which relates the resource utilization of a server to its power consumption in the following manner:

Pi = Bi + Σr (Mr,i × Rr,i / Cr,i)

where Pi is the power consumed by the ith machine, Bi is the base power consumed by the ith machine when idle, Mr,i is a measure of the power of resource r of the ith machine at full utilization, Rr,i is the utilization of resource r on the ith machine, and Cr,i is the capacity of resource r on the ith machine. We assume that the variables in the power model take the following (realistic) values:
Bi = 100 watts,
Mc,i = 80 watts, where c denotes the CPU resource,
Mm,i = 24 watts, where m denotes the memory resource,
Md,i = 16 watts, where d denotes the disk resource,
Cc,i, Cd,i (in %) = 100, as the entire CPU and disk are available for usage when the server is in the power-on state,
Cm,i = 5000 MB; the memory utilization of each of the servers is normalized with respect to this maximum,
Rc,i, Rd,i, Rm,i = CPU and disk utilization (in %), and memory utilization (in MB), respectively.
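As an illustration, the power model with the assumed constants above can be sketched directly in code (a minimal sketch of our own; the resource names and the example utilization values are illustrative):

```python
# Sketch of the Heath et al. power model described above, using the constants
# assumed in the paper. Resource names are our own labels.

BASE_POWER = 100.0                                        # Bi, watts (idle)
MAX_POWER = {"cpu": 80.0, "mem": 24.0, "disk": 16.0}      # Mr,i, watts at full use
CAPACITY = {"cpu": 100.0, "mem": 5000.0, "disk": 100.0}   # Cr,i (% for cpu/disk, MB for mem)

def server_power(utilization):
    """Estimate power draw (watts) of one server.

    `utilization` maps resource name -> Rr,i (CPU/disk in %, memory in MB).
    """
    return BASE_POWER + sum(
        MAX_POWER[r] * utilization[r] / CAPACITY[r] for r in MAX_POWER
    )

# Example: 50% CPU, 2500 MB memory, 25% disk
# -> 100 + 80*0.5 + 24*0.5 + 16*0.25 = 156.0 watts
print(server_power({"cpu": 50.0, "mem": 2500.0, "disk": 25.0}))
```

An idle server (all utilizations zero) draws the base power Bi = 100 watts under this model, and a fully utilized server draws 100 + 80 + 24 + 16 = 220 watts.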


D. Modeling Server Workload Prediction As A Potluck Problem

In a server cluster, each server can be considered a “supplier” of resources, as it supplies CPU, memory, and disk resources to the user at a particular time instance. Along similar lines, each user of the server cluster demands these resources and hence can be considered a “consumer” of resources in the cluster. A time instance, t, at which the suppliers offer their resources and consumers utilize those resources, can be said to be a potluck dinner instance. At each dinner instance, each supplier individually decides what quantity of resources to contribute to the dinner instance, depending on the prediction it makes for the demand and supply at that time instance. There is no cooperation amongst the suppliers about the quantity of resources to be contributed to the dinner instance. The demand for server resources varies from one time instance to another owing to factors like additional workloads assigned to servers, trend variables based on day of the week and/or month, etc., making it difficult to predict. Let gi,t denote the quantity offered by supplier i at instance t, and dj,t the quantity demanded by consumer j at instance t. In the Potluck Problem, the aim is to have an “enjoyable” dinner instance wherein Σj dj,t ≈ Σi gi,t. In the context of a server cluster, this implies that a time instance at which the demand of server resources is equal, or nearly equal, to the supply of these resources would be considered an “enjoyable” potluck instance. This is the equilibrium or near-equilibrium state of the Potluck Problem. The Potluck Problem [2] is a repeated game of such instances, wherein at each instance the supplier refines its predictions based on how close its predictions were to the actual demand at that instance.
The repeat interval of the potluck dinner instance, in the context of a server cluster, can be modeled as a tunable parameter which can be set, for example, to every 10 minutes or every hour.
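To make the equilibrium condition concrete, a toy check for an “enjoyable” instance might look as follows (our own illustration; the tolerance and the demand/supply figures are arbitrary):

```python
# Toy check of the Potluck equilibrium condition: total demand at instance t
# is within a tolerance of total supply. The 5% tolerance is illustrative.

def is_enjoyable(demands, supplies, tolerance=0.05):
    """True when sum(dj,t) is within `tolerance` fraction of sum(gi,t)."""
    d, g = sum(demands), sum(supplies)
    return abs(d - g) <= tolerance * g

# Demand 95 units vs. supply 98 units -> near-equilibrium, "enjoyable"
print(is_enjoyable([30, 25, 40], [32, 28, 38]))   # True
```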

IV. FORMULATION OF THE MECHANISM FOR CREATING ENERGY-OPTIMIZATION POLICIES

A. Workload Data Monitoring and Analyses

We collect the workload data of a real server cluster for 9 months starting from April 2009. Each of the machines in the cluster has one or more software agents installed, whose function is to monitor the utilization of the physical components (i.e., CPU, disk, and memory) on that server. These monitoring observations were taken by the agent every hour and then appropriately aggregated to form daily, weekly, and monthly reports.
1) Data Preparation and Cleansing: The data set collected is huge (greater than 1 TB), as it contains hundreds of attributes, most of which are not relevant to our analysis of the server cluster workload. Therefore, we extract from this data only the subset of attributes required for the prediction of workload demand and the creation of energy optimization policies. In this subset, the CPU utilization for an hour is collected as the ratio of the number of seconds for which the CPU was in the active state to the total number of seconds in an hour. The memory utilization is the actual memory usage in MB, and the disk utilization is the actual disk space utilized as a percentage of the total disk space available. Before performing data analyses on the data set, we cleanse the data by filling in missing values and correcting inconsistencies.
2) Data Examination/Analyses: We proceed to determine the statistical properties of the time-series workload data of the server cluster using autocorrelation analysis. We determine the autocorrelation function (ACF) and partial autocorrelation function (PACF) [15] for the time-series data about CPU, memory, and disk. These are shown in Figures 1 and 2. The horizontal shaded line in each case indicates the critical values in this autocorrelation analysis.

Fig. 1. ACF and PACF for CPU Utilization time-series

Fig. 2. ACF and PACF for Memory Utilization time-series
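The sample ACF used in this analysis can be computed directly, as in the sketch below (our own illustration; the series values and lag count are placeholders, and a real analysis would use a statistics package). For N observations, critical values of ±0.144 correspond roughly to ±1.96/√N:

```python
# Sample autocorrelation function (ACF) sketch for a utilization time series.

def acf(series, max_lag):
    """Sample autocorrelation of `series` at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k)) / var
        for k in range(1, max_lag + 1)
    ]

hourly_cpu = [12, 15, 40, 38, 14, 13, 41, 39]   # hypothetical hourly CPU %
coefficients = acf(hourly_cpu, 3)
```

A lag-k coefficient outside the critical band indicates significant correlation with values k periods in the past, which is what makes the series usable for prediction.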


It can be seen that the autocorrelation coefficients for the CPU, memory, and disk data exceed the critical values of +0.144 and −0.144 for a significant number of time lags, indicating that this workload data is highly correlated with its past k period values, making it suitable for use in the prediction mechanism. In order to find other trends in the workload time-series data, we use moving averages of order n, where n = 2, 3, . . . , 10. Using this technique, we eliminate unwanted fluctuations in the data set and smooth the time series. Various trends in the usage of CPU, memory, and disk in the server cluster are revealed by this analysis. These trends


are then utilized in building predictors for workload in the server cluster.

B. Developing A Prediction Mechanism

We use the results of the data analyses, and the concept of the Potluck Problem with multiple goods [16], to estimate demand for the CPU, disk, and memory resources of each of the servers in the server cluster. This prediction mechanism can be divided into the following two steps:
1) Constructing predictors for final workload prediction: We construct a system of predictors which, given the historical time-series utilization data about the CPU, disk, and memory resources of a server for the tth time instance, estimate the demand of these resources for the (t + 1)th time instance. Using the insights gained from the data analyses, we build the predictors on the following lines:
a) Average hourly/daily/weekly/monthly resource utilization correlated to the average utilization of each of the other resources (for instance, CPU usage correlated to memory/disk usage).
b) Each resource's utilization autocorrelated to itself with varying time lags (in hours/days).
c) The nth weekday's resource utilization as related to x% of the average resource utilization of the past y weeks' nth weekday, where 0 ≤ x ≤ 100 is a tunable parameter.
d) Average hourly/daily/weekly/monthly resource utilization as related to the average hourly/daily/weekly/monthly CPU utilization of the past n alternate weeks.
e) Other variations of the above based on different statistical and time parameters.
2) Computing the final workload demand prediction: Once we have built these predictors, we determine the final estimate of the servers' workload in the cluster for a particular time instance, as suggested in the Potluck Problem. To do so, we determine the accuracy of each of the predictors using a training data set of 3 months, from July to September 2009, and assign weights to these predictors based on their accuracy rate.
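The accuracy-weighted combination of predictors can be sketched in the style of the weighted majority algorithm [17] as follows (a minimal sketch of our own; the function names, the penalty factor beta, and the tolerance are illustrative, not the paper's exact update rule):

```python
# Weighted-majority-style combination of workload predictors, in the spirit
# of [17]. Predictor estimates, beta, and tolerance are illustrative.

def weighted_majority_forecast(predictions, weights):
    """Weighted average of the individual predictors' demand estimates."""
    return sum(w * p for w, p in zip(weights, predictions)) / sum(weights)

def update_weights(weights, predictions, actual, beta=0.9, tolerance=0.1):
    """Multiplicatively penalize predictors that missed the actual demand
    by more than `tolerance` (as a fraction of the actual value)."""
    return [
        w * (beta if abs(p - actual) > tolerance * actual else 1.0)
        for w, p in zip(weights, predictions)
    ]

weights = [1.0, 1.0, 1.0]            # start with equal trust in each predictor
predictions = [40.0, 55.0, 90.0]     # three predictors' CPU-demand estimates (%)
estimate = weighted_majority_forecast(predictions, weights)
weights = update_weights(weights, predictions, actual=52.0)
# weights become [0.9, 1.0, 0.9]: the accurate middle predictor keeps its weight
```

Repeating this update at every potluck dinner instance lets accurate predictors dominate the final estimate over time.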
Thereafter, in accordance with the Potluck Problem solution, we use the weighted majority algorithm [17] to determine the final estimate of workload demand for the October to December 2009 time period. We derive hourly, daily, weekly, and monthly predictions about the servers' workload using this methodology. The daily predictions are shown in Figures 3, 4, and 5.

Fig. 3. CPU Usage Predictions

Fig. 4. Memory Usage Predictions

V. FRAMING ENERGY OPTIMIZATION POLICIES

Following our predictive analysis, we now frame the energy optimization policies for server clusters. We first describe the construction of energy policies in Section V-A. Thereafter, we create energy optimization policies for server clusters in Section V-B. These policies are then evaluated in Section V-C to determine the energy savings that might have accrued had they been enforced in the cluster.

A. Policy Description

An energy optimization policy p ∈ P, where P is the universe of all energy optimization policies, is given by a 4-tuple ⟨σ, a, j, δ⟩, where σ ⊂ S, S being the set of all servers in the cluster; a ∈ A, A being the set of all energy-saving actions that can be executed in the cluster; j is the time at which the energy-saving action should begin; and δ is the duration of the energy-saving action (in seconds). It may be noted that the energy-saving actions that can be executed on a machine depend on its hardware model; hence, the elements of A may differ from cluster to cluster, based on the hardware models of the servers in the cluster. In our server cluster, we considered the following energy-saving actions: A = {Shut-down, Switch-to-low-power-state, Enable-resource-sharing}. Consider a policy p = ⟨σ, a, 2009-11-21T04:00, 7200⟩, where σ = {s1, s2, . . . , sk}, with si being the ith server in the cluster, and a being the Shut-down energy-saving action. This policy is interpreted as: all the servers belonging to the set σ should be shut down starting November 21, 2009 at 0400 hours, for a duration of 2 hours. (The date format used for j is as per the ISO 8601 convention: a value like 2009-11-21T04:00 should be read as year 2009, 11th month, 21st day, 0400 hours.)

B. Constructing Policies

We follow a two-step approach for constructing energy optimization policies:
1) Baseline characterization of servers: We first characterize the servers in the cluster based on the following attributes:


TABLE I
SERVER CHARACTERISTICS FOR ENERGY POLICY p1

Criticality | Utilization Pattern | Functional Dependency
Low         | Ui                  | Independent or jointly-dependent

Fig. 5. Accuracy rate of CPU and Memory Predictions

a) Criticality of Workloads: This attribute can have three values: Low, Medium, and High. While designing a policy for a server, we take into consideration the criticality of the workload running on the server. For instance, if a server is running a critical application which is, say, accessed by end-users in a live production environment, then an energy-saving action like Shut-down may not be applicable to that server.
b) Resource Utilization Pattern: The resource (i.e., CPU, disk, and memory) utilization pattern of a server plays an important role in determining what energy-saving actions may be enforced on it. Server systems that handle low workloads consistently for a long period, and hence have low resource utilization, can easily be put in a low-power state, as opposed to servers with significantly higher workloads.
c) Functional Interdependencies: If a server is functionally dependent on other server(s), then we consider these servers as belonging to a logical group and form policies which can be enforced on the group as a whole. This kind of characterization is necessary because a power-saving policy, if applied to a server without taking into consideration the other servers dependent on it, might result in an overall unstable system state in the cluster.
2) Creating policies: We now frame energy optimization policies based on the data analyses and prediction results.
a) Energy Policy p1: We formulate an energy optimization policy by taking into consideration the weekly or n-weekly predictions of server utilization patterns in the cluster. Out of all the utilization predictions, we extract the following utilization pattern prediction: Ui = ⟨⟨0–4, 0–7, 0–280⟩, 2009-11-01T01:00, 2009-11-08T01:00⟩. This pattern is predicted to occur every week for 8 weeks. Out of all the servers estimated to have utilization pattern Ui, we select a subset σ of the servers that have the characteristics specified in Table I.
We then formulate a server-consolidation policy for the server cluster wherein the workload on


TABLE II
SERVER CHARACTERISTICS FOR ENERGY POLICY p2

Criticality   | Utilization Pattern | Functional Dependency
Low or Medium | Ui                  | NA

these servers is consolidated and placed on a subset of servers, α ⊂ σ, and the following policy is enforced on the servers belonging to (σ \ α): ⟨(σ \ α), a, 2009-11-01T01:00, 604800⟩, where a = Shut-down. This policy states that all the servers belonging to the set (σ \ α) should be shut down starting November 1, 2009 at 0100 hours, for a duration of 1 week. Going by our prediction results, this policy can be actuated every week for a period of 8 weeks.
b) Energy Policy p2: We take into consideration the hourly or n-hourly predictions of server utilization patterns in the cluster, which may repeat on a periodic basis. Out of all the utilization predictions, we extract the following utilization pattern prediction, which occurs for ∼77% of the servers in the cluster: Ui = ⟨⟨0–10, 5–15, 1–500⟩, 2009-11-21T02:00, 2009-11-21T06:00⟩. This pattern is predicted to occur every day for 25 days. Out of all the servers estimated to have utilization pattern Ui, we now select a subset, σ, of the servers that have the characteristics specified in Table II. We then formulate the following policy for the server cluster: ⟨σ, a, 2009-11-21T02:00, 14400⟩, where a = Switch-to-low-power-state. This policy states that all the servers belonging to σ should be put in the low-power state starting November 21, 2009 at 0200 hours, for a duration of 4 hours. Going by our prediction results, this policy can be actuated every day for a period of 25 days.
c) Energy Policy p3: In this policy, we analyze server utilization predictions to find a time instance for which the maximum number of servers are estimated to have relatively low utilization. In our cluster, one such time instance is 2009-12-20T02:00 – 2009-12-20T06:00, at which ∼82% of the servers have the utilization pattern ⟨⟨0–8, 0–25, 1–400⟩, 2009-12-20T02:00, 2009-12-20T06:00⟩. This utilization pattern is predicted to repeat every week (i.e., every Sunday) for 3 weeks.
Next, we extract a set of servers σ from


S, which satisfies the criteria specified in Table I. Thereafter, we create the policy for scheduled shut-down of the servers in the cluster: ⟨σ, a, 2009-12-20T02:00, 14400⟩, where a = Shut-down. This policy states that a shut-down operation should be scheduled for the servers belonging to σ at 2009-12-20T02:00, for 4 hours. This policy can be enforced on the server cluster with relative ease if the servers are primarily used for development and/or testing purposes (and not for hosting critical client applications) and hence are of low criticality. In our server cluster, ∼80% of the servers are development or test servers. Based on the prediction results, this policy can be actuated every week for 3 weeks.
d) Energy Policy p4: This policy deals with enabling the resources of servers to be shared so that the utilization of these resources can be improved. To create this policy, we first identify the servers which have had moderate resource utilization consistently from July to October 2009, and are predicted to have similar levels of utilization for the following two months. In our server cluster, we found the following utilization pattern appropriate for this policy: ⟨⟨0–15, 0–45, 1–1000⟩, 2009-07-10T00:00, 2009-10-10T00:00⟩. The servers with this utilization pattern are then taken as belonging to σ, based on the characteristics specified in Table II. We then create the following policy for the servers in σ: ⟨σ, a, 2009-11-01T00:00, 5184000⟩, where a = Enable-resource-sharing. This policy states that the CPU, disk, and memory resources of the servers in σ should be enabled for sharing by other systems/applications from November 1 to December 31, 2009. This policy may not lead to immediate tangible energy savings, but would certainly result in the optimization of the power consumption of the server cluster over a period of time.
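The 4-tuple ⟨σ, a, j, δ⟩ defined in Section V-A lends itself to a direct in-code representation. The sketch below is our own illustration: the action names follow the set A used in this paper, while the class and field names are hypothetical.

```python
# A policy p = <sigma, a, j, delta>, sketched as a Python dataclass.
from dataclasses import dataclass
from datetime import datetime

ACTIONS = {"Shut-down", "Switch-to-low-power-state", "Enable-resource-sharing"}

@dataclass(frozen=True)
class EnergyPolicy:
    servers: frozenset      # sigma, a subset of the cluster's servers
    action: str             # a, an element of ACTIONS
    begin: datetime         # j, ISO 8601 start time of the action
    duration_s: int         # delta, duration in seconds

# The example policy from Section V-A: shut down sigma for 2 hours
p = EnergyPolicy(
    servers=frozenset({"s1", "s2", "s3"}),
    action="Shut-down",
    begin=datetime.fromisoformat("2009-11-21T04:00"),
    duration_s=7200,
)
```

A policy engine could then iterate over such objects, filter by cluster-supported actions, and schedule each one at its `begin` time.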
In our framework, we have built policies based on aggregate utilization of CPU, disk, and memory of servers. But policies can also be built based on each of these physical components individually. For instance, CPU-specific energy-saving policies can be framed for servers that handle CPU-intensive workloads since those policies would be most effective in optimizing the power consumption of that server. Along similar lines, policies can be formulated for disk-intensive and memory-intensive servers based on individual disk, and memory resources, respectively. The energy optimization policies described above are derived from the workload demand in the server cluster which varies from time to time. Due to this, it becomes


important to refine existing energy policies, and to create new ones that cater to the changing workload demand in the cluster. Towards this end, the proposed mechanism should be executed repeatedly at specific time intervals, analogous to potluck dinner instances, such that in every time interval the data is monitored and analyzed to compute the predictions and, in accordance with these predictions, the policies are updated. This results in a system of dynamic energy-saving policies which evolves with time and takes into consideration changes in the workload patterns of the server cluster.

C. Policy Analyses

We now evaluate the energy optimization policies based on the energy savings that each of them could have yielded had it been enforced in the server cluster. We found that energy policy p1 yielded the maximum energy savings of 45–58%, as shown in Figure 6.
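As a back-of-the-envelope sanity check (our own, not a calculation from the paper), the energy a shutdown policy avoids consuming follows directly from the power model: the per-server draw times the number of servers shut down and the duration.

```python
# Rough estimate of energy not consumed while servers are shut down.
# The server count, per-server draw, and duration below are illustrative.

def shutdown_savings_kwh(num_servers, avg_power_w, duration_h):
    """Energy (kWh) saved by shutting `num_servers` down for `duration_h` hours."""
    return num_servers * avg_power_w * duration_h / 1000.0

# e.g., 50 lightly loaded servers drawing ~110 W each, off for one week:
print(shutdown_savings_kwh(50, 110.0, 24 * 7))   # 924.0 kWh
```

Comparing such an estimate against the cluster's baseline consumption gives the percentage savings figures reported for each policy.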

Fig. 6. Energy savings following actuation of policy p1

The energy savings (of 6–10%) that would have accrued from the implementation of energy policy p2 are shown in Figure 7.

Fig. 7. Energy savings following actuation of policy p2

The energy savings from the implementation of energy policy p3 are in the range of 2–3%, as the policy is enforced only once a week, for 4 hours. However, the utilization pattern underlying this policy can be used to identify time periods during which it would be best to perform maintenance operations in the cluster. That is, this policy can be implemented in the cluster as part of preventive maintenance of the server cluster and hence can be considered a “maintenance” policy. On


the other hand, policies p1 and p2 can be enforced in the cluster in order to reduce energy consumption by optimizing day-to-day operations, and hence can be considered “operational” policies. The policy p4 can be actuated in order to reduce the power consumption of the cluster in the medium to long run by improving the utilization of servers. Along similar lines, other policies can be constructed for energy optimization of server clusters. It may be noted that the energy savings shown in Figures 6 and 7 would differ from cluster to cluster, owing to different utilization patterns and workloads in the cluster.

VI. INCLUSION OF THE PROPOSED MECHANISM IN EXISTING SOFTWARE PRODUCTS

The proposed mechanism for creating energy optimization policies can be plugged into the following existing data center automation products:
1) Novell PlateSpin Orchestrate [3]: This product is used by data center administrators to manage and provision resources in order to optimize resource usage in the data center. It also provides an extensible policy definition language for constructing user-defined policies, using which workload placement on physical and virtual resources in the data center can be automated and controlled.
Proposed features and benefits: Our methodology can be used to automate the process of creating resource optimization policies in this product. The method constructs policies for provisioning resources in the data center which result in optimization of the energy consumption of the data center. Users can preview these automated policies, update them using the policy definition language, and thereafter enforce them in the data center. Specifically, the following new features would be added to PlateSpin Orchestrate using our methodology:
a) Addition of a workload analysis and prediction methodology for the data center.
b) Automation of the process of constructing resource provisioning policies.
c) A user interface for displaying the estimated energy savings that could accrue from each resource provisioning policy.
Implementation: The proposed mechanism to create energy optimization policies can be implemented as a pluggable component (using technologies like an Eclipse plug-in) in PlateSpin Orchestrate. The actuation mechanism for these policies can be built using custom automation scripts in the PlateSpin Orchestrate Server Portal.
2) Cisco VFrame Data Center solution [4]: This product offers a service-oriented provisioning model for data centers wherein physical resources can be provisioned (and re-provisioned) in an automated and policy-based manner, depending on application requirements.


Proposed features and benefits: The proposed mechanism can be used in this product to analyze each server's resource usage, predict server utilization patterns for future time periods, and accordingly construct policies that guide data center administrators in provisioning applications on the servers. The following new features would be added to the product owing to our methodology:
a) A new methodology for analyzing trends in each server's utilization in the data center.
b) Construction of application provisioning policies for each server, as well as for the data center as a whole, such that resource usage in the data center is optimized in terms of energy consumption.
c) Automated resource provisioning in the data center through enforcement of the policies that yield maximum energy savings.
Implementation: The VFrame Data Center solution can incorporate our methodology as an add-on feature that would yield significant energy savings in data centers. The algorithms for analysis and policy construction can be implemented as a component of this feature (using any common programming language). The enforcement of policies can be built and configured as an automated process in the product, such that minimal user intervention is required in provisioning applications to the servers.

VII. AUTOMATION OF THE MECHANISM TO FRAME ENERGY OPTIMIZATION POLICIES

The proposed mechanism can be automated as a software tool which can either be used in isolation or be integrated with a software product or service bundle. The following steps provide a high-level view of the automation approach:
1) Fetch the historical time-series server utilization data, with hourly granularity, from the repository. Let i = 1, 2, . . . , N index the servers in the cluster. This step requires a server monitoring infrastructure to be in place that collects the server utilization data and places it in a repository.
2) Reduce the data using data reduction techniques such as dimensionality reduction and attribute subset selection. Since the policies are based on the CPU, memory, and disk utilization of servers, we reduce the data set obtained in Step 1 to include data about only these physical attributes of the servers.
3) Execute data cleansing routines on the reduced data set to smooth out noise and fill in missing values. Aggregate this cleansed data over the time dimension to obtain daily, weekly, and monthly summarizations of the servers' utilization in the cluster. Let c_{i,j}, d_{i,j}, and m_{i,j} denote the CPU, disk, and memory utilization of the i-th server at the j-th time instance respectively, where each instance j spans one hour. For each server i we compute

\frac{\sum_{j=1}^{24} c_{i,j}}{24}, \quad \frac{\sum_{j=1}^{24} d_{i,j}}{24}, \quad \text{and} \quad \frac{\sum_{j=1}^{24} m_{i,j}}{24}

in order to obtain the daily workload summarization. Along similar lines, we compute the weekly and monthly summarizations of the servers' workload in the cluster.
4) Extract workload patterns for the servers from each of the c_{i,j}, d_{i,j}, and m_{i,j} utilization attributes. Statistical techniques such as linear/multivariate regression and correlation analysis can be used for the pattern extraction operations.
5) Create a set of predictors Q, based on the patterns obtained in Step 4, such that each predictor q ∈ Q can estimate the workload demand in the cluster for future time periods.
6) Calculate the final estimates of workload demand in accordance with the Potluck Problem solution, which makes use of the weighted majority algorithm [17]:

D_{i,t} = \frac{\sum_{q=1}^{k} (E_{i,q,t} \times F_{i,q,t})}{\sum_{q=1}^{k} E_{i,q,t}},

where D_{i,t} is the demand predicted by agent i at the t-th dinner instance, E_{i,q,t} is the weight maintained by agent i for the q-th predictor at the t-th dinner instance, and F_{i,q,t} is the demand estimated by agent i's q-th predictor at the t-th dinner instance.
7) Repeat Step 6 at every potluck dinner instance, and update the weights of all the predictors based on how close their workload demand estimates were to the actual demand at that instance.
8) Evaluate these final workload demand predictions, and frame energy optimization policies for the cluster based on the predictions that have relatively high accuracy. The actual construction of policies requires an interface through which a server cluster administrator can specify the criticality, functional dependency, and other attributes of the servers.
9) Build a software component to perform the operations required for actuation of the policies. The types of policies for which actuation can be performed by the software itself can be modeled as a configurable parameter.
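The core of Steps 5-7 — combining predictor estimates into a final demand forecast and then updating predictor weights — can be sketched as follows. This is a minimal illustrative sketch: the predictor estimates, initial weights, and the penalty factor beta are assumptions for illustration, not values or predictors from the paper.

```python
def predict_demand(estimates, weights):
    """Step 6: weighted combination of predictor estimates F with weights E,
    i.e. D = sum(E*F) / sum(E)."""
    return sum(e * f for e, f in zip(weights, estimates)) / sum(weights)

def update_weights(weights, estimates, actual, beta=0.5):
    """Step 7: weighted-majority-style update -- shrink each predictor's
    weight in proportion to how far its estimate was from the actual demand.
    The normalization by the worst error and beta=0.5 are assumptions."""
    worst = max(abs(f - actual) for f in estimates) or 1.0
    return [w * (beta ** (abs(f - actual) / worst))
            for w, f in zip(weights, estimates)]

# Hypothetical predictors: e.g. yesterday's average, last week's average, a trend.
estimates = [62.0, 55.0, 70.0]   # % CPU demand estimated by each predictor
weights = [1.0, 1.0, 1.0]        # initial, equal weights

demand = predict_demand(estimates, weights)            # final estimate D_{i,t}
weights = update_weights(weights, estimates, actual=60.0)
```

Repeating the two calls at every "dinner instance" reproduces the loop of Steps 6-7: accurate predictors retain high weight and come to dominate the final estimate.
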

VIII. CONCLUSION AND FUTURE WORK

Energy optimization in server clusters is a critical issue, which we have addressed in this paper using predictive analysis. We provided a method to predict workload demand in a server cluster using the Potluck Problem solution strategy. Based on analyses of the workload demand predictions, we framed energy optimization policies for the cluster which yield significant energy savings over a period of time. We also evaluated these policies on an actual computing cluster to determine the efficacy of our approach. Thereafter, we discussed how the proposed mechanism can be incorporated into existing data center automation products, and suggested a high-level implementation for automating the proposed mechanism. The policies sketched can, in future
work, be refined to include more of the intricacies involved in the actuation of realistic policies in a server cluster. Those policies that stand out in their contribution to the energy savings in the cluster can be studied in more detail, so as to create a generalized policy-based framework for managing power in server clusters.

IX. ACKNOWLEDGEMENT

The work of author S. Rao was supported in part by an IBM Faculty Award.

REFERENCES

[1] US EPA, "Report to Congress on server and data center energy efficiency," in Public Law 109-431, U.S. Environmental Protection Agency ENERGY STAR Program, 2007.
[2] P. K. Enumula and S. Rao, "The Potluck Problem," Economics Letters, pp. 10-12, Apr. 2010, doi:10.1016/j.econlet.2009.12.011.
[3] Novell Platespin Orchestrate Product Guide, Version 2.0.2, 2nd ed. [Online]. Available: http://www.novell.com/products/orchestrate/
[4] Cisco VFrame Data Center Administration Guide, Version 1.2, 1st ed. [Online]. Available: http://www.tinyurl.com/vframecisco
[5] D. Gmach, J. Rolia, L. Cherkasova, and A. Kemper, "Workload analysis and demand prediction of enterprise data center applications," in IISWC '07: Proceedings of the 10th IEEE International Symposium on Workload Characterization, Washington, DC, USA, 2007, pp. 171-180.
[6] J. Moore, J. Chase, K. Farkas, and P. Ranganathan, "Data center workload monitoring, analysis, and emulation," in CAECW-8: Proceedings of the 8th Workshop on Computer Architecture Evaluation using Commercial Workloads, Feb. 2005.
[7] J. Moore, J. Chase, P. Ranganathan, and R. Sharma, "Making scheduling cool: Temperature-aware workload placement in data centers," in ATEC '05: Proceedings of the Annual Conference on USENIX Annual Technical Conference, Berkeley, CA, USA, 2005, p. 5.
[8] T. Heath, B. Diniz, E. Carrera, W. Meira, and R. Bianchini, "Energy conservation in heterogeneous server clusters," in PPoPP '05: Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, New York, NY, USA, 2005, pp. 186-195.
[9] M. Xiao, G. Chen, and D. Xiao, "A policy-based energy efficient clustering scheme for wireless sensor networks," in SNPD '07: Proceedings of the Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, Washington, DC, USA, 2007, pp. 689-694.
[10] R. Xu, D. Zhu, C. Rusu, R. Melhem, and D. Mosse, "Energy-efficient policies for embedded clusters," in LCTES '05: Proceedings of the 2005 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems, New York, NY, USA, 2005, pp. 1-10.
[11] E. N. Elnozahy, M. Kistler, and R. Rajamony, "Energy-efficient server clusters," in Workshop on Power-Aware Computer Systems (PACS02), Cambridge, MA, USA, Feb. 2002, pp. 179-196.
[12] C. Rusu, R. Xu, R. Melhem, and D. Mosse, "Energy-efficient policies for request-driven soft real-time systems," in ECRTS '04: Proceedings of the 16th Euromicro Conference on Real-Time Systems, Washington, DC, USA, 2004, pp. 175-183.
[13] G. A. Paleologo, L. Benini, A. Bogliolo, and G. D. Micheli, "Policy optimization for dynamic power management," in DAC '98: Proceedings of the 35th Annual Design Automation Conference, New York, NY, USA, 1998, pp. 182-187.
[14] N. Singh and S. Rao, "Modeling and reducing power consumption in large IT systems," in SysCon '10: Proceedings of the Fourth IEEE International Systems Conference, Vancouver, Canada, Apr. 2010.
[15] G. Box, G. M. Jenkins, and G. Reinsel, Time Series Analysis: Forecasting and Control, 3rd ed. Prentice Hall, 1994.
[16] N. Singh and S. Rao, "The Potluck Problem with Consumers' Choice Behavior," in CASE '09: Proceedings of the Fifth Annual IEEE International Conference on Automation Science and Engineering, Aug. 2009, pp. 328-333.
[17] N. Littlestone and M. K. Warmuth, “The weighted majority algorithm,” Information and Computation, vol. 108, no. 2, pp. 212–261, Feb. 1994.