Meta-Learning based Architectural and Algorithmic Optimization for Achieving Green-ness in Predictive Workload Analytics

Nidhi Singh∗ and Shrisha Rao†
International Institute of Information Technology-Bangalore (IIIT-B), Bangalore, India

[email protected]∗, [email protected]†

Categories and Subject Descriptors
D.2.11 [Software Engineering]: Software Architectures; I.2.1 [Artificial Intelligence]: Applications and Expert Systems; I.2.6 [Artificial Intelligence]: Learning

General Terms
Algorithms, Design, Performance, Experimentation

Keywords
Meta-Learning, Predictive Workload Analytics, Energy Optimization

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SAC'13 March 18-22, 2013, Coimbra, Portugal.
Copyright 2013 ACM 978-1-4503-1656-9/13/03 ...$15.00.

ABSTRACT
Predictive workload analytics for server systems has been the focus of recent research in energy-aware computing, with many algorithmic and architectural techniques proposed to analyze, predict, and optimize server workloads in order to build energy-aware IT systems. These techniques, though effective in optimizing server workloads, often ignore the green-ness aspect 'in' the technique itself and incur heavy computational costs in their operations. In this paper, we propose a meta-learning based architecture for building a server workload prediction mechanism, using which the computational cost of holistic predictive workload analytics can be optimized and green-ness 'in' the analytics can be achieved. We also present an algorithmic optimization of the proposed meta-learning architecture for handling concept drift in server workloads, thereby also achieving improved green-ness 'by' the analytics. Our experiments show that the proposed meta-learning based architecture substantially reduces the total computational cost of the workload prediction mechanism, with only a minor decrease of 0.05–1.3% in accuracy, and that the proposed algorithmic optimization improves the accuracy of the workload prediction mechanism in concept drift scenarios by 8.1%.

1. INTRODUCTION

Predictive workload analytics for server systems has received wide attention from the research community in the recent past, owing to its applications in optimizing server utilization and reducing energy consumption in server systems. Many algorithmic techniques and software architectural designs for predictive analysis of server workloads have been proposed in the literature [6, 7, 8]. While these techniques and architectures are effective in analyzing and predicting server workload, they typically do not take into consideration the green-ness of the technique and/or the architecture itself. In this paper, we address this issue and propose a meta-learning based architecture for predicting server workload that focuses on achieving green-ness in the architecture without significantly compromising prediction accuracy. We also present an algorithmic optimization of the proposed meta-learning architecture that aims to increase robustness in handling complex server workload scenarios, thereby achieving green-ness by the architecture. Since the workload prediction mechanism forms the building block of any predictive workload analytics, achieving green-ness in and by the server workload prediction mechanism leads to green-ness in the holistic predictive workload analytics.

We consider a baseline architecture for the server workload prediction mechanism in which the historical workload time-series of each server is analyzed and, based on the analysis results, a prediction model is developed for each server. Each prediction model is re-trained periodically in order to cope with recent trends in server workloads. This architecture, which we hereafter call the model-based architecture (MB-A), yields high workload prediction accuracy but suffers from a serious drawback: it involves a large number of expensive computations that are periodic in nature, as a result of which the total number of computations involved in MB-A increases rapidly with time.

In this paper, we propose an architectural optimization of MB-A with the objective of achieving green-ness in the architecture by reducing its total computational cost. In particular, we use an ensemble-based architectural design in which, on the lines of Weighted Majority [10], we train an ensemble of prediction models for each server, and periodically learn and update the meta-attribute of each prediction model. This architecture, which we hereafter call the meta-learning architecture (ML-A), though it involves a higher number of computations than MB-A in the initial phases, substantially reduces the number of expensive periodic computations involved in the later phases by alleviating the need to re-train each prediction model as a whole.

We also present an algorithmic optimization of the proposed meta-learning architecture in order to achieve green-ness by the architecture through robust workload predictions that can cope with concept drift in server workloads. A concept drift [3] is said to occur in server workload when there is a change in the underlying distribution of the workload time-series data due to events like a change in the 'dominant' application of servers (e.g., a database server reconfigured as a web server), allocation/de-allocation of servers, etc. In order to handle such drifts, we algorithmically optimize ML-A on the lines of Dynamic Weighted Majority [9] such that the ensemble of prediction models associated with a server evolves by adding and removing prediction models in response to a concept drift in the workload of the server.

We perform a detailed comparison study of the computational cost of the proposed meta-learning architecture and that of the baseline model-based architecture. We show that ML-A substantially reduces the computational cost of the server workload prediction mechanism with only a minor decrease of 0.05–1.3% in prediction accuracy. We also demonstrate the effectiveness of the algorithmic optimization of ML-A in handling concept drifts, and show that the workload prediction accuracy in concept drift scenarios can be improved by 8.1% using the proposed algorithmic optimization.

The rest of the paper is organized as follows. We describe the baseline model-based architecture and the proposed meta-learning architecture in Section 2, and the proposed algorithmic optimization for handling concept drift scenarios in Section 3. We compare the computational costs of the model-based and meta-learning architectures, and present experimental results of the proposed architectural and algorithmic optimizations, in Section 4. We review the state of the art in this domain in Section 5, and conclude in Section 6.

2. ARCHITECTURAL OPTIMIZATION FOR GREEN-NESS IN WORKLOAD PREDICTION MECHANISM

In this section, we describe the design of the baseline model-based architecture and the proposed architectural optimization thereof, i.e., the meta-learning based architecture, for the workload prediction mechanism. We also present a high-level implementation of each of these architectures.

Figure 1: Baseline Model-based Architecture (MB-A)

2.1 Model-based Architecture (MB-A) for Workload Prediction Mechanism

The baseline model-based architecture has three phases, namely the initialization or training phase, the prediction phase, and the incremental updation phase. In Fig. 1, we illustrate each of these phases (in different colored lines) along with the structural components of MB-A. In this figure, the workload datamart represents a centralized repository which contains the workload time-series data of all servers in consideration. In the initialization phase, the historical workload data of each server i in the datamart is individually analyzed by the Pattern Analyzer component in order to identify the statistical properties and patterns in the workload of the server. Based on the analysis results of the Pattern Analyzer for server i, a prediction model m_i is formed for the server.

The prediction phase begins when a workload prediction is requested for a future time step t' by the end-users of the workload prediction mechanism. The end-users could be components of a predictive workload analytics system, like a reporting engine that reports predicted utilization of servers, or a recommendation engine that builds energy recommendations based on workload predictions. In this phase, the prediction model m_i of each server i computes the predicted workload of the server for target time step t'. After every k time steps, the incremental updation phase is entered, wherein the Prediction Model Updater component re-trains and updates each prediction model m_i based on the actual workload of server i stored in the datamart for the past k time steps. Here, k can be interpreted as the periodicity of the re-training and updation of prediction models. Typically, k is of hourly granularity and is set to small values like 2 or 6, because the predictive workload analytics system that uses the server workload predictions for, say, building energy recommendations needs to abide by Service Level Agreements in which the recency of the workload predictions (which form the basis of analytics like energy recommendations) is bounded by a fixed value of k. It may be noted that re-training a prediction model typically involves re-estimating and updating its parameters, which, in turn, requires computationally expensive operations for any nontrivial workload prediction model.

Implementation: A high-level implementation of MB-A is shown in Algorithm 1, which takes the following inputs: a set of servers S, a time period T for historical data analysis (e.g., the last 30 days), and a target time step t' for which workload predictions are desired for all servers in S. In the initialization or training phase, we train a prediction model m_i for each server i in S, using the historical workload data of server i in the time period T (Lines 1–1.1). Each trained model m_i is added to a vector m (Line 1.2), which is stored for use in the prediction and incremental updation phases of the algorithm. When workload predictions of the servers in S are requested for target time step t', i.e., in the prediction phase, we retrieve the trained model m_i for each server i (Lines 2–2.1), and compute the predicted workload p_i^{(t')} of server i for time step t' using the retrieved model m_i (Line 2.2). The predicted workload of each server is added to the vector p^{(t')} (Line 2.3), which is returned as output to the end-users. After every k time steps, we enter the incremental updation phase (Line 3), wherein we retrieve the prediction model m_i for each server i in S (Line 3.1.1) and re-train it using the workload data of server i for the last k time steps (Line 3.1.2).

Algorithm 1: Workload Prediction using Baseline Model-based Architecture
  Input: S: Set of servers
  Input: T: Time period for historical data analysis
  Input: t': Target time step for which workload prediction is required
  Output: p^{(t')} = \{p_i^{(t')}\}_{i=1}^{|S|}: Vector containing workload predictions of each server in S for time step t'
  begin
    m ← ∅
    Training phase:
    1. For each server i ∈ S
       1.1 Create and train model m_i based on server workload data for time period T
       1.2 Add m_i to m
    Prediction phase:
    2. For each server i ∈ S
       2.1 Retrieve m_i from m
       2.2 Compute p_i^{(t')} using m_i
       2.3 Add p_i^{(t')} to p^{(t')}
    Incremental Updation phase:
    3. After every k time steps
       3.1 For each server i ∈ S
           3.1.1 Retrieve m_i from m
           3.1.2 Re-train model m_i using workload data for past k time steps
  end

Figure 2: Proposed Meta-Learning Architecture (ML-A)
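The MB-A loop of Algorithm 1 can be sketched in a few lines of Python. The following is an illustrative rendering only: the `MovingAverageModel` is a toy stand-in for the paper's actual prediction models, and all names here are hypothetical.

```python
# Illustrative sketch of the MB-A loop (Algorithm 1); a toy moving-average
# model stands in for any per-server prediction model.

class MovingAverageModel:
    """Toy prediction model: forecasts the mean of its training window."""
    def __init__(self):
        self.mean = 0.0

    def train(self, workload):   # Lines 1.1 / 3.1.2: (re-)estimate parameters
        self.mean = sum(workload) / len(workload)

    def predict(self):           # Line 2.2: compute predicted workload
        return self.mean

def mba_predict(history, k, steps):
    """history: {server_id: [workload...]}; re-trains each model every k steps."""
    models = {i: MovingAverageModel() for i in history}   # m <- empty; Line 1
    for i, series in history.items():
        models[i].train(series)                           # training phase
    predictions = []
    for t in range(steps):
        # prediction phase: one prediction per server per time step (Line 2)
        predictions.append({i: m.predict() for i, m in models.items()})
        if (t + 1) % k == 0:                              # incremental updation
            for i, series in history.items():
                models[i].train(series[-k:])              # re-train on last k steps
    return predictions

preds = mba_predict({"s1": [10, 12, 11, 13], "s2": [40, 42, 41, 43]}, k=2, steps=3)
print(preds[0]["s1"])   # mean of s1 history -> 11.5
```

Note that every updation cycle re-trains the whole model per server, which is exactly the periodic cost that ML-A is designed to avoid.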

2.2 Meta-learning Architecture (ML-A) for Workload Prediction Mechanism

The meta-learning based architecture for workload prediction has the same three phases as MB-A, each of which is illustrated in Fig. 2 (in different colored lines). During the training phase, the Pattern Analyzer component analyzes the historical workload of each server and, based on its analysis results, an ensemble of prediction models or base learners is formed for each server i. A base learner is a prediction model which, given historical server workload data, can predict workload for future time steps and which, in addition, has a meta-attribute associated with it (which we hereafter call its weight) indicating its accuracy in predicting server workload. When a workload prediction is requested for server i, i.e., in the prediction phase, each base learner in the ensemble associated with server i individually computes the workload prediction of server i for the future time step. These workload predictions are then combined by the Compound Forecaster component to form the final or compound workload prediction of server i. Specifically, the Compound Forecaster uses the following ensemble combination rule of the Weighted Majority (WM) algorithm [10]:

    p_i^{(t)} = \frac{\sum_{j=1}^{L} w_{j,i}^{(t)} x_{j,i}^{(t)}}{\sum_{j=1}^{L} w_{j,i}^{(t)}}    (1)

where p_i^{(t)} denotes the compound workload prediction of server i for time step t, L denotes the number of base learners in the ensemble associated with server i, w_{j,i}^{(t)} denotes the weight of base learner b_{j,i} at time step t, and x_{j,i}^{(t)} denotes the workload prediction of server i as computed by base learner b_{j,i} at time step t. In this equation, the compound or final workload prediction p_i^{(t)} of server i at time step t is defined as the weighted average of the server workload predictions of server i computed by all base learners in the ensemble.

In the Incremental Updation phase, the weight of each base learner in the ensemble associated with a server is updated based on the extent of the prediction error made by the base learner. This is done by the Multiplicative Weight Updater component using the following weight updation rule of the WM algorithm:

    w_{j,i}^{(t+1)} = F \, w_{j,i}^{(t)}    (2)

where F is any factor that satisfies the condition:

    \beta^{|x_{j,i}^{(t)} - \rho_i^{(t)}|} \le F \le 1 - (1 - \beta)\,|x_{j,i}^{(t)} - \rho_i^{(t)}|, \quad 0 < \beta < 1    (3)

and \rho_i^{(t)} is the actual workload of server i at time step t. This weight updation rule decreases the weight of a base learner which mispredicted the server workload, by a factor of F, thereby reducing the relative influence of low-performing base learners on future workload predictions. The updated weights of the base learners are then normalized by the Multiplicative Weight Updater component to sum to one, using the following equation:

    w_{j,i}^{(t+1)} \leftarrow \frac{w_{j,i}^{(t+1)}}{\sum_{j=1}^{L} w_{j,i}^{(t+1)}}    (4)

This meta-learning based architecture relies on learning and updating the meta-attributes, i.e., weights, of the base learners in order to accurately predict server workload. The weight updation operation is significantly cheaper in terms of computations than re-training the prediction model itself (as is done in MB-A). Hence, even though ML-A incurs an additional initial computational cost due to training more than one base learner per server (as opposed to one prediction model per server in MB-A), it leads to a substantial reduction in the total computational cost of the workload prediction mechanism by avoiding re-training of the prediction models and learning/updating only their meta-attributes (as will be explained in detail in Section 4).
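A minimal Python sketch of Eqs. 1–4 follows. Since Eq. 3 only bounds F rather than fixing it, this sketch makes one admissible choice, F = β^{|x − ρ|} (the lower bound); the function names are illustrative, not from the paper.

```python
# Sketch of the ML-A weight machinery (Eqs. 1-4) for a single server.
# F = beta ** |x - rho| is one admissible factor under Eq. 3.

def compound_prediction(weights, predictions):
    # Eq. 1: weighted average of the base learners' predictions
    return sum(w * x for w, x in zip(weights, predictions)) / sum(weights)

def update_weights(weights, predictions, actual, beta=0.5):
    # Eq. 2 with F from Eq. 3: learners with larger error are demoted more
    updated = [w * beta ** abs(x - actual) for w, x in zip(weights, predictions)]
    total = sum(updated)                  # Eq. 4: normalize weights to sum to one
    return [w / total for w in updated]

weights = [1/3, 1/3, 1/3]                 # ensemble of L = 3 base learners
predictions = [0.50, 0.70, 0.90]          # x_{j,i}: normalized utilization forecasts
actual = 0.52                             # rho_i: observed workload

print(round(compound_prediction(weights, predictions), 3))
weights = update_weights(weights, predictions, actual)
print(weights[0] > weights[2])   # True: the most accurate learner gains weight
```

The key property is that an updation step touches only L scalar weights per server, instead of re-estimating model parameters as MB-A does.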

Implementation: We describe a high-level implementation of the meta-learning architecture in Algorithm 2, which takes the following inputs: a set of servers S, a time period T for historical data analysis, and a target time step t' for which workload predictions are desired for all servers in S. In the training phase, we create an ensemble E_i of base learners for each server i in S (Line 1.1), based on the analysis results of the workload of server i in the time period T. We train each base learner b_{j,i} in the ensemble E_i on the historical workload data in time period T (Line 1.2.1), and set the initial weights equally to 1/|S| (Line 1.2.2). In the prediction phase, each base learner b_{j,i} in the ensemble E_i associated with server i individually provides a workload prediction x_{j,i}^{(t')} of server i (Line 2.1). These predictions are compounded using the ensemble combination rule of WM in Eq. 1 to compute the final workload prediction of server i for the target time step t' (Line 2.2). The workload prediction of each server is added to the vector p^{(t')} (Line 2.3), which is returned as output to the end-users. In the incremental updation phase, we do not re-train the base learners in the ensemble E_i associated with server i, as was done in MB-A. Instead, after every k time steps, we only update the weight w_{j,i}^{(t)} of each base learner in the ensemble E_i associated with each server i, using Eq. 2 (Line 3.1.1). Thereafter, we normalize the updated weights of the base learners using Eq. 4 (Line 3.1.2).

Algorithm 2: Workload Prediction using Meta-Learning Based Architecture
  Input: Same as in Algorithm 1
  Output: Same as in Algorithm 1
  begin
    Training phase:
    1. For each server i ∈ S
       1.1 Create ensemble E_i based on workload data for time period T
       1.2 For each base learner b_{j,i} ∈ E_i
           1.2.1 Train b_{j,i} based on workload data for time period T
           1.2.2 Set w_{j,i}^{(t)} to 1/|S|
    Prediction phase:
    2. For each server i ∈ S
       2.1 Compute prediction x_{j,i}^{(t')} of each base learner b_{j,i} ∈ E_i
       2.2 Compound the workload predictions of the base learners in E_i as:
           p_i^{(t')} = \frac{\sum_{j=1}^{L} w_{j,i}^{(t)} x_{j,i}^{(t')}}{\sum_{j=1}^{L} w_{j,i}^{(t)}}
       2.3 Add p_i^{(t')} to p^{(t')}
    Incremental Updation phase:
    3. After every k time steps
       3.1 For each server i ∈ S
           3.1.1 Update the weight of each base learner b_{j,i} ∈ E_i as: w_{j,i}^{(t+1)} = F w_{j,i}^{(t)}
           3.1.2 Normalize the updated weights as: w_{j,i}^{(t+1)} = w_{j,i}^{(t+1)} / \sum_{j=1}^{L} w_{j,i}^{(t+1)}
  end

3. ALGORITHMIC OPTIMIZATION FOR GREEN-NESS BY WORKLOAD PREDICTION MECHANISM

Concept drift [3] is said to occur in server workloads when the underlying distribution of the workload time-series exhibits a significant change due to events like re-allocation of servers (which leads to changes in the ownership and usage of servers), or re-configuration of servers wherein the 'dominant' application of the servers is changed (e.g., a database server reconfigured as a web server). The model-based architecture tries to cope with such concept drift scenarios by updating the parameters of the prediction model for each server, which may not be sufficient in cases where the drifted concept cannot be approximated well by the functional form of the current prediction model and requires a prediction model of a different functional form altogether. For instance, a linear workload prediction model may not perform well, even with updated parameters, if the workload time-series drifts to a distinct and complex non-linear pattern. For such concept drift scenarios, the meta-learning architecture can easily be extended using Dynamic Weighted Majority (DWM) [9], an ensemble learning approach in which the ensemble of base learners is not kept constant, but is evolved by adding new base learners and removing low-performing ones based on the accuracy of the compound or final predictions as well as the accuracy of each individual base learner.

Algorithm 3: Dynamic Ensemble Updation to Handle Concept Drifts
  Input: S: Set of servers
  Input: θ: Threshold for minimum weight of a base learner
  Input: τ: Threshold for workload prediction accuracy of an ensemble
  begin
    Dynamic Ensemble Updation phase:
    1. After every k time steps
       1.1 For each server i ∈ S
           1.1.1 If w_{j,i}^{(t)} < θ for any b_{j,i} ∈ E_i
                 1.1.1.1 Remove base learner b_{j,i} from E_i
           1.1.2 Compute mean workload prediction accuracy Π_{i,k} of E_i for the past k time steps
           1.1.3 If Π_{i,k} < τ
                 1.1.3.1 Remove the base learner having minimum weight in E_i
           1.1.4 If η base learners were removed (in Steps 1.1.1 and 1.1.3)
                 1.1.4.1 Add η base learners to E_i
                 1.1.4.2 Set the initial weight of the new base learners to 1
                 1.1.4.3 Normalize the weights as: w_{j,i}^{(t+1)} = w_{j,i}^{(t+1)} / \sum_{j=1}^{L} w_{j,i}^{(t+1)}
  end
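The Dynamic Ensemble Updation phase of Algorithm 3 can be sketched for one server as follows. Base learners are reduced to (name, weight) pairs, and `make_learner` is a hypothetical stand-in for training a fresh base learner on recent workload data; the thresholds shown are illustrative.

```python
# Sketch of Algorithm 3 (Dynamic Ensemble Updation) for a single server's
# ensemble; accuracy computation and learner training are abstracted away.

def update_ensemble(ensemble, mean_accuracy, theta=0.05, tau=0.80,
                    make_learner=lambda n: (f"new_{n}", 1.0)):
    # Step 1.1.1: drop base learners whose weight fell below theta
    kept = [(name, w) for name, w in ensemble if w >= theta]
    # Steps 1.1.2-1.1.3: if ensemble accuracy over the past k steps is below
    # tau, also drop the minimum-weight base learner
    if mean_accuracy < tau and kept:
        kept.remove(min(kept, key=lambda lw: lw[1]))
    # Steps 1.1.4-1.1.4.2: add as many new learners (initial weight 1) as removed
    removed = len(ensemble) - len(kept)
    kept += [make_learner(n) for n in range(removed)]
    # Step 1.1.4.3: normalize the weights to sum to one (Eq. 4)
    total = sum(w for _, w in kept)
    return [(name, w / total) for name, w in kept]

# A drifted scenario: one auto-correlation learner has decayed below theta,
# and the ensemble's mean accuracy (76%) is below tau = 80%.
ensemble = [("ma", 0.50), ("acf1", 0.46), ("acf2", 0.04)]
evolved = update_ensemble(ensemble, mean_accuracy=0.76)
print([name for name, _ in evolved])   # ['ma', 'new_0', 'new_1']
```

When no weight is below θ and the ensemble accuracy meets τ, the function leaves the ensemble unchanged (up to normalization), matching the conditional structure of Algorithm 3.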

Table 1: Computational Cost Comparison of Model-based and Meta-Learning Architectures

  Phase                        Model-based Architecture               Meta-Learning Architecture
  Training phase               M[O(α²N) + O(α³) + O(α²N) + O(αN)]     ML[O(α²N) + O(α³) + O(α²N) + O(αN)]
  Prediction phase             M[O(α)]                                ML[O(α) + O(1)]
  Incremental Updation phase   M[O(α²k) + O(α³) + O(α²k) + O(αk)]     ML[O(αk) + O(k) + O(k) + O(k)]
The meta-learning architecture can be extended to incorporate DWM through an additional Dynamic Ensemble Updation phase, described in Algorithm 3. This phase is executed after the Incremental Updation phase of the meta-learning architecture (described in Algorithm 2). We hereafter refer to the meta-learning architecture that includes the Dynamic Ensemble Updation phase as MLCD. In the Dynamic Ensemble Updation phase, after every k time steps we evaluate whether the ensemble E_i associated with each server i needs to be evolved by adding and/or removing base learners. We remove a base learner from ensemble E_i if either of the following two conditions is satisfied: (1) the weight of the base learner falls below a threshold value θ (Line 1.1.1), or (2) the mean prediction accuracy Π_{i,k} of ensemble E_i over the past k time steps is less than an acceptable accuracy threshold τ (Lines 1.1.2–1.1.3), in which case we remove the base learner which has the least weight in the ensemble (Line 1.1.3.1). Thereafter, we add the same number of base learners (Lines 1.1.4–1.1.4.1) as were removed in Steps 1.1.1 and 1.1.3. The new base learners can be trained on the most recent workload data of server i. We initialize the weights of the new base learners to a default value of 1 (Line 1.1.4.2) and then normalize the weights of all base learners in the resultant ensemble using Eq. 4 (Line 1.1.4.3). This dynamic updation of the ensemble associated with each server enables the holistic workload prediction mechanism to cope with drifting concepts in server workloads.

4. COMPUTATIONAL COST COMPARISON AND EXPERIMENTAL EVALUATION

In this section, we perform a detailed comparison study of the computational cost of the proposed ML-A and the baseline MB-A, and illustrate the gains that can be achieved by the proposed ML-A without significantly compromising prediction accuracy. We also simulate concept drift scenarios and experimentally evaluate the prediction accuracy of the proposed algorithmic optimization MLCD.

4.1 Computational Cost Comparison of Meta-learning and Model-based Architectures

Let N denote the total number of training samples, M the number of servers, L the size of the ensemble, k the periodicity of the incremental updation phase, and α the number of features in the server workload model. The features of the server workload model are typically derived from the utilization of physical attributes of a server, like CPU, I/O, memory, and network bandwidth. The number of features used in the server workload model generally increases with the complexity of the workload under consideration. On the other hand, the size of the ensemble, i.e., L, is typically small; e.g., in our experiments presented in Section 4.2, it sufficed to have L = 6.

To illustrate the difference in the computational cost of ML-A and MB-A, we consider a simple scenario where the prediction models are based on linear least squares regression [4] and variants thereof. The computational cost of training such a prediction model is O(α²N) + O(α³) + O(α²N) + O(αN), and that of executing the trained model is O(α). It may be noted that we denote the training cost of such a prediction model as O(α²N) + O(α³) + O(α²N) + O(αN), instead of by its asymptotic complexity O(α²N), in order to highlight the total number of computations involved in training the prediction model.

We now analyze and compare the computational cost of each phase (i.e., the training phase, prediction phase, and incremental updation phase) of the meta-learning and model-based architectures. The comparison results are presented in Table 1, where it can be seen that the training phase costs M[O(α²N) + O(α³) + O(α²N) + O(αN)] computations in the model-based architecture. In the meta-learning architecture, as we train an ensemble of L base learners for each server, the computational cost of the training phase increases by a factor of L, to ML[O(α²N) + O(α³) + O(α²N) + O(αN)]. It may be noted that the training phase is executed just once, and hence the additional computational cost in the training phase of the meta-learning architecture is incurred only once. In the next phase, i.e., the prediction phase, the meta-learning architecture costs ML[O(α)] computations for computing L workload predictions (one by each base learner) for M servers, and ML[O(1)] computations for compounding the predictions using Eq. 1.
On the other hand, the model-based architecture incurs a relatively lower cost of M[O(α)], as it involves the execution of only one prediction model per server. The additional computational cost incurred by the meta-learning architecture during the training and prediction phases is more than offset in the incremental updation phase. In this phase, the model-based architecture costs M[O(α²k) + O(α³) + O(α²k) + O(αk)] computations, as it needs to re-train M prediction models (i.e., one model per server). On the other hand, the meta-learning architecture costs ML[O(αk)] computations for computing L workload predictions for M servers, ML[O(k)] computations for compounding the predictions to form the final workload predictions using Eq. 1, ML[O(k)] computations for updating the weights of the L base learners for M servers using Eq. 2, and another ML[O(k)] computations for normalizing the weights of each of the L base learners for M servers using Eq. 4. Since L is a fixed small number (e.g., 5, 8, 10), it is clear that the incremental updation phase of the meta-learning architecture costs significantly fewer computations for any non-trivial server workload model than the incremental updation phase of the model-based architecture. This results in substantial computational savings over a period of time, as the incremental updation phase is executed after every k time steps, unlike the training phase, which is executed just once. Hence, the computational savings achieved in the incremental updation phase of the meta-learning architecture accumulate with every k time steps, resulting in a significant improvement in the green-ness factor of the workload prediction mechanism over a period of time, which in turn leads to green-ness in the predictive workload analytics system of which the workload prediction mechanism forms an important building block.

Figure 3: Prediction Accuracy Evaluation of ML-A and MB-A on Real Workload Datasets: (a) analysis on the (G1) dataset, (b) analysis on the (G10) dataset, (c) analysis on the (G120) dataset
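To make the per-cycle comparison concrete, one can instantiate the Table 1 formulas for the incremental updation phase with sample values, treating each O(·) term as its leading operation count. The numbers below are illustrative only and are not taken from the paper's experiments.

```python
# Back-of-envelope instantiation of the Table 1 incremental-updation costs,
# counting each O(.) term by its leading factor (illustrative values only).

def mba_updation_cost(M, alpha, k):
    # per cycle: M[O(alpha^2 k) + O(alpha^3) + O(alpha^2 k) + O(alpha k)]
    return M * (alpha**2 * k + alpha**3 + alpha**2 * k + alpha * k)

def mla_updation_cost(M, L, alpha, k):
    # per cycle: ML[O(alpha k) + O(k) + O(k) + O(k)]
    return M * L * (alpha * k + 3 * k)

M, L, alpha, k = 120, 6, 10, 6   # 120 servers, ensemble of 6, 10 features
print(mba_updation_cost(M, alpha, k))      # 271200 operations per cycle
print(mla_updation_cost(M, L, alpha, k))   # 56160 operations per cycle
```

Even at this modest feature count, each updation cycle of ML-A is several times cheaper than that of MB-A, and the gap widens with α since the MB-A cost carries α² and α³ terms; the savings then compound every k time steps.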

4.2 4.2.1

period of five months. Each dataset is partitioned into training and test sets such that data for four months is used for training purpose, and data for one month is used for testing. We perform detailed analyses of the training dataset and based on the analyses results, we create prediction models (or base learners) using exponential smoothing, autocorrelation functions, and regression models [4] with varying parameters. For the incremental updation phase, we set the periodicity, i.e., k, as 6 hours such that the re-training of prediction models in MB-A, and the updation of weights of base learners in ML-A occurs after every 6 hours.

4.2.2

Prediction Accuracy Evaluation of MB-A and ML-A

We evaluate the workload prediction accuracy of ML-A on real datasets, using MB-A as the baseline. We consider the following 3 scenarios:(1) Scenario (G1) in which we evaluate the workload prediction accuracy of ML-A and MB-A for an individual server, (2) Scenario (G10) in which we analyze the accuracy of ML-A and MB-A on workload of a server cluster consisting of 10 servers; for each time step t, the workload of servers in this cluster is macro-averaged, i.e., averaged over all the servers, to form consolidated workload of the cluster at time t, and (3) Scenario (G120) in which we extend (G10) to include 120 servers in the cluster. The experimental results illustrating the workload predictions of ML-A and MB-A for (G1), (G10), and (G120) scenarios are shown in Fig. 3a, 3b, and 3c respectively. In each of the figures, we plot 6-hourly time steps for each day in the test dataset on the X-axis and the actual/predicted

Experimental Evaluation Data Sets and Experimental Setup

We evaluate the proposed architectural and algorithmic optimization approaches on two workload datasets: (1) A real-world dataset collected from an actual datacenter, and (2) a synthetic dataset which is generated by introducing different trends in the real dataset in order to simulate concept drift scenario. In these datasets, the workload model of servers is based on the actual CPU utilization of servers with hourly granularity. The datasets contain workload data of around 120 servers which were actively monitored for a

1174

Table 3: Prediction Accuracy Evaluation in Concept-drift Scenarios Workload Prediction Accuracy (in %)

MB-A ML-A MLCD

Before Concept Drift

After Concept Drift

Average

91.87 93.19 93.19

76.20 78.89 91.61

84.03 86.04 92.40

workload on the Y-axis. The time steps are displayed in the format hDD::HHi, where ‘D’ denotes day, and ‘H’ denotes hour. Since the workload predictions by ML-A and MB-A overlap with the actual workload for many data points in the test dataset, the actual workload and predicted workload series in each of the figures show significant overlap as well. For ease of comprehension, we summarize the prediction accuracy evaluation results of ML-A and MB-A for (G1), (G10), and (G120) scenarios in Table 2. In Fig. 3a, we see distinct periodic rise and falls in the workload of the server in (G1) scenario. This is so because the workload of the server in (G1) has two distinct patterns: (1) for most of the days, the workload from 18 hours to 06 hours next day is considerably lower than the workload during rest of the hours, (2) the workload also shows significant dip for days at the end of most of the weeks. The MB-A performs reasonably well on this dataset and, as shown in Table 2, it has average prediction accuracy (i.e., when averaged over all time steps in the test dataset) of 88.8%. The ML-A too has prediction accuracy of 87.5% which is comparable to that of MB-A. The minor decrease in the prediction accuracy of ML-A is almost negligible when compared to the green-ness, by way of reduced computational cost, that can be achieved by using ML-A (as was described in Section 4.1). In Fig. 3b, we plot macro-averaged workload of the server cluster in (G10) scenario (consisting of 10 servers), and the workload predictions of ML-A and MB-A for the cluster. The hourly patterns that existed in (G1) scenario can be seen in this scenario as well, as is evident in the periodic rise and fall in the consolidated workload of the cluster. In this scenario, the MB-A achieves workload prediction accuracy of 91.95% while ML-A achieves accuracy of 91.9%, as shown in Table 2. In Fig. 
3c, we present the macro-averaged workload and the workload predictions of ML-A and MB-A in (G120) scenario where we consider a larger cluster consisting of 120 servers. We see that the magnitude of rise and falls in the cluster workload is relatively reduced as compared to that in (G1) and (G10) scenarios. This is expected as the hourly-workload fluctuations of one server could be evened out by other servers in a large cluster. As shown in Table 2, the workload prediction accuracy of MB-A in this scenario is 91.6% while that of ML-A is 91.92%, which is a small improvement of 0.32% over MB-A. These evaluation results show that the architectural optimization introduced in ML-A for achieving green-ness in workload prediction mechanism does not significantly impact the prediction accuracy, making it attractive for practical use in real-world predictive workload analytics systems.

4.2.3 Prediction Accuracy Evaluation of MLCD in Handling Concept Drifts

Figure 4: Prediction Accuracy Analysis in Concept Drift Scenarios

In order to evaluate the accuracy of the proposed algorithmic optimization, i.e., MLCD, in handling concept drifts, we create a synthetic dataset by using the real workload dataset of a server as the basis and then artificially introducing concept drift in the server workload. To introduce this drift, we invert the trend of low workloads between 18 hours and 06 hours every day to periodic spikes, i.e., sudden increases, in workloads during those hours for the last 15 days of the test dataset. In Fig. 4, we show the effect of this concept drift on the workload predictions of MB-A and ML-A, and the effectiveness of the proposed algorithmic optimization MLCD in handling such drifts. Since there is significant overlap among the series in Fig. 4, we summarize the workload prediction accuracy of MB-A, ML-A, and MLCD before and after the concept drift separately in Table 3.

In MB-A, for simplicity's sake, we train exponential smoothing and moving average [4] based prediction models that follow recent trends in the workload over the past k time steps. This architecture works well before the concept drift (i.e., for the first 15 days of the test dataset) and achieves a prediction accuracy of 91.87%, as shown in Table 3, but after the concept drift in the server workload (i.e., for the last 15 days), the prediction accuracy falls to 76.20%.

In ML-A, we train an ensemble of base learners based on moving average and auto-correlation prediction models [4] that, apart from following recent trends, also capture the seasonal hourly trends in the workload. Before the concept drift, this architecture performs slightly better than MB-A because of the presence of the auto-correlation models. However, the inverted trend introduced by the concept drift leads to decay of the trained auto-correlation models, which in turn decreases the weights of these models. Hence, the accuracy of the resulting compound predictions of the ensemble falls to 86.04%, as shown in Table 3.

For evaluating MLCD, we use the same initial ensemble as in ML-A, and hence the workload predictions before the concept drift are the same for MLCD and ML-A; the two series therefore overlap in Fig. 4 for the first 15 days of the test dataset. After the concept drift, as the weights of the trained auto-correlation models fall below 0.05 (i.e., θ = 0.05), the MLCD algorithm removes the base learners based on the initial auto-correlation models and adds new ones to the ensemble that are trained on the workload data of the past 48 hours, and hence capture the inverted trend of the concept drift. Due to the addition of new base learners, the prediction accuracy of MLCD does not fall after the concept drift and stays at 91.61%, which is much higher than the 76.20% and 78.89% prediction accuracy of MB-A and ML-A respectively. This improvement is also reflected in the overall average prediction accuracy (i.e., averaged over all time steps in the test dataset) shown in Table 3, where it can be seen that MLCD achieves an average accuracy of 92.4%, an improvement of 8.1% over MB-A and of 6.36% over ML-A. It may be noted that in the experimental evaluation of MLCD, we have demonstrated the gain in prediction accuracy for one server; in large IT systems, concept drift happens for many servers over a period of time, and hence gains at a larger scale can be reaped using MLCD in such systems.
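The prune-and-replace step of MLCD described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the base-learner interface, the multiplicative weight penalty (in the spirit of weighted-majority ensemble schemes [10, 9]), the error tolerance, and the minimum ensemble size are all our assumptions; only the threshold θ = 0.05 and the 48-hour training window come from the text.

```python
import numpy as np

THETA = 0.05   # weight threshold below which a base learner is dropped
BETA = 0.5     # assumed multiplicative penalty for a poorly performing learner
WINDOW = 48    # hours of recent workload used to train a replacement learner

class MovingAverageLearner:
    """A simple base learner: predicts the mean of the last k observations."""
    def __init__(self, k=3):
        self.k = k
        self.history = []
    def train(self, series):
        self.history = list(series)[-self.k:]
    def predict(self):
        return float(np.mean(self.history)) if self.history else 0.0
    def observe(self, value):
        self.history = (self.history + [value])[-self.k:]

class DriftAwareEnsemble:
    """Weighted ensemble that prunes decayed learners and adds fresh ones,
    in the spirit of the MLCD optimization described above."""
    def __init__(self, learners):
        self.members = [(l, 1.0) for l in learners]
    def predict(self):
        total = sum(w for _, w in self.members)
        return sum(w * l.predict() for l, w in self.members) / total
    def update(self, actual, recent_series, tolerance=0.2):
        updated = []
        for learner, weight in self.members:
            # Penalize learners whose relative error exceeds the tolerance.
            err = abs(learner.predict() - actual) / max(abs(actual), 1e-9)
            if err > tolerance:
                weight *= BETA
            learner.observe(actual)
            # Keep only learners whose weight has not decayed below theta.
            if weight >= THETA:
                updated.append((learner, weight))
        # Replace pruned learners with fresh ones trained on the last WINDOW
        # hours, so the ensemble can capture a drifted (e.g., inverted) trend.
        while len(updated) < 2:  # assumed minimum ensemble size
            fresh = MovingAverageLearner(k=3)
            fresh.train(recent_series[-WINDOW:])
            updated.append((fresh, 1.0))
        self.members = updated
```

Under this scheme, a learner that keeps mispredicting after a drift loses weight multiplicatively and is eventually pruned, while the replacement learners, trained only on post-drift data, immediately reflect the new trend.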

5. RELATED WORK

Extensive work has been done in the domain of predictive workload analytics. However, most of it is based on off-line pattern analysis of historical data. For instance, Gmach et al. [7] analyze and predict the workload of data-center applications based on statistical analysis of historical data. Kluge et al. [8] present a technique to predict the workload of single iterations of soft real-time applications. Chase et al. [6] propose a method to dynamically allocate resources in large server clusters in order to improve the energy efficiency of the cluster. Such methods rely on statistics of the workload time-series (such as percentiles, auto-correlation, and peaks) to predict server workload, and hence need to be re-trained periodically in order to cope with changes in the trend patterns of server workloads, and also to avoid SLA violations regarding the recency of workload predictions. As we described in Section 4.1, re-training or re-estimating the parameters of a statistical model on a periodic basis can be computationally expensive and increases the total computational cost of the holistic predictive analytics system. The proposed meta-learning architecture avoids such periodic re-training of the statistical prediction models, leading to a substantial reduction in computational cost without compromising significantly on the accuracy of workload predictions.

Also related is the work on different aspects of designing energy-aware software systems. Sivasubramaniam et al. [12] investigate compilation techniques that can generate energy-optimized code. Ardagna et al. [2] propose energy-aware resource allocation mechanisms and policies for service-oriented architectures by optimizing energy consumption at different layers of the system (e.g., the process layer and the infrastructure layer). Brown and Reams [5] present energy-optimization mechanisms within systems software via a power model for the hardware system, which can be used for resource-provisioning adjustments by software applications. Other innovations in this domain include the hardware interfaces provided by the Advanced Configuration and Power Interface (ACPI) [1] architecture that can be used by software systems for power management, and the GreenSoft model [11] for designing and engineering sustainable software systems; a thorough survey of this work is beyond the scope of this paper.

6. CONCLUSION

In this paper, we proposed a novel meta-learning based architecture for the workload prediction mechanism that substantially reduces the total computational cost, thereby achieving green-ness in the prediction mechanism, with only a minor decrease of 0.5–1.3% in the prediction accuracy. We also showed how the proposed architecture can be algorithmically optimized to handle concept drift scenarios in order to achieve improved green-ness by the workload prediction mechanism through robust workload predictions. These optimizations lead to a significant improvement in the green-ness factor of holistic predictive workload analytics systems that are based on server workload predictions.

7. REFERENCES

[1] Advanced Configuration and Power Interface. http://www.acpi.info/, 2011. [Online; accessed 28-Sep-2012].
[2] D. Ardagna, C. Cappiello, M. Lovera, B. Pernici, and M. Tanelli. Active energy-aware management of business-process based applications. In Proceedings of the 1st European Conference on Towards a Service-Based Internet. Springer-Verlag, 2008.
[3] A. Blum. Empirical support for winnow and weighted-majority algorithms: Results on a calendar scheduling domain. Machine Learning, 26(1):5–23, Jan. 1997.
[4] G. Box, G. M. Jenkins, and G. Reinsel. Time Series Analysis: Forecasting and Control. Prentice Hall, 1994.
[5] D. J. Brown and C. Reams. Toward energy-efficient computing. Communications of the ACM, 53(3):50–58, Mar. 2010.
[6] J. S. Chase, D. C. Anderson, P. N. Thakar, and A. M. Vahdat. Managing energy and server resources in hosting centers. In Proceedings of the 18th ACM Symposium on Operating Systems Principles, USA, 2001.
[7] D. Gmach, J. Rolia, L. Cherkasova, and A. Kemper. Workload analysis and demand prediction of enterprise data center applications. In Proceedings of the 10th IEEE International Symposium on Workload Characterization, USA, 2007.
[8] F. Kluge, S. Uhrig, J. Mische, B. Satzger, and T. Ungerer. Dynamic workload prediction for soft real-time applications. In IEEE 10th International Conference on Computer and Information Technology (CIT), UK, 2010.
[9] J. Z. Kolter and M. A. Maloof. Dynamic weighted majority: An ensemble method for drifting concepts. Journal of Machine Learning Research, 8:2755–2790, Dec. 2007.
[10] N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information and Computation, 108(2):212–261, Feb. 1994.
[11] S. Naumann, M. Dick, E. Kern, and T. Johann. The GREENSOFT model: A reference model for green and sustainable software and its engineering. Sustainable Computing: Informatics and Systems, 1(4):294–304, 2011.
[12] A. Sivasubramaniam, M. T. Kandemir, N. Vijaykrishnan, and M. J. Irwin. Designing energy-efficient software. In 16th International Parallel and Distributed Processing Symposium, USA, 2002.