Energy Performance of Heuristics and Meta-heuristics

J Supercomput manuscript No. (will be inserted by the editor)

Energy Performance of Heuristics and Meta-heuristics for Real-time Joint Resource Scaling and Consolidation in Virtualized Networked Data Centers

Michele Scarpiniti · Enzo Baccarelli · Paola G. Vinueza Naranjo · Aurelio Uncini

Received: October 7, 2017 / Revised: December 28, 2017 / Accepted: January 4, 2018

Abstract In this paper, we explore on a comparative basis the suitability of meta-heuristics (sometimes referred to as random-search algorithms) and greedy-type heuristics for the energy-saving joint dynamic scaling and consolidation of the network-plus-computing resources hosted by networked virtualized data centers, when the target is the support of real-time streaming-type applications. For this purpose, the energy and delay performance of the Tabu Search (TS), Simulated Annealing (SA) and Evolutionary Strategy (ES) meta-heuristics is tested and compared with that of Best-Fit Decreasing (BFD)-type heuristics, in order to give insight into the resulting performance-vs.-implementation complexity trade-offs. In principle, the considered meta-heuristics and heuristics are general formal approaches that can be applied to large classes of (typically, non-convex and mixed-integer) optimization problems. However, especially for the meta-heuristics, a main challenge is to design them so that they properly address the real-time joint computing-plus-networking resource consolidation and scaling optimization problem. To this end, the aims of this paper are to: (i) introduce a novel Virtual Machine Allocation (VMA) scheme that chooses a suitable set of possible Virtual Machine (VM) placements among the (possibly, non-homogeneous) set of available servers; (ii) propose a new class of Random Search Algorithms (RSAs), denoted as consolidation meta-heuristics (CMHs), that take the VMA problem into account. In particular, novel variants of the meta-heuristics, namely TS-RSC, SA-RSC and ES-RSC, are tailored to the resource scaling and consolidation (RSC) problem; and (iii) compare the results of the new class of RSAs against some state-of-the-art heuristic approaches. A set of experimental results, both simulated and real-world ones, supports the effectiveness of the proposed approaches against the traditional ones.

Keywords Resource consolidation · energy-saving · meta-heuristics optimization · Tabu Search · Simulated Annealing · Genetic algorithms · TCP/IP virtualized data centers · real-time streaming applications

Michele Scarpiniti, Enzo Baccarelli, Paola G. Vinueza Naranjo and Aurelio Uncini are with the Department of Information Engineering, Electronics and Telecommunications (DIET), “Sapienza” University of Rome, Italy.

1 Introduction

Nowadays, the ever-increasing interest in applications running on the Internet implies a huge energy consumption in Data Centers (DCs), which keep growing in size in order to cope with the required Quality of Service (QoS) [1]. This intensive use of Internet applications causes a huge waste of energy. In fact, although modern DCs are highly optimized, in 2013 Google’s DCs consumed as much electricity as 200,000 homes [2–4]. This leads to huge operational costs and serious environmental impacts. For this reason, both academia and industry have to make substantial efforts to reduce this energy waste [2]. Improved energy efficiency and flexibility can be obtained by resorting to the virtualization of both the computing and networking resources hosted by the DC [5]. In fact, in many models of on-demand cloud services, such as Software-as-a-Service (SaaS), client requests run in the form of Virtual Machines (VMs). The consumed resources have to account for both the network and computing energies, and should be metered on a per-VM basis under the SaaS model [1]. In this regard, we point out that the energy consumption of the large number of switches and routers equipping intra-data center networks is becoming a major concern. The analysis in [6] concludes that, in typical large-scale Google data centers, the network energy is over 30% of the total energy at server utilizations below 70%, which are typical in production data centers.
Today, Virtual Machine Consolidation (VMC) covers a plethora of methods for the energy-efficient management of resources in DCs. This technique aims at turning ON a minimum number of VMs by selecting a proper operating frequency, and at moving these VMs to a minimal number of physical servers, in order to minimize the total energy consumption [2, 7]. Obviously, excessive consolidation can lead to poor performance from a QoS point of view. Hence, VMC has to take into account specific constraints in order to avoid QoS violations [7]. Several approaches to VMC exist in the literature. In particular, they can be categorized into five macro-groups according to the criterion used [8]. VMC can be based on: (i) the time of decision making (static, dynamic); (ii) the considered parameters (hardware utilization, network traffic, cooling systems, etc.); (iii) the optimization method used (exact methods, heuristics, meta-heuristics); (iv) the particular objective function (minimizing the number of active hosts, minimizing QoS violations, maximizing resource utilization, etc.); and (v) the specific evaluation method (simulated or real-world environments). However, due to the high variability of the processed workloads and the increasing flexibility required in DCs, it is not possible to identify a single winning approach.


A list of the main acronyms used in the paper is provided in Table 1, in order to simplify the reading of the remainder of the paper.

Table 1: List of the main acronyms used in the paper.

Acronym   Description
BFD       Best-Fit Decreasing
DC        Data Center
ES-RSC    Evolutionary Strategy Resource Scaling and Consolidation
GA        Genetic Algorithm
MBFD      Modified Best-Fit Decreasing
MDC       Maximum Density Consolidation
QoS       Quality of Service
RSC       Resource Scaling and Consolidation
RS-RSC    Random Search Resource Scaling and Consolidation
SA-RSC    Simulated Annealing Resource Scaling and Consolidation
TS-RSC    Tabu Search Resource Scaling and Consolidation
VM        Virtual Machine
VMA       Virtual Machine Allocation

1.1 Main contributions of the paper

In this paper, we focus on the dynamic energy-efficient scaling and consolidation of the networking-plus-computing virtualized resources available at the Middleware layer of networked TCP/IP-based virtualized DCs [5, 8]. The considered reference scenario is outlined in Fig. 1, where remote clients exploit Internet connections in order to submit their workloads to the serving DC. According to the SaaS model, the DC is virtualized: client workloads are submitted in the form of demands for VM processing/storage capacity, while reliable inter-VM communication is attained through end-to-end overlay reliable TCP/IP connections. Furthermore, in order to effectively cope with the unpredictable peaks of the input workload, the Middleware layer of the data center is equipped with input/output queues (see Fig. 1). Since, in our framework, the input stream of Fig. 1 carries homogeneous data, the turned-ON VMs may be assumed, without loss of generality, to be homogeneous and to work at the same processing speed. However: (i) the servers hosting the VMs may be heterogeneous, i.e., they may exhibit different power consumptions and processing capabilities; (ii) the number of VMs actually hosted by each server may be different; and (iii) the end-to-end communication transport paths of Fig. 1 may be composed of different numbers of hops and/or may be sustained by different multi-hop routes with heterogeneous transport capacities, energy consumptions and congestion levels. As a consequence, the per-VM energies consumed for both computing and communication purposes may vary from VM to VM and, most importantly, they may depend on the actually planned VM-to-server mapping. We therefore anticipate that the resulting constrained optimization problem of joint resource allocation and consolidation is, by design, NP-hard.

[Figure 1: block diagram. Labeled components: Input Workload from the IoT devices; Input Queue; Dynamic Workload Dispatcher; Virtual Switch; Parallel TCP/IP Connections (1st to MV-th TCP/IP connection); Bank of VMs (VM(1), ..., VM(MV)); Middleware Virtualization Layer; Physical Switched Network; Physical Servers #1, ..., #NS; Output Queue; Output Workload to the IoT devices.]

Fig. 1: The considered virtualized architecture. It operates at the Middleware layer of the corresponding protocol stack. VM := Virtual Machine; MV := number of available Virtual Machines.

The solutions proposed in this paper for the joint scaling and consolidation of the virtualized networking-plus-computing resources hosted by the DC exploit the characteristics of three meta-heuristic approaches, which we tailor to the consolidation problem. Comparisons with state-of-the-art greedy-type heuristic approaches are also carried out, in order to validate the effectiveness of the proposed solutions. Specifically, the considered resource management problem jointly embraces: (i) dynamic access control of the input workload; (ii) dynamic up/down scaling of the processing frequencies of the instantiated VMs; and (iii) joint dynamic consolidation of both physical servers and TCP/IP connections. QoS guarantees are provided in the form of hard (i.e., deterministic) upper bounds on the queue backlogs, while the pursued objective is an optimized trade-off between two contrasting goals, namely: (i) low energy consumption for the networking-plus-computing tasks; and (ii) low queue backlogs. The novelty of the technical contributions of this paper mainly relies on the aspects listed below.

– Three meta-heuristic approaches, based on the Tabu Search (TS), Simulated Annealing (SA) and Evolutionary Strategy (ES) algorithms, are proposed for the resource scaling and consolidation (RSC) problem. The proposed solutions (named here TS-RSC, SA-RSC and ES-RSC) are novel variants of these meta-heuristics, properly fitted to VMC.
– In order to find a good solution, the neighborhood in which a candidate solution is searched is iteratively adapted by using the Rechenberg 1/5 success rule [9].
– A novel VM allocation procedure is proposed. It chooses a certain number of possible VM placements among the (possibly, non-homogeneous) list of servers in the data center, in order to provide a limited number of starting configurations to the meta-heuristics.
– Comparisons among the three proposed TS-RSC, SA-RSC and ES-RSC meta-heuristic algorithms and some state-of-the-art greedy-type heuristics are provided, in order to validate the proposed approaches.

The remainder of this paper is organized as follows. Section 2 presents an overview of the main related work. Section 3 describes the considered networked architecture, while Section 4 describes the energy consumption model. Then, Section 5 describes the proposed meta-heuristic approaches in detail, and Section 6 reviews some heuristic approaches for comparison purposes. Afterwards, Section 7 details some specific implementation aspects of the proposed approaches. Section 8 presents extensive experimental results. Finally, Section 9 concludes the work and outlines directions for future research.

2 Related work

The literature on VMC is vast. However, a large body of the existing work focuses on the introduction of new or modified heuristic approaches, such as the works in [2, 10–12]. Very few works are dedicated to meta-heuristic approaches. In addition, papers on meta-heuristics usually introduce approaches based on a single method, without a full comparison among different methods. A few comparisons were recently presented in [13–15] but, differently from the proposed work, they do not consider the networking aspects. Consolidation based on the Tabu Search (TS) meta-heuristic is the topic of the works in [16–19]. Specifically, [16] models the problem by using a mixed-integer programming model.
In a similar way, [17] adopts TS to solve a Mixed Integer Linear Programming (MILP) formulation for robust VM migration. It strikes a balance between resource availability and energy costs, while considering uncertainties in the VM resource demands. Also, [18] uses an integer programming formulation in order to allocate data and application services over a distributed computing system. Finally, differently from the previous works, [19] focuses on the VM migration time rather than on the minimization of the energy consumption. Differently from these works, which are based on an integer programming formulation, in the proposed work: (i) we employ a continuous-valued formulation that is amenable to a real-time implementation; and (ii) the searching interval in the tabu list is implemented through the Rechenberg success rule (see Section 5). Consolidation based on the Simulated Annealing meta-heuristic is the topic of the works in [20–22]. Specifically, [20] aims at increasing the overall cost-efficiency by reducing the number of active nodes: the goal is reached by an SA-based algorithm that solves the problem by evaluating the attractiveness of the possible VM migrations. Moreover, [21] uses SA to solve the VM placement problem in order to optimize the power consumption, by modeling the placement problem as a bin packing problem. Finally, [22] approaches the problem of VM migration in two phases: in the first phase, a promising group of servers is identified by SA among the many choices that might be available; in the second phase, the final VM-to-server mapping is decided by considering all the low-level constraints arising from the particular user requests. Differently from the previous works, where the SA is based on only two loops and the variance for the neighborhood generation is fixed, in the proposed work: (i) we implement the SA by using three internal loops, which can provide better results in terms of convergence accuracy; and (ii) we adapt the variance through the Rechenberg success rule. Lastly, the works in [23–26] describe approaches based on Evolutionary Strategies. Specifically, [23] uses a Genetic Algorithm (GA) for VMC, solved as a bin packing problem. The work in [24] presents a variant of GA, known as the Grouping Genetic Algorithm (GGA), which is used to solve the server consolidation problem as a vector packing problem with conflicts, and attempts to minimize the number of servers used for hosting applications within the DC. In addition, [25], similarly to our work, introduces a GA for the VM placement problem that considers the energy consumption of both the servers and the communication network. All the previous works employ traditional GAs to solve the problem. Differently, in the proposed work: (i) we choose the variant based on the Evolutionary Strategy (ES) idea; and (ii) we use the Rechenberg success rule to adapt the permutation probability. Finally, [26] also moves in the direction of evolutionary algorithms. However, unlike our approach, the method in [26] uses a demand predictor to forecast the computing demand, and it may be affected by prediction errors, especially when the workload exhibits big data-like statistical features (e.g., it presents unexpected fluctuations).

2.1 Some recent contributions on the multi-constrained/multi-objective design of data centers

In this subsection we review some recent literature on the design of DCs based on multi-constrained and multi-objective approaches, and we point out the main differences with respect to the present paper. The authors of [27] propose a two-level control system for the combined management of the static workload-to-VM assignment and VM-to-physical server mapping. Specifically, they formulate the “initial” VM placement problem as a multi-objective optimization problem in which the computing power consumption, memory resource usage and thermal dissipation are to be simultaneously minimized under the assumption of a constant (i.e., time-invariant) input workload. For this purpose, the authors of [27] propose an improved genetic algorithm, in which suitable fuzzy multi-objective evaluation and ranking procedures are designed, in order to effectively deal with the considered (possibly contrasting) objectives. However, unlike our paper, in [27] the communication costs are not included in the objective functions and the considered VM placement problem is static, i.e., the workload to be processed by the available VMs is assumed to be time-invariant.



The salient feature of the approach proposed in [28] is that it includes the information about the inter-VM traffic flows generated by the running (possibly, multi-tier) applications in the formulation of the VM placement problem. Afterwards, in order to solve the resulting application-aware optimization problem, the authors of [28] develop a greedy-type VM migration algorithm that attempts to minimize the overall data center network traffic, while meeting the constraints imposed by the server computing capacities and the data center network topology. Overall, the VM migration/placement algorithm in [28] explicitly considers the network-induced resource costs but, unlike our contribution, it does not perform the adaptive scaling of the processing speeds of the running VMs, in order to further reduce the resulting computing consumption. The same conclusion also applies to the approach developed in [29]. Specifically, the authors of this contribution propose a VM placement algorithm that is capable of jointly accounting for multiple resource constraints, such as the CPU, memory and storage capabilities of the hosting physical servers as well as the available network link bandwidths. For this purpose, the proposed algorithm suitably combines the minimum-cut and best-fit principles, in order to improve the data center resource utilization and energy efficiency, while simultaneously minimizing the numbers of both the turned-ON physical servers and network switches. Hence, like our contribution, the approach of [29] simultaneously considers the computing and networking resource usages, in order to improve the resulting data center energy efficiency. However, unlike our contribution, the VM placement performed in [29] is “static” (i.e., the workload to be processed is assumed to be time-invariant) and, thus, the processing speeds of the running VMs are not adaptively scaled at run-time.
The distinguishing feature of the (more recent) contribution in [30] is the introduction of the trust dimension in the design of the VM migration/placement plan. Specifically, the authors of [30] develop a trust-based multi-agent middleware-layer architecture to assist the Cloud providers in planning inter-Cloud VM migration. The designed trust mechanism allows software agents to assist the Cloud providers in the evaluation of both the performance penalties and the loss of reputation that may be caused by errors in the profiles of the migrating VMs. The resulting multi-agent architecture is composed of two hierarchically organized sets of software agents. The higher-layer set is composed of agents that act on a per-data center basis. They periodically carry out the evaluation of trust information and, then, negotiate the required inter-Cloud VM migrations. The lower-layer set is built up of software agents that operate on a per-physical-server basis. They perform, at run-time, local measurements of the actual profiles of the VMs to be migrated and, then, disseminate the acquired VM profiles through a suitably designed gossip-based inter-agent communication protocol. Overall, like our contribution, the resulting greedy-type VM migration algorithm proposed in [30] accounts for the costs of both computing and networking resources. However, it assumes that the resource requirements of the migrating VMs do not vary over time and, thus, it does not consider the dynamic scaling of the corresponding VM processing speeds.


3 The considered virtualized networked computing architecture

In this section we detail the main building blocks of the considered virtualized networked architecture of Fig. 1. In this architecture, time is slotted, $t$ is the discrete-valued slot index, and the $t$-th slot spans the semi-open time interval $[tT_S, (t+1)T_S)$, where $T_S$ (measured in seconds) is the duration of each slot. Further, according to Fig. 1, the considered architecture is composed of:

– an input buffer, that stores the workload received during each time slot. At the beginning of each slot, the input buffer is emptied and the input workload is processed by the computing platform of Fig. 1. In order to perform access control and guarantee hard bounds on the overall resulting processing delay, the storage capacity $S_{buf}^{max}$ (measured in bit) of the input buffer is limited to the maximum workload that the computing platform of Fig. 1 may process during a single time slot, that is, $S_{buf}^{max} \le f^{max} T_S$, where $f^{max}$ (b/s) is the sum of the maximum processing frequencies of all the available VMs of Fig. 1;
– an output buffer, that stores the workload processed during each time slot. At the beginning of each time slot, the output buffer is emptied and the output workload is rendered to the clients;
– a Dynamic Workload Dispatcher, that allocates the input workload, and dynamically reconfigures and consolidates the available computing-plus-networking physical resources, in order to minimize the consumed energy;
– a Virtual Switch, that manages the TCP/IP transport-layer connections on an end-to-end basis and performs network flow control;
– a bank of Virtual Machines, that processes the workload received during each time slot.

Moreover, in the proposed architecture: (i) $S_{SE}(s)$, $s = 1, \ldots, S_{SE}$, is the set of the available physical servers; (ii) $VM(s; v)$ indicates the $v$-th VM hosted by the $s$-th server; (iii) $f_{s,v}(t)$ is the processing rate of $VM(s; v)$ at slot $t$; (iv) $L_{s,v}(t)$ is the workload processed by $VM(s; v)$ at slot $t$; and (v) $f_{s,v}^{max}$ is the maximum processing speed (in bit/s) of $VM(s; v)$.

4 Formal description of the afforded optimization problem

With reference to the proposed VM-based virtualized architecture of Fig. 1, let $E_v^{idle}(s)$ and $E_v^{max}(s)$ be the idle and maximum energies (measured in Joule) consumed by the $v$-th VM hosted by the $s$-th physical server. In our approach, the cost function is the sum of three contributions: the computing energy $E_{s,v}^{com}$, the communication energy $E_{s,v}^{net}$ and the connection/switching energy $E_{s,v}^{sw}$. Specifically, the computing energy $E_{s,v}^{com}(t)$ (Joule) consumed by the VM during the $t$-th slot may be modeled as in [31]:

$$ E_{s,v}^{com}(t) = E_v^{idle}(s)\, u_{-1}\!\left(f_{s,v}(t) - f_{s,v}^{ON}\right) + \left(\frac{f_{s,v}(t)}{f_{s,v}^{max}}\right)^{w} \left(E_v^{max}(s) - E_v^{idle}(s)\right), \tag{1} $$

where: (i) $u_{-1}(\cdot)$ is the Heaviside unit-step function (i.e., $u_{-1}(x) \equiv 1$ for $x \ge 0$, and $u_{-1}(x) \equiv 0$ otherwise); (ii) $f_{s,v}^{ON}$ (b/s) is the minimum processing frequency needed to turn ON the $v$-th VM hosted by the $s$-th physical server. Typically, $f_{s,v}^{ON} \cong 10^{-2} \times f_{s,v}^{max}$ [1]; and (iii) the (dimensionless) power exponent $w$ fixes the rate of increase of the energy consumed by the VM for increasing values of the corresponding processing frequency $f_{s,v}(t)$. Typically, $w \ge 2$ [1]. Furthermore, the first (resp., second) term on the right-hand side of Eq. (1) accounts for the static (resp., dynamic) energy consumed by the considered VM, and it may be reduced by performing VMC (resp., dynamic scaling of the processing frequency $f_{s,v}(t)$).

Passing to consider the energy $E_{s,v}^{net}(t)$ (Joule) consumed by the $(s, v)$-th TCP/IP connection, several formal analyses and numerical measurements support the conclusion that, when the connection operates in the Congestion Avoidance state, its energy-vs.-TCP throughput relationship may be well described by the following power-like formula (see, for example, [32] and references therein):

$$ E_{s,v}^{net}(t) = E_{s,v}^{setup}\, u_{-1}\!\left(L_{s,v}(t) - L_{s,v}^{ON}\right) + \Omega_{s,v}^{net} \left(\frac{L_{s,v}(t)}{T_S}\right)^{\gamma}. \tag{2} $$

In Eq. (2), we have that: (i) $E_{s,v}^{setup}$ (Joule) is the energy consumed for the setup of the $(s, v)$-th connection. It mainly depends on the actually adopted Fast/Giga-Ethernet switching technology; (ii) since $L_{s,v}(t)$ (measured in bit) is the volume of the traffic conveyed by the $(s, v)$-th connection at slot $t$, the ratio $L_{s,v}(t)/T_S$ (measured in b/s) is the corresponding conveyed throughput; (iii) $L_{s,v}^{ON} \triangleq f_{s,v}^{ON} \times T_S$ (measured in bit) is the minimum traffic that the $(s, v)$-th connection must convey when it is turned ON; (iv) the (dimensionless) exponent $\gamma$ fixes the rate of increase of the energy consumed by the $(s, v)$-th connection for increasing values of the sustained throughput. Typically, $\gamma \cong 1.2 - 1.4$ [31]; and (v) $\Omega_{s,v}^{net}$ (Joule/(b/s)$^{\gamma}$) is the energy profile of the considered connection that, in turn, depends on the actually adopted Ethernet-type switching technology [1].
Before proceeding, we also stress that switching from the processing frequency $f_{s,v}(t-1)$ (i.e., the processing frequency of the $(s, v)$-th VM at slot $(t-1)$) to the processing frequency $f_{s,v}(t)$ (i.e., the corresponding processing frequency at slot $t$) induces an energy overhead $E_{s,v}^{sw}(t)$ (Joule). Although its actual value depends on the adopted switching technique, it may be typically modeled as in [32]:

$$ E_{s,v}^{sw}(t) = k_e \left| f_{s,v}(t) - f_{s,v}(t-1) \right|^{\beta}, \tag{3} $$

where: (i) $k_e$ (Joule/(Hz)$^{\beta}$) is the energy cost induced by a unit-size frequency switching. Typically, it is limited to a few hundred µJ per MHz [32]; and (ii) $\beta$ is a dimensionless power exponent, with $\beta \cong 2$ [32]. Overall, by summing the energy contributions in Eqs. (1), (2) and (3) over the full set of the $M_V$ available VMs and $N_S$ physical servers, we obtain the total energy $E_T(t)$ (Joule) consumed by the considered platform of Fig. 1 at slot $t$, which is formally defined as:

$$ E_T(t) \triangleq \sum_{s=1}^{N_S} \sum_{v=1}^{M_V} \left[ E_{s,v}^{com}(t) + E_{s,v}^{net}(t) + E_{s,v}^{sw}(t) \right]. \tag{4} $$
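As an illustration, the per-slot energy model of Eqs. (1)-(4) can be sketched in Python as follows. This is a minimal sketch and not the authors' implementation: the `VM` record, the function names and the default exponents are hypothetical choices made here for readability, and all frequencies are assumed to be expressed in one consistent unit.

```python
from dataclasses import dataclass


@dataclass
class VM:
    """Hypothetical record holding the per-VM parameters of Eqs. (1)-(2)."""
    f_max: float    # maximum processing frequency f^max_{s,v}
    f_on: float     # minimum frequency needed to turn the VM ON, f^ON_{s,v}
    e_idle: float   # idle energy E^idle_v(s) (Joule)
    e_max: float    # maximum energy E^max_v(s) (Joule)
    e_setup: float  # connection setup energy E^setup_{s,v} (Joule)
    omega: float    # connection energy profile Omega^net_{s,v}


def unit_step(x):
    """Heaviside unit step u_{-1}(x): 1 for x >= 0, 0 otherwise."""
    return 1.0 if x >= 0 else 0.0


def computing_energy(vm, f, w=2.0):
    """Eq. (1): static (idle) term plus dynamic term growing as (f / f_max)^w."""
    return vm.e_idle * unit_step(f - vm.f_on) + (f / vm.f_max) ** w * (vm.e_max - vm.e_idle)


def network_energy(vm, load, t_slot, gamma=1.2):
    """Eq. (2): setup term plus power-like term in the throughput load / t_slot."""
    load_on = vm.f_on * t_slot  # minimum traffic L^ON_{s,v} = f^ON_{s,v} * T_S
    return vm.e_setup * unit_step(load - load_on) + vm.omega * (load / t_slot) ** gamma


def switching_energy(f, f_prev, k_e, beta=2.0):
    """Eq. (3): overhead for moving from frequency f_prev to frequency f."""
    return k_e * abs(f - f_prev) ** beta


def total_energy(vms, freqs, prev_freqs, loads, t_slot, k_e, w=2.0, gamma=1.2, beta=2.0):
    """Eq. (4): sum of the three contributions over a flat list of (s, v) pairs."""
    return sum(
        computing_energy(vm, f, w)
        + network_energy(vm, load, t_slot, gamma)
        + switching_energy(f, f_prev, k_e, beta)
        for vm, f, f_prev, load in zip(vms, freqs, prev_freqs, loads)
    )
```

In this sketch, a VM that is kept OFF (zero frequency and zero load, with a positive `f_on`) contributes no computing or network energy, which is the effect that consolidation exploits; running a VM at `f_max` yields exactly `e_max` for the computing term.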

In order to formally introduce the afforded optimization problem, let $L_T(t)$ be the overall workload (measured in bit) that is drained from the input queue of Fig. 1 at the beginning of slot $t$ (i.e., the total workload to be processed by the set of available VMs of Fig. 1 during the $t$-th time slot). Hence, the constrained resource management problem to be solved at slot $t$ is stated as follows:

$$ \min_{\{L_{s,v}(t),\, f_{s,v}(t)\}} E_T(t), \tag{5.1} $$

s.t.:

$$ L_T(t) - \sum_{s} \sum_{v} L_{s,v}(t) = 0, \tag{5.2} $$

$$ f_{s,v}(t) - f_{s,v}^{max} \le 0, \quad \forall s, v, \tag{5.3} $$

$$ L_{s,v}(t) - T_S\, f_{s,v}(t) \le 0, \quad \forall s, v, \tag{5.4} $$

$$ f_{s,v}(t) \ge 0, \quad L_{s,v}(t) \ge 0, \quad \forall s, v. \tag{5.5} $$
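The constraints in Eqs. (5.2)-(5.5) can be checked numerically for a candidate allocation, for instance before evaluating its energy cost inside a search procedure. The following Python sketch is only illustrative: the function name, the flat per-VM lists and the numerical tolerance `tol` are assumptions introduced here, and the non-negativity of the workloads in Eq. (5.5) is taken as part of the constraint set.

```python
def is_feasible(loads, freqs, f_max, total_load, t_slot, tol=1e-9):
    """Return True if a candidate allocation satisfies Eqs. (5.2)-(5.5).

    loads, freqs and f_max are flat lists with one entry per (s, v) pair.
    """
    # Eq. (5.2): the whole input workload L_T(t) must be assigned to the VMs.
    if abs(total_load - sum(loads)) > tol:
        return False
    for load, f, f_cap in zip(loads, freqs, f_max):
        # Eq. (5.3): the processing frequency cannot exceed its maximum.
        if f - f_cap > tol:
            return False
        # Eq. (5.4): the assigned workload must be processed within one slot.
        if load - t_slot * f > tol:
            return False
        # Eq. (5.5): frequencies and workloads are non-negative.
        if f < -tol or load < -tol:
            return False
    return True
```

For example, with two VMs, `is_feasible([4.0, 6.0], [5.0, 6.0], [10.0, 10.0], 10.0, 1.0)` holds, whereas lowering the first frequency to 3.0 violates Eq. (5.4).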

Four main explanatory remarks are in order about the formulation of the considered optimization problem. First, the problem is continuous-valued but, due to the presence of the Heaviside function in Eqs. (1) and (2), it is neither convex nor differentiable. Second, the constraint in Eq. (5.2) guarantees that all the input workload $L_T(t)$ is actually processed by the VMs of Fig. 1. Third, Eq. (5.3) limits the current processing frequency $f_{s,v}(t)$ of the $(s, v)$-th VM to $f_{s,v}^{max}$. Fourth, Eq. (5.4) forces the processing frequency $f_{s,v}(t)$ to be high enough to allow the VM to process the assigned workload $L_{s,v}(t)$ within a (single) time slot $T_S$. Hence, since the delays induced by the input and output buffers of Fig. 1 are limited to two slot times by design, we conclude that the total per-task queue-plus-networking-plus-computing delay introduced by the proposed virtualized platform of Fig. 1 is limited in a hard (i.e., deterministic) way to $3 \times T_S$ seconds. This confirms, in turn, that the proposed platform is capable of supporting real-time applications, such as, for example, delay and delay-jitter sensitive audio/video streaming applications.

5 The proposed Meta-heuristic approaches

In this section, we introduce three random search meta-heuristic approaches aiming at the solution of the problem in (5). Specifically, the proposed approaches are based on the Tabu Search (TS), Simulated Annealing (SA) and Evolutionary Strategy (ES) meta-heuristics [33], fitted to the considered problem. The obtained algorithms are named, respectively, Tabu Search Resource Scaling and Consolidation (TS-RSC), Simulated Annealing Resource Scaling and Consolidation (SA-RSC) and Evolutionary Strategy Resource Scaling and Consolidation (ES-RSC). The first two approaches (TS and SA) are single-solution based meta-heuristics that work by improving a single solution. They can be viewed as “walks” through neighborhoods, or search trajectories through the search space of the considered optimization problem. The walks (or trajectories) are built up by iterative procedures that move from the current solution to another one in the search space. These new candidate solutions are also called moves. The third approach (ES), instead, can be viewed as an iterative improvement of a population of solutions: first, the population is initialized; then, a new population of solutions is generated according to some rules; finally, this new population is integrated into the current one by using some selection procedures. In all the proposed approaches, the search process is stopped when a given stopping criterion is met. For a general overview of these three meta-heuristics, we refer to the general literature, for example [33, 34]. In this regard, we observe that, although a general convergence result states that a global optimum can be found, there is no well-established rule on how to generate new moves. Rechenberg [9] proposed his “1/5 success rule” to choose the best standard deviation value, in order to select a suitable neighborhood region over which to generate new moves. Specifically, the rule is applied every $L_{sr}$ iterations and it is stated as [9]:

$$ \sigma_{n+1} = \begin{cases} c_d\, \sigma_n, & \text{if } \varphi(k) < 1/5, \\ c_i\, \sigma_n, & \text{if } \varphi(k) > 1/5, \\ \sigma_n, & \text{if } \varphi(k) = 1/5, \end{cases} \tag{6} $$

where $0 \le \varphi(k) \le 1$ denotes the success ratio (i.e., the fraction of successful moves) during the last $L_{sr}$ iterations, while $c_i > 1$ and $c_d < 1$ regulate the increase and decrease rates of the standard deviation used for the generation of new moves.
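The update in Eq. (6) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name is hypothetical, the default values of `c_i` and `c_d` are illustrative placeholders rather than the values used in the paper, and the success ratio is passed as an integer count over the window so that the comparison with 1/5 is exact.

```python
def rechenberg_update(sigma, successes, window=15, c_i=1.22, c_d=0.82):
    """Apply the 1/5 success rule of Eq. (6) once every `window` iterations.

    successes is the number of successful moves observed in the last
    `window` iterations, so the success ratio is successes / window.
    """
    if 5 * successes > window:   # success ratio > 1/5: widen the neighborhood
        return c_i * sigma
    if 5 * successes < window:   # success ratio < 1/5: shrink the neighborhood
        return c_d * sigma
    return sigma                 # success ratio == 1/5: leave sigma unchanged
```

With a window of 15 iterations, the standard deviation grows when more than three moves were successful, shrinks when fewer than three were, and stays unchanged at exactly three.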

5.1 Tabu Search Resource Scaling and Consolidation

The first meta-heuristic approach is based on the Tabu Search (TS) optimization technique. TS was developed independently by Glover [35] and Hansen [36]. It consists of an iterative search and is characterized by the use of a memory, i.e., the tabu list, that stores a number of successful moves. This memory makes it possible to search beyond a local minimum, hence giving the ability to reach a global minimum. At every iteration, the procedure generates a number $K$ of neighboring candidate solutions in the feasibility set, and the best one is selected as the current best solution $\tilde{f}_v(t)$. The accepted solution is stored in the tabu list of length $L_{TS}$, so that the region of the search space related to this solution is not revisited in the next iterations. The best solution $\tilde{f}_v(t)$ at the $k$-th iteration is compared with the global solution $f_v^*(t)$ and, if it produces a better result, it is accepted as the new global solution. After a total number of iterations $N_{it}$, the TS algorithm is stopped. Since our problem concerns continuous-valued variables (see Eq. (5)), the search in the tabu list should be made over a symmetric interval around each stored tabu solution. The width of this interval is controlled by a fraction $\alpha$ of the standard deviation $\sigma_m$, which is obtained as the online mean of the standard deviation $\sigma$ evaluated at each iteration by the Rechenberg rule in (6). This rule is implemented by checking how many new best solutions $f_v^*(t)$ have been obtained in the last $L_{sr} = 15$ iterations. Hence, if more than three solutions are classified as the global best (i.e., the success ratio exceeds 1/5), the standard deviation is increased. The length $L_{TS}$ of the tabu list controls the memory of the tabu search. Once the list is full, each new move overwrites the oldest one.
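The iterative procedure described above can be sketched as a generic continuous-valued tabu search. The following Python code is a simplified illustration and not the TS-RSC algorithm itself: the function name, the box constraints `lower`/`upper`, the Gaussian move generation and all default parameter values are assumptions made here, the cost function is supplied by the caller, and the standard deviation is kept fixed instead of being adapted through the Rechenberg rule.

```python
import random
from collections import deque


def tabu_search(cost, x0, lower, upper, n_iter=200, k=20, sigma=0.5,
                alpha=0.1, tabu_len=10, seed=0):
    """Minimize `cost` over a box with a continuous-valued tabu search."""
    rng = random.Random(seed)
    # Fixed-length tabu list: once full, a new move overwrites the oldest one.
    tabu = deque([list(x0)], maxlen=tabu_len)
    current = list(x0)
    best, best_cost = list(x0), cost(x0)
    for _ in range(n_iter):
        candidates = []
        for _ in range(k):
            # Gaussian move around the current solution, clipped to the box.
            y = [min(max(xi + rng.gauss(0.0, sigma), lo), hi)
                 for xi, lo, hi in zip(current, lower, upper)]
            # A move is tabu if it lies within alpha * sigma of a stored
            # solution along every coordinate (symmetric interval).
            is_tabu = any(
                all(abs(yi - ti) <= alpha * sigma for yi, ti in zip(y, t))
                for t in tabu
            )
            if not is_tabu:
                candidates.append(y)
        if not candidates:
            continue
        # The best non-tabu neighbor becomes the current solution,
        # even when it is worse than the previous one.
        current = min(candidates, key=cost)
        tabu.append(current)
        current_cost = cost(current)
        if current_cost < best_cost:
            best, best_cost = list(current), current_cost
    return best, best_cost
```

As a usage example, calling `tabu_search(lambda x: sum((xi - 1.0) ** 2 for xi in x), [4.0, -3.0], [-5.0, -5.0], [5.0, 5.0])` searches for the minimum of a toy quadratic cost; in the setting of this paper, the cost would instead be the total energy of Eq. (4) evaluated on feasible allocations.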


5.2 Simulated Annealing Resource Scaling and Consolidation

Another powerful global optimization scheme is provided by the Simulated Annealing (SA) algorithm, proposed by Kirkpatrick et al. in [37]. It is based on the analogy with the annealing of a solid in statistical mechanics. The algorithm consists of a sequence of iterations [38]. Each iteration applies some random changes to the current solution, in order to create a new solution $f_v(t)$ in its neighborhood. After a new solution is created, the corresponding change $\Delta E$ in the cost function is computed to decide whether the newly produced solution can be accepted as the current solution $\tilde{f}_v(t)$. If the change $\Delta E$ in the cost function is negative, the newly produced solution is directly taken as the current solution $\tilde{f}_v(t)$. Otherwise, it is accepted according to the Metropolis criterion, which, in turn, is based on the following Boltzmann probability:

$$p(\Delta E) = e^{-\Delta E / T}, \tag{7}$$

where $T$ is the temperature parameter that controls the speed of the annealing process. According to the Metropolis criterion, if the difference $\Delta E$ between the cost function values of the current and the newly produced solutions is equal to or larger than zero, a random number $\delta \in [0, 1]$ is drawn from a uniform distribution. Then, if $\delta \le p(\Delta E)$, the newly produced solution is accepted as the current solution; otherwise, the current solution remains unchanged. The accepted solution $\tilde{f}_v(t)$ at the $k$-th iteration is then compared with the global solution $f_v^*(t)$ and, if it produces a better result, it is accepted as the new global solution. While the temperature $T$ is kept fixed, a number $N_{it}$ of iterations is performed. In each iteration, $K$ random candidate solutions are evaluated. These random solutions are generated over a circular neighborhood whose radius is controlled by the standard deviation $\sigma_m$, computed as the online mean of the standard deviation $\sigma$ that is evaluated in each iteration by the Rechenberg rule in (6). This rule is implemented by checking how many new best solutions $f_v^*(t)$ have been obtained in the last $L_{sr} = 15$ iterations. After that, the temperature is decreased. Although several criteria exist to decrease this parameter, we adopt the following one:

$$T_{k+1} = r\, T_k, \tag{8}$$

with $r < 1$ being a parameter that controls the speed of the temperature decrease, starting from an initial value $T_0$. In principle, if the temperature is decreased slowly, a global minimum can be reached. A total of $N_T$ changes in the temperature profile are evaluated before SA is stopped. The number $N_T$ should be large enough that the final temperature is close to zero, so as to ensure a vanishing probability of accepting worse solutions.

5.3 Evolutionary Strategy Resource Scaling and Consolidation

The last considered technique is based on the GA introduced in [39]. A GA models natural evolution, since the operators it employs are inspired by the natural evolution process. These operators, known as genetic operators, manipulate individuals in a population over several generations to gradually improve their fitness. However, several variants of GAs exist in the literature. In this work, we focus on a particular Evolutionary Strategy (ES) that requires sorting an initial population $P$ of size $N_P$ based on the fitness of its components. In this type of meta-heuristic approach, each individual in the population is a real-valued number representing a candidate operating frequency $f_v(t)$ of the VMs. The fitness function used to sort the population is the total energy consumption in Eq. (4). Then, the population is doubled into two subsets of the same size. The first subset $P_f$ is populated by the same individuals of the initial population $P$, while the individuals in the second subset $P_e$ are obtained by applying the crossover and mutation operators to the initial population $P$. The entire population obtained by joining $P_f$ and $P_e$ is subsequently sorted based on the fitness function in (4), and the best $N_P$ individuals are selected to form the new population $P$. The process continues for a total number $N_g$ of generations (iterations). The top individual in the population $P$ of the last generation is the global solution $f_v^*(t)$. Since, in our framework, each individual is a real-valued continuous number, the crossover and mutation operators should be carefully defined. Specifically, the crossover is defined by substituting the even and odd individuals with the following combination of them:

$$P_e(k_o) = P(k_o) + \alpha_c \left( P(k_o) - P(k_e) \right), \qquad P_e(k_e) = P(k_e) - \alpha_c \left( P(k_o) - P(k_e) \right), \tag{9}$$

where $k_o$ represents the $k$-th odd individual in the population $P$, $k_e$ represents its $k$-th even individual, and $\alpha_c < 1$ is a suitable parameter that controls the fraction of crossover. The mutation operator is defined as:

$$P_e(k) = P(k) + \mathcal{N}\left(0, \sigma_g^2\right), \tag{10}$$

where $P_e(k)$ is the $k$-th individual of the population $P_e$ and $\mathcal{N}(0, \sigma_g^2)$ is a randomly generated value drawn from a Gaussian distribution with zero mean and variance $\sigma_g^2$. The value of the standard deviation $\sigma_g$ is updated in each iteration by using the Rechenberg rule in (6), based on the last $L_{sr} = 15$ iterations. If the crossover and mutation operators generate infeasible individuals, the latter are projected inside the feasible set $[f_v^{\min}(t), f_v^{\max}]$ by the following rules:

$$P_e(k) = \begin{cases} \dfrac{P_e(k) + f_v^{\min}(t)}{2}, & \text{if } P_e(k) > f_v^{\max}, \\[2ex] \dfrac{P_e(k) + f_v^{\max}}{2}, & \text{if } P_e(k) < f_v^{\min}(t). \end{cases} \tag{11}$$

Since the parameter $\alpha_c < 1$ and the variance $\sigma_g^2$ is small enough, the new solution found with Eq. (11) is always feasible.
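The three operators in Eqs. (9), (10) and (11) can be sketched in a few lines of Python. The fragment below is only an illustration under stated assumptions: individuals are paired as consecutive (odd, even) elements in 1-based numbering, which is one possible reading of Eq. (9), and all function names are hypothetical.

```python
import random


def crossover(pop, alpha_c):
    """Eq. (9): combine consecutive (odd, even) pairs of individuals."""
    child = list(pop)
    for i in range(0, len(pop) - 1, 2):
        diff = pop[i] - pop[i + 1]
        child[i] = pop[i] + alpha_c * diff
        child[i + 1] = pop[i + 1] - alpha_c * diff
    return child


def mutate(pop, sigma_g, rng):
    """Eq. (10): add zero-mean Gaussian noise to each individual."""
    return [p + rng.gauss(0.0, sigma_g) for p in pop]


def project(p, f_min, f_max):
    """Eq. (11): average an infeasible individual with the opposite bound."""
    if p > f_max:
        return (p + f_min) / 2
    if p < f_min:
        return (p + f_max) / 2
    return p
```

Note that averaging with the opposite bound returns a feasible value only when the individual exceeds the range by less than the range width, which is in line with the requirement that $\alpha_c < 1$ and $\sigma_g^2$ be small enough.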


5.4 Algorithms’ initialization

All the proposed meta-heuristic algorithms are initialized with the same set of values. For each slot $t$, the feasibility set is $[f_v^{\min}(t), f_v^{\max}]$, where the lower bound $f_v^{\min}(t)$ is the minimum operating frequency that guarantees the processing of the load in that slot. It is evaluated as:

$$f_v^{\min}(t) \triangleq \frac{L_T(t)}{M_V \left( T_S - T_S^{ON} - T_{VM}^{ON} \right)}, \tag{12}$$

where $T_S^{ON}$ and $T_{VM}^{ON}$ are the times needed to turn ON a server and a VM, respectively. The initial solution $f_v^0(t)$ is chosen in the middle of the range $[f_v^{\min}(t), f_v^{\max}]$, i.e.:

$$f_v^0(t) \triangleq \frac{f_v^{\max} + f_v^{\min}(t)}{2}. \tag{13}$$

At the beginning of each algorithm, the iterative best solution and the global best solution are set to $f_v^0(t)$, that is, $f_v^*(t) = \tilde{f}_v(t) = f_v^0(t)$. The initial standard deviation $\sigma(t)$ at slot $t$ is set, for TS and SA, to half of the range $[f_v^{\min}(t), f_v^{\max}]$, i.e.:

$$\sigma(t) \triangleq \frac{f_v^{\max} - f_v^{\min}(t)}{2}. \tag{14}$$

At the $k$-th iteration, the mean value of the standard deviation is updated with the value $\hat{\sigma}_k$ obtained by the Rechenberg rule (6) as:

$$\sigma_k = \frac{k\, \sigma_{k-1} + \hat{\sigma}_k}{k + 1}. \tag{15}$$

In the ES algorithm, the initial value of $\sigma(t)$ at slot $t$ is set to:

$$\sigma(t) \triangleq \frac{f_v^{\max} - f_v^{\min}(t)}{2 N_P}, \tag{16}$$

where $N_P$ is the size of the considered population $P$. The initial population $P$ is composed of $N_P$ individuals, i.e., VM processing frequencies that are assumed uniformly distributed over the interval $[f_v^{\min}(t), f_v^{\max}]$.

5.5 The proposed server allocation procedure

The allocation algorithm aims at finding the “best” placement of the set of VMs. Since servers in a data center may be heterogeneous, some allocations can be more energy-efficient than others with respect to the power consumption of the selected host servers. In this work, we propose a heuristic random search algorithm that, at each iteration, tests different allocations and then selects the allocation that has shown the lowest energy consumption. The total number of these different allocations is denoted by $N_{all}$. In particular, the VM placements follow the patterns below:

– The first configuration attempts to minimize the total number $N_S^{ON}$ of turned ON servers and, therefore, gathers all the needed virtual machines on the first servers. The minimum number of needed servers is:

$$N_{S_{\min}}^{ON} = \left\lceil \frac{N_{VM}^{ON}(t)}{M_{\max}} \right\rceil, \tag{17}$$

where $M_{\max}$ is the maximum number of VMs that can be hosted by a single physical server. Hence, the first $N_{S_{\min}}^{ON} - 1$ servers will be completely full, while the last one gathers the remaining VMs.
– The second placement adopts the opposite approach: it uses every server as little as possible, so that the number of turned ON servers will be maximum, as dictated by the number of turned ON VMs.
– The remaining $N_{all} - 2$ allocations randomly select a number of servers equal to the minimum value in (17) increased by a given tolerance controlled by a parameter $\delta \in [0, 1]$, i.e.:

$$N_S^{ON} = (1 + \delta)\, N_{S_{\min}}^{ON}. \tag{18}$$

By suitably tuning the value of $\delta$, the number of servers is constant but VMs are randomly allocated to the servers. As the number of allocations increases, more VM allocations are tested: this procedure tends to an optimal solution for increasing values $N_{all}$ of the tested allocations.
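The three placement patterns can be illustrated by the following Python sketch, in which each function returns a list mapping every turned ON VM to the index of its host server. The function names, the round-robin realization of the second pattern, and the rounding of $(1 + \delta) N_{S_{\min}}^{ON}$ to the next integer are assumptions made only for this illustration.

```python
import math
import random


def min_servers(n_vm_on, m_max):
    """Eq. (17): fewest servers able to host n_vm_on VMs."""
    return math.ceil(n_vm_on / m_max)


def packed_placement(n_vm_on, m_max):
    """Pattern 1: fill the servers one after the other."""
    return [v // m_max for v in range(n_vm_on)]


def spread_placement(n_vm_on, n_servers):
    """Pattern 2: use as many servers as possible (round-robin)."""
    return [v % n_servers for v in range(n_vm_on)]


def random_placement(n_vm_on, m_max, n_servers, delta, rng):
    """Pattern 3: random placement over (1 + delta) times the minimum."""
    n_on = min(n_servers, math.ceil((1 + delta) * min_servers(n_vm_on, m_max)))
    chosen = rng.sample(range(n_servers), n_on)
    # One entry per free VM slot, so that no server exceeds m_max VMs.
    slots = [s for s in chosen for _ in range(m_max)]
    return rng.sample(slots, n_vm_on)
```

The energy of each candidate placement would then be evaluated through Eq. (4), and the placement with the lowest energy retained.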

5.6 The proposed dynamic selection of the consolidation slots

In agreement with the dynamic nature of the addressed problem, we adopt a dynamic approach for the selection of the consolidation slots [40], which is based on the online evaluation of the per-slot average utilization of the turned ON VMs. Specifically, after indicating by $|S(t)|$ the size of the set $S(t)$ (i.e., the number of turned ON VMs at slot $t$), the current average utilization:

$$U(t) \triangleq \frac{1}{|S(t)|} \sum_{v \in S(t)} \frac{f_v^*(t)}{f_v^{\max}}, \tag{19}$$

of the currently turned ON VMs is computed, together with the exponentially weighted moving average prediction of the expected next utilization:

$$\widehat{U}(t+1) = \lambda\, \widehat{U}(t) + (1 - \lambda)\, U(t), \tag{20}$$

where the scalar $\lambda$ is the predictor coefficient [40]. Afterward, as in [40], the following $l$-out-of-$m$ decision rule is applied: the next slot $(t+1)$ is flagged as a consolidation slot if at least $l$ out of the $m$ most recently measured utilizations in (19), as well as the predicted one in (20), fall outside a target interval $\mathcal{I} \triangleq [Th_L, Th_U]$, where $Th_L$ (resp., $Th_U$) is the lower (resp., upper) desired value of the average utilization.
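A compact way to picture the predictor in Eq. (20) and the $l$-out-of-$m$ rule is the following Python sketch. It reads the rule as requiring both that at least $l$ of the last $m$ measured utilizations and that the predicted value lie outside the target interval; this reading, like the function names, is an assumption of the example.

```python
def ewma_predict(u_hat, u, lam=0.7):
    """Eq. (20): exponentially weighted forecast of the next utilization."""
    return lam * u_hat + (1 - lam) * u


def is_consolidation_slot(recent_u, u_pred, th_l, th_u, l=3, m=5):
    """Flag slot t+1 when at least l of the last m measured utilizations
    and the predicted one all fall outside [th_l, th_u]."""
    def outside(x):
        return x < th_l or x > th_u

    n_out = sum(outside(u) for u in recent_u[-m:])
    return n_out >= l and outside(u_pred)
```

For instance, with the interval [0.4, 0.6], the measured utilizations 0.5, 0.7, 0.8, 0.9 and 0.65 together with a predicted value of 0.75 would flag the next slot for consolidation, since four measurements and the prediction exceed the upper threshold.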


5.7 Implementation of the proposed algorithms

All the considered optimization algorithms return four values, namely:

1. the optimum frequency $f^*(t)$ for each time slot;
2. the number $N_{VM}^{ON}(t)$ of VMs that are turned ON in each time slot $t$, that is:

$$N_{VM}^{ON}(t) = \left\lceil \frac{L_T(t)}{f^*(t) \left( T_S - T_S^{ON} - T_{VM}^{ON} \right)} \right\rceil; \tag{21}$$

3. the number $N_S^{ON}(t)$ of servers that are turned ON in each time slot $t$, i.e.:

$$N_S^{ON}(t) = \left\lceil \frac{N_{VM}^{ON}(t)}{M_{\max}} \right\rceil; \tag{22}$$

4. the average $\overline{E}_T$ of the total consumed energy, obtained as the average of (4) over all time slots.

Moreover, each VM that is turned ON should process at slot $t$ a load $l(t)$ given by:

$$l(t) = \frac{L_T(t)}{N_{VM}^{ON}(t)}. \tag{23}$$
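As a worked illustration of Eqs. (21), (22) and (23), the following Python sketch derives the per-slot resource figures from a given optimum frequency. The function name and the sample values are hypothetical, and the units of the load and of the frequency are simply assumed to be mutually consistent.

```python
import math


def resources_for_slot(load, f_star, t_s, t_s_on, t_vm_on, m_max):
    """Return (VMs ON, servers ON, per-VM load) from Eqs. (21)-(23).

    Assumes load > 0, so that at least one VM is turned ON.
    """
    n_vm_on = math.ceil(load / (f_star * (t_s - t_s_on - t_vm_on)))  # Eq. (21)
    n_s_on = math.ceil(n_vm_on / m_max)                              # Eq. (22)
    return n_vm_on, n_s_on, load / n_vm_on                           # Eq. (23)


# Illustrative values only: a load of 16000 units served at f* = 10.
n_vm, n_srv, per_vm = resources_for_slot(16000.0, 10.0, 1.0, 0.01, 0.01, 20)
```

With these sample values, the slot requires 1633 VMs hosted by 82 servers.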

The pseudo-codes of the algorithms implementing the TS-RSC, SA-RSC and ES-RSC approaches are presented in Algorithms 1, 2 and 3. The proposed meta-heuristic approaches will be compared with the two heuristics described in Section 6 and with a scaling and consolidation approach based on a simple Random Search (RS-RSC) [34]. The RS-RSC algorithm tries $N_f$ different frequency solutions randomly extracted from the feasibility set $[f_v^{\min}(t), f_v^{\max}]$ and selects the solution that exhibits the lowest energy consumption according to Eq. (4). Also, the $N_S^{ON}(t)$ servers needed at slot $t$ in Eq. (18) are randomly selected from the list of available servers.

Algorithm 1 – Pseudo-code of the proposed TS-RSC
Input: $L_T$, $M_{\max}$, $N_S$, $N_{it}$, $K$
Output: $\overline{E}_T$, $f_v^*(t)$, $N_{VM}^{ON}(t)$, $N_S^{ON}(t)$
    for $t \ge 1$ do
        Initialize: $\tilde{f}_v(t) = f_v^*(t) = f_v^0(t) = (f_v^{\min}(t) + f_v^{\max})/2$;
        for $k = 1 : N_{it}$ do
            Allocate all VMs;
            Compute neighborhoods $\mathcal{F} = \{f_1, f_2, \ldots, f_K\}$;
            Check feasibility of $\mathcal{F}$;
            for $j = 1 : K$ do
                if $f_v^j$ is not tabu then
                    Allocate all VMs;
                    Evaluate Energies;
                end if
            end for
            Select the best move $\tilde{f}_v(t)$;
            Compute energy for $\tilde{f}_v(t)$;
            if $\tilde{f}_v(t)$ is better than $f_v^*(t)$ then
                $f_v^*(t) \leftarrow \tilde{f}_v(t)$;
                Add a success count;
                Tabu list $\leftarrow f_v^*(t)$;
            end if
        end for
        $q = \sum$ success;
        $\sigma$ = Rechenberg($q$);
        Update $\sigma_m$ through Eq. (15);
        Update $U(t)$ and $\widehat{U}(t+1)$ through Eqs. (19) and (20);
        Evaluate $N_{VM}^{ON}(t)$ and $N_S^{ON}(t)$ through Eqs. (21) and (22);
        Evaluate $E_T(t)$ through Eq. (4);
    end for
    Evaluate the average $\overline{E}_T$ over all time slots.

6 Benchmark Heuristic approaches

In the literature, there exist several heuristic algorithms to approach the solution of the optimization problem in (5). Specifically, in the next subsections we describe the Maximum Density Consolidation (MDC) [11] and the Modified Best-Fit Decreasing (MBFD) [40] consolidation algorithms, chosen as the benchmark approaches to test and compare the performance of the proposed meta-heuristics.

6.1 The MDC heuristic

The Maximum Density Consolidation (MDC) heuristic [11] is a quite common, simple-to-implement consolidation strategy. It assumes that, at each slot $t$, the turned ON physical servers are utilized at their maximum, in order to minimize the number of running physical servers. At each slot $t$, the minimum number of physical servers needed for processing the current workload $L_T(t)$ is turned ON, starting from the physical server with the lowest energy consumption. All the VMs hosted by the selected servers are turned ON and the workload $L_T(t)$ is evenly assigned to the turned ON VMs. Formally speaking, the MDC heuristic works according to the following steps:

– at each slot $t$, it computes the minimum number of physical servers needed to process the currently submitted workload $L_T(t)$;
– according to the server list, it turns ON the needed most energy-efficient servers;
– it evenly splits the current workload over the VMs hosted by the turned ON physical servers.
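The three MDC steps listed above can be sketched as follows. This is an illustrative reading rather than the reference implementation of [11]: it assumes that the minimum number of servers is computed with the VMs running at their maximum frequency, that `t_eff` is the useful processing time per slot, and that servers are ranked by a single per-server power figure.

```python
import math


def mdc(load, f_max, t_eff, m_max, server_power):
    """Return (indices of the servers turned ON, per-VM load) for one slot.

    server_power: per-server power figures, used only to rank the servers.
    t_eff: useful processing time per slot (hypothetical helper value).
    """
    n_vm = math.ceil(load / (f_max * t_eff))          # VMs needed at full speed
    n_srv = min(len(server_power), math.ceil(n_vm / m_max))
    ranked = sorted(range(len(server_power)), key=server_power.__getitem__)
    on = ranked[:n_srv]                               # most efficient first
    vms_on = n_srv * m_max                            # all hosted VMs are ON
    return on, (load / vms_on if vms_on else 0.0)     # even workload split
```

For example, with four servers whose power figures are 450, 150, 315 and 150, a load of 500 units, a maximum frequency of 12.5 and 20 VMs per server, the two 150-unit servers are turned ON and each of their 40 VMs receives a load of 12.5.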

6.2 The MBFD heuristic

The Modified Best-Fit Decreasing (MBFD) heuristic [40] aims at finding the best placement of the available $M_V$ VMs, in order to efficiently cope with the time variation of the workload $L_T(t)$ to be processed at the consolidation slot $t$.


Algorithm 2 – Pseudo-code of the proposed SA-RSC
Input: $L_T$, $M_{\max}$, $N_S$, $N_{it}$, $N_T$, $K$, $T_0$, $r$
Output: $\overline{E}_T$, $f_v^*(t)$, $N_{VM}^{ON}(t)$, $N_S^{ON}(t)$
    for $t \ge 1$ do
        Initialize: $\tilde{f}_v(t) = f_v^*(t) = f_v^0(t) = (f_v^{\min}(t) + f_v^{\max})/2$, $T = T_0$;
        for $i = 1 : N_T$ do
            for $k = 1 : N_{it}$ do
                Allocate all VMs;
                for $j = 1 : K$ do
                    Compute neighborhood $f_v^j(t)$;
                    Check feasibility of $f_v^j(t)$;
                    Allocate all VMs;
                    if $f_v^j(t)$ is better than $\tilde{f}_v(t)$ then
                        $\tilde{f}_v(t) \leftarrow f_v^j(t)$;
                    else
                        Accept $f_v^j(t)$ with probability $p$ in (7);
                    end if
                    if $f_v^j(t)$ is better than $f_v^*(t)$ then
                        $f_v^*(t) \leftarrow f_v^j(t)$;
                        Add a success count;
                    end if
                end for
            end for
            $T_{i+1} = r\, T_i$;
        end for
        $q = \sum$ success;
        $\sigma$ = Rechenberg($q$);
        Update $\sigma_m$ through Eq. (15);
        Update $U(t)$ and $\widehat{U}(t+1)$ through Eqs. (19) and (20);
        Evaluate $N_{VM}^{ON}(t)$ and $N_S^{ON}(t)$ through Eqs. (21) and (22);
        Evaluate $E_T(t)$ through Eq. (4);
    end for
    Evaluate the average $\overline{E}_T$ over all time slots.
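The core of Algorithm 2, i.e., the Metropolis acceptance of Eq. (7) combined with the geometric cooling of Eq. (8), can be sketched in Python as follows. The fragment is deliberately simplified with respect to the pseudo-code: it evaluates one candidate per iteration, keeps the standard deviation fixed, enforces feasibility by clipping, and minimizes a generic scalar `cost` function; all of these choices, as well as the function names, are assumptions of the example.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Eq. (7): always accept improvements, else accept with e^(-dE/T)."""
    if delta_e < 0:
        return True
    return rng.random() <= math.exp(-delta_e / temperature)


def anneal(cost, f0, f_min, f_max, sigma, t0, r, n_t, n_it, rng):
    """Minimize a scalar cost over [f_min, f_max] with geometric cooling."""
    current = best = f0
    temperature = t0
    for _ in range(n_t):
        for _ in range(n_it):
            # Gaussian move around the current solution, clipped to the range.
            cand = min(f_max, max(f_min, current + rng.gauss(0.0, sigma)))
            if metropolis_accept(cost(cand) - cost(current), temperature, rng):
                current = cand
            if cost(current) < cost(best):
                best = current
        temperature *= r  # Eq. (8)
    return best
```

Since the global best is updated only on strict improvements, the returned solution is never worse than the initial one, whereas the current solution may temporarily move uphill while the temperature is still high.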

The MBFD heuristic directly assumes that we have $M_V \ge 1$ VMs, which must operate at the frequencies $f_1, f_2, \ldots, f_{M_V}$, with $M_V$ and $\{f_1, f_2, \ldots, f_{M_V}\}$ already assigned. These VMs should be allocated over $N_S \ge 1$ physical servers. The $s$-th server, $1 \le s \le N_S$, has the capacity to host up to $M_{\max}(s)$ VMs. The final goal of the MBFD is: (i) to reduce the number of turned ON servers; and, (ii) to lower the overall energy consumption of the VM placement. In order to meet this twofold objective, the MBFD heuristic works as follows:

– it sorts the $M_V$ VMs in decreasing order of their operating frequencies, that is, $f_1 \ge f_2 \ge \ldots \ge f_{M_V}$;
– then, it iterates once through the sorted VMs:
  – for VM $v$, the MBFD heuristic computes the set $H_v = \{s : M_s(v) < M_{\max}(s),\ 1 \le s \le N_S\}$ of servers that still have free capacity to accommodate the $v$-th VM;
  – then, it iterates over $H_v$ and selects the physical server that consumes the lowest computing energy when the $v$-th VM is allocated to it.
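The MBFD steps above translate almost directly into code. The following Python sketch is only illustrative: `energy_if_placed` is a hypothetical callback that stands in for the computing-energy model of the paper, and the function name is likewise an assumption.

```python
def mbfd(freqs, capacity, energy_if_placed):
    """Return a list mapping each VM index to the index of its host server.

    freqs: operating frequency of each VM.
    capacity: maximum number of VMs each server can host.
    energy_if_placed(server, freq, n_hosted): hypothetical callback giving
    the computing energy of `server` once a VM at `freq` is added to it.
    """
    hosted = [0] * len(capacity)
    placement = [None] * len(freqs)
    # Visit the VMs in decreasing order of operating frequency.
    order = sorted(range(len(freqs)), key=lambda v: freqs[v], reverse=True)
    for v in order:
        free = [s for s in range(len(capacity)) if hosted[s] < capacity[s]]
        if not free:
            raise ValueError("not enough server capacity")
        best = min(free, key=lambda s: energy_if_placed(s, freqs[v], hosted[s]))
        placement[v] = best
        hosted[best] += 1
    return placement
```

A callback that charges the idle cost of a server only when its first VM is placed naturally favors servers that are already turned ON, which is in line with the first goal of the heuristic.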



Algorithm 3 – Pseudo-code of the proposed ES-RSC
Input: $L_T$, $M_{\max}$, $N_S$, $N_g$, $N_P$
Output: $\overline{E}_T$, $f_v^*(t)$, $N_{VM}^{ON}(t)$, $N_S^{ON}(t)$
    for $t \ge 1$ do
        Initialize: $P_0(t) = \{f_v^i(t)\}$, $f_v^{\min}(t) \le f_v^i(t) \le f_v^{\max}$, $i = 1, \ldots, N_P$;
        Allocate all VMs;
        Rank $P_0(t)$ based on lowest Energy;
        $\tilde{f}_v(t) \leftarrow$ best $f_v^i(t) \in P_0(t)$;
        for $k = 1 : N_g$ do
            Double population $P_k(t)$ into two parts: $P_f$ and $P_e$;
            Crossover on $P_e$ through Eq. (9);
            Mutation on $P_e$ through Eq. (10);
            Check feasibility of $P_e$;
            Join $P_f$ and $P_e$;
            Allocate all VMs;
            Compute Energies;
            Rank the joint population based on lowest Energy;
            Select the best $N_P$ individuals in new population $P_{k+1}(t)$;
            $\tilde{f}_v(t) \leftarrow$ best $f_v^i(t) \in P_k(t)$;
            if $\tilde{f}_v(t)$ is better than $f_v^*(t)$ then
                $f_v^*(t) \leftarrow \tilde{f}_v(t)$;
                Add a success count;
            end if
        end for
        $q = \sum$ success;
        $\sigma$ = Rechenberg($q$);
        Update $\sigma_m$ through Eq. (15);
        Update $U(t)$ and $\widehat{U}(t+1)$ through Eqs. (19) and (20);
        Evaluate $N_{VM}^{ON}(t)$ and $N_S^{ON}(t)$ through Eqs. (21) and (22);
        Evaluate $E_T(t)$ through Eq. (4);
    end for
    Evaluate the average $\overline{E}_T$ over all time slots.

7 Implementation aspects

In this Section, we address several aspects related to the implementation of the proposed meta-heuristic approaches. In so doing, we also point out some possible generalizations of the considered application scenario.

7.1 Computational complexity and real-time implementation

The asymptotic computational complexities of the three proposed meta-heuristic approaches in Algorithms 1–3 for each time slot, compared with those of the heuristics of Section 6, are shown in Table 2. In order to provide a fair comparison between the different tested approaches, Table 3 shows the normalized execution times of all the implemented algorithms. All times are normalized with respect to the fastest algorithm, i.e., the MDC one.

Table 2: Computational asymptotic complexities of the proposed consolidation approaches and benchmark heuristics.

Approach    Complexity
TS-RSC      $\mathcal{O}(N_{it} K)$
SA-RSC      $\mathcal{O}(N_{it} N_T K)$
ES-RSC      $\mathcal{O}(N_g N_P)$
RS-RSC      $\mathcal{O}(N_f)$
MDC         $\mathcal{O}(1)$
MBFD        $\mathcal{O}(N_S M_{\max})$

Table 3: Normalized execution times of the tested heuristics/meta-heuristics.

Approach    Normalized Time
TS-RSC      363
SA-RSC      3584
ES-RSC      1640
RS-RSC      110
MDC         1
MBFD        5

Since we used a time slot of $T_S = 1$ (s) (see Table 5 in Section 8.1), the problem can be solved in real time as long as the resource scaling and consolidation procedure consumes a small fraction of $T_S$. It has been numerically ascertained that, at least in the tests carried out in [12], the consolidation time should be less than about 100 (ms) (that is, the fraction should be less than $T_S/10$). In order to give an idea of the consolidation times, Table 4 provides the numerically measured execution times of all the implemented algorithms on an Intel® Core™ i7-7700 64-bit processor at 3.60 GHz with 8 MB of cache and 4 GB available. On this processor, as can be seen in Table 4, the base heuristic algorithm (MDC) requires about 33 (µs/slot). The ES-RSC requires about 54 (ms/slot) and, hence, it can be performed in real time. In the worst case (SA-RSC), we have about 118 (ms/slot), so that the consolidation can hardly be performed in real time, since its execution time is slightly larger than $T_S/10$.

Table 4: Numerically measured execution times (ms) of the tested heuristics/meta-heuristics on the used processor.

Approach    Time (ms)
TS-RSC      12.000
SA-RSC      118.272
ES-RSC      54.120
RS-RSC      3.630
MDC         0.033
MBFD        0.165

7.2 Managing the turning OFF/ON delay

Turning ON a VM may require some time, especially when the corresponding TCP/IP connection and hosting physical server must also be turned ON [1, 40]. Let $T_{VM}^{ON}$ be the time needed to turn ON a VM and the associated TCP/IP connection. Hence, since the corresponding time available for the data transport is reduced by a fraction equal to $T_{VM}^{ON}/T_S$, we have that:

$$\sum_{v \in S(t-1)} r_v(t) + \left( 1 - \frac{T_{VM}^{ON}}{T_S} \right) \sum_{v \notin S(t-1)} r_v(t) \le r_{\max}, \tag{24}$$

where $r_v(t)$ is the flow of the $v$-th TCP/IP connection at slot $t$ and $r_{\max}$ is its maximum value. The constraint in Eq. (24) applies when $t$ is a consolidation slot, so that the first (resp., second) summation is over the set of VMs that were turned ON (resp., turned OFF) at the previous slot $(t-1)$, i.e., the VMs that belong (resp., do not belong) to $S(t-1)$.
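The constraint in Eq. (24) can be checked with a few lines of Python. In the sketch below, the per-connection flows are held in a dictionary keyed by VM identifier and `was_on` plays the role of the set $S(t-1)$; the function name and the data layout are assumptions made only for this illustration.

```python
def transport_constraint_ok(rates, was_on, t_vm_on, t_s, r_max):
    """Eq. (24): flows of newly started VMs are weighted by 1 - T_VM_ON/T_S.

    rates: dict mapping each VM identifier to its flow r_v(t).
    was_on: set of VM identifiers that were ON at slot t-1.
    """
    already_on = sum(r for v, r in rates.items() if v in was_on)
    newly_on = sum(r for v, r in rates.items() if v not in was_on)
    return already_on + (1 - t_vm_on / t_s) * newly_on <= r_max
```

For example, with a flow of 6 on an already running VM, a flow of 4 on a newly started VM and $T_{VM}^{ON}/T_S = 0.5$, the left-hand side of Eq. (24) equals 8.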

7.3 Managing Virtual-to-Physical resource mapping

The task of the Virtualization layer of Fig. 1 is to map the demands for per-connection flows and per-VM processing frequencies issued by the Middleware layer into adequate channel bandwidths and CPU cycles at the underlying Network and Server layers. These mappings may be performed by equipping the Virtualization layer of Fig. 1 with the so-called mClock and SecondNet mappers in [41] and [42], respectively. Specifically, Table 1 of [41] points out that the mClock mapper is capable of guaranteeing CPU cycles on a per-VM basis by adaptively managing the computing power of the underlying DVFS-enabled (possibly, multi-core) physical servers. Likewise, the SecondNet network mapper provides Ethernet-type contention-free links atop any set of TCP-based (possibly, multi-hop) end-to-end connections by resorting to a suitable Port-Switching based Source Routing [42]. As pointed out in [41] and [42], both the mClock and SecondNet mappers may be implemented by exploiting the primitive functionalities provided, for example, by (commodity) Xen hypervisors [31].

7.4 Dynamic profiling of the computing-networking energy consumptions

The maximum and idle energies in Eq. (1) consumed by each VM may be dynamically profiled on a per-slot basis by equipping the Virtualization layer of Fig. 1 with the Joulemeter tool in [43]. It is a software tool that is capable of providing the same per-slot and per-VM energy metering functionalities that currently exist in hardware for physical servers. For this purpose, Joulemeter uses hypervisor-observable hardware power states to track the VM energy usage on each hardware component (see Section 5 of [43] for a detailed description). Interestingly, the field trials reported in [43] support the conclusion that, at least when the VMs hosted by each physical server are homogeneous, the per-slot maximum and idle energies wasted by a turned ON VM are proportional to the maximum and idle powers consumed by the hosting physical server, that is (see Eq. (1)),

$$E_{idle}^{com}(v) = \sum_{s=1}^{N_{SE}} d_{sv}\, \frac{T_S\, P_{SE}^{idle}(s)}{M_s}, \tag{25.1}$$

and

$$E_{\max}^{com}(v) = \sum_{s=1}^{N_{SE}} d_{sv}\, \frac{T_S\, P_{SE}^{\max}(s)}{M_s}, \tag{25.2}$$

where $M_s$ is the number of VMs running atop the $s$-th physical server and $\{d_{sv}\}$ is a set of binary variables that assume the value 1 if the $v$-th VM is hosted by the $s$-th physical server, and 0 otherwise. Passing to consider the online profiling of the setup and dynamic energies in (3) and (4) of the $v$-th TCP/IP connection, we observe that, in emerging broadband data centers, each Physical Network Interface Card (PNIC) typically supports a single connection [1], in order to reduce the resulting average round-trip time and, then, save energy. Hence, after indicating by $P_{SW}^{setup}(n)$ (resp., $P_{SW}^{dyn}(n)$) the setup (resp., dynamic) power consumed by each PNIC hosted by the $n$-th physical switch, the resulting setup and dynamic energies in (3) and (4) of the $v$-th connection may be profiled online as the corresponding summations of the setup and dynamic energies consumed by the crossed PNICs, that is:

$$E_{setup}^{net}(v) = \sum_{n=1}^{N_{SW}} b_{nv}\, T_S\, P_{SW}^{setup}(n), \tag{26.1}$$

and

$$E_{dyn}^{net}(v) = \sum_{n=1}^{N_{SW}} b_{nv}\, T_S\, P_{SW}^{dyn}(n), \tag{26.2}$$

where $\{b_{nv}\}$ is a set of binary variables that assume the value 1 if the $v$-th connection crosses the $n$-th physical switch, and 0 otherwise. Finally, after evaluating Eqs. (26.1) and (26.2), the $\gamma$ exponent in (2) may be profiled as:

$$\gamma = \frac{\log\left( \left( E_{\max}^{net}(v) - E_{setup}^{net}(v) \right) / \sigma_v \right)}{\log(r_{\max})}, \tag{27}$$

where $E_{\max}^{net}(v)$ is the profiled setup-plus-dynamic energy in (4) consumed by the $v$-th connection when it works at the peak rate $r_{\max}$. We anticipate that, in the experimental work of Section 8, we use Eqs. (25), (26) and (27) in order to profile the involved energies.



7.5 Managing discrete processing frequencies

Dynamic scaling of the VM processing frequencies relies on Dynamic Voltage and Frequency Scaling (DVFS)-enabled physical servers [1] that, in general, make available only a finite set of $H \ge 2$ discrete CPU processing speeds: $\mathcal{F} \triangleq \{\hat{f}^{(0)}, \hat{f}^{(1)}, \ldots, \hat{f}^{(H-1)}\}$. Hence, in order to deal with continuous and discrete frequency settings under a unified framework, we borrow the time-sharing approach developed in [44]. For this purpose, let $\hat{f}^{(q)}$ and $\hat{f}^{(q+1)}$ be the (discrete) allowed frequencies that surround the (continuous) processing frequency $f_v^*(t)$ obtained by the proposed approaches, that is, $\hat{f}^{(q)} \le f_v^*(t) \le \hat{f}^{(q+1)}$. Hence, according to [44], VM($v$) runs at $\hat{f}^{(q)}$ (resp., $\hat{f}^{(q+1)}$) during a fraction $k_0$ (resp., $(1 - k_0)$) of slot $t$. In order to leave the resulting per-slot energy consumption unchanged, we set (see Eq. (1)):

$$k_0 = \frac{\left( \hat{f}^{(q+1)} \right)^{\alpha} - \left( f_v^*(t) \right)^{\alpha}}{\left( \hat{f}^{(q+1)} \right)^{\alpha} - \left( \hat{f}^{(q)} \right)^{\alpha}}. \tag{28}$$

In practice, the frequency hopping mechanism required for performing this time-sharing operation may be implemented by equipping the physical servers with commodity delta modulators [44].

8 Numerical results and performance comparisons

The targeted data center of Fig. 1 is a SaaS cloud, so that its performance should be evaluated by considering a large-scale infrastructure. However, since it is challenging to carry out repeatable large-scale field trials on real-world data centers, we choose numerical simulations to evaluate and compare the performance of the proposed meta-heuristic approaches, in order to ensure the repeatability of the performed tests. The CloudSim toolkit [45] has been selected as the simulation platform, mainly because it natively supports a number of primitives for modeling networked SaaS data centers. In this work, we have considered Gigabit Ethernet as the networking technology. However, similar results have been obtained under the Fast Ethernet and 10G Ethernet ones.
8.1 Experimental setup

The test data center comprises $N_{SE} = 128$ heterogeneous servers and each server may host up to $M_{\max} = 20$ virtual machines, so that the total number of VMs is $M_V = 2560$. The heterogeneous servers are partitioned into two sets of 44 and 84 servers, respectively. According to the power profiles of current commodity servers [1], the (idle; maximum) per-server power consumptions of these two sets of servers are set to {150 (Watt); 450 (Watt)} and {315 (Watt); 450 (Watt)}, respectively. The topology of the simulated network is the fat-tree one [1], so that the set of heterogeneous servers is partitioned into 20 equal-size pods. Hence, the resulting total number of physical switches is $N_{SW} = 500$ (i.e., 200 edge switches, 200 aggregation switches and 100 core switches), while each 20-port switch is equipped with Ethernet-type PNICs. According to the power profile of Cisco Nexus 3548 commodity switches, the per-switch setup (resp., maximum) power is 130 (Watt) (resp., 250 (Watt)). The setup of the main simulated parameters is summarized in Table 5. Both the Admission Control Server and the Load Balancer of Fig. 1 are co-located at the access router of Fig. 1, so that they act as the root of the fat-tree network and are connected to each core switch by dedicated links. Minimum-hop static routing is applied to compute the Load Balancer-to-VM shortest routes.

Table 5: Default setup of the main simulated parameters.

Simulated parameters
$T_S = 1$ (s)                            $T_{VM}^{ON} = 10^{-2}$ (s)    $E_{SE}^{idle} = \{150; 315\}$ (Watt)
$f^{\max} = 12.5$ (Mbit/slot)            $T_S^{ON} = 10^{-2}$ (s)       $E_{SE}^{\max} = 450$ (Watt)
$r_{\max} = \{16; 22.4\}$ (Gbit/slot)    $\Omega^{net} = 0.025$         $P_{SW}^{setup} = 130$ (Watt)
$N_{SE} = 128$                           $N_{SW} = 500$                 $P_{SW}^{\max} = 250$ (Watt)
$k_e = 10^{-5}$                          $w = 2.5$                      $\gamma = 1.4$
$M_V = 2560$                             $M_{\max} = 20$                $p_u = \{0.5; 0.7\}$

Both synthetic and real-world workloads are considered for the tests, with the peak workload set to two fractions, $p_u = 0.5$ and $p_u = 0.7$, of the overall processing capacity of the data center, that is,

$$r_{\max} = p_u \times M_V \times f^{\max} \quad \text{(Mbit/slot)}, \tag{29}$$

with $f^{\max} = 12.5$ (Mbit/slot). Specifically, in the synthetic case, the input workload $L_T(t)$ is an independent and identically distributed random sequence, whose sample values are evenly distributed over the interval $[0, r_{\max}]$. In order to test the performance of the proposed approaches under time-correlated workloads, we consider the real-world workload extracted from World Cup 98 [46] (the data can be downloaded from: http://ita.ee.lbl.gov/html/contrib/WorldCup.html) and the real-world trace of an I/O workload sampled from four RAID volumes of an enterprise storage cluster in Microsoft [4]. These two workloads have been chosen because they are complementary with respect to two properties: i) the Peak-to-Mean Ratio (PMR), and ii) the Cross-covariance coefficient (CCC). Specifically, numerical evaluations provide PMR = 1.5 and CCC = 0.97 for the World Cup 98 workload, and PMR = 2.5 and CCC = 0.25 for the RAID workload. Hence, the PMR is greater in the case of the RAID workload, while the CCC is greater in the case of the World Cup 98 workload. Summarizing: 1) the RAID workload presents frequent spikes and, therefore, its time behavior is similar to that of white noise. In this sense, it can be used to characterize traffic flows typically generated by IoT-based sensing applications in time-varying (possibly, mobile) environments; 2) the World Cup 98 workload is smoother. Therefore, it is better suited to characterize web-like traffic flows, which typically have self-similar characteristics (e.g., they exhibit high time-correlation values).



In the following subsections, we show results for both synthetic and real-world workloads, using $p_u = 0.5$ and $p_u = 0.7$. In addition, three different target intervals have been used. Consolidation is triggered at slot $t$ when at least $l = 3$ out of the last $m = 5$ values of the utilization factor in (19), plus its predicted value in (20), fall outside the following target intervals: (i) $\mathcal{I}^{(1)} \triangleq [0.4, 0.6]$ (Case 1); (ii) $\mathcal{I}^{(2)} \triangleq [0.3, 0.7]$ (Case 2); and, (iii) $\mathcal{I}^{(3)} \triangleq [0.2, 0.8]$ (Case 3).

8.2 Comparative time-tracking performance and predictor tuning

As a first result, we show the capability of the proposed meta-heuristics and benchmark heuristics to track the time fluctuations of the workloads. In order to present a fair comparison, we have chosen a synthetic workload at 70% of utilization in the interval of Case 1. The tracking capabilities of the MDC heuristic and the ES-RSC meta-heuristic can be inferred from the time curves of Fig. 2, where we have plotted the workload (blue line) and the number of physical servers that are turned ON at each time slot $t$. We have selected only these two approaches in order to obtain a clear figure with few curves. However, similar numerical results have been obtained with the other considered heuristics and meta-heuristics.

Fig. 2: Tracking behaviors of the MDC heuristic and the proposed ES-RSC meta-heuristic. [Curves: Workload, MDC, ES-RSC; horizontal axis: slot index t; vertical axes: normalized workload and number of turned ON servers.]

The behavior of the tracking capabilities are strongly related to the choice of the λ parameter in (20), the predictor of the next utilization. By tuning this parameter, we meet good performance for λ = 0.7: this value will be used in the next experiments

26

Scarpiniti et al.

per-VM Energy Consumption [J]

and in the next tuning procedure of the meta-heuristic parameters. The chosen value of λ guarantees both good tracking performance and saving in energy consumption. In addition, the per-slot and per-VM total energy consumption, evaluated in the case of the ES-RSC algorithm working on the real-world RAID workload at 70% of utilization and interval of Case 1, versus different values of the λ parameter is shown in Fig. 3. Similar behaviors have been found for the other proposed approaches, workloads and percentage of utilization. Fig. 3 confirms the choice λ = 0.7. Energy Consumption at 70%

[Figure 3 plot. Title: Energy Consumption at 70%. Horizontal axis: λ (0.1 to 0.9); vertical axis: per-VM energy consumption [J].]

Fig. 3: Per-slot and per-VM total energy consumption vs. λ parameter of the proposed ES-RSC meta-heuristic.

An examination of Fig. 2 shows that both MDC and ES-RSC are able to track the workload fluctuations, with a delay of at most one or two time slots. Fig. 2 also highlights that the MDC heuristic generally tends to turn ON a lower number of physical servers than the ES-RSC meta-heuristic. However, we anticipate that this does not imply energy saving, as we will point out in Subsection 8.4. Specifically, we anticipate that:

– the MDC, by its nature, aims at minimizing the number of turned-ON servers. However, in order to process all the admitted workload, the operating frequency fv*(t) of the v-th VM at slot t must be sufficiently high, which leads to a high energy consumption. In addition, the MDC heuristic does not consider the energy consumption due to the supported TCP/IP connections, while a high operating frequency also implies a high networking energy consumption;
– the ES-RSC approach tends to turn ON a greater number of physical servers. However, in this case the greater number of VMs works at a lower operating frequency, producing an overall lower energy consumption.

This behavior will be investigated in the next subsections.
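The trade-off described in the two points above can be made concrete with a small Python sketch. It does not use the energy model of this paper: it assumes a generic model in which each active VM draws a fixed idle power plus a dynamic term that grows with the cube of its operating frequency, and in which the workload is split evenly among the VMs. All names and numerical values are hypothetical.

```python
# Illustrative sketch only: a generic (assumed) power model, NOT the paper's.
# Each active VM draws idle_power plus k * f**exponent, with the workload
# split evenly so that each VM runs at f = workload / num_vms.


def total_power(num_vms, workload, idle_power=2.0, k=1.0, exponent=3.0):
    """Total power of num_vms VMs sharing the workload evenly."""
    freq = workload / num_vms  # per-VM operating frequency
    return num_vms * (idle_power + k * freq ** exponent)


# Same workload served by an increasing number of VMs.
power_by_vms = {n: total_power(n, workload=8.0) for n in (2, 4, 8, 16)}
for n, p in power_by_vms.items():
    print(f"{n:2d} VMs -> total power {p:.1f}")
```

Under these assumptions, very few VMs at a high frequency cost more than a moderately larger number of VMs at a lower frequency, while too many VMs are penalized by the idle term. This is only meant to show why a lower number of turned-ON servers does not by itself imply a lower energy consumption.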



8.3 Tuning of the meta-heuristic parameters and performance sensitivity

All the parameters of the meta-heuristics used in the simulations have been carefully tuned in order to obtain the best solution of the RSC problem. It is well known that tuning RSAs is challenging and that, in practice, parameters are often selected by convention. However, several methods exist to tune such parameters [47, 48]. In this paper, we adopt a simple parameter-sweeping technique: a subset of parameters is allowed to vary inside a specific interval, while the other ones are kept fixed, and we retain the values that provide the best results. Since it is impossible to graphically show the results of every experiment, we have selected a couple of representative tuning procedures for each meta-heuristic, focusing on the effects on energy consumption. The tests shown in the following have been conducted for the RAID workload at 70% of utilization and with the interval of Case 1.

For the TS-RSC approach, we compare the energy performance with respect to the length LTS of the tabu list and the number K of candidate neighborhood solutions. Fig. 4a shows the energy consumption as a function of LTS: the energy decreases with LTS, except for the largest values of LTS, for which it slightly increases. In addition, Fig. 4a shows that the energy consumption is nearly constant over a wide range of LTS. A good compromise is LTS = 60, which is the value used in the next experiments. As a second comparison, Fig. 4b shows the energy consumption as a function of K: the energy decreases with K, but beyond a certain value of K it is not further reduced and remains constant. A good choice for this parameter is K = 10 candidate neighborhood solutions.

For the SA-RSC approach, we compare the energy performance with respect to the ratio r of the temperature decrease and the number K of candidate neighborhood solutions. Fig. 4c shows the energy consumption as a function of the ratio r of the temperature decrease in (8): the energy is a decreasing function of r. From an examination of Fig. 4c, we have chosen r = 0.99. As a second comparison, Fig. 4d shows the energy consumption as a function of K: also in this case, the energy is a decreasing function of K. Once again, we have chosen K = 10 candidate neighborhood solutions.

Finally, for the ES-RSC approach, we compare the energy performance with respect to the size NP of the population and the coefficient αc of the crossover operator in (9), both as functions of the number Ng of generations. Fig. 4e shows the energy consumption as a function of Ng for different values of NP. The energy is always a decreasing function of Ng, regardless of the size NP of the population. Generally, the energy consumption is higher for a small number of generations and a small population size. Fig. 4e also implies that the ES-RSC approach converges to approximately the same value for a sufficiently large number of generations, regardless of the size of the population. However, the convergence is faster (i.e., it requires fewer generations) if the population is sufficiently large. Similarly, Fig. 4f shows the energy consumption as a function of Ng for different values of the crossover parameter αc. Also in this case, the energy is always a decreasing function of Ng, while it is not a monotonic function of αc: in fact, from Fig. 4f, the case αc = 0.75 consumes more energy than the case αc = 0.6. However, as in the previous case, the convergence is faster if the crossover parameter is larger. From an examination of both Figs. 4e and 4f, we have chosen the values Ng = 20, NP = 150 and αc = 0.6.

The plots in Fig. 4 show that the adopted meta-heuristics have a low sensitivity to their critical parameters: the energy consumption does not change abruptly for small changes in the parameter values, but remains nearly constant over a wide range of the parameters. Plots similar to those in Figs. 3 and 4 can be drawn for all the other parameters of the considered meta-heuristics; they are not shown in the paper for space limitations. We have selected parameters that assure a good energy consumption reduction and a low sensitivity to their variations.

The parameters of the three tested meta-heuristics selected after the above tuning procedure are summarized in Table 6. Specifically, for the TS-RSC we have chosen a number of iterations Nit = 13 and a number of neighborhood solutions K = 10, while the tabu list has a length of LTS = 60. For the SA-RSC we have chosen a number of iterations Nit = 4, a number of neighborhood solutions K = 10 and a number of iterations at fixed temperature NT = 3. The initial temperature is set to T0 = 50, and it is decreased by a decaying factor r = 0.99. For the ES-RSC, we have chosen a number of generations Ng = 20, a number of individuals NP = 150, a crossover parameter αc = 0.6 and a mutation parameter σg = 0.5. In all the approaches, we set the parameters in (6) to Ci = 1.01, Cd = 0.99 and Lsr = 15, while the tolerance in (18) is set to δ = 0.5.

[Figure 4 plots. Vertical axis of all panels: per-VM energy consumption [J], at 70% of utilization. Panels:
(a) Per-VM energy consumption vs. length of tabu list LTS in TS-RSC.
(b) Per-VM energy consumption vs. number of candidate neighborhood solutions K in TS-RSC.
(c) Per-VM energy consumption vs. rate of temperature decreasing r in SA-RSC.
(d) Per-VM energy consumption vs. number of candidate neighborhood solutions K in SA-RSC.
(e) Per-VM energy consumption vs. number of generations Ng and population size NP in ES-RSC (curves for NP = 50, 100, 120, 150, 200).
(f) Per-VM energy consumption vs. number of generations Ng and crossover parameter αc in ES-RSC (curves for αc = 0.1, 0.3, 0.5, 0.6, 0.75).]

Fig. 4: Per-slot and per-VM total energy consumption vs. different parameter values: (a) length LTS of the tabu list; (b) number K of candidate neighborhood solutions in TS-RSC; (c) rate r of the temperature decreasing; (d) number K of candidate neighborhood solutions in SA-RSC; (e) number Ng of generations and size NP of the population; and, (f) number Ng of generations and parameter αc of the crossover.
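The parameter-sweeping technique adopted in this subsection (one parameter varies over an interval, the others stay fixed, and the best value is retained) can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the names `sweep_parameter` and `toy_energy` are hypothetical, and `toy_energy` is a made-up stand-in for a full simulation run, not data taken from Fig. 4.

```python
# Minimal sketch of a one-parameter sweep; names and the objective are
# hypothetical and do not come from the paper.


def sweep_parameter(evaluate, base_params, name, values):
    """Vary one parameter over `values`, keeping all the others fixed.

    Returns the best value, its energy and the full list of results.
    """
    results = []
    for value in values:
        params = dict(base_params)  # copy, so the other parameters stay fixed
        params[name] = value
        results.append((value, evaluate(params)))
    best_value, best_energy = min(results, key=lambda pair: pair[1])
    return best_value, best_energy, results


def toy_energy(params):
    """Made-up energy surface standing in for one simulation run."""
    return (23.0
            + 0.001 * (params["tabu_length"] - 60) ** 2
            + 5.0 / params["num_neighbors"])


base = {"tabu_length": 30, "num_neighbors": 10}
best_value, best_energy, results = sweep_parameter(
    toy_energy, base, "tabu_length", range(10, 101, 10))
print(best_value, best_energy)
```

In an actual tuning run, `evaluate` would launch the simulator with the given parameters and return the measured per-VM energy; the sweep would then be repeated for each parameter of interest.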

Table 6: Parameters used for the numerical tests of the meta-heuristics and their descriptions.

Description                          Parameter value

TS-RSC Parameters
Number of iterations                 Nit = 13
Length of tabu list                  LTS = 60
Number of neighborhoods              K = 10
Width of searching interval          α = 0.01

SA-RSC Parameters
Num. of iterat. at fixed temp.       Nit = 4
Num. of temp. iterations             NT = 3
Rate of temp. decreasing in (8)      r = 0.99
Number of neighborhoods              K = 10
Initial temperature in (8)           T0 = 50

ES-RSC Parameters
Number of generations                Ng = 20
Crossover parameter                  αc = 0.6
Size of population                   NP = 150
Mutation parameter                   σg = 0.5

Common Parameters
Increase std coefficient in (6)      ci = 1.01
Decrease std coefficient in (6)      cd = 0.99
Number of iterations in (6)          Lsr = 15
Tolerance coefficient in (18)        δ = 0.5
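To give a feeling for the SA-RSC values in Table 6, the following Python sketch builds a temperature schedule from T0 = 50 and r = 0.99. It assumes a geometric cooling law (the temperature is multiplied by r after a fixed number of iterations) and the standard Metropolis acceptance rule of simulated annealing; the exact update in (8) is not reproduced here, and the argument names `num_levels` and `holds_per_level` are hypothetical.

```python
import math

# Sketch under assumptions: geometric cooling T <- r * T and the standard
# Metropolis acceptance rule. The paper's rule in (8) may differ in detail.


def geometric_cooling(t0, r, num_levels, holds_per_level):
    """Yield (level, temperature) pairs, holding each temperature
    for holds_per_level iterations before multiplying it by r."""
    temperature = t0
    for level in range(num_levels):
        for _ in range(holds_per_level):
            yield level, temperature
        temperature *= r


def acceptance_probability(delta_energy, temperature):
    """Metropolis rule: always accept improvements, otherwise accept
    a worse solution with probability exp(-delta / T)."""
    if delta_energy <= 0:
        return 1.0
    return math.exp(-delta_energy / temperature)


# T0 and r from Table 6; the two iteration counts are illustrative.
schedule = list(geometric_cooling(t0=50.0, r=0.99,
                                  num_levels=4, holds_per_level=3))
```

With r as close to 1 as 0.99, the temperature decreases slowly, so worse solutions keep being accepted with a non-negligible probability over the whole run.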


8.4 Numerical evaluations

The per-slot and per-VM total energy consumption (expressed in Joule and averaged over 100 slots) for the synthetic workload in the three previous cases is reported in Table 7. Specifically, the upper part of Table 7 refers to a server utilization of 50% (i.e., pu = 0.5), while the lower part refers to a server utilization of 70% (i.e., pu = 0.7). For each of the intervals of Case 1, Case 2 and Case 3, Table 7 shows the energy consumption of all the approaches considered in this paper, namely: MDC, MBFD, RS-RSC, TS-RSC, SA-RSC and ES-RSC. From an examination of Table 7, we can observe that: (i) the energy consumption increases by passing from Case 1 to Case 2 and Case 3; (ii) with regard to the efficiency in energy consumption reduction, the analyzed algorithms can be sorted in the following order: MDC (the poorest performing), MBFD, RS-RSC, TS-RSC, SA-RSC and ES-RSC (the best performing); (iii) among the meta-heuristic approaches, the ES-RSC one provides the best energy results; (iv) the ES-RSC approach provides an energy consumption reduction of up to about 16% with respect to the MDC heuristic. As a general validation of the proposed approach, we can note that: (i) by increasing the server utilization, the whole energy consumption also increases; (ii) at a higher server utilization, the gaps between the different approaches are reduced. Moreover, this table confirms the behavior anticipated in Subsection 8.2: the fact that the MDC heuristic tends to turn ON a lower number of physical servers does not imply a lower energy consumption.

Table 7: Per-slot and per-VM total energy consumption (Joule) for the three considered intervals under synthetic workload at 50% and 70% of utilization.

Utilization  Interval    MDC    MBFD   RS-RSC  TS-RSC  SA-RSC  ES-RSC
50%          0.4 – 0.6   27.4   27.1   26.2    24.6    23.9    23.1
50%          0.3 – 0.7   29.2   27.3   26.8    25.7    25.4    24.5
50%          0.2 – 0.8   31.1   30.8   30.7    30.5    30.2    29.5
70%          0.4 – 0.6   38.2   37.9   37.3    36.1    35.3    34.5
70%          0.3 – 0.7   38.6   38.0   37.9    37.1    36.8    36.3
70%          0.2 – 0.8   43.4   43.1   43.0    42.8    42.6    41.2
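The percentages quoted in the text can be checked directly against Table 7. The short Python sketch below assumes that the relative gain is computed as the energy difference normalized by the energy of the baseline, i.e., 100 · (E_baseline − E_ES-RSC) / E_baseline; this definition is an assumption, although it reproduces the reported value of about 16%. Only the MDC and ES-RSC columns of Table 7 are used, and the variable names are hypothetical.

```python
# Relative gain of ES-RSC over MDC from the Table 7 values.
# The normalization by the baseline energy is an assumption.

# (utilization, interval) -> (MDC energy, ES-RSC energy), in Joule
TABLE7 = {
    ("50%", "0.4-0.6"): (27.4, 23.1),
    ("50%", "0.3-0.7"): (29.2, 24.5),
    ("50%", "0.2-0.8"): (31.1, 29.5),
    ("70%", "0.4-0.6"): (38.2, 34.5),
    ("70%", "0.3-0.7"): (38.6, 36.3),
    ("70%", "0.2-0.8"): (43.4, 41.2),
}


def relative_gain(baseline, proposed):
    """Percentage energy reduction of `proposed` with respect to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


gains = {key: relative_gain(mdc, es) for key, (mdc, es) in TABLE7.items()}
for (utilization, interval), gain in gains.items():
    print(f"{utilization} [{interval}]: {gain:.1f}%")
```

Under this definition, the gain is about 15.7% for Case 1 and 16.1% for Case 2 at 50% of utilization, and it drops to about 5% for Case 3 at both utilization levels.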

In order to test the performance of the proposed approaches under time-correlated workloads, we provide energy comparisons by considering the real-world workload extracted from World Cup 98 [46] and the real-world workload sampled from four RAID volumes of an enterprise storage cluster in Microsoft [4]. Table 8 reports the per-slot and per-VM total energy consumption for the World Cup 98 workload. Once again, the upper and lower parts of Table 8 refer to server utilization percentages of 50% and 70%, respectively. From this table we can draw the same considerations as in the case of the synthetic workload, apart from a uniform increase of the per-slot and per-VM energy. However, it is interesting to note that the gaps between the three different cases are reduced, while in Case 3 the differences between the compared algorithms are more pronounced. In summary, under this workload, the energy consumption reduction is at most about 9% with respect to the MDC heuristic.

Table 8: Per-slot and per-VM total energy consumption (Joule) for the three considered intervals under World Cup 98 workload at 50% and 70% of utilization.

Utilization  Interval    MDC    MBFD   RS-RSC  TS-RSC  SA-RSC  ES-RSC
50%          0.4 – 0.6   31.5   31.2   30.9    29.5    29.4    29.0
50%          0.3 – 0.7   32.3   31.8   31.4    31.1    30.8    29.9
50%          0.2 – 0.8   33.4   32.3   31.9    31.3    31.1    30.3
70%          0.4 – 0.6   43.9   43.7   42.3    42.3    41.9    41.1
70%          0.3 – 0.7   45.5   45.2   43.7    43.7    43.4    42.6
70%          0.2 – 0.8   46.7   45.9   44.2    44.2    43.8    43.0

Finally, Table 9 reports the per-slot and per-VM total energy consumption under the RAID workload. The upper and lower parts of the table refer to server utilization percentages of 50% and 70%, respectively. Once again, from this table we can draw the same considerations as in the cases of the synthetic and World Cup 98 workloads.

Table 9: Per-slot and per-VM total energy consumption (Joule) for the three considered intervals under RAID workload at 50% and 70% of utilization.

Utilization  Interval    MDC    MBFD   RS-RSC  TS-RSC  SA-RSC  ES-RSC
50%          0.4 – 0.6   17.7   17.4   17.2    16.9    16.7    15.4
50%          0.3 – 0.7   18.9   18.5   18.4    18.3    17.9    17.1
50%          0.2 – 0.8   21.6   21.2   20.9    20.6    20.4    19.3
70%          0.4 – 0.6   24.1   23.7   23.5    23.1    22.9    21.2
70%          0.3 – 0.7   26.0   25.8   25.7    25.5    25.4    24.1
70%          0.2 – 0.8   30.2   29.3   29.1    28.6    28.4    27.3

In this last real-world case, the energy consumption reduction is about 13% with respect to the MDC heuristic. However, interestingly enough, although the per-slot and per-VM energy consumption is lower than for the other workloads, the gaps between the three different cases (in particular between Case 2 and Case 3) are greater than for the other two workloads.

8.5 Performance comparisons and related comments

Results presented in the previous subsection confirm that the proposed meta-heuristics are able to correctly address the problem of real-time resource scaling and consolidation in data centers. Interestingly enough, all the proposed approaches behave similarly for all the considered workloads, both synthetic and real-world. However, from Tables 7 – 9 we can observe that the ES-RSC approach is the best performing in terms of energy consumption reduction. This approach is the recommended one for the RSC problem. In addition, Table 4 confirms that the ES-RSC can be easily performed in real-time, while, for example, SA-RSC shows some difficulties in real-time implementation.

To better underline the advantages of the ES-RSC approach, we report in Figs. 5 – 7 the relative gain in energy consumption reduction of the best performing meta-heuristic approach, the ES-RSC, with respect to the other heuristic and meta-heuristic approaches. Specifically, Figs. 5a and 5b show the relative gain of the ES-RSC in the case of the synthetic workload at 50% and 70% of utilization, respectively. These figures show that the gap with the MDC approach is about 16% (resp. 10%) at 50% (resp. 70%) of utilization for Case 1, and reduces to about 5% for Case 3. Interestingly, in Case 2 the gap with the MDC heuristic at 50% of utilization is still about 16%. These gaps also gradually reduce by considering more efficient approaches, in the order MBFD, RS-RSC, TS-RSC and SA-RSC. With respect to the SA-RSC approach, the gaps reduce to around 2 – 3%.

Figs. 6a and 6b show the relative gain of the ES-RSC in the case of the real-world World Cup 98 workload at 50% and 70% of utilization, respectively. These figures show that the gap with the MDC approach is about 8% (resp. 6%) at 50% (resp. 70%) of utilization for Case 1, and gradually increases to 9% (resp. 8%) for Case 3. In addition, it can be observed that the gaps between the different approaches in Case 3 are more pronounced than in Cases 1 and 2. This behavior can be explained by the fact that the World Cup 98 workload exhibits a high cross-covariance coefficient (about 0.97). These gaps also gradually reduce by considering more efficient approaches, in the order MBFD, RS-RSC, TS-RSC and SA-RSC. With respect to the SA-RSC approach, the gaps reduce to around 2%. Generally, in the case of the World Cup 98 workload, the ES-RSC approach provides the lowest gaps with respect to the other approaches.

Finally, Figs. 7a and 7b show the relative gain of the ES-RSC in the case of the real-world RAID workload at 50% and 70% of utilization, respectively. These figures show that the gap with the MDC approach is about 13% (resp. 12%) at 50% (resp. 70%) of utilization for Case 1, reduces to 9.5% (resp. 7.5%) for Case 2 and slightly increases to 10.5% (resp. 9.5%) for Case 3. These gaps also gradually reduce by considering more efficient approaches, in the order MBFD, RS-RSC, TS-RSC and SA-RSC. Interestingly enough, the gaps with respect to the SA-RSC approach are always greater than 4% for all the considered intervals. Moreover, as already noted in the previous subsection, the gaps between the three different cases (in particular between Case 2 and Case 3) are greater than for the other two workloads. This behavior is due to the rapid fluctuations of the RAID workload, which exhibits a large peak-to-mean ratio (about 2.5). In addition, in Case 3 the differences between the compared algorithms are more pronounced than in Case 1 and Case 2.

[Figure 5 plots. Panel titles: Relative Gain in Energy Consumption of ES-RSC at 50% (a) and at 70% (b). Horizontal axis: interval ([0.4 - 0.6], [0.3 - 0.7], [0.2 - 0.8]); vertical axis: relative gain in energy consumption [%]; series: MDC, MBFD, RS-RSC, TS-RSC, SA-RSC.]

Fig. 5: Relative gain (%) in per-slot and per-VM total energy consumption for the three considered intervals under synthetic workload at: (a) 50% of utilization, and (b) 70% of utilization.

[Figure 6 plots. Same axes and series as Fig. 5.]

Fig. 6: Relative gain (%) in per-slot and per-VM total energy consumption for the three considered intervals under World Cup 98 workload at: (a) 50% of utilization, and (b) 70% of utilization.

[Figure 7 plots. Same axes and series as Fig. 5.]

Fig. 7: Relative gain (%) in per-slot and per-VM total energy consumption for the three considered intervals under RAID workload at: (a) 50% of utilization, and (b) 70% of utilization.

9 Conclusion and future work

In this paper, we have investigated the VMC problem by proposing three “ad hoc” meta-heuristic approaches, namely Tabu Search Resource Scaling and Consolidation (TS-RSC), Simulated Annealing Resource Scaling and Consolidation (SA-RSC) and Evolutionary Strategy Resource Scaling and Consolidation (ES-RSC). The proposed methods have been adapted to the consolidation problem and, in order to refine the search space of the algorithms, the Rechenberg success rule has been used. Comparisons with state-of-the-art heuristics and a pure random search meta-heuristic have demonstrated the effectiveness of the proposed approaches. Experimental results have shown that the ES-RSC approach provides the best reduction in energy consumption, and it is therefore the suggested approach for solving the resource scaling and consolidation problem. With respect to the MDC heuristic, the energy reduction obtained by the ES-RSC approach ranges from about 5% to about 16%, depending on the workload, the utilization and the target interval (see Tables 7 – 9).

In principle, the presented results may be extended along two main research directions. The first one regards multimedia applications in which the input workload of Fig. 1 is the multiplexing of multiple heterogeneous streams with different processing demands. Under multimedia application scenarios, it could be reasonable to relax the (here considered) assumption of homogeneous VMs, in order to account for the heterogeneous nature of the input stream. As a consequence, the formulation of the corresponding resource allocation and consolidation problem for multimedia applications should be generalized, in order to allow different VMs to run at different processing frequencies. Specifically, in multimedia application scenarios, the multiple heterogeneous streams can be processed by sets of homogeneous VMs, one set for each type of available stream (music, videos, images, text, etc.). The additional problem that arises in this scenario is that a server can host VMs coming from different sets, working at different operating frequencies and with different delay bounds. In the authors’ opinion, this problem could be addressed by resorting to a hierarchically organized set of meta-heuristics in which: (i) a master meta-heuristic coordinates the VM-to-server allocation; (ii) a number of slave meta-heuristics adaptively tune the set of processing frequencies on a per-server or per-VM basis; and, (iii) master and slave meta-heuristics cooperate, in order to perform energy-saving server consolidation. The design of the overall resulting hyper meta-heuristic [34] is currently under investigation by the authors.

A second research direction regards the integration into the considered solving framework of energy-saving algorithms for the live migration of VMs. In this regard, we observe that, when the focus shifts to the processing of delay-tolerant (e.g., non real-time) input streams, the energy consumption of the overall virtualized networked computing platform of Fig. 1 could be, in principle, further reduced by adaptively planning inter-server live migrations of VMs. In principle, live migrations of VMs allow the number of turned-ON servers to be quickly reduced when the input workload scales down, but they may also introduce both VM migration delays and inter-server traffic overhead that, in turn, may increase the resulting processing time. Thus, the design of adaptive resource allocation and consolidation meta-heuristics that integrate the adaptive planning of minimum-energy delay-constrained live migrations of VMs is a second research topic of potential interest.
Acknowledgements This work has been supported by the project: “GAUChO – A Green Adaptive Fog Computing and networking Architectures” funded by the MIUR Progetti di Ricerca di Rilevante Interesse Nazionale (PRIN) Bando 2015 – grant 2015YPXH4W_004, and by the projects: V-Fog and V-Fog2 “Vehicular Fog energy-efficient QoS mining and dissemination of multimedia Big Data streams” funded by Sapienza University of Rome, Bando 2016 and 2017.

References

1. C. Wu and R. Buyya. Cloud data centers and cost modeling. Morgan Kaufmann, 2015.
2. M. A. Khoshkholghi, M. N. Derahman, A. Abdullah, S. Subramaniam, and M. Othman. Energy-efficient algorithms for dynamic virtual machine consolidation in cloud data centers. IEEE Access, 5:10709–10722, 2017.
3. J. G. Koomey. Worldwide electricity used in data centers. Environmental Research Letters, 3:1–8, September 2008.
4. Z. Zhou, F. Liu, Y. Xu, R. Zou, H. Xu, J. Lui, and H. Jin. Carbon-aware load balancing for geo-distributed cloud services. In Proc. IEEE Int. Symp. Modelling Anal. Simulation Comput. Telecommun. Syst. (MASCOTS2013), pages 232–241, San Francisco, CA, USA, 2013.
5. M. F. Bari, R. Boutaba, R. Esteves, L. Zambenedetti Granville, M. Podlesny, M. G. Rabbani, Q. Zhang, and M. F. Zhani. Data center network virtualization: A survey. IEEE Communications Surveys and Tutorials, 15(2):909–928, Second Quarter 2013.
6. D. Abts, M. R. Marty, P. M. Wells, P. Klausler, and H. Liu. Energy proportional datacenter networks. In Proc. of ACM Int. Symp. on Computer Architecture (ISCA2010), pages 338–347, Saint-Malo, France, 2010.
7. R. Buyya, A. Beloglazov, and J. Abawajy. Energy-efficient management of data center resources for cloud computing: a vision, architectural elements, and open challenges. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA2010), pages 1–12, July 2010.
8. A. Varasteh and M. Goudarzi. Server consolidation techniques in virtualized data centers: A survey. IEEE Systems Journal, 11(2):772–783, June 2017.
9. I. Rechenberg. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Frommann–Holzboog Verlag, 1973.
10. T. Wood, P. Shenoy, A. Venkataramani, and M. Yousif. Sandpiper: Black-box and gray-box resource management for virtual machines. Computer Networks, 53(17):2923–2938, December 2009.
11. Z. Cao and S. Dong. An energy-aware heuristic framework for virtual machine consolidation in Cloud computing. J. Supercomputing, 69(1):429–451, July 2014.
12. E. Baccarelli, P. G. Vinueza Naranjo, M. Shojafar, and M. Scarpiniti. Q*: Energy and delay-efficient dynamic queue management in TCP/IP virtualized data centers. Computer Communications, 102:89–106, April 2017.
13. S. K. Mishra, B. Sahoo, K. S. Sahoo, and S. K. Jena. Metaheuristic approaches to task consolidation problem in the cloud. In A. K. Turuk, B. Sahoo, and S. K. Assya, editors, Resource Management and Efficiency in Cloud Computing Environments. IGI Global, 2017.
14. E. Feller, L. Rilling, and C. Morin. Energy-aware ant colony based workload placement in clouds. In Proceedings of the 2011 IEEE/ACM 12-th International Conference on Grid Computing, pages 26–33, 2011.
15. P. R. Theja and S. K. K. Babu. An evolutionary computing based energy efficient VM consolidation scheme for optimal resource utilization and QoS assurance. Indian Journal of Science and Technology, 8(26):1–11, October 2015.
16. F. Larumbe and B. Sansò. A tabu search algorithm for the location of data centers and software components in green cloud computing networks. IEEE Transactions on Cloud Computing, 1(1):22–35, January-June 2013.
17. R. Nasim and A. J. Kassler. A robust Tabu Search heuristic for VM consolidation under demand uncertainty in virtualized datacenters. In Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pages 170–180, Madrid, Spain, May 14–17 2017.
18. B. Zeng, S. Feng, and J. Zhang. Tabu Search-based heuristic resource allocation algorithm for database web services in a enterprise organization. In 2010 International Conference on Information Management, Innovation Management and Industrial Engineering (ICIII), Kunming, China, November 26–28 2010.
19. T. Ferreto, C. A. De Rose, and H.-U. Heiss. Maximum migration time guarantees in dynamic server consolidation for virtualized data centers. In E. Jeannot, R. Namyst, and J. Roman, editors, Euro-Par 2011 Parallel Processing, volume 6852 of Lecture Notes in Computer Science, pages 443–454. Springer, 2011.
20. A. Marotta and S. Avallone. A simulated annealing based approach for power efficient virtual machines consolidation. In 2015 IEEE 8th International Conference on Cloud Computing, New York, NY, USA, 27 June – 2 July 2015.
21. Y. Wu, M. Tang, and W. Fraser. A simulated annealing algorithm for energy efficient virtual machine placement. In IEEE International Conference on Systems, Man, and Cybernetics (SMC 2012), pages 1245–1250, Seoul, Korea, October 14–17 2012.
22. K. Tsakalozos, M. Roussopoulos, and A. Delis. VM placement in non-homogeneous IaaS-clouds. In G. Kappel, Z. Maamar, and H.R. Motahari-Nezhad, editors, Service-Oriented Computing, volume 7084 of Lecture Notes in Computer Science, pages 172–187. Springer, 2011.
23. H. Nakada, T. Hirofuchi, H. Ogawa, and S. Itoh. Toward virtual machine packing optimization based on genetic algorithm. In Distributed Computing, Artificial Intelligence, Bioinformatics, Soft Computing, and Ambient Assisted Living, pages 651–654. Springer, 2009.
24. S. Agrawal, S. K. Bose, and S. Sundarrajan. Grouping genetic algorithm for solving the server consolidation problem with conflicts. In Proc. of the first ACM/SIGEVO Summit on Genetic and Evolutionary Computation, pages 1–8, Shanghai, China, June 12–14 2009.
25. G. Wu, M. Tang, Y.-C. Tian, and W. Li. Energy-efficient virtual machine placement in data centers by genetic algorithm. In T. Huang, Z. Zeng, C. Li, and C.S. Leung, editors, International Conference on Neural Information Processing (ICONIP 2012), volume 7665 of Lecture Notes in Computer Science, pages 315–323. Springer, 2012.
26. C. C. T. Mark, D. Niyato, and T. Chen-Khong. Evolutionary optimal virtual machine placement and demand forecaster for cloud computing. In IEEE International Conference on Advanced Information Networking and Applications (AINA 2011), pages 348–355, 2011.
27. J. Xu and J. A. Fortes. Multi-objective virtual machine placement in virtualized data center environments. In Proceedings of the 2010 IEEE/ACM International Conference on Green Computing and Communications (GreenCom) & International Conference on Cyber, Physical and Social Computing (CPSCom), pages 179–188, December 18–20 2010.
28. V. Shrivastava, P. Zerfos, K. W. Lee, H. Jamjoom, Y. H. Liu, and S. Banerjee. Application-aware virtual machine migration in data centers. In 2011 Proceedings IEEE INFOCOM, pages 66–70, Shanghai, China, April 10–15 2011.
29. J. Dong, X. Jin, H. Wang, Y. Li, P. Zhang, and S. Cheng. Energy-saving virtual machine placement in cloud data centers. In 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pages 618–624, Delft, Netherlands, May 13–16 2013.
30. F. Messina, G. Pappalardo, D. Rosaci, and G. M. Sarné. A trust-based, multi-agent architecture supporting inter-cloud VM migration in IaaS federations. In International Conference on Internet and Distributed Computing Systems (IDCS 2014), pages 74–83, September 2014.
31. M. Portnoy. Virtualization Essentials. John Wiley & Sons, 2012.
32. M. Shojafar, N. Cordeschi, and E. Baccarelli. Energy-efficient adaptive resource management for real-time vehicular cloud services. IEEE Trans. Cloud Computing, 2016.
33. D. T. Pham and D. Karaboga. Intelligent Optimization Techniques – Genetic Algorithms, Tabu Search, Simulated Annealing and Neural Networks. Springer, 2000.
34. El-Ghazali Talbi. Metaheuristics – From design to implementation. John Wiley & Sons, 2009.
35. F. Glover. Future paths for integer programming and links to artificial intelligence. Computers and Operations Research, 13:533–549, 1986.
36. P. Hansen. The steepest ascent mildest descent heuristic for combinatorial programming. In Conference on Numerical Methods in Combinatorial Optimisation, Capri, Italy, 1986.
37. S. Kirkpatrick, C.D. Jr Gelatt, and M.P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.
38. N. Benvenuto, M. Marchesi, and A. Uncini. Applications of simulated annealing for the design of special digital filters. IEEE Transactions on Signal Processing, 40(2):323–332, February 1992.
39. J. H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI, 1975.
40. A. Beloglazov, J. Abawajy, and R. Buyya. Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing. Future Generation Computer Systems, 28(5):755–768, May 2012.
41. A. Gulati, A. Merchant, and P. J. Varman. mClock: handling throughput variability for hypervisor IO scheduling. In Proc. USENIX Symp. on Networked System Design and Implementation (NSDI2010), pages 1–7, San Jose, CA, USA, 2010.
42. C. Guo, G. Lu, H. J. Wang, S. Yang, C. Kong, P. Sun, W. Wu, and Y. Zhang. SecondNet: a data center network virtualization architecture with bandwidth guarantees. In Proc. ACM Co-Next, pages 15–26, Philadelphia, USA, 2010.
43. A. Kansal, F. Zhao, J. Liu, N. Kothari, and A. A. Bhattacharya. Virtual machine power metering and provisioning. In Proc. ACM Symp. Cloud Computing (SoCC2010), pages 39–50, Indianapolis, IN, USA, 2010.
44. K. Li. Performance analysis of power-aware task scheduling algorithms on multiprocessor computers with dynamic voltage and speed. IEEE Trans. Parallel Distrib. Syst., 19(11):1484–1497, 2008.
45. R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. D. Rose, and R. Buyya. CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms. Software: Practice and Experience, 41(1):23–50, 2011.
46. M. Arlitt and T. Jin. A workload characterization study of the 1998 World Cup web site. IEEE Network, 14(3):30–37, May 2000.
47. A.E. Eiben and S.K. Smit. Evolutionary algorithm parameters and methods to tune them. In Y. Hamadi, E. Monfroy, and F. Saubion, editors, Autonomous Search, pages 15–36. Springer, 2011.
48. S. Traferro and A. Uncini. Power-of-two adaptive filters using tabu search. IEEE Transactions on Circuits and Systems–II: Analog and Digital Signal Processing, 47(6):566–569, June 2000.