Towards Energy Efficient Workload Placement in Data Centers

Rania Elnaggar, Portland State University, [email protected]

Abstract. A new era of computing is being defined by a shift to aggregate computing resources into large-scale data centers (DCs) that are shared by a global pool of users. In this paradigm, DCs' operational energy costs are a rising concern as they continue an upward trend that is poised to surpass the capital cost of equipment in a typical lifetime usage model. A DC is a complex distributed system comprised of a hierarchy of numerous components; thus, power and energy management can be performed at many levels of granularity and through various techniques. We contend that the energy efficiency problem in DCs should be addressed through a holistic, end-to-end approach that accounts for the many, and sometimes conflicting, parameters. In this paper, we discuss workload placement strategies as a model for a holistic approach. Earlier research that addressed workload placement capitalized on a maximum-idle approach that seeks to maximize both spatial and temporal idleness. We show that the underlying concept for that approach does not hold as a basis for energy-efficient placement; we investigate current and future system power expenditure with respect to system load and analyze the contributing factors. We then utilize our analysis to introduce a framework for energy-efficient load placement strategies in DCs. Comparing our approach to maximum-idle-based placement shows gains in compute energy efficiency. Finally, we discuss how our new approach affects DC thermals and the energy required for cooling.

1 Introduction

Energy efficiency has always been a first-class design goal in the mobile and embedded fields due to battery-life limitations. However, until recently, it has been less of a concern for servers and data centers. We have witnessed waves of interest in DC energy efficiency with the advent of new technologies such as the WWW, clusters, grids, and utility computing models. The interest is now renewed with the inception of Cloud Computing [2] [8], given the massive scale anticipated in future DCs. Cloud Computing is characterized by a move to aggregate computing resources (in terms of hardware, software and services) into large-scale data centers that are shared by a global pool of users. Those data centers are typically owned and run by third-party entities and export a wide array of services and applications, ranging from individual consumer-oriented services to enterprise-class offerings.

In this new paradigm, computing energy costs are a rising concern. This is especially the case in DCs, where energy costs continue an upward trend that is poised to surpass the cost of the equipment itself in a typical lifetime usage model [6]. In 2005, DCs consumed an estimated total of about 50 billion kWh in the U.S., and around 130 billion kWh worldwide. These figures accounted for 1.2% and 0.8% of the total electricity consumption of the U.S. and the world, respectively [24]. The U.S. Environmental Protection Agency (EPA) estimates that if current energy-efficiency trends continue, DCs will consume more than 120 billion kWh in the U.S. alone [13]. While cutting down the operational costs of data centers is a chief concern, the environmental impact of this spiraling computing energy expenditure is equally important. It is projected that improving DC efficiency beyond current trends can reduce carbon dioxide emissions by 15 to 47 million metric tons in 2011 [13].

A data center is a complex entity that can be partitioned into two inter-dependent systems: the IT system and the facilities system. The IT system is comprised of compute-related parts such as servers, networks and management components. The facilities system delivers the power and cooling needed by IT equipment as well as other facilities overhead such as lighting. As servers' performance grows, they generate increasing amounts of heat [3], which in turn demands progressively more cooling [6]. In fact, cooling and power delivery energy expenditure already surpasses compute energy use in a great percentage of data centers [6]. DCs are usually cooled using Computer Room Air Conditioning (CRAC) units, and are typically arranged in a hot-aisle, cold-aisle configuration [47].

In this paper we contend that achieving improved energy efficiency for a DC should be driven by a holistic strategy that takes into consideration all system components to achieve maximum synergetic energy savings. This strategy governs a set of local policies and protocols that effectively use existing power-saving features within each system component in a way that is proportional to the workload of the component and of the overall system. Though ultimately we are interested in defining an end-to-end global energy optimization strategy over a generalized model of a multi-level distributed system, the focus of our short-term research effort is on defining such a strategy for DCs, as a key subset of the larger problem. We consider a DC workload that is a mix of online and offline workloads that are mostly characterized as massively parallel, or throughput-oriented. In defining such strategies, we will initially examine just the compute-related part of the DC, thus excluding networking, cooling and power delivery overheads. While we recognize that these overheads are critically important, we also observe that, in many cases, they are proportional to computing power expenditure in modern data centers. Expanding our work to explicitly consider those other components is a topic for future research.

The rest of this paper is organized as follows. In section 2 we present research background and review related work. In section 3 we present our hypotheses and outline an investigation methodology. In section 4 we present results and analysis. In section 5 we introduce a framework for energy-efficient workload placement. In section 6 we conclude the paper and present directions for future research.

2 Background

A DC's compute system is comprised of a hierarchy of components of different scale, where power management can be performed at many levels of that hierarchy, as shown in Fig. 1 below.

Fig. 1. Scale and hierarchy of power management in a DC

We recognize three power-related attributes that affect the average power consumption of each component, as well as the overall system; namely: peak power, idle power and dynamic power range. Peak power is the power consumed at the maximum workload of a component. Idle power is the power consumed when a component has no workload but is powered on and active, and thus has a low-latency response to increasing workload. The dynamic power range defines the distance between peak power and idle power, and it is desirable for it to scale proportionally to the workload of the component [3].

When addressing the power consumption problem in a DC, we identify two main cost components: the capital cost of power provisioning in the infrastructure, and the operational power cost during the life span of the DC. The capital cost component is directly related to expected maximum power consumption and is a pressing issue as more DCs are built; it is typically amortized over an average of 15 years. Large DC operators such as Google are exploring ways to cut down on that cost through a workload mix that keeps maximum utilization within a decreased power envelope [13]. Decreasing operational power, however, is our area of interest and that of a large body of other research.

Approaches to decrease operational (also termed average) power have focused on the three power attributes we identified earlier. The solutions proposed can be broadly classified into three categories: scaling solutions that track utilization levels; sleep solutions that shut off parts of the system at low utilization; and hybrid solutions that combine the two former approaches. On the component level, Dynamic Voltage and Frequency Scaling (DVFS) has been widely researched and applied to CPUs in particular, as they consume a large percentage of the overall system power. Also, clock gating and system sleep states have been introduced for various platform components. These techniques have been utilized in wide-spread commercial products such as those produced by Intel [22] and AMD [1]. The Advanced Configuration and Power Interface (ACPI) [20] was defined to standardize power management for different platforms. Use of heterogeneous cores and accelerators has also been explored to enhance energy efficiency [27], and the use of operating system timers has been scrutinized to enable longer periods of uninterrupted sleep [44]. Conserving disk drive related energy consumption has also been explored through scaled-speed and sleep modes [9]. In the computer networks domain, the concept of maximizing idleness through sleep states has also been the cornerstone of a large body of research that includes

putting network devices and interfaces to sleep during periods of low traffic [19]. Since network chatter can prevent connected workstations from going to sleep, proxying has been explored [18], as has explicitly waking up sleeping devices [26]. Other research efforts have focused on selecting a workload-dependent best policy to manage a device from a pool of candidate policies [40] [45]. Buffering has also been introduced to create bursty/idle cycles that allow enough sleep time for systems of low to moderate utilization [53], and the use of burstiness has been investigated for mobile devices [30]. Power management policies can be largely classified as heuristic policies or stochastic policies. Heuristic policies are simpler to adapt [21] [46], while stochastic policies can offer performance guarantees only for stationary workloads [34] [45]. Application-level workload profiling has been discussed in [15] to guide power management for meeting user-defined QoS goals such as a battery-life objective.

Addressing cluster-level power management through workload distribution is usually attempted by powering off some servers according to the total DC workload [36] [33]. In [12], a Vary-On Vary-Off policy is introduced and combined with Dynamic Voltage Scaling (DVS) and also Coordinated DVS (CVS), with the latter policy shown not to be worthwhile. An economic resource distribution model through bidding is introduced in [10]; it utilizes a cost-based on-off model that powers on machines if utility exceeds cost. It is well understood that the on-off model is limited by its high transition latency, and decreasing that latency is instrumental to maintaining QoS performance levels [25]. In earlier research, the on-off approach suffered from a lack of means for live workload migration in order to consolidate servers and power off lightly loaded machines. This problem was alleviated by the emergence of virtualization technologies such as Xen [52] and VMWare [50]. Resource provisioning and isolation became critical in such a virtualized environment, as discussed in [37]. Also, translating power management directives from different VMs on the same hardware becomes tricky, and the need arises for a standardized method to interface and coordinate VM power management [31].

Ensemble-level power management in data centers provides greater opportunity to maximize energy efficiency, as outlined in [49]. However, the need for coordination between different power management modules arises and is investigated through control theory in [35]. Such problems are optimization problems that are difficult to solve for large-scale data centers [23]. Additionally, optimizing power usage is subject to QoS requirements as problem constraints. In [43], the authors present a QoS-based workload placement approach that reduces average power consumption by running tasks as slowly as possible within performance bounds.

As we have outlined in section 1, the magnitude of cooling costs in DCs is of the same order as IT costs and thus has received much attention. Researchers have investigated reducing cooling costs through controlling the heat load generated in the DC. As heat recirculation and equipment layout create distinct thermal conditions in each DC, thermal-aware workload placement strategies were introduced. In [29], zonal heat distribution and recirculation are factored in when selecting a set of servers to run the workload. In [42], a thermal multiplier is generated to capture DC thermal conditions and is then used to assign workload placement priority.
The same concept of thermal profiling is used in [48] to design scheduling algorithms aimed at reducing server inlet temperatures. The scale of the thermal-aware placement approach is further expanded in [32] to be used for choosing a particular DC among many in a grid environment. As exceeding thermal limits is always a concern with increased heat densities, DC-level thermal throttling and capping is also investigated, as in [51].

3 Energy Efficient Workload Placement

3.1 The case for energy efficient placement

Reviewing the body of research outlined in section 2 shows a particular focus on maximizing both temporal and spatial idleness through workload placement. In the spatial dimension, servers' workloads are consolidated to run a set of active servers at their maximum utilization, while idle servers are powered off. In the time dimension, we observe a strategy to maximize idle periods and allow a few consolidated peak-utilization periods. We refer to these approaches as maximum-idle (MI). Many researchers have shown reductions in average power consumption by using variants of MI. In [43], the researchers break from the pack and discuss a strategy based on running workloads as slowly as possible within certain performance constraints for meeting deadlines. Their hypothesis is that running slower will reduce average power consumption through DVFS in server systems. While we agree that there is a price tag associated with increased performance, contrary to MI, we contend that energy consumption might not be reduced by such an approach, as we discuss below.

As we stated earlier, our research goal is to reduce energy consumption, not average power consumption. Energy-based measurements are inherently difficult to work with, as they are scenario-based and must track a task from start to finish. We lack quantization and benchmarking of energy efficiency, despite some efforts such as the JouleSort benchmark presented in [41]. Researchers and system designers have always resorted to the use of a power metric in lieu of an energy metric, as power is an average rate that is easy to deal with. Lower average power does not necessarily translate into lower energy expenditure; the time needed to complete a given task may be lengthened, with a slower rate corresponding to lower average power, such that the total energy expenditure increases. This concept is illustrated in Fig. 2, where a task is run twice at two different power rates. The energy consumed in each case corresponds to the two areas under the curve, A1 and A2. Since A2 is greater than A1, the total energy consumed in the second run of the task is increased, even though the average power consumption has decreased. The necessary and sufficient condition for a lower-power strategy to translate into a lower-energy strategy is for the performance difference (which is a time-based metric) to be smaller than the power difference.
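To state this condition precisely, consider the two runs of Fig. 2 in a simple notation of our own: the first run completes the task in time t1 at average power P1, and the second run completes it in time t2 at lower average power P2 < P1. Then

E1 = P1 · t1,    E2 = P2 · t2,    and    E2 < E1  if and only if  t2 / t1 < P1 / P2,

i.e., the slowdown factor must remain strictly below the power-reduction factor for the lower-power run to also be the lower-energy run.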

Fig. 2. Total energy consumed is the area under the curve

In order to investigate the underlying basis for MI's success, we must look at power behavior with varying workload levels and be able to compare that across different systems. There are some inherent difficulties with this approach:
− As different kinds of workload will utilize different parts of server platforms in different manners, the power behavior observed will be workload dependent.
− It is difficult to standardize a method to specify graduated levels of platform utilization, as it is also workload dependent. CPU-bound workloads, for example, can be specified by CPU utilization levels, while memory-bound workloads can be specified by memory bandwidth utilized. Power behavior corresponding to one utilization form cannot be compared or aggregated with another.
− If we try to standardize workloads for the sake of comparison, we see that existing performance benchmarks will typically exercise the underlying systems at near-peak utilization levels and hence cannot produce power traces that are valid for studying graduated behavior. In fact, capturing a representative set of real data center workloads for repeatable experiments is a challenging task [28].
Addressing these issues led to the development of the SPEC Power benchmark that is described and characterized in [17]. The workload used in SPEC Power is a transactional, throughput-oriented workload that is characteristic of web-based applications. High performance computing applications that are massively or embarrassingly parallel fit into that category as well. Frameworks that transform compute tasks into small throughput-oriented jobs, such as Google's MapReduce [39] [16], developed primarily for their web search application [5], are expanding the percentage of workloads that can be transformed into this category. Choosing throughput as representative of utilization is indeed helpful, as it eliminates the need to associate with workload-specific metrics such as CPU utilization. For these reasons we chose throughput, in general, and SPEC Power, in particular, to represent workloads in this paper. Although the throughput workload we are considering here is a specific kind of web-based small requests, many workloads can be equally represented by a throughput metric.

[Figure: SPEC Power curves, power (W) versus throughput (0%–100%), for three servers: Hewlett-Packard Company ProLiant DL580 G5 (1.86 GHz, Intel Xeon L7345), IBM Corporation IBM System x3350, and Dell Inc. PowerEdge R300.]

Fig. 3. SPEC Power curves for three different servers. (All SPEC Power data in this document are public postings on the SPEC website at http://www.spec.org.)

Fig. 3 shows SPEC Power curves for three different servers of past generations. Using this representation to study the basis of MI, we note that each power curve is almost linear with a slope less than 1. That slope means that larger performance gains can be achieved with a smaller power increase, and thus the most power-efficient operation is at maximum utilization; this is the reason MI has been successful to date. This result, however, is a consequence of the power behavior of the system, which can be represented by a power curve. Changes in the shape of the power curve can easily lead to different behavior. We contend that the behavior presented in Fig. 3 is a direct result of the inability of the underlying systems to efficiently scale dynamic power expenditure with workload. Those systems achieved power saving primarily by entering sleep states, thus minimizing idle power; the workload-proportional behavior of those systems is enabled mostly by temporal idleness, as illustrated in [17] by the inversely proportional relationship between the percentage of time spent in the C1 state and system throughput. The best workload scaling that can be achieved via this model is linear with utilization, as shown in Fig. 3. We argue that future servers and systems will exhibit markedly different power behavior with respect to utilization as they pursue more aggressive dynamic scaling with workload, because the nature of dynamic power scaling is non-linear. A simple example is the following relationship defining dynamic power:

P = C f V²

If we closely scaled the dynamic power of each server platform component, then according to the above equation this would yield quadratic power behavior, with power efficiency getting worse at higher utilization. In fact, state-of-the-art systems are starting to exhibit that behavior already. We select two systems representative of the state of the art for the evaluation discussed later in this paper. There is reason to believe this non-linear behavior will only get more pronounced in the future, as hardware system developers pursue more aggressive power savings.

Given the above arguments, it is clear that the underlying principles of MI are unlikely to hold in future systems. Accordingly, we contend that there exists a workload placement distribution, corresponding to the utilization level of a DC, which minimizes total energy expenditure, and that it does not necessarily coincide with MI placement.

The overall energy envelope of the data center consists of both IT power, which is used to power servers and networks, and facilities power, which is used for cooling, power provisioning and overheads. In our research we distinguish between two types of energy-related costs: first, the cost of power, cooling and facilities provisioning in capital expenditure; and second, the operational energy cost of powering servers, cooling and facilities equipment in operational expenditure. The CapEx cost component is proportional to peak power and the set power budget, while the OpEx component is proportional to average power. Our goal in this research is to focus on improving operational energy expenditure, and hence we will not discuss efficiencies related to under/over capacity provisioning. In this paradigm, the operational cooling cost is dependent on the heat generated by IT equipment and hence on their power expenditure. Subsequently, we will focus on discussing energy-efficient expenditure in servers and correlate its effect on DC thermals and cooling.

To summarize our research directions, we believe that achieving maximum energy efficiency in data centers should be driven by a holistic strategy that takes into consideration all system components to achieve maximum synergetic power savings. This strategy seeks to effectively use existing power-management features within each system component in a way that is proportional to the workload of the component and of the overall system. We contend that such a holistic strategy can be achieved and coordinated through intelligent workload placement. The workload placement strategy is informed of the energy proportionality of each individual subsystem (a server in this case) and will assign workloads in a manner that achieves maximum energy efficiency over the entire DC. We are not addressing thermal-aware workload placement at this point, but rather a placement that is based on finding the most efficient load distribution and that is thermal-friendly, i.e., does not cause an undesirable rise in cooling costs due to localized increases in heat generation (formation of hot spots).
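A deliberately simplified model, with parameters of our own choosing, illustrates why the optimum can shift away from MI. Suppose a server's power at utilization u is approximated by P(u) = Pidle + k·u², where k captures aggressive dynamic scaling. Consolidating a DC load L (with L no larger than one server's capacity) onto a single server costs Pidle + k·L², while splitting it evenly across two active servers costs 2·Pidle + k·L²/2. Spreading is cheaper whenever Pidle < k·L²/2, i.e., once the load is large enough relative to the idle floor. With a linear curve, P(u) = Pidle + c·u, consolidation is never worse, which is exactly the regime in which MI has been successful.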

3.2 Methodology and Limitations

As outlined in the previous section, we contend that there exists a workload distribution strategy that will maximize energy efficiency in the data center, and that it does not necessarily coincide with MI. We identified normalized, throughput-based characterizations of workload, such as that achieved by SPEC Power, to be a valid representation of a system's power proportionality. This is an optimization problem that seeks to minimize energy consumption as workloads are being assigned. In order to investigate our hypothesis we constructed a profiling simulator, written in C++, to experiment with different settings. In order to simplify the problem we only consider homogeneous DCs in the simulator. We will show in the results analysis how to easily expand our framework to incorporate heterogeneous DCs, and we intend to incorporate that in our future work.

The simulator has a cost matrix for each server type that is constructed from the SPEC Power profiling of the server. Each cost matrix has 11 values corresponding to power usage by the server at utilization levels ranging from 0% (active idle) to 100% of maximum throughput. The number of servers in the DC is given as input to the simulator for each run. Simulation runs exercise the DC in a normalized, graduated utilization manner at 10% increments of the DC's maximum throughput workload. Maximum DC workload is determined by multiplying the number of servers in the DC by the maximum throughput of each server. DC throughput directly corresponds to the arrival rate of jobs. We assume that for arrival rates higher than the total DC capacity, jobs are queued at the workload dispatcher. The simulator finds all possible combinations of workload distributions at each utilization level and calculates the corresponding aggregate cost over the entire DC. The simulator then identifies minimum, maximum and MI cost values for each workload level, corresponding to minimum, maximum and MI aggregate power consumption. To limit the search space, we assume that workloads can only be distributed to servers in chunks of 10% of the maximum server load. Note that the slow-down factor we identified as a necessary condition to translate power efficiency into energy efficiency is not present in this framework; that is because the requested workload throughput is not scaled or throttled but is kept constant while the server scales its power consumption accordingly.

The purpose of the profiling simulation is to characterize the DC's aggregate power behavior versus throughput, as well as to compare and contrast different machine configurations and varying numbers of machines. Profiling results can be used to develop heuristics that aid in designing workload placement policies in DCs.
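The following minimal C++ sketch illustrates the kind of enumeration the simulator performs. The 11-entry cost matrix is an illustrative placeholder rather than measured SPEC Power data, the MI baseline is modeled as filling servers to 100% one at a time with the rest of the pool at active idle, and the exhaustive recursion is only practical for the small server counts considered here; the actual simulator differs in structure and optimizations.

#include <algorithm>
#include <array>
#include <cstdio>

// Illustrative power (W) for one server at 0%, 10%, ..., 100% utilization (placeholder values).
const std::array<double, 11> kCost = {60, 75, 88, 100, 113, 127, 143, 161, 182, 206, 234};

// Recursively assign a load level (0..10, in 10% chunks) to each server so the levels
// sum to `remaining` tenths of a single server's peak throughput, tracking the minimum power.
void search(int server, int nServers, int remaining, double powerSoFar, double& best) {
  if (remaining > 10 * (nServers - server)) return;   // prune: remaining load can no longer fit
  if (server == nServers) {
    if (remaining == 0) best = std::min(best, powerSoFar);
    return;
  }
  for (int level = 0; level <= std::min(10, remaining); ++level)
    search(server + 1, nServers, remaining - level, powerSoFar + kCost[level], best);
}

// Maximum-idle (MI) placement: consolidate onto as few servers as possible,
// leaving the rest of the pool at active idle (0%).
double miPower(int nServers, int loadTenths) {
  int full = loadTenths / 10, partial = loadTenths % 10;
  int idle = nServers - full - (partial ? 1 : 0);
  return full * kCost[10] + (partial ? kCost[partial] : 0.0) + idle * kCost[0];
}

int main() {
  const int n = 4;                                     // number of homogeneous servers in the DC
  for (int dcLevel = 0; dcLevel <= 10; ++dcLevel) {    // 0% .. 100% of DC capacity
    int loadTenths = dcLevel * n;                      // total load in 10% server chunks
    double best = 1e18;
    search(0, n, loadTenths, 0.0, best);
    std::printf("DC load %3d%%: min %.0f W, MI %.0f W\n",
                10 * dcLevel, best, miPower(n, loadTenths));
  }
  return 0;
}

With the convex illustrative curve above, the enumeration finds distributions cheaper than MI at intermediate DC loads, mirroring the effect reported for m1 and m2 in section 4.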

4 Results and Analysis

In this section, we consider three different servers as building blocks for a DC system of a varying number of machines. The three server systems are respectively named m1, m2 and m3. The SPEC Power profiles of the three systems under investigation are presented in Fig. 4 below. Note that both m1 and m2 exhibit the non-linear power scaling we outlined in section 3, while m3 has a mostly linear power curve.

[Figure: SPEC Power curves, power versus throughput, for the three machines under study: m1 = Hewlett-Packard Company ProLiant DL380 G6 (2.26 GHz, Intel Xeon L5520 processor), m2 = Dell Inc. PowerEdge R710 (Intel Xeon X5570, 2.93 GHz), m3 = Fujitsu Siemens Computers PRIMERGY TX150 S6 (Intel Xeon L3360).]

Fig. 4. SPEC Power profile of the three machines under study.

We run the profiling simulator on three DC systems that are homogeneous collections of the three machines. We use the graduated workload methodology outlined in section 3.2 to vary the DC throughput. The number of machines in each DC is varied between 2 and 8; nonetheless, we will show later that the results are valid for larger configurations. The purpose of the simulations is to find a minimum-power workload distribution at each DC throughput level and study the properties of this distribution. In Fig. 5 below we plot the minimum power distribution against normalized DC utilization for a number of machines n ranging from 2 to 8 in a DC system of m1. Fig. 6 shows the percentage increase in power savings over MI placement achieved by finding the minimum power distributions at each DC utilization level.


Fig. 5. Minimum power versus throughput for different DC configurations, m1.


Fig. 6. Percentage power savings versus MI, m1.

Examining Fig. 5 and Fig. 6 shows that, for an m1-based DC, there is a minimum aggregate power workload distribution different from MI, and quantifies the amount of savings obtained at various utilization levels and with different numbers of machines. We obtain analogous results for m2, while for m3, MI proves to always coincide with the minimum, as suggested by the shape of its power profile in Fig. 4. The above figures, however, do not provide insights into the properties of the minimum power distribution, and additional analysis is needed. Taking the analysis further, we normalize the resulting minimum power curve to the value obtained at maximum throughput. This is akin to the way we normalized throughput, and it allows us to compare the aggregate minimum power of different DC configurations. The results of this approach are presented in Fig. 7, Fig. 8 and Fig. 9, for m1, m2 and m3 respectively.


Fig. 7. Normalized percentage minimum aggregate power versus DC throughput for m1-based DC.


Fig. 8. Normalized percentage minimum aggregate power versus DC throughput for m2-based DC.


Fig. 9. Normalized percentage minimum aggregate power versus DC throughput for m3-based DC.

The above charts show that varying the number of machines in the DC does not change the shape of the normalized aggregate minimum power curve obtained through workload placement. The shape of that curve does change with a different server base, as we see by comparing m1, m2 and m3. That means that the power profile (or behavior) of the DC is independent of the number of machines for a homogeneous system with a fixed workload type such as the system profiled. We next examine the power savings obtained by seeking the minimum workload distribution versus MI, and we plot that against throughput for both m1 and m2 while fixing n. We ignore m3 because those savings always amount to zero in this case. Fig. 10, Fig. 11 and Fig. 12 show those savings for n = 2, n = 6 and n = 8 respectively.

9% 8% 7% 6% Power saved

5% 4% 3%

m1

2%

m2

1% 0%

DC Throughput

Fig. 10. Percentage of power saved versus MI, n = 4.


Fig. 11. Percentage of power saved versus MI, n = 6.


Fig. 12. Percentage of power saved versus MI, n = 8.

From the above figures, we observe that there is a sweet spot in the DC utilization level that results in a workload distribution corresponding to a maximum power savings over MI. The location of that sweet spot is independent of the machine type, but dependent on the number of machines.

In the results above, we find a workload distribution that minimizes energy consumption given a fixed DC configuration according to the number of machines. This approach, however, does not scale well as the size of the DC grows, and we need a way to further constrict the search space of the problem. Accordingly, we investigate another approach to optimizing workload placement: aggregating machines into a hierarchy of nodes and distributing the workload among the top-most nodes at each level of the hierarchy until the bottom nodes (individual servers) are reached. That hierarchy is similar to a tree structure with a constant branching factor equal to the number of nodes at each level. We use our profiling simulator to test this strategy and produce the DC power profiles as shown earlier; simulations seek the most efficient placement at each node level. We set up an experiment with a DC of 8 servers organized in a node hierarchy of 3 levels with a branching factor of 2 at each level. Fig. 13 shows the resulting normalized power curve at each hierarchical level, with the DC of 8 servers at the top-most level.


Fig. 13. Normalized power curve for a DC of 8 nodes and a branching factor of 2 in 3 levels of hierarchy, m2.

Looking at the above chart, we conclude that the aggregate power behavior of the DC does not change when the best placement decision is broken down into a hierarchy of nodes. The trade-off between the branching factor and the depth of the hierarchy can be used to reduce the exhaustive search time for the best placement. This observation can be used to find the best placement in a DC with a large number of servers. We also consider using profiling to find adequate placement heuristics for a given branching factor, and then repeatedly applying those heuristics across the hierarchy. In fact, such an approach can be generalized to partition a heterogeneous DC into several homogeneous clusters and treat them as end nodes at the top-most level for workload distribution. We intend to investigate the heterogeneous structure in our future work.
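One way to view this hierarchical decomposition is as a bottom-up composition of power profiles: a node's profile is obtained by optimally splitting any given load between its children, whose own profiles are built the same way down to individual servers. The C++ sketch below illustrates this for a binary hierarchy over the same illustrative (non-measured) server profile used earlier; the actual simulator's data structures and search order may differ.

#include <algorithm>
#include <cstdio>
#include <vector>

// profile[t] = minimum power (W) when the servers under a node together carry
// t tenths of a single server's peak throughput.
using Profile = std::vector<double>;

// Build the parent profile from two child profiles: for every total load,
// take the cheapest way of splitting that load between the two children.
Profile combine(const Profile& left, const Profile& right) {
  Profile out(left.size() + right.size() - 1, 1e18);
  for (size_t i = 0; i < left.size(); ++i)
    for (size_t j = 0; j < right.size(); ++j)
      out[i + j] = std::min(out[i + j], left[i] + right[j]);
  return out;
}

int main() {
  // Illustrative single-server profile at 0%, 10%, ..., 100% load (placeholder values).
  Profile node = {60, 75, 88, 100, 113, 127, 143, 161, 182, 206, 234};

  // A DC of 8 servers as a binary hierarchy: pair servers, then pair the pairs, and so on.
  for (int level = 0; level < 3; ++level)   // 2^3 = 8 leaves under the top node
    node = combine(node, node);

  // node[t] is now the minimum aggregate power for t tenths of total DC throughput.
  for (int pct = 0; pct <= 100; pct += 10) {
    size_t t = static_cast<size_t>(pct) * (node.size() - 1) / 100;
    std::printf("DC load %3d%%: %.0f W\n", pct, node[t]);
  }
  return 0;
}

Because the minimum over all leaf-level distributions factors exactly into minima over the subtree splits, this composition reproduces the flat exhaustive search, consistent with the observation in Fig. 13 that the hierarchy does not change the aggregate power behavior.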

5 Recommended Framework

In section 4, we profiled the problem of finding a minimum-power workload distribution in homogeneous DCs for fixed transactional workloads. We showed that the aggregate power behavior of the DC, when normalized, is independent of the number of machines. On the other hand, the maximum power savings obtained from such a distribution corresponds to a certain DC utilization level that depends on the total number of machines. We also showed how to structure the DC into a hierarchy of nodes and thus limit the problem to the set of nodes at each level, in a repeatable fashion.

It is worth noting that in our simulations we considered only active idle power at 0% utilization and did not power off machines to obtain a power cost of zero. There is no doubt that turning machines off will save more power at low utilization levels. However, we only consider an active set of servers among a pool of all available servers in the DC. Machines outside the active set are considered to be shut down and thus make no contribution to the workload distribution problem. Machines are brought up as the size of the active set expands and are turned off as it contracts. The active set should have enough servers to guard against rapid workload growth, and thus to stay within latency thresholds. We intend to investigate the trade-offs between the active-set size and DC energy efficiency in our future work.

DC cooling cost is directly proportional to the maximum heat load generated. As MI seeks to run servers at their maximum utilization through workload consolidation, this will likely result in an increased heat load that is likely to affect cooling cost. Our approach, however, distributes workloads such that servers avoid operating at peak utilization; this is because of the increased energy cost of peak utilization present in the power curves of well-power-managed systems. Hence, we expect our approach to be thermal-friendly, albeit not thermal-aware, as it does not capture the thermal conditions in the DC. We also consider spatially rotating the active set by adding and dropping servers, to decrease localized heat densities. The active set of servers should also be spatially dispersed for the same reason.
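As a small illustration of one possible active-set sizing rule (the headroom fraction and the sizing policy below are illustrative assumptions of ours, not a policy prescribed by the framework): keep enough powered-on servers to serve the current load plus a safety margin, growing or shrinking the set as the load moves.

#include <algorithm>
#include <cmath>
#include <cstdio>

// Choose how many servers to keep powered on (active) for a given DC load.
// loadFraction: current DC throughput as a fraction of total DC capacity (0..1).
// headroom: illustrative safety margin against sudden workload growth.
int activeSetSize(double loadFraction, int totalServers, double headroom = 0.2) {
  double needed = loadFraction * totalServers * (1.0 + headroom);
  int size = static_cast<int>(std::ceil(needed));
  return std::min(totalServers, std::max(1, size));  // always keep at least one server on
}

int main() {
  const int totalServers = 64;
  for (double load = 0.0; load <= 1.0; load += 0.25)
    std::printf("DC load %.0f%% -> active set of %d servers\n",
                100 * load, activeSetSize(load, totalServers));
  return 0;
}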

6 Conclusions and Future Work

The problem of reducing the energy consumed by DCs is a chief concern as DCs grow in size and complexity. Previous research has typically addressed this problem through power management techniques that capitalize on maximizing both spatial and temporal idleness. In this paper we investigate the contributing factors to energy expenditure in the DC and analyze the power behavior corresponding to different levels of workload. We illustrate, using power profiles, why MI was believed to offer the best workload distribution and how that is changing in future systems as they pursue aggressive dynamic power management. We discuss the differences between minimizing energy and minimizing average power and show that they are not synonymous.

We use a throughput metric to characterize both DC and server utilization. Subsequently, we employ the corresponding power curves to design a new cost-based workload placement strategy that seeks to minimize power usage at each DC utilization level. As the throughput is kept constant, and not scaled or throttled down, we ensure that the minimum-power placement corresponds to a minimum-energy placement. We evaluate our new strategy through a profiling simulator that we have developed, and study the power behavior of various DC configurations employing three different server types. We validate our strategy through simulation and show that it finds a minimum-power workload distribution that does not necessarily coincide with MI. We then expand the result to show that there is a sweet spot for DC utilization where the power savings are maximized. Afterwards, we show how to scale down the problem by arranging the DC into a hierarchy of nodes with a given branching factor at each level. We expect that result to be very useful when applied to large-scale DCs, as well as to heterogeneous configurations. We then refine the framework by introducing the concept of a rotating, spatially-dispersed active set of servers. Although this approach is not thermal-aware, we believe it is thermal-friendly as it avoids increased localized heat loads.

Heterogeneity in both server type and workload type is a natural expansion of the approach presented in this paper, which we plan to tackle in future work. We propose to address heterogeneity by aggregating similar servers/workloads into groups in a hierarchy of top-level heterogeneous nodes. Another good area for future investigation is incorporating cost metrics corresponding to energy consumption in other DC systems, such as networking, workload migration, heat generation, heat recirculation and thermal layout. We believe, however, that the cost contribution of both heat recirculation and thermal layout can be eliminated by careful DC design. Modern DCs already exhibit this feature by tunneling hot or cold air in isolation, as well as through a host of other techniques.

References

1. AMD Opteron Processor Power and Thermal Data sheet, publication #30417, http://www.amd.com/us-en/Processors/TechnicalResources/0,,30_182_739_9003,00.html
2. Armbrust, M., Fox, A., Griffith, R., Joseph, A., Katz, R., Konwinski, A., et al.: Above the Clouds: A Berkeley View of Cloud Computing. University of California, Berkeley, Technical Report (2009)
3. Barroso, L. A.: The Price of Performance. ACM Queue, vol. 3, pp. 48-53. ACM, New York (2005)
4. Barroso, L., & Holzle, U.: The case for energy-proportional computing. Computer, IEEE Computer Society, vol. 40, no. 12, pp. 33-37. IEEE Press, New York (2007)
5. Barroso, L., Dean, J., & Holzle, U.: Web search for a planet: The Google cluster architecture. IEEE Micro, vol. 23, pp. 22-28. IEEE Press, New York (2003)

6. Belady, C., & Malone, C.: Metrics and an Infrastructure Model to Evaluate Data Center Efficiency. In proceedings of ASME InterPACK, Vancouver, BC (2007)
7. Bohrer, P., Elnozahy, E., Keller, T., Kistler, M., Lefurgy, C., McDowell, C., et al.: The case for power management in web servers. Power Aware Computing, pp. 261. Kluwer Academic Publishers (2002)
8. Buyya, R., Yeo, C., Venugopal, S., Ltd, M., & Melbourne, A.: Market-oriented cloud computing: Vision, hype, and reality for delivering IT services as computing utilities. In proceedings of the 10th IEEE International Conference on High Performance Computing and Communications (HPCC). IEEE CS Press, Los Alamitos, CA, USA (2008)
9. Carrera, E., Pinheiro, E., & Bianchini, R.: Conserving disk energy in network servers. In proceedings of the 17th annual international conference on Supercomputing, pp. 86-97. ACM, New York, NY, USA (2003)
10. Chase, J., Anderson, D., Thakar, P., Vahdat, A., & Doyle, R.: Managing energy and server resources in hosting centers. In proceedings of the eighteenth ACM symposium on Operating Systems Principles, pp. 103-116. ACM, New York, NY, USA (2001)
11. Dhiman, G., & Rosing, T.: Dynamic power management using machine learning. In proceedings of the 2006 IEEE/ACM international conference on Computer-Aided Design, pp. 747-754. ACM, New York, NY, USA (2006)
12. Elnozahy, E., Kistler, M., & Rajamony, R.: Energy-efficient server clusters. Lecture Notes in Computer Science, vol. 2325/2003, pp. 179-196. Springer, Heidelberg (2003)
13. Environmental Protection Agency (EPA): Report to Congress on Server and Data Center Energy Efficiency, Public Law 109-431 (2007)
14. Fan, X., Weber, W., & Barroso, L.: Power provisioning for a warehouse-sized computer. In proceedings of the 34th annual international symposium on Computer Architecture, pp. 13-23. ACM, New York, NY, USA (2007)
15. Flinn, J., & Satyanarayanan, M.: Managing battery lifetime with energy-aware adaptation. ACM Transactions on Computer Systems (TOCS), vol. 22, pp. 137-179. ACM, New York (2004)
16. Ghemawat, S., & Dean, J.: MapReduce: Simplified Data Processing on Large Clusters. In proceedings of the Sixth Symposium on Operating System Design and Implementation, vol. 6, p. 10 (2004)
17. Gray, L., Kumar, A., & Li, H.: Characterization of the SPECpower_ssj2008 benchmark. SPEC Benchmark Workshop (2008)
18. Gunaratne, C., Christensen, K., & Nordman, B.: Managing energy consumption costs in desktop PCs and LAN switches with proxying, split TCP connections, and scaling of link speed. International Journal on Network Management, vol. 15, pp. 297-310 (2005)
19. Gupta, M., Grover, S., & Singh, S.: A feasibility study for power management in LAN switches. In proceedings of the 12th IEEE International Conference on Network Protocols (ICNP), pp. 361-371 (2004)
20. HP, Intel, Microsoft, Phoenix, Toshiba: Advanced configuration and power interface specification. ACPI Specification Document, Revision 3 (2004)

21. Hwang, C., & Allen, C.: A predictive system shutdown method for energy saving of event-driven computation. ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 5, pp. 226-241 (2000)
22. Intel Xeon Processor 7400 Series datasheet, http://www.intel.com/products/processor/xeon7000/documentation.htm?iid=products_xeon7000+tab_techdocs (2008)
23. Keeton, K., Kelly, T., Merchant, A., Santos, C., Wiener, J., Zhu, X., et al.: Don't settle for less than the best: use optimization to make decisions. In proceedings of the 11th USENIX workshop on Hot Topics in Operating Systems. USENIX Association, Berkeley, CA, USA (2007)
24. Koomey, J.: Estimating total power consumption by servers in the US and the world. Final report (2007)
25. Meisner, D., Gold, B., & Wenisch, T.: PowerNap: Eliminating Server Idle Power. SIGPLAN Notices, vol. 44, pp. 205-216. ACM, New York, USA (2008)
26. Mishra, N., Chebrolu, K., Raman, B., & Pathak, A.: Wake-on-WLAN. In proceedings of the 15th international conference on World Wide Web, pp. 761-769. ACM, New York, NY, USA (2006)
27. Mogul, J., Mudigonda, J., Binkert, N., Ranganathan, P., & Talwar, V.: Using Asymmetric Single-ISA CMPs to Save Energy on Operating Systems. IEEE Micro, vol. 28, pp. 26-41. IEEE Press (2008)
28. Moore, J., Chase, J., Farkas, K., & Ranganathan, P.: Data center workload monitoring, analysis, and emulation. In proceedings of the Eighth Workshop on Computer Architecture Evaluation using Commercial Workloads (2005)
29. Moore, J., Chase, J., Ranganathan, P., & Sharma, R.: Making scheduling "cool": Temperature-aware resource assignment in data centers. In proceedings of the USENIX Annual Technical Conference (2005)
30. Nathuji, R., & Schwan, K.: Reducing system level power consumption for mobile and embedded platforms. In proceedings of the International Conference on Architecture of Computing Systems (ARCS) (2005)
31. Nathuji, R., & Schwan, K.: VirtualPower: Coordinated power management in virtualized enterprise systems. ACM SIGOPS Operating Systems Review, vol. 21, no. 6, pp. 265-278. ACM, New York, NY, USA (2007)
32. Patel, C., Sharma, R., Bash, C., & Graupner, S.: Energy Aware Grid: Global Workload Placement based on Energy Efficiency. In proceedings of IMECE (2003)
33. Pinheiro, E., Bianchini, R., Carrera, E., & Heath, T.: Load balancing and unbalancing for power and performance in cluster-based systems. In proceedings of the Workshop on Compilers and Operating Systems for Low Power, vol. 180, pp. 182-195 (2001)
34. Qiu, Q., & Pedram, M.: Dynamic power management based on continuous-time Markov decision processes. In proceedings of the 36th ACM/IEEE conference on Design Automation, pp. 555-561. ACM, New York, NY, USA (1999)
35. Raghavendra, R., Ranganathan, P., Talwar, V., Wang, Z., & Zhu, X.: No "power" struggles: coordinated multi-level power management for the data center. In proceedings of ASPLOS (2008)
36. Rajamani, K., Lefurgy, C., Res, I., & Austin, T.: On evaluating request-distribution schemes for saving energy in server clusters. In proceedings of the

IEEE International Symposium on Performance Analysis of Systems and Software, pp. 111-122 (2003)
37. Ramakrishnan, L., Irwin, D., Grit, L., Yumerefendi, A., Iamnitchi, A., & Chase, J.: Toward a doctrine of containment: grid hosting with adaptive resource control. In proceedings of the ACM/IEEE conference on Supercomputing. ACM, New York, NY, USA (2006)
38. Ranganathan, P., Leech, P., Irwin, D., & Chase, J.: Ensemble-level power management for dense blade servers. In proceedings of the 33rd International Symposium on Computer Architecture (ISCA), pp. 66-77. IEEE Computer Society, Washington, DC, USA (2006)
39. Ranger, C., Raghuraman, R., Penmetsa, A., Bradski, G., & Kozyrakis, C.: Evaluating MapReduce for multi-core and multiprocessor systems. In proceedings of the IEEE 13th International Symposium on High Performance Computer Architecture, pp. 13-24. IEEE Computer Society, Washington, DC, USA (2007)
40. Ren, Z., Krogh, B., & Marculescu, R.: Hierarchical adaptive dynamic power management. IEEE Transactions on Computers, vol. 54, pp. 409-420. IEEE Press (2005)
41. Rivoire, S., Shah, M., Ranganathan, P., & Kozyrakis, C.: JouleSort: a balanced energy-efficiency benchmark. In proceedings of the 2007 ACM SIGMOD international conference on Management of Data, pp. 365-376. ACM, New York, NY, USA (2007)
42. Sharma, R., Bash, C., Patel, C., Friedrich, R., & Chase, J.: Balance of power: Dynamic thermal management for internet data centers. IEEE Internet Computing, vol. 9, pp. 42-49. IEEE Press (2005)
43. Sharma, V., Thomas, A., Abdelzaher, T., Skadron, K., & Lu, Z.: Power-aware QoS management in web servers. In proceedings of the 24th IEEE Real-Time Systems Symposium, pp. 63-72. IEEE Press (2003)
44. Siddha, S., Pallipadi, V., & De, A. V.: Getting maximum mileage out of tickless. In proceedings of the Linux Symposium, pp. 201-208 (2007)
45. Simunic, T., Benini, L., Glynn, P., & Micheli, G. D.: Event-driven power management. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 20, pp. 840-857. IEEE Press (2001)
46. Srivastava, M., Chandrakasan, A., & Brodersen, R.: Predictive system shutdown and other architectural techniques for energy efficient programmable computation. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 4, pp. 42-55. IEEE Press (1996)
47. Sullivan, R.: Alternating cold and hot aisles provides more reliable cooling for server farms. Uptime Institute (2000)
48. Tang, Q., Gupta, S., & Varsamopoulos, G.: Energy-Efficient Thermal-Aware Task Scheduling for Homogeneous High-Performance Computing Data Centers: A Cyber-Physical Approach. IEEE Transactions on Parallel and Distributed Systems, vol. 19, pp. 1458-1472. IEEE Press (2008)
49. Tolia, N., Wang, Z., Marwah, M., Bash, C., Ranganathan, P., & Zhu, X.: Delivering Energy Proportionality with Non Energy-Proportional Systems: Optimizing the Ensemble. In proceedings of the 1st Workshop on Power Aware Computing and Systems (HotPower) (2008)

50. VMWare, http://www.vmware.com
51. Weissel, A., & Bellosa, F.: Dynamic thermal management for distributed systems. In proceedings of the First Workshop on Temperature-Aware Computer Systems (2004)
52. Xen, http://www.xen.org/
53. Yavatkar, R., & Krishnamurthy, L.: Method and apparatus for managing energy usage of processors while executing protocol state machines. US Patent App. 10/056,160 (2002)
