Autonomic Power and Performance Management for Computing Systems

Bithika Khargharia (1), Salim Hariri (1) and Mazin S. Yousif (2)
(1) University of Arizona, Tucson, AZ, email: {bithika_k, hariri}@ece.arizona.edu
(2) Intel Corporation, Hillsboro, OR, email: [email protected]




Abstract—With the increased complexity of platforms, the growing demands of applications and data centers' server sprawl, power consumption is reaching unsustainable limits. Improved power management is becoming essential for many reasons, including reduced power consumption and cooling, improved density, reliability, and compliance with environmental standards. This paper presents a theoretical framework and methodology for autonomic power and performance management in e-business data centers. We optimize for power and performance (performance/watt) at each level of the hierarchy while maintaining scalability. We adopt a mathematically rigorous optimization approach that minimizes power while meeting performance constraints. Our experimental results show around 72% savings in power compared to static power management techniques, and 69.8% additional savings when both global and local optimizations are applied.
Index Terms—Autonomic Management, Power, Optimization

I. INTRODUCTION

(Manuscript received January 30, 2006. This work is supported in part by grants from the Intel Corporation ISTG R&D Council, NSF/NGS Contract 0305427 and NSF/SEI(EAR) Contract 0431079.)

THE problem of excessive power consumption in data centers cannot be overstated. Newly planned high performance data centers could consume 10% of the current electricity generating capacity of California [1]. Furthermore, the Total Cost of Ownership (TCO) of these centers could reach $4B per year when running 24/7 [1], and 63% of this cost can be attributed to power and cooling equipment and electricity [2]. High energy consumption also translates into excessive heat dissipation which, in turn, increases cooling costs and makes servers more prone to failure [3]. Current research related to power management addresses two distinct domains - battery-operated portable devices and servers. In the portable-device domain, small savings in power significantly lengthen battery life, while in the server domain the power savings need to be much larger to be of significant value. Nevertheless, successful Dynamic Power Management (DPM) techniques in portable devices have also found their way into the server domain, though perhaps in a different form. DPM is a system-level low power design technique aimed at controlling the performance and power of digital circuits and systems by exploiting the idleness of the

components [4]. In the server domain, the nature of the workload is different, since multiple users send requests simultaneously and performance is governed by Service Level Agreements (SLA). Research in this area has successfully exploited these unique characteristics while applying DPM techniques (such as Dynamic Voltage Scaling for processors) to server power management. We take this work to the domain of high performance distributed computing centers, such as today's e-business data centers.
The primary contribution of this paper is a theoretical framework and general methodology for autonomic power and performance management of high performance distributed computing centers. The framework consists of the Managed System (MS), which is the high-performance data center, and the Autonomic Power and Performance Manager (APPM). Here, we exploit the inherent hierarchy from networked clusters of heterogeneous data center components all the way down to the networked devices within a server, such as the processor, memory, cache and disk. At any level of the hierarchy, the MS is modeled as a set of states and transitions. Each state is characterized by fixed power-consumption and performance values, and a transition takes the MS from one state to another. The APPM determines the optimal state for the MS: the state in which power consumption is minimal while all performance constraints are met. The decision of the APPM is based upon the solution of a combinatorial optimization problem that minimizes power subject to performance constraints. At each level of the hierarchy, the APPM solves this optimization problem with respect to its MS. This leads to coarse-grained global optimization solutions that guide the local optimizations in their search for more refined local solutions. While most power-related research in the server domain centers on processor power management, we adopt a more holistic approach.
We investigate the possibility of power savings by exploiting the different components (processor, network, memory and I/O) that constitute a whole system and the interactions between these components. We also investigate the scalability of the solutions and the performance of our framework by adopting a case study from [5] and applying our framework to optimize the power and performance of a memory system. Our results show the following:
1. Autonomic power and performance management for memory yields about 72% savings in power as opposed to static power management policies (refer Section VI-A).
2. The algorithm adapts very well to changes in workload dictated by random workload distributions (refer Section VI-B).
3. A hierarchical approach to power and performance management yields additional power savings of about 69.8%. This lends itself very well to the server-cluster and data-center domains in terms of the scalability of the solutions (refer Section VI-C).
4. The simulations allow us to experiment with hypothetical hardware features, such as enhanced or new power saving capabilities, that can be exploited for additional power savings without increasing the computational complexity (refer Section VI-B).

The rest of the paper is organized as follows. Section II discusses related research. Section III introduces the general framework, followed by the specific power and performance management framework in Section IV. Section V discusses the case study of memory-system power/performance management. Section VI presents experimental results, and we conclude in Section VII.

II. BACKGROUND AND RELATED WORK

We can categorize power management techniques into three broad classes, as discussed in [6]: 1) hardware-based power management, 2) turning off unused device/system capacity, and 3) QoS and energy trade-offs.

A. Hardware based power management

In this Section we discuss power management techniques that apply to specific hardware technologies. For example, processors with DVS (Dynamic Voltage Scaling) capability, such as Transmeta's Crusoe processor [7] and Intel's Pentiums with SpeedStep [8], allow varying the processor voltage in proportion to its frequency. By varying the processor voltage and frequency, it is possible to obtain a quadratic reduction in power consumption [9]. This is a type of Dynamic Power Management (DPM) technique that reduces power dissipation by slowing or shutting down components that are idle or underutilized. Frequency scaling, clock throttling and DVS are three DPM techniques available on existing processors. Another example is the Direct Rambus DRAM (RDRAM) [10] technology, which delivers high bandwidth (1.6 GB per second per device) using a narrow bus topology operating at a high clock-rate. As a result, each RDRAM chip in a memory system can be set to an appropriate power state independently. [5] uses the multiple power modes of RDRAM and dynamically turns off memory chips through power-aware page allocation in the operating system. These studies suggest that future memories should support power management that can turn parts of the memory on and off.

B. Turning off unused device/system capacity

A significant amount of research has been dedicated to reducing the energy cost of maintaining surplus capacity. This is equally true for power management in the realm of portable devices as well as servers and server clusters. DPM (Dynamic Power Management) is one such very popular technique; it reduces power dissipation by slowing or shutting down components that are idle or underutilized. As mentioned in the previous Section, some DPM techniques require hardware support, such as a DVS-enabled processor; however, other DPM techniques can work with any kind of hardware. In the battery-operated domain, researchers have targeted DPM-based power management of an entire range of devices: processor, cache, memory, disk, NIC etc. DPM schemes in the storage area cover three levels of storage: cache, memory and disk. The relationship between memory and disk is established by the fact that the smaller the memory size, the higher the page-miss rate and, consequently, the more frequent the disk accesses. [11] used this relationship to achieve power savings by proactively shaping disk I/O, expanding or contracting the size of the memory depending on the workload. An interesting counter to their work is presented in [12], which showed that a lower miss rate does not necessarily save disk energy. [13] addresses NIC energy consumption while achieving high throughput. In addition, there is a myriad of heuristic-based approaches [14][15][16] that determine how long a device should remain idle before it is switched to a low power mode. The DPM technique has also come to be widely and successfully applied in the server domain. [9] showed that the maximum energy savings can be gained from CPU power management. This has led to a number of techniques that use DVS for processor power management in conjunction with other power management techniques.
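The quadratic reduction cited above can be made concrete with the standard CMOS dynamic-power relation P = C·V²·f. The sketch below uses hypothetical numbers and assumes voltage scales linearly with frequency, under which energy per cycle falls quadratically with voltage (and instantaneous power roughly cubically); it is an illustration, not a model of any specific processor.

```python
# Hedged sketch of the DVS power model (hypothetical numbers, not tied
# to any specific processor): dynamic power P = C * V^2 * f. If voltage
# scales roughly linearly with frequency, halving the clock cuts power
# by ~8x and energy per cycle by ~4x (the "quadratic" term).

def dynamic_power(c_eff, voltage, freq_hz):
    """CMOS dynamic power: effective switched capacitance * V^2 * f."""
    return c_eff * voltage ** 2 * freq_hz

C_EFF = 1e-9  # hypothetical effective switched capacitance (farads)
full = dynamic_power(C_EFF, voltage=1.2, freq_hz=2.0e9)
half = dynamic_power(C_EFF, voltage=0.6, freq_hz=1.0e9)  # V scaled with f

print(f"full-speed power: {full:.3f} W")   # 2.880 W
print(f"half-speed power: {half:.3f} W")   # 0.360 W, ~1/8 of full
print(f"energy-per-cycle ratio: {(half / 1.0e9) / (full / 2.0e9):.2f}")  # 0.25
```

The energy-per-cycle ratio of 0.25 is the quadratic reduction the DVS literature refers to; the larger power drop comes from the additional linear frequency factor.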
Another technique in this category is to turn whole devices on/off dynamically in order to reduce surplus capacity and hence power consumption. This technique has been applied in both the battery-operated and the server domain. [17] proposes five policies for cluster-wide power management in server farms. The policies employ various combinations of DVS and node Vary-On/Vary-Off (VOVO) to reduce the aggregate power consumption of a server cluster during periods of reduced workload. [18] focuses on processor power management and presents a request-batching scheme in which jobs are forwarded to the processor in batches such that the response time constraint is met and the processor can remain in a low power state for a longer period of time. [19] concentrates the workload on a limited number of servers in the cluster so that the rest of the servers can remain switched off.

C. QoS and Energy trade-offs

Research focused on QoS trade-offs for energy savings has studied the impact of power savings on performance, investigating whether additional power savings can be obtained at the cost of QoS degradation within acceptable limits. This has given rise to the application of

proactive, mathematically rigorous optimization techniques as well as reactive control-theoretic techniques for power management under performance constraints, in both the battery-operated and the server world. For example, [20][21][22][23] have developed a myriad of stochastic optimization techniques for portable devices. In the server domain, [24] presents three online approaches for server provisioning and DVS control for multiple applications - a predictive stochastic queuing technique, a reactive feedback control technique, and a hybrid technique that uses predictive information for server provisioning and feedback control for DVS. [25] studies the impact of reducing the power consumption of large server installations subject to QoS constraints. They developed algorithms for DVS in QoS-enabled web servers that minimize energy consumption subject to service delay constraints, using a utilization bound for the schedulability of aperiodic tasks [26] to maintain the timeliness of processed jobs while conserving power. [27] investigates autonomic power control policies for Internet servers and data centers. They use both the system load and the thermal status to vary the utilized processing resources so as to achieve acceptable delay and power performance, and they use Dynamic Programming to solve their optimization problem. [28] addresses the base power consumption of web servers with a power-shifting technique that dynamically distributes power among components using workload-sensitive policies. Most of the work related to server power consumption has either focused on the processor or used heuristics to address base power consumption in server clusters, such as the work of [19]. This motivates us to adopt a holistic approach to system-level power management in which we exploit the interactions and dependencies between the different devices that constitute a whole computing system.
We apply the technique at the different hierarchies of a high-performance computing system, similar to data centers, where the system assumes a different definition at each hierarchy - going from device, to server, to server cluster. This makes our approach more comprehensive compared to [29]. We adopt a mathematically rigorous optimization approach for determining the optimal power states a system can be in under its performance constraints. The closest work on combining device power models to build a whole system is presented in [30]. We initially use modeling and simulation for the development and testing of our framework. This lets us develop power management techniques reasonably unconstrained by current hardware technologies, yielding important insights into how existing hardware could be enhanced to exploit greater savings in power.

III. AUTONOMIC MANAGEMENT FRAMEWORK (AMF)

We define an Autonomic Management System as a system augmented with intelligence to manage and maintain itself under changing circumstances arising from its internal or external environment. In previous work we laid the foundation for an Autonomic Computing System [31]. The Autonomic Manager (AM) operates in four phases - monitoring, analysis, planning and execution - as shown in Figure 1. In each phase, the AM may use previously stored knowledge to aid its decision making. The AM leverages the tools and services provided by the Autonomia environment developed as part of earlier work [32]. Autonomia has been successfully used to proactively manage the performance of large-scale science and engineering applications and to achieve self-protection of networks against a wide range of network attacks.

[Figure 1 shows the Autonomic Manager: the system monitoring, analysis, planning and execution phases share a knowledge base and connect to the managed system through a sensor port and an effector port.]

Fig 1. Autonomic management system.

IV. AMF: HIERARCHICAL POWER AND PERFORMANCE MANAGEMENT

The framework consists of the Managed System (MS), which is the high-performance data center, and the Autonomic Power and Performance Manager (APPM). The MS can be logically organized into three distinguishable hierarchies (refer Figure 2): i) the cluster level, where the whole data center is modeled as a collection of networked clusters; ii) the server level, where each cluster is modeled as a collection of networked servers; and iii) the device level, where each server is modeled as a collection of networked devices. We model the MS (at any hierarchy) as a set of states and transitions. Each state is associated with power consumption and performance values, and a transition takes the MS from one state to another. It is the task of the APPM to make the MS transition to a state where power consumption is minimal without violating any performance constraints. Our autonomic management approach relies on the MS states at each level of the hierarchy to proactively manage power consumption while maintaining the QoS requirements.

A. Autonomic Power and Performance Manager (APPM)

Statistics of power consumption in data centers show that data centers are always over-provisioned to meet peak loads, suggesting considerable opportunities for power savings. Power and performance management algorithms need to be adaptive to workload changes, flexible, and proactive. The APPM is responsible for adaptively keeping the MS in a state that minimizes power consumption without violating any performance constraints. We formulate this as a constrained optimization problem.

1) Policy Optimization for Power and Performance Management: The core objective of the APPM is to solve the constrained optimization problem described above. The APPM senses the current state of the workload to predict the workload for the next time window. It uses this knowledge to determine the optimal power state in which the MS can service the expected workload for the next time window without sacrificing performance.
2) Hierarchical Power and Performance Management: We solve the optimization problem at each hierarchy of a high performance data center.

[Figure 2 shows the hierarchy of Autonomic Managers: a cluster-level manager above per-server managers, each in turn above per-device managers (Device1 ... Devicen).]

Fig 2. Hierarchical Power Management
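The coarse-to-fine decision flow this hierarchy implies can be sketched as follows. The managers, state names and all numbers below are illustrative stand-ins of our own, not the paper's implementation: a global manager makes a coarse decision (how many units stay active) and each local manager refines it (which power sub-state its own unit uses), searching only within the global decision.

```python
import math

# Illustrative sketch of hierarchical refinement (all names and numbers
# are hypothetical): the global manager picks how many servers remain
# active; each local manager then picks the cheapest sub-state that
# still meets its own share of the load.

def global_decide(predicted_load, capacity_per_server=100.0, n=4):
    """Coarse-grained decision: minimum number of active servers."""
    return min(n, max(1, math.ceil(predicted_load / capacity_per_server)))

def local_refine(local_load, capacity=100.0):
    """Fine-grained decision: cheapest sub-state meeting local demand."""
    # hypothetical sub-states: (name, relative power, relative capacity)
    substates = [("low", 0.4, 0.5), ("mid", 0.7, 0.8), ("full", 1.0, 1.0)]
    for name, _power, cap in substates:  # ordered by increasing power
        if local_load <= cap * capacity:
            return name
    return "full"

active = global_decide(250.0)        # global: 3 of 4 servers active
per_server = 250.0 / active
print(active, local_refine(per_server))
```

The point of the split is that each local search is confined to the sub-states of the globally chosen configuration, which keeps the combined search space small.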

The solutions at the upper levels of the hierarchy are more coarse-grained, taking into consideration the global view of the system, such as a whole cluster. As we go down the hierarchy, the solutions become more fine-grained, taking into consideration a narrow local view of the system, such as a server within a cluster. This hierarchical solution approach necessitates introducing an APPM at each hierarchy within the framework.
- Cluster-level APPM: The APPM at the cluster level monitors the workload and the state of its MS (the cluster) through its sensor ports. It uses this information to analyze and plan the optimal state for the entire cluster, and executes the decision through the effector ports of the MS.
- Server-level APPM: The APPM at the server level functions exactly as the cluster-level APPM, but with narrower visibility. Each server-level APPM works individually to refine the global decision from the cluster-level APPM. Essentially, each server-level APPM solves an optimization problem to determine the local optimal states for its MS (the server).
- Device-level APPM: This APPM is responsible for power/performance management at the lowest level of the hierarchy, comprising individual devices such as the processor, memory, network and disk within a server. It functions exactly as the server-level APPM.

V. MODELING AND SIMULATION APPROACH

In this Section we discuss a case study of the power and performance management of a memory system within our framework. The objective of this exercise is to analyze the adaptability of the framework to the data center domain by studying its performance against existing power management techniques.

A. Discrete Event System (DEVS) Modeling and Simulation Framework

We model and simulate this case study using the DEVS modeling and simulation framework [33]. DEVS provides a sound modeling and simulation framework derived from mathematical dynamical system theory. It supports

hierarchical, modular composition and reuse, and can express discrete-time, continuous and hybrid models. DEVS allows the construction of hierarchical simulation models composed of atomic and coupled models. Each atomic model is assigned to an atomic simulator, and atomic models used as components within coupled models are assigned to a coupled simulator. Coupled models are assigned to coordinators, while coupled models used as components within larger models are assigned to coupled-coordinator simulators. The simulators keep track of the events and execute the model-defined methods based on the event list. Each component model of this case study is formulated as a DEVS Atomic Model. A DEVS Atomic Model is a finite state machine and is defined as [33]:

M_Atomic = <X, S, Y, δint, δext, λ, ta>, where
X is the set of inputs accepted by the model,
S is the set of states of the model,
Y is the output set generated by the model,
δint : S → S captures the internal state transitions of the model,
δext : Q × X → S captures the state transitions of the model in response to external inputs,
λ : S → Y is the output function that maps a state to an output from the output set,
ta is the time-advance function giving the time spent in a state before an internal state transition occurs, and
Q = {(s, e) | s ∈ S, 0 ≤ e ≤ ta(s)} is the total state set, where e is the elapsed time since the last transition.

The internal transition function δint dictates the system's new state when no external events occur before ta(s) elapses. The external transition function δext dictates the system's new state when an external event occurs; this state is determined by the input x, the current state s, and the time e for which the system has been in that state.

B. Autonomic Power and Performance Management of a Memory System

In this Section we apply our Autonomic Management Framework to collectively manage the power and performance of a power-aware memory system consisting of four DRAM chips. In what follows, we first discuss power and performance management for the memory system (consisting of four RDRAM modules) and then discuss how the global (system-level) optimization solutions are further refined at the local level for the power and performance management of a single memory module. This case study is motivated by the work performed in [5]. The power-aware techniques are based on the technology of Direct Rambus DRAM (RDRAM) [10]. This technology delivers high bandwidth (1.6 GB per second per device) using a narrow bus topology operating at a high clock-rate.

[Figure 3 shows the four RDRAM power states - Active (1.0x mW), Standby (0.6x mW), Nap (0.1x mW) and PwrDown (0.01x mW) - with transition times ranging from 0.1x ns up to 100x ns.]

Fig 3. RDRAM (local) power states [5]

[Figure 4 shows the five global power states of the four-chip RDRAM system, from all chips powered down (R1:P, R2:P, R3:P, R4:P; no local queues selected) to all chips active (R1:A, R2:A, R3:A, R4:A; local queues LQ1-LQ4 selected); each transition activates or powers down one chip and selects or deselects the corresponding local queue.]

Fig 4. RDRAM System (global) power states

As a result, each RDRAM chip can be set to an appropriate power state independently. Each RDRAM chip supports several power modes - active, standby, nap and power-down - in order of decreasing power consumption but increasing access time. The power management strategy adopted here assumes that data has been replicated across all the memory modules. This removes the constraint on turning RDRAM modules on and off, because each module can service all data requests, and it simplifies the model so that we can better elucidate our approach. These power states and transitions are shown in Figure 3.
1) Global Optimization (System Level): Now let us revisit our target system, which has four RDRAM chips, each of which can be individually switched on and off as suggested in [5]. Figure 4 shows the power states and transitions for a system with 4 RDRAM chips, where the state transition diagram for a single RDRAM chip is given by Figure 3. In deriving the state diagram of Figure 4 we consider only two states per RDRAM chip, active (A) and power down (P), which gives 2^4 = 16 possible configurations; because pages are allocated sequentially (see below), only configurations in which the active chips form a prefix need be considered, reducing the search space for an optimal state at this global (system) level from 16 to 5. Later we will demonstrate that local (single RDRAM) optimizations can provide additional power savings by searching for optimal states only within the global states.
Figure 5 shows the model for the power-managed RDRAM system. Jobs submitted by the Service Requester (SR) are stored in the Global Queue (GQ). A Local Queue (LQ) stores the jobs to be processed by a specific RDRAM. The Job Flow Controller (JFC) is the execution engine of the Global APPM; it forwards jobs from the GQ to the LQs of the individual RDRAM chips so that the entire system can be in the optimal state determined by the Global APPM. An RDRAM can process one job at a time. After a job is processed, the RDRAM sends an acknowledgement to the respective LQ and to the GQ, which then delete the job from their internal queues, freeing slots to accept new jobs. The GQ can accept jobs generated by the SR only if it has free slots in its internal queue; otherwise the job is considered lost. In this way, the GQ measures two important global performance parameters for the system - the request loss and the response time of each job. The response time of a job is determined by the amount of time the job spends in the GQ before it is accepted by an RDRAM for processing.

[Figure 5 shows the power and performance managed RDRAM system: a Service Requester feeds jobs into the Global Queue; the Global APPM's Job Flow Controller dispatches them to Local Queues 1-4, each serving a power and performance managed RDRAM with its own Local APPM.]

Fig 5. Power & performance managed RDRAM system
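The five-state search space and the Global APPM's selection rule can be sketched with a short enumeration. The power, response-time and loss figures below are hypothetical placeholders of our own; in the framework they come from prediction over the next n cycles.

```python
from itertools import product

# Sketch of the global state space and state selection. With active (A)
# or power-down (P) per chip, 4 chips give 2^4 = 16 configurations; under
# sequential first-touch allocation only "prefix" states occur, leaving 5.
# All power/response/loss figures are hypothetical placeholders.

raw = ["".join(c) for c in product("AP", repeat=4)]
states = [s for s in raw if "PA" not in s]   # active chips form a prefix
assert len(states) == 5                      # AAAA, AAAP, AAPP, APPP, PPPP

power = {s: 1.0 * s.count("A") + 0.01 * s.count("P") for s in states}
resp = {"PPPP": 99.0, "APPP": 9.0, "AAPP": 4.0, "AAAP": 2.5, "AAAA": 2.0}
loss = {"PPPP": 0.5, "APPP": 0.02, "AAPP": 0.01, "AAAP": 0.0, "AAAA": 0.0}

def select_state(current, r_max, b_max, trans_cost=0.1):
    """Pick the feasible state minimizing transition cost + power."""
    feasible = [s for s in states if resp[s] <= r_max and loss[s] <= b_max]
    cost = lambda s: trans_cost * (s != current) + power[s]
    return min(feasible, key=cost) if feasible else "AAAA"

print(select_state("APPP", r_max=5.0, b_max=0.05))  # AAPP
```

With only five candidate states, exhaustive enumeration like this is cheap, which is precisely what the prefix restriction buys.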

The Global APPM searches for the optimal state the entire system should be in, such that the global power consumption is minimal while the performance constraints are met. It achieves this objective by solving the optimization problem formulated below with respect to the predicted request loss and predicted response time constraints for the next n cycles, where n is a configurable parameter. For example, assume the current state of the system is A1 (refer Figure 4), where R1 is in the active (A) state and R2, R3 and R4 are in the power down (P) state. The Global APPM predicts the response time and request loss for the next n cycles and, based on this prediction, determines that the optimal power state is A2, where R1 and R2 are active (A) and R3 and R4 are powered down (P). The JFC implements this global state by forwarding jobs from the GQ to the LQs of R1 and R2; it does not forward any jobs to the LQs of R3 and R4. The JFC transfers jobs from the GQ to a single LQ in FIFO order following a sequential first-touch policy [5], which allocates pages in the order that they are first accessed, filling an entire RDRAM chip before moving on to the next. Thus, R3 and R4 will remain in the power down (P) state because their LQs are empty.
- DEVS Atomic Model: Global APPM (GAPPM): The GAPPM is formally expressed as a DEVS atomic model as follows:

M_G-APPM = <X, S, Y, δint, δext, λ, ta>, where
S = {compute, idle}
X = {S_RDRAM-SYSTEM, S_Gqueue}, where
S_RDRAM-SYSTEM = {A1, A2, A3, A4, A5} are the states of the RDRAM system, and
S_Gqueue = {idle, numEmpty}, where numEmpty is the number of free slots in the queue, are the states of the Global Queue
Y = {reqState_Gqueue, reqState_RDRAM-SYSTEM, {d | d ∈ dPM_RDRAM-SYSTEM}}, where dPM_RDRAM-SYSTEM = {do_A1, do_A2, do_A3, do_A4, do_A5}
δext: determines the optimal global state for the RDRAM system to transition to after every n simulation cycles. The optimal state can be determined by solving an optimization equation, PO_SYSTEM (given below), or by heuristic or graph-theoretic approaches.
δint : S → S: requests the state of the queue and of the RDRAM system after time ta = n in order to recompute the optimal decision
ta = n, where n is the number of simulation cycles for which the GAPPM's decision holds

- Policy Optimization PO_SYSTEM: Every n cycles, the GAPPM computes the optimal power and performance state for the system. The optimal decision is based on the current state of the system and the predicted global response time and request loss parameters for the next n cycles. Assuming the system is in global state Ai, the GAPPM picks the state Aj for which the sum of the transition cost c(Aj, Ai) from state Ai to Aj and the power consumption p(Aj) in the new state is minimum, subject to two performance constraints - the response time r(Aj) and the request loss b(Aj) in state Aj. The transition cost c(Aj, Ai) involves the power consumed and the time required to make the transition. The equation for c(Aj, Ai) follows from the fact that we consider the system to have reached a global state only after all the local atomic models (RDRAMs) have transitioned to the desired local states:

c(Aj, Ai) = max_{k: 1 to 4, sj ∈ Aj, si ∈ Ai} [c(sj, si) | Mk]

where k: 1 to 4 indexes the RDRAM modules and i, j: 1 to 5 index the five global states. The power consumption p(Aj) in state Aj is given by

p(Aj) = Σ_{k: 1 to 4, sk ∈ Aj} p(sk | Mk)

We formulate the optimization problem as follows:

PO_SYSTEM:
min_d Σ_{Aj} [(c(Aj, Ai) + p(Aj)) x(Ai, Aj)]
such that,
Σ_{Aj} r(Aj) x(Ai, Aj) ≤ R_GM
Σ_{Aj} b(Aj) x(Ai, Aj) ≤ B_GM
Σ_{Aj ∈ S_RDRAM-SYSTEM} x(Ai, Aj) = 1
x(Ai, Aj) = 1 if decision d is taken for the transition from state Ai to Aj, and 0 otherwise
where,
Ai: current state
Aj: destination state
r(Aj): predicted response time in state Aj
b(Aj): predicted request loss in state Aj
R_GM: threshold value for response time
B_GM: threshold value for request loss

2) Local Optimization (Component Level): To explain the local optimizations we make a hypothesis about the power states of a single RDRAM that is a departure from what was shown in Figure 4: we assume that the RDRAM supports multiple power states within the active (A) state. This modification is shown in Figure 6.

[Figure 6 shows five hypothetical sub-states within the RDRAM 'Active' state - Active-A1 (0.2x mW), Active-A2 (0.4x mW), Active-A3 (0.6x mW), Active-A4 (0.8x mW) and Active-A5 (1x mW) - with transitions between sub-states taking 0.01x ns.]

Fig 6. RDRAM sub-states within state 'Active'

Given the global optimal state A2, the Local APPMs for M1 and M2 solve another optimization problem locally (discussed below) to determine the optimal sub-state (from Figure 6) within the active (A) state. Just like their global counterpart, the Local APPMs use the predicted response time and predicted request loss parameters in their search for a local optimal solution; the only difference is that these performance parameters are measured at the LQ instead of the GQ. Thus, the coarse-grained global solution is further refined at the local level. Current RDRAMs do not support multiple active states, but in Section VI-C we evaluate the additional power savings that could be achieved had the hardware supported these states. Note that, in both the global and the local case, we can experiment with alternative techniques for deciding when to trigger the Global and Local APPMs to recompute the optimal state, so that the computational overhead is minimized.
- DEVS Atomic Model: Local APPM (LAPPM): The LAPPM is expressed as a formal DEVS atomic model as follows:

M_APPM = ⟨X, S, Y, δ_int, δ_ext, λ, t_a⟩

where,
S = {sendDecision, idle}
X = {{S_RDRAM}, {S_queue}}
Y = {reqState_queue, reqState_RDRAM, {d | d ∈ d_PM_RDRAM}}, where d_PM_RDRAM = {do_active, do_nap, do_standby, do_pwrdown}
δ_ext : S × X → S
δ_int : S → S : requests the state of the queue and the RDRAM after time t_a = n to re-compute the optimal decision
λ : determines the optimal state for the RDRAM to transition to after every n simulation cycles. The optimal state can be determined by solving an optimization equation, PO_DEVICE (discussed below), or by heuristic or graph-theoretic approaches.
t_a = n, where n is the number of simulation cycles for which the LAPPM's computed decision remains the optimal decision.

For the power and performance managed RDRAM, the decision d satisfies d ∈ d_PM_RDRAM, and the states s_i and s_j belong to S_RDRAM = {active, nap, standby, powerdown}.

- Policy Optimization PO_DEVICE. Every n cycles, the LAPPM computes the optimal decision for the RDRAM. This decision is based on the current state of the RDRAM and the predicted state of the queue (in terms of its response time and request loss) for the next n cycles. Given that the RDRAM is in state s_i, the LAPPM picks the optimal state s_j for the RDRAM such that the sum of the transition cost c(s_j, s_i) of moving from state s_i to state s_j and the power consumption p(s_j) in the new state is minimum, subject to two performance constraints: the response time r(s_j) and the request loss b(s_j) in state s_j. The transition cost c(s_j, s_i) consists of a power value in addition to the time taken to transition. We formulate the optimization problem as follows:

PO_DEVICE:

min_d Σ_{s_j} [c(s_j, s_i) + p(s_j)] x(s_i, s_j)

such that,

Σ_{s_j} r(s_j) x(s_i, s_j) ≤ R_M
Σ_{s_j} b(s_j) x(s_i, s_j) ≤ B_M
Σ_{s_j ∈ S_RDRAM} x(s_i, s_j) = 1

x(s_i, s_j) = 1 if decision d is taken for the transition from state s_i to state s_j, and 0 otherwise

where,
s_i : current state
s_j : destination state
r(s_j) : predicted response time in state s_j
b(s_j) : predicted request loss in state s_j
R_M : threshold value for the response time
B_M : threshold value for the request loss

Fig 7. Server power states (active, standby, sleep)

C. Power and Performance Managed Server Cluster

In Section V-B we have discussed how we apply our

approach to achieve autonomic power and performance management for an RDRAM system. Here we apply the same scheme to the domain of server clusters in data centers. It has been shown that the base (idle) power consumption of servers is very high and always over-provisioned for peak load [6]. For example, an IBM xSeries 330 consumes a peak of 210 watts at full utilization and 150 watts at 70% load, but it still consumes 90-100 watts while idle [6]. One power optimization strategy is to consolidate the workload onto a small set of servers so that the remaining servers in the cluster can be put in a low power state, as presented in [19].
1) Global Autonomic Power and Performance Manager (GAPPM): This involves a global power manager responsible for power management of the entire server cluster, subject to global performance constraints. We consider a cluster of four servers, where each server can individually be in one of the three states shown in Figure 7. The states and transitions for the cluster are similar to Figure 4. Each individual server contains a local queue and can be in either the active or the sleep state. The Job Flow Controller (JFC) is replaced by the Server Switch, which is managed by the GAPPM. If we consider the major classes of devices in a server (processor, memory, disk and network interface card), each state of Figure 7 can be decomposed into individual state transition diagrams for each of the four device classes. In that case, the framework would involve another hierarchy at the device level, with models and formulations similar to those discussed for a single RDRAM module in Section V-B.
2) Local Autonomic Power and Performance Manager (LAPPM): This involves a local power manager per server in the cluster, responsible for maintaining the power state of its server. Just as discussed for the RDRAM case study in Section V-B, the server-level LAPPM refines the optimal decision received from the GAPPM at the server cluster level.
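For a given current state s_i, the PO_DEVICE constraints force exactly one x(s_i, s_j) to equal 1, so with only four RDRAM states the optimization reduces to enumerating the feasible destination states and picking the cheapest. The sketch below illustrates this; the po_device helper and all numeric cost, power and performance values are hypothetical placeholders, not values from the paper.

```python
# Illustrative enumeration solver for the PO_DEVICE policy optimization.
# State names follow the paper; every numeric value is a hypothetical placeholder.

def po_device(s_i, states, c, p, r, b, R_M, B_M):
    """Pick the destination state s_j minimizing c(s_j, s_i) + p(s_j),
    subject to r(s_j) <= R_M and b(s_j) <= B_M."""
    feasible = [s_j for s_j in states if r[s_j] <= R_M and b[s_j] <= B_M]
    if not feasible:          # no state meets both constraints:
        return s_i            # stay in the current state (one possible fallback)
    return min(feasible, key=lambda s_j: c[(s_j, s_i)] + p[s_j])

states = ["active", "nap", "standby", "powerdown"]
p = {"active": 1.0, "nap": 0.3, "standby": 0.1, "powerdown": 0.01}   # power (mW), hypothetical
c = {(s_j, s_i): 0.0 if s_j == s_i else 0.2                          # transition cost, hypothetical
     for s_j in states for s_i in states}
r = {"active": 10, "nap": 30, "standby": 60, "powerdown": 120}       # predicted response time (cycles)
b = {"active": 0, "nap": 0, "standby": 2, "powerdown": 8}            # predicted request loss

best = po_device("active", states, c, p, r, b, R_M=70, B_M=5)
print(best)  # → standby ('powerdown' violates both constraints here)
```

With R_M = 70 and B_M = 5, 'powerdown' is infeasible (r = 120, b = 8), so the cheapest feasible state is 'standby'; tightening R_M below 10 would leave no feasible state, and this sketch then simply keeps the current state.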

VI. EXPERIMENTAL RESULTS

This section presents the features and capabilities of our framework for the case study presented in Section V. We have used the DEVS [33] modeling and simulation framework for the implementation.

A. Static vs Dynamic Optimization Policies under Uniform Workload

Figure 8 compares the performance of the dynamic optimization policy adopted in our framework against a static policy in which the RDRAM modules are always maintained in the 'active' state; this is equivalent to the global state 'A4' of Figure 4. The Global APPM (refer Section V-B) computes its decision for the optimal state, given the predicted wait time and the predicted request loss values, once every 20 simulation cycles, where one simulation cycle corresponds to one CPU clock cycle. The values shown in Figure 8 are captured at the end of each decision period. On average, there is a savings of around 72%. However, these results depend on a number of parameters, such as the accuracy of the prediction mechanism, the choice of the cut-off values RGM and BGM (refer Section V-B), and the choice of how often the GAPPM re-computes its decision. The analysis of optimal values for these choices, and of the accuracy of the prediction scheme, is not the focus of this paper and will be addressed in our future research.

Fig 8. Static vs Dynamic Policies

The impact of changing the value of RGM is shown in Figure 9. As expected, with a maximum wait time of 70 simulation cycles, the RDRAM system transitions from state 'A5' to state 'A1' about 20 cycles later than with a maximum wait time of 50. This means that the system can remain in a low power state for a longer time while still maintaining the performance values, leading to a greater savings in power. Applying the same concept to the server world means choosing realistic values of these performance parameters such that the servers operate in low power states as long as possible while meeting the performance constraints.

Fig 9. Changing the constraint values

B. Static vs Dynamic Optimization Policies under Random Workloads

The upper graph in Figure 10 shows the behavior of the workload generator and the lower graph shows the power consumption of the system as guided by the GAPPM. The dotted lines show that as the frequency of job generation increases, the power consumption increases. However, in response to the second dotted section, the system is maintained at the same low power state.

Fig 10. Response to random workload

C. Hierarchical Dynamic Optimization Policies

In Sections VI-A and VI-B, the LAPPMs were kept inactive; thus it is the GAPPM that performs the optimization decisions using the five global states described in Figure 4, and the individual RDRAMs may only be in the 'active' or 'powerdown' states. In this section we demonstrate that additional power savings can be obtained by performing local optimizations together with the global optimizations. We consider five hypothetical memory banks within one RDRAM module. Thus, we subdivide the 'active' state of an RDRAM module into five sub-states, corresponding to the number of banks ON at any given time; the sub-states are shown in Figure 6. When the GAPPM determines the RDRAM to be 'active', the RDRAM may be in any one of these five sub-states. The power consumption of each bank is computed as the active state power consumption divided by the number of banks. In Figure 11, the GAPPM optimization determines state 'A1' as optimal, where one RDRAM is in the 'active' state; the power consumption in that state is uniformly 1.0 mW (refer Figure 4). The LAPPM refines this decision and determines a local optimal sub-state within the 'active' state among the five possible states Active-A1, Active-A2, Active-A3, Active-A4 and Active-A5. This leads to an average additional savings of 69.8%.

Fig 11. Global budget vs local power consumption

These results demonstrate a number of interesting possibilities: i) the utility of hardware features (such as support for the power states shown in Figure 6) for exploiting greater power savings while maintaining performance; ii) the hierarchical approach enables the GAPPM to optimize its solutions over a smaller number of local states ('active' and 'pwrdown' in this case) out of the eight possible local states, while the local optimization searches for an optimal state within the bounds of the global state, so the hierarchical approach reduces the search space for both the global and the local optimization; iii) the results are promising for applying the framework to server cluster power management, since the hierarchical approach lends itself well to handling the state-explosion problem in the global and local search spaces when performing the optimizations.
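Possibility (ii) above, the reduced search space, can be made concrete with a small counting sketch. The state names echo the paper (global states A1-A5, five active sub-states per module), but the power numbers and the argmin helper are hypothetical, and the performance constraints are omitted for brevity.

```python
# Hypothetical sketch of the two-level GAPPM/LAPPM search: the hierarchy
# examines |global| + |sub| candidates instead of |global| * |sub|.

examined = 0  # total number of candidate states evaluated

def argmin(candidates, power):
    """Return the lowest-power candidate, counting every evaluation."""
    global examined
    examined += len(candidates)
    return min(candidates, key=power)

global_states = ["A1", "A2", "A3", "A4", "A5"]
global_power = {"A1": 1.0, "A2": 2.0, "A3": 3.0, "A4": 4.0, "A5": 4.5}  # mW, hypothetical

# GAPPM: coarse choice among the five global states (constraints omitted).
g = argmin(global_states, lambda s: global_power[s])

# LAPPM: refine within the chosen state's five 'active' sub-states, one per
# powered-on memory bank; per-bank power = active power / number of banks.
subs = {f"{g}-{k}banks": global_power[g] * k / 5 for k in range(1, 6)}
s = argmin(list(subs), lambda name: subs[name])

print(g, s, examined)  # → A1 A1-1banks 10  (a flat search would examine 5 * 5 = 25)
```

The hierarchy here evaluates 5 + 5 = 10 candidates instead of the 25 a flat search over every (global state, sub-state) pair would visit; the gap widens as more levels (cluster, server, device) are added.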

VII. CONCLUSION

In this paper, we presented a theoretical framework to optimize power and performance at runtime for e-business data centers. We presented a detailed case study of how to apply our approach to manage the power and performance of a multi-chip memory system, followed by a high performance server cluster. Our experimental results show around 72% savings in power with our approach as compared to static power management techniques, and 69.8% additional savings when both global and local optimizations are applied. We are currently performing comprehensive modeling and analysis of large scale e-business data centers. We are also analyzing the comparative performance of stochastic, predictive and heuristic techniques for power and performance management applied to the data center domain, and investigating the comparative utility of these approaches in terms of runtime complexity and overhead.

VIII. REFERENCES

[1] ftp://www.apcmedia.com/salestools/CMRP5T9PQG_R2_EN.pdf
[2] http://www.cs.rutgers.edu/~ricardob/power.html
[3] H. Huang, C. Lefurgy, K. Rajamani, T. Keller, E. Van Hensbergen, F. Rawson, and K. G. Shin, "Cooperative Software-Hardware Power Management for Main Memory," in Power-Aware Computer Systems - 4th International Workshop (PACS'04), B. Falsafi and T. N. Vijaykumar, Eds., Springer, December 2004.
[4] E.-Y. Chung, L. Benini, and G. De Micheli, "Dynamic Power Management Using Adaptive Learning Tree," Proc. Int'l Conf. Computer-Aided Design, pp. 274-279, 1999.
[5] A. R. Lebeck, X. Fan, H. Zeng, and C. Ellis, "Power Aware Page Allocation," in ASPLOS, pp. 105-116, 2000.
[6] J. Chase and R. Doyle, "Balance of Power: Energy Management for Server Clusters," Proceedings of the 8th Workshop on Hot Topics in Operating Systems (HotOS-VIII), May 2001, pp. 163-165.
[7] M. Fleischmann, "Dynamic Power Management for Crusoe Processors," Jan. 2001. http://www.transmeta.com/
[8] Intel, "Mobile Intel Pentium III Processor in BGA2 and MicroPGA2 Packages," 2001. Order Number 283653-002.
[9] P. Bohrer, E. Elnozahy, T. Keller, M. Kistler, C. Lefurgy, C. McDowell, and R. Rajamony, "The Case for Power Management in Web Servers," in Power Aware Computing, Kluwer Academic Publishers, 2002.
[10] Rambus, RDRAM, 1999. http://www.rambus.com
[11] L. Cai and Y.-H. Lu, "Joint Power Management of Memory and Disk," IEEE, 2005.
[12] Q. Zhu, F. M. David, C. Devaraj, Z. Li, Y. Zhou, and P. Cao, "Reducing Energy Consumption of Disk Storage Using Power-Aware Cache Management," in HPCA, pp. 118-129, 2004.
[13] R. Banginwar and E. Gorbatov, "Gibraltar: Application and Network Aware Adaptive Power Management for IEEE 802.11," Proceedings of the Second Annual Conference on Wireless On-demand Network Systems and Services (WONS'05), 19-21 Jan. 2005, pp. 98-108.
[14] C.-H. Hwang and A. Wu, "A Predictive System Shutdown Method for Energy Saving of Event-Driven Computation," Int. Conf. Computer-Aided Design, Nov. 1997, pp. 28-32.
[15] C. Hsu and U. Kremer, "The Design, Implementation, and Evaluation of a Compiler Algorithm for CPU Energy Reduction," PLDI'03, San Diego, CA, June 2003.
[16] F. Douglis, P. Krishnan, and B. Marsh, "Thwarting the Power Hungry Disk," in Proceedings of the 1994 Winter USENIX Conference, San Francisco, January 1994.
[17] E. N. (Mootaz) Elnozahy, M. Kistler, and R. Rajamony, "Energy-Efficient Server Clusters," in Workshop on Mobile Computing Systems and Applications, Feb. 2002.
[18] M. Elnozahy, M. Kistler, and R. Rajamony, "Energy Conservation Policies for Web Servers," in Proceedings of the 4th USENIX Symposium on Internet Technologies and Systems, March 2003.
[19] E. Pinheiro, R. Bianchini, E. V. Carrera, and T. Heath, "Load Balancing and Unbalancing for Power and Performance in Cluster-Based Systems," Proceedings of the Workshop on Compilers and Operating Systems for Low Power, September 2001; Technical Report DCS-TR-440, Department of Computer Science, Rutgers University, New Brunswick, NJ, May 2001.

[20] G. Paleologo, L. Benini, A. Bogliolo, and G. De Micheli, "Policy Optimization for Dynamic Power Management," IEEE Trans. Computer-Aided Design, Vol. 18, Jun. 1999, pp. 813-833.
[21] Q. Qiu, Q. Wu, and M. Pedram, "Stochastic Modeling of a Power-Managed System: Construction and Optimization," IEEE Trans. Computer-Aided Design, Vol. 20, Oct. 2001, pp. 1200-1217.
[22] E. Chung, L. Benini, A. Bogliolo, and G. De Micheli, "Dynamic Power Management for Non-Stationary Service Requests," IEEE Trans. Computers, Vol. 51, No. 11, Nov. 2002, pp. 1345-1361.
[23] T. Simunic, "Dynamic Management of Power Consumption," in Power Aware Computing, R. Graybill and R. Melhem, Eds., 2002.
[24] Y. Chen, A. Das, W. Qin, A. Sivasubramaniam, Q. Wang, and N. Gautam, "Managing Server Energy and Operational Costs in Hosting Centers," in Proceedings of the 2005 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2005).
[25] V. Sharma, A. Thomas, T. Abdelzaher, K. Skadron, and Z. Lu, "Power-Aware QoS Management in Web Servers," Proceedings of the 24th IEEE International Real-Time Systems Symposium, p. 63, December 3-5, 2003.
[26] T. Abdelzaher and V. Sharma, "A Synthetic Utilization Bound for Aperiodic Tasks with Resource Requirements," in Euromicro Conference on Real Time Systems, Porto, Portugal, July 2003.
[27] L. Mastroleon, N. Bambos, C. Kozyrakis, and D. Economou, "Autonomic Power Management Schemes for Internet Servers and Data Centers," Proceedings of the IEEE Global Telecommunications Conference (GLOBECOM), November 2005.
[28] W. Felter, K. Rajamani, T. Keller, and C. Rusu, "A Performance-Conserving Approach for Reducing Peak Power Consumption in Server Systems," ACM International Conference on Supercomputing (ICS), Cambridge, MA, June 2005.
[29] P. Rong and M. Pedram, "Hierarchical Power Management with Application to Scheduling," International Symposium on Low Power Electronics and Design (ISLPED), 2005.
[30] S. Gurumurthi, A. Sivasubramaniam, M. J. Irwin, N. Vijaykrishnan, M. Kandemir, T. Li, and L. K. John, "Using Complete Machine Simulation for Software Power Estimation: The SoftWatt Approach," in Proceedings of the International Symposium on High Performance Computer Architecture (HPCA-8), Cambridge, MA, February 2002, pp. 141-150.
[31] S. Hariri, B. Khargharia, M. Parashar, and Z. Li, "The Foundations of Autonomic Computing," A. Zomaya, Ed., Chapman, 2005.
[32] S. Hariri, "AUTONOMIA: An Autonomic Computing Environment," IEEE 22nd International Performance, Computing, and Communications Conference, April 2003.
[33] B. P. Zeigler, H. Praehofer, and T. G. Kim, Theory of Modeling and Simulation, 2nd ed. New York: Academic Press, 2000.