Self-Configuring Heterogeneous Server Clusters

Taliver Heath†, Bruno Diniz‡, Enrique V. Carrera†, Wagner Meira Jr.‡, and Ricardo Bianchini†

† Department of Computer Science, Rutgers University, Piscataway, NJ 08854-8019, {taliver,vinicio,ricardob} [email protected]
‡ Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil, {diniz,meira} [email protected]

Abstract

Previous research on cluster-based servers has focused on homogeneous systems. However, real-life clusters are almost invariably heterogeneous in terms of the performance, capacity, and power consumption of their hardware components. In this paper, we describe a self-configuring Web server for a heterogeneous cluster. The self-configuration is guided by analytical models of throughput and power consumption. Our experimental results for a cluster comprised of traditional and blade nodes show that the model-based server can consume 29% less energy than an energy-oblivious server, with only a negligible loss in throughput. The results also show that our server conserves more than twice as much energy as an energy-conscious server that we previously proposed for homogeneous clusters.

1 Introduction

A significant fraction of the previous research on cluster-based servers (or simply "server clusters") has focused on request distribution for improved performance (e.g., [3, 4, 5, 7, 16]) and dynamic cluster reconfiguration for energy conservation without performance degradation [8, 18]. Because these works focused solely on homogeneous clusters, they found essentially that the cluster configuration should be the smallest (in terms of number of nodes) needed to satisfy the current offered load, whereas requests should be evenly distributed across the nodes, modulo locality considerations.

However, real-life clusters are almost invariably heterogeneous in terms of the performance, capacity, and power consumption of their hardware components. The reason for this is simple and at least two-fold: (1) failed or misbehaving components are usually replaced with different (more powerful) ones, as cost/performance ratios for off-the-shelf components keep falling; and (2) any necessary increases in performance or capacity, due to expected increases in offered load, are also usually made with more powerful components than those of the existing cluster. In essence, the server cluster is only homogeneous (if at all) when first installed. Finally, single-board computers called "blades" are starting to make their way into existing large server clusters. Blade systems are more compact and are typically easier to manage than traditional ones. Furthermore, several low-power blades exploit laptop technology to consume significantly less energy than traditional computers. However, it is unreasonable to expect that these blades will replace all the traditional nodes of existing large server clusters in one shot. This replacement is more likely to occur in multiple stages, producing clusters that at least temporarily include nodes of widely varying characteristics.

Heterogeneity raises the interesting problem of how to distribute the clients' requests to the different cluster nodes for best performance. Furthermore, heterogeneity must be considered in cluster reconfiguration for power and/or energy conservation, raising the additional problem of how to configure the cluster for an appropriate tradeoff between power and/or energy conservation and performance.

In this paper, we address these two research problems. Our ultimate goal is to develop a server cluster that can adjust the cluster configuration and the request distribution to optimize power, energy, throughput, latency, or some combination of these metrics. The particular optimization function can be defined by the system administrator; for this paper, we select the ratio of cluster-wide power consumption and throughput as an example, so that our system consumes the lowest energy per request at each point in time.

Unfortunately, designing such a server is a non-trivial task when nodes are highly heterogeneous, and becomes even more complex when nodes communicate (e.g., when subsets of nodes have different functionalities as in multi-tier e-commerce servers, or when nodes cooperate to share resources such as CPUs, main memory caches, and/or disk storage). To see why it is difficult to determine the best configuration and distribution, consider our goal of minimizing energy per request. The difficulty is that the heterogeneous components will likely have different maximum throughputs, as well as different power efficiencies for each level of utilization, creating a potentially huge number of possible configurations and distributions.

To tackle this design task, we develop analytical models that use information about the expected load on the cluster to predict the overall throughput and power consumption, as a function of the request distribution. Using the models and an iterative optimization algorithm, we can evaluate a large space of configurations and distributions to find the one that minimizes the power/throughput ratio for each level of offered load intensity.

As a proof-of-concept implementation, we apply our models to a cluster of Flash-like Web servers [17]. The optimization step is typically time-consuming, so we run it off-line and store the best configuration and request distribution found for each load intensity on the node that runs a master process. Periodically, the Web servers send their offered load information to the master process, which then computes the total load imposed on the system. With this information, the master looks up the best request distribution and configuration for the current level of load and commands the nodes to adjust accordingly.

Our validation experiments with this Web server running on a cluster of blade and traditional servers show that the models are accurate for a wide range of distributions; throughput and power predictions are all within 10% of the measured results. The experimental results with our model-based server running on the heterogeneous cluster show that we can consume 29% less energy than a traditional, energy-oblivious server with only a 2.9% loss in average throughput. The results also show that our server conserves more than twice as much energy as an energy-conscious server that we previously proposed for homogeneous clusters [18]. Based on these results, we conclude that servers must be able to self-configure intelligently on heterogeneous clusters for an ideal tradeoff between energy and performance.

The remainder of the paper is organized as follows. The next section details our modeling and optimization approach. Section 3 describes our model-based server software and how we use our analytical framework to guide the decisions the system makes. Section 4 presents our methodology and experimental results. Section 5 discusses the related work. Finally, Section 6 draws our conclusions.

(This research has been supported by NSF under grants #EIA-0224428, #CCR-0100798, and #CCR-0238182 (CAREER award), and by CNPq/Brazil under grants #680.024/01-8 and #380.134/97-7.)

2 The Analytical Models

In this section, we describe the analytical models and optimization procedure that allow us to determine the best cluster configuration and request distribution for a heterogeneous Web server cluster. This machinery works for homogeneous clusters as well, but is overkill for these systems, since configuration and distribution are well-understood issues for them.

Our models can be applied to clusters comprised of "cooperative" or "stand-alone" Web servers. A cooperative server communicates with other servers to share resources and, thus, serve requests more efficiently. For example, a cooperative server may forward requests for dynamic content to servers running on faster processors. In contrast, a stand-alone Web server, such as Apache or Flash, serves the requests it receives independently of other servers in the cluster. The formulation of the models as we describe them here is general enough to be applied to cooperative servers; a system of stand-alone servers can be seen as a special, simpler case.

To determine the best configuration and distribution for heterogeneous clusters, we need to consider all resources (i.e., hardware components), along with their performance and power consumption, at the same time. In more detail, we can cast this as an optimization problem: find the request distribution (1) from clients to servers and (2) between servers, such that the demand for each resource is not higher than its bandwidth and a particular metric of interest is minimized. The cluster configuration follows from the request distribution information; nodes that are not sent any requests can be turned off.

For the purposes of this paper, we restrict ourselves to Web servers of static content and consider the metric of interest to be cluster-wide power consumption divided by throughput. Thus, the request distributions, the total power, and the maximum throughput for each cluster configuration are the unknowns in the models, whereas the resource and offered-load characteristics are the inputs to the models. In particular, the resource-related inputs are the bandwidth of each resource, the power consumed by each node when it is idle, and a linear factor describing how the power consumption changes as a function of resource utilization. The resource types we consider are processor, network interface, and disk. For simplicity, we assume that each node contains a single resource of each type. The load-related inputs are the expected file popularity and the expected average size of requested files. Both of these inputs can be determined from a sample trace of the requests directed to the server cluster. Table 1 summarizes the unknowns and inputs of our models.

            Description                     Representation in models
  Inputs:   Bandwidth of each resource      C vector per resource
            Idle power of each node         B vector
            Power factor of each resource   M vector per resource
            Average requested file size     avg_access_size
  Unknowns: Request distributions           D vector and R matrix per resource
            Maximum system throughput       max_thruput
            Total system power              overall_power

Table 1: Summary of model inputs and unknowns.

2.1 Modeling Request Distributions

To formalize the problem, we represent the distributions using a vector and multiple matrices. We use a distribution vector, D, to represent the fraction of the requests coming from the clients that is sent to each node. Thus, D_i is the fraction of the requests from clients that server i receives. When nodes cooperate, we must also take the intra-cluster request distribution into account. We represent this distribution with one matrix, R^r, per resource type r that can be utilized as a result of an intra-cluster message. Each element R^r_ij represents the fraction of requests on node i that use resource type r on machine j. There are two types of resource matrices: derived and variable. The matrix for a resource type is considered variable if it cannot be determined from other variable matrices; otherwise, it is considered derived. For example, the network matrices (one for incoming and one for outgoing messages) can be considered derived matrices, since they can be determined once the D vector and the R^r matrices are known. The variable resource matrices (D and the R^r) are the ones we solve our model for.
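The distribution vector D and the per-resource matrices R^r described above can be written down directly. The sketch below (our own illustration, not code from the paper; the 4-node, stand-alone setup and all names are assumptions) shows the shape of these unknowns and the validity checks they must satisfy:

```python
import numpy as np

# Hypothetical 4-node cluster: a client-distribution vector D and one
# intra-cluster matrix R^r per resource type.
n = 4
D = np.array([0.25, 0.25, 0.25, 0.25])  # D_i: fraction of client requests sent to node i

# R^r[i, j]: fraction of node i's requests that use resource r on node j.
# Stand-alone servers are the special, simpler case where every R^r is the
# identity (each node serves its own requests locally).
R = {r: np.eye(n) for r in ("cpu", "disk", "net_in", "net_out")}

assert np.isclose(D.sum(), 1.0)             # D must be a proper distribution
for m in R.values():
    # Each row is a distribution over nodes (our assumption for this sketch:
    # every request exercises each resource type exactly once).
    assert np.allclose(m.sum(axis=1), 1.0)
```

For a cooperative cluster, the off-diagonal entries of a variable R^r would be the fractions of requests forwarded to other nodes.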

2.2 Modeling Utilization of Resources

To determine the throughput and power consumed by various cluster configurations, we need to determine the utilization of each resource in the server cluster. We use a utilization vector, U^r, to represent the utilization of each resource type. Thus, each U^r_i is the utilization of resource type r on node i. To compute each U^r_i, we need to multiply the percentage of requests that are sent to the corresponding resource (from D and R^r) by the total number of requests being served per second (the current throughput of the server cluster). This gives us the number of requests directed to the specific resource per second. To get the final utilization in bytes/sec, all we need is to multiply that quantity by the average request size in bytes. In mathematical terms, we have:

    U^r_i = (Σ_j R^r_ji · D_j) · thruput · avg_access_size    (1)

2.3 Determining Throughput and Power

Throughput. We can use the utilization vectors to determine the maximum throughput and power consumption of a server cluster configuration. We find the maximum throughput achievable by the server cluster by determining the bottleneck resource(s) in the system. This involves operating with the utilizations defined by equation (1). From the equation, we find that:

    thruput = U^r_i / ((Σ_j R^r_ji · D_j) · avg_access_size)    (2)

The bottleneck(s) will be such that U^r_i = C^r_i under maximum throughput, where each element C^r_i is the capacity of resource type r on node i. Thus, we can define the maximum throughput of the system as:

    max_thruput = min over all r, i of C^r_i / ((Σ_j R^r_ji · D_j) · avg_access_size)    (3)

Power. The power consumed by each resource generally relates to the utilization of the resource. We use a linear model of resource power, such that the power consumed by each node i is:

    P_i = B_i + Σ_r M^r_i · (U^r_i / C^r_i)    (4)

where B_i is the base power consumed by node i when it is idle, and M^r_i is a measure of power per utilization of resource r. Finally, we define the power consumed by the server cluster, overall_power, as the sum of the power consumed by all resources in the system:

    overall_power = Σ_i P_i    (5)
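Equations (1) to (5) are straightforward to evaluate for a given distribution. The following helper is our own illustration (names follow Table 1; the argument layout is an assumption):

```python
import numpy as np

def evaluate(D, R, C, B, M, avg_access_size):
    """Evaluate equations (1)-(5) for a given request distribution.

    D: (n,) client distribution; R, C, M: dicts keyed by resource type r
    holding the (n, n) matrix R^r, (n,) capacities C^r, and (n,) power
    factors M^r; B: (n,) idle power per node.
    """
    # Per-unit-of-throughput demand on resource r of node i:
    # (sum_j R^r_ji * D_j) * avg_access_size  -- the bracketed part of eq. (1).
    demand = {r: (R[r].T @ D) * avg_access_size for r in R}
    # Eq. (3): the bottleneck resource caps throughput at C^r_i / demand^r_i.
    max_thruput = min(
        np.where(demand[r] > 0, C[r] / np.maximum(demand[r], 1e-30), np.inf).min()
        for r in R)
    # Eq. (1): utilizations at maximum throughput.
    U = {r: demand[r] * max_thruput for r in R}
    # Eqs. (4)-(5): linear per-node power model, summed cluster-wide.
    per_node = B + sum(M[r] * U[r] / C[r] for r in R)
    return max_thruput, per_node.sum()
```

For example, two identical nodes with a single 100-bytes/sec resource, an even D, and unit request size yield a maximum throughput of 200 requests/sec with every resource saturated.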

2.4 Instantiating the Inputs to the Models

The C^r, B, and M^r vectors are inputs to our models. Determining the correct values for these vectors is not always simple. To determine each of the C^r vectors, we run a microbenchmark that exercises the resource with accesses of different types and/or sizes and observe its performance.

C^disk is perhaps the hardest vector to instantiate. More specifically, our disk microbenchmark goes through several rounds of random reads of fixed size, going from 4-KByte accesses to 128-KByte accesses (the largest chunk our server will read at once from disk) in steps of 4 KBytes. We run this microbenchmark for each of the different disks in the heterogeneous cluster. Besides these data, we also need the average size of disk accesses on each different node. Unfortunately, the average size of disk accesses (and the memory cache hit ratios) is not readily available offline. To approximate that size, we use information about the memory size and the expected file popularity coming from a representative request trace. In more detail, we rank the files in descending order of popularity and add the file sizes until the combined size exceeds the memory size. Accesses to files that are more popular than this threshold are (optimistically) expected to be cache hits. With information about the files that cause misses, we can easily compute the average disk access size. We have successfully taken this same approach to approximating hit rates and disk access sizes in our previous Web server modeling work (e.g., [5, 7]).

Instantiating the B vector involves measuring the power consumed by each different node when it is completely idle. Determining the M^r vectors is somewhat more involved. For each different node i, we determine M^disk_i, M^CPU_i, and M^net_i by running several microbenchmarks, each of which exercises one of these three resources in different ways. For each microbenchmark, we record average disk, CPU, and network interface utilization statistics, as well as average overall power. The utilization data forms an m x 3 matrix called F, where m is the number of microbenchmarks. The power data (actually, the power data minus the base power of the node) forms an m x 1 vector called W. We then compute a least-squares fit to determine the 3 x 1 vector X for which F·X = W. The resulting X_1, X_2, and X_3 are M^disk, M^CPU, and M^net, respectively, for the node. This same approach has been used successfully by other groups [14].

2.5 Finding the Best Distributions

To optimize power consumption and throughput, we need to find the request distributions D and R^r for each resource r (and consequently the best cluster configuration), such that U^r_i ≤ C^r_i for all r, i and we minimize some metric that combines throughput and power. These metrics can be combined in several ways, depending on the emphasis we give to each of them. In this work, we decided to give them equal emphasis, so our optimization procedure attempts to minimize overall_power / thruput, under the constraint that max_thruput ≥ thruput. Thus, a decrease in power consumption (with fixed throughput) has the same effect as an equal increase in throughput (with fixed power).

We could attempt to solve the system of equations defined by our model directly. Unfortunately, this is a complex proposition, because to determine utilizations we need to multiply unknown distributions, making the problem non-linear. Moreover, cluster power and throughput are functions of these distributions and vice-versa, so the problem is also recursive. These characteristics suggest that a numerical, iterative optimization technique, such as simulated annealing or genetic algorithms, is appropriate. We use simulated annealing.

The annealing works by iteratively trying to optimize a "current solution", i.e., values for the distributions D and R, starting from initial guesses for these matrices. The initial distributions are set such that the D_i are all the same and the R^r_i are proportional to the capacity of resource type r on node i. A candidate solution is generated by modifying two elements of the same (randomly chosen) vector/matrix at a time; one element is decreased by a randomly chosen amount, while another is increased by the same amount. A candidate also has to satisfy the constraint that U^r_i ≤ C^r_i for all r, i. Evaluating a candidate solution involves computing its overall_power / thruput measure and comparing it to the corresponding measure of the current solution. If the candidate solution produces a measure that is smaller than the current minimum, it becomes the new current solution. If it does not, it might still become the new current solution, but with a decreasing probability. After evaluating each candidate solution, a new one is generated and the process is repeated. The number of iterations is determined by a "temperature" parameter of the annealing algorithm. More details about simulated annealing can be found in [13].

3 A Model-Based, Self-Configuring Web Server

To start getting experience with our model-based self-configuration approach, we developed a simple self-configuring Web server system for heterogeneous clusters. Our server services requests independently of its peer nodes, obviating the need to model the R^r matrices we described in the previous section. (We are currently experimenting with a cooperative Web server for heterogeneous systems that distributes dynamic requests intelligently; this server exploits our models to the fullest.)

Each local Web server sends its offered-load information to a master process every 10 seconds. The master process accumulates this information and smooths it by computing an Exponentially Weighted Moving Average (EWMA) of the form avg_load = α · previous_load + (1 − α) · current_load. Our EWMA uses an α value of 0.8.

Periodically, the master decides whether the request distribution (and configuration) needs to be changed. We refer to the fixed time between decisions as the reconfiguration interval. The master's decision is based on its prediction of what the average load will be at the end of the next reconfiguration interval, so that it can adjust the system accordingly. The prediction uses a first-order derivative of the current average load and the last average load of the previous interval, i.e., predicted_load = avg_load + (avg_load − last_avg_load). There is also a slight bias (0.8) against downward trends to avoid undershooting and losing requests. Based on the predicted load, the master can determine the best request distributions, as predicted by our model and optimization procedure. If the request distribution needs to be changed, the master process commands the servers to adjust accordingly.

Because our optimization procedure is time-consuming, the best distributions for each amount of load are computed off-line. We store a large pre-computed table of {thruput, best_distribution} entries on the local disk of the machine running the master process. The master finds the best distributions for a certain load by looking up the entry listing the lowest throughput that is higher than the load. Note, however, that simulated annealing may be overkill for clusters of stand-alone servers, since the number of possible configurations and distributions is polynomial; a more direct iterative optimization procedure would probably suffice. Nevertheless, the self-configuration is still non-trivial in this case, since the heterogeneous components may have different maximum throughputs, as well as different power efficiencies for each level of utilization.
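The candidate-generation and acceptance scheme of Section 2.5 can be sketched as follows. This is a generic simulated-annealing loop over the D vector only; the parameter names, defaults, and geometric cooling schedule are our assumptions, not the paper's:

```python
import math
import random

def anneal(x0, objective, feasible, steps=10000, temp=1.0, cooling=0.999):
    """Minimize objective(x) (e.g., overall_power / thruput) over a
    distribution vector by simulated annealing.

    x0: initial list of fractions summing to 1; feasible: constraint check
    (e.g., U^r_i <= C^r_i for all r, i).
    """
    current, best = list(x0), list(x0)
    cur_val = best_val = objective(current)
    for _ in range(steps):
        # Candidate: move a random amount from one element to another,
        # which keeps the vector a valid distribution (sum unchanged).
        cand = list(current)
        i, j = random.sample(range(len(cand)), 2)
        delta = random.uniform(0, cand[i])
        cand[i] -= delta
        cand[j] += delta
        if not feasible(cand):
            continue
        val = objective(cand)
        # Accept improvements always; accept worse candidates with a
        # probability that decreases as the temperature falls.
        if val < cur_val or random.random() < math.exp((cur_val - val) / temp):
            current, cur_val = cand, val
            if val < best_val:
                best, best_val = list(cand), val
        temp *= cooling
    return best, best_val
```

The full procedure would also mutate the R^r matrices and evaluate the model of Section 2 inside `objective`; those parts are omitted here for brevity.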

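The master's load smoothing and prediction rules described above reduce to a few lines. In this sketch, exactly where the 0.8 downward bias enters the prediction is our assumption:

```python
ALPHA = 0.8      # EWMA smoothing weight from Section 3
DOWN_BIAS = 0.8  # bias against downward trends (placement is our assumption)

def smooth(previous_load, current_load, alpha=ALPHA):
    """avg_load = alpha * previous_load + (1 - alpha) * current_load."""
    return alpha * previous_load + (1 - alpha) * current_load

def predict(avg_load, last_avg_load, down_bias=DOWN_BIAS):
    """predicted_load = avg_load + (avg_load - last_avg_load), with the
    first-order term damped on downward trends to avoid undershooting."""
    trend = avg_load - last_avg_load
    if trend < 0:
        trend *= down_bias
    return avg_load + trend
```

For instance, with a previous average of 100 requests/sec and a current sample of 200, the smoothed average is 120; a rising trend from 100 to 120 then predicts 140 for the next interval.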
4 Experimental Results

4.1 Methodology

Cluster hardware. Our cluster is comprised of 8 Linux-based PCs (each with an 800-MHz Pentium III processor, local memory, a SCSI disk, and a Fast Ethernet interface) and 4 Linux-based blade servers (each with a 1.2-GHz Celeron processor, local memory, an IDE disk, and a Fast Ethernet interface) from Nexcom [15]. The two sets of machines are connected by a Fast Ethernet switch.

The PCs (herein also called traditional nodes) consume about 70 Watts when idle and about 94 Watts when fully utilized. The power supply loss of each PC is about 48 Watts. In contrast, each blade consumes about 40 Watts when idle and 46 Watts when fully utilized. We estimate the power supply loss in the blade system to be around 38 Watts, due to a pair of supplies. However, the fixed power consumption of the blade chassis is substantially higher, 110 Watts, due to the chassis backplane, the KVM (keyboard-video-mouse) controller, and one fan. (This fixed consumption is a relatively high overhead because the pair of supplies is capable of supporting 4 or 5 additional blades that are not included in our setup.)

All PCs and the blade system are connected to a power strip that allows for remote control of the outlets. The systems can be turned on and off by sending commands to the IP address of the power strip. Blades can also be turned on and off independently, using software commands issued to the KVM controller. Shutting a node down takes approximately 45 seconds (traditional nodes) or 21 seconds (blades); bringing any node back up takes about 96 seconds. The total amount of power consumed by the cluster is monitored by a multimeter connected to the power strip. The multimeter collects instantaneous power measurements and sends them to another computer, which stores them in a log for later use. We obtain the power consumed by different cluster configurations by aligning the log with our systems' statistics.

Server software. We experiment with three servers: (1) a traditional, energy-oblivious server; (2) an energy-conscious server we developed for homogeneous clusters; and (3) our energy-conscious, model-based Web server for heterogeneous clusters. The traditional server, called "Energy-Oblivious", is similar to Flash [17]. The energy-conscious server for homogeneous clusters, called "Adaptive", is based on the same code as Energy-Oblivious, but includes a master process that collects load information from the servers and decides on the minimum cluster configuration that can satisfy the offered load. The master uses a PID feedback controller to determine how to change the configuration. When there are enough spare resources, the master forces a node to turn off. When more resources are needed, the master turns a node on. Because the server assumes a homogeneous cluster, the master randomly selects which nodes to turn on and off. In addition, requests are distributed evenly across the active nodes. More details about Adaptive can be found in [18].

Our model-based Web server, called "Model Adaptive", is based on the same code as Adaptive, but uses our modeling and optimization machinery as described in the previous section. For a fair comparison, the frequency of load-information exchanges and reconfiguration decisions in our experiments is kept the same in both adaptive systems. All servers are run on a homogeneous cluster of 8 PCs and

Figure 1: Modeling and experimental results for homogeneous cluster throughput and power. [Plot: modeled and measured overall average power (W) and maximum throughput (r/s) for WC'98, as a function of the unbalance factor, 1:1 to 1:9.]

Figure 2: Modeling and experimental results for heterogeneous cluster throughput and power. [Same axes as Figure 1, for the heterogeneous cluster.]

a heterogeneous cluster of 4 PCs and 4 blades. The master process is idle most of the time, so it can run on any active node without a noticeable increase in energy consumption. For simplicity, we run it alone on a 9th node.

Clients. Besides our main cluster, we use 12 Pentium-based machines to generate load for the servers. These clients connect to the cluster using TCP over the Fast Ethernet switch. We run two types of experiments: validation and self-configuration. In our self-configuration experiments, for simplicity, we did not use a front-end device that would enforce request distributions. Instead, when the request distribution needs to change, the master process sends the D vector to each client over pre-established socket connections. The clients send requests to the available nodes of the server in randomized fashion, but obey the D vector.

The clients issue their requests according to a trace of the accesses to the World Cup '98 site from June 23rd to June 24th, 1998 (WC'98). In our model validation experiments, the clients disregard the timing information in the trace and issue new requests as soon as possible. In our self-configuration experiments, the clients take the timings of the trace into account, but accelerate the replay of the trace 20 times to shorten the experiments to 8500 seconds. Given our accelerated trace, we set the reconfiguration interval of the adaptive servers to 120 seconds in the experiments. This relatively short interval allows time for a rebooted node to settle and for the servers to react quickly to variations in offered load. In practice, the reconfiguration interval can be substantially longer, since real-time variations in load occur much more slowly than in our experiments. For WC'98, for example, we could have a reconfiguration interval of 2400 seconds in real time. In fact, we expect the energy and time overheads of reconfigurations to be almost negligible in practice. As a result, our models and systems do not consider these overheads.

4.2 Validation of the Models

Figures 1 and 2 show our validation results for the WC'98 trace on the homogeneous and heterogeneous clusters, respectively. Both figures show modeled and measured maximum system throughput (in requests/second) and overall power consumption (in Watts), as a function of the request distribution (the D vector). More specifically, each point 1:x on the X-axis represents a different request distribution, where x times more load is directed to one half of the cluster than to the other half. In the heterogeneous cluster, we define each half of the cluster to have 2 traditional and 2 blade nodes. The leftmost point represents an even distribution of requests, i.e., each node receives 1/8 of the requests.

These figures demonstrate that our models are very accurate for WC'98. Both modeled maximum throughput and modeled overall power are within 10% of the measured results. Furthermore, the figures demonstrate that the heterogeneous cluster consumes about the same amount of power as the homogeneous system. The throughputs are essentially the same as well, since this workload is bottlenecked by the network links, which have the same bandwidth in both systems. Finally, it is interesting to observe that the request distribution has a more significant effect on throughput than on power for these systems.
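The 1:x request distributions used on the validation X-axis can be generated mechanically. A sketch (our own helper; which nodes fall in which half is an assumption):

```python
def unbalanced_distribution(n, x):
    """Build the 1:x distribution vector D for n nodes: one half of the
    cluster receives x times more load than the other half."""
    assert n % 2 == 0, "cluster must split into two equal halves"
    low, high = 1.0, float(x)
    total = (low + high) * (n // 2)
    return [low / total] * (n // 2) + [high / total] * (n // 2)
```

For example, `unbalanced_distribution(8, 1)` gives the even leftmost point (each node receives 1/8 of the requests), while `unbalanced_distribution(8, 3)` sends three times more load to the second half.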

4.3 Comparing Server Systems

Homogeneous cluster. Figures 3 to 5 show the throughput and power consumption of each server system on the homogeneous cluster, as a function of time. Figure 3 demonstrates that the energy-oblivious server consumes roughly the same amount of power throughout the experiment; non-trivial variations only occur during the three load peaks. In contrast, Figures 4 and 5 demonstrate that the Adaptive and Model Adaptive systems can nicely adjust the cluster


Figure 3: Throughput and power of Energy-Oblivious for homogeneous cluster. [Plot: throughput (r/s) and power (W) versus time (s) over the 8500-second run.]

Figure 4: Throughput and power of Adaptive for homogeneous cluster. [Same axes as Figure 3.]

configuration, according to the offered load. For instance, during the lulls in traffic, only 3 or 4 nodes are required to serve the offered load; the other nodes can be turned off. As a result of the reconfiguration, substantial energy savings can be accrued by the two systems. Because the cluster is homogeneous, the two systems behave similarly, consuming about 30% less energy than Energy-Oblivious.

In terms of performance, two observations need to be made from these results. First, both energy-conscious systems exhibit throughput drops when nodes are turned on. This effect results from the cold-start cache misses taken by the activated node. We performed experiments in which the contents of the cache are stored to disk (right before a node shutdown) and later restored (right after booting the node); in these experiments, the throughput drops are eliminated. Second, as a result of the delay involved in node reactivations, a small percentage of requests is delayed. To determine the extent of this effect, we assess how long the energy-conscious servers take to serve the same number of requests as Energy-Oblivious serves in 8500 seconds. We find that both systems take 1.8% longer, meaning that they degrade the average throughput by only 1.8%. Considering the energy they consume over these extended periods, their energy savings are a little lower, around 28%.

16000

800

Throughput (r/s)

Throughput Power 14000

700

12000

600

10000

500

8000

400

6000

300

4000

200

2000

100

0

Power (W)

0

Throughput (r/s)

14000

Power (W)

Throughput Power

Power (W)

Throughput (r/s)

Throughput Power

0 0

1000

2000

3000

4000 5000 Time (s)

6000

7000

8000

Figure 5: Throughput and power of Model Adaptive for the homogeneous cluster.

Heterogeneous cluster. Figures 6 and 7 show the throughput and power consumption of Adaptive and Model Adaptive, respectively, on the heterogeneous cluster, as a function of time. We do not show results for Energy-Oblivious because they are almost exactly the same as in Figure 3; the only difference is that the power consumption of the heterogeneous cluster is slightly lower. These figures show substantially different behavior. In particular, note that Adaptive leads to substantially higher power consumption than Model Adaptive during the load valleys. As a result, Model Adaptive consumes 32% less energy than Energy-Oblivious during the 8500 seconds of this experiment, whereas Adaptive consumes only 18% less energy during the same period. The reason for Adaptive's inefficient behavior is that it treats a heterogeneous system as if it were homogeneous. For instance, it treats a single-node blade system with an idle power consumption of 150 Watts (110+40 Watts) the same as a traditional node that consumes only 70 Watts when idle. In effect, Adaptive selects the nodes to be part of the cluster configuration randomly, using feedback control to achieve the required throughput. In this experiment, the 4-node configuration used during the lulls in load includes only one blade and, hence, incurs the unamortized fixed power consumption of the entire blade system. We also performed experiments in which Adaptive knew exactly which nodes to turn on/off, although this would be unrealistic in practice. For our trace and heterogeneous cluster, this means turning off the traditional nodes first. This version of Adaptive consumes 26% less energy than Energy-Oblivious.
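The cost of heterogeneity-oblivious node selection can be sketched as follows. This is an illustrative greedy selection, not the paper's model; the power and capacity numbers are assumed, loosely based on the idle-power figures quoted above (150 W for a lone blade, 70 W for a traditional node).

```python
# Sketch (assumed numbers, not the paper's optimization model):
# a heterogeneity-aware policy orders candidate nodes by idle power
# per unit of throughput, so a lull configuration avoids paying a
# blade chassis's fixed power for a single blade.

nodes = [
    # (name, idle power in W, peak throughput in r/s) -- assumed values
    ("traditional-1", 70, 1600),
    ("traditional-2", 70, 1600),
    ("blade-1", 150, 1600),    # 110 W chassis + 40 W blade when alone
]

def pick_nodes(target_rps):
    """Greedily select the most energy-efficient nodes until the
    aggregate capacity covers the target throughput."""
    chosen, capacity = [], 0
    for name, idle_w, cap in sorted(nodes, key=lambda n: n[1] / n[2]):
        if capacity >= target_rps:
            break
        chosen.append(name)
        capacity += cap
    return chosen

print(pick_nodes(3000))   # -> ['traditional-1', 'traditional-2']
```

A random selection, as in Adaptive, may instead keep the lone blade on during a lull, paying the chassis's unamortized fixed power; the efficiency-ordered selection never does.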


Figure 6: Throughput and power of Adaptive for the heterogeneous cluster.

Figure 7: Throughput and power of Model Adaptive for the heterogeneous cluster.

[Both plots show throughput (r/s) and power (W) as a function of time (s) over the 8500-second trace.]

Hardware       System            Energy in      Time for all   Energy for all   Avg. throughput
                                 8500 s (MJ)    requests (s)   requests (MJ)    loss (%)
-------------  ----------------  -------------  -------------  ---------------  ---------------
Homogeneous    Energy-Oblivious  5.342          8500           5.342            --
               Adaptive          3.733 (30%)    8660           3.841 (28%)      1.8
               Model Adaptive    3.783 (29%)    8660           3.891 (27%)      1.8
Heterogeneous  Energy-Oblivious  5.277          8500           5.277            --
               Adaptive          4.337 (18%)    8930           4.611 (13%)      4.8
               Model Adaptive    3.607 (32%)    8750           3.769 (29%)      2.9

Table 2: Summary of energy consumption and performance degradation. Percentages in parentheses are the energy savings relative to the corresponding Energy-Oblivious result.
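The savings percentages in Table 2 follow from a simple ratio against the corresponding Energy-Oblivious run; a quick check:

```python
# Sanity-checking the parenthesized savings in Table 2: each is the
# system's energy relative to the matching Energy-Oblivious run.

def savings_pct(system_mj, oblivious_mj):
    return round(100 * (1 - system_mj / oblivious_mj))

# Heterogeneous cluster, energy for all requests:
print(savings_pct(4.611, 5.277))   # Adaptive        -> 13
print(savings_pct(3.769, 5.277))   # Model Adaptive  -> 29
```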

In terms of performance, note that both Adaptive and Model Adaptive now have trouble reaching the maximum throughput of this workload. Again, the reason is the overhead of node activations. When we require that these systems serve the same number of requests as Energy-Oblivious, we find that the experiments last slightly longer, by 4.8% (Adaptive) and 2.9% (Model Adaptive), with corresponding decreases in average throughput. The extended periods also reduce the energy savings accrued by the systems, to 13% (Adaptive) and 29% (Model Adaptive).

Summary. Table 2 summarizes our main results. The results are clearly favorable to the model-based self-configuring system. It conserves more than twice as much energy as Adaptive on a heterogeneous cluster, with only a small impact on throughput. These results can be improved further by fine-tuning our load-prediction scheme and/or speeding up the node reboot process.

5 Related Work

Energy conservation research for clusters. A few recent papers [8, 11, 12, 18] deal with energy conservation for server clusters. Pinheiro et al. [18] and Chase et al. [8] concurrently proposed cluster reconfiguration to conserve energy. Elnozahy et al. [11] evaluated different combinations of cluster reconfiguration and dynamic voltage scaling. Rajamani and Lefurgy [19] studied how to improve the cluster reconfiguration technique by using spare servers and history information about peak server loads. Finally, Elnozahy et al. [12] considered dynamic voltage scaling and request batching in Web servers. All of the previous work has focused solely on conserving energy in homogeneous clusters. As our results demonstrate, systems proposed for these clusters do not conserve as much energy as they could on (more realistic) heterogeneous systems. A survey of power and energy research for servers can be found in [6].

Modeling and optimization for servers. We have successfully modeled the throughput of Web server clusters (e.g., [5, 7]). Aron et al. [2] enforced resource shares in shared hosting platforms using models and optimization. Doyle et al. [10] proposed a model-based approach to adjusting resource allocations, again in shared hosting platforms. Hippodrome [1] applies modeling and optimization to assign load to the units of a storage system.

Request distribution in server clusters. Several request distribution strategies for homogeneous server clusters have been proposed, e.g., [9, 4, 16, 7]. We are not aware of studies of request distribution for heterogeneous server clusters. Our results for Adaptive demonstrate that disregarding heterogeneity when it exists can lead to reduced energy savings.

Load balancing in heterogeneous systems. A few papers do address job/task balancing and sharing in heterogeneous systems, e.g., [20]. The key difference between these studies and ours is that their goal is usually to improve running time, rather than to increase throughput and reduce energy consumption. This difference is important in that we need to distribute requests based on resource utilization and energy consumption, rather than on server bandwidth only.

6 Conclusions

In this paper we developed a model-based self-configuring Web server for heterogeneous clusters. The server is based on modeling and optimization of configurations, request distributions, throughput, and power. Our experimental results demonstrated that (1) our modeling is accurate and (2) our server conserves more energy than the previously proposed system on a heterogeneous cluster, with a negligible effect on throughput. Based on these results, we conclude that Web servers must be able to self-configure intelligently on heterogeneous clusters. We also conclude that the style of modeling that allows our system to self-configure should be more widely applied in the systems community, given that most real systems do exhibit varying degrees of heterogeneity. In fact, our modeling framework may even be used in building systems that are heterogeneous by design.

Limitations and future work. The work presented in this paper has a few limitations:

- We have only considered stand-alone Web servers. We are currently experimenting with a cooperative Web server for heterogeneous clusters.
- We have only considered Web server loads that do not include dynamic-content requests. We have also only considered clusters with homogeneous server software. Our approach should apply directly to dynamic content and heterogeneous server software (such as in multi-tier services), but we have yet to confirm that.
- We have only considered energy conservation through powering entire nodes down. We will consider finer control of device states and dynamic voltage scaling in the future.

Acknowledgements

We would like to thank Gustavo Gama, Dorgival Guedes, and Eduardo Pinheiro for their comments on the topic of this paper.

References

[1] E. Anderson, M. Hobbs, K. Keeton, S. Spence, M. Uysal, and A. Veitch. Hippodrome: Running Circles Around Storage Administration. In Proceedings of the Conference on File and Storage Technologies, pages 175–188, Monterey, CA, January 2002.

[2] M. Aron, P. Druschel, and W. Zwaenepoel. Cluster Reserves: A Mechanism for Resource Management in Cluster-Based Network Servers. In Proceedings of the International Conference on Measurement and Modeling of Computer Systems, June 2000.

[3] M. Aron, D. Sanders, P. Druschel, and W. Zwaenepoel. Scalable Content-Aware Request Distribution in Cluster-Based Network Servers. In Proceedings of the USENIX 2000 Annual Technical Conference, June 2000.

[4] A. Bestavros, M. Crovella, J. Liu, and D. Martin. Distributed Packet Rewriting and its Application to Scalable Server Architectures. In Proceedings of the International Conference on Network Protocols, October 1998.

[5] R. Bianchini and E. V. Carrera. Analytical and Experimental Evaluation of Cluster-Based WWW Servers. World Wide Web Journal, 3(4), December 2000. An earlier version appeared as TR 718, Department of Computer Science, University of Rochester, August 1999 (revised April 2000).

[6] R. Bianchini and R. Rajamony. Power and Energy Management for Server Systems. Technical Report DCS-TR-528, Department of Computer Science, Rutgers University, June 2003.

[7] E. V. Carrera and R. Bianchini. Efficiency vs. Portability in Cluster-Based Network Servers. In Proceedings of the 8th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, June 2001.

[8] J. Chase, D. Anderson, P. Thakar, A. Vahdat, and R. Doyle. Managing Energy and Server Resources in Hosting Centers. In Proceedings of the 18th Symposium on Operating Systems Principles, October 2001.

[9] Cisco Systems. Cisco LocalDirector. http://www.cisco.com/.

[10] R. P. Doyle, J. S. Chase, O. M. Asad, W. Jin, and A. M. Vahdat. Model-Based Resource Provisioning in a Web Service Utility. In Proceedings of the 4th USENIX Symposium on Internet Technologies and Systems, Seattle, WA, March 2003.

[11] E. N. Elnozahy, M. Kistler, and R. Rajamony. Energy-Efficient Server Clusters. In Proceedings of the 2nd Workshop on Power-Aware Computing Systems, February 2002.

[12] M. Elnozahy, M. Kistler, and R. Rajamony. Energy Conservation Policies for Web Servers. In Proceedings of the 4th USENIX Symposium on Internet Technologies and Systems, March 2003.

[13] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by Simulated Annealing. Science, 220(4598):671–680, May 1983.

[14] M. Martonosi, D. Brooks, and P. Bose. Power-Performance Modeling and Validation. Tutorial given at SIGMETRICS 2001, June 2001.

[15] Nexcom International. http://www.nexcom.com.tw/.

[16] V. Pai, M. Aron, G. Banga, M. Svendsen, P. Druschel, W. Zwaenepoel, and E. Nahum. Locality-Aware Request Distribution in Cluster-Based Network Servers. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, pages 205–216, October 1998.

[17] V. Pai, P. Druschel, and W. Zwaenepoel. Flash: An Efficient and Portable Web Server. In Proceedings of the USENIX 1999 Annual Technical Conference, June 1999.

[18] E. Pinheiro, R. Bianchini, E. Carrera, and T. Heath. Dynamic Cluster Reconfiguration for Power and Performance. In L. Benini, M. Kandemir, and J. Ramanujam, editors, Compilers and Operating Systems for Low Power. Kluwer Academic Publishers, August 2003. An earlier version appeared as "Load Balancing and Unbalancing for Power and Performance in Cluster-Based Systems", Proceedings of the International Workshop on Compilers and Operating Systems for Low Power, September 2001.

[19] K. Rajamani and C. Lefurgy. On Evaluating Request-Distribution Schemes for Saving Energy in Server Clusters. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, March 2003.

[20] S. Zhou, X. Zheng, J. Wang, and P. Delisle. Utopia: A Load Sharing Facility for Large, Heterogeneous Distributed Computer Systems. Software—Practice and Experience, 23(12), 1993.