A Control Theoretic Approach for Energy-Efficient Management of Online Social Network Services

L. Caviglione
Institute of Intelligent Systems for Automation (ISSIA), National Research Council of Italy (CNR), Via de Marini 6, 16149, Genova (Italy)
e-mail: [email protected]

A. Pisano
Department of Electrical and Electronic Engineering (DIEE), University of Cagliari, Piazza d’Armi, 09126, Cagliari (Italy)
e-mail: [email protected]

Abstract: As a consequence of their massive utilization, Online Social Networks (OSNs) must rely on a huge amount of hardware and network appliances. Therefore, optimizing their energy consumption could lead to major benefits from both the environmental and the economic viewpoints. In this perspective, this paper defines a general OSN model based on homogeneous aggregates of machines concurring for a common goal, referred to as workers. This template is then used to develop a non-linear controller that increases the power efficiency of the OSN by adjusting the frequency of each worker according to the user-generated load of requests. Simulations are provided to show the effectiveness of the approach.

978-1-4799-0756-4/13/$31.00 ©2013 IEEE

I. INTRODUCTION

Nowadays, Online Social Networks (OSNs) are cultural phenomena, offering a rich set of tools, for instance, to share photos and videos, to communicate via text or A/V, and to access functionalities by means of highly responsive Web 2.0 interfaces. To serve a worldwide user population, an OSN must implement an Internet-scale system, in which energy expenditures represent an important fraction of the operating costs [1]. The needed computing/storage/networking resources are often grouped within large data centers, potentially causing inefficiencies, for instance in terms of traffic management [2]. Besides, it has been estimated that cutting down their energy requirements would lead to major economic benefits, as well as mitigating the impact of ICT on the environment [3]. In this perspective, the reduction of unnecessary power consumption is now a key design constraint already adopted in different scenarios, ranging from Grid infrastructures [4] to networking facilities [5].

However, to the best of the authors’ knowledge, this is the first work optimizing the power efficiency of an OSN, even if studies on generic data centers or cloud-based deployments have been proposed (see, e.g., [6] and [7] and references therein). In particular, we exploit the fact that an OSN uses a limited set of basic functionalities, e.g., databases to store snapshots of individuals/relations, and computing units to offer real-time interaction among users. This highly specialized behavior allows the identification of homogeneous aggregates of hardware concurring for a common goal.

To optimize the consumption of such “building blocks”, we take advantage of results obtained in a different application scenario, namely the Dynamic Voltage-Frequency Scaling (DVFS) of embedded MultiProcessor Systems-on-Chip (MPSoC) architectures [8]. Specifically, our approach is feedback based, and it is similar to the one relying on the occupancy of the buffering queues used for data exchange between adjacent processing stages [9], [10]. Also, we exploit results from [11], [12] to avoid the need for any “linearizing” mapping or identification of parameters.

The contributions of this work are: i) the definition of a general OSN template based on homogeneous aggregates of hardware components, and ii) the development of a control scheme to increase its energy efficiency through the online, feedback-based adjustment of the working frequencies of the machines. It is well known that dynamic power consumption (and, hence, the energy demand over a time interval) depends linearly on the operating frequency. Dynamic voltage scaling, in addition to the suggested dynamic frequency scaling, can further improve energy efficiency.

The remainder of the paper is structured as follows: Section II discusses the model, Section III presents the non-linear control algorithm, and Section IV showcases its performance evaluation. Finally, Section V concludes the paper.

II. MODELING AN OSN

A. Preliminaries and Notations

To have a comprehensive and realistic reference scenario, we selected Facebook (http://www.facebook.com), which yields a model general enough to represent a wide range of similar services. In the following, the terms “profile” and “page” are used interchangeably. We assume a user accessing his/her profile (or the wall, according to the OSN jargon) via a Web-based interface, i.e., the conversation is handled via HTTP [13].
To complete the task, the typical items to retrieve are: i) personal details; ii) multimedia objects and graphic components, for instance pictures, avatars and previews of items visible on the wall; iii) lists of friends/interests composing the user’s social graph; iv) the interactions performed over the page, e.g., past comments, appreciations (or likes), and shared URLs. Data is then arranged according to a template/stylesheet, and additional resources (e.g., .js) are injected into the main HTML body. Finally, a Web server sends the content(s) back via the HTTP channel. It is worth stressing that instant messaging services and visual feedback lead to a continuous exchange of data between the two endpoints, reducing the accuracy of a page-by-page model [14]. However, this is more crucial when investigating traffic patterns, and such overheads will be taken into account when defining the size of a page. Refining the model by considering the presence of caches is part of our ongoing research.

The workers $W_i$ model a homogeneous set of machines cooperating for a common goal (e.g., computing units for processing multimedia objects, or appliances such as routers), while $f_{W_i}(t)$ denotes the time-varying frequency of worker $W_i$. $D$ denotes a proper dispatching element, also in charge of decomposing a request into specialized and parallel tasks (see, e.g., [2] and [15]). $Q_i$ or $Q_{ij}$ identify the level of occupancy of the FIFO queues connecting different workers. To ease the modeling effort, we also introduce $Q_i(t)$ or $Q_{ij}(t)$, with $Q_i(t), Q_{ij}(t) > 0$, as real-valued approximations (for instance by considering a “fluid” approach) of the corresponding discrete (integer) degrees of occupancy. It is reasonable to assume that the data throughput of a worker is proportional to its frequency through time-varying positive coefficients $k_i(t)$, called throughput gains. These coefficients also allow modeling possible idle states (i.e., when the incoming queue of a worker is empty):

$$k_i(t) = \begin{cases} 0, & \text{idle} \\ \bar{k}_i, & \text{busy} \end{cases}$$

where $\bar{k}_i$ is some constant.

B. Description of the Model

The client-server architecture of an OSN can be reduced to a producer-consumer scheme, as depicted in Figure 1, where $C$ is the consumer, and $P_1$, $P_2$ and $P_3$ are the producers. $C$ requests profiles from the OSN with rate $f_C(t)$, and drives the generation of results towards the users block, which denotes the actual receivers of the pages. As soon as a request is received by $D$, it is decomposed into a set of “primitive” tasks properly enqueued in $Q_{T1}$, $Q_{T2}$ or $Q_{T3}$. Note that $D$ is not a real device and is solely used to ease the underlying model.

As said, each worker is a homogeneous group of machines concurrently tackling a task. In detail, $W_1$ handles HTTP/Web-based duties, i.e., it sends to the original requestor a proper Web page containing data gathered from the different producers. $W_N$ models the hardware in charge of running the network infrastructure of the OSN. Similarly to workers, producers are arrays of machines generating the specific datasets composing a profile. To capture real-world deployments, we use three specific producers [2]:
- $P_1$, delivering personal details, as well as a snapshot of the interactions made within the OSN (e.g., wall posts and likes);
- $P_2$, computing the neighborhood and retrieving the multimedia objects needed to display posts, previews of shared URLs, and peers’ avatars;
- $P_3$, providing additional data such as advertisements, geotags and presence information.

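To model interactions among the different elements, we consider the input/output balance of each queue:

$$
\begin{aligned}
\dot{Q}_{N}(t) &= k_{IW_N}(t)\, f_{W_N}(t) - k_{OW_1}(t)\, f_{W_1}(t) \\
\dot{Q}_{21}(t) &= k_{IP_1}(t)\, f_{P_1}(t) - k_{OW_N,1}(t)\, f_{W_N}(t) \\
\dot{Q}_{22}(t) &= k_{IP_2}(t)\, f_{P_2}(t) - k_{OW_N,2}(t)\, f_{W_N}(t) \\
\dot{Q}_{23}(t) &= k_{IP_3}(t)\, f_{P_3}(t) - k_{OW_N,3}(t)\, f_{W_N}(t) \\
\dot{Q}_{T1}(t) &= k_{ID,1}(t)\, f_{C}(t) - k_{OP_1}(t)\, f_{P_1}(t) \\
\dot{Q}_{T2}(t) &= k_{ID,2}(t)\, f_{C}(t) - k_{OP_2}(t)\, f_{P_2}(t) \\
\dot{Q}_{T3}(t) &= k_{ID,3}(t)\, f_{C}(t) - k_{OP_3}(t)\, f_{P_3}(t)
\end{aligned}
\tag{1}
$$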

where $f_{W_1}(t)$, $f_{W_N}(t)$, $f_{P_1}(t)$, $f_{P_2}(t)$, $f_{P_3}(t)$ are the working frequencies of the respective sets of machines. In the following, the explicit dependence of the model variables on time will sometimes be dropped.

As showcased in Figure 1, building a Web page requires producers and workers to generate and transform an assorted set of outputs. Specifically, to retrieve the data composing a profile, the dispatcher $D$ instantly decomposes a request issued by $C$ into one task to be executed by each producer. Assignments are distributed through the queues $Q_{T1}$, $Q_{T2}$ and $Q_{T3}$. Then, $P_i$ processes the pending tasks enqueued in the corresponding $Q_{Ti}$ ($i = 1, 2, 3$). Each $P_i$ produces a fixed, but different, amount of objects to complete a task. Therefore, the occupancy of the related output queue (i.e., $Q_{2i}$, with $i = 1, 2, 3$) grows proportionally to $k_{IP_1}(t)$, $k_{IP_2}(t)$ and $k_{IP_3}(t)$, respectively. The worker $W_N$ asynchronously pushes data from the producers to the final stage $W_1$, and its effort is ruled by the specific outcome of each producer. As an example, $P_2$ generates multimedia objects, which are bigger than the textual ones handled by $P_1$. In this perspective, $k_{OW_N,1}(t)$, $k_{OW_N,2}(t)$ and $k_{OW_N,3}(t)$ “partition” the service capacity of $W_N$. Its output is a stream of content-insensitive Protocol Data Units (PDUs), hence a unique $k_{IW_N}(t)$ suffices. Lastly, $W_1$ fetches and receives the PDUs composing a page with a rate proportional to $k_{OW_1}(t)$.

C. Working frequencies and the two-tiered controller

As discussed, workers and producers model groups of machines, thus the related $f_i(t)$ is actually an average value (e.g., $f_{P_1}(t)$ is the average working frequency of all the hosts composing $P_1$). Besides, we assume that each device supports the Advanced Configuration and Power Interface (ACPI) specification [16], hence the values of $f_i(t)$ can only be selected from a finite set.

As a matter of fact, adjusting the frequency with a per-host granularity would be infeasible (in 2010 Facebook already had more than 10,000 units). Therefore, when our scheme imposes a frequency on a specific aggregate of nodes, this should be intended as a “set-point” value for a lower-level controller. The latter is in charge of finding a weighted linear combination of ACPI-compliant values and/or idle states. Defining proper ways to implement such a mapping is outside the scope of this paper and part of our ongoing research.
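Returning to the queue model of Section II-B, the balances in (1) can be written down almost literally in code. The following Python sketch evaluates the right-hand side of (1) at a single time instant. The function name `queue_derivatives`, the dictionary layout and the gain key names are illustrative assumptions and do not come from the paper; the gains are passed in as plain numbers, i.e., every stage is assumed to be busy.

```python
def queue_derivatives(f, f_c, k):
    """Right-hand side of the queue balance equations (1).

    f   : dict of working frequencies, keys "W1", "WN", "P1", "P2", "P3"
    f_c : consumer request rate f_C(t)
    k   : dict of throughput gains (hypothetical key names, see the loop below)
    Returns a dict with the time derivative of every queue occupancy.
    """
    # Q_N is filled by the network worker W_N and drained by the Web worker W_1.
    dq = {"N": k["I_WN"] * f["WN"] - k["O_W1"] * f["W1"]}
    for i in (1, 2, 3):
        p = f"P{i}"
        # Output queue Q_2i: filled by producer P_i, drained by W_N.
        dq[f"2{i}"] = k[f"I_P{i}"] * f[p] - k[f"O_WN{i}"] * f["WN"]
        # Task queue Q_Ti: filled by the dispatcher D, drained by P_i.
        dq[f"T{i}"] = k[f"I_D{i}"] * f_c - k[f"O_P{i}"] * f[p]
    return dq
```

A forward-Euler update of an occupancy then reads `Q[name] += step * dq[name]`, possibly clipped at zero to mimic an empty queue; the clipping is an additional assumption of this sketch.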

Fig. 1. Reference model of an OSN architecture.

D. Computation of throughput gains

To get realistic throughput gain values, we used publicly released hardware/software data from Facebook. Gains are computed on a per-group basis and not for a single host. Let us define $\hat{f}_i$ as the maximum frequency allotted to the i-th worker. We assume that the hardware operating at full speed can deliver 952,000,000 pages per day, as actually happens for Facebook (see, e.g., [17] and [18]), which yields a maximal page request rate of $\hat{f}_C$ = 11,000 req/s. Each request triggers one task per producer, hence $k_{ID,1}(t) = k_{ID,2}(t) = k_{ID,3}(t) = 1$.

The worker $W_1$ relies upon an integrated Web framework, such as Tornado [19], typically running on a 2.4 GHz quad-core AMD Opteron, thus $\hat{f}_{W_1}$ = 2.4 GHz. Considering the typical size of a profile page as ∼1 Mbyte [13], worker $W_1$, when operating at full speed, should serve 11,000 pages/s and fetch inbound data from the internal network at the corresponding rate of $(10^6 \times 11{,}000)$ byte/s, i.e., $k_{OW_1}(t) = 4.58$. This also takes into account overheads due to additional services, but it does not model traffic belonging to internal syncing and control signaling, which is assumed to be negligible.

Similarly, $W_N$ has to move all the needed data from the producers towards the Web front-end $W_1$. According to the Facebook involvement in the Open Compute Project [20], the hardware template for $W_N$ is a 3.3 GHz dual Xeon host, able to process traffic at Gigabit speed [21], i.e., $\hat{f}_{W_N}$ = 3.3 GHz. Then, when the maximum load of requests is experienced and $W_N$ operates at the highest speed, it must be able to send to $W_1$ a volume of $(10^6 \times 11{,}000)$ byte/s, therefore $k_{IW_N}(t) = 3.33$.

To compose a wall, each producer has an output characterized by a specific number of objects with different sizes. Thus, the three inbound flows to $W_N$ require a different “amount of processing” (in the sense that they are composed of a different number of PDUs). A quite precise estimate of this partitioning (e.g., to move graphics, text and scripts) can be obtained by taking into account the pagelet and BigPipe [18] designs used by Facebook. So, we consider the average user profile to be composed of 30 objects provided by $P_1$, 50 by $P_2$ and 10 by $P_3$ [13]. Consequently,

$$k_{OW_N,1}(t) = 30 \times 11{,}000 / \hat{f}_{W_N} = 10^{-4}$$

and, by analogous expressions with a different number of objects (50 and 10, respectively), $k_{OW_N,2}(t) = 1.66 \cdot 10^{-4}$ and $k_{OW_N,3}(t) = 0.33 \cdot 10^{-4}$.

Producers are considered to have the same hardware and software functionalities, but to be physically disjoint. For data retrieval duties, Facebook uses tweaked versions of MySQL [17], Hive, Pig and Hadoop on 2.4 GHz AMD Opteron machines [22]. Hence, we assume that $\hat{f}_{P_1} = \hat{f}_{P_2} = \hat{f}_{P_3}$ = 2.4 GHz, which is also consistent with other works, such as [23]. As a worst case, there will be a new task request per producer with rate $\hat{f}_C$, thus $k_{OP_1}(t) = k_{OP_2}(t) = k_{OP_3}(t) = 4.58 \cdot 10^{-6}$. To face this load, each producer must release an output rate of $(\{30, 50, 10\} \times 11{,}000)$ obj/s, leading to $k_{IP_1}(t) = 137.5 \cdot 10^{-6}$, $k_{IP_2}(t) = 229.2 \cdot 10^{-6}$ and $k_{IP_3}(t) = 45.83 \cdot 10^{-6}$.

III. A NONLINEAR ALGORITHM FOR DVFS

The frequencies $f_{W_1}$, $f_{W_N}$, $f_{P_1}$, $f_{P_2}$, $f_{P_3}$ are user selectable among a set of permitted values. The idea behind the feedback-based speed adjustment algorithm is to determine the proper speed of each worker according to the inspected occupancy levels of the FIFO queues (used as feedback variables), trying to steer them into a vicinity of some desired constant “set-point” value. Let $Q^{cap}$ be the queue capacity. Often $Q^{ref} = Q^{cap}/2$ is a convenient set-point for the queue occupancy. The “error variables” $e_i$, whose mean value has to be steered near zero, are defined as

$$e_i = Q_i^{ref} - Q_i, \qquad i \in \{N, 21, 22, 23, T1, T2, T3\} \tag{2}$$
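As a side check on the plant parameters the controller acts upon, the throughput gains of Section II-D can be reproduced with a few divisions. The Python sketch below recomputes them from the figures quoted in the text (maximum request rate, page size, maximum frequencies, objects per producer). The function and key names are illustrative assumptions; only the numerical inputs come from the paper.

```python
# Figures quoted in Section II-D.
PAGES_PER_DAY = 952_000_000
F_C_MAX = 11_000      # req/s, the rounded value used in the text
PAGE_BYTES = 10**6    # ~1 Mbyte per profile page
F_W1_MAX = 2.4e9      # Hz, Web front-end W_1
F_WN_MAX = 3.3e9      # Hz, network worker W_N
F_P_MAX = 2.4e9       # Hz, producers P_1, P_2, P_3
OBJECTS = {1: 30, 2: 50, 3: 10}  # objects per profile delivered by each producer


def throughput_gains():
    """Return the gains as (rate at full load) / (maximum frequency)."""
    gains = {
        "k_OW1": PAGE_BYTES * F_C_MAX / F_W1_MAX,  # bytes fetched by W_1 per cycle
        "k_IWN": PAGE_BYTES * F_C_MAX / F_WN_MAX,  # bytes pushed by W_N per cycle
    }
    for i, n_obj in OBJECTS.items():
        gains[f"k_OWN{i}"] = n_obj * F_C_MAX / F_WN_MAX  # objects drained from Q_2i
        gains[f"k_OP{i}"] = F_C_MAX / F_P_MAX            # tasks drained from Q_Ti
        gains[f"k_IP{i}"] = n_obj * F_C_MAX / F_P_MAX    # objects pushed into Q_2i
    return gains


if __name__ == "__main__":
    print(f"pages/s from the daily figure: {PAGES_PER_DAY / 86_400:.0f}")
    for name, value in throughput_gains().items():
        print(f"{name} = {value:.4g}")
```

The daily figure corresponds to roughly 11,019 pages/s, which the text rounds to 11,000. For the third producer, the ratio 10 × 11,000 / (3.3 · 10^9) evaluates to about 0.33 · 10^-4.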

Since only a finite number of permitted frequencies is available to each computing unit, it makes sense to parameterize the set of admissible frequencies by an integer index $j$ ranging from 1 to $N$, with $N$ being the number of permitted frequencies. By convention, we assume that the unit frequency increases with $j$. Every $T$ seconds, the controller adjusts the current value of $j$ according to the following rule.

Non-linear controller
Every T seconds do:
- If [ ($e_i[k] < -\Delta$) AND ($e_i[k] \le e_i[k-1]$) ] then increase the frequency ($j := \min(j + 1, N)$).
- If [ ($e_i[k] > \Delta$) AND ($e_i[k] \ge e_i[k-1]$) ] then decrease the frequency ($j := \max(j - 1, 1)$).

where $e_i[k] = e_i(kT)$. The nonlinear controller considered here can only change the frequency to adjacent values in the admitted set; in other words, at each trigger instant the coefficient $j$ can be incremented, or decremented, only by one. This is not a limitation, but rather an intrinsic characteristic of the considered nonlinear variable-structure controller (see [24]). As shown in [11], [12], this controller provides good reactivity against abrupt workload variations coupled with ease of tuning. At every trigger instant, the controller increases, decreases or keeps constant the frequency on the basis of the current queue occupancy error $e_i[k]$ and of its comparison with the previously observed value $e_i[k-1]$ (the sign of the difference $e_i[k] - e_i[k-1]$ determines the filling or emptying “trend” of the queue). Frequency adjustments are only applied when the corresponding error variable lies outside a boundary layer of thickness $\Delta$, which plays the role of a “hysteresis parameter” limiting the rate of frequency adjustments at the price of larger fluctuations of the corresponding queue occupancy level. Reasonable values of $\Delta$ lie between 2% and 10% of the corresponding desired set-point value.

The above non-linear controller is compactly referred to as $NL(e_i)$. The overall frequency adjustment algorithm is composed of five parallel instances of the NL algorithm:

$$
\begin{aligned}
f_{W_1} &= NL(e_N) \\
f_{W_N} &= NL(e_{2i^*}) \\
f_{P_i} &= NL(e_{Ti}), \quad i = 1, 2, 3
\end{aligned}
\tag{3}
$$

where the index $i^*$ is such that

$$|e_{2i^*}| \ge |e_{2j}|, \qquad \forall\, i^*, j = 1, 2, 3, \; i^* \ne j \tag{4}$$

It can be seen that each group of machines adjusts its frequency depending on the occupancy of the incoming “upstream” queue(s). The particular form of the adjustment logic for $f_{W_N}$ reflects the fact that $W_N$ is supplied by three parallel incoming queues, and relation (4) selects the queue having the largest-in-magnitude occupancy error. The reader is referred to [12] for a formal treatment supporting the correct functioning of the suggested control algorithm.

IV. SIMULATION RESULTS

The considered OSN architecture has been simulated in the Matlab-Simulink environment. Equations (1) have been solved numerically by the Euler method with fixed step size $\tau_s$ = 1 s. The simulation covers 24 hours of operation. The adopted profile for the consumer frequency $f_C$ is displayed, with different time scales, in the left and right plots of Figure 2. As shown in the left plot, during the first 18 hours the consumer frequency oscillates in a sinusoidal manner, while in the last 6 hours it grows linearly, reaching the maximal value of $\hat{f}_C$ = 11,000 req/s approximately after the 23rd hour and staying there until the end of the simulation. The right plot of Figure 2 displays a zoom of $f_C(t)$ during the first 120 seconds, showing that the adopted profile also includes randomly generated fluctuations. The gain parameter values are taken from Section II-D. The admissible frequency sets contain $N = 100$ elements, uniformly sampled within the range

$$[0.1 \hat{f}_i, \; \hat{f}_i], \qquad i \in \{W_1, W_N, P_1, P_2, P_3\} \tag{5}$$

The adjustable frequencies $f_{W_1}(t)$, $f_{W_N}(t)$, $f_{P_1}(t)$, $f_{P_2}(t)$, $f_{P_3}(t)$ are varied on-line according to the suggested nonlinear controller (3). The set-point values have been selected as $Q_i^{ref}$ = 10,000, $\forall i$.

The plan of the tests is as follows. The $T$ and $\Delta$ parameters are taken as $T$ = 60 s and $\Delta$ = 50 in the first test, and a detailed discussion of the results is made. The adjustable frequencies and the queue occupancy profiles are displayed in a number of dedicated plots (Figures 3-6). Then, two comparative tests are presented to evaluate the effects of the $T$ and $\Delta$ parameters. Results of the comparative tests are given in aggregate form by means of bar diagrams (Figures 7-10). For ease of visualization and comparison, in all the next plots the frequencies are converted to the corresponding normalized value

$$\frac{f_i(t)}{\hat{f}_i}, \qquad i \in \{W_1, W_N, P_1, P_2, P_3\} \tag{6}$$

and, similarly, all queue occupancies are normalized with respect to the corresponding set-point value according to

$$\frac{Q_i(t)}{Q_i^{ref}}, \qquad i \in \{N, 21, 22, 23, T1, T2, T3\} \tag{7}$$

Fig. 2. Consumer frequency $f_C$ (pages/s). Left plot: long-term behaviour (time in hours). Right plot: zoom on initial transient (time in seconds).
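The switching rule of Section III, the selection (4) for the network worker and the frequency grid (5) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' Simulink implementation: the function names `nl_step`, `select_wn_error` and `frequency_levels` are hypothetical, and the index $j$ is kept 1-based as in the text, so that level $j$ corresponds to `levels[j - 1]`.

```python
def nl_step(j, e_now, e_prev, delta, n_levels):
    """One trigger of the NL(e_i) rule; returns the new frequency index j (1-based)."""
    if e_now < -delta and e_now <= e_prev:
        # Queue above the set-point and still filling: speed up by one level.
        return min(j + 1, n_levels)
    if e_now > delta and e_now >= e_prev:
        # Queue below the set-point and still emptying: slow down by one level.
        return max(j - 1, 1)
    # Inside the boundary layer, or already recovering: keep the frequency.
    return j


def select_wn_error(e21, e22, e23):
    """Relation (4): the error with the largest magnitude drives f_WN."""
    return max((e21, e22, e23), key=abs)


def frequency_levels(f_max, n_levels=100):
    """N frequencies uniformly spaced in [0.1 * f_max, f_max], as in (5)."""
    step = 0.9 * f_max / (n_levels - 1)
    return [0.1 * f_max + m * step for m in range(n_levels)]
```

In a simulation loop with Euler step $\tau_s$ = 1 s and $T$ = 60 s, `nl_step` would be called once every 60 integration steps for each of the five controlled frequencies, with `e_prev` holding the error sampled at the previous trigger.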

Fig. 3. Left plot: normalized frequencies $f_C$ and $f_{W_1}$. Right plot: normalized queue occupancy $Q_N$.

Figure 3-left displays the superimposed plots of the normalized frequencies $f_C$ and $f_{W_1}$, showing that the latter tracks the actual fluctuations of the consumer frequency, as expected. Figure 3-right shows the normalized occupancy of the queue $Q_N$, which is kept close to unity (meaning that the actual, i.e., non-normalized, queue occupancy is kept close to the desired set-point). Figure 4-left displays the superimposed plots of the normalized frequencies $f_C$ and $f_{W_N}$, showing that the latter exhibits the expected profile.

Fig. 4. Normalized frequencies $f_C$ and $f_{W_N}$. Left plot: long-term behaviour. Right plot: zoom.

Fig. 5. Left plot: normalized queue occupancy $Q_{23}$. Right plot: normalized queue occupancy $Q_{T1}$.

Figure 4-right is a zoomed plot highlighting the limited rate at which frequency adjustments are made. The left and right plots in Figure 5 show that the content of the queues $Q_{23}$ and $Q_{T1}$ oscillates around the desired set-point. The queues $Q_{21}$ and $Q_{22}$ quickly become empty (only one queue among the three parallel ones is regulated), while the occupancy levels of the queues $Q_{T2}$ and $Q_{T3}$ have a profile analogous to that of $Q_{T1}$, as a consequence of the fact that the inbound and outbound throughput gains are the same. Figure 6-left shows that the normalized frequency $f_{P_1}$ also “tracks” the consumer frequency $f_C$, and Figure 6-right shows that the rate of frequency adjustments is also limited. The profiles of the normalized frequencies $f_{P_2}$ and $f_{P_3}$ are both analogous to that of $f_{P_1}$.

Fig. 6. Left plot: normalized frequencies $f_C$ and $f_{P_1}$. Right plot: zoomed profile of normalized frequency $f_{P_1}$.

Figures 7-10 present comparative analysis results where different tunings of the parameters $T$ and $\Delta$ are considered. Figure 7 considers the total number of frequency adjustments undergone by $W_1$, $W_N$ and $P_1$ during the entire simulation test, revealing an almost linear dependence on the $T$ parameter. Figure 8 investigates the normalized variance of the queue occupancies (each variance is divided by the maximum value among the three), showing that reducing $T$, i.e., changing the frequency more often, reduces the variance values. Figure 9 presents a comparative analysis of the number of frequency adjustments with varying $\Delta$. $W_1$ and $P_1$ show a reduction of the number of frequency adjustments with increasing $\Delta$, while that associated to $W_N$ remains almost unchanged. Finally, Figure 10 shows that $\Delta$ has a negligible effect on the queue variances.

Fig. 7. Number of frequency adjustments with varying $T$ (panels $W_1$, $W_N$, $P_1$; $T$ = 60, 30 and 15 s).

Fig. 8. Normalized queue variances with varying $T$ (panels $Q_N$, $Q_{23}$, $Q_{T1}$; $T$ = 60, 30 and 15 s).

Fig. 9. Number of frequency adjustments with varying $\Delta$ (panels $W_1$, $W_N$, $P_1$; $\Delta$ = 50, 500 and 1000).

Fig. 10. Normalized queue variances with varying $\Delta$ (panels $Q_N$, $Q_{23}$, $Q_{T1}$; $\Delta$ = 50, 500 and 1000).

To sum up, our tests validate that the proposed scheme works correctly, in the sense that all the frequencies of the computing units decrease and increase over time tracking the variations of the consumer demand. It is worth stressing that in the final hour of the test, when the consumer frequency reaches the maximal value $\hat{f}_C$ = 11,000 req/s, all the frequencies of the hardware composing the workers tend to the corresponding maximal value.

V. CONCLUSIONS AND FUTURE WORK

In this paper we investigated a control theoretic approach to increase the power efficiency of OSNs by adjusting the working frequencies of the machines according to the user-generated load of requests. Future work aims at enhancing the two-tiered frequency controller outlined in Section II-C, since it has the following benefits: i) it allows a more fine-grained control over the frequencies, since it can bypass the strict values imposed by the ACPI; ii) it handles in a uniform way the heterogeneities of the hardware (e.g., nodes having different specifications), and it hides features such as the number of CPUs per node; iii) the target frequency could be the weighted combination of different ACPI-compliant values and of machines put in an idle state, i.e., it can mix scaling and dynamic shutdown/wake-up policies; iv) it enables the exploitation of load balancing through task allocation and frequency scaling. As a part of our ongoing research, we also aim at using more realistic loads (i.e., $f_C(t)$), possibly collected from real deployments. Lastly, we are working towards a prototype implementation of our scheme.

REFERENCES

[1] A. Qureshi, R. Weber, H. Balakrishnan, J. Guttag, B. Maggs, “Cutting the Electric Bill for Internet-Scale Systems”, in Proc. of the 2009 ACM SIGCOMM Conference on Data Communication, Barcelona, Spain, pp. 123-124, Aug. 2009.
[2] P. M. Wittie, V. Pejovic, L. Deek, K. C. Almeroth, B. Y. Zhao, “Exploiting Locality of Interest in Online Social Networks”, in Proc. of the 6th COnference on emerging Networking EXperiments and Technologies (Co-NEXT ’10), Philadelphia, PA, USA, Nov. 2010.

[3] M. Gupta, S. Singh, “Greening of the Internet”, in Proc. of the ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM 2003), Karlsruhe, Germany, pp. 19-26, Aug. 2003.
[4] G. Da Costa, J.-P. Gelas, Y. Georgiou, L. Lefevre, A.-C. Orgerie, J.-M. Pierson, O. Richard, K. Sharma, “The GREEN-NET Framework: Energy Efficiency in Large Scale Distributed Systems”, in Proc. of the High Performance Power Aware Computing Workshop (HPPAC 2009), Rome, Italy, pp. 1-8, May 2009.
[5] A. P. Bianzino, C. Chaudet, D. Rossi, J.-L. Rougier, “A Survey of Green Networking Research”, IEEE Communications Surveys & Tutorials, vol. 14, no. 1, pp. 3-20, First Quarter 2012.
[6] V. Valancius, N. Laoutaris, L. Massoulié, C. Diot, P. Rodriguez, “Greening the Internet with nano data Centers”, in Proc. of the 5th International Conference on Emerging Networking Experiments and Technologies, pp. 37-48, ACM, Dec. 2009.
[7] A. Berl, E. Gelenbe, M. Di Girolamo, G. Giuliani, H. De Meer, M. Q. Dang, K. Pentikousis, “Energy-efficient Cloud Computing”, The Computer Journal, vol. 53, no. 7, pp. 1045-1051, 2010.
[8] Z. Lu, J. Lach, M. Stan, “Reducing Multimedia Decode Power Using Feedback Control”, in Proc. of Internat. Conference on Computer Design (ICCD), 2003, pp. 489-496.
[9] Q. Wu, P. Juang, M. Martonosi, W. Clark, “Formal Online Methods for Voltage/Frequency Control in Multiple Clock Domain Microprocessors”, in Proc. of Int. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2004, pp. 248-259.
[10] Q. Wu, P. Juang, M. Martonosi, L.-S. Peh, D. W. Clark, “Formal Control Techniques for Power-Performance Management”, IEEE Micro, vol. 25, no. 5, pp. 52-62, Sep./Oct. 2005.
[11] S. Carta, A. Alimonda, A. Acquaviva, A. Pisano, L. Benini, “A Control Theoretic Approach to Energy-Efficient Pipelined Computation in MPSoCs”, ACM Trans. Emb. Comput. Syst., vol. 6, no. 4, 2007.
[12] A. Acquaviva, A. Alimonda, L. Benini, S. Carta, A. Pisano, “A Feedback-Based Approach to DVFS in Data-Flow Applications”, IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 11, pp. 1691-1704, 2009.
[13] L. Caviglione, “Extending HTTP Models to Web 2.0 Applications: The Case of Social Networks”, 2011 4th IEEE Int. Conf. on Utility and Cloud Computing (UCC), pp. 361-365, 5-8 Dec. 2011.
[14] H.-K. Choi, J. O. Limb, “A Behavioral Model of Web Traffic”, in Proc. of the 7th Int. Conf. of Network Protocols (ICNP ’99), Toronto, Canada, Oct.-Nov. 1999, pp. 327-334.
[15] J. M. Pujol, V. Erramilli, G. Siganos, X. Yang, N. Laoutaris, P. Chhabra, P. Rodriguez, “The Little Engine(s) That Could: Scaling Online Social Networks”, IEEE/ACM Trans. on Networking, vol. 20, no. 4, pp. 1162-1175, Aug. 2012.
[16] Advanced Configuration & Power Interface (ACPI) Specification 4.0a, [online] http://www.acpi.info, last accessed March 2013.
[17] MySQL Tech Talk, Facebook, Palo Alto, Nov. 2010, [online] http://www.livestream.com/facebookevents/video?clipId=flv_cc08bf93-7013-41e3-81c9-bfc906ef8442, last accessed March 2013.
[18] Facebook, BigPipe: Pipelining Web Pages for High Performances, [online] http://www.facebook.com/note.php?note_id=389414033919, last accessed March 2013.
[19] Facebook, Tornado Web Server, [online] http://developers.facebook.com/blog/post/301/, last accessed March 2013.
[20] E. Frachtenberg, “Holistic Datacenter Design in the Open Compute Project”, IEEE Computer, vol. 45, no. 7, pp. 83-85, July 2012.
[21] L. Niccolini, G. Iannaccone, S. Ratnasamy, J. Chandrashekar, L. Rizzo, “Building a Power-Proportional Software Router”, in Proc. of the 2012 USENIX conference on Annual Technical Conference (USENIX ATC’12), Berkeley, CA, USA, pp. 1-8.
[22] Y. Jia, Z. Shao, “A Benchmark for Hive, PIG and Hadoop”, Apache.org, [online], last accessed March 2013.
[23] A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, M. Stonebraker, “A Comparison of Approaches to Large-Scale Data Analysis”, in Proc. of the 2009 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD ’09), pp. 165-178.
[24] A. Pisano, E. Usai, “Sliding Mode Control: a Survey with Applications”, Mathematics and Computers in Simulation, vol. 81, no. 5, pp. 954-979, 2011.
