Adaptive Performance Control of Computing Systems via Distributed Cooperative Control: Application to Power Management in Computing Clusters
by Mianyu Wang∗, Nagarajan Kandasamy†, and Moshe Kam∗
December 2005
ECE Technical Report ACL-2005-02
DREXEL UNIVERSITY, Department of Electrical and Computer Engineering, 3141 Chestnut Street, Philadelphia, PA 19104, U.S.A.
∗ E-mail: {jeremy, kam}@minerva.ece.drexel.edu, Tel: +1.215.895.1740
† E-mail: [email protected], Tel: +1.215.895.1996
Abstract

Advanced control and optimization techniques offer a theoretically sound basis for autonomic performance management in emerging distributed computing models such as utility computing. To tractably solve performance management problems of interest, including resource allocation and provisioning in such distributed computing environments, we develop a decentralized and cooperative control framework in which the system-wide optimization problem is first decomposed into simpler sub-problems, each of which is solved separately by an individual controller so that the overall performance objectives are achieved. Concepts from optimal control theory are used to implement the individual controllers. The proposed framework is highly scalable, naturally self-healing (it tolerates controller failures), and allows controllers to be added or removed dynamically during system operation. Moreover, the interaction between controllers requires very little information exchange, keeping the corresponding communication overhead low. As a specific case study, we apply the control framework to an important resource management problem: minimizing the power consumed by a computing cluster subject to a dynamic workload while satisfying the quality-of-service (QoS) requirements specified by a service-level agreement. Our experiments with real-world workload traces show that the proposed technique has very low control overhead and adapts quickly to both workload variations and controller failures.

Keywords: Autonomic computing, distributed control, cooperative control, optimal control, resource management
1 Introduction
Since communication and network technologies began revolutionizing everyday life at the end of the last century, more and more applications, such as information retrieval, multimedia services, and scientific or business computing, have migrated from personal computers to distributed high-performance servers. Through pervasive Internet access, these resources can be reached easily from personal computers and even embedded networked devices. The distributed computing systems hosting these rapidly growing applications, especially those supporting critical e-commerce, military, banking, and transportation applications, must satisfy varied and sometimes stringent quality-of-service (QoS) requirements specified by service-level agreements (SLAs) while operating in highly dynamic environments: the workload may be time-varying, and hardware and software components may fail during operation. To achieve the desired QoS, numerous performance-related parameters must be continuously tuned in such systems. As these systems become more complex, it is highly desirable for them to be largely autonomic or self-managing, requiring only high-level guidance from administrators [13, 30], much as the human nervous system adapts automatically to external influences.

The energy and power consumption of computing systems has recently drawn the attention of the research community. Energy efficiency has become an important design goal for densely packed server clusters, not only because of the cost of the electricity to power them, but also because excessive heat dissipation impairs system reliability and places a heavy burden on the cooling subsystems. The concern is serious: recent studies [9, 28, 23] report that a single data center can consume as much as several megawatts of power, and that US data centers nationwide consumed 22 TWh of electricity in 2003. Moreover, because workloads typically vary over time, dynamic resource provisioning and power management become significant: static provisioning may not provide enough flexibility and granularity to find good operating points for the system. An effective and straightforward mechanism for managing the power consumption of these systems is to temporarily turn off some of the servers. Servers that are turned off, however, cannot serve any requests, and bringing them back up takes time. At the same time, most processors today are designed to support dynamic voltage
scaling (DVS) to manage their power consumption. Since dynamic energy dissipation scales quadratically with the supply voltage, DVS provides a substantial and flexible mechanism for saving energy. However, lowering the supply voltage and frequency degrades the performance available to meet QoS targets. An interesting and challenging problem therefore arises when one seeks operating conditions that satisfy the QoS requirements while keeping the system energy-efficient. This report addresses this dynamic self-optimization problem for autonomic systems, which aims to achieve QoS objectives by adaptively tuning operating parameters with minimal human intervention.

Control theory is a well-established mathematical and engineering methodology that has been used to design adaptive resource management schemes for various applications such as task scheduling [8, 18], bandwidth allocation and QoS adaptation in web servers [2], load balancing in e-mail and file servers [17, 25], network flow control [22], and power management [19, 29]. The above methods all use classical feedback control, producing control actions from measurements of the system output to regulate key operating parameters. Classical feedback control assumes a linear time-invariant system model, and the closed-loop controller is designed from the open-loop system transfer function under stability and sensitivity requirements. In more complex control problems, however, a pre-specified plan, i.e., the feedback map, is inflexible and does not adapt well to constantly changing operating conditions. Moreover, the cost of the control actions themselves is not accounted for. To overcome these limitations of classical output feedback control, more advanced concepts borrowed from model predictive control [20] have recently been studied for managing computing systems with a limited set of control inputs [1, 14]. The actions governing (non-linear) system operation are obtained by optimizing the system's forecast behavior against the specified QoS criteria over a limited prediction horizon. This scheme is more general and more widely applicable than classical control, since it takes QoS objectives and system operating constraints into consideration. Recently, some studies [10] have begun to apply optimal control theory to dynamically control server frequency; more specifically, a linear quadratic regulator (LQR) was applied to an approximated second-order linear system in which the DVS-controlled frequency and the response time are taken as input and output, respectively. The model parameters are computed by system identification from empirical data.
The accuracy of this approximated model depends strongly on the sampled empirical data, and the method may raise implementation difficulties for model identification when applied on-line.

In this report, we develop a general technique for designing self-managing (self-optimizing) systems using concepts from optimal control theory [7, 16]. The technique allows multiple QoS objectives and system operating constraints to be expressed explicitly in a cost function and solved over a finite look-ahead control horizon. The optimal control inputs governing the operating parameters of the system are obtained by solving a discrete two-point boundary value problem, and the corresponding control algorithm is derived via Pontryagin's maximum principle [12]. It accommodates time-varying state-space models as well as control actions incurring dead times, i.e., delays between applying a control input and the corresponding system response. The systems under consideration include stand-alone or distributed web servers, database servers, high-performance application servers, and even mobile/embedded systems. Unlike [1, 14], which target systems with a limited and discrete set of control inputs, the proposed approach is aimed at systems with a much richer set of control settings, where the number of available inputs is large or the inputs are even continuously adjustable. In such cases, traversing a discrete search space is computationally expensive; in fact, the complexity increases exponentially with the number of control inputs, so heuristic techniques are typically used to obtain suboptimal solutions. In our approach, the control problem of interest is first formulated and solved assuming a continuous input (output) domain (or a continuous approximation of a discrete one). The obtained solution may then be mapped to an appropriate value within the discrete set.

As a case study, we apply optimal control theory to manage the power consumed by a single processor subject to a time-varying workload while satisfying the QoS requirement on average response time; the design method is then extended to the single-queue multiple-processor case. Assuming processors with dynamic voltage scaling [26] capability, optimal operating frequencies are obtained by explicitly evaluating the trade-off between power consumption and QoS requirements. We describe the processor state-space model, formulate the power management problem within the optimization framework, and develop the discrete-time online controller
using Pontryagin's maximum principle. Controller performance is evaluated using a representative e-commerce workload. We also formulate a decentralized optimal control problem for the common case in which a cluster of processors serves a shared queue of requests. Through effective estimation, the single-processor optimal control scheme can be applied to distributed systems without additional derivation or algorithm design. The scalability, modularity, robustness, reliability, and autonomy of the decentralized control scheme are demonstrated through simulation experiments.

The rest of the report is organized as follows. Section 2 describes the system model, the SLA-based dynamic optimization objective function, and key optimal control concepts. The control problem is formulated in Section 3 and solved via the discrete version of Pontryagin's maximum principle in Section 4. Section 5 analyzes the stability of the controlled system, and Section 6 extends the approach to decentralized control of the single-queue multiple-processor case. Simulation results using representative workloads are provided in Section 7. Finally, Section 8 concludes with a discussion of future work.

Figure 1: A dynamic model of the single-queue single-processor system
2 Preliminaries
This section describes the system model, specifies the self-optimization objectives, and introduces optimal control concepts.
2.1 System Model
The operating dynamics of processor P are captured by the simple queuing model shown in Fig. 1. External client requests, such as for files, web pages, and objects, are serviced by P in first-come first-served fashion. An important advantage of this model over queueing-theory-based models is that we do not assume an a priori arrival rate distribution but instead use real-world workload traces. Λ(k) and N(k) denote the average arrival and processing rates of requests {d_j} during time step k. The queue size q(k) during time step k is a one-dimensional state variable obeying the following nonlinear time-varying discrete-time state equation:

$$q(k+1) = \max\left\{ q(k) + \left( \hat{\Lambda}(k) - \frac{u(k)}{\hat{c}(k)\, u_{\max}} \right) T_s,\; 0 \right\} \qquad (1)$$

$$\omega(k) = (1 + q(k)) \cdot \hat{c}(k) \qquad (2)$$
We assume that P can be operated at frequencies adjustable over a continuous domain [u_min, u_max]. If ĉ(k) denotes the uncertain but predictable time required to process d_j at time k while operating at the maximum frequency u_max, then the time needed to process d_j while operating P at frequency u(k) ∈ [u_min, u_max] is ĉ(k)/α(k), where α(k) = u(k)/u_max is the frequency scaling factor. The queue size at the end of the next sampling period T_s is determined by the current queue size q(k), the estimated request arrival rate Λ̂(k), treated as an environmental disturbance, and the corresponding processing rate α(k)/ĉ(k). The operating frequency u(k) is the constrained control input of the system. The system output, described by (2), is the response time ω(k), which includes the waiting time in the queue and the processing time at P.
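To make the dynamics concrete, the following minimal Python sketch simulates one step of (1) and (2). All names and numeric values here (sampling period, frequency range, service time) are illustrative assumptions, not parameters taken from the experiments later in the report.

```python
# Minimal sketch of the queue dynamics (1) and response-time output (2).
# All parameter values below are illustrative assumptions, not from the report.

T_S = 1.0        # sampling period Ts (seconds), assumed
U_MAX = 1.8e9    # maximum operating frequency umax (Hz), assumed

def queue_update(q_k, lam_hat_k, c_hat_k, u_k):
    """State equation (1): queue length at the next sampling instant."""
    processing_rate = u_k / (c_hat_k * U_MAX)   # requests/second at frequency u(k)
    return max(q_k + (lam_hat_k - processing_rate) * T_S, 0.0)

def response_time(q_k, c_hat_k):
    """Output equation (2): queue waiting time plus service time."""
    return (1.0 + q_k) * c_hat_k

# Example: 100 queued requests, 150 req/s arriving, 5 ms service time at umax,
# processor currently running at 1.2 GHz.
q_next = queue_update(q_k=100.0, lam_hat_k=150.0, c_hat_k=0.005, u_k=1.2e9)
print(q_next, response_time(q_next, 0.005))
```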
2.2 Optimal Control Concepts
Optimal control employs a predictive or proactive approach, generating a sequence of control inputs over a look-ahead horizon while estimating changes in operating conditions [16]. A convex cost function of both the state and control (decision) vectors is minimized subject to the constraints imposed by the underlying system dynamics. The discrete-time optimal control problem is to find the sequence u_i, ..., u_{n-1} minimizing

$$J_i = \Phi(n, x_n) + \sum_{k=i}^{n-1} L_k(x_k, u_k)$$

subject to the system model constraint x_{k+1} = f^k(x_k, u_k), where [i, n] is the control horizon of interest, Φ(n, x_n) is a cost function of the final step n and the final (goal) state, and L_k(x_k, u_k) is a time-varying cost function at each intermediate time step k within [i, n]. The standard way to solve this kind of problem is to introduce a Lagrange multiplier λ_k for the state equation constraint at each time step and then solve an unconstrained optimization problem. After defining a Hamiltonian function H, an adjoint cost function that incorporates the state equation constraints, we can establish the adjoint system comprising the original state equation and the costate equation governing the Lagrange multiplier λ_k. The state x_k develops forward in time while the costate recurs backward, thereby defining a two-point boundary-value problem (TPBVP) [16]. The TPBVP yields the optimal solution, and the resulting control is usually a state feedback for the system.

Figure 2: The optimal control block diagram

Figure 2 shows the structure of a typical optimal controller for our discrete-time queueing system. The relevant inputs from the operating environment, such as workload arrival patterns, are predicted; the controller then computes the optimal inputs over a finite control horizon according to the estimated future environmental disturbances and the observed current system state. Within each control horizon, the control task is solved as a state regulation problem with updated initial and final states. The run-time adjustable parameters r and s are weighting factors of the performance index, defined in the next section.
Figure 3: A family of utility functions
2.3 Performance Specifications
At each time step k, the optimal controller aims to satisfy the QoS goal for incoming requests while minimizing processor power consumption. We represent the QoS objective via a service-level agreement (SLA), usually included as part of the business contract between a service provider and its clients. The SLA specifies the desired performance objectives for both sides and the financial rewards (or consequences) for meeting (or missing) those objectives [24]. An SLA may also depend on the level of load presented by the client. From the server's viewpoint, it specifies rewards, in dollars, for achieving various (average) response times, as well as the refund to be paid if a certain threshold is exceeded. To be consistent with the optimal control objective, i.e., minimizing a weighted cost function comprising both power consumption and the QoS requirement (usually on the system response time ω), we use a family of utility functions S(ω) to represent or approximate SLAs, as shown in Fig. 3. A staircase function can directly translate an SLA into mathematical form. The overall performance
is then penalized by a positive value (a refund) when the critical threshold is missed, and credited a negative value (equivalently, a reward) when a target response time is met. Since such non-differentiable functions complicate the solution of the optimal control problem via Lagrange multiplier methods, a family of polynomials with the same target response time ω0 (the minimum point) can be used to specify the same SLA. We use a continuous quadratic form of the SLA in the weighted cost function for the optimal controller.
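As an illustration of this translation, a staircase SLA and its differentiable quadratic surrogate might be written as follows; the reward levels R1 and R2, the penalty P, and the threshold ω_max are hypothetical placeholders, not values from any particular SLA:

$$S_{\mathrm{stair}}(\omega) = \begin{cases} -R_1 & \text{if } \omega \le \omega_0 \text{ (reward: target met)} \\ -R_2 & \text{if } \omega_0 < \omega \le \omega_{\max} \text{ (smaller reward)} \\ \;P & \text{if } \omega > \omega_{\max} \text{ (refund: threshold exceeded)} \end{cases} \qquad S(\omega) = \tfrac{1}{2}\, s\,(\omega - \omega_0)^2 .$$

The quadratic form on the right is the one carried into the cost function (3) below; its weight s plays the role of the SLA's reward/penalty scale.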
2.4 Workload Forecasting
Returning to Fig. 2: to estimate processor behavior over the prediction horizon, the environmental disturbance, i.e., the request arrival rate, must be estimated. Various prediction models have previously been proposed for performance estimation in computer systems. In [32], an autoregressive model is developed to predict trends in network traffic, while [33] combines a Kalman filter with an autoregressive model to detect changes in web server workloads. The authors of [34] present short- and long-term prediction algorithms to estimate various performance variables in a computer system, including abnormal events such as QoS violations and system failures. We developed a forecasting model that predicts request arrival rates based on key characteristics of representative workloads. A number of published e-commerce workloads [3, 21, 4] exhibit cyclical trends. We therefore predict such workload patterns using an ARIMA model [5], which is appropriate when an increase (or decrease) in a series of values persists for an extended time. We use a state-space form of this model, implemented with a Kalman filter, to provide load estimates to the controller.
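The following self-contained Python sketch illustrates a state-space trend model of this kind, filtered with a standard Kalman recursion. The transition matrix matches the linear trend model [1 1; 0 1] used later in our simulations; the noise covariances and the sample measurements are illustrative assumptions.

```python
# Sketch of the arrival-rate predictor: a local linear trend model in
# state-space form, filtered with a standard Kalman filter. The covariances
# Q and R below are illustrative tuning values, not taken from the report.
import numpy as np

F = np.array([[1.0, 1.0],     # state transition: level + trend
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])    # we observe the level (arrival rate) only
Q = np.diag([1.0, 0.1])       # process noise covariance (assumed)
R = np.array([[25.0]])        # measurement noise covariance (assumed)

def kalman_step(x, P, z):
    """One predict/update cycle from the previous estimate (x, P) and measurement z."""
    x_pred = F @ x                       # predict
    P_pred = F @ P @ F.T + Q
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ (np.atleast_1d(z) - H @ x_pred)
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

def forecast(x, n_steps):
    """N-step-ahead arrival-rate forecast by iterating the trend model."""
    preds = []
    for _ in range(n_steps):
        x = F @ x
        preds.append((H @ x).item())
    return preds

# Example: filter a short series of measured rates, then forecast 5 steps ahead.
x, P = np.zeros(2), np.eye(2) * 100.0
for z in [120.0, 131.0, 140.0, 152.0, 161.0]:
    x, P = kalman_step(x, P, z)
print(forecast(x, 5))
```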
3 Problem Formulation
This section formulates processor power management as an optimal control problem with state and input constraints; the online control solution is derived via a discrete version of Pontryagin's maximum principle and a numerical algorithm in the next section.
Assuming a continuous domain of processor operating frequencies, the optimal controller is designed to find the control input u(k) that maximizes the operating profit of the processor. Given the discrete-time dynamic system described by equations (1) and (2) with initial condition q(0) = q0, the performance index of interest is

$$J = \Phi(q(N)) + \sum_{k=1}^{N-1} \left[ S(q(k)) + R(u(k)) \right] = \frac{1}{2} v \left( q(N) - r_N \right)^2 + \sum_{k=1}^{N-1} \left[ \frac{1}{2} s \left( \omega(k) - \omega_0 \right)^2 + \frac{1}{2} r u^2(k) \right] \qquad (3)$$

where the response time ω(k) is a function of the state variable q(k) according to (2), and ω0 is the response time set point. Φ(q(N)) is a quadratic terminal cost that penalizes the number of requests left in the queue at the end of the control horizon; r_N is usually set to zero in order to deplete the queue. The time-varying intermediate cost comprises the weighted utility function S(q(k)) and a quadratic power consumption term R(u(k)). The performance index (3) is subject to the dynamic system constraint, now rewritten as the differentiable state equation

$$q(k+1) = q(k) + \left( \hat{\Lambda}(k) - \frac{u(k)}{\hat{c}(k)\, u_{\max}} \right) T_s \qquad (4)$$

with the state and control inequality constraints

$$q(k) \ge 0, \qquad u_{\min} \le u(k) \le u_{\max}. \qquad (5)$$
The control problem is to find the control u* over the finite horizon [0, N] that drives the system along a trajectory q* satisfying the above constraints such that the performance index J is minimized.
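For concreteness, the following Python sketch evaluates the performance index (3) for given candidate trajectories; the weights a caller would pass are placeholders.

```python
# Sketch: evaluating the performance index (3) for candidate state/control
# trajectories. Weight values passed by a caller are illustrative placeholders.
def performance_index(q, u, c_hat, v, r, s, omega0, rN=0.0):
    """J = terminal penalty + sum of SLA and power terms over k = 1..N-1.

    q: queue sizes q(0)..q(N); u: controls u(0)..u(N-1); c_hat: service times.
    """
    N = len(q) - 1
    J = 0.5 * v * (q[N] - rN) ** 2              # Phi(q(N)): drain the queue
    for k in range(1, N):
        omega_k = (1.0 + q[k]) * c_hat[k]       # response time, from (2)
        J += 0.5 * s * (omega_k - omega0) ** 2  # SLA utility term S(q(k))
        J += 0.5 * r * u[k] ** 2                # power consumption term R(u(k))
    return J
```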
4 Online Optimal Controller Design

4.1 Optimization with State and Control Constraints: Pontryagin's Maximum Principle
The formulated problem has inequality constraints on both the state variable and the control input [12, 6]. Rewriting the inequalities (5) in a common form,

$$C_1: \; -q(k) \le 0 \qquad (6)$$

$$C_2: \; -u(k) + u_{\min} \le 0 \qquad (7)$$

$$C_3: \; u(k) - u_{\max} \le 0 \qquad (8)$$

where (6) is called a pure state constraint. We call (q(·), u(·)) a feasible pair if q(·) is the state trajectory corresponding to u(·) and satisfies constraints (6)-(8); a feasible pair minimizing (3) is called an optimal pair. For constraint C1 there are two possibilities along the optimal pair: −q(k) < 0 or q(k) = 0, and similarly for constraints C2 and C3. In the former case the constraint is inactive and can be ignored. In the latter case, when the feasible pair lies on the constraint boundary, Pontryagin's maximum principle states that a necessary condition for a sequence of feasible pairs to be optimal is the existence of sequences of costates λ and Lagrange multipliers µ such that

$$H(k, q^*, u^*, \lambda^*, \mu^*) \le H(k, q^*, u, \lambda^*, \mu^*), \quad \forall\, u_{\min} \le u \le u_{\max} \qquad (9)$$

where the Hamiltonian function H is given by

$$H = S(q(k)) + R(u(k)) + \lambda(k+1)\left[ q(k) + \left( \hat{\Lambda}(k) - \frac{u(k)}{\hat{c}(k)\, u_{\max}} \right) T_s \right] + \mu_1(k)(-q(k)) + \mu_2(k)(-u(k) + u_{\min}) + \mu_3(k)(u(k) - u_{\max}) \qquad (10)$$
The Lagrange multipliers have a physical meaning: they capture the sensitivity of the cost function to variations in the queue length and boundary constraints.
The Lagrange multipliers µ(k) = [µ1(k), µ2(k), µ3(k)]^T must satisfy the following conditions:

$$\mu_1(k) \ge 0, \quad \mu_1(k)(-q(k)) = 0 \qquad (11)$$

$$\mu_2(k) \ge 0, \quad \mu_2(k)(-u(k) + u_{\min}) = 0 \qquad (12)$$

$$\mu_3(k) \ge 0, \quad \mu_3(k)(u(k) - u_{\max}) = 0 \qquad (13)$$
For the constraints (6)-(8), a subinterval (k1, k2) ⊂ [1, N] with k1 < k2 is called an interior interval of a trajectory q(k) if Ci < 0 for all k ∈ (k1, k2), i = 1, 2, 3. A subinterval [k1, k2] with k1 < k2 is called a boundary interval if Ci = 0 for all k ∈ [k1, k2]. An instant k1 is called an entry time if an interior interval ends at k = k1 and a boundary interval starts at k1; correspondingly, k2 is called an exit time if a boundary interval ends at k = k2 and an interior interval starts at k2. If the system trajectory merely touches the boundary at time k, then k is called a contact time. Together, the entry, exit, and contact times are termed junction times. The control algorithm must consider the following complete set of cases.

Case 1: In an interior interval, constraints (6)-(8) are strictly negative, so according to (11)-(13) we have µ1 = µ2 = µ3 = 0, and the terms corresponding to the boundary constraints vanish from the Hamiltonian. The constraints (6)-(8) are thus inactive and can safely be ignored. The optimal control is obtained by the Lagrange multiplier approach [7], yielding the state equation (14), the costate equation (15), and the stationarity condition (16):

$$q(k+1) = \frac{\partial H^k}{\partial \lambda(k+1)} = q(k) + T_s\, \hat{\Lambda}(k) - \frac{T_s\, u(k)}{\hat{c}\, u_{\max}} \qquad (14)$$

$$\lambda(k) = \frac{\partial H^k}{\partial q(k)} = \lambda(k+1)\, \frac{\partial f^k}{\partial q(k)} + \frac{\partial L^k}{\partial q(k)} = \lambda(k+1) + s \cdot q(k) \qquad (15)$$

$$0 = \frac{\partial H^k}{\partial u(k)} = -\frac{T_s}{\hat{c}\, u_{\max}}\, \lambda(k+1) + r\, u(k), \quad \text{i.e.,} \quad u(k) = \frac{T_s}{r\, \hat{c}\, u_{\max}}\, \lambda(k+1) \qquad (16)$$

with boundary conditions q(0) given and

$$\lambda(N) = \frac{\partial \Phi(q(N))}{\partial q(N)} = v\, (q(N) - r_N) \qquad (17)$$
This discrete TPBVP can be solved numerically by shooting methods, which make an educated initial guess for the costate and then evaluate the solutions of the difference equations.

Case 2: In boundary intervals where constraint C1 = 0, the queue is empty. Assuming the boundary constraints C2 and C3 are not active during these intervals, we have µ1 > 0 and µ2 = µ3 = 0 according to (11)-(13). The optimal control is then obtained from the following partial derivatives of the Hamiltonian:

$$q(k+1) = \frac{\partial H^k}{\partial \lambda(k+1)} = q(k) + T_s\, \hat{\Lambda}(k) - \frac{T_s\, u(k)}{\hat{c}(k)\, u_{\max}} = 0 \qquad (18)$$

$$\lambda(k) = \frac{\partial H^k}{\partial q(k)} = \lambda(k+1) + s \cdot q(k) - \mu_1(k) \qquad (19)$$

$$0 = \frac{\partial H^k}{\partial u(k)} = -\frac{T_s}{\hat{c}\, u_{\max}}\, \lambda(k+1) + r \cdot u(k) \qquad (20)$$

$$0 = \frac{\partial H^k}{\partial \mu_1(k)} = -q(k) \qquad (21)$$
Note that condition (21) is simply the boundary constraint. Because the state variable is constant on the boundary, the discrete TPBVP simplifies to a direct evaluation of the recursive costate equation given the local initial condition at the entry time.

Case 3: In boundary intervals where C2 = 0, the processor operates at its lowest frequency. Assuming the boundary constraint C1 is not encountered during these intervals, i.e., the queue is not empty, we have µ2 > 0 and µ1 = µ3 = 0. The optimal controller is given by the same state and costate equations (14) and (15), but the stationarity conditions become (22) and (23)
as follows:

$$0 = \frac{\partial H^k}{\partial u(k)} = r\, u(k) - \frac{T_s}{\hat{c}\, u_{\max}}\, \lambda(k+1) - \mu_2(k) \qquad (22)$$

$$0 = \frac{\partial H^k}{\partial \mu_2(k)} = -u(k) + u_{\min} \qquad (23)$$
Case 4: This case is derived in similar fashion to case 3. We only need to replace −µ2 and umin with µ3 and umax , respectively.
4.2 Numerical Algorithm
The key to solving the optimal control problem is to solve a TPBVP given the initial and final conditions. In discrete-time Hamiltonian systems, such problems can usually be solved by a shooting method, which makes an initial guess, evolves the system, and adjusts the guess according to the error in the final condition. The state and control inequality constraints, however, make it considerably harder to determine the entry and exit times, and hence the trajectories of the Lagrange multipliers µi needed to solve the optimal control exactly, even though Bryson and Ho [7] give corner conditions for inequality constraints. We propose a quick and straightforward numerical algorithm that takes advantage of off-the-shelf scalar nonlinear zero-finding routines (available in MATLAB and even MS Excel) and covers all the cases identified in the last section. Figure 4 shows the numerical algorithm based on the derivations of Section 4.1. Although designed for our specific case study, the idea and approach are applicable to other resource management problems, provided the system state-space models and cost functions are correctly established. During each sliding look-ahead control horizon N, the controller aims to regulate the processor queue size down to zero, taking the current queue size as the initial state. Given the arrival rate predictions Λ̂(k) and processing time estimates ĉ(k) supplied by the estimators over the control horizon N, the OptCoPM algorithm starts from an initial guess for the costate variable λ at time 0, then updates the costate equation (15) and computes the optimal control from the costate at each time step. Once the control is known, the state equation (14) is updated for the next step. The Lagrange multipliers are either zero (strictly inside the constraints (6)-(8)) or nonzero values (on the constraint boundaries), determined by compensating the equations after setting the state or control variables to their boundary values. The algorithm terminates once the TPBVP boundary condition (17) is satisfied to the specified precision.
Procedure OptCoPM(q0, qN, N, Λ̂, ĉ, v, r, s)
/* Given initial and final states q0, qN, control horizon N, traffic and processing time estimates Λ̂, ĉ, and cost weights v, r, s */
repeat
    guess costate λ(0);
    x(0) := q0; µi(0) := 0, i ∈ {1, 2, 3}; /* initial state and Lagrange multipliers */
    k := 0;
    repeat
        Compute the costate at time k + 1: λ(k + 1) := λ(k) − s · x(k) + µ1(k);
        Compute the control from the costate: u(k) := λ(k + 1) · Ts / (ĉ(k) · umax · r);
        if u(k) > umax then
            u(k) := umax; /* control hits upper limit */
            µ3(k) := λ(k + 1) · Ts / (ĉ(k) · umax) − r · umax; /* compensate the stationarity condition by µ3 */
        else if u(k) < umin then
            u(k) := umin; /* control hits lower limit */
            µ2(k) := r · umin − λ(k + 1) · Ts / (ĉ(k) · umax); /* compensate the stationarity condition by µ2 */
        else
            µi(k) := 0, i ∈ {2, 3};
        end if
        Compute the state at time k + 1: x(k + 1) := x(k) + (Λ̂(k) − u(k) / (umax · ĉ(k))) · Ts;
        xtemp := x(k + 1); /* temporary variable */
        if x(k + 1) < 0 then
            x(k + 1) := 0; /* state constrained to be nonnegative */
            µ1(k + 1) := −s · xtemp; /* compensate the costate equation by µ1 */
        end if
        k := k + 1;
    until k ≥ N
until |x(N) − (λ(N)/v + qN)| < ε /* terminal condition (17) */
Compute the N-step optimal control uopt from the solved initial costate λ(0);
Return uopt;

Figure 4: The optimal control algorithm
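A minimal Python sketch of the same forward-sweep-plus-root-finding idea follows; SciPy's brentq stands in for the off-the-shelf scalar zero-finder, and the bracket for λ(0) as well as all numeric parameters are illustrative assumptions rather than values from our experiments.

```python
# Minimal sketch of the shooting idea behind OptCoPM: integrate the
# state/costate recursions (14)-(16) forward from a guessed initial costate,
# clip at the control/state boundaries (cases 2-4), and drive the terminal
# residual of (17) to zero with a scalar root-finder. Parameters and the
# root-finding bracket are illustrative assumptions.
import numpy as np
from scipy.optimize import brentq

def forward_sweep(lam0, q0, lam_hat, c_hat, Ts, umin, umax, v, r, s, qN=0.0):
    """Sweep forward from lam0; return trajectories and the terminal residual
    x(N) - (lam(N)/v + qN), which is zero when condition (17) holds."""
    N = len(lam_hat)
    x, lam, u = np.zeros(N + 1), np.zeros(N + 1), np.zeros(N)
    x[0], lam[0] = q0, lam0
    mu1 = 0.0
    for k in range(N):
        lam[k + 1] = lam[k] - s * x[k] + mu1            # costate recursion (15)
        u[k] = lam[k + 1] * Ts / (c_hat[k] * umax * r)  # stationarity (16)
        u[k] = min(max(u[k], umin), umax)               # clip: cases 3 and 4
        x[k + 1] = x[k] + (lam_hat[k] - u[k] / (c_hat[k] * umax)) * Ts
        if x[k + 1] < 0.0:                              # case 2: empty queue
            mu1 = -s * x[k + 1]                         # compensate costate eq.
            x[k + 1] = 0.0
        else:
            mu1 = 0.0
    return x, lam, u, x[N] - (lam[N] / v + qN)

def solve_horizon(q0, lam_hat, c_hat, **p):
    """Find lam(0) satisfying the terminal condition; return the N-step control."""
    f = lambda lam0: forward_sweep(lam0, q0, lam_hat, c_hat, **p)[3]
    lam0 = brentq(f, -1e6, 1e6)          # assumed bracket for the zero
    return forward_sweep(lam0, q0, lam_hat, c_hat, **p)[2]

# Illustrative call: 5-step horizon, constant forecasts.
u_opt = solve_horizon(q0=200.0, lam_hat=[150.0] * 5, c_hat=[0.005] * 5,
                      Ts=1.0, umin=0.6e9, umax=1.8e9, v=5.0, r=1.0, s=50.0)
print(u_opt)
```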
Figure 5: An optimal control example with fixed weights v and s but different r: (a) traffic trace, (b) operating frequencies, and (c) queue sizes
4.3 Example
We now give an example demonstrating how the optimal control algorithm generates the optimal operating-frequency decisions for different combinations of cost weights within one finite horizon. A processor whose operating frequency can be manipulated between 600 MHz and 1.8 GHz by an optimal controller serves the traffic load shown in Fig. 5(a). The maximum processing capacity is about 206 requests per second, assuming every request has the same processing time; variable processing times are treated in the on-line simulations that follow. The sampling period is 5 seconds, and a finite control horizon of 10 steps was simulated. We fix the weights for the final state, v = 50, and the queue size,
s = 5, and compare the state trajectories and control inputs under different energy weights r. As Figs. 5(b) and (c) illustrate, a larger r produces smaller energy dissipation by selecting lower operating frequencies, which, however, slows processing, increases the average queue size, and therefore degrades the quality of service. In Section 6 we propose and test a decentralized control scheme in which continuously running on-line optimal controllers are combined with Kalman filters.
5 Stability
The continuous-state discrete-time queueing model (4) is a first-order nonlinear affine system whose dynamics are defined over the nonnegative real state-input space $(\mathbb{R}^+ \cup \{0\}) \times \mathbb{R}^+$. Strictly speaking, the state of the system is not continuous, because the queue length can only be an integer. We may nevertheless treat the system as a continuous-state system, since the measurements and estimates of the arrival rate and of the per-request service time are acquired as positive real values. To simplify the stability analysis, we rewrite the system in a more general form,

$$x(k+1) = \max\{ x(k) + T_s \cdot \Lambda(k) - T_s \cdot u(k),\; 0 \} \qquad (24)$$

Without loss of generality, we absorb the frequency scaling and the request processing time c(k) into a generalized input u(k), and use the common symbol x for the state variable. We discuss the stability of the autonomous system (the system without external input) and the input-to-state stability of the non-autonomous system, or equivalently its stabilization. Over a finite stability horizon N, the system is said to be marginally stable, or stable, if for each ε > 0 there is δ = δ(ε) > 0 such that x(0) < δ ⇒ x(k) < ε for all 0 ≤ k ≤ N. It is unstable if it is not stable. It is asymptotically stable if it is stable and δ can be chosen such that x(0) < δ ⇒ lim_{k→N} x(k) = 0. These definitions parallel those of Khalil [15], except that the state is constrained to be nonnegative. We consider a finite horizon because of the system's real-time response requirements: if the state decays too slowly, we declare the system unstable or marginally stable
in the stability horizon, although it might be asymptotically stable in the long run, or globally asymptotically stable. Due to the uncertainty of the arrival rate Λ(k), we think of our model as a perturbation of the nominal system x(k + 1) = x(k), which apparently is marginally stable. However, the perturbed system is unstable because the arrival rate is always nonnegative even though it is bounded. Then for the non-autonomous system, a N N X X necessary condition for stability is Λ(i) ≤ u(i). The processing capability i=0
i=0
has to be no less than the arrival capability. The system can be stabilized because the arrival rate is bounded, and then even though we cannot guarantee that the requests can be handled by one processor, we apply multiple processors to satisfy the stabilization necessary condition. Then the optimal control can always stabilize the system and more importantly, it generates the optimum decision. In the next section, we will introduce the decentralized cooperative decision-making to show these great features of optimal control.
6 Decentralized Control
In this section we extend our optimal control algorithm to the design of decentralized controllers in a distributed computing environment, solving a class of multi-processor (server cluster) self-optimization problems. Consider a multi-processor system or server cluster consisting of m processors, as shown in Fig. 6. Requests in a common queue, which holds the confluent traffic from different clients, are assigned to the m processors through a central dispatcher. Zhang [35] formulates this problem as active processor selection, task allocation, and voltage scaling, and solves the resulting stochastic optimization problem using a discrete-event system model and queueing theory. Under our continuous-state discrete-time dynamic system model, we adopt a hybrid task allocation strategy: during each time step, a fraction αi (0 ≤ αi ≤ 1, Σ_{i=1}^{m} αi = 1) of the incoming requests is statically allocated to processor i. These fractions are dynamically updated according to the operating frequencies of all the processors during the current time step, as sketched below. Once a processor fails, a zero fraction is computed for it and no more tasks are assigned to it.
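A minimal sketch of this dispatcher rule follows. The proportional-to-frequency update is our reading of the strategy described above, stated here as an assumption; a failed processor reports zero frequency and therefore receives a zero fraction.

```python
# Sketch of the dispatcher's allocation rule: fractions are recomputed each
# time step in proportion to the processors' current operating frequencies
# (an assumption about the exact rule), so a failed processor (frequency 0)
# automatically receives a zero fraction.
def allocation_fractions(frequencies):
    """Return alpha_i >= 0 with sum(alpha_i) = 1, proportional to u_i."""
    total = sum(frequencies)
    if total == 0.0:
        raise RuntimeError("no live processors in the cluster")
    return [u / total for u in frequencies]

# Example: four servers; server 2 has failed.
print(allocation_fractions([1.2e9, 0.0, 1.5e9, 0.8e9]))
```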
Figure 6: A multi-processor distributed control architecture

Assuming that the uncertain traffic and processing times are predictable by effective filters, we can extend our optimal controller to this distributed scenario for each processor with the DVS feature. The set of self-optimizing processors can be regarded as non-communicating multi-agents, which are not required to know the exact behavior of the other agents. The processors may have diverse parameter configurations and processing capacities, and each has its own objective function to optimize. The structure of the distributed controller is also shown in Fig. 6. Inside the dynamic optimization module, the controller still runs the aforementioned online optimal control algorithm for the processor operating frequency, but the state-space model of an individual processor now takes the other processors' cooperative effect into consideration:

$$q(k+1) = \max\left\{ q(k) + \left( \hat{\Lambda}(k) - \frac{u(k) + \hat{\nu}(k)}{\hat{c}(k)\, u_{\max}} \right) T_s,\; 0 \right\} \qquad (25)$$

Besides the forecasts of the request arrival rate and processing times for the next N steps, the model needs an additional estimate ν̂(k), which predicts the total frequency of the other processors, normalized to the frequency range of the current processor if the cluster is heterogeneous. The observation underlying this estimate takes the form

$$\nu(k) = \left( \Lambda(k) - \frac{\Delta q(k)}{T_s} \right) \hat{c}(k)\, u_{\max} - u(k) \qquad (26)$$

where Δq(k) is the change in the queue, produced by the arrivals and the departures served by the processors; ν(k) is simply the residual of Δq(k) after deducting the arrivals and the processor's own service. One exceptional situation arises when the queue is empty or almost empty and the arriving traffic is so light that even one processor can handle it within one sampling period. Because we do not provide a mechanism to turn servers off, the ν estimator will underestimate in this case, but it still guarantees the QoS requirement through over-control. We can then still apply the single processor's optimal control algorithm, based on Pontryagin's maximum principle, by using the relative arrival rate, i.e., the arrival rate minus the processing rate accounted for by the ν estimate. To apply the proposed optimal control algorithm to a cluster of servers, a synchronizing mechanism is required to guarantee that all the distributed controllers maintain the same discrete time clock, because each controller uses the common queue length as the state variable of its dynamic model.

Procedure OLOptCo(N, Q0)
/* Given finite control horizon N and initial queue size Q0 */
Initialize the ν estimator using Q0;
/* Main loop */
repeat
    Acquire the traffic estimates {Λ̂1, ..., Λ̂N} for the next N steps;
    Acquire the request processing time estimates ĉ;
    Update the ν estimator based on the new queue size Q;
    Λ̃ := Λ̂ − ν̂ / (ĉ · umax); /* relative arrival rate */
    Call the optimal control algorithm: uopt := OptCoPM(Q, 0, N, Λ̃, ĉ, ...);
    Apply the first control uopt(1) to the processor DVS;
    k := k + 1;
    Store the control history;
until processor failure || other terminal conditions met

Figure 7: The on-line iterative control procedure

The iterative procedure in Fig. 7 shows the on-line DVS scheme of one decentralized
controller. A Kalman filter running as a traffic predictor provides the N-step-ahead forecast of the arrival rate to all the decentralized controllers at each time update. Since the execution times of individual requests can also be predicted by effective statistical methods [14], the processing times over the next N sampling periods are forecast by a tuned EWMA filter. Each controller maintains its own ν estimator to forecast its co-processors' contribution; we find that a Kalman filter provides the best precision. Each processor then computes its relative arrival rate and uses exactly the same optimal control solver to find the optimal frequency decisions for the next N steps. The full N-step control sequence need not be applied: after all the processors apply the first control step for the upcoming sampling period, new traffic-rate and queue-size information is available to update the filters and obtain more precise predictions. Solving the optimization problem by decentralized control in this multi-processor scenario thus makes the complexity grow only linearly with the number of processors; the only overhead beyond the control algorithm is maintaining a ν estimator for each processor. As a further practical consideration, if the processor does not support continuous DVS, the optimal control can be fed into a discretizer, shown in Fig. 6, to obtain a suboptimal decision for the processor frequency. We simulate this case as well in the next section.
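The following Python sketch shows one controller's raw ν observation (26) and the relative-arrival-rate correction from the OLOptCo loop; in the full scheme the raw observation would be smoothed by a Kalman filter, which we omit here. The numbers in the example call are illustrative.

```python
# Sketch of one controller's view of its co-processors, following (26) and the
# relative-arrival-rate step of the OLOptCo loop. A full implementation would
# smooth nu_obs with a Kalman filter, as the report does; here we show only
# the raw observation and the rate correction.
def nu_observation(lam_k, dq_k, u_k, c_hat_k, Ts, umax):
    """Equation (26): the co-processors' total (normalized) frequency,
    inferred from the queue change that this processor's own service
    cannot explain."""
    return (lam_k - dq_k / Ts) * c_hat_k * umax - u_k

def relative_arrival_rate(lam_hat_k, nu_hat_k, c_hat_k, umax):
    """Arrival rate seen by one controller after deducting the estimated
    service contributed by the other processors."""
    return lam_hat_k - nu_hat_k / (c_hat_k * umax)

# Example: queue dropped by 40 requests in one 1 s period while this processor
# ran at 1.2 GHz, with 5 ms service time at umax = 1.8 GHz.
nu = nu_observation(lam_k=150.0, dq_k=-40.0, u_k=1.2e9,
                    c_hat_k=0.005, Ts=1.0, umax=1.8e9)
print(nu, relative_arrival_rate(150.0, nu, 0.005, 1.8e9))
```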
7 Performance Evaluation

7.1 Workload Generation
The performance of the optimal controller and its application to distributed systems are evaluated using a representative e-commerce workload. Our experiments simulate multiple servers processing the synthetic workload used in [14]. The workload rate information was generated from real-world HTTP traces of requests made to an Internet service provider in the Washington DC area over a week; portions of this workload are shown in Fig. 8(a) and Fig. 11(a). The workload also includes the generated execution times of the individual requests and their distribution within
the arrival stream. The distribution was determined using two important characteristics of most web workloads: popularity and temporal locality. The processing time of each individual request was then sampled from this distribution. Our approach cannot be directly compared with similar work such as [14] or [19], however, because we study the performance of multiple decentralized controllers rather than a single processor.

Figure 8: (a) The synthetic workload and the predicted values obtained using the Kalman filter, (b) the response times observed at the end of each sampling period, and (c) the operation cost during each sampling period
7.2 On-line Control Simulation
In this experiment, we simulated four heterogeneous servers, built from two types of processors with continuously adjustable frequencies, serving the synthetic workload. (Homogeneous systems are even easier to handle with our approach.) We chose four servers so that the behaviors of the different servers could be compared, even
Figure 9: The optimal operating frequency settings computed by the four distributed controllers

though the approach can be extended to large-scale distributed systems. Between the two processor types, processors 1 and 2 both have minimum speed 600 MHz and maximum speed 1.8 GHz, while the speeds of processors 3 and 4 range from 800 MHz to 2.0 GHz. All the servers had a desired response time of ω0 = 4 seconds for the requests. We set their control/prediction horizons to 5 steps and synchronized their sampling periods to 1 second. Fig. 8(a) illustrates a small portion of the workload and the corresponding outputs of the traffic predictor (a Kalman filter). With the linear time-series trend model [1 1; 0 1], the Kalman filter estimates the workload arrival rate precisely. The distributed controllers acquire the predicted arrival rates 5 steps ahead from the Kalman filter and compute their own optimal frequency sequences over this sliding control horizon. Only the first frequency decision is applied; each controller then repeats the prediction and computation procedure
Figure 10: The ν estimates computed by the four distributed controllers

when new workload and queue observations become available. To show the effect of the cost weights on the behavior of the distributed controllers, we set the processor weights to the following two groups of values: energy weight r = 1 and QoS (response time) weight s = 50 for processors 1 and 2; r = 0.5 and s = 100 for processors 3 and 4. The first two processors thus lean toward minimizing their power consumption, while the latter two give the QoS requirement higher priority. To isolate the effects of r and s, the weight v, which describes how closely the final queue size is regulated to its desired value in each control horizon, is set to 5 for all the processors. The different decision behaviors under the different combinations of cost weights can be clearly observed in the frequency switching activity of each controller in Fig. 9, which matches our expectations for each individual controller. The overall collaborative performance of the distributed controllers is promising; in this
experiment, they cooperated to keep the response time below or close to the desired ω0 = 4 s, as shown in Fig. 8(b). The real-time costs of the different processors are shown in Fig. 8(c); these costs should be distinguished from the optimization objective, which is the summation of costs over the control horizon. The chosen operating frequencies are optimal, or strictly speaking suboptimal, owing to the errors of the traffic estimator and the aforementioned ν estimators; these effects have been minimized by using optimal filters (Kalman filters). The actual trajectories of ν and their estimates are compared, with satisfactory agreement, in Fig. 10. Large errors occur only when the traffic load is lower than the minimum processing capacity of the remaining processors; as explained in the previous section, this does not bias the decision. The overhead of optimal control is small: our simulation shows that for each 2-second-long time step, the computation time for updating the two filters and solving the optimal control for all 4 distributed controllers is about 20 ms, so we believe the overhead of an individual optimal controller is tiny.
7.3 Response to server failures
Besides optimal performance and relatively small computation cost, another notable feature of our decentralized optimal controllers is their self-adaptation to the failure of one or more processors, which greatly improves overall system robustness. In this set of experiments, we use another portion of the synthesized workload to test the system's reaction to processor failures. This portion of the workload, together with the corresponding prediction, shown in Fig. 11(a), was selected to be lighter than in the previous simulation to guarantee that the traffic can be handled by as few as two processors, since we manually bring two processors down to simulate failures. We applied the same decentralized controllers to the same four web servers, keeping all parameters, including cost weights, control horizon, and sampling period, unaltered. To simulate processor failure, we manually shut off processor 2 after 120 steps and processor 3 after 150 steps; in Fig. 12 their operating frequencies can be seen to drop suddenly to zero. Nevertheless, the response time results in Fig. 11(b) show that the performance of the overall system did not deteriorate despite the loss of these processors.
Figure 11: (a) The synthetic workload and the predicted values obtained using the Kalman filter, (b) the response times observed at the end of each sampling period, and (c) the operation cost during each sampling period

From the user's point of view, the quality of service was not affected by this outage. Fig. 12 shows the reaction of the processors to the failures: after processor 2 fails at k = 120 and processor 3 fails at k = 150, the remaining members of the cluster rapidly raise their frequencies to process the requests that can no longer be served by the failed ones. As noted in the design of the distributed controller, the failed processors need not communicate anything to the group; the reaction is accomplished through the aforementioned ν estimator, which estimates the other processors' contribution to the common task. Each processor only needs to know the total number of processors in the current cluster and to observe the queue. As shown in Fig. 13, the ν estimators of processors 1 and 4 detected that their neighbors' contribution decreased after k = 120 and k = 150. This drove processors 1 and 4 to work faster, with processor 4 even faster than
Figure 12: The optimal operating frequency settings computed by the four distributed controllers

processor 1 because of their cost weight settings. Since the optimal controllers run on-line, the cost weights can be adjusted adaptively through human intervention or by upper-level administrative software, which brings additional flexibility to real-time applications.
7.4 Controller parameters
An important parameter of this class of discrete controllers is the sampling period, which determines how often the controller generates a new control input for the processor and hence affects the average response times of the incoming requests and the processor power consumption. As the sampling period increases, the controller needs longer to adjust the processor to the time-varying requests, which produces larger errors with respect to the target response time.
Figure 13: The ν estimates computed by the four distributed controllers

Figure 14: (a) MSE and (b) cost over sampling period

Fig. 14(a) shows that the mean square error between the actual response time and the set point grows almost linearly with the sampling period.
Figure 15: (a) MSE and (b) cost over control/prediction horizon for a single processor

This simulation accounts for the randomness of the arrival rate and of the processing time of each request, and the results were generated by Monte Carlo methods. We also plot the simulated average operating cost versus the sampling period in Fig. 14(b); the cost saturates at around 10 seconds. Ignoring the overhead of very frequent dynamic voltage scaling, the sampling period should therefore be chosen as small as possible to minimize both cost and error.

Another important parameter is the control/prediction horizon, i.e., how far ahead we look when optimally provisioning computational resources. We first simulate a single-processor case, assuming the controller has a perfect traffic prediction, so as to study the controller parameters in isolation. In this simulation we widened the frequency range so that the control boundaries (7) and (8) are rarely hit, because when the boundary constraints are active the cost is not a continuous function and is not truly minimized, even though the controller makes the best decision available. Fig. 15 shows that the larger the specified horizon, the closer to optimal the result, as reflected in the lower mean square error (a) and operation cost (b). However, in Fig. 16, the simulated performance of one out of four processors shows almost the opposite tendency with respect to the control/prediction horizon, even under the same perfect-traffic-prediction assumption. This is because cooperation within the cluster is achieved by an estimator that predicts the teamwork: the larger the prediction horizon, the larger the prediction error of the controllers, which deteriorates the overall system performance and is
Figure 16: (a) MSE and (b) cost over control/prediction horizon for one processor under decentralized control

intimately affected by the estimator design and precision. When the traffic estimator feeds its forecasts to the controller, the simulation exhibits strong stochastic behavior, and a deterministic, monotonic performance curve can no longer be obtained. In practice, a horizon of 4 to 7 steps is recommended to balance the trade-off between optimality and estimation error.
7.5 Effects of Discrete Control Inputs
In many practical systems, frequency settings are not continuously tunable and must instead be selected from a discrete domain. As discussed in Section 6, our approach remains applicable to such systems by simply discretizing the obtained solutions. The final set of experiments therefore assumes that the server frequencies must be selected from a discrete domain: Figs. 17 and 18 show the cluster performance when servers 1, 2, 3, and 4 tune their frequencies in discrete steps of 100 Hz, using the discretizer sketched below. As the results show, overall system performance remains good, since the control errors introduced by earlier discretization steps are compensated by later control actions.
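A minimal sketch of such a discretizer, assuming a uniformly spaced frequency grid, is given below; the grid spacing is a parameter, shown here with the 100 Hz step used in this experiment.

```python
# Sketch of the discretizer: snap the continuous optimal frequency to the
# nearest admissible setting on a uniform grid (a uniform grid is an
# assumption; real DVS tables may be irregular).
def discretize(u_opt, umin, umax, step=100.0):
    """Map a continuous control to the closest discrete frequency setting."""
    n = round((u_opt - umin) / step)
    return min(max(umin + n * step, umin), umax)

# Example: a 1.23456789 GHz optimal decision on a 100 Hz grid from 0.6 GHz.
print(discretize(1.23456789e9, umin=0.6e9, umax=1.8e9))  # -> 1234567900.0
```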
Figure 17: (a) The synthetic workload and the corresponding predictions, (b) the average response time achieved by the cluster, and (c) the cost incurred by each controller
7.6 Summary
To summarize this section, our experiments support the idea of applying optimal control theory to design an optimal, or at least suboptimal (due to the finite prediction horizon), self-optimizing computing system. The design can also be deployed quickly in a distributed setting. The performance of the distributed controllers is appealing, and, more impressively, the distributed control design is robust to processor failures while remaining simple. Finally, as reported above, the controller overhead in each period is negligible.
Figure 18: The frequency settings computed by the controllers where servers allow their frequencies to be tuned in steps of 100 Hz
8 Conclusions and Future Work
We have proposed a general technique for designing autonomic computing systems using concepts from optimal control theory. In the proposed approach, the constrained control inputs governing the operating parameters of the system are obtained by solving a discrete two-point boundary value problem, and the corresponding control law is derived via Pontryagin's maximum principle. As a specific case study, we minimized the power consumed by a single processor subject to a time-varying workload while satisfying the QoS requirements. Controller performance was evaluated using a representative workload, with encouraging results. Future research will proceed in the following directions:

• Clients are typically grouped into multiple classes based on their service-level agreements to provide differentiated service. The control formulation
developed in this report must be extended to tackle this more complex problem, where at each time instant the controller must decide on two variables: the operating frequency and the fraction of processing capacity to be given to each client queue.

• We will develop a more general distributed control structure, using the optimal control concepts presented here, to adaptively manage the operation of more complicated computing clusters with multiple queues and multiple processors.
References

[1] S. Abdelwahed, N. Kandasamy, and S. Neema, "A Control-based Framework for Self-managing Distributed Computing Systems", Proc. ACM Workshop on Self-managing Systems, 2004.

[2] T. F. Abdelzaher, K. G. Shin, and N. Bhatti, "Performance Guarantees for Web Server End-systems: A Control Theoretic Approach", IEEE Trans. Parallel & Distributed Systems, 13(1):80-96, January 2002.

[3] M. Arlitt and T. Jin, "Workload Characterization of the 1998 World Cup Web Site", Technical Report HPL-99-35R1, Hewlett-Packard Labs., September 1999.

[4] M. F. Arlitt and C. L. Williamson, "Web Server Workload Characterization: The Search for Invariants", Proc. ACM SIGMETRICS Conf., pp. 126-137, 1996.

[5] G. P. Box, G. M. Jenkins, and G. C. Reinsel, Time Series Analysis: Forecasting and Control, 3rd Edition, Prentice-Hall, Upper Saddle River, NJ, 1994.

[6] M. Bounkhel, L. Tadj, and Y. Benhadid, "Optimal Control of a Production System with Inventory-level-dependent Demand", Applied Mathematics E-Notes, 5 (2005), 36-43.

[7] A. E. Bryson and Y.-C. Ho, Applied Optimal Control: Optimization, Estimation, and Control, Hemisphere Publishing Co., Revised Printing, 1981.

[8] A. Cervin, J. Eker, B. Bernhardsson, and K. Arzen, "Feedback-Feedforward Scheduling of Control Tasks", Journal of Real-Time Systems, 23(1-2), 2002.
[9] J. S. Chase, D. Anderson, P. Thakar, A. Vahdat, and R. Doyle, "Managing Energy and Server Resources in Hosting Centers", Proc. 18th Symp. on Operating Systems Principles, Oct. 2001.

[10] Y. Chen, A. Das, W. Qin, A. Sivasubramaniam, Q. Wang, and N. Gautam, "Managing Server Energy and Operational Costs in Hosting Centers", Proc. ACM SIGMETRICS'05, Jun. 2005.

[11] A. G. Ganek and T. A. Corbi, "The Dawn of the Autonomic Computing Era", IBM Systems Journal, 42(1):5-18, 2003.

[12] R. F. Hartl, S. P. Sethi, and R. G. Vickson, "A Survey of the Maximum Principles for Optimal Control Problems with State Constraints", SIAM Review, Vol. 37, No. 2, pp. 181-218, June 1995.

[13] IBM Autonomic Computing, http://www.ibm.com/autonomic/, 2005.

[14] N. Kandasamy, S. Abdelwahed, and J. P. Hayes, "Self-Optimization in Computer Systems via Online Control: Application to Power Management", IEEE Intl. Conf. Autonomic Computing, 2004.

[15] H. K. Khalil, Nonlinear Systems, 3rd Edition, Prentice Hall, 2002.

[16] F. L. Lewis and V. L. Syrmos, Optimal Control, Wiley-Interscience, 2nd Edition, 1995.

[17] C. Lu, G. A. Alvarez, and J. Wilkes, "Aqueduct: Online Data Migration with Performance Guarantees", Proc. USENIX Conf. File & Storage Tech., pp. 219-230, 2002.

[18] C. Lu, J. Stankovic, G. Tao, and S. Son, "Feedback Control Real-time Scheduling: Framework, Modeling and Algorithms", Journal of Real-Time Systems, 23(1-2):85-126, 2002.

[19] Z. Lu et al., "Control-theoretic Dynamic Frequency and Voltage Scaling for Multimedia Workloads", Intl. Conf. Compilers, Architectures, & Synthesis for Embedded Systems (CASES), pp. 156-163, 2002.

[20] J. M. Maciejowski, Predictive Control with Constraints, Prentice Hall, London, 2002.
[21] D. Menasce et al., "In Search of Invariants for e-Business Workloads", ACM Conf. Electronic Commerce, pp. 56-65, 2000.

[22] S. Mascolo, "Classical Control Theory for Congestion Avoidance in High-speed Internet", Conf. Decision & Control, pp. 2709-2714, 1999.

[23] J. D. Mitchell-Jackson, "Energy Needs in an Internet Economy: A Closer Look at Data Centers", Master's thesis, University of California at Berkeley, July 2001.

[24] G. Pacifici, M. Spreitzer, A. Tantawi, and A. Youssef, "Performance Management for Web Services", Research Report RC22676, IBM T.J. Watson Research Center, Yorktown Heights, NY, 2003.

[25] S. Parekh, N. Gandhi, J. Hellerstein, D. Tilbury, T. Jayram, and J. Bigus, "Using Control Theory to Achieve Service Level Objectives in Performance Management", Proc. IFIP/IEEE Intl. Symp. on Integrated Network Management, 2001.

[26] T. Pering, T. Burd, and R. W. Brodersen, "The Simulation and Evaluation of Dynamic Voltage Scaling Algorithms", Intl. Symp. Low Power Electronics & Design (ISLPED), pp. 76-81, 1998.

[27] P. Pillai and K. G. Shin, "Real-time Dynamic Voltage Scaling for Low-power Embedded Operating Systems", Proc. Conf. Mobile Computing & Networking (MOBICOM), pp. 251-259, 2001.

[28] E. Pinheiro, R. Bianchini, E. Carrera, and T. Heath, "Load Balancing and Unbalancing for Power and Performance in Cluster-Based Systems", Proc. Workshop on Compilers and Operating Systems for Low Power, Sept. 2001.

[29] T. Simunic and S. Boyd, "Managing Power Consumption in Networks on Chips", Proc. Design, Automation, & Test in Europe (DATE), pp. 110-116, 2002.

[30] M. M. Waldrop, "Autonomic Computing: The Technology of Self-Management", Foresight & Governance Project, http://www.thefutureofcomputing.org/Autonom2.pdf.

[31] V. Sharma et al., "Power-aware QoS Management in Web Servers", Proc. Real-Time Syst. Symp., pp. 63-72, 2003.
[32] D. Shen and J. L. Hellerstein, "Predictive Models for Proactive Network Management: Application to a Production Web Server", Proc. Network Operations & Management Symp., pp. 833-846, 2000.

[33] R. Vilalta et al., "Predictive Algorithms in the Management of Computer Systems", IBM Systems Journal, vol. 41, no. 3, pp. 461-474, 2002.

[34] F. Zhang and J. L. Hellerstein, "An Approach to Online Predictive Detection", Proc. Modeling, Analysis & Simulation of Computer & Telecom. Syst., pp. 549-556, 2000.

[35] F. Zhang and S. T. Chanson, "Power-Aware Processor Scheduling under Average Delay Constraints", Proc. 11th IEEE Real-Time and Embedded Tech. and Appl. Symp. (RTAS'05), Mar. 2005.