Feedback Control for Differentiated Servers Models

Using Feedback Control to Manage QoS for Clusters of Servers Providing Service Differentiation

Wael Fouad and Hanan Lutfiyya
Department of Computer Science, The University of Western Ontario, London ON N6A 5B7 Canada
{wfouad, hanan}@csd.uwo.ca

Abstract-This paper considers the use of feedback to improve the performance of computing systems that offer differentiated services. The motivation of the work is based on the increasing demand on application servers. It is not always sufficient to buy high-performance software for the server. Multiple servers may be needed. To guarantee that QoS requirements are satisfied, it is possible to statically assign resources to a specific class. This often results in underutilization of resources. This paper describes a novel technique that is based on control theory principles applied to a server cluster that provides differentiated service. The paper shows that feedback can be used to adjust the number of client requests concurrently being processed based on dynamic information such as CPU utilization. The paper also compares the proposed technique with a dynamic approach that is not based on control theory principles. Results show a dramatic increase in the number of served users using the control theory principles compared with the non control-theoretic approach during the same experiment duration. The improvement provided by the proposed technique exceeded 20%.

KEY WORDS Differentiated Services, Feedback Control

I. INTRODUCTION

Different applications that make use of the service provided by a cluster consisting of n servers, s0, s1, s2,…sn-1, often have different performance requirements of the service. This differentiation allows for differentiated resource allocation. Service differentiation is often based on associating requests with a service class. A service class is associated with a specific set of Quality of Service (QoS) requirements. A QoS requirement is defined as a non-functional requirement that may be expressed using metrics such as response time or throughput. The QoS requirements associated with service class ci should provide a better service than those associated with the class ci+1.

To satisfy QoS requirements, computing resources may be statically assigned to a class. Thus the requests associated with class ci are directed to a subset of s0, s1, s2,…sn-1 (without loss of generality the rest of the discussion in this paper assumes that for class ci the subset is si). That subset processes only those requests associated with class ci. It is not always possible to accurately predict peak load and prepare enough computing resources statically, since client request rates tend to be bursty and fluctuate dramatically. Trying to allocate computing resources to accommodate the potential peak is not cost-effective if the peak does not occur often. The problem with a static approach is that it may underutilize resources that could have been used for other service classes [6]. In addition, the QoS requirements of an application or the QoS requirements associated with a service class may change over time.

An example of a dynamic approach is the following: If s1 is over-utilized and s0 is underutilized then requests associated with c1 should be directed to s0. If both servers are over-utilized then client requests are not admitted. A dynamic approach assumes that attributes that characterize system behavior are monitored and that this monitored information can be used in resource allocation decisions. This monitored information is referred to as feedback. An attribute of system behavior is referred to as a controlled output parameter (cop). The feedback is used to adjust the system's tuning parameters (tp) that impact the system behavior as represented by one or more controlled output parameters. A resource allocation decision refers to the maximum number of requests of class cj allowed to be concurrently processed at si. An example of a tuning parameter (tp) is the value that represents the number of class cj requests allowed at server si.

This paper uses a feedback approach based on control theory principles [4], [3]. The motivation for using this approach is that the choice of a correct value for a tuning parameter is difficult. The optimal setting depends not only on static factors such as hardware but also on factors that change, such as workload, the number of concurrent jobs executing in the system and the QoS requirements. Control-theoretic principles provide techniques that facilitate the determination of a good value for a tuning parameter.

The paper is organized as follows.
Section II describes the application of control theory to a cluster of machines where there is a differentiation between requests based on associated service classes. The controlled output parameter is CPU utilization. Section III describes the experimental setup. Section IV summarizes the results. Section V provides a discussion of results. Section VI is a related work section. Section VII provides conclusions and future work.

II. MODEL USING CPU UTILIZATION AS CONTROL OUTPUT PARAMETER

A control-theoretic approach requires that the value of a controlled output parameter (cop) is periodically provided to a control process. The control process compares this value to a reference value selected by the administrator in order to calculate an error value. This error value is the difference between the reference value and the value of the controlled output parameter. Based on the error, the control process changes a tuning parameter. No error implies that the reference point has been reached and thus the tuning parameter(s) do not need to be changed.

The choice of controlled output parameters depends on the metrics used in the QoS requirements associated with the service classes. In this work, it is assumed that the metric used in the QoS requirements is the desired average time it takes to process a request. This is referred to as the average service time. Response times increase with increased load, i.e., with an increase in CPU utilization. Hence a possible controlled output parameter is the CPU utilization. Another possible controlled output parameter is service time. The model in this section assumes that CPU utilization is the controlled output parameter. The same model could be used where the controlled output parameter is the average service time.

A. Model Description

The model consists of n servers s0, s1, s2,…sn-1 as shown in Fig. 1. It is assumed that fewer requests are expected for higher priority classes, since the cost associated with such a class is assumed to be higher for the applications generating requests for the class. If si+1 is overloaded and si is underutilized, then it is feasible to use the excess resource capacity at server si by directing requests of class ci+1 to si in addition to si+1.

[Figure: block diagram with users of Class 0, Class 1, …, Class m-1; a Switch; Server 0, Server 1, …, Server n-1 with Sensor 0, Sensor 1, …, Sensor n-1; and a Controller, labeled with reference values rs0, rs1, …, rsn-1, errors e0, e1, …, en-1, tuning parameters tp0, tp1, …, tpn-1 and controlled output parameters cop0, cop1, …, copn-1]

Fig. 1 Differentiated service model

The number of executing processes depends on the number of client requests that are being processed. Assuming that there are no processes spinning, the CPU utilization is a function of the number of requests. It is possible to choose the maximum number of requests that can concurrently be processed as a tuning parameter. If there is no service differentiation based on classes then we assume that this tuning parameter is sufficient. However, in the presence of service differentiation it becomes feasible to have m tuning parameters for each server. For si and class cj, usicj(t) is the maximum number of requests from class cj allowed to be processed concurrently at si. The maximum number of requests allowed to be processed concurrently at server si at time t is represented by usi(t) and

$$u_{s_i}(t) = \sum_{j=0}^{m-1} u_{s_i c_j}(t).$$

There is a sensor associated with each server to measure the controlled output parameter accurately on a periodic basis. At time t, the measured CPU utilization for si is denoted by cpusi(t). The reference value for the CPU utilization for si is denoted by rsi(t). The controller measures the difference between the desired CPU utilization (reference) and the measured CPU utilization for each server si. That is, esi(t) = rsi(t) - cpusi(t). The controller uses the error value esi(t) to adjust the values of tuning parameters.

B. Transfer Functions

The use of controllers is common in traditional engineering domains. Controllers use relationships between inputs and outputs that are defined mathematically. To relate inputs and outputs, this work uses the autoregressive, moving average (ARMA) model [1], which is an example of an empirical approach. The relationships described in this section are similar to those derived in [4]. Discrete time is assumed with uniform interval sizes. The general form of the ARMA model is given by:

$$y(t) = \sum_{i=1}^{n} a_i \, y(t-i) + \sum_{j=0}^{m} b_j \, x(t-j) \tag{1}$$

The input x(t) of the ARMA model represents a tuning parameter and the output, y(t), represents a controlled output parameter. The parameters n and m are the order of the model, and the ai, bj are constants that are estimated from data using least squares regression [7]. By identifying the values for n, m, ai, bj, the transfer function can be derived. The ARMA model is used to relate the output of the model to the input and also to the history of the output.

ARMA models are in the discrete time domain. Control theory techniques are usually based on frequency domains. Thus, transfer functions should be converted from the time domain to the frequency domain. The frequency domain is referred to as the z-domain. The following formula is applied:

$$Y(z) = \sum_{t=0}^{\infty} y(t) \, z^{-t} \tag{2}$$

This is known as the z-transform [3], where z is a complex number and y(t) is the output in the time domain. This allows for the use of existing control theory principles that are usually based on frequency domains. Note that in the time domain lower case is used (e.g., y(t)) and in the frequency domain upper case is used (e.g., Y(z)). By applying the z-transform given in (2) to the ARMA model in the time domain given in (1), a general formula in the z-domain can be derived [4]:

$$H(z) = \frac{Y(z)}{X(z)} = \frac{\sum_{j=0}^{m} b_j \, z^{n-j}}{z^{n} - \sum_{i=1}^{n} a_i \, z^{n-i}} \tag{3}$$

Extensive experimentation showed that the ARMA model is a good fit for server si if we set y(t) = cpusi(t), x(t) = usi(t) and n = 1, m = 0. That is,

$$N_{s_i}(z) = \frac{\mathit{CPU}_{s_i}(z)}{U_{s_i}(z)} = \frac{z \, b_0}{z - a_1} \tag{4}$$

Least squares regression was used to estimate the values of the parameters b0 and a1 of the ARMA model. It was found that R², which measures the goodness of fit of the model, is no lower than 88% for each of the servers. The experiments assumed that there was a mix of I/O and CPU bound requests (and thus varying times were required) and that the arrival rate was Poisson.
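The first-order fit described in Section II-B (n = 1, m = 0, so y(t) = a1 y(t-1) + b0 x(t)) can be sketched in a few lines. This is a minimal illustration only, not the authors' code: the function names `simulate_first_order` and `fit_first_order_arma` are hypothetical, and the example solves the 2x2 least squares normal equations directly rather than using a statistics package.

```python
from typing import List, Tuple


def simulate_first_order(x: List[float], a1: float, b0: float,
                         y_init: float = 0.0) -> List[float]:
    """Generate y(t) = a1*y(t-1) + b0*x(t) for a given input sequence (hypothetical helper)."""
    y = []
    prev = y_init
    for u in x:
        prev = a1 * prev + b0 * u
        y.append(prev)
    return y


def fit_first_order_arma(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Estimate (a1, b0) of y(t) = a1*y(t-1) + b0*x(t) by least squares.

    Returns (a1, b0, r_squared). x is the tuning parameter (e.g. request limit),
    y is the controlled output parameter (e.g. measured CPU utilization).
    """
    if len(x) != len(y) or len(y) < 3:
        raise ValueError("x and y must have equal length of at least 3")
    prev_y = y[:-1]   # regressor y(t-1)
    cur_x = x[1:]     # regressor x(t)
    cur_y = y[1:]     # response  y(t)

    # Entries of the 2x2 normal equations.
    syy = sum(p * p for p in prev_y)
    sxx = sum(u * u for u in cur_x)
    sxy = sum(p * u for p, u in zip(prev_y, cur_x))
    ryy = sum(p * v for p, v in zip(prev_y, cur_y))
    rxy = sum(u * v for u, v in zip(cur_x, cur_y))

    det = syy * sxx - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("regressors are collinear; a1 and b0 cannot be separated")
    a1 = (ryy * sxx - rxy * sxy) / det
    b0 = (syy * rxy - sxy * ryy) / det

    # Coefficient of determination of the one-step prediction.
    mean_y = sum(cur_y) / len(cur_y)
    ss_tot = sum((v - mean_y) ** 2 for v in cur_y)
    ss_res = sum((v - (a1 * p + b0 * u)) ** 2
                 for p, u, v in zip(prev_y, cur_x, cur_y))
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return a1, b0, r_squared
```

In practice x would be the logged request limit and y the logged CPU utilization per sampling interval; with real measurements the fit is only approximate, which is what the R² value reported above quantifies.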

C. Controller Design

A control law describes how the controller changes the value of a tuning parameter. Due to its simplicity and yet efficiency, an integral control law was used. The integral controller produces a control action that continues to increase its corrective effect as long as the error persists. If the error is small, the integral controller increases the correction slowly. If the error is large, the integral action increases the correction more rapidly. The integral controller has the following general time domain formula:

$$u_{s_i}(t) = u_{s_i}(t-1) + K_{s_i} \, e_{s_i}(t) \tag{5}$$

where usi(t) is a tuning parameter which changes over time, Ksi is the integral gain, and esi(t) is the error that the controller's goal is to eliminate. The control law used in this work states that the maximum number of requests allowed to be served concurrently for each class of requests and type of server is to be adjusted dynamically based on the previous values of CPU utilization and a weighted control error. Using z-transform properties, the z-transforms of usi(t) and usi(t-1) are $U_{s_i}(z)$ and $\frac{1}{z} U_{s_i}(z)$ respectively. Thus, the application of the z-transform presented in (2) to the general form in (5) results in the following:

$$U_{s_i}(z) = \frac{1}{z} U_{s_i}(z) + K_{s_i} E_{s_i}(z) \tag{6}$$

By taking Usi(z) as a common factor in (6), (7) can be derived:

$$U_{s_i}(z) \left( 1 - \frac{1}{z} \right) = K_{s_i} E_{s_i}(z) \tag{7}$$

The transfer functions considered can be expressed as a ratio of two polynomials in z. Therefore, the general form of the transfer function of the controller, Gsi(z), is derived as shown in (8):

$$G_{s_i}(z) = \frac{U_{s_i}(z)}{E_{s_i}(z)} = K_{s_i} \frac{z}{z - 1} \tag{8}$$

The higher the value of Ksi, the more quickly the value of the tuning parameter will change. However, a very large Ksi can cause oscillations or even instabilities. The general form of the transfer function that derives CPUsi(z) is provided by:

$$\mathit{CPU}_{s_i}(z) = E_{s_i}(z) \left( \frac{U_{s_i}(z)}{E_{s_i}(z)} \right) \left( \frac{\mathit{CPU}_{s_i}(z)}{U_{s_i}(z)} \right) = E_{s_i}(z) \cdot K_{s_i} \cdot \frac{z}{z - 1} \cdot \frac{z \, b_0}{z - a_1} \tag{9}$$

and

$$E_{s_i}(z) = R_{s_i}(z) - \mathit{CPU}_{s_i}(z) \tag{10}$$

Solving these equations, the general transfer function for the closed loop system, Tsi(z), shown in Fig. 1 can be derived as follows:

$$T_{s_i}(z) = \frac{\mathit{CPU}_{s_i}(z)}{R_{s_i}(z)} = \frac{K_{s_i} \, z \, (z \, b_0)}{(z - 1)(z - a_1) + K_{s_i} \, z \, (z \, b_0)} \tag{11}$$

A classical control theory technique called root locus can be applied to calculate the value of Ksi. Software packages such as Matlab perform the root locus technique [3]. This technique studies the poles and the zeros of closed loop transfer functions such as the one presented in equation (11) as the gain increases from 0 to infinity. The roots of the numerator are called its zeros and the roots of the denominator are its poles. For equation (11), the zeros are the values of z where CPUsi(z) = 0 (thus Tsi(z) = 0) and the poles are where Rsi(z) = 0 (thus Tsi(z) = ∞). If any of the poles of Tsi(z) lies outside the unit circle, then Tsi(z) is said to be unstable. The gain value associated with a pole that lies outside of the unit circle is not considered. Appropriate root values are those that have a magnitude less than one and that lie on the real axis, to minimize oscillations. An example of the root locus plot is shown in Fig 2. The horizontal axis of the plot corresponds to the real part of z (Re(z)) and the vertical axis is the imaginary axis of z (Im(z)). More information on root locus can be found in [3].

[Figure: root locus plot; horizontal axis "Real Axis" and vertical axis "Imag Axis", both ranging from -1 to 1; one point is labeled K=0.2]

Fig 2. Sample Root Locus Plot
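The gain selection described above can be illustrated numerically. Expanding the denominator of (11) gives the quadratic (1 + K b0) z² - (1 + a1) z + a1, so its two roots (the closed loop poles) can be computed directly for each candidate gain. The sketch below is an assumption-laden illustration rather than the procedure used in the paper, which relies on a root locus tool: the function names are hypothetical, and any values passed for a1 and b0 are placeholders for the regression estimates.

```python
import cmath
from typing import Iterable, List, Tuple


def integral_step(u_prev: float, gain: float, reference: float, measured: float) -> float:
    """One step of the integral law (5): u(t) = u(t-1) + K * e(t), with e(t) = r(t) - cpu(t)."""
    return u_prev + gain * (reference - measured)


def closed_loop_poles(gain: float, a1: float, b0: float) -> Tuple[complex, complex]:
    """Roots of the denominator of (11): (z - 1)(z - a1) + K*b0*z^2."""
    qa = 1.0 + gain * b0          # coefficient of z^2
    qb = -(1.0 + a1)              # coefficient of z
    qc = a1                       # constant term
    if abs(qa) < 1e-12:
        raise ValueError("K*b0 = -1 makes the denominator first order")
    root = cmath.sqrt(qb * qb - 4.0 * qa * qc)
    return (-qb + root) / (2.0 * qa), (-qb - root) / (2.0 * qa)


def is_acceptable(gain: float, a1: float, b0: float, tol: float = 1e-9) -> bool:
    """True if both poles lie inside the unit circle and on the real axis."""
    return all(abs(p) < 1.0 and abs(p.imag) <= tol
               for p in closed_loop_poles(gain, a1, b0))


def scan_gains(gains: Iterable[float], a1: float, b0: float) -> List[float]:
    """Keep only the candidate gains whose closed loop poles are acceptable."""
    return [k for k in gains if is_acceptable(k, a1, b0)]
```

For example, `scan_gains([0.1, 0.2, 1.0], a1=0.5, b0=0.5)` (illustrative values only) keeps the two smaller gains and drops 1.0, whose poles form a complex pair and would therefore produce an oscillatory response even though they lie inside the unit circle.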

It is very difficult to separate measurements of the CPU utilization of different classes of requests. Hence, gain values are calculated for a control law that assumes that the tuning parameter is the total number of requests to be processed concurrently at server si. However, for server si this gain should not be applied equally to usicj(t) and usick(t), since cj requests are preferred over ck requests. Hence, a weight associated with each service class is desired. This weight is used to distinguish between the gains to be used in the control laws for usicj(t) and usick(t). That is, the control laws become:

$$u_{s_i c_j}(t) = u_{s_i c_j}(t-1) + K_{s_i} \, W_{s_i c_j} \, e_{s_i}(t) \tag{12}$$

where Ksi is the integral gain and Wsicj is the weight associated with cj. The weight for cj is chosen to be higher than the weight for ck in order to offer the requests associated with cj a higher level of service, and such that

$$u_{s_i} = \sum_{j=0}^{m-1} u_{s_i c_j}.$$

III. EXPERIMENTAL SETUP

The server application is implemented so that it is representative of typical application servers. This implies that these servers fork a process to handle each incoming client connection and log information to a log file. A sensor process for measuring CPU utilization runs the UNIX sar command once every second. The values are accumulated over a one minute time interval for computing an average. The reference values for CPU utilization are set at 85%. The workloads used in the experiments have the following properties: (i) The arrival rate is Poisson; (ii) It is assumed that the higher the priority of a service class, the fewer requests are expected for that class. This is a reasonable assumption since it is expected that fewer users will pay the higher cost associated with a higher service class. However, spikes in the arrival rate of a higher service class may occur; (iii) The requests are a mix of I/O and CPU bound jobs.

Each server has an upper bound on the number of requests that can be handled concurrently. Once a request has been admitted, it will be processed. Preemption is not supported, which is typical for many servers including web servers. If the number of requests allowed is too high, then even if incoming requests are not admitted the system is still saturated for a long time with the requests being processed, regardless of the approach being used to reduce the number of requests allowed to be concurrently processed. Experimentation showed that without placing an upper bound on the number of requests that a server is allowed to handle concurrently, the system would be saturated with requests, resulting in a continuously very high utilization (close to 100%) until all the requests being served are completed. This suggests that an upper bound value is needed to prevent this from occurring.

The experimental results presented in this paper assume three servers and three classes. It is possible to define many policies that are used by the switch to direct the incoming requests to the appropriate server. The experimental results presented in this section assume these policies:
• The basic policy assumes that requests of class ci are directed to server si. This can be generalized so that a request of class ci is directed to a subset of s0, s1, s2,…sn-1, where it is assumed that the number of classes is smaller than the number of servers.
• The threshold policy assumes that the system starts with the basic policy. If a request of class cj is assigned to si and si is over-utilized (defined as the CPU utilization exceeding a specific threshold), then an alternative server that is underutilized (defined as the CPU utilization below a specific threshold) is used (assuming one exists). The search starts with s0 and proceeds to sj. This is the general form. In the specific set of experiments presented in this section, we assume that the only redirection allowed is redirecting the requests from s1 or s2 to s0.
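The two switch policies above can be sketched as follows. This is one possible reading, not the authors' implementation: the function names and threshold parameters are hypothetical, the search is taken to run from s0 up to (but not including) the request's own server, and the request is assumed to stay with its own server when no underutilized alternative exists.

```python
from typing import Sequence


def basic_policy(request_class: int) -> int:
    """Basic policy: a request of class c_i is directed to server s_i."""
    return request_class


def threshold_policy(request_class: int, cpu: Sequence[float],
                     high: float, low: float) -> int:
    """Threshold policy: fall back to an underutilized server when the home server is over-utilized.

    cpu[i] is the latest measured CPU utilization of server s_i;
    `high` and `low` are the over- and under-utilization thresholds (assumed values).
    """
    home = basic_policy(request_class)
    if cpu[home] <= high:
        return home                      # home server is not over-utilized
    for candidate in range(home):        # search s_0, s_1, ... before the home server
        if cpu[candidate] < low:
            return candidate             # first underutilized server wins
    return home                          # assumption: no alternative, stay at home


# Example with three servers and utilization in percent (illustrative numbers).
utilization = [30.0, 95.0, 50.0]
target = threshold_policy(1, utilization, high=85.0, low=50.0)
```

Restricting the loop to `range(1)` would reproduce the experimental setting in which requests from s1 or s2 may only be redirected to s0.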
The control-theoretic technique is compared with a dynamic approach that is derived from the approach used in TCP congestion control. Essentially this approach reduces usicj by half when the CPU utilization at si is high. It reduces it by half again if the high CPU utilization persists. It increments usicj linearly by a constant amount when the CPU utilization is considered relatively low. This is referred to as the dynamic non control-theory (DNCT) approach.
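To make the contrast concrete, the sketch below places a DNCT-style update next to the weighted integral law (12). It is a hedged illustration only: the thresholds, the increment, the floor of one request and the clamping to an upper bound are assumptions introduced here, and the function names are hypothetical.

```python
def dnct_step(limit: int, cpu: float, high: float, low: float,
              increment: int, upper_bound: int) -> int:
    """DNCT-style update of a per-class request limit.

    Halve the limit while utilization is high, add a constant when it is low,
    and leave it unchanged in between. The floor of 1 is an assumption.
    """
    if cpu > high:
        return max(1, limit // 2)
    if cpu < low:
        return min(upper_bound, limit + increment)
    return limit


def weighted_integral_step(limit: float, gain: float, weight: float,
                           reference: float, cpu: float,
                           upper_bound: float) -> float:
    """Control law (12): u(t) = u(t-1) + K * W * e(t), clamped to [0, upper_bound].

    The clamping is an assumption reflecting the per-server upper bound
    described in the experimental setup.
    """
    updated = limit + gain * weight * (reference - cpu)
    return min(upper_bound, max(0.0, updated))
```

The DNCT step always changes the limit by the same rule regardless of how far the utilization is from the target, whereas the integral step scales its adjustment with the size of the error.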

IV. EXPERIMENTAL RESULTS

For the basic and threshold policies we ran experiments using the control-theoretic approach and DNCT. We ran experiments where a spike in requests of class c0 occurred. This section summarizes the results of the experiments. Due to the lack of space we do not present graphs for each experiment but rather summarize the results. We measure the average service time of the requests and the number of requests processed over the time period that the experiment ran.

For experiments that had a spike in user class c0 occurring at s0 and using the threshold policy, the control-theoretic technique processed about 27% more requests than the DNCT approach. For experiments without a spike at s0 and using the threshold policy, the control-theoretic technique processed about 12.17% more requests than the DNCT technique. In both cases, the average service time was slightly less using the control-theoretic technique than the DNCT technique.

These results are not surprising. The control-theoretic technique is more responsive since the number of users depends on the difference between the actual CPU utilization and the desired CPU utilization.

For experiments that had a spike occurring at s0 and using the basic policy, the control-theoretic technique processed about 20% more requests than the DNCT approach. For experiments without a spike at s0 and using the threshold policy, the control-theoretic technique processed about 10% more requests than the DNCT technique. In the case of a spike, the average service time was slightly less using the control-theoretic technique than the DNCT technique. However, the reverse was true when there was not a spike. It is not clear why this was the case. It should be noted that although the discussion focused on the total number of requests and the average service time of all the requests, similar results held when examining the metrics for each class.

V. DISCUSSION

This section briefly describes observations and lessons learned.

Preemption. Preemption was not used in this work, even for lower priority requests at servers for higher service classes. This is typical of many servers that use some form of admission control. If a process is short-lived then preempting it could take more time than allowing the process to finish. If the process is not short-lived, then the time already spent processing the request is wasted. It is possible to have separate queues for each request at a server. We note that for a server si requests from class ci are initially rejected until the number of requests from other classes is reduced.

Use of Average Service Time. The previous section described experiments where CPU utilization was the controlled output parameter. We have also done experimentation where the average service time was the controlled output parameter. This is of interest since the average service time is more likely to be used in QoS requirements. Our initial results show that there is little difference between using CPU utilization and average service time. However, this is based on experiments where the mix of CPU bound and I/O bound jobs is the same even during a spike. It is not clear if this is always the case. More experimentation is needed.

Using Multiple Controlled Output Parameters. Currently this work uses one controlled output parameter. Initial work suggests that there is little difference between using average service time and CPU utilization. However, the experiments do not use preemption and thus at a server si it is possible that ci requests are rejected (thus implying more violations of the QoS requirements) before appropriate adjustments are made. For some service classes it may be preferable to reject fewer ci requests since the cost of a QoS violation may be high. This may involve examining more than one controlled output parameter. These parameters may include a subset of CPU utilization, average service time and the number of QoS requirement violations for a specific class.

VI. RELATED WORK

This paper is an extension of the work described in [4], [5]. The work described in [4] applies control theory principles to a single Lotus Notes server. The work described in [5] applied feedback control to a single Apache server where multiple input and multiple output parameters were taken into consideration. The work described in [2] was an extension to the work described in [4] but with multiple servers, where requests were treated equally.

Other techniques besides control-theoretic techniques have been used to assign resources to service classes. For example, optimization techniques (e.g., [8]) have been developed. The problem with optimization techniques is that the analysis usually has to be done off-line, which implies that they are not adequate in an environment where dynamic changes call for a responsive dynamic technique for allocating resources. Other techniques use the same action each time resource allocation needs to be adjusted, e.g., the work in [6] distributes incoming requests to the server with the smallest number of active requests being processed.

The work presented in this paper has the advantage that it responds based on the difference between the reference point and the value of the controlled output parameter. Thus adjustment depends on the size of the error. The smaller the error, the smaller the required adjustment, and vice versa. The adjustment based on non-control-theoretic techniques is less likely to have the controlled output parameter oscillate around its reference point.

VII. CONCLUSIONS AND FUTURE WORK

This paper describes a general model of feedback control using control theory principles applied to a cluster of servers. It shows that feedback can be used to adjust the number of client requests concurrently being processed based on dynamic information such as CPU utilization and average service time. The paper assumed m different types of requests, c0, c1, … cm-1, and n servers, s0, s1, …sn-1. The discussion assumed that ci requests were directed to si. However, the model can handle having ci requests directed to a subset of servers. Requests from ci can be directed to other servers besides si if that server is over-utilized and there are other servers that are not over-utilized. The results showed an improvement in average response times for lower-priority requests with minimal impact on higher-priority requests.

The discussion section identified problems and issues still to be addressed. One issue not discussed is that most of the experimentation used simple policies. An interesting issue is to examine the use of more complex policies and more complex controlled output parameters, e.g., one that represents profit.

VIII. REFERENCES

[1] ByoungSeon Choi, "ARMA Model Identification", New York: Springer-Verlag, 1992.
[2] Wael Fouad and Hanan Lutfiyya, "Using Feedback Control to Manage QoS for Clusters of Differentiated Servers", In Proceedings of International Conference on Parallel and Distributed Computing and Systems, November 2004.
[3] Katsuhiko Ogata, Modern Control Engineering, Prentice Hall, 3rd edition, 1997.
[4] S. Parekh, J. Hellerstein, N. Gandhi, D. M. Tilbury, T. Jayram and J. Bigus, "Using Control Theory to Achieve Service Level Objectives in Performance Management", In Proceedings of the International Conference on Integrated Network Management (IM2001), May 2001.
[5] S. Parekh, N. Gandhi, J. Hellerstein, D. M. Tilbury, Y. Diao, "MIMO Control of an Apache Web Server: Modeling and Controller Design", 2002.
[6] K. Shen, H. Tang, T. Yang, and L. Chu, "Integrated Resource Management for Cluster-based Internet Services", In Proceedings of the Fifth Symposium on Operating Systems Design and Implementation, Boston, MA, Dec. 2002.
[7] Thomas H. Wonnacott, Ronald J. Wonnacott, Introductory Statistics for Business and Economics, Fourth Edition, Wiley, 1990.
[8] H. Zhu, H. Tang, and T. Yang, "Demand-Driven Service Differentiation in Cluster-Based Network Servers", In IEEE Infocom, Anchorage, Alaska, Apr. 2001.