FeBid 2008: Third International Workshop on Feedback Control Implementation and Design in Computing Systems and Networks, Annapolis, Maryland, USA
Towards Fault-Adaptive Control of Enterprise Computing Systems—A Position Paper

Dara Kusic*, Drexel University, Philadelphia, PA 19104
Nagarajan Kandasamy†, Drexel University, Philadelphia, PA 19104
Sherif Abdelwahed‡, Mississippi State University, Mississippi State, MS 39762
Guofei Jiang, NEC Laboratories America, Princeton, NJ 08540
ABSTRACT
There is growing interest in online control frameworks that manage distributed computing systems for power and performance objectives. While such frameworks continuously manage the system to optimize resource allocation and respond to dynamic environment inputs, they often rely on static models of application behavior that do not adapt to the slow behavioral changes that occur during normal operation. By introducing adaptive models that dynamically adjust to an application's changing performance profile, a robust controller can maintain performance objectives through both the normal changes that occur in production and those introduced by software errors. In this paper, we characterize the effects of events that change an application's performance profile over time. These studies motivate the need for model-adaptive control to maintain system power and performance objectives under dynamic operating conditions.

Key words: Adaptive model learning, dynamic control, robust control, resource provisioning

1. INTRODUCTION
Web-based services such as online banking and shopping are enabled by enterprise applications. We broadly define an enterprise application as any software hosted on a server that simultaneously provides services to a large number of users over a computer network. A promising method for automating system management tasks in enterprise computing systems is to formulate them as control problems in terms of cost or performance metrics. This approach offers some important advantages over heuristic or rule-based policies in that a generic control framework can be designed to address a class of problems, such as power management or resource provisioning, using the same basic control concepts. Moreover, the feasibility of such a control scheme with respect to the performance goals can be verified prior to actual deployment.

Researchers from academia and industry have successfully applied classical feedback and proportional-integral-derivative (PID) control to CPU provisioning [1], load balancing [2], and power-management problems in Web servers [3, 4]. Assuming a linear time-invariant system, an unconstrained state space, and continuous input and output domains, a closed-loop feedback controller is designed under stability and sensitivity requirements. Others have used concepts from model-predictive control (MPC) to pose resource provisioning problems as optimization problems, which are then solved online under dynamic operating constraints [5, 6]. These techniques can also accommodate hybrid and nonlinear system behavior, dead times, and the cost of control. Recently, we applied an MPC-based scheme to manage power consumption and performance in a virtualized computing environment; experimental results obtained using a cluster of Dell PowerEdge servers indicate that the controller achieves a 26% reduction in power consumption costs over a 24-hour period when compared to an uncontrolled system while achieving the desired performance goals [7].

*D. Kusic is supported by NSF grant DGE-0538476.
†N. Kandasamy acknowledges support from NSF grant CNS-0643888.
‡S. Abdelwahed acknowledges support from the NSF SOD Program, contract number CNS-0613971.
This paper motivates the need for fault-adaptive control of distributed computing systems. Our experience with the aforementioned cluster suggests that long-running software components managed by a controller be treated as adaptive processes that exhibit slow behavioral changes over time as a result of both internal and external influences. For example, consider Trade6, a stock-trading application from IBM in which users can browse, buy, and sell stocks [8]. Trade6 has two main components, the WebSphere Server housing the application logic and the DB2 back-end database, that can be distributed across multiple machines comprising the application and database tiers. The response time achieved by a long-running Trade6 application can be time-varying even under a constant workload intensity due to: (1) internal changes to the application as it "ages" (e.g., slow performance degradation caused by memory leaks) and/or (2) external actions wherein users and stocks are added to or removed from the database. This time-varying behavior has significant implications for model-based control techniques such as MPC: if a static model is used to predict the response time achieved by Trade6, then, over time, the predictions will not match the actual response times.
Table 1: Control performance, in terms of the average energy savings and SLA violations, for five different workloads as published in [7].

Workload     Total Energy Savings   % SLA Violations (App. 1)   % SLA Violations (App. 2)
Workload 1   18%                    3.2%                        2.3%
Workload 2   17%                    1.2%                        0.5%
Workload 3   17%                    1.4%                        0.4%
Workload 4   45%                    1.1%                        0.2%
Workload 5   32%                    3.5%                        1.8%
Based on the above discussion, we propose FACT (Fault-Adaptive Control Technology), which will integrate online parameter tuning, machine learning, and model-based diagnosis within the overall control framework to: (1) improve the accuracy of partially specified system models, (2) maintain the correctness of the model against slow behavioral changes to system components over time, and (3) identify and localize component failures resulting in abrupt changes to system behavior.
We discuss FACT in the context of power and performance management in a virtualized execution environment. Virtualization allows a single server to be shared among multiple performance-isolated platforms called virtual machines (VMs), where each VM can, in turn, host multiple enterprise applications. Virtualization also enables on-demand or utility computing, a just-in-time resource provisioning model in which computing resources such as CPU, memory, and disk space are made available to applications only as needed, rather than allocated statically based on peak workload demand. By dynamically provisioning virtual machines, consolidating the workload, and turning servers on and off as needed, hosting-center operators can maintain the desired performance objectives while achieving higher server utilization and energy efficiency [7, 9, 10].

The rest of this paper is organized as follows. Section 2 describes the physical testbed and the resource provisioning problem, and summarizes the main results. Section 3 uses the Trade6 and RUBBoS applications to make the case for treating long-running software components as adaptive processes. Specifically, we show that, even under a constant workload intensity, increasing (or decreasing) the size of the back-end database (an external influence) and small memory leaks (internal changes to the application) change the response times achieved by these applications. Section 4 describes the proposed FACT framework as well as the research challenges that must be addressed to realize FACT in practical settings. The key challenge is to integrate control, machine learning, and diagnosis processes into a common model-based framework that adapts to slow behavioral changes in the underlying system components and identifies faulty components to aid the reconfiguration and recovery process.

2. PRELIMINARIES
Recently published work by the authors [7] has shown the effectiveness of an online control framework in managing a small virtualized hosting environment for power and performance objectives. The resource-provisioning problem is posed as one of sequential optimization under uncertainty and solved using limited lookahead control (LLC), a predictive control approach previously introduced in [11]. This approach allows for multi-objective optimization under explicit operating constraints and is applicable to computing systems with non-linear dynamics where control inputs must be chosen from a finite set. The testbed hosts multiple web-based services, comprising front-end application servers and back-end database servers. The applications perform dynamic content retrieval (customer browsing) as well as transaction commitments (customer purchases), requiring database reads and writes, respectively. The hosting hardware consists of a cluster of heterogeneous Dell PowerEdge servers running VMware ESX Server 3.5 virtualization software to logically partition the computing resources (CPU, memory, disk, etc.). Incoming requests for each service, drawn from workloads such as those shown in Fig. 2, are dispatched to a dedicated cluster of virtual machines (VMs) distributed across the host machines.

The virtual clusters generate revenue as per the non-linear pricing scheme or service-level agreement (SLA) shown in Fig. 1, which relates the average response time achieved per transaction to a dollar value that clients are willing to pay. Response times below a threshold value result in a reward paid to the service provider, while response times violating the SLA result in the provider paying a penalty to the client.

The control objective is to maximize the profit generated by this system by minimizing both power consumption and SLA violations. To achieve this objective, the online controller decides the number of physical and virtual machines to allocate to each service (VMs and their hosts are turned on or off according to workload demand) as well as the CPU share and fraction of the incoming workload to allocate to each VM. Experimental results show that the hosting environment, when managed using the implemented LLC approach, saves, on average, 26% more power over a twenty-four hour period than a system operating without dynamic control, where an uncontrolled system is defined as one in which all available host machines remain powered on. Power consumption during run time was estimated from models of the measured power consumption of the host machines in various operating states (e.g., idle, in boot, or serving workloads on n virtual machines). These power savings are achieved with very few SLA violations: 1.6% of the total number of service requests. An uncontrolled system also incurs some SLA violations due to normal variability in application performance, ranging from less than 1% to 5% of the total requests made to the system. The energy savings and SLA violations of two applications for five different workloads similar to those shown in Fig. 2 are summarized in Table 1.
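The stepwise SLA of Fig. 1 can be sketched as a simple step function over response-time thresholds, combined with a profit objective that nets out power cost. The breakpoints and dollar values below are illustrative placeholders, not the actual figures used in [7]:

```python
# Illustrative stepwise SLA: maps an average response time (seconds) to a
# per-transaction reward or penalty (dollars). Thresholds and dollar values
# are hypothetical stand-ins for the curves in Fig. 1.
SLA_STEPS = [
    (1.0,  4e-4),   # response time <= 1 s  -> reward per transaction
    (2.0,  2e-4),   # <= 2 s -> smaller reward
    (3.0, -1e-4),   # <= 3 s -> small penalty (SLA violated)
]
DEFAULT_PENALTY = -3e-4  # response times beyond the last breakpoint

def sla_revenue(avg_response_time: float) -> float:
    """Return the per-transaction dollar value for a measured response time."""
    for threshold, dollars in SLA_STEPS:
        if avg_response_time <= threshold:
            return dollars
    return DEFAULT_PENALTY

def profit(num_transactions: int, avg_response_time: float, power_cost: float) -> float:
    """Profit = SLA revenue over all transactions minus the cost of power."""
    return num_transactions * sla_revenue(avg_response_time) - power_cost
```

The controller's objective can then be read as choosing resource allocations that keep `profit` high: fast response times earn the reward band, while each powered-on machine adds to `power_cost`.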
Figure 1: A pricing strategy that differentiates the online services (revenue in dollars vs. response time in seconds for the DVD Store, Trade6, and RUBBoS SLAs).
Figure 2: Transaction requests to the online applications, plotted at 30-second intervals.
In order for the LLC approach or any model-based control strategy to be of practical value over time, it must adapt to changes in the operating conditions that can compromise the accuracy of predictions about application behavior. Without robustness to application dynamics that increase resource utilization and response times, the number of SLA violations shown in Table 1 will increase even as workload intensity remains constant. Some common conditions that slowly change an application’s performance profile over time are presented in the next section.
Figure 3: The schematic of a limited lookahead controller.
3. SOFTWARE COMPONENTS AS ADAPTIVE PROCESSES
In previous work using the LLC approach to perform dynamic resource provisioning [7, 12], we generated models of the applications for use in the online controller shown in Fig. 3 via simulation-based learning. An example of the profiling data captured via offline simulations of the Trade6 application is shown in Fig. 4. To collect this data, we allocated 1 GB of memory to a virtual machine and varied the CPU share to measure the average response times under increasing workload intensities. The data is then used to construct lookup tables of the average expected response time per CPU share and workload intensity. These are static models and, as such, are inherently inflexible to slow behavioral changes in the system due to software aging or external changes introduced by a system administrator.
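The lookup-table construction described above can be sketched as follows. The profiling samples are hypothetical stand-ins for the measured data in Fig. 4, and nearest-neighbor lookup is one plausible way to query the table at run time:

```python
# Minimal sketch of a static lookup-table model built from offline profiling:
# (CPU share in GHz, arrivals/sec) -> average response time in ms.
# The sample values below are illustrative, not measured data.
profiling_samples = [
    (3.0, 10, 220.0), (3.0, 20, 480.0),
    (4.5, 10, 150.0), (4.5, 20, 310.0),
    (6.0, 10, 110.0), (6.0, 20, 240.0),
]

lookup = {}
for cpu_share, arrival_rate, resp_ms in profiling_samples:
    lookup[(cpu_share, arrival_rate)] = resp_ms

def predict_response_time(cpu_share, arrival_rate):
    """Return the profiled response time at the nearest recorded operating point."""
    key = min(lookup, key=lambda k: abs(k[0] - cpu_share) + abs(k[1] - arrival_rate))
    return lookup[key]
```

Because the table entries are fixed after profiling, any drift in the real application (aging, database growth) silently invalidates the predictions, which is exactly the inflexibility noted above.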
Figure 4: The measured average response times used to construct a static model for the Trade6 application as a function of a VM's CPU share (3 GHz, 4.5 GHz, and 6 GHz shares). The VM is allocated 1 GB of memory.
We use three multi-tier, transaction-based applications for testing in our virtualized hosting environment. The first, IBM's Trade6 benchmark, a multi-threaded stock-trading application that allows users to browse, buy, and sell stocks, is used in the work presented in [7]. Trade6 is a transaction-based application integrated within the IBM WebSphere Application Server V6.0 and uses DB2 Enterprise Server Edition V9.2 as the database component. This execution environment is distributed across multiple servers comprising the application and database tiers.
A key aspect of implementing the LLC approach is to formulate a model that estimates the future state of the system in order to evaluate control decisions. Fig. 3 illustrates the basic LLC concept, where the environment input λ is estimated over a prediction horizon h and used by the system model to forecast future system states x̂. At each time step k, the controller finds a feasible sequence {u*(l) | l ∈ [k + 1, k + h]} of control actions within the prediction horizon that maximizes the profit generated by the cluster. Then, only the first control action in the chosen sequence, u(k + 1), is applied to the system and the rest are discarded. The entire process is repeated at time k + 1, when the controller can adjust the trajectory given new state information and an updated workload forecast.
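The receding-horizon loop can be sketched compactly. The system model, profit terms, and control set below are toy stand-ins (the actual controller in [7] optimizes a richer model over VM counts, CPU shares, and workload fractions):

```python
# Sketch of one limited lookahead control (LLC) step: enumerate all control
# sequences over a short horizon, score them with the system model, and
# apply only the first action of the best sequence.
from itertools import product

CONTROL_SET = [1, 2, 3, 4]   # e.g., number of VMs allocated to a service
HORIZON = 3                  # prediction horizon h

def model(state, u, workload):
    """Toy system model: predicted response time falls as VMs are added."""
    return workload / (u * 10.0)

def step_profit(resp_time, u):
    """Toy profit: SLA reward minus a per-VM power cost."""
    reward = 1.0 if resp_time <= 1.0 else -1.0
    return reward - 0.1 * u

def llc_step(state, workload_forecast):
    """Choose u(k+1): the best first action over all length-h control sequences."""
    best_profit, best_first = float("-inf"), None
    for seq in product(CONTROL_SET, repeat=HORIZON):
        total, s = 0.0, state
        for u, w in zip(seq, workload_forecast):
            s = model(s, u, w)        # forecast next state x_hat
            total += step_profit(s, u)
        if total > best_profit:
            best_profit, best_first = total, seq[0]
    return best_first                 # only the first action is applied
```

The exhaustive enumeration here grows exponentially in the horizon, which is why h is kept small in practice and why [7] refers to the approach as *limited* lookahead.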
The second application hosted within our virtualized cluster is Dell’s DVD Store [13], an open source simulation of an online e-commerce site that we host on an Apache Tomcat 6.0.14 application server and have adapted for use with
Figure 5: The changing performance profile of the Trade6 application as the database size grows (from 1K stocks and 500 users up to 50K stocks and 25K users).
IBM's DB2 Enterprise Server Edition V9.2 for the database component. The DVD Store offers several default database sizes, and for our experiments we have chosen the 'small' size. Rice University's servlet implementation of the RUBBoS application [14] is the third application hosted within our virtualized cluster. RUBBoS is a bulletin-board benchmark similar to the online forum Slashdot, with browsing capabilities similar to those of an e-commerce site and posting capabilities similar to purchase transactions that require database writes. We host RUBBoS on an Apache Tomcat 6.0.14 application server and have adapted the database component to IBM's DB2 Enterprise Server Edition V9.2. The experimental results for all three applications reflect workloads requiring a 50/50 mix of database reads and writes.
3.1 Effect of Database Size on Response Time
We first sought to characterize the effect of database size on application performance. The size of an application's database may grow through normal production activity as new users accumulate, or when additional data is introduced by a system administrator. The database size affects the retrieval time for dynamic content as well as the time to commit new transactions. Fig. 5 clearly shows the effect of database size on the average response time per request for the Trade6 application. The default Trade6 database consists of 500 users and 1,000 stocks and is scaled upward from that size. As the database grows, the maximum workload arrival rate that can be tolerated within the bounds of the SLA shown in Fig. 1 decreases. An adaptive online controller would scale back the workload directed to each node to accommodate the increased response time per transaction.
3.2 Effect of Memory Leaks on Response Time
While growing database sizes can be considered an effect of normal operation, memory leaks caused by software bugs are a common source of errant behavior in an application, particularly in programs written in C/C++ [15]. A memory leak is memory that has been dynamically allocated by an application, such as by using the malloc() operation in the C language, but that is never released back to the operating system. As memory leaks accumulate, they eventually allocate the resource to capacity and the operating system falls back on swap space (virtual memory). It is typically at this point, when the application has reserved system memory to capacity, that performance degrades significantly.

Figure 6: The measured response times for the Trade6 application as 1 MB, 5 MB, and 10 MB memory leaks accumulate per minute. The VM is allocated a 6 GHz CPU share and 1 GB of memory.

Fig. 6 and Fig. 7 show the response times measured per transaction for the Trade6 and RUBBoS applications, respectively.
The solid black lines through Fig. 6 and Fig. 7 indicate the response time averaged over the last 20 time samples. In both applications, the memory leak was effected at the application server and database tiers, and the workload arrival rate was held constant. The point at which application performance degrades significantly is commensurate with the rate of the memory leak, occurring at about t = 150, or two and a half hours, into a 5 MB-per-minute memory leak, and at about half that time, t = 70, or an hour and fifteen minutes, into a 10 MB-per-minute memory leak. The 1 MB-per-minute memory leak shown in Fig. 7(a) exhibits a more gradual increase before causing the application server to crash, at about t = 1100. In both applications, the number of per-transaction SLA violations (requests that exceed the acceptable response-time thresholds in Fig. 1) jumps from less than one hundredth of a percent to 2-5% of the total requests over a given time period. Because the SLA goals are set somewhat conservatively, the percentage of per-transaction SLA violations may not be large; the increases in average response times, however, are a more dramatic measure of performance degradation. The Trade6 application exhibits modest increases in average response times (about 60% to 75% above normal) as memory is allocated to capacity, whereas the RUBBoS application exhibits more severe increases (about 150% to 500% above normal) under the same conditions.
4. THE FACT FRAMEWORK AND ASSOCIATED RESEARCH CHALLENGES
To make good provisioning decisions, an online controller must maintain an accurate model of application behavior under various workload intensities when the application is hosted by a VM with a particular CPU and memory allocation. In previous work, we captured this behavior using supervised learning, i.e., by extensively simulating the VM (and the underlying application) offline and using the observations to train an approximation structure such as a neural network for use at run time [16]. However, it is generally not possible, via offline simulations, to create an exact model of system dynamics from a limited set of training data. Therefore, we propose to integrate online parameter tuning and model-learning techniques within the LLC control framework to: (1) improve the accuracy of partially specified system models, and (2) maintain the correctness of the model against slow behavioral changes to system components over time.
We can use supervised online learning to construct (approximate) dynamical models of the application components under control. These models capture the complex, nonlinear relationship between the response time achieved by a VM and the CPU and memory share provided to it under dynamically changing operating conditions. We propose to augment the online control framework with an online model-learning component, as shown in Fig. 8, aimed at developing autonomic systems that anticipate changes in operating conditions and react to gradual behavioral changes in the system components.
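As an illustration of such an online learner, the sketch below updates a simple linear response-time model by a gradient step on each new observation's prediction error. The paper proposes a recurrent neural network for this role; the linear form here is only an assumption to keep the example short:

```python
# Stand-in for the online model-learning component: a linear model of
# response time in (cpu_share, arrivals), updated by stochastic gradient
# descent on the prediction error each time a new observation arrives.
class OnlineResponseModel:
    def __init__(self, lr=1e-4):
        self.w = [0.0, 0.0, 0.0]   # weights for [1, cpu_share, arrivals]
        self.lr = lr               # learning rate (hypothetical tuning knob)

    def predict(self, cpu_share, arrivals):
        """Estimated response time x_hat at the given operating point."""
        return self.w[0] + self.w[1] * cpu_share + self.w[2] * arrivals

    def update(self, cpu_share, arrivals, observed_ms):
        """One gradient step on the squared prediction error e(k)."""
        e = observed_ms - self.predict(cpu_share, arrivals)
        for i, x in enumerate((1.0, cpu_share, arrivals)):
            self.w[i] += self.lr * e * x
        return e
```

Each call to `update` plays the role of one pass through the comparator-and-adapt loop: the observed state corrects the model so the controller's next forecast is closer to reality.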
Figure 7: The measured response times for the RUBBoS application as 1 MB, 5 MB, and 10 MB memory leaks accumulate per minute. The solid line indicates the average response time per one-minute increment. The VM is allocated a 6 GHz CPU share and 1 GB of memory.
Fig. 8 shows the FACT components integrated with the LLC framework of Fig. 3, using a recurrent neural network for the system model [17]. Re-training of the neural network utilizes datasets of length M, where the previous environment inputs and system states are retained from time (k − 1) through time (k − M) using blocks denoted as z⁻¹
Figure 8: The proposed fault-adaptive control technology framework utilizing a recurrent neural network.
for the time delay. A comparator, denoted as Σ, takes the actual state of the system, x, and compares it against the estimated state output by the neural network, denoted as x̂. The error, e(k), is then applied to a model adaptation algorithm. The process by which the model is updated depends upon the modeling approach chosen. For a simple lookup table, the adaptation process is a matter of updating the table entries; for abstract models, such as the recurrent neural network in Fig. 8, it involves re-training the neural network as new data becomes available. The cost, in terms of the time to perform the adaptation, depends upon the length M of the datasets. An accurate system model helps to avoid the excessive switching a controller will otherwise introduce into the system to correct for misguided control actions, and reduces the number of SLA violations. The key challenge is for the model adaptation unit to learn the new system behavior under dynamic conditions and to update the model used by the controller in a timely manner; as shown in Fig. 7, changes in application behavior can occur suddenly. To tackle this problem, it is useful to study several modeling techniques (non-linear regression, auto-regressive moving averages, etc.) for their estimation accuracy and convergence time.
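For the lookup-table case, the adaptation step can be sketched as exponential smoothing of a table entry toward each new measurement; the smoothing factor alpha is a hypothetical tuning knob, not a value from the paper:

```python
# Simplest adaptation path described above: when the comparator error
# e(k) = x(k) - x_hat(k) is observed, the lookup-table entry for the
# current operating point is nudged toward the new measurement.
ALPHA = 0.2  # weight given to the newest observation (illustrative)

def adapt_table(table, operating_point, observed_ms):
    """Exponentially smooth the table entry toward the observed response time."""
    predicted = table.get(operating_point, observed_ms)  # unseen point: seed with obs.
    error = observed_ms - predicted                      # comparator output e(k)
    table[operating_point] = predicted + ALPHA * error
    return error
```

A larger alpha tracks sudden behavior changes (such as the leak-induced degradation in Fig. 7) more quickly, at the cost of amplifying measurement noise; studying this accuracy/convergence trade-off is part of the analysis proposed above.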
5. CONCLUSION
The results presented in this paper support the position that, to obtain good control performance, the dynamical models of application components should be refined to match the time-varying behavior of the real system. We propose the development of FACT (Fault-Adaptive Control Technology), which continuously refines system or component models using feedback from the system: it observes the current system state and compares it to the estimated system state to update the underlying approximation structure (e.g., lookup tables, neural networks). We also propose an analysis of how the convergence speed of a given learning structure impacts control performance. The potential compensatory behavior of the controller can also be considered.
6. REFERENCES
[1] W. Xu, X. Zhu, S. Singhal, and Z. Wang, "Predictive control for dynamic resource allocation in enterprise data centers," in Proc. of the IEEE Network Ops. and Mgmt. Sym., Apr. 2006, pp. 115-26.
[2] Y. Diao, J. Hellerstein, A. Storm, M. Surendra, S. Lightstone, S. Parekh, and C. Garcia-Arellano, "Using MIMO linear control for load balancing in computing systems," in Proc. of the IEEE American Control Conf., Jun. 2004, pp. 2045-50.
[3] V. Sharma, A. Thomas, T. Abdelzaher, K. Skadron, and Z. Lu, "Power-aware QoS management in web servers," in Proc. of the IEEE Real-Time Systems Sym., Dec. 2003, pp. 63-72.
[4] J. L. Hellerstein, Y. Diao, S. Parekh, and D. M. Tilbury, Feedback Control of Computing Systems. Wiley-IEEE Press, 2004.
[5] B. Moerdyk, R. DeCarlo, D. Birdwell, M. Zefran, and J. Chiasson, "Hybrid optimal control for load balancing in a cluster of computer nodes," in Proc. of the IEEE Conf. on Control Applications, Oct. 2006, pp. 1713-18.
[6] C. Lu, X. Wang, and X. Koutsoukos, "Feedback utilization control in distributed real-time systems with end-to-end tasks," Perf. Eval. Review, vol. 16, pp. 550-61, Jun. 2005.
[7] D. Kusic, J. Kephart, J. Hanson, N. Kandasamy, and G. Jiang, "Power and performance management of virtualized computing environments via lookahead control," in Proc. of the IEEE Intl. Conf. on Autonomic Computing (ICAC), Jun. 2008.
[8] IBM, Trade6 Performance-Characterizing Application for WebSphere. http://www.ibm.com/developerworks/edu/dm-dwdm-0506lau.html, 2005.
[9] G. Khanna, K. Beaty, G. Kar, and A. Kochut, "Application performance management in virtualized server environments," in Proc. of the IEEE Network Ops. and Mgmt. Sym., Apr. 2006, pp. 373-381.
[10] M. Steinder, I. Whalley, D. Carrera, I. Gaweda, and D. Chess, "Server virtualization in autonomic management of heterogeneous workloads," in Proc. of the IEEE Sym. on Integrated Network Management, May 2007, pp. 139-148.
[11] S. Abdelwahed, N. Kandasamy, and S. Neema, "Online control for self-management in computing systems," in Proc. of the IEEE Real-Time and Embedded Technology and Applications Sym. (RTAS), 2004, pp. 368-376.
[12] D. Kusic and N. Kandasamy, "Risk-aware limited lookahead control for dynamic resource provisioning in enterprise computing systems," in Proc. of the IEEE Intl. Conf. on Autonomic Computing (ICAC), Jun. 2006.
[13] Dell, Dell DVD Store Database Test Suite. http://linux.dell.com/dvdstore.
[14] RUBBoS: Bulletin Board Benchmark. http://jmob.objectweb.org/rubbos.html.
[15] "Software trends for the 21st century," Computer Weekly, Jun. 6, 2006.
[16] D. Kusic and N. Kandasamy, "Approximation modeling for the online performance management of distributed computing systems," in Proc. of the IEEE Intl. Conf. on Autonomic Computing (ICAC), Jun. 2007.
[17] D. Mandic and J. Chambers, Recurrent Neural Networks for Prediction. John Wiley & Sons, Ltd., 2001.