Multi-Tiered On-Demand Resource Scheduling for VM-Based Data Center
Ying Song1,2
Hui Wang1
Yaqiong Li1,2
Binquan Feng1,2
Yuzhong Sun1
Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing; 2 Graduate University of Chinese Academy of Sciences, Beijing, China; {songying, wanghui, liyaq04, fengbinquan}@ncic.ac.cn,
[email protected]
Abstract Virtualization is increasingly used for server consolidation in enterprise data centers. However, on-demand resource allocation among the concurrently hosted services in such a virtualized environment remains a challenge. To optimize resource allocation among services in a data center, this paper proposes a multi-tiered resource scheduling scheme that automatically provides on-demand capacities to the hosted services via resource flowing among VMs. We model resource flowing using optimization theory. Based on this model, we present a global resource flowing algorithm for the multi-tiered resource scheduling scheme. When resource competition arises, this algorithm preferentially ensures the performance of critical services by degrading that of others to some extent. Using our RAINBOW prototype, we evaluate the multi-tiered resource scheduling scheme: performance improvements for the most critical services are 9%~16%, up to 75% of the maximum improvement margin, while the performance degradation of other services is at most 2%, and resource utilization improves by 1%~5% compared with RAINBOW without resource flowing. Compared with an existing scheme, our work yields 9% less improvement for critical services while introducing 39% less degradation to low-priority services.
1. Introduction
Virtualization offers opportunities not only for better isolation and manageability but also for on-demand, finer-grained resource provisioning. Thus, virtualization technology, such as the virtual machine (VM), is widely used in data centers for server consolidation. Improving resource utilization in such a shared, multi-service computing environment is a key means of saving power in the data center [7]. However, enterprise data centers today are often underutilized, with resources sitting idle even when the workloads of some hosted services are high. On one hand, barriers imposed by the computer architecture and the operating system restrict improvements in resource utilization; even multi-core processors cannot break the upper bounds on speedup [12]. On the other hand, the lack of an efficient, on-demand, fine-grained resource scheduler also limits improvements in resource utilization.
Based on the resource reallocation mechanisms provided by VMMs (Virtual Machine Monitors, e.g. Xen [15] and VMware [20]), many researchers [9][17] focus on improving resource utilization while guaranteeing the quality of the hosted services via on-demand local resource scheduling models or algorithms within a physical server. However, most of these approaches do not achieve a good tradeoff between resource utilization and QoS. For example, Padala’s controller [17] improves resource utilization and the performance of some services by greatly reducing the performance of others. How to improve resource utilization while guaranteeing QoS remains a challenge in a VM-based data center. Our previous paper [23] also proposes a set of local resource scheduling algorithms, which improve resource utilization and the performance of some critical services with small performance degradation of others. Yet local optimization does not always lead to global optimization [11]. It is necessary to provide global resource scheduling in a shared computing environment.
In this study, we design a multi-tiered resource scheduling scheme to ensure QoS and improve resource utilization in our service computing framework - RAINBOW. We use resource flowing to denote the process in which resources released by some VMs/services are allocated to others. Based on our resource flowing model, we design a global resource flowing algorithm as a complement to the local resource flowing algorithms (proposed in [23]) to optimize resource allocation in a VM-based data center. When resource competition arises, this algorithm preferentially ensures the performance of critical services by degrading that of others to some extent. We consider CPU and memory flowing; the approach could be extended to other resources such as I/O. We implement a Xen-based RAINBOW prototype to evaluate our multi-tiered resource scheduling scheme on a workload scenario reflecting the resource demands of services in a real enterprise environment. The experimental results show that RAINBOW with our resource flowing algorithms improves performance by 9%~16% for critical services while introducing up to 2% performance degradation to others, and improves resource utilization by 1%~5% compared with RAINBOW without resource flowing. The performance improvements for the most critical service are up to 75% of the maximum improvement margin. Compared with Padala’s work [17], our work yields 9% less improvement for critical services (28% improvement with [17] versus 19% with our work), while introducing 39% less degradation to others (41% degradation with [17] versus 2% with our work). The results indicate that our work improves resource utilization and meets the QoS goals of services.
This paper makes the following main contributions. 1) We present a multi-tiered resource scheduling scheme for VM-based data centers. 2) We model resource flowing using optimization theory and solve the model with the Simplex Method. 3) Based on the model, we present a global resource flowing algorithm in our multi-tiered resource scheduling scheme to optimize resource allocation among services.
The rest of this paper is organized as follows. Section 2 introduces the motivation. Section 3 discusses related work. Section 4 introduces RAINBOW and our multi-tiered resource scheduling scheme. Section 5 models resource flowing. Section 6 proposes a global resource flowing algorithm. Section 7 discusses the implementation of our prototype and the experimental results. We conclude in Section 8.
2. Motivation
In a VM-based data center, services run over various capacities (i.e. computing, storage, and communication capacities, which are provided by physical components such as CPU, memory and network bandwidth), without regard to the locations and architecture of the physical components (e.g. multiple cores and heterogeneous components). This abstraction is provided by virtualization; for example, most cloud computing infrastructures (e.g. Amazon EC2) use virtualization to provide isolated capacities to the hosted services. Dynamic load changes, together with the different QoS requirements of services over their lifetimes, give rise to diverse, time-varying capacity demands. It is therefore necessary to provide on-demand capacities to services by optimizing capacity flowing among them. Such on-demand capacity flowing is implemented by fine-grained resource (i.e. CPU and memory) flowing among VMs.
Figure.1 The evolution of resource management
VM-based resource management differs from previous work in granularity (from nodes to components) and in dimensions (from one to two), as illustrated in figure 1. Traditional resource management corresponds to the scheduler in figure 1(a), which dispatches jobs/requests onto a set of exclusively used servers. In such a case, the resources of some servers may be severely wasted even when the job/request queues on these servers are full, because of data dependencies and resource competition among the jobs/requests. In VM-based resource management (figure 1(b)), scheduler#1, which corresponds to traditional resource management, dispatches jobs/requests onto a set of VMs. A scheduler in a new dimension (scheduler#2) is added to optimize the usage of fine-grained resources via resource flowing among VMs when resource utilization is imbalanced across these VMs, regardless of whether their job/request queues are full. Most contemporary VMMs (e.g. Xen and VMware) provide partial technical support for resource flowing, but no strategy. They need a better second-dimension resource scheduler to optimize resource usage and improve the quality of the hosted services. Optimizing resource flowing among VMs is a key technology in such a platform.
3. Related Work
A large body of work addresses managing data centers that provide on-demand resources. Several studies provide on-demand resources at the granularity of physical/virtual servers. Oceano [10] dynamically allocates resources for an e-business computing utility, sharing at the granularity of whole servers. SoftUDC [13] proposes a software-based utility data center that adopts on-the-fly VM migration, which is also implemented by VMware’s VMotion [14], to provide automatic load balancing. In [21], a virtual-appliance-based autonomic resource provisioning framework is presented, which dynamically allocates resources to applications by adding/removing VMs on physical servers. All these studies contrast with our scheme, which controls resource flowing at the granularity of resource components, e.g. CPU time slots.
There is a growing body of work on providing on-demand fine-grained resources in a VM-based data center [2][5][6][9][17][20][23]. In [17], dynamic CPU allocation is based on VM utilization and application-level QoS metrics. However, it focuses only on CPU reallocation and uses a fixed, experience-based reallocation threshold, whereas our scheme handles both CPU and memory flowing and automatically adjusts resource overload thresholds according to the time-varying workloads of the hosted services. In [9], a two-level resource management system is proposed, with local controllers at the VM level and a global controller at the server level. These local and global controllers together correspond only to our local-level scheduler. Our previous work [23] focuses on local CPU and memory flowing to achieve better QoS and higher resource utilization, using a fixed resource overload threshold set according to our experience. In [2] and [6], the authors address dynamic resource allocation in multi-tier virtualized service hosting platforms. All the above works focus only on dynamic resource allocation among VMs within a server, ignoring resource optimization among services across the entire system. In this paper, we address not only local scheduling within a single server but also global scheduling to optimize resource allocation among the hosted services.
In [22], the authors optimize global resource allocation for multi-tier services. However, this optimization is centrally controlled, which raises problems of complexity (collecting and computing the resource allocation for each VM hosted on every VMM), availability (a single point of failure) and timeliness (the execution interval (22min) cannot be made small because of scalability limits). The hosted web-based services, however, are interactive, with sudden resource demands. Such slow response in fine-grained resource allocation cannot yield optimized allocation under realistic web-based workloads. In contrast, our work addresses global resource allocation using a multi-tiered resource scheduling scheme. The local scheduler, which has a simple function, runs at small intervals (1s) in each server and can respond quickly to sudden resource demands from the hosted services. The global scheduler, which also has a simple function, runs at 1min/5min intervals as a complement to the local scheduler. All these schedulers work independently, so the failure of any scheduler (even the global scheduler) does not lead to the failure of resource allocation in the system. To the best of our knowledge, no other study has proposed the same multi-tiered resource scheduling scheme and algorithms as ours to optimize fine-grained resource allocation in a VM-based data center.
4. Multi-Tiered Resource Scheduling Scheme
Based on the service computing framework RAINBOW proposed in [23], we present a multi-tiered resource scheduling scheme to optimize resource allocation in a VM-based data center.
4.1 RAINBOW Statement
In RAINBOW (illustrated in figure 2), a set of VMs serving a particular service is called a group. The key principle is that VMs belonging to a single group are spread across multiple servers, while each server hosts VMs belonging to different groups. This principle aims to reduce competition for resources among the services hosted on a server.
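The placement principle above can be illustrated with a minimal sketch. The function name `place_groups`, the round-robin policy, and the service and server names are assumptions made for illustration only; the paper does not state how RAINBOW chooses the server for each VM.

```python
from collections import defaultdict


def place_groups(groups, servers):
    """Spread each group's VMs over distinct servers (hypothetical round-robin policy).

    groups:  dict mapping service name -> number of VMs in its group
    servers: list of server names
    Returns a dict mapping server name -> list of (service, vm_index) tuples.
    """
    placement = defaultdict(list)
    for offset, (service, vm_count) in enumerate(groups.items()):
        if vm_count > len(servers):
            raise ValueError(f"{service}: more VMs than servers, cannot spread")
        for vm_index in range(vm_count):
            # Start each group at a different server so load is spread out;
            # consecutive indices guarantee one VM per server per group.
            server = servers[(offset + vm_index) % len(servers)]
            placement[server].append((service, vm_index))
    return dict(placement)


if __name__ == "__main__":
    layout = place_groups({"web": 3, "db": 3, "batch": 2}, ["s1", "s2", "s3"])
    for server, vms in sorted(layout.items()):
        print(server, vms)
```

In this sketch no server ever holds two VMs of the same group, so VMs that share a server always belong to different services, which is the property the principle relies on.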
Figure.2 Service computing framework - RAINBOW
RAINBOW is divided into three layers: the service layer, the virtual resource layer, and the physical resource layer. In the service layer, each service is assigned a priority denoting how critical it is, and dispatches workloads to the VMs in its group according to its own scheduling algorithms. To provide capacities to the hosted services on demand, we control virtual resource flowing among VMs in the virtual resource layer. Such virtual resource flowing is implemented in the physical resource layer by a set of physical resource flowing (‘resource flowing’ for short) algorithms that take the priorities of the hosted services into account. To control the physical resource flowing, we propose a multi-tiered resource scheduling scheme.
4.2 Multi-Tiered Resource Scheduling Scheme
Figure.3 Three tiered logical flowing
Logically, there are three correlated capacity flowing tiers (illustrated in figure 3) in RAINBOW: capacity flowing 1) among services (Tier#1); 2) within a service group (Tier#2); 3) among VMs (Tier#3). These logical tiers correspond to two implementation tiers: resource flowing 1) among VMs within a server (local) and 2) among VMs residing in different servers (global). However, there is no technological support for such global resource flowing, and local optimization does not always lead to global optimization [11]. In RAINBOW, VMs devoted to the same service are located on different physical servers. The local resource flowing in each server leads to independent capacity allocation to these VMs and cannot optimize capacity allocation among the concurrent services. Thus, we provide a multi-tiered resource scheduling scheme, illustrated in figure 4, to optimize capacity allocation not only among VMs within a server but also among services. This scheme has three tiers of correlated schedulers: the application-level scheduler, the local-level scheduler and the global-level scheduler.
• The application-level scheduler is implemented by the service software to dispatch requests/jobs onto the VMs hosting this service. How to design an application-level scheduler is outside the scope of this paper.
• The local-level scheduler controls resource flowing among VMs within a server, taking into account the priority and resource overload threshold (called activity in this paper) of each service and the resource utilization of each VM; it is introduced in detail in [23].
• The global-level scheduler controls resource flowing among services by adjusting the activity of each service. In RAINBOW, multiple copies of each service, encapsulated in VMs, are spread across multiple servers, so the service can use resources on all of these servers. Adjusting the activities of services affects the resource allocation among the VMs hosting these services on each physical server, which results in resource flowing among services.
Figure.4 Three-tiered scheduling scheme
Among the above schedulers, the key task is to design the resource flowing algorithms. A resource flowing algorithm should answer four questions. 1) Which resources will flow? 2) When will such resources flow? 3) Which VMs will be the source and target of the flow? 4) How many resources will flow? This paper focuses on the global resource flowing algorithm. To answer these four questions for the global resource flowing algorithm, we first model the resource flowing.
5. The Resource Flowing Model
We consider the global resource flowing among services and model it by optimization theory. This model is a general one which can be used separately for CPU, memory, or other resources. Based on this model, in section 6 we present a global resource flowing algorithm which provides on-demand resources to the hosted services. First we introduce the following notations and concepts:
• R - The total CPU or memory or other resources, which are available to all services, such as 16GB memory.
• K - The number of services hosted in the data center.
• Rout - The resources allocated to all the VMs.
• Ci-min - The minimum threshold of resources allocated to VMs hosting service i, which is used to avoid huge interaction among the services when competition for resources arises. Ci-min is set by experience in our experiments, and will be justified in the near future.
• Rit - Resources allocated to VMs hosting service i at time t, which must obey the rules: $R \geq \sum_{i=1}^{K} R_{it}$ and $R_{it} \geq C_{i\text{-}min} > 0$.
• Dit - Resources demanded by VMs hosting service i at time t, which are proportional to the arrival rates of requests.
• Pi - The priority of service i. It indicates how critical the requirement for QoS of this service is. If i