2011 Sixth Annual ChinaGrid Conference
Dynamic Deployment and Management of Elastic Virtual Clusters
Xiaohui Wei, Haibin Wang, Hongliang Li, Lei Zou
College of Computer Science and Technology, Jilin University, Changchun, P.R. China
[email protected], {hbwang08, hongliang09, zoulei09}@mails.jlu.edu.cn

Abstract—Virtual clusters are the fundamental support of new-generation virtual High Performance Computing (HPC) systems. As a high-level virtual system component, a virtual cluster is built on virtual machines and virtual networks to provide a virtual execution environment for large-scale parallel applications. This paper proposes a novel solution for the dynamic deployment and customization of Elastic Virtual Clusters (EVC). In this work, we support customization of virtual clusters, including OS images, network topologies, and cluster software. Virtual clusters are automatically configured with isolated virtual networks (network address, DNS domain, NIS domain, etc.) and software environments. Two data transmission protocols are presented to accelerate image deployment. In addition, we propose a novel solution in which virtual machines (VMs) on the same host share an image without the aid of the VMM. Experimental results show that virtual clusters can be deployed and managed on distributed physical nodes efficiently and can therefore be used by upper-level resource consumers or applications.

Keywords: elastic virtual cluster; VJM; disk-image deployment; cloud computing

I. INTRODUCTION

With the development of grid virtualization and cloud computing, virtualization technology has attracted more and more attention. In the traditional distributed computing area, a computing task is represented as a job involving all the physical resources and software execution environments. Such a job is bound so tightly to its physical node and operating system that it poses many difficulties for resource managers and system administrators. For example, system heterogeneity must be accommodated, security attacks from malicious hackers must be filtered out, and node failures, which may be caused by a memory leak in some process, must be avoided. A virtual machine provides a higher-level abstraction for resources [1]. At this abstraction level, resource users and managers see a pool of uniform virtual machines, each of which can be configured independently. Virtualization technology provides a promising way to build new-generation high performance computing systems. Nowadays, moving computing platforms from traditional physical resources to virtualized resources has become a trend.

A lot of research into integrating virtualization technology into current distributed physical resource models has been conducted. Existing works focus either on the creation and management of virtual infrastructures or on the service models built on top of those infrastructures. Our work falls into the former category and mainly provides virtual clusters as a kind of virtual computing infrastructure for resource consumers and managers.

In this paper we propose and implement an elastic virtual cluster framework that supports:

• Dynamic creation of virtual machines on physical resources that are controlled by different VMMs.

• Physical resource selection that maps virtual machines to computing nodes.
• Aggregation of virtual machines that run in more than one administration domain to form virtual clusters.

• Efficient distribution, updating, caching, and sharing of virtual machine images, independent of the VMMs.

• Dynamic preparation of the software execution environment to facilitate application running.

• Resizing or merging compatible virtual clusters at runtime, which provides the functionality to cooperate with a dynamic workload balancer.

The rest of this paper is organized as follows. Section 2 describes the related work. Section 3 elaborates the key issues in the EVC design. Section 4 describes the EVC architecture in our implementation. Section 5 shows the experimental results of deploying virtual clusters dynamically using EVC. We summarize our work in Section 6, where promising future work is also discussed.

II. RELATED WORK

With the advantages of isolation, security, customization, and legacy support, virtual machines provide a good platform for Grid computing [1]. Much work has been done to integrate virtualization technology into the Grid, such as the Globus Virtual Workspace [12][14]. OpenNebula [17] is an open-source solution for virtual infrastructure management, but it has no high-level virtual clusters composed of virtual machines across several remote physical clusters. The Cluster on Demand (COD) project [15] multiplexes a physical cluster and enables a grid client to obtain a physical cluster partition based on credentials. Image propagation is a key issue, and [6] investigates this problem and proposes several optional solutions. Virtuoso [16], Violin [7], ViNe [8] and LimeVI [9] explore virtual networking issues, and LimeVI proposes a virtual network architecture to facilitate virtual cluster live migration. EVC mainly aims to provide a lightweight and efficient virtualized resource manager and makes the virtual cluster the resource allocation unit. The CARE [11] framework did similar work, but it aimed to create virtual machines as a supplement to physical resources, so its computing resources mix physical nodes and virtual nodes.

III. DESIGN

EVC provides its users a metadata-defined abstraction of both execution environments and an optional job description that can be instantiated dynamically on physical resources spanning one or more administrative domains. Users' jobs can be dispatched to the virtual cluster using well-defined protocols. Virtual clusters are destroyed, or merged with other compatible virtual clusters, after the jobs finish.

We split the process of deploying a virtual cluster into three major phases: 1) the vjob scheduling phase, in which a set of physical nodes is selected to host the virtual machines; 2) the image distribution phase, in which the image template is distributed to the selected computing nodes in an efficient and economic way; and 3) the virtual machine aggregation and virtual cluster auto-configuration phase, in which the virtual machines are configured to form a virtual cluster and the software environment is prepared.
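To make the pipeline concrete, the following is a minimal Python sketch of the three-phase flow described above. It is not the actual EVC code: every function name and data shape here is our own illustrative assumption.

```python
# Minimal sketch of the three deployment phases described above. All names
# and data shapes here are illustrative assumptions, not the real EVC API.

def schedule_vjobs(n_vms, nodes):
    """Phase 1: vjob scheduling - map one vjob per VM onto physical nodes
    (naive round-robin stands in for the strategies of Section III-B)."""
    return [nodes[i % len(nodes)] for i in range(n_vms)]

def distribute_image(template, placement):
    """Phase 2: image distribution - ship the image template once to every
    selected node (EVC uses BitTorrent plus caching, Section III-C)."""
    return {node: template for node in set(placement)}

def aggregate(placement):
    """Phase 3: aggregation and auto-configuration - boot each VM and join
    them into a single virtual cluster with its software environment."""
    return [{"vm": i, "node": n, "state": "configured"}
            for i, n in enumerate(placement)]

placement = schedule_vjobs(4, ["node-a", "node-b"])
distribute_image("base.img", placement)
print(aggregate(placement))
```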
A. Virtual Cluster Representation

Xen [2], VMware [3], VirtualBox [4], etc. are popular virtual machine monitors (hypervisors). Each defines its own way of virtualizing physical computing nodes, and many attributes, such as the virtual network structure, the disk image format, and virtual machine administrator privileges, are not compatible with one another. A universal virtualization mechanism must absorb all these differences and provide a unified interface for upper-level users.

To represent a virtual cluster as a basic resource unit, each virtual cluster is assigned a unique public network address by which any other service or end user can communicate with it. The virtual cluster, composed of virtual machines, is described in a universal XML format. Figure 1 is a sample XML file which describes a virtual cluster with 32 virtual machines. From that description, 32 vjobs will be created, each corresponding to one virtual machine. All the virtual machines have the same configuration, and they are transparent to the EVC users.

Figure 1. Virtual Cluster Description File

Each virtual machine in the same virtual cluster is assigned a private network address for communicating with the other virtual machines. The virtual machine that hosts the public network interface is called the headnode. The headnode is selected randomly and can be changed at runtime. All the virtual machine configurations are handled by the vjob agent, and that information is then recorded to a disk image. When a virtual machine is booted, an OS boot script is executed to read the configuration file and configure the virtual machine. Each virtual machine is described by internal metadata expressed in XML format, as Figure 2 illustrates.

Figure 2. Internal Virtual Machine Metadata
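Figures 1 and 2 are not reproduced here. As a rough illustration of the idea, the sketch below expands a cluster-level XML description into per-VM vjob records, mirroring the rule that a 32-machine description yields 32 vjobs. The element and attribute names are invented for the example and are not the actual EVC schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical description file; the real schema in Figure 1 is not shown
# in the text, so these element/attribute names are assumptions.
SAMPLE = """<vcluster name="demo" size="32">
  <baseimage>centos-base.img</baseimage>
  <network dns="evc.example.org" nis="evc"/>
</vcluster>"""

def expand_to_vjobs(xml_text):
    """One vjob per virtual machine, all sharing the same configuration."""
    root = ET.fromstring(xml_text)
    return [{"cluster": root.get("name"),
             "vm_id": i,
             "image": root.findtext("baseimage")}
            for i in range(int(root.get("size")))]

print(len(expand_to_vjobs(SAMPLE)))  # -> 32
```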
B. Virtual Job Scheduling

The co-allocation problem must be considered when instantiating a virtual cluster using combined resources from multiple administrative domains. We borrow ideas and approaches from the Virtual Job Model (VJM) [5], which dispatches virtual jobs (vjobs) to co-reserve physical resources for the virtual cluster. Its resource selection strategies and deadlock detection techniques are reused in EVC to cope with resource competition. A vjob in EVC refers to the process of creating and configuring a virtual machine on a remote computing node. Vjob scheduling has an important impact on the overall performance of virtual cluster deployment and execution; a poorly performing cluster would in turn degrade the service delivered to end users. In our preliminary work we consider two kinds of user jobs, namely computation-intensive and communication-intensive, and define two resource selection strategies in correspondence. For communication-intensive jobs, we leverage the VJM resource selection strategy to select a subset of physical nodes that spans a minimal number of administrative domains. For computation-intensive jobs, we use a best-effort strategy to select computing nodes.

The best-effort strategy sorts the physical resources and then matches the vjobs to physical resources using a greedy algorithm. The resource properties we consider are the dynamic workload and the image template cache. The sorting algorithm therefore prefers the following order: 1) physical resources that have an image cache and a low workload; 2) physical resources that have no cache but a low workload; 3) physical resources that have a cache but a high workload; and 4) physical resources that have no image cache and a high workload. The workload of a physical resource is measured as the ratio of the number of running virtual machines to the maximum number of virtual machines that can run simultaneously on that resource.

Physical resources' statuses are required when scheduling. They are collected and filtered by the information service, for example, Globus MDS.
In our work we extended the MDS IP (Information Provider) component to collect the extra resource properties. The two simple strategies are implemented in the ResourceSelector module for demonstration, but more sophisticated strategies can easily be plugged in.
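The best-effort strategy can be summarized in a few lines. The sketch below implements the four-tier preference order and the workload metric as described above; the "high workload" cutoff and the record layout are our assumptions, since the paper does not fix them.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    running_vms: int
    max_vms: int        # max VMs that can run simultaneously
    has_cache: bool     # image template already cached here?

    @property
    def workload(self):  # paper's metric: running VMs / max simultaneous VMs
        return self.running_vms / self.max_vms

def best_effort_select(resources, n_vjobs, high_load=0.5):
    """Greedy matching after sorting into the four preference tiers:
    (cache, low load) < (no cache, low load) < (cache, high load)
    < (no cache, high load). high_load is an assumed threshold."""
    tier = {(True, False): 0, (False, False): 1,
            (True, True): 2, (False, True): 3}
    ordered = sorted(resources,
                     key=lambda r: tier[(r.has_cache, r.workload >= high_load)])
    placement = []
    for r in ordered:
        while r.running_vms < r.max_vms and len(placement) < n_vjobs:
            placement.append(r.name)
            r.running_vms += 1
    return placement

pool = [Resource("n1", 3, 4, True), Resource("n2", 0, 4, False),
        Resource("n3", 0, 4, True)]
print(best_effort_select(pool, 5))  # prefers n3 (cache, idle), then n2
```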
C. Image Preparation

Preparing disk images for each virtual machine is the most time-consuming work during deployment. This work must be done efficiently and economically, that is, using as little time as possible and imposing the least overhead on the system. In our work, a virtual machine is composed of four disk images, named the base image, swap image, data image, and status image. This structure speeds up image distribution because we can prepare the images concurrently. Another benefit of this structure is that it occupies less disk space on the ImageServer, because some images, for example the swap image, need not be allocated on the ImageServer. The base image contains permanent code and data such as the operating system, development libraries, and the software stack. It always resides on the ImageServer. It is better to keep the base image pure (all space occupied by valid data) and to make its size as small as possible. The swap image acts as the swap partition of the virtual machine and is created by the vjob agent on the computing node. The data image stores the user data and resides on the DataServer. The status image stores temporary status files during virtual machine runtime.

The size of the base image in our experiment is about 2 GB. The BitTorrent protocol [13] is used to distribute the base image from the ImageServer to the computing nodes. This solution not only reduces the distribution time but also reduces the workload on the ImageServer.

In addition to using BitTorrent as the distribution tool, we propose an image caching and sharing mechanism to further cut down the image preparation time. Figure 3 demonstrates our mechanism. The base image is kept on the physical nodes after the virtual cluster is destroyed. The next time the same image template is selected to instantiate a virtual cluster, this cached base image will be used, thereby avoiding a second transmission.

However, it is not enough to use only the image caching technique. A physical node may host multiple virtual machines of one virtual cluster, so we would have to copy the cached image for every virtual machine. This is time-consuming and a waste of disk space. To solve this problem, we combine UnionFS [10] and bind mounts to form a copy-on-write base image. On one physical node, only one copy of the base image is cached, and it is shared by all the virtual machines.

Figure 3. Image caching based on administrative domains and sharing between virtual clusters. Each virtual cluster has its own data image.
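A minimal sketch of the copy-on-write arrangement follows, assuming a unionfs module with the classic dirs= mount option: the cached base image is bind-mounted read-only once per node, and each VM gets a private writable branch unioned on top. The commands are only printed, and the paths and mount options are illustrative assumptions rather than EVC's actual configuration.

```python
def cow_mount_commands(base_cache, vm_id):
    """One shared read-only base per physical node plus one small writable
    branch per VM, unioned into the root the VM actually boots from."""
    rw = f"/var/evc/vm{vm_id}/rw"      # per-VM write branch
    root = f"/var/evc/vm{vm_id}/root"  # merged view seen by the VM
    return [
        # share the cached base image read-only across all co-hosted VMs
        f"mount --bind -o ro {base_cache} /var/evc/base",
        # overlay the VM's private writable branch on the shared base
        f"mount -t unionfs -o dirs={rw}=rw:/var/evc/base=ro unionfs {root}",
    ]

for cmd in cow_mount_commands("/var/evc/cache/centos-base", vm_id=7):
    print(cmd)
```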
D. Virtual Machine Aggregation and Customization

The virtual machine aggregation process consists of a succession of service configurations, including the virtual network service, NFS, SSH, and user/group accounts. All the information required for aggregation is carried by the vjob, hence there is no need to interact with the EVC client. The configuration parameters are written to a disk file which resides in the status image. When a virtual machine boots, a modified OS startup script is executed, the status image is mounted, and the configuration parameters are parsed. A special program we developed and pre-installed in the base image receives the parameters and coordinates the configuration with the other virtual machines.

Currently EVC supports software customization by allowing users to specify software to be installed after the virtual machine boots. The software from which users can select is confined to the operating system releases. The software customization is done using a package manager, such as the 'yum' command on Red Hat systems, while the PacketServer acts as the online software repository.
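The boot-time step can be pictured as follows: the vjob agent writes a small parameter file into the status image, and the pre-installed helper parses it after the modified startup script mounts that image. The file format and key names below are illustrative assumptions, not the actual EVC format.

```python
import configparser

# Hypothetical contents of the parameter file inside the status image.
SAMPLE_STATUS_CONF = """
[network]
private_ip = 10.0.0.5
hostname = vc01-node05
is_headnode = false
"""

def parse_boot_config(text):
    """Recover the per-VM settings the vjob agent recorded; the helper then
    applies them and coordinates NFS/SSH/account setup with the other VMs."""
    conf = configparser.ConfigParser()
    conf.read_string(text)
    net = conf["network"]
    return {"ip": net["private_ip"],
            "hostname": net["hostname"],
            "headnode": net.getboolean("is_headnode")}

print(parse_boot_config(SAMPLE_STATUS_CONF))
```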
IV. IMPLEMENTATION

EVC builds its virtual clusters dynamically on the distributed physical nodes and provides unified API and command-line interfaces for upper-level services or applications. Figure 4 illustrates the EVC architecture.

VirtualClusterAllocator is the main interface exposed by EVC. Its main function is to allocate or free virtual clusters for resource users, just like the malloc/free library routines in the C programming language. When allocating a virtual cluster, it can reuse an idle virtual cluster which is compatible with the user's resource specification, or create a new virtual cluster. The allocation request is transformed into virtual cluster metadata and passed to VirtualClusterFactory. The JobExecutor is a thin layer of encapsulation on top of VirtualClusterAllocator. It receives a grid job request specified in the Globus RSL language and runs the job in the virtual cluster.

VirtualClusterFactory is the core component of EVC. It carries out all the tasks required to create a virtual cluster. When receiving metadata from VirtualClusterAllocator, this module parses the metadata and verifies the availability of physical resources, network addresses, and image templates. If all the required resources can be acquired, this module invokes the ResourceSelector module to select the most appropriate subset of physical resources for the vjobs. Once all the vjobs are scheduled, it dispatches them to the local resource managers through a protocol such as GRAM. The newly created virtual cluster is added to a common pool, which is maintained by the VirtualClusterPool module.

Figure 4. EVC Architecture
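The malloc/free analogy suggests an interface along the following lines. This is a toy sketch with a crude equality test standing in for "compatible"; the real module delegates creation to VirtualClusterFactory and pooling to VirtualClusterPool.

```python
class VirtualClusterAllocator:
    """Toy sketch of the allocate/free interface with pool reuse."""

    def __init__(self, create_cluster):
        self.create_cluster = create_cluster  # stands in for VirtualClusterFactory
        self.idle = []                        # stands in for VirtualClusterPool

    def allocate(self, spec):
        # Reuse an idle cluster compatible with the user's specification...
        for vc in self.idle:
            if vc["spec"] == spec:            # crude compatibility check
                self.idle.remove(vc)
                return vc
        # ...or transform the request into metadata and build a new one.
        return self.create_cluster({"spec": spec})

    def free(self, vc):
        # Freed clusters stay pooled until destroyed or merged.
        self.idle.append(vc)

alloc = VirtualClusterAllocator(lambda meta: meta)
vc = alloc.allocate({"size": 4})
alloc.free(vc)
assert alloc.allocate({"size": 4}) is vc      # second request reuses the pool
```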
VirtualClusterPool consists of four sub-modules: 1) a container that holds all the virtual clusters which exist and have not been destroyed in the current system; 2) the VPoolCAManager module, which allocates a certificate to each newly created virtual cluster and refreshes the certificates at runtime; 3) the VirtualClusterMessageCenter module, which is responsible for the message exchange between the vjob agents and the local EVC client, such as
virtual cluster status queries and virtual cluster exit notifications; and 4) the VirtualClusterMonitor module, which monitors the virtual cluster and virtual machine statuses and checks the integrity of the virtual clusters.
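A compact sketch of how the four sub-modules might fit together is given below. The module names follow the text, while every internal detail is an assumption made for illustration.

```python
class VirtualClusterPool:
    def __init__(self):
        self.clusters = {}   # 1) container of live virtual clusters
        self.certs = {}      # 2) VPoolCAManager: per-cluster certificates
        self.inbox = []      # 3) VirtualClusterMessageCenter queue
        self.health = {}     # 4) VirtualClusterMonitor status map

    def add(self, name):
        self.clusters[name] = "running"
        self.certs[name] = f"cert:{name}"  # allocated now, refreshed at runtime

    def on_message(self, name, kind):
        """Messages from vjob agents, e.g. status queries and exit notices."""
        self.inbox.append((name, kind))
        if kind == "exit":
            self.clusters.pop(name, None)

    def monitor(self):
        """Integrity check: every pooled cluster must still report in."""
        self.health = {n: "ok" for n in self.clusters}
        return self.health

pool = VirtualClusterPool()
pool.add("vc01")
pool.on_message("vc01", "status-query")
print(pool.monitor())  # {'vc01': 'ok'}
```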
VirtualNetworkManager maintains and updates all the resources for the virtual network. It assigns public and private network addresses, allocates unique domain names, and updates DNS map entries for newly created virtual clusters. ImageInfoManager stores and updates the metadata of the virtual machine image templates. ImageCacheManager tracks the location information about the cached virtual machine images. The ImageTransferService module distributes the image template from the ImageServer to the computing nodes.
V. EXPERIMENTAL RESULTS

We conducted experiments creating virtual clusters of different sizes in a grid environment to evaluate the performance of EVC. The physical resources in our experiment consist of 7 physical nodes grouped into three clusters, as Table I shows.

TABLE I. EXPERIMENT ENVIRONMENT

Cluster     nodes   slots   VMM
C1 (SGE)    3       12      Xen 3.3.0
C2 (SGE)    3       8       Xen 3.3.0
C3 (Fork)   1       -       VMware Server 2.0
The two SGE clusters consist of computing nodes of the Dell OptiPlex 380 platform. Every node has a 2.93 GHz dual-core CPU and 2 GB of memory. NFS is used to access the cluster storage. The fork cluster node has a 2.66 GHz 8-core CPU and 4 GB of memory. All three clusters are managed by the Globus middleware. The network bandwidth in our experiment is 100 Mbit/s. In our experiment the image templates are pre-staged into the cluster storage.

We divide the deployment into three phases: 1) the vjob creation phase, in which the virtual cluster metadata is parsed, vjobs are created and scheduled, and network addresses are allocated; 2) the resource co-allocation and image preparation phase, in which all the vjobs are scheduled by SGE; and 3) the virtual cluster creation phase, in which the virtual machines are booted, configured, and aggregated.

Figures 5 and 6 compare the ratio of the time used by each of the three phases to the time taken by the whole deployment process under the two different vjob scheduling strategies.

Figure 5. Virtual cluster deployment using the best-effort resource selection algorithm (phase-time ratios for virtual cluster sizes 3, 5, 10, 15, 20, and 25).

Figure 6. Virtual cluster deployment using the VJM resource selection algorithm (phase-time ratios for virtual cluster sizes 3, 5, 10, 15, 20, and 25).

Figure 7 compares the total time taken to deploy virtual clusters using the two different vjob scheduling algorithms. From the graph we can see that the VJM vjob scheduling strategy costs less time than the best-effort scheduling strategy. The reason is that the VJM strategy can
save much more inter-cluster communication time than the best-effort strategy.

Figure 7. Virtual cluster deployment timing results (total deployment time in seconds for virtual cluster sizes 3, 5, 10, 15, 20, and 25, best-effort vs. VJM).

VI. CONCLUSION AND FUTURE WORK
In this paper, we propose an architecture that deploys virtual clusters on distributed physical resources. It can be used by upper-level resource consumers or other services as a virtualized resource manager. We use the vjob concept to accommodate many kinds of popular virtual machine monitors/hypervisors. Two simple vjob scheduling strategies are developed for demonstration, but new strategies can easily be introduced. We also investigate the image propagation problem and use the BitTorrent protocol to distribute images. An image caching and sharing mechanism is also proposed to cut down the virtual cluster deployment time.

From the experiments we can see that vjob scheduling has an important impact on performance, so more sophisticated scheduling strategies need to be developed. Virtual machine live migration is also a very useful technique; we can apply it to virtual clusters so that workloads can be adjusted dynamically.

ACKNOWLEDGMENT

The authors would like to acknowledge support from the China NSF (No. 60703024) and the Program for New Century Excellent Talents in University (NCET-09-0428) of the Ministry of Education of China.

REFERENCES

[1] R. J. Figueiredo, P. A. Dinda, and J. A. B. Fortes, "A case for grid computing on virtual machines," in Proceedings of the 23rd International Conference on Distributed Computing Systems, 2003.
[2] P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, and A. Warfield, "Xen and the art of virtualization," in ACM Symposium on Operating Systems Principles (SOSP), 2003.
[3] VMware, http://www.vmware.com/
[4] VirtualBox, http://www.virtualbox.org/
[5] X. Wei, Z. Ding, S. Xing, Y. Yuan, and W. Li, "VJM: a novel grid resource co-allocation model for parallel jobs," in 2nd International Conference on Future Generation Communication and Networking Symposia, 2008.
[6] M. Schmidt, N. Fallenbeck, M. Smith, and B. Freisleben, "Efficient distribution of virtual machines for cloud computing," in Euromicro Conference on Parallel, Distributed, and Network-Based Processing, 2010, pp. 567-574.
[7] X. Jiang and D. Xu, "VIOLIN: virtual internetworking on overlay infrastructure," Department of Computer Sciences Technical Report CSD TR 03-027, Purdue University, July 2003.
[8] M. Tsugawa and J. A. B. Fortes, "A virtual network (ViNe) architecture for grid computing," in Parallel and Distributed Processing Symposium, 2006.
[9] X. Wei, H. Li, L. Hu, Q. Guo, and N. Jiang, "LimeVI: extend live-migration-enabled virtual infrastructure across multi-LAN network," in 2010 Fifth International Conference on Frontier of Computer Science and Technology (FCST), Changchun, China, Aug. 2010, pp. 22-29.
[10] D. Quigley, J. Sipek, C. P. Wright, and E. Zadok, "UnionFS: user- and community-oriented development of a unification filesystem," in Proceedings of the 2006 Linux Symposium, Ottawa, Canada, July 2006, vol. 2, pp. 349-362.
[11] T. S. Somasundaram, B. R. Amarnath, R. Kumar, P. Balakrishnan, K. Rajendar, R. Rajiv, G. Kannan, G. R. Britto, E. Mahendran, and B. Madusudhanan, "CARE resource broker: a framework for scheduling and supporting virtual resource management," Future Generation Computer Systems, vol. 26, no. 3, pp. 337-347, March 2010.
[12] I. Foster, T. Freeman, K. Keahey, D. Scheftner, B. Sotomayor, and X. Zhang, "Virtual clusters for grid communities," in CCGRID '06: Proceedings of the Sixth IEEE International Symposium on Cluster Computing and the Grid, pp. 513-520, IEEE Computer Society, 2006.
[13] BitTorrent, http://www.bittorrent.com/, June 2008.
[14] K. Keahey, I. Foster, T. Freeman, X. Zhang, and D. Galron, "Virtual workspaces in the Grid," in 11th International Euro-Par Conference, Lisbon, Portugal, September 2005.
[15] J. S. Chase, D. E. Irwin, L. E. Grit, J. D. Moore, and S. E. Sprenkle, "Dynamic virtual clusters in a grid site manager," in HPDC '03: Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing, June 2003.
[16] A. Sundararaj and P. Dinda, "Towards virtual networks for virtual machine grid computing," in Proceedings of the Third USENIX Virtual Machine Research and Technology Symposium (VM '04), May 2004.
[17] OpenNebula, http://www.opennebula.org
[18] H. Li, X. Wei, and H. Yao, "CLIMP: concurrent live migration protocol for elastic virtual clusters," ICIC Express Letters, vol. 5, no. 9(B), pp. 3429-3436, Sep. 2011.