Future Generation Computer Systems 29 (2013) 323–329


Automatic software deployment using user-level virtualization for cloud computing

Youhui Zhang*, Yanhua Li, Weimin Zheng

Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

* Corresponding author. Tel.: +86 10 62783505x3. E-mail address: [email protected] (Y. Zhang).

Article history: Received 1 November 2010; Received in revised form 26 March 2011; Accepted 5 August 2011; Available online 5 September 2011.

Keywords: Cloud computing; User-level virtualization; Virtual machine; Deployment

Abstract: Cloud computing offers a flexible and relatively cheap solution for deploying IT infrastructure in an elastic way. An emerging cloud service allows customers to order virtual machines delivered virtually in the cloud; in most cases, besides the virtual hardware and system software, application software must be deployed in a similar way to provide a fully-functional work environment. Most existing systems use virtual appliances to provide this function, which couples application software closely with virtual machine (VM) images. This paper proposes a new method based on user-level virtualization technology that decouples application software from the VM to improve deployment flexibility. User-level virtualization isolates applications from the OS (and thus from the lower-level VM), so a user can choose which software to use after setting the virtual machine's configuration. Moreover, the chosen software is not pre-installed (or pre-stored) in the VM image; instead, it is streamed from an application depository on demand when the user launches it in a running VM, saving storage. During the whole process, no software installation is needed. Further, the enormous body of existing desktop software can be converted into such on-demand versions without any modification of source code. We present the whole framework, including application preparation, the runtime system design, the detailed deployment and usage workflow, and some optimizations. Finally, test results show that this solution is efficient in performance and storage.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Infrastructure cloud service providers (e.g., [1,2]) deliver virtual hardware and software in their datacenters, based on demand from customers. Customers thus avoid capital expenditure by renting usage from the provider, consuming resources as a service. Usually, besides virtual hardware and system software, application software must be deployed in a similar way, so that customers can conveniently get a fully-functional work environment with the required applications. Most existing solutions allow cloud customers to order Virtual Appliances (VAs) [2–4] to be delivered virtually on the cloud. For example, VA marketplaces [2,5,6] provide many categories of appliances, each a pre-built software solution comprised of one or more packaged Virtual Machines. VA-based methods can remarkably reduce the time and expense associated with application deployment.

However, because a VA couples the application software and VMs closely, it also has some drawbacks:

(1) Lack of flexibility. For example, a customer needs software A and B to work together in a virtual machine, while the provider only has two separate VAs containing A and B respectively. Then, the provider has to create a new VM template to combine A and B. In theory, such combinations are countless.




(2) Inefficiency of storage. Each VA comprises at least one VM image, which means the OS has to be included in the image. Therefore, the storage overhead is larger, although some technologies (e.g., Just enough OS [7], de-duplication [8,9]) have been employed to reduce it.

The essential reason for these drawbacks is that the VA solution depends heavily on virtual machine technology, which only isolates system software from hardware. Application software therefore has to be packaged into the whole system for deployment. To solve this problem, this paper introduces a double-isolation mechanism that uses user-level virtualization technology to further isolate application software from the OS, while the VM-level isolation is kept. Application software can then be deployed at a fine granularity to increase flexibility and decrease the storage overhead.


In this paper, we refer to such application software as on-demand software. Based on this design philosophy, we make the following contributions:

(1) The whole deployment framework based on the double-isolation mechanism. The deployment of application software on user-level virtualization is the focus. It includes the on-demand software preparation, deployment, runtime system, customization and usage accounting.

(2) User-level virtualization of on-demand software. Some essential technologies, like converting legacy software into the on-demand style and the runtime system of user-level virtualization, are implemented. In particular, our methods support existing application software without any modification of source code.

(3) A central distribution system for on-demand software. One or more central data servers provide software on demand to the deployed virtual machines, rather than placing software within VMs in advance. Because frequently-used applications are largely common across users in the cloud-computing environment, this technique can decrease storage consumption significantly. Moreover, some access optimizations, including content-addressable storage and a local cache, are presented, too.

(4) The system prototype. In addition, tests show that this solution is efficient in performance and storage.

In the following sections, we first present the whole framework and the user-level virtualization technology for on-demand software. The central distribution system for cloud computing and the related optimizations are given in Section 3. The prototype is introduced in Section 4, as well as the performance tests. Section 5 gives related work; the conclusion and future work are presented finally.

2. The framework

2.1. Software deployment overview

To deploy on-demand software in the cloud-computing environment, it is necessary to provide a system with the following functions:

(1) Software preparation. Most existing software must be installed before it can run normally. In our design, however, the on-demand software requested by a customer can be used instantly without any installation process. Thus, we convert software into the on-demand mode in advance, and all on-demand software is stored in the software depository for users' selection. The details are presented in Section 2.2.

(2) Software selection. For most existing cloud service providers, a customer usually chooses one or more VAs before deployment, which means that the required software and its lower-level VM(s) are selected at once. In contrast, we provide a more flexible selection procedure: a customer can choose the wanted OS, as well as any number of software packages, in separate stages. For example, Lisa orders a Windows VM as her remote work environment on the cloud; she can then select any on-demand software (as long as it runs on Windows) that she will use in the VM. This means we can provide any combination of VM and software, rather than depending on a limited number of existing VM templates.

(3) On-demand deployment and usage accounting.

Fig. 1. On-demand software and virtual appliance.

After preparation and selection, software is not stored in the VM image (as a Virtual Appliance does). Instead, one or more central data servers provide software on demand to the deployed virtual machines. Only when the customer actually uses the chosen software will it be streamed from the data server and run locally without installation. In other words, on-demand software is stored remotely and run locally; a local cache is also used to improve access performance. Inherently, this deployment mode enables a fine-grained billing mechanism: the accurate running time of any on-demand software can be obtained and used as the accounting basis. The technical details are presented in Section 2.3 on the runtime design.

(4) Software customization. Another problem of the VA-based solution is how to save the user's customization. When Lisa finishes her work, she wants to terminate the rent agreement but keep her customization of the application software, like the default homepage, browser bookmarks/history, cookies and even toolbar positions; it should then be possible for her to restore these favorites when she rents the same virtual environment again. For the VA-based solution, it is difficult to implement this function efficiently. One way is to use application-specific tools to extract the customized configurations [10]. Another is to save the difference between the current VM image and the original one, which contains too much unrelated data. We solve this problem through the runtime environment based on user-level virtualization, which is independent of the concrete software and achieves higher storage efficiency. The details are presented with the runtime design in Section 2.3.

2.2. Preparation of on-demand software

According to the on-demand software model we presented in [11], any software can be regarded as containing three parts: Part 1 includes all resources provided by the OS; Part 2 contains what is created/modified/deleted by the installation process; and Part 3 is the data created/modified/deleted at run time. The resources here mainly refer to files/folders, environment variables and/or the related system registry keys/values (for Windows). Because the traditional solution depends only on the VM, it has to carry the OS image in order to provide Part 1, as well as Part 2, to construct the whole virtual appliance. In the new solution, user-level virtualization isolates application software from the OS, and our solution only runs software on compatible hosts (which implies that all resources of Part 1 are available on the local system), so only Part 2 needs to be extracted to build the on-demand software. The difference between the two solutions is illustrated in Fig. 1.

An installation snapshot is taken to build the on-demand software: we start with a machine in a known state (for example, immediately after the OS was installed); we then install the software and finally identify everything that was added to the system by the installation. Typical additions mainly consist of directories and files and/or registry entries in Windows. These additions (Part 2) form the on-demand software. How to deal with Part 3 is presented in the next section. A minimal sketch of the snapshot step follows.
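To make the snapshot step concrete, here is a minimal C++17 sketch of the diffing idea. It is not the paper's actual tool: it only walks the file tree (registry scanning and deletion tracking are omitted), the root path is a placeholder, and last-write timestamps stand in for real change detection.

```cpp
#include <filesystem>
#include <iostream>
#include <map>
#include <string>

namespace fs = std::filesystem;

// Record every regular file under `root` with its last-write timestamp.
std::map<std::string, fs::file_time_type> snapshot(const fs::path& root) {
    std::map<std::string, fs::file_time_type> state;
    for (const auto& entry : fs::recursive_directory_iterator(
             root, fs::directory_options::skip_permission_denied)) {
        if (entry.is_regular_file())
            state[entry.path().string()] = entry.last_write_time();
    }
    return state;
}

int main() {
    // Hypothetical root; a real tool would also walk the Windows registry.
    const fs::path root = "C:/";
    auto before = snapshot(root);   // taken on the clean reference machine
    // ... install the application here ...
    auto after = snapshot(root);    // taken after installation completes

    // Everything added or changed by the installer becomes Part 2.
    for (const auto& [path, mtime] : after) {
        auto it = before.find(path);
        if (it == before.end())
            std::cout << "added:    " << path << '\n';
        else if (it->second != mtime)
            std::cout << "modified: " << path << '\n';
    }
}
```

In practice, the captured file set, together with the corresponding registry additions, would be packaged as the Part 2 payload stored in the depository.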

Fig. 2. User-level virtualization runtime environment.

Fig. 3. Execution stack compared to VMM.

2.3. The runtime environment of on-demand software

To make on-demand software run smoothly on a compatible OS, it is necessary to construct a runtime environment in which the software can locate and access any necessary resources transparently, just as if it had been installed. Another issue is how to capture Part 3, which is created dynamically, to reflect the user's customization.

In our system, on-demand software runs in a user-level virtualization environment layered on top of the local machine's OS. This environment intercepts all resource-accessing APIs from the software, including those accessing the system registry and files/directories. In detail, the environment redirects all accesses to Parts 2 and 3 to their actual storage positions (like the local cache or the central data server) and directs other accesses (to Part 1) to the local OS (as presented in Fig. 2). On UNIX and UNIX-like OSs (for example, Linux), the ptrace [12] mechanism can be used to intercept system APIs dynamically; on Windows, the Detours library [13] provided by Microsoft serves a similar function.

During run time, the software instance accesses resources of all parts on the fly: some resources are read-only while others may be modified/added/deleted. Therefore, no part is fixed: a modified resource moves into Part 3. The principle is that any modification is always saved in a separate position (Part 3); any browsing operation (like listing all files in one folder) returns the combination of the corresponding results from all parts (if there is any duplication, Part 3 has the highest priority and Part 1 the lowest); any read adopts the same strategy. The on-demand software can thus run without installation, as the runtime system provides all necessary resources transparently. And no trace is left on the host, because every modification is intercepted and stored in Part 3 instead of the system's default position(s).

Compared with the VA solution, our system has some extra features besides deployment flexibility and storage efficiency:

(1) On-demand software can see all local resources (except those overlaid by its Parts 2 and 3) and can communicate with other programs running on the same host machine (including other on-demand software running in the virtualization environment, and native software) through the local file system and local IPC. In contrast, applications in one VA cannot communicate directly with those in another VA. The comparison is given in Fig. 3.

(2) A fine-grained difference between the original image and the current one can be extracted. Because the runtime environment operates above the OS, it can distinguish the modifications made by different software (for example, based on the program name). Then, the user's customization of each application can be extracted accurately for reuse, as required by Section 2.1. For a VA-based system, this is difficult because the virtual machine monitor works under the guest OS and lacks the necessary semantic information.

A minimal sketch of the layered lookup appears below.
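The priority rule can be made concrete with a small sketch. The three layer roots below are illustrative placeholders, not the prototype's paths, and handling of deletions (whiteouts) is omitted for brevity: a read resolves top-down, and a directory listing merges all layers.

```cpp
#include <filesystem>
#include <optional>
#include <set>
#include <string>

namespace fs = std::filesystem;

// Hypothetical layer roots: Part 3 (per-user changes), Part 2 (the
// streamed on-demand package) and Part 1 (the host OS itself).
const fs::path kPart3 = "C:/OnDemand/changes";
const fs::path kPart2 = "Z:/software/app";   // virtual drive
const fs::path kPart1 = "C:/";

// Resolve a read: Part 3 wins over Part 2, which wins over Part 1.
std::optional<fs::path> resolve_read(const fs::path& relative) {
    for (const fs::path& layer : {kPart3, kPart2, kPart1}) {
        fs::path candidate = layer / relative;
        if (fs::exists(candidate)) return candidate;
    }
    return std::nullopt;   // not found in any layer
}

// A directory listing is the union of all layers; a duplicate name is
// reported once, which realizes the same priority order.
std::set<std::string> list_directory(const fs::path& relative) {
    std::set<std::string> names;
    for (const fs::path& layer : {kPart3, kPart2, kPart1}) {
        fs::path dir = layer / relative;
        if (!fs::is_directory(dir)) continue;
        for (const auto& entry : fs::directory_iterator(dir))
            names.insert(entry.path().filename().string());
    }
    return names;
}
```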

Fig. 4. System overview.

The whole system of the above-mentioned functions and procedures is described in Fig. 4. First, normal software is converted into the on-demand version and stored in the central depository; the user can then select any needed software at configuration time. After system deployment, the user can launch software streamed from the central server(s) on demand, and the user-level virtualization environment makes the software run without installation while keeping the customization transparently.

3. Central distribution system

Elasticity [14] is one of the key features of cloud computing; it usually means adding or removing resources at a fine grain within a short lead time, allowing resources to match the workload much more closely. Combining software with the VM image (as a VA does) is not an efficient method. It brings management complexity and storage inefficiency: on-demand software is distributed to multiple running instances, and many copies of the same software may exist in different instances.

Therefore, a central deployment mechanism is employed: on-demand software is located on central depository server(s); the storage position can be mounted as a virtual local disk on a customer's VM (through a user-level file system framework); and the customer owns the access right to the software he/she has chosen. When the customer boots up the VM and launches software, it is streamed from the depository, which means that the transfer of the software's bits to the local VM overlaps with its execution. This enables fast software deployment without waiting for an entire image to be downloaded before execution starts.

One issue introduced by the central mode is write conflicts: multiple instances of the same software may run simultaneously and modify the same data object on the central depository. To solve this problem, a copy-on-write (COW) method is adopted: as mentioned in Section 2.3, any modification that happens during run time is considered to belong to Part 3, which is stored in a separate position in the local VM. The central depository is thus read-only storage, and any modification happens locally. A sketch of this write redirection follows.
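A minimal sketch of the copy-on-write redirection, assuming hypothetical depository and local paths; the real prototype performs this inside its interception layer and user-level file system (Section 3.1).

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical locations: the read-only central depository (mounted
// as a virtual drive) and the local, per-VM Part 3 area.
const fs::path kDepot = "Z:/software";
const fs::path kLocal = "C:/OnDemand/changes";

// Called by the interception layer before a write is allowed.
// Returns the path the write should actually go to.
fs::path redirect_write(const fs::path& relative) {
    fs::path local = kLocal / relative;
    if (!fs::exists(local)) {
        // First modification: fetch the whole original file from the
        // depository into the local area (copy-on-write), so the
        // central copy stays untouched and shareable.
        fs::create_directories(local.parent_path());
        fs::path original = kDepot / relative;
        if (fs::exists(original))
            fs::copy_file(original, local);
    }
    return local;   // all later reads and writes hit this local copy
}
```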


Fig. 5. Framework of the user-level file system.

3.1. User-level virtual file system

A user-level file system usually works as a proxy for file system accesses: file operation requests from the application to the target files/folders/partitions are forwarded to corresponding user-space callback functions that do the real work and send results back. A user-level file system often suffers some performance loss because it introduces a longer data-transfer path and more context switches. However, it reduces development complexity and, more importantly, is a flexible solution that depends on the OS to the minimum extent. The best-known user-space file system framework is FUSE [15], which works only on Linux; for Windows, DOKAN [16] is a good alternative (Fig. 5).

Owing to the user-level file system framework, on-demand software is stored in the central depository and presented as files/folders on a virtual local drive in the customer's VM. The customer can then use it just like locally-installed software. The interval between the opening of the main program file of any on-demand software and its closing is regarded as the usage time for accounting.

The user-level file system can also implement the COW method directly: when any modification of the virtual drive is captured, it fetches the whole original file from the central server to a local position (outside the virtual disk) and redirects any following access (including the current modification) to this local version. This means Part 3 of any on-demand software is stored in the local VM.

3.2. Optimizations

(1) Content-addressable storage (CAS). The central depository is CAS-enabled. CAS is a mechanism for storing information that can be retrieved based on its content. It usually uses cryptographic hashing to reduce storage requirements by exploiting commonality across multiple data objects. In detail, we adopt a CAS strategy similar to that of [17]: on-demand software is partitioned into shards, and a shard may correspond to a single file or registry entry. We compute the hash value of every shard; equal values mean the corresponding shards are identical. Identical shards are then given the same physical name and stored only once. One example is shown in Fig. 6: the two C shards are the same. The depository maintains a key data structure that maps shard names to physical names (a many-to-one mapping), so when any shard is accessed by the user-level file system, its physical data can be located. The dedup bookkeeping can be sketched as follows.
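The following sketch shows the many-to-one bookkeeping. The shard names and contents are made up, and a toy FNV-1a hash stands in for the cryptographic digest a real depository would use.

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy content hash (64-bit FNV-1a). A real depository would use a
// cryptographic digest such as SHA-1 to make collisions negligible.
uint64_t content_hash(const std::string& data) {
    uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : data) { h ^= c; h *= 1099511628211ULL; }
    return h;
}

int main() {
    // Shard name -> physical name (many-to-one), as in the paper.
    std::map<std::string, std::string> name_map;
    // Physical name -> stored bytes; identical shards stored once.
    std::map<std::string, std::string> store;

    // Hypothetical shards: files or registry entries of two packages.
    std::vector<std::pair<std::string, std::string>> shards = {
        {"appA/common.dll", "...shared runtime bytes..."},
        {"appB/common.dll", "...shared runtime bytes..."},   // duplicate
        {"appB/main.exe",   "...unique bytes..."},
    };

    for (const auto& [name, bytes] : shards) {
        std::ostringstream physical;
        physical << std::hex << content_hash(bytes);
        name_map[name] = physical.str();
        store.emplace(physical.str(), bytes);  // no-op if already stored
    }
    std::cout << "logical shards: " << shards.size()
              << ", physical copies: " << store.size() << '\n';  // 3 vs 2
}
```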

Fig. 6. CAS storage.

Because the depository is read-only (as mentioned in Section 3.1), the CAS mechanism works well.

(2) Data cache. The central mode would decrease the running performance of on-demand software if every file access had to be redirected to the remote depository. To overcome this drawback, a local cache in the customer's VM is enabled. Based on detailed analyses of software access patterns, we found that for much commonly-used desktop software (for example, Office applications, PhotoShop, some media players, network applications and so on), the most frequently-used files are those accessed during the start-up process, and they occupy only a limited ratio (usually between 20% and 40%) of the whole storage capacity. A local cache with limited space can therefore achieve a fairly high hit rate and improve access performance remarkably. For example, in our test cases, a 200 MB local cache reaches about an 80% hit rate while the storage capacity of the on-demand software is more than 800 MB.

The local cache technology is straightforward: any remote data accessed during a running process is stored in a local position, so during subsequent runs the data can be read locally. To simplify management, the data offset and size of remote accesses are both 32 KB-aligned; any remote read during run time is therefore converted into a request whose size is an integer multiple of 32 KB. This design also implies some pre-fetching when a small piece of data is wanted, which reduces the number of remote visits. The replacement strategy is based on usage frequency. Another beneficial fact is that, for any given software, the access pattern of its start-up process is almost fixed, which means the pre-fetch mechanism works well. Based on profiling, our virtual file system can learn the access sequence of any software and guide the pre-fetch accurately during subsequent executions. The aligned-read logic is sketched below.
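A sketch of the aligned, cache-first read path. The `fetch_remote` stub and the unbounded map are simplifications of ours, not the prototype's code; the paper's cache is bounded and evicts by usage frequency.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

constexpr uint64_t kBlock = 32 * 1024;   // 32 KB alignment unit

// Stub: a real implementation would read one aligned block from the
// central depository through the user-level file system.
std::vector<char> fetch_remote(const std::string& file, uint64_t block) {
    return std::vector<char>(kBlock, 0);
}

// (file, block index) -> cached block.
std::map<std::pair<std::string, uint64_t>, std::vector<char>> cache;

// Serve an arbitrary read by rounding it out to whole 32 KB blocks;
// the rounding doubles as an implicit pre-fetch of nearby data.
std::vector<char> cached_read(const std::string& file,
                              uint64_t offset, uint64_t size) {
    if (size == 0) return {};
    uint64_t first = offset / kBlock;
    uint64_t last  = (offset + size - 1) / kBlock;
    std::vector<char> out;
    for (uint64_t b = first; b <= last; ++b) {
        auto key = std::make_pair(file, b);
        auto it = cache.find(key);
        if (it == cache.end())                        // miss: go remote
            it = cache.emplace(key, fetch_remote(file, b)).first;
        const std::vector<char>& block = it->second;  // hit: serve locally
        // Copy only the bytes the caller actually asked for.
        uint64_t begin = (b == first) ? offset % kBlock : 0;
        uint64_t end   = (b == last) ? (offset + size - 1) % kBlock + 1
                                     : kBlock;
        out.insert(out.end(), block.begin() + begin, block.begin() + end);
    }
    return out;
}
```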


Fig. 7. System overview of the prototype.

4. Prototype

We have implemented such a cloud-computing prototype based on the Xen VMM [18] for course experiments. Many courses ask students to complete study assignments and/or software experiments, so computer systems with specific software are required. We plan to construct such a platform to provide lecturers and students with the required systems instantly; deployed resources are revoked when the course finishes. At present, this prototype mainly targets desktop applications on Windows. Here we focus on its software deployment system.

4.1. Implementation

For user-level virtualization on the client end, we use the Detours library to intercept the Windows APIs that access the system registry and files/folders. Interception code is applied dynamically at run time: Detours replaces the first few instructions of the target API with an unconditional jump to the user-provided detour function, inserted at execution time (a minimal sketch of this hook pattern follows the workflow below). Moreover, we employ DOKAN to implement the user-level virtual file system based on the design described in Section 3, which is pre-installed in every VM image. A further optimization is that we move the local cache from the file level into the virtualization environment: because every file API of on-demand software is already intercepted, cached data can be located above the file level. This improves performance because a cache hit is handled with fewer context switches.

Many existing applications have been converted into the on-demand mode, including Office applications, SUN JVM, MATLAB, Lotus Notes, PhotoShop, Internet Explorer, Outlook Express, Winzip, UltraEdit, FlashGet, Skype, Bittorrent and many other frequently-used programs (Fig. 7). All software is stored in a central storage server, where a file-level CAS [19] is implemented to improve storage efficiency. Owing to this feature, the storage space occupied by the above applications is reduced by about 11%.

The concrete workflow is as follows:

(1) A user, Lisa, chooses some on-demand software, as well as the VM.
(2) The system assigns a physical server as the running host for Lisa and assigns an IP to the VM; the corresponding VM image (containing no on-demand software) is copied onto the assigned server before boot-up.
(3) With the RDP client-end program, Lisa logs into her VM through the campus network.
(4) In the background, the pre-installed user-level file system in the VM connects to the storage server and mounts the local virtual drive.
(5) The module of the user-level file system also acts as a shell: Lisa can browse her chosen software and launch any program through the shell. During the start-up process, the module applies interception code to construct the virtualization runtime environment.
(6) When any read operation is captured, the runtime environment first tries to locate it in the local cache; on a miss, the central server is visited through the user-level file system. For any modification, the COW mechanism is carried out as described in Section 3.

Up to this point, Lisa can use any chosen software without installation in her VM, because the central depository and the local runtime environment provide every necessary resource transparently. Moreover, any modification (Part 3 of on-demand software) is stored in a local position, and Lisa can keep it separate. Then, after the system revokes this VM, Lisa can still restore the customization of her work environment when she requests the VM and on-demand software again.
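The hook pattern referenced above can be sketched with the public Detours API. This assumes the Detours SDK is available; `ResolvePath` is a hypothetical stand-in for the runtime's layered lookup of Section 2.3, not a function from the paper.

```cpp
#include <windows.h>
#include <detours.h>   // Microsoft Detours, as used by the prototype
#include <string>

// Keep a pointer to the real API so the hook can fall through to it.
static HANDLE(WINAPI* TrueCreateFileW)(
    LPCWSTR, DWORD, DWORD, LPSECURITY_ATTRIBUTES, DWORD, DWORD, HANDLE)
    = CreateFileW;

// Hypothetical resolver standing in for the layered lookup
// (Part 3 over Part 2 over Part 1).
static std::wstring ResolvePath(LPCWSTR requested) {
    return requested;   // identity here; the real system redirects
}

static HANDLE WINAPI HookedCreateFileW(LPCWSTR path, DWORD access,
                                       DWORD share,
                                       LPSECURITY_ATTRIBUTES sa,
                                       DWORD disp, DWORD flags,
                                       HANDLE tmpl) {
    std::wstring actual = ResolvePath(path);   // redirect Parts 2 and 3
    return TrueCreateFileW(actual.c_str(), access, share, sa,
                           disp, flags, tmpl);
}

// Install the hook: Detours rewrites the first instructions of
// CreateFileW with a jump to HookedCreateFileW at run time.
void InstallHook() {
    DetourTransactionBegin();
    DetourUpdateThread(GetCurrentThread());
    DetourAttach(&(PVOID&)TrueCreateFileW, HookedCreateFileW);
    DetourTransactionCommit();
}
```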

4.2. Open problems

In the current version, only two mirrored central storage servers are deployed, and each user-level file system is randomly assigned one server as its backend. We believe that, as the system grows, this simple method will lack scalability, although it is sufficient for now. Another limitation is that we only implement the central deployment mechanism for application software, not for VM images; we therefore have to copy images to the local storage of the assigned server, which is not storage-efficient.

4.3. Performance test and analysis

Four PC servers are used as the VMs' hosts. All are Linux-Xen PCs equipped with 2 GB of DDR2 RAM and one Intel Core Duo CPU; the hard disk is a 160 GB SATA drive. One 64-bit Linux storage server, equipped with an Intel Core 2 Duo E5500 CPU (2800 MHz) and 16 GB of DDR3 RAM, provides 430 GB of RAID-5 storage space (based on SAS disks). All machines are connected by 1000 Mb/s Ethernet. In each VM, a 200 MB local cache is reserved.

(1) Performance metrics and test methods. The most important measurement is the running performance of on-demand software compared with its original version. Two kinds of time are measured. The first is start-up time: we launch an on-demand software application through the shell of the user-level file system in four VMs on the PC servers (one VM per server) and record their start-up times. One issue is how to judge whether the start-up process has finished. Fortunately, Microsoft provides a dedicated API, WaitForInputIdle, to determine whether a new process has finished its initialization and is ready to respond to user input (a minimal timing sketch appears after this subsection). When it returns, the average elapsed time is recorded as the start-up overhead.

The second is running time: after start-up, we use scripts to drive the software through a series of operations (such as opening a document and editing it before closing, taking an Office application as the example), so that it looks as if a real user triggered them. Some software tools can record the user's keyboard and mouse inputs and replay them, which helps us do this. Moreover, between any two consecutive operations, a random waiting time (less than 1 s) is inserted to simulate the user's thinking time. The auto-execution time is then recorded. Ten applications are used for the tests: OpenOffice, PhotoShop, Lotus Notes, Firefox, VLC (a powerful media player), Winzip, UltraEdit, Skype, Gimp (an open-source picture editor) and Acrobat Reader.

Another measurement is storage efficiency. We found that a VM image containing only Windows XP occupies about 1.5 GB, and if all the above-mentioned applications are installed in the VM, more than 800 MB of further space is used. Owing to the central distribution mode, 600 MB of storage space can be saved, accounting for the local 200 MB cache.
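The start-up measurement can be reproduced with standard Win32 calls. A minimal sketch: the program path below is a placeholder, and WaitForInputIdle only applies to GUI processes.

```cpp
#include <windows.h>
#include <iostream>

// Measure start-up time: launch the program and wait until it is
// ready for user input, as the tests in this section do.
int main() {
    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi = {};
    // Hypothetical target: an on-demand application on the virtual drive.
    wchar_t cmd[] = L"Z:\\software\\app\\app.exe";

    DWORD start = GetTickCount();
    if (!CreateProcessW(nullptr, cmd, nullptr, nullptr, FALSE,
                        0, nullptr, nullptr, &si, &pi))
        return 1;
    // Returns once the process has finished initialization and can
    // respond to input; the elapsed time is the start-up overhead.
    WaitForInputIdle(pi.hProcess, INFINITE);
    DWORD elapsed = GetTickCount() - start;
    std::cout << "start-up time: " << elapsed << " ms\n";

    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
}
```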


(2) Test cases. The running time and start-up time of locally-installed software in a VM are measured as the baseline for comparison. Then, we launch software in the virtualization environment without the user-level file system, which means all on-demand software is stored in the VM and no remote access happens; this case shows the performance loss caused by the user-level virtualization itself. Finally, the user-level file system is employed together with the local cache, and the cache hit rate is set to 0%, 80% and 100% respectively. All measured results are compared with the corresponding baseline values, and the averages of the ratios are presented below.

(3) Results. Fig. 8 presents the ratios of start-up times. We find that the runtime environment itself causes very limited overhead, about 1%, because it works entirely in user space. The user-level file system, combined with the runtime environment, introduces about 19% overhead (the case where the hit ratio is 100%). When the hit ratio is 80%, the common case with the 200 MB local cache in our tests, the extra overhead is 26%, because 20% of accesses have to reach the remote server. The worst case is 91%, which happens when the cache is empty, for example when the VM is used for the first time.

Fig. 8. Comparisons of start-up time.

Fig. 9 gives the results for running time (after start-up). In this respect our system introduces smaller relative overheads: the runtime environment itself causes almost no extra overhead, less than 0.3%. For the user-level file system, because the most frequently-used files are those accessed during start-up, the actual hit ratio at run time is much higher, more than 90%. Moreover, the interval inserted between any two consecutive simulated operations can hide pre-fetch latencies, which further decreases the relative overhead.

Fig. 9. Comparisons of running time.

(4) Analysis. As we know, the initialization of a process interleaves a code-execution phase and an IO phase: if the execution phase accesses an invalid page, a page fault is triggered to read the data. The first phase mainly takes place in physical memory while the other interacts with the IO system. During the start-up stage, the system cache is almost empty, so most data must be read from external IO modules at a much lower speed than code execution. The analysis therefore focuses on the performance of data access. Three types of data are accessed during the start-up stage:

1. Part 1.
2. Local-cached data.
3. Networked data.

For any given software and host, the amount and the access rate of Part 1 are fixed. Therefore, increasing the other two access rates and the hit ratio of the local cache is the key to improving performance. Fortunately, for much desktop software, the most frequently-used files are those accessed during the start-up process, and the access pattern of the start-up process is almost fixed. Therefore, our local cache with limited space can achieve a fairly high hit rate, and the test results prove it.

5. Related work

5.1. On-demand software

On-demand software is regarded as the future usage mode for software. Most on-demand software consists of web-based applications, and existing desktop software cannot be used in this mode. Therefore, user-level virtualization technologies have been developed to convert legacy software into on-demand software, as in [17,20,11,21]. Microsoft Application Virtualization [20] allows applications to be deployed in real time to any client from a virtual application server. It removes the need for local installation of applications, reducing the labor involved in deploying, updating, and managing them. Alpern and Auerbach [17] built the Progressive Deployment System (PDS), a virtual execution environment and infrastructure designed for deploying software on demand. PDS intercepts a selected subset of system calls on the target machine to provide partial virtualization at the operating-system level. Our previous work [11,21] provides a solution to stream software to local PCs across the Internet, based on lightweight virtualization and P2P transportation technologies. Some of their technologies and models are employed here to implement the runtime virtualization environment.

5.2. Automatic service deployment for cloud computing

Most existing service deployment systems for cloud computing are based on VAs. An early work on VAs is [3]. It attempted to address the complexity of system administration by making the labor of applying software updates independent of the number of computers. It developed a compute utility, called Collective, which assigns virtual appliances to hardware dynamically and automatically. Later, VAs were employed by grid researchers to deploy services or software for grid systems, including [4,22,23], and have gradually moved to cloud computing [24–26]. For example, VMPlant [4] provides automated configuration and creation of flexible VAs that can be dynamically instantiated to provide homogeneous execution environments across grid resources. Bradshaw et al. [22] describe the requirements and services needed to ensure the scalable management and deployment of VAs for grid computing. Kecskemeti et al. [23] describe an extension to the Globus Workspace Service to create virtual appliances and deploy them for grid services; the same research group later proposed an automated virtual appliance creation service [24] that helps developers create their own virtual appliances efficiently for infrastructure-as-a-service cloud systems. Epstein et al. [25] give a framework for virtual appliance distribution for a distributed cloud infrastructure service, which


addresses a fundamental storage staging problem in this context. Rodero-Merino et al. [26] propose a mechanism for services' automatic deployment and escalation depending on service status, and implement such a service management system sitting on top of different cloud providers.

In contrast, our solution uses user-level virtualization to improve the flexibility and storage efficiency of deployment. Compared with our preliminary deployment system [12], this paper gives a more complete solution with some enhancements and optimizations.

6. Conclusion and future work

This paper provides a framework that uses user-level virtualization technology to decouple application software from VM images, improving deployment flexibility for cloud computing. The main functions and procedures, including application preparation, the runtime system, deployment and the usage workflow, are presented. Compared with VA-based solutions, it also improves storage efficiency, and users' customization can be separated inherently and efficiently for reuse. Moreover, a central deployment mechanism is designed to manage and distribute all software, which cooperates with the user-level file system in VMs to provide software data on demand. In addition, the CAS method is used to decrease the storage capacity required, and a local cache on the VM end improves access performance remarkably.

We implemented such a prototype and constructed tests to obtain performance metrics. Results show that this solution is efficient in running performance: for start-up time, the runtime environment itself causes very limited overhead, about 1%, while the whole system introduces 26% extra overhead with a 200 MB local cache on each VM; for running time, the extra overhead is smaller, about 10%.

In the next step, we plan to solve the problems mentioned in Section 4.2: to design more advanced storage strategies with adaptive scheduling to support a larger-scale system, and to employ CAS technologies to improve the efficiency of management and assignment of VM images.

Acknowledgments

The work is supported by the Open Research Fund Program of the Beijing Key Lab of Intelligent Telecommunications Software and Multimedia, the High Technology Research and Development Program of China under Grant No. 2008AA01A201 and the National Grand Fundamental Research 973 Program of China under Grant No. 2007CB310900.

References

[1] R. Buyya, C.S. Yeo, S. Venugopal, J. Broberg, I. Brandic, Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility, Future Generation Computer Systems 25 (6) (2009) 599–616.
[2] VMware public virtual appliances, 2010. URL: http://www.vmware.com/appliances/.
[3] C. Sapuntzakis, D. Brumley, R. Chandra, N. Zeldovich, J. Chow, M.S. Lam, M. Rosenblum, Virtual appliances for deploying and maintaining software, in: LISA'03: Proceedings of the 17th USENIX Conference on System Administration, USENIX Association, Berkeley, CA, USA, 2003, pp. 181–194.
[4] I. Krsul, A. Ganguly, J. Zhang, J.A.B. Fortes, R.J. Figueiredo, VMPlants: providing and managing virtual machine execution environments for grid computing, in: Proceedings of the ACM/IEEE SC2004 Conference on High Performance Networking and Computing, IEEE Computer Society, 2004, pp. 1–7.
[5] Public EC2 Amazon machine images, 2010. URL: http://developer.amazonwebservices.com/connect/kbcategory.jspa?categoryID=171.
[6] TurnKey Linux, 2010. URL: http://www.turnkeylinux.org/.
[7] Ubuntu JeOS instructions, 2010. URL: http://www.ubuntu.com/server/features/virtualisation.
[8] A. Liguori, E. Van Hensbergen, Experiences with content addressable storage and virtual disks, in: WIOV'08: Proceedings of the First Workshop on I/O Virtualization, USENIX Association, San Diego, CA, USA, 2008.


[9] N. Tolia, M. Kozuch, M. Satyanarayanan, et al., Opportunistic use of content addressable storage for distributed file systems, in: Proceedings of the 2003 USENIX Annual Technical Conference, San Antonio, TX, USA, 2003, pp. 127–140.
[10] Migosoftware, 2010. URL: http://www.migosoftware.com/default.php.
[11] Youhui Zhang, Gelin Su, Weimin Zheng, Converting legacy desktop applications into on-demand personalized software, IEEE Transactions on Services Computing, IEEE Computer Society Digital Library (14 Jun. 2010) doi:10.1109/TSC.2010.32.
[12] ptrace — Linux man page, 2010. URL: http://linux.die.net/man/2/ptrace.
[13] Galen Hunt, Doug Brubacher, Detours: binary interception of Win32 functions, in: Proceedings of the Third USENIX Windows NT Symposium, July 1999.
[14] M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, M. Zaharia, Above the clouds: a Berkeley view of cloud computing, Tech. Rep. UCB/EECS-2009-28, University of California at Berkeley, February 2009.
[15] Filesystem in userspace, 2010. URL: http://fuse.sourceforge.net/.
[16] Dokan — user mode file system for Windows, 2010. URL: http://dokandev.net/en/.
[17] Bowen Alpern, Joshua Auerbach, PDS: a virtual execution environment for software deployment, in: Proceedings of the First ACM/USENIX International Conference on Virtual Execution Environments, Chicago, Illinois, USA, 2005.
[18] Computer Laboratory — Xen virtual machine monitor, 2010. URL: http://www.cl.cam.ac.uk/research/srg/netos/xen/.
[19] Youhui Zhang, Dongsheng Wang, Applying file information to block-level content addressable storage, Tsinghua Science & Technology 14 (1) (2009) 41–49.
[20] Microsoft Application Virtualization, 2010. URL: http://www.microsoft.com/systemcenter/appv/default.mspx.
[21] Youhui Zhang, Xiaoling Wang, Liang Hong, Portable desktop applications based on P2P transportation and virtualization, in: LISA'08: Proceedings of the 22nd Large Installation System Administration Conference, USENIX Association, San Diego, CA, USA, 2008, pp. 133–144.
[22] R. Bradshaw, N. Desai, T. Freeman, K. Keahey, A scalable approach to deploying and managing appliances, in: Proceedings of the TeraGrid '07 Conference, Madison, Wisconsin, USA, 2007.
[23] G. Kecskemeti, P. Kacsuk, G. Terstyanszky, T. Kiss, T. Delaitre, Automatic service deployment using virtualisation, in: Proceedings of the 16th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2008, IEEE Computer Society, Toulouse, France, 2008, pp. 628–635.
[24] Gabor Kecskemeti, Gabor Terstyanszky, Peter Kacsuk, Zsolt Németh, An approach for virtual appliance distribution for service deployment, Future Generation Computer Systems (2010) doi:10.1016/j.future.2010.09.009.
[25] A. Epstein, D.H. Lorenz, E. Silvera, I. Shapira, Virtual appliance content distribution for a global infrastructure cloud service, in: Proceedings of IEEE INFOCOM 2010, San Diego, CA, USA, 2010, pp. 1–9.
[26] Luis Rodero-Merino, Luis M. Vaquero, Victor Gil, Fermín Galán, Javier Fontán, Rubén S. Montero, Ignacio M. Llorente, From infrastructure delivery to service management in clouds, Future Generation Computer Systems 26 (8) (2010) 1226–1240.

Youhui Zhang is an Associate Professor in the Department of Computer Science at Tsinghua University, China. His research interests include cloud computing, network storage and microprocessor architecture. He received his Ph.D. degree in Computer Science in 2002.

Yanhua Li is a Ph.D. student in the Department of Computer Science at Tsinghua University, China. His research interests include cloud computing and microprocessor architecture. He received his Bachelor's degree in Computer Science in 2009.

Weimin Zheng is a Professor in the Department of Computer Science at Tsinghua University, China. His research interests include high-performance computing, network storage and parallel compilers.
