Using User-level Virtualization in Desktop Grid Clients for Application Delivery and Sandboxing

Youhui Zhang, Yanhua Li, Weimin Zheng
Department of Computer Science and Technology, Tsinghua University, Beijing, China
[email protected]

Abstract—Desktop grid is a form of distributed computing that harvests the computing power of idle desktop computers, whether these are volunteer machines or machines deployed at an institution. This paper proposes the design and implementation of a client-end framework for desktop grids, based on cloud storage, which can also deliver legacy applications on demand to volunteer machines. From a technical viewpoint, an isolated execution environment based on user-level virtualization is achieved: on one side, the resource-accessing APIs used by applications are intercepted and redirected to the real storage positions (on the cloud), so legacy applications can run without installation, which simplifies maintenance. On the other side, this mechanism also constructs a sandbox that restricts the access range of applications for security. The proposal adopts promising cloud-related technologies to meet the requirements of volunteer computing; tests show that it causes only limited performance loss.

Keywords—user-level virtualization; desktop grid; sandboxing

I. INTRODUCTION

More and more scientific research institutions employ grid computing [1] methods to solve challenging algorithmic problems, especially complex applications with a large number of parameters. Desktop grid computing is voluntary: it can integrate thousands of individual, heterogeneous desktop computing resources to provide extremely powerful computing capabilities. Existing desktop grid platforms include SETI@home [2], BOINC [3], HowU [4], Paradropper [5], XtremWeb [6] and so on. All of them share similar features, such as low cost, easy deployment and high performance. The common architecture of desktop grids typically consists of one or more central servers and a large number of clients. The central server provides the applications and their input data. Clients join the system voluntarily, then download and run tasks of an application. In general, the client side of a desktop grid should meet the following requirements:
• Easy deployment of applications.
• Secure execution.
• Transparency to applications.
• Unobtrusiveness.
However, these requirements are sometimes contradictory. Some legacy applications either have no source code available to modify or would require too much effort to port. Moreover, some ported applications may be malformed or malicious, so maintaining running security and unobtrusiveness becomes a real challenge. To overcome these problems, we adopt some technologies from the desktop cloud. The so-called desktop cloud has gradually emerged and enables common users to access desktop applications as a service through the Internet. Besides the many desktop cloud solutions based on Web applications [7][8] or thin-client computing [9][10], there are a few solutions using user-level virtualization technologies, like [11][12][13], which are our point of interest. A user-level virtualization environment can intercept resource accesses from the application and redirect them to resources stored in the network as needed. The client can then directly use existing desktop software as a network service: the software is stored in the network and runs on the local computer on demand.
This paper proposes a new solution for the client end of the desktop grid, which uses user-level virtualization to ensure easy deployment, running security, application transparency and unobtrusiveness. Moreover, a distributed user-level virtual file system based on cloud storage is implemented to further enhance the convenience and transparency of usage.

II. RELATED WORK

Desktop virtualization [14][15], combined with cloud computing, allows users to run desktops on virtual machines (VMs) hosted at a data center and to access them as a service through some remote desktop protocol; this is also called a desktop cloud. In this mode, the client PC serves only as a GUI, which transfers input and output between the user and the data center where the software really runs. Therefore, a user may not be satisfied when it is employed across the Internet because of the long network latency [7]; in addition, running all of the software on the server may become a performance bottleneck. To address these issues, some desktop clouds use Web applications [7][8], where the browser is employed as the running platform. However, the enormous body of legacy desktop software cannot be used directly in this mode.

To fully utilize local PCs and legacy desktop software, several light-weight desktop cloud solutions have been proposed that use user-level virtualization technologies, like SoftGrid [11], Citrix's Virtual Desktop [12] and IBM's PDS [13]. Our previous works [16][17] provide such a solution based on lightweight virtualization and P2P transportation technologies. On the other hand, cloud storage services can be accessed through a Web service application programming interface, like Amazon's S3 interface [18]; in this work, such services are used as the storage backend.

BOINC [3] is a system that makes it easy for users to create and operate distributed projects for desktop grid computing. For legacy applications, BOINC provides a wrapper to handle simple cases; however, it lacks flexibility. To overcome this problem, GenWrapper [19] provides a more generic solution for wrapping and executing an arbitrary set of legacy applications by utilizing a POSIX-like shell scripting environment. Compared with GenWrapper, our work is a total solution that forms a whole virtualized running environment, rather than only a shell extension. On the other hand, compared with solutions based on virtual machines (VMs), like Virtuoso [20], In-VIGO [21] and Denali [22], this work is lightweight because there is no need to deliver an entire system image, and the user-level approach obviously causes less performance loss. One similar work is Entropia [23], which supports the safe and controlled execution of applications expressed as native x86 binaries. Compared to it, our solution employs a distributed user-level virtual file system on cloud storage to simplify application delivery and further improve system compatibility.

III. ARCHITECTURE

The desktop grid aggregates raw desktop resources into a single logical resource, which provides high performance for applications through parallelism. The client provides protection for the desktop PC, an easy deployment mechanism and a transparent running environment for the applications.

A. The Overall Architecture
The desktop grid system contains two parts: the server side and the client side (Figure 1). Further, the server side can be regarded as comprising two layers:

Figure 1. Architecture of the desktop grid computing system

• The Job Management Layer
For the desktop grid, a large distributed computing job is usually broken down into a large number of individual sub-jobs; each is submitted to the resource management layer for execution. A sub-job is the unit of schedulable work that is delivered to and run on a desktop machine.
• The Resource Management Layer
The desktop grid often consists of client resources with a wide variety of configurations. This layer takes the sub-jobs, matches them to appropriate client resources (based on the information reported by the client side) and schedules them for execution.

On the client side, the client management provides basic communication to and from the server and reports dynamic information about each physical client to the centralized server. Moreover, it runs sub-job applications in the user-level virtualization environment and provides all necessary running resources transparently.

B. The Client Framework
As mentioned in Section I, our contributions are focused on the client end; therefore we concentrate on the client design. Each client management instance contains three components: the main controller, the sandbox and a virtual file system based on cloud storage.
• The Main Controller
It monitors and controls the sub-jobs running on the client machine. The desktop controller is assigned a sub-job from the server (the resource management layer) and is responsible for launching the sub-job process(es) in the sandbox. It also monitors the process(es) for unobtrusiveness, which is described in detail in Section 4.3.
• The Sandbox
It is a user-level virtualization environment layered between the sub-job process(es) and the operating system. It provides desktop security, a clean execution environment and application security, and it is part of the solution for providing unobtrusiveness. To contain the sub-job inside the sandbox, we use system API wrapping technology to insert a mediation layer between the sub-job and the local resources.
• The User-level Virtual File System
Many existing desktop grid systems contain two separate steps for the client side to run a sub-job application: downloading the application files (including the executable file, necessary DLLs, all of the input files and so on), and then running it. In this design, we introduce a distributed virtual file system that combines these steps to simplify application deployment. In detail, the application files are stored in a cloud storage system (for example, Amazon's S3). When a sub-job application is assigned to a client, the client mounts the remote position as a local virtual file system (it can be a separate virtual drive or a virtual folder). The main controller can then launch it directly as a local application; pre-fetch and local cache mechanisms are adopted to improve running performance. During run time, if the application modifies and/or creates files (such as output files), a copy-on-write (COW) mechanism is used to avoid write conflicts among multiple clients, and all modifications are cached locally (a conceptual sketch of this policy follows this list).
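To make the copy-on-write policy concrete, the following C++ fragment sketches one possible cache-resolution scheme; the CowCache class, its helper functions and the cache layout are illustrative assumptions, not the actual implementation.

// A conceptual sketch of the copy-on-write cache policy described above
// (illustrative only: CowCache, the cache layout and fetchFromRemote() are
// hypothetical names, not this system's real code).
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

class CowCache {
public:
    CowCache(fs::path cacheDir, std::string remoteBase)
        : cacheDir_(std::move(cacheDir)), remoteBase_(std::move(remoteBase)) {}

    // Read resolution: serve the cached/COW copy if present, otherwise
    // fetch the file from the remote storage on demand.
    fs::path openForRead(const fs::path& virtualPath) {
        fs::path local = cacheDir_ / virtualPath;
        if (!fs::exists(local))
            fetchFromRemote(virtualPath, local);   // fetch-on-demand
        return local;
    }

    // Write resolution: the first modification triggers a full fetch of the
    // original file (copy-on-write); all later accesses use the private
    // local copy, and nothing is written back until the sub-job completes.
    fs::path openForWrite(const fs::path& virtualPath) {
        fs::path local = cacheDir_ / virtualPath;
        if (!fs::exists(local) && remoteHas(virtualPath))
            fetchFromRemote(virtualPath, local);
        fs::create_directories(local.parent_path());
        return local;                              // caller writes here
    }

private:
    // Hypothetical helpers: in a real client these would issue HTTP requests
    // against remoteBase_ plus the virtual path; stubbed here for illustration.
    void fetchFromRemote(const fs::path& virtualPath, const fs::path& dest) {
        fs::create_directories(dest.parent_path());
        // ... GET remoteBase_ + "/" + virtualPath.generic_string() into dest ...
        (void)virtualPath;
    }
    bool remoteHas(const fs::path& virtualPath) { (void)virtualPath; return true; }

    fs::path cacheDir_;
    std::string remoteBase_;
};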

Figure 2. The virtual file system (clients run the App_A application; each mounts the remote storage as a local position while the COW mechanism is adopted)

It provides a file-system image for any application; the application can then use common file I/O as its input/output method, which simplifies application porting and is usually compatible with a wide range of legacy programs. Moreover, this design makes full use of the advantages of cloud storage, such as dynamic scaling of infrastructure and rapid service provisioning, and thus reduces the server-end complexity.

IV. THE CLIENT IMPLEMENTATION

The client end is installed and running on each volunteer machine. It controls one or more sandboxes, and each sandbox is responsible for managing a sub-job. Note that a single sub-job can consist of multiple processes. We use the trampoline approach from Detours [23] to implement the sandbox. Based on this interception, we can construct virtualized resources to ensure convenient and safe running of the application. For each sub-job, an instance of the virtual file system is used to create a dedicated virtual position that contains its application files and provides full isolation between different jobs.

A. Application Deployment
Application files are stored in some network position. Then, if a volunteer requests to run the application, or if it is assigned to execute on this volunteer's machine, the main controller mounts the corresponding network position as a local virtual folder and starts up the remote executable file directly.
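As an illustration, the following sketch shows one plausible way for the main controller to start such a sub-job straight from the mounted folder; the drive letter X:, the App_A file names and the use of CREATE_SUSPENDED before agent injection are hypothetical details, not fixed parts of the design.

// A hedged sketch of launching a sub-job's executable directly from the
// mounted virtual folder (paths, file names and the suspended start are
// illustrative assumptions).
#include <windows.h>

bool LaunchSubJob()
{
    // X:\ is assumed to be the mount point of the sub-job's remote folder;
    // the executable and its data are fetched on demand by the virtual
    // file system.
    wchar_t cmdLine[] = L"X:\\subjobs\\App_A\\app_a.exe config.ini";

    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi = {};
    if (!CreateProcessW(nullptr, cmdLine, nullptr, nullptr, FALSE,
                        CREATE_SUSPENDED,       // start paused so the sandbox
                        nullptr,                // agent can be injected first
                        L"X:\\subjobs\\App_A",  // working directory
                        &si, &pi))
        return false;

    // ... inject the sandbox agent / API hooks here, then resume ...
    ResumeThread(pi.hThread);
    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return true;
}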

We use the Dokan 0.6 [24] framework as the foundation of our user-level virtual file system. Dokan is an open-source user-space file system framework for the Windows OS. It contains a user-mode DLL and a kernel-mode file system driver. File operation requests from user programs are sent to the Windows I/O subsystem, which subsequently forwards them to the Dokan driver. File system programs can register callback functions with the file system driver, which invokes these callback routines to respond to the requests it receives. The results of the callback routines are sent back to the user program.

For example, one sub-job application, App_A, contains the following files: the main executable file, some DLL files, one configuration file with some input data stored in a separate folder, and an independent output folder. After patching, all files are put on the WEB server. Any file or folder of the application is identified by its full path name; the URL looks like http://server_address/subjobs/App_A/full...path/filename. App_A is then mounted as a virtual Dokan directory under the local NTFS drive while the original hierarchy is maintained. Therefore, any access to the virtual folder is converted into an HTTP request. When the application accesses its files, this method provides a consistent view of the file system. If some path outside the virtual directory is accessed, the sandboxing mechanism intercepts these accesses because the virtual file system cannot capture them.

Besides mapping the application's directory hierarchy into the local one, there is another feature: all application files are fetched on demand, which means only the data actually used by the concrete running procedure is fetched from the remote storage. This reduces local disk consumption, especially for applications with many running configurations of which only a limited number are executed by a given client.
• Write
1) If an existing file is to be modified, a copy-on-write mechanism is triggered: the file is fully fetched from the remote storage and saved in the local cache; any subsequent access is issued to the local version.
2) If a new file is created and written, it is also stored locally.
After the sub-job completes, all created or modified files are uploaded in batch mode, and each file name is prefixed with the client name to distinguish it.

B. Desktop Control
After distribution, the main controller launches the application within the sandbox. Besides common control operations, like suspension and termination, the controller should monitor and limit the usage of a variety of key resources (including CPU, memory, disk, I/O, threads, processes, etc.) to ensure unobtrusiveness, i.e., that the amount of resources a sub-job consumes does not interfere with the usage of the desktop machine. If a sub-job attempts to use too many resources, the controller pauses or terminates all of the sub-job's processes.
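The concrete resource limits are not fixed here; as an illustration, the following sketch shows one plausible periodic check using standard Win32 calls, where the 512 MB working-set limit and the SuspendSubJob() helper are hypothetical.

// A hedged sketch of an unobtrusiveness check the main controller could run
// periodically; link with psapi.lib.  The limit and SuspendSubJob() are
// illustrative assumptions, not values from this design.
#include <windows.h>
#include <psapi.h>

static const SIZE_T kMemoryLimitBytes = 512ull * 1024 * 1024;  // assumed limit

// Placeholder: a real controller would walk the sub-job's process tree and
// suspend every thread (or terminate the processes) when a limit is hit.
static void SuspendSubJob(HANDLE process)
{
    (void)process;
    // ... SuspendThread() on each thread of the sub-job's processes ...
}

// Returns true if the sub-job is still behaving unobtrusively.
bool CheckUnobtrusiveness(HANDLE process)
{
    PROCESS_MEMORY_COUNTERS pmc = {};
    if (GetProcessMemoryInfo(process, &pmc, sizeof(pmc)) &&
        pmc.WorkingSetSize > kMemoryLimitBytes) {
        SuspendSubJob(process);   // pause rather than kill outright
        return false;
    }
    return true;
}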

By inserting an agent thread into the target process, we achieve the above-mentioned functions.

C. Access Control
All APIs accessing files and registry entries have been wrapped, and the access control strategies are embedded in the inserted code. By default, all local resources are read-only and the COW mechanism is used for any modification, as described in the previous subsection. Moreover, the user can set some parts of the local resources as not accessible for a sub-job. Any violation causes the Main Controller to kill the process.
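As an illustration of how such a wrapped API could combine redirection with the access-control strategy, the following sketch uses a Detours-style hook on CreateFileW; ClassifyAccess(), RedirectToCowCopy() and the policy behaviour shown are hypothetical stand-ins for the real strategies, not the actual implementation.

// A minimal sketch of a wrapped file API combining Detours-based interception
// with the read-only / COW / deny policy described above (helpers are
// hypothetical stand-ins).
#include <windows.h>
#include <detours.h>
#include <string>

enum class Access { Deny, ReadOnlyCow, PassThrough };

// Hypothetical helper: match the path against the user-defined deny and
// read-only rules.
static Access ClassifyAccess(const std::wstring& path)
{
    (void)path;
    return Access::ReadOnlyCow;
}

// Hypothetical helper: copy the original into the local cache on first write
// and return the cache path; returned unchanged here to keep the sketch short.
static std::wstring RedirectToCowCopy(const std::wstring& path)
{
    return path;
}

static HANDLE (WINAPI *TrueCreateFileW)(
    LPCWSTR, DWORD, DWORD, LPSECURITY_ATTRIBUTES, DWORD, DWORD, HANDLE)
    = CreateFileW;

static HANDLE WINAPI HookedCreateFileW(
    LPCWSTR lpFileName, DWORD dwDesiredAccess, DWORD dwShareMode,
    LPSECURITY_ATTRIBUTES lpSA, DWORD dwCreationDisposition,
    DWORD dwFlagsAndAttributes, HANDLE hTemplate)
{
    std::wstring path = lpFileName ? lpFileName : L"";
    switch (ClassifyAccess(path)) {
    case Access::Deny:
        // A violation: fail the call; the agent would also report it so the
        // Main Controller can kill the offending process.
        SetLastError(ERROR_ACCESS_DENIED);
        return INVALID_HANDLE_VALUE;
    case Access::ReadOnlyCow:
        if (dwDesiredAccess & GENERIC_WRITE)
            path = RedirectToCowCopy(path);   // copy-on-write redirection
        break;
    case Access::PassThrough:
        break;
    }
    return TrueCreateFileW(path.c_str(), dwDesiredAccess, dwShareMode, lpSA,
                           dwCreationDisposition, dwFlagsAndAttributes,
                           hTemplate);
}

// Attach the trampoline; typically done inside the injected agent DLL.
void InstallFileHooks()
{
    DetourTransactionBegin();
    DetourUpdateThread(GetCurrentThread());
    DetourAttach(&(PVOID&)TrueCreateFileW, HookedCreateFileW);
    DetourTransactionCommit();
}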

V. PERFORMANCE EVALUATION

We deployed such a prototype in our lab and tested several applications within the system. We measured the performance of four programs from different application areas:
• Monte Carlo-based molecule modeling (abbreviated as MC): models many molecules in a cubic box. There are interactive forces between each pair of molecules, which conform to the Lennard-Jones potential model.
• SUMIGFD [25]: an explicit PDE-solving program for seismic processing; the kernel is a 45-90 degree finite-difference migration program.
• PDE (Partial Differential Equation)-based options pricing (abbreviated as OP): solves the Black-Scholes pricing model with complex boundary conditions. The more price paths that are sampled, the more accurate the expected value will be.
• HmmerSearch: an important bioinformatics technique for understanding possible drug toxicity, potency and other biological interactions; the key is to find and align distantly related gene sequences and identify known sequence domains in new sequences.

All experiments are performed on a Windows 7 client PC equipped with one i5 (2.53GHz) CPU, 3GB of memory and a 100Mbps Ethernet connection. The WEB server is also located in the campus network; the network throughput between the client and the server is about 1.96MBps and the average response time is about 5ms. The running time of these four applications is tested in three cases:
1) All applications run in the native mode (note that the downloading time is counted as part of the running time);
2) Running in our client end with a 100% hit ratio in the local cache of the virtual file system;
3) Running in our client end with the local cache empty at the beginning.
The storage consumption is presented in Table 1: for Case 1, it is the total size of all application files; for Cases 2 and 3, it is the size of the local cache when the application has finished.

MC is a very small program; therefore the latencies of remote accesses cause almost no performance loss, especially with its relatively long running time. SUMIGFD is a program with medium-sized storage consumption; the remote accesses cause very limited performance loss. During the second run (when all needed data has been cached), our system causes almost no performance loss. HmmerSearch and OP both have a relatively large input set; therefore, the remote accesses cause a little performance loss. For OP, the loss is higher because its run time is shorter than that of HmmerSearch.

All results are presented in Figure 3. Comparing Cases 1 and 2, we can see that the overhead caused by the environment itself is almost negligible (< 1.2%). For medium- or large-sized applications, the networked virtual file system with an initially empty cache introduces some performance loss. Fortunately, because most desktop grid applications are computation-intensive, the real influence is also limited (< 18%). Moreover, some applications (HmmerSearch and OP) consume less local storage, since the virtual file system introduces the fetch-on-demand mechanism and not all input data are needed for computation.

Figure 3. Run-time comparisons

VI. CONCLUSIONS

This paper presents a client-end design for desktop grid computing based on some cloud technologies. We use user-level virtualization to construct a transparent and secure running environment for applications and to avoid potential resource contention with local programs. Moreover, a distributed user-level virtual file system based on cloud storage is implemented to further enhance the convenience and transparency of usage. Performance tests show that the user-level virtualization itself causes only very limited extra overhead. The virtual file system introduces higher I/O latencies; fortunately, most desktop grid applications are computation-intensive and have long running times, so the actual influence is limited.

ACKNOWLEDGMENT
This work is supported by the Open Research Fund Program of the Beijing Key Lab of Intelligent Telecommunications Software and Multimedia, and the National Grand Fundamental Research 973 Program of China under Grant No. 2007CB310900.

REFERENCES
[1] Fran Berman, Geoffrey Fox and Tony Hey. Grid Computing: Making the Global Infrastructure a Reality. Wiley Press, March 2003.
[2] http://setiathome.berkeley.edu/.
[3] D. P. Anderson. BOINC: A System for Public-Resource Computing and Storage. 5th IEEE/ACM International Workshop on Grid Computing, November 8, 2004, Pittsburgh, USA.
[4] Deqing Zou, Hai Jin, Zongfen Han, Xuanhua Shi, Weizhong Qiang. "An adaptive scheduling model for HowU grid system", Mini-Micro Systems (in Chinese), Nov. 2004, 25(11):1889-1893.
[5] Liu Zhong, Dou Wen, Zhang Wei Ming, Zou Peng. Paradropper: A General-Purpose Global Computing Environment Built on Peer-to-Peer Overlay Network. Proceedings of the 23rd International Conference on Distributed Computing Systems, Washington, DC, USA, 2003.
[6] G. Fedak, C. Germain, V. Néri and F. Cappello. XtremWeb: A Generic Global Computing System. CCGRID2001 Workshop on Global Computing on Personal Devices, May 2001.
[7] http://www.eyeos.org/.
[8] http://www.gladinet.com/p/moreaboutdesktop.htm.
[9] http://www.zoho.com/.
[10] Kirk Beaty, Andrzej Kochut, and Hidayatullah Shaikh. Desktop to cloud transformation planning. Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing, Rome, Italy, May 23-29, 2009, pp. 1-8.
[11] www.microsoft.com/systemcenter/softgrid/default.mspx.
[12] www.citrix.com/virtualization/virtual-desktop.html.
[13] Bowen Alpern, Joshua Auerbach, Vasanth Bala, Thomas Frauenhofer, Todd Mummert, and Michael Pigott. PDS: a virtual execution environment for software deployment. Proceedings of the First ACM/USENIX International Conference on Virtual Execution Environments, March 2005, pp. 175-185.
[14] IBM Virtual Infrastructure Access Service Product. https://www-935.ibm.com/services/au/gts/pdf/end03005usen.pdf.
[15] Windows Server 2003 Terminal Services. http://www.microsoft.com/windowsserver2003/technologies/terminalservices/default.mspx.
[16] Youhui Zhang, Xiaoling Wang, and Liang Hong. Portable Desktop Applications Based on P2P Transportation and Virtualization. Proceedings of the 22nd Large Installation System Administration Conference (LISA '08), San Diego, CA. USENIX Association, November 9-14, 2008, pp. 133-144.
[17] Youhui Zhang, Gelin Su, and Weimin Zheng. Converting Legacy Desktop Applications into On-Demand Personalized Software. IEEE Transactions on Services Computing, 14 Jun. 2010. IEEE Computer Society Digital Library, IEEE Computer Society.
[18] Amazon Simple Storage Service (Amazon S3). http://aws.amazon.com/s3/.
[19] Attila Csaba Marosi, Zoltan Balaton, Peter Kacsuk. GenWrapper: A generic wrapper for running legacy applications on desktop grids. 2009 IEEE International Symposium on Parallel & Distributed Processing, Rome, Italy, May 2009.
[20] P. Dinda. Virtuoso: Distributed computing using virtual machines, 2003.
[21] R. Figueiredo, P. Dinda, and J. Fortes. A case for grid computing on virtual machines. In Proceedings of the 23rd International Conference on Distributed Computing, May 2003.
[22] A. Whitaker, M. Shaw, and S. Gribble. Denali: Lightweight virtual machines for distributed and networked applications. In Proceedings of the USENIX Technical Conference, June 2002.
[23] G. Hunt and D. Brubacher. Detours: Binary interception of Win32 functions. In USENIX Windows NT Symposium, July 1999.
[24] http://dokan-dev.net/en/.
[25] http://www.cwp.mines.edu/cwpcodes.
