Providing Grid Services based on Virtualization and Cloud Technologies

J. Lopez Cacheiro, C. Fernandez, E. Freire, S. Diaz, and A. Simon
CESGA, Santiago de Compostela, Spain

Abstract. CESGA is operating a totally virtualized grid infrastructure that supports several production sites for different grid projects (EGEE, EELA, int.eu.grid, Ibergrid, and other regional grid projects) as well as several development sites created on demand to test new middleware releases, both for EGEE certification and for pre-production activities. The final architecture resulting from several years of development is described, showing how modern virtualization solutions like the Xen hypervisor can be applied to migrate an entire physical cluster to virtual machines. Thanks to a collaboration with the FORMIGA project, the infrastructure also includes resources from computer labs: Worker Node VMs are automatically started following a pre-defined schedule that guarantees that the computers are not in use. Extensive benchmarks, including MPI jobs, have been performed to quantify the performance loss of the virtual infrastructure. Currently cloud computing technologies are being explored as a way to improve the service deployment process in our platform.

1 Introduction

CESGA [1] has been providing support for an increasing number of new grid users and projects over the last years. At present CESGA supports the following projects: EGEE III [2], INT.EU.GRID [3], IBERGRID [4], EELA-II [5], the Spanish NGI [6] and FORMIGA [7]. All these projects are based on the gLite middleware [8] developed inside EGEE, and the Spanish NGI also supports the Globus Toolkit 4 [9]. Apart from supporting these grid projects, CESGA collaborates in testing new middleware releases for EGEE certification and pre-production activities. These activities require a continuous deployment of updated and new services. The main reason to migrate CESGA grid services to virtual machines was the need to support new hardware running old operating systems (OS), like Scientific Linux 3 (SL3) [10], which was required by gLite in 2007. Thanks to the hypervisor, new hardware (network interfaces, hard disks, etc.) can be used in a transparent way by the guest OS.


After deploying the new infrastructure, the next step was to integrate it with the computer labs in order to take advantage of unused CPU resources. This task was carried out by the FORMIGA project, which is similar in spirit to BOINC [11] but focused on the gridification of computer labs for EGEE. VM performance is also important to determine the pros and cons of the migration. Over the past four years several papers have been published with similar results. Walker [12] compared benchmark results running OpenMP and MPI tests on an NCSA cluster and on Amazon EC2 [13]; the results show up to 21% degradation for OpenMP and up to 1000% degradation for MPI when running a computational fluid dynamics application (SP) based on the Beam-Warming approximate factorization method. Bavelski [14] performed I/O-bound tests which revealed 5 to 10 times worse performance on Xen virtual machines, the degradation being largest for purely hard-drive-bound operations such as sequential and random disk reads and writes. Ho [15] also tested Xen performance using parallel applications; in the worst case the VMs were up to 2 times slower. Finally, Huang [16] wrote a dissertation on high performance network I/O in virtual machines, executing the NAS Parallel Benchmarks [17], which shows up to a 17% loss on Xen virtual machines. Several benchmarks have been included in this paper to show the performance of the CESGA virtual platform. This paper is structured as follows: in section 2, the CESGA virtual infrastructure is presented; in section 3, the impact of virtualization on performance is evaluated; section 4 describes the use of the emergent cloud technology to improve resource management; finally, section 5 provides a summary of the main conclusions of this work.

2 CESGA grid infrastructure

Two years ago the migration of our worker nodes (WN) to virtual machines (VM) was started, mainly due to the limited hardware support in SL3, the only OS supported by gLite at that time. In this section the main characteristics of the CESGA infrastructure are presented: virtualization, the shared batch system and WNs, and the reuse of computer lab resources.

2.1 Virtualization

All grid services at CESGA are now running on a totally virtualized infrastructure which makes it easy to support new projects on demand. When a new physical machine arrives it is quickly configured and installed using a specific kickstart script which automatically installs a new Xen [18] dom0 server from a local repository using the Preboot eXecution Environment (PXE). Each newly deployed dom0 can run several VMs with different OS, like SL3 or SL4, depending on the service requirements. When a new service needs to be configured, a golden copy from our local VM repository is used to deploy it onto any available

dom0 and then, if it is based on gLite [8], it is configured in a few minutes using Yaim [19] and the CESGA global site-info.def (a minimal sketch of this procedure is shown after the list below). The main advantages of our virtualized grid infrastructure are:

– Better resource utilization: new services can be installed quickly and there is no need to spend money on new hardware, which is a solution for sites and users that increasingly demand more services.
– Power saving: several grid services are consolidated in a single server (up to 8 services, depending on their requirements), reducing the number of servers required and therefore the overall power consumption in the datacenter.
– Easy replication: the deployment of a VM from an existing template or golden copy is done in just a few minutes.
– Load balancing: if a VM requires more physical resources, it can be given more resources or migrated to another dom0.
– Fault tolerance: Xen combined with the Logical Volume Manager (LVM) offers roll-back via snapshots. In case of failure, a VM can be replicated using our daily backups and started on a different dom0.
– Flexibility: old OS versions, like Scientific Linux 3, can be used on modern hardware.
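The following sketch illustrates the kind of deployment described above: copying a golden copy to a target dom0, writing a paravirtualized domU configuration and booting it before Yaim is run. It is only an illustration of the idea, not the actual CESGA tooling; all paths, hostnames and configuration values are hypothetical.

#!/usr/bin/env python3
# Illustrative sketch of deploying a new gLite service VM from a golden copy.
# Not the actual CESGA tooling: paths, hostnames and config values are hypothetical.
import subprocess

GOLDEN_IMAGE = "/srv/vm-repo/sl4-glite-golden.img"   # hypothetical repository path
XEN_CONF_DIR = "/etc/xen"

def deploy_vm(name, dom0, memory_mb=2048, vcpus=2):
    """Copy the golden copy to the chosen dom0, write a domU config and boot it."""
    disk_image = f"/var/lib/xen/images/{name}.img"

    # 1. Copy the golden copy to the target dom0 (an LVM snapshot could be used instead).
    subprocess.check_call(["scp", GOLDEN_IMAGE, f"{dom0}:{disk_image}"])

    # 2. Write a minimal paravirtualized domU configuration file on the dom0.
    config = "\n".join([
        f'name = "{name}"',
        f"memory = {memory_mb}",
        f"vcpus = {vcpus}",
        f'disk = ["file:{disk_image},xvda,w"]',
        'vif = ["bridge=xenbr0"]',
        'bootloader = "/usr/bin/pygrub"',
    ]) + "\n"
    writer = subprocess.Popen(["ssh", dom0, f"cat > {XEN_CONF_DIR}/{name}.cfg"],
                              stdin=subprocess.PIPE)
    writer.communicate(config.encode())

    # 3. Boot the new VM; the gLite service is then configured with Yaim inside the guest.
    subprocess.check_call(["ssh", dom0, f"xm create {XEN_CONF_DIR}/{name}.cfg"])

if __name__ == "__main__":
    deploy_vm("se-eela-01", "dom0-blade07.example.org")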

2.2 Shared Batch System and Worker Nodes

In order to share all the WNs among the different grid projects supported at CESGA, a shared batch system is required; in our case the Grid Engine (GE) batch system [20] is used. The gLite middleware requires a Computing Element (CE) for each grid infrastructure. In our current configuration the batch server is shared using a single GE qmaster server, with a shadow qmaster for fault tolerance. Jobs are submitted from different sources, but all of them are collected in a single batch server (the qmaster in GE nomenclature) that distributes them among all the available WNs. At this point a difficulty appears: WNs belonging to different grid projects require a different environment and, in some cases, even a different middleware version. This issue is solved by the GE JobManager developed at CESGA, which is configured on each CE to load the specific gLite environment for each project (the sketch below illustrates this idea). With respect to the Storage Element (SE), it is only necessary to add the VOs, VOMS servers and users assigned to each grid project and reconfigure it with Yaim (see Figure 1).
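The following minimal sketch shows the general idea of loading a project-specific environment depending on the CE that submitted the job. It is not the actual CESGA GE JobManager: the mapping, file locations and project names are hypothetical.

#!/usr/bin/env python3
# Illustrative sketch of per-project environment selection in a job wrapper.
# NOT the actual CESGA GE JobManager: the mapping and paths are hypothetical.

# Hypothetical mapping from the submitting CE to the gLite environment script
# that its jobs must source on the shared WNs before execution.
PROJECT_ENV = {
    "ce-egee.example.org": "/opt/glite/etc/profile.d/grid-env-egee.sh",
    "ce-eela.example.org": "/opt/glite/etc/profile.d/grid-env-eela.sh",
    "ce-i2g.example.org":  "/opt/glite/etc/profile.d/grid-env-i2g.sh",
}

def wrap_job(submitting_ce, job_script):
    """Return a shell snippet that loads the right environment and runs the job."""
    lines = []
    env_script = PROJECT_ENV.get(submitting_ce)
    if env_script is not None:
        lines.append(f". {env_script}")   # source the project-specific environment
    lines.append(f"exec {job_script}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(wrap_job("ce-eela.example.org", "/tmp/job_12345.sh"))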

2.3 Integrating computer lab resources

The FORMIGA project has extended the gLite middleware to take advantage of idle resources at computer labs. The final prototype of the project is already working in several computer labs of the University of Santiago de Compostela and at CESGA. These computers run Xen or VMware depending on the OS installed

Fig. 1. CESGA grid infrastructure schema with shared WN, GE batch system and SE

by the computer lab administrator, operating transparently to the user. The virtual machines installed at the computer labs are used as WNs that communicate securely with a CE located at CESGA through a virtual private network (VPN), managed using SSL/TLS asymmetric encryption between the WNs in the computer labs and the CESGA servers (CE, SE) (see Figure 1). All nodes are interconnected through the VPN: each virtual machine, after starting its network interface with a connection to the Internet, starts the OpenVPN client service. For security reasons only the computer lab administrators can directly access the running virtual machines; non-privileged grid users are authenticated with their specific X.509 certificate signed by a certificate authority (CA) in order to execute their jobs. The usage of virtual machines allows an easy deployment of the middleware in the computer labs and makes it possible to migrate jobs between computers; in addition, the use of Xen on the platform provides flexibility to manage them. The VPN also helps to avoid the restrictions of the firewalls and private networks usually configured in the computer labs.
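As stated in the abstract, the WN VMs in the labs are started automatically following a pre-defined schedule that guarantees that the computers are not in use. The sketch below only illustrates that idea; the idle window, VM name and commands are hypothetical and do not correspond to the real FORMIGA implementation.

#!/usr/bin/env python3
# Illustrative sketch: start a lab WN VM only inside its scheduled idle window.
# The idle window and the domU config path are hypothetical examples.
import datetime
import subprocess

IDLE_START = datetime.time(22, 0)          # hypothetical: lab assumed idle from 22:00
IDLE_END = datetime.time(7, 0)             # ... until 07:00 the next morning
WN_VM_CONFIG = "/etc/xen/wn-lab01.cfg"     # hypothetical Xen domU configuration

def lab_is_idle(now=None):
    """Return True if the current time falls inside the idle window."""
    t = (now or datetime.datetime.now()).time()
    return t >= IDLE_START or t < IDLE_END  # the window crosses midnight

def start_worker_node():
    """Boot the WN domU; the guest brings up the OpenVPN client when it boots."""
    subprocess.check_call(["xm", "create", WN_VM_CONFIG])

if __name__ == "__main__":
    if lab_is_idle():
        start_worker_node()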

3 Benchmarks

One of the main concerns about using a virtual infrastructure is the performance loss incurred by the virtualization layer. Not many years ago nobody would have believed that a complete production site could be run efficiently in a virtual environment; at that time virtualization technology was useful only for academic or testing purposes because of its performance. With the improvements in virtualization technology and the arrival of Xen, an open source virtual machine monitor (VMM) that implements the concept of paravirtualization, new doors have been opened to implement a production grid infrastructure. To quantify the performance loss incurred by using virtualization technology, several benchmarks have been performed both on this virtual grid infrastruc-

ture and on a physical one, giving a general overview of the performance of the CESGA infrastructure. The main results of these benchmarks are presented in this section. In our performance evaluation a system composed of two Dell PE1955 blades with the following characteristics was used:

– 2 x Intel(R) Xeon(R) CPU E5310 @ 1.60GHz QuadCore
– 4GB DDR2-667 RAM, 73.4GB HDD
– Gigabit ethernet: Broadcom 5708
– Dom0: Fedora Core 6 x86_64
– VM: Scientific Linux 4 i386
– Xen: Linux Kernel 2.6.18-1.2798.fc6xen

These two blades are very representative of the CESGA configuration, where most of the servers are of this type, so they give a good estimation of the performance of our overall grid infrastructure. Each benchmark was executed three times using these dedicated nodes exclusively; the results shown are the arithmetic average of the measurements (the sketch below shows how the reported loss figures are obtained from these averages). The benchmarks performed can be divided into two groups: synthetic benchmarks, where the performance of specific components of the system was evaluated, and application benchmarks, where some common applications were selected to evaluate their performance on both platforms.
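The following short sketch makes the reported figures explicit: each benchmark is averaged over its three runs on the real machine and on the VM, and the loss is |Result(VM) − Result(REAL)|/Result(REAL), as in the caption of Figure 2. The sample numbers are placeholders, not measured data.

#!/usr/bin/env python3
# Sketch of how the loss figures in this section are computed.
# The sample throughputs below are placeholders, not measured CESGA data.

def average(runs):
    """Arithmetic average of the three runs of a benchmark."""
    return sum(runs) / len(runs)

def performance_loss(runs_real, runs_vm):
    """Relative loss of the VM with respect to the real machine: |VM - REAL| / REAL."""
    real = average(runs_real)
    vm = average(runs_vm)
    return abs(vm - real) / real

if __name__ == "__main__":
    # Placeholder example: three hypothetical sequential-write throughputs in MB/s.
    loss = performance_loss([60.1, 59.8, 60.4], [51.2, 50.9, 51.5])
    print(f"performance loss: {loss:.1%}")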

3.1 Synthetic Benchmarks

Several synthetic benchmarks were selected to measure CPU, filesystem and network performance:

– Intel Linpack [21]: CPU performance
– Bonnie++ 1.03a [22]: filesystem performance benchmark tool
– Iozone 3.323 [23]: newer filesystem benchmark tool
– Iperf 2.0.4 [24]: modern alternative for measuring maximum TCP and UDP bandwidth performance
– Effective Bandwidth Benchmark (b_eff) 3.5 [25]: measures the accumulated bandwidth and latency of MPI jobs; an older version of this tool is part of the well-known HPCC benchmark suite

All these benchmarks were executed using the same parameters both on the virtual system and on the real machines. A summary of the results can be seen in Figure 2. The filesystem performance of virtual and physical machines is very similar (less than 1% difference) except in the write performance tests, where there is an important penalty incurred by the virtual machines. In the sequential write test the VM suffers a 15% performance degradation with respect to the physical machine, and in the random write test this degradation reaches 30%. These results are better than previously published work [14], only 1.3 times slower instead of 10 in the worst case, and could be affected

Fig. 2. Benchmark summary, showing the performance loss of a virtual machine calculated as |Result(VM) − Result(REAL)| / Result(REAL)

by the type of hardware used. This means that write performance is seriously impacted in virtual machines, and their use for highly intensive I/O applications is not recommended. In our virtual infrastructure this is partially avoided by the fact that an externally exported StorageWorks Scalable File Share (SFS) filesystem [26] is used as the main storage source for the SE machines. Another important aspect is CPU performance, because HPC grid and cloud computing users demand fast CPUs for their applications and the VMs must provide adequate performance. It should be mentioned here that the CESGA virtual grid services run in paravirtualized machines. This has the disadvantage that the original guest must be modified to use the Xen kernel, but it offers many advantages in terms of performance: the virtual guest is aware that it is running in a virtualized environment and communicates directly with the Xen hypervisor, reducing the performance penalty incurred by virtualization. To compare CPU performance, all tests were performed on the x86_64 CPUs (with 51.2 GFlops of theoretical peak) using an x86_64 Xen kernel both for the real machines and the VMs. The synthetic benchmark chosen for these tests was Intel Linpack [27], in both its x86_64 and i386 versions. The i386 version is relevant because the OS of the virtual machines is the i386 version of Scientific Linux 4 and not the x86_64 version, since the x86_64 version of gLite still has too many bugs for a production environment.
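For reference, the 51.2 GFlops theoretical peak quoted above is consistent with the two quad-core 1.60 GHz Xeon E5310 processors per blade if one assumes 4 double-precision floating-point operations per core per cycle (this per-cycle figure is our assumption, not stated in the paper):

R_peak = 2 sockets × 4 cores/socket × 1.60 GHz × 4 flops/cycle = 51.2 GFlops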

The performance of the virtual machines running the x86_64 Linpack is exactly the same as that of the real machines, but unfortunately for i386 binaries the situation is drastically different. The VM loses about 22% performance (see Figure 2): in the dom0 we obtain 37 GFlops but in the Xen VM we obtain 29 GFlops running Linpack compiled for the i386 architecture. This may be because the Xen hypervisor must translate i386 instructions running on an x86_64 kernel. This situation can be critical for user applications compiled for i386 that are executed on VMs with x86_64 kernels; the best option in this case is to recompile the user applications for the x86_64 architecture or to change the VM to its x86_64 version. Finally, to evaluate the network, the Iperf benchmark was used; in this case two dom0s and two VMs were used to measure the network speed between them. The result for the real machines was a bandwidth of 871 Mbits/sec, whereas for the VMs it decreases to 740 Mbits/sec; this loss (about 15%) can affect jobs that require more bandwidth than CPU. Bandwidth performance can also be a bottleneck for MPI jobs, so to measure MPI network performance the Effective Bandwidth (b_eff) algorithm was executed. This algorithm measures the accumulated bandwidth of the communication network of parallel and/or distributed computing systems using several message sizes. The final result for the real machines was b_eff = 53.92 MB/s with a latency of 83.26 µs, and for the VMs b_eff = 42.63 MB/s with a latency of 117.14 µs. This means a 40% increase in MPI latency and a 21% loss in b_eff, which greatly affects MPI application throughput, as discussed below.

3.2 Application Benchmarks

Synthetic benchmarks can be a good reference to compare different machines, but it is even better to test "real-life" applications. In this case two well known scientific applications, platform and middleware independent, were selected:

– Gaussian g03: computational chemistry simulation package [28]
– Gromacs 3.3.2: molecular dynamics simulation package [29]

In the Gaussian case the serial version of the program was used, running the Na(H2O)4 S4-symmetry example (Test339). The results of this benchmark were quite similar in both cases: the test finished in 910s on a real machine and in 920s on a VM. This clearly demonstrates the comparable throughput of real machines and VMs when no intensive I/O operations are needed. The official Gromacs DPPC benchmark was executed using eight MPI processes running on the same node. DPPC emulates a phospholipid membrane with a total of 121,856 atoms. The VMs are 1% slower running this test, so the loss is almost negligible. One of the most remarkable results was the great difference between real and virtual machines when running the same DPPC benchmark over OpenMPI on two nodes, with eight MPI processes on each one. On the real machines the test finished in 604s, 28% faster than the MPI execution using

a single node, but, on the other side, the VMs are much slower, finishing the test in 2379s (1529s slower than running the same job using only 8 CPUs on a single node). This result, with the VMs 294% slower than the real machines when running MPI, was also observed in other benchmarks: up to 1000% slower in the worst case in [12], and parallel benchmarks running up to 20% slower on virtual machines in [15] and [16]. This shows that virtual machine performance is still weak for MPI applications. The cause of this low throughput is the strong dependence of Gromacs 3 on network latency; this dependence is not always linear and has a great impact on performance when the application is executed in a virtual machine, where the VMM increases the I/O overhead by translating the network accesses of the VMs.

4 Towards Cloud Computing

As discussed above, most of the CESGA grid infrastructure is based on virtual machines; this adds a lot of flexibility to the architecture, especially when services are needed at short notice. The infrastructure uses a central repository which stores different virtual machine images such as Scientific Linux 4/5, openSUSE 9/10, etc. These virtual machines are like empty boxes without any service: when a new gLite service is needed, the corresponding virtual machine image is copied by hand from the repository to a specific Xen dom0. After that the new VM is started in its new location, a new IP is assigned to it and the middleware services are installed following the normal process. This procedure saves a lot of work and is very flexible: virtualization allows moving VM images to another location in a few seconds, changing their available memory on the fly, making image snapshots to recover them after a disaster, changing the allocated CPUs, etc. Since Amazon announced its Elastic Compute Cloud [13] ("EC2" for short) in 2006, the word cloud has appeared repeatedly when talking about new computing technologies. Eucalyptus [30] is an open source project which implements cloud computing technology and is compatible with Amazon's EC2 interface. Until now virtual machines have been started by hand in our virtual grid infrastructure, but using a web service like Eucalyptus, VMs (or instances, in cloud terms) could be started with minimal effort to run new gLite services, just by clicking in a web interface or through the EC2-compatible API (a sketch is shown at the end of this section). With this mechanism a grid user could start their own User Interface with a few clicks. One of the many benefits of this new structure is that grid administrators do not have to search for available resources; Eucalyptus does the work for them. Another benefit is related to VM control: grid users and administrators can see all their running and stopped machines just by connecting to a single web page, or submit their own virtual images if they do not want to run pre-installed instances. This, combined with the G-Fluxo [31] portal, a web portal for grid job submission, can be used to share resources more efficiently and more easily. At the time of writing this article, we are testing Eucalyptus in our grid farm. The first tests have been promising: cloud services do not consume too much memory, and the cloud nodes and controllers use secure internal communi-

cations based on WS-Security without losing efficiency. There are still questions to be addressed, such as fault tolerance: if a Node Controller fails, the Cluster Controller must detect this and replace the failing instances on a new dom0. Another question concerns the coexistence of Eucalyptus VMs and other VMs on the same node; at the moment Eucalyptus essentially ignores VMs started outside of its control, but these features will probably be available in future releases.
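As an illustration of the EC2-compatible interface mentioned above, the following sketch starts one instance on a Eucalyptus front-end. It assumes the boto Python library (not mentioned in the paper); the endpoint, credentials and image ID are placeholders, not real CESGA values.

#!/usr/bin/env python3
# Illustrative sketch: starting an instance through an EC2-compatible API.
# Assumes the boto library; endpoint, credentials and image ID are placeholders.
from boto.ec2.connection import EC2Connection
from boto.ec2.regioninfo import RegionInfo

# Hypothetical Eucalyptus front-end exposing the EC2-compatible API.
region = RegionInfo(name="eucalyptus", endpoint="cloud.example.org")
conn = EC2Connection(
    aws_access_key_id="ACCESS_KEY",          # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
    is_secure=False,
    region=region,
    port=8773,
    path="/services/Eucalyptus",
)

# Start one instance from a pre-registered service image (placeholder image ID).
reservation = conn.run_instances("emi-12345678", instance_type="m1.small")
for instance in reservation.instances:
    print(instance.id, instance.state)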

5 Conclusions

The main advantages of the CESGA virtual grid infrastructure are flexibility, improved resource utilization, easy replication, load balancing and fault-tolerance capabilities. In order to share the available resources between all the different grid projects supported at CESGA, the infrastructure uses a single batch system shared among all the CEs, which allows a common pool of WNs to be used for all projects. The modifications required in the WNs have been described and could be implemented for other grid projects. Additionally, idle resources at computer labs are included in the infrastructure by using the software developed in the FORMIGA project, which allows an easy integration of these spare resources with any gLite-based environment. The results of our benchmarks show that the performance of the virtual infrastructure is very similar to that of its physical counterpart (less than 1% performance difference), with two exceptions worth mentioning: I/O write performance and MPI latency. In the case of I/O write performance a degradation of up to 30% could be experienced and, in the MPI latency case, the benchmarks show a 40% increase; the worst throughput result, however, was obtained running the Gromacs MPI benchmark on virtual machines. This means that the CESGA virtual grid infrastructure is not recommended for running highly intensive I/O applications that rely heavily on random write performance, or highly parallel MPI jobs where latency plays an important role. The second limitation does not represent a big problem in a typical grid environment, where most of the jobs do not use MPI, but scientists should be aware of this performance penalty if they intend to migrate MPI applications to cloud computing. To solve this issue, one of the principal virtualization objectives is to improve driver performance and to develop new OS-bypass and VMM-bypass mechanisms that avoid the I/O overhead of VMs. In our virtual infrastructure the effects of the limited write performance are partially avoided by the fact that an externally exported SFS filesystem is used as the main storage source for the SE machines. Cloud services add a new layer of abstraction where the available resources are managed more easily and grid administrators do not have to worry about searching the pool of resources; the cloud does it for them. To make this possible, new open source projects like Eucalyptus offer us an excellent bridge to convert our Xen based infrastructure into a new cloud.

Acknowledgments
This work was supported by the European Commission under the project EGEE-III (INFSO-RI-222667), and by the Xunta de Galicia under the projects FORMIGA (07TIC01CT) and G-Fluxo (07SIN001CT).

References
1. CESGA. http://www.cesga.es/
2. EGEE. http://eu-egee.org
3. I2G Portal. http://www.i2g.eu/
4. Ibergrid Wiki. https://web.lip.pt/wiki-IBERGRID/index.php?title=IBERGRID
5. EELA Portal. http://www.eu-eela.eu/
6. NGI Portal. http://www.e-ciencia.es/wiki/index.php/Portal:Grid
7. FORMIGA. http://formiga.cesga.es
8. gLite. http://glite.web.cern.ch/glite
9. Globus Toolkit. http://www.globus.org/toolkit/
10. Scientific Linux. https://www.scientificlinux.org/
11. Anderson, D.: BOINC: A system for public-resource computing and storage. In: Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing, Pittsburgh, USA (2004)
12. Walker, E.: Benchmarking Amazon EC2 for high-performance scientific computing. ;login: 33 (2008) 5
13. Amazon Elastic Compute Cloud. http://aws.amazon.com/ec2/
14. Bavelski, A.: On the Performance of the Solaris Operating System under the Xen Security-enabled Hypervisor. PhD thesis, Linkopings universitet (Department of Computer and Information Science) (2007)
15. Ho, C.: Evaluation of Xen: Performance and Use in Parallel Applications. EECE 496 Project Report (2007) 21
16. Huang, W.: High Performance Network I/O in Virtual Machines over Modern Interconnects. PhD thesis, The Ohio State University (2008)
17. NASA: NAS Parallel Benchmarks. http://www.nas.nasa.gov/Software/NPB/
18. Xen. http://www.xen.org
19. Yaim. http://www.yaim.info/
20. Grid Engine (GE). http://gridengine.sunsource.net
21. Intel Linpack. http://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download/
22. Bonnie++. http://www.coker.com.au/bonnie++/
23. Iozone. http://www.iozone.org/
24. Iperf. http://sourceforge.net/projects/iperf/
25. Effective Bandwidth Benchmark. https://fs.hlrs.de/projects/par/mpi/b_eff/
26. SFS Web page. http://h20341.www2.hp.com/HPC/cache/276636-0-0-0-121.html
27. Intel Corporation. http://www.intel.com/
28. Gaussian. http://www.gaussian.com/
29. van der Spoel, D., Lindahl, E., Hess, B., Groenhof, G., Mark, A.E., Berendsen, H.J.C.: GROMACS: Fast, flexible and free. J. Comp. Chem. 26 (2005) 1701–1718
30. Eucalyptus. http://eucalyptus.cs.ucsb.edu/
31. G-FLUXO. http://gfluxo.cesga.es/
