GPU Computing in EGI Environment Using a Cloud Approach

2011 International Conference on Computational Science and Its Applications

Flavio Vella, Riccardo M. Cefalà, Alessandro Costantini, Osvaldo Gervasi, Claudio Tanci
Dept. of Mathematics and Computer Science, University of Perugia
Via Vanvitelli 1, 06123 Perugia, Italy

Abstract—Recently GPU computing, namely the possibility to use the vector processors of graphics cards as general-purpose computational units in High Performance Computing environments, has generated considerable interest in the scientific community. Some communities in the European Grid Infrastructure (EGI) are reshaping their applications to exploit this new programming paradigm. Each EGI community, called Virtual Organization (VO), often requires specific environments, making it necessary for each Grid site to enable an efficient system to fulfill the VO's software requirements. Cloud Computing, and more generally the opportunity to transparently use computational resources, together with the consolidation of virtualization technologies, makes it possible to provide end users with the environment required for their activities. The present work is aimed at exploring the possibility to provide each VO with an on-demand GPU environment (GPU framework, Operating System and libraries) and to make it accessible via the EGI infrastructure using a Cloud approach.


I. INTRODUCTION

The growing interest for GPU computing in EGI communities such as Computational Chemistry (COMPCHEM VO) and Theoretical Physics (THEOPHYS VO), as well as other communities, is a strong incentive to develop appropriate distribution models integrating the so-called High Performance Computing (HPC) and High Throughput Computing (HTC) resources. In the present work the possibility of enabling the convenient usage of GPU devices for VO users, exploiting the capabilities of the EGI infrastructure and the emerging paradigm of Cloud Computing, has been explored. A strategy to provide on-demand execution environments has been proposed through the joint usage of traditional and widespread gLite [7] components and the popular standard EC2 web-service [1] APIs. An entire job flow that allows GPU resource requests to be discriminated, through JDL [4] parameters, has been defined in order to allocate, in a dynamic fashion, the required resources on a Cloud-like infrastructure, either public, private or hybrid. To achieve this goal, part of the work has been devoted to the virtualization of the physical GPU resources in order to make them available in an Infrastructure as a Service (IaaS) private Cloud. To this end a centralized mechanism that interfaces a generic EGI Grid site to an IaaS provider has been implemented to fulfill the job requirements. In order to develop and test the whole infrastructure, a fully working testbed has been built with the adoption of the Eucalyptus software system to implement a private Cloud over the cluster. Moreover, the need to create virtual appliances matching the requirements of jobs that adopt Multi/Many-core technologies, such as OpenCL [14] or CUDA [11], has been addressed. In Section II the Grid and Cloud paradigms are described; in Section III the state of the art of HPC computing in the EGI [6] Grid is addressed; in Section IV the articulation of the developed model and the evaluation of using virtual GPUs in HPC are described and analyzed. Our conclusions are summarized in Section V.

II. GRID AND CLOUD

Cloud and Grid paradigms share some essential driving ideas and overlapping areas, which lead both to the construction of large scale federated Grid infrastructures and to the adoption of Cloud solutions that:
- encapsulate the complexity of hardware resources and make them easily accessible by means of high-level user interfaces,
- address the intrinsic scalability issues of large scale computational challenges,
- cope with the need for resources that cannot be hosted locally.
However, among the key differences between Grid and Cloud there are those related to abstraction and computational models [21]. According to this view, Grids are designed bottom up as a federation of existing resources (typically legacy clusters built around a LRMS that exploits the batch computational model), for which the development of applications must take into account aspects related not only to the application features but also to the Grid abstraction. This process, aimed at enabling the application to run in such environments, can be rather complex. On the contrary, Clouds enable the users to choose between different computational models suited to match the requirements of a particular application. This can be done with the adoption of general interfaces [23], which often leads to simpler interaction and application development.



A. Limits of the batch model

Usually, different scientific applications have different computational needs (architectures, OS, libraries), and in Grid this can represent a problem due to its heterogeneous hardware nature and to the particular software selection agreements at organizational level that characterize the computational resources in a Grid environment. From a batch point of view this represents a set of requirements influencing the scheduling decisions that have to be taken by top and local level resource managers. In the batch model, in fact, the resources are often statically managed and partitioned, preventing any dynamical arrangement to match the application requirements. This makes the workload in Grid often unpredictable, leading to an unbalanced use of the available resources and to an important reduction of the Quality of Service (QoS). Even if this issue has been addressed in various works [8], [18], [26], [27], the proposed solutions are often heavily customized and too tightly dependent on particular technological choices. With the recent explosion of Cloud computing and the spread of the IaaS provision model and stacks [5], [10], [13], there is a standardization of interfaces [1], [12] and a simplification of virtual resource management which allow the extension of private resource pools (in number and typology), bringing a positive effect on the QoS. There are still several aspects that need to be investigated and defined in the Cloud era (such as data management and security) that are, however, well established in Grid, leading to the conclusion that a more reasonable approach could be an integration model that combines the features of both paradigms.

B. Integration Possibilities

Even if the Cloud paradigm has been primarily defined, since its first formulation [19], as a business model, as a matter of fact the IaaS model seems to overcome some batch-related limitations thanks to its on-demand and adaptive features. By analyzing the batch-oriented approach of Grid and the service-oriented nature of Cloud, two different integration models have been considered:
- Hybrid with batch-dependent, cloud-enabled LRMS: in this model a single local batch system is used to schedule the jobs on a pool of dynamically provisioned resources, either on premises or on public/private clouds. For example, [8] uses this approach.
- Hybrid, batch independent: in this model the local Grid site spawns resources (even whole clusters) on public/private clouds on the basis of the job requests. The integration is done at CE level.
The first model presents an important limitation: the tight dependency on the LRMS heavily reduces the portability to other batch systems. On the other hand, the second model allows the decoupling of the LRMS from the management of the Cloud instances, and it enables the creation of multiple virtual clusters, including the LRMS. Starting from this consideration, the second model has been adopted to develop the functional prototype, able to integrate the Grid and Cloud paradigms, described in Section IV.

III. HPC IN THE EGI GRID

In this section we present some considerations about the state of the art of High Performance Computing in the EGI Grid, with a focus on the changes brought by Many/Multi-core computing. EGI provides access to High Capacity Systems for solving high dimensional problems, using more than 140,000 processor cores and 25 PetaBytes of disk space distributed over about 300 Grid sites. On one hand, the Grid can be seen as the ideal computing infrastructure to carry out very large computational campaigns, distributing calculations on the available Grid resources in a secure way. The possibility to solve embarrassingly parallel problems (High Capacity) as well as medium sized parallel problems involving communications between nodes makes the Grid environment an ideal infrastructure to handle multidimensional problems (see Figure 1). At present EGI allows the execution of High Performance scientific applications using, at the Grid site level, the Message Passing Interface (MPI) communication protocol.


Figure 1. Capacity and capability in Grid Computing.

MPI has gained an important role in many VOs, such as COMPCHEM and THEOPHYS, at the cost of some effort to integrate the parallel environment with the gLite middleware and, in general, to deal with the strong dependency of MPI batch jobs on the lower abstraction layers. Nevertheless, the use of MPI in Grid suffers from some limitations, due mainly to the policies adopted by the scheduler of each Grid site. As an example, when an MPI job requires more processors than are currently free on a cluster, the job cannot be executed until the total number of required processors becomes available on the Grid site, reducing the advantages of parallelization or causing job failures (see Figure 2.a). Moreover, the performance of MPI jobs depends directly on the local site configuration and characteristics: the same MPI job will perform differently on different Grid sites depending on the network, the number of cores per CPU and other parameters often hidden to the user by the Grid abstraction (see Figure 2.b).

In this context, Many/Multi-core computing, that is the use of heterogeneous resources (CPUs and GPUs) as compute units, and in particular GPU computing, could solve some of the cited issues. First, an application developed with CUDA or OpenCL is inherently parallel, yet its workflow can be managed as simply as that of a single job. Second, the cost of communication among the threads of such an application, which runs on a single node, is smaller than that of an MPI job. In general, such an application would be less dependent on the aforementioned site peculiarities. Obviously, not all HPC scientific applications can be implemented in a Many/Multi-core fashion, and the MPI paradigm will keep playing an important role in those cases.
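As an illustration of this class of applications, the following is a minimal single-node OpenCL sketch written with PyOpenCL; it is not taken from the paper's testbed, and the kernel and array size are purely illustrative.

import numpy as np
import pyopencl as cl

a = np.random.rand(1 << 20).astype(np.float32)
b = np.random.rand(1 << 20).astype(np.float32)

ctx = cl.create_some_context()      # picks the GPU exposed inside the (virtual) worker node
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags

a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

prg = cl.Program(ctx, """
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *out) {
    int gid = get_global_id(0);
    out[gid] = a[gid] + b[gid];
}
""").build()

prg.vadd(queue, a.shape, None, a_buf, b_buf, out_buf)   # one work-item per array element
out = np.empty_like(a)
cl.enqueue_copy(queue, out, out_buf)                    # copy the result back to the host

The whole computation is submitted to the Grid as a single job, while the parallel decomposition is handled entirely on the node by the OpenCL runtime.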



IV. A SIMPLE INTEGRATION MODEL

In this section we introduce a simple prototype that follows the second approach described in Section II-B. The basic idea behind the prototype is to add to a typical Grid site an application that "speaks" with the BLAH component of the CREAM CE and with an IaaS provider. To create a working testbed we implemented a simple infrastructure for the provision of GPU virtual instances using Eucalyptus [25] on top of the Xen hypervisor [15].

A. The role of the gLite CE

The gLite Grid workflow revolves around Computing Elements (CEs), which connect local resources to the Grid infrastructure. The role of the CE is to provide services for the execution of jobs on Grid resources that are managed by a LRMS such as Torque, LSF or Condor. In this environment, the Computing Resource Execution and Management (CREAM) [17] CE is the latest CE implementation in the gLite middleware; it exposes a gSOAP web service interface to other Grid components such as job management clients, allowing standard (via the WMS [16]) as well as direct submission from the adopted User Interface. CREAM can submit requests directly to the LRMS through the Batch-system Local ASCII Helper (BLAH) component [2], [24], which provides an abstraction layer interface to the underlying LRMS. The BLAH service provides a minimal, pragmatically designed common interface for job submission and control to a set of batch systems [2].

Figure 2. Success rate (a) and performance (b) analysis of MPI job submission to the EGI sites that support the COMPCHEM VO [20]. (a) Success rate of MPI jobs performed in 2010 (squares) and 2009 (diamonds). (b) Best (squares) and worst (diamonds) performance of MPI jobs.

B. Eucalyptus

Standardization is still an ongoing process in the Cloud Computing landscape. Until now, the interfaces for interacting with IaaS clouds that have received the most attention are arguably the Amazon EC2 [1] tools and the Open Cloud Computing Interface (OCCI) [12]. Eucalyptus (Elastic Utility Computing Architecture for Linking Your Programs To Useful Systems) is a software platform for the implementation of private clouds. Its development started at the University of California, Santa Barbara, and it was first released in 2008.



At the time of writing it is distributed in two releases: an open source one, which has reached version 2.0.2 and is available for some of the major GNU/Linux distributions, and an Enterprise Edition. Eucalyptus is built with a clean and flexible architecture that allows the deployment of IaaS Clouds on existing cluster infrastructures with relatively small effort. To run the instances it leverages the most used and advanced virtualization software on GNU/Linux: it supports both the Xen hypervisor and KVM [9]. It has four main components (Figure 3), which can be installed in a stand-alone or distributed fashion and which communicate through Web Services:
- Node Controller: installed on each node that will host virtual instances, it controls their allocation and sends information about the availability of resources.
- Cluster Controller: each Cluster Controller can represent a single cluster of Node Controllers and abstracts it in a way that resembles the Availability Zone concept found in Amazon EC2.
- Cloud Controller: it aggregates the information from the Cluster Controllers and exposes the cloud interfaces to the users. It acts as the entry point of the cloud for users' interaction.
- Storage Controller and Walrus: they are used to store the virtual appliances and to offer a storage service to cloud users.
Eucalyptus offers various means for network virtualization that can interoperate with the existing infrastructure and allow the extension and isolation of entire subnetworks. As said, the interfaces to the Cloud Controller are EC2 compliant, and this allowed us to use the boto [3] API to develop our testbed.

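As an illustration of this EC2 compatibility, the following is a minimal boto sketch, not part of the prototype code, showing how a client can talk to a Eucalyptus Cloud Controller; the endpoint, credentials and port are placeholders.

import boto
from boto.ec2.regioninfo import RegionInfo

# EC2-compliant endpoint exposed by the Eucalyptus Cloud Controller (placeholder host)
region = RegionInfo(name="eucalyptus", endpoint="cloud.example.org")
conn = boto.connect_ec2(aws_access_key_id="ACCESS_KEY",
                        aws_secret_access_key="SECRET_KEY",
                        is_secure=False, region=region,
                        port=8773, path="/services/Eucalyptus")

# list the images (virtual appliances) registered in the private cloud
for image in conn.get_all_images():
    print("%s %s" % (image.id, image.location))

The same connection object can then be used to start and terminate instances with the usual EC2 semantics (run_instances, terminate_instances).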

Figure 3. A sketch of the Eucalyptus architecture.


C. Xen and GPU virtualization

Xen is an industry standard hypervisor supporting the x86, x86_64, IA64 and ARM architectures. Historically, the x86 architecture included privileged instructions that made full hardware virtualization difficult to achieve until the late 2000s, when both Intel and AMD implemented extensions to address this problem. When the hardware prerequisites are met, Xen permits PCIe devices to be transparently assigned to VMs. This assignment can be performed at the creation of the virtual machine or at runtime. A single PCIe resource cannot be shared between VMs. GPU virtualization presents additional issues. Graphics adapters support many legacy x86 features that must be provided in the virtual environment. At the present time only a few GPUs officially support passthrough to virtual machines. Others, even if not officially supported, may work with special purpose patches to the Xen source code. The selection of commodity components, new chipsets, motherboards, CPUs and GPUs requires particular care since some components are not compatible with full hardware virtualization. A different approach separates and decouples the graphics driver into a front-end deployed in the virtual machine and a back-end in the real machine, with a communication channel [22] established between the two environments. This technique also allows the sharing of a GPU among multiple virtual machines. To assess the OpenCL performance gap between a real machine and a virtual one, a testbed has been implemented using an Asus P7P55 LX motherboard with support for both Intel Virtualization Technology (VT-x) and Intel Virtualization Technology for Directed I/O (VT-d), provided by the i7 870 CPU, a 4-core, 64-bit x86 part from Intel. As graphics adapter, a FireStream 9270, positioned by AMD as a so-called Stream Computing Solution, was tested. Xen relies on a Unix-like OS (Domain 0) for the control and management of the virtual machines (Domains U) and for driver support for the real hardware (Fig. 4). AMD provides drivers and a development environment for GNU/Linux distributions, so Debian Squeeze 6.0 was chosen as the OS both for the Domain 0 and for the virtualized environment.
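As an illustration of the assignment performed at the creation of the virtual machine, a minimal sketch of a Xen guest configuration with PCI passthrough is reported below; the guest name, image path, sizes and the PCI address of the GPU are placeholders, and a real HVM configuration would include further settings.

# hypothetical Xen guest configuration, not the testbed one
name    = "gpu-wn-01"
builder = "hvm"                  # full hardware virtualization (requires VT-x/VT-d)
memory  = 4096
vcpus   = 4
disk    = ["file:/var/lib/xen/images/sl5-wn.img,hda,w"]
vif     = ["bridge=xenbr0"]
pci     = ["0000:02:00.0"]       # PCIe address of the GPU assigned to the guest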

Figure 4. Xen architecture.

To evaluate our configuration we used the Phoronix Test Suite benchmark, which was run using the Xen 4.0 packages provided by Debian and PCIe passthrough, as an initial attempt to assess viability. As shown in Fig. 5, the virtual GPU (in grey) is about 3 per cent slower than the real GPU (in black).

Figure 5. Comparison of performance between the virtual GPU (in grey) and the real GPU (in black) using the Phoronix Test Suite.

D. Implementation

Figure 7. Implemented prototype architecture.

The architecture of our implementation is sketched in Figure 7. The main components are the Grid site CE with its BLAH component, a daemon called resource-marshal, and an IaaS provider that exposes EC2 interfaces. For each job the CE receives, after the BLAH parsing operation we are able to gather all the information about that job and marshal its execution. To do this we developed a basic daemon that manages the allocation of instances according to the job information and to simple policies. In particular, the job information is used to determine the flavor of the required virtual appliance, specified via the Glue schema in the JDL. A sample JDL file is shown in Figure 6. If an instance is already available, the job proceeds immediately. To implement such a behaviour, a call to a simple client application is interposed in the BLAH XXX job submit scripts; this client inquires the resource-marshal daemon about the availability of resources. The resource-marshal keeps track of the available instances and sends requests for new ones to the IaaS provider via the Python boto EC2 interfaces. We did not apply a strict termination policy for the instances in our testbed environment. However, we identified the following as possible policies:
- job termination events can be used to trigger the reclamation of unused instances;
- the Grid user proxy lifetime and renewal mechanisms could be bound to the instance life cycle;
- some instances could be kept permanently running if explicitly requested.
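A minimal sketch of the decision logic just described is reported below; it is not the actual resource-marshal code, and the flavor-to-image mapping, the instance type and the boto connection (built as in the sketch of Section IV-B) are assumptions.

# hypothetical sketch of the resource-marshal logic; identifiers are placeholders
FLAVORS = {"GPU": ("emi-0A1B2C3D", "m1.xlarge")}   # assumed flavor -> (image, instance type)

idle_instances = {"GPU": []}   # instances already running and not bound to a job

def marshal_job(conn, flavor):
    """Return an instance able to run a job of the given flavor, spawning one if needed."""
    if idle_instances.get(flavor):
        return idle_instances[flavor].pop()        # reuse an already available instance
    image_id, instance_type = FLAVORS[flavor]
    reservation = conn.run_instances(image_id, instance_type=instance_type)
    return reservation.instances[0]                # newly spawned virtual worker node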



Virtual appliances could be provided by the users themselves through the common bundling operation offered by many IaaS providers. In our prototype we bundled a basic Scientific Linux 5 CERN Worker Node image with the gLite middleware installed.

V. CONCLUSIONS

In this work we presented a system that provides EGI users with a specific on-demand GPU environment to transparently execute jobs on GPU devices. The proposed system uses a Cloud approach, based on EC2-compliant Clouds, in order to control the specific GPU-enabled VO environment from the EGI middleware interfaces. We adopted a hybrid, batch independent approach in which the local Grid site spawns resources on public/private clouds on the basis of the job requests. This integration of the Grid and Cloud environments has been performed at the CE level, implementing an application able to connect the EGI Grid (through the CREAM BLAH component) and the IaaS provider (implemented with Eucalyptus). This enables a fine grained control over the virtual instances and allows the accounting of resources to be performed. The proposed solution enables the users to run their applications (written in OpenCL or CUDA) in parallel, exploiting the Many/Multi-core capabilities of the Grid nodes. This innovative approach provides the users with an interesting alternative to MPI for running parallel jobs. The system is currently in its testing phase at the UNIPERUGIA Grid site and supports the COMPCHEM VO. We plan to extend the work done so as to facilitate the access of users to the implemented solution, and in particular to the GPU resources. Furthermore, we plan to establish synergies with other ongoing projects and initiatives, in particular with the StratusLab EU FP7 project.

 

Type = "job" ; e x e c u t a b l e ="/bin/sleep" ; [...] C e R e q u i r e m e n t s="other.GlueHostMainMemoryRAMSize > 2048 && ( Member ( "GPU\" , o t h e r . G l u e H o s t A p p l i c a t i o n S o f t w a r e R u n t i m e E n v i r o n m e n t ) ) ";



Figure 6. JDL example with GPU flavor.
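As a hypothetical illustration, not taken from the prototype, the client interposed in the BLAH scripts could detect the GPU flavor from the CeRequirements string of Figure 6 with a check of this kind:

import re

def wants_gpu(ce_requirements):
    # True when the JDL asks for the GPU runtime environment tag
    return re.search(r'Member\s*\(\s*"GPU"', ce_requirements) is not None

req = ('other.GlueHostMainMemoryRAMSize > 2048 && '
       '(Member("GPU", other.GlueHostApplicationSoftwareRuntimeEnvironment))')
assert wants_gpu(req)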



VI. ACKNOWLEDGMENTS

Acknowledgments are due for the financial support to the COST CMST Action D37 GRIDCHEM, through the activities of the QDYN and ELAMS working groups, the EGI-InSPIRE contract 261323, the MIUR PRIN 2008 contract 2008KJX4SN 003, the ESA ESTEC contract 21790/08/NL/HE, the Phys4entry FP7/2007-2013 contract 242311, the Fondazione Cassa di Risparmio of Perugia and Arpa Umbria.

REFERENCES

[1] Amazon Elastic Compute Cloud (EC2) web site. http://aws.amazon.com/ec2/.
[2] BLAH guide. https://twiki.cnaf.infn.it/cgi-bin/twiki/view/EgeeJra1It/BLAH guide.
[3] Boto: Python interface to Amazon Web Services. http://code.google.com/p/boto/.
[4] CREAM JDL specification. https://edms.cern.ch/document/592336.
[5] Eucalyptus web site. http://www.eucalyptus.com.
[6] European Grid Initiative (EGI) web site. http://www.egi.eu.
[7] gLite web site. http://glite.cern.ch/.
[8] INFN WNoDeS web site. http://web.infn.it/wnodes/.
[9] KVM web site. http://www.linux-kvm.org/.
[10] Nimbus web site. http://www.nimbusproject.org/.
[11] NVIDIA CUDA web site. http://www.nvidia.it/object/cuda_home_new_it.html.
[12] Open Cloud Computing Interface (OCCI) web site. http://www.occi-wg.org/.
[13] OpenNebula web site. http://opennebula.org/.
[14] OpenCL web site. http://www.khronos.org/opencl/.
[15] Xen web site. http://www.xen.org/.
[16] P. Andreetto, S. Andreozzi, G. Avellino, S. Beco, A. Cavallini, M. Cecchi, V. Ciaschini, A. Dorise, F. Giacomini, A. Gianelle, et al. The gLite workload management system. In Journal of Physics: Conference Series, volume 119. IOP Publishing, 2008.
[17] P. Andreetto, S. Bertocco, F. Capannini, M. Cecchi, A. Dorigo, E. Frizziero, A. Gianelle, F. Giacomini, M. Mezzadri, S. Monforte, et al. Status and developments of the CREAM Computing Element service.
[18] R. Buyya, D. Abramson, J. Giddy, and H. Stockinger. Economic models for resource management and scheduling in grid computing. Concurrency and Computation: Practice and Experience, 14(13-15):1507–1542, 2002.
[19] R. Chellappa. Cloud computing: emerging paradigm for computing. INFORMS 1997, San Diego, 1997.
[20] A. Costantini, L. Pacifici, A. Lagana, and S. Crocchianti. MPI support: users view and prospectives. 2010.
[21] I. Foster, Y. Zhao, I. Raicu, and S. Lu. Cloud computing and grid computing 360-degree compared. In Grid Computing Environments Workshop, 2008 (GCE'08), pages 1–10. IEEE, 2008.
[22] F. Giunta, R. Montella, et al. A GPU accelerated high performance cloud computing infrastructure for grid computing based virtual environmental laboratory.
[23] S. Jha, A. Merzky, and G. Fox. Using clouds to provide grids with higher levels of abstraction and explicit support for usage modes. Concurrency and Computation: Practice and Experience, 21(8):1087–1108, 2009.
[24] E. Molinari. A local batch system abstraction layer for global use. In Proc. of the XV International Conference on Computing in High Energy and Nuclear Physics (CHEP'06), 2006.
[25] D. Nurmi, R. Wolski, C. Grzegorczyk, G. Obertelli, S. Soman, L. Youseff, and D. Zagorodnov. The Eucalyptus open-source cloud-computing system. In Proceedings of Cloud Computing and Its Applications, 2008.
[26] L. Servoli, M. Mariotti, and R. M. Cefala. A proposal to dynamically manage virtual environments in heterogeneous batch systems. In Nuclear Science Symposium Conference Record, 2008 (NSS'08), pages 823–826. IEEE, 2008.
[27] B. Sotomayor, K. Keahey, and I. Foster. Combining batch execution and leasing using virtual machines. In Proceedings of the 17th International Symposium on High Performance Distributed Computing, pages 87–96. ACM, 2008.