Use of Late-Binding Technology for Workload Management System in CMS

Sanjay Padhi, Haifeng Pi, Igor Sfiligoi, Frank Wuerthwein
University of California, San Diego

Abstract—The Condor glidein-based workload management system (glideinWMS) has been developed and integrated with the distributed physics analysis and Monte Carlo (MC) production systems of the Compact Muon Solenoid (CMS) experiment. The late binding between jobs and computing elements (CE), together with the validation of the WorkerNode (WN) environment, significantly reduces the failure rate of Grid jobs. For CPU-intensive MC data production, opportunistic Grid resources can be effectively exploited via the extended computing pool built on top of various heterogeneous Grid resources. The Virtual Organization (VO) policy is embedded into the glideinWMS and pilot job configuration. GSI authentication and authorization, and interfacing with gLExec, allow a large user base to be supported and seamlessly integrated with the Grid computing infrastructure. The operation of glideinWMS at CMS shows that it is a highly available and stable system for a large VO with thousands of users, running tens of thousands of user jobs simultaneously. The enhanced monitoring allows system administrators and users to easily track system-level and job-level status.

I. INTRODUCTION

Condor glidein can be used to submit and execute the Condor [1] daemons on a Globus [2] resource. While the daemons are running, the remote machine appears as part of the Condor pool at the submission host. The Condor glidein technology includes two important features:



• The glideins, as pilot jobs, can be late-bound with user jobs. Any technical problem occurring at a remote machine can be detected by the glidein, which allows the submission host to hold user jobs or redirect them to other functioning resources. In addition to detecting Grid failures at the middleware level, pilot jobs can also check the configuration of the WorkerNodes (WN) and the running environment. With this late binding, a significant fraction of user job failures can be avoided (a minimal sketch of the idea follows this list).
• The host that manages the glideins can build a virtual, unified Condor pool on top of remote computing machines, even ones at different sites. The complexity of the Grid resources provided by a number of sites is thus largely hidden. This technology essentially creates a layer of abstraction over the heterogeneous Grid resources and makes the Grid transparent to the high-level job creation and submission tools.
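As a rough illustration of the late-binding idea (not the actual glideinWMS code), the following Python sketch matches idle user jobs only against glidein slots that have already started and passed their worker-node validation; jobs are never bound to a site at submission time. All class and attribute names are hypothetical.

```python
# Hypothetical sketch of late-binding matchmaking: user jobs are matched to
# already-running, already-validated glidein slots, never to sites up front.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GlideinSlot:
    site: str
    validated: bool          # pilot ran its worker-node validation checks
    software_ok: bool        # e.g. experiment software area found and usable


@dataclass
class UserJob:
    job_id: int
    desired_sites: List[str]  # white list supplied by the user


def match_job(job: UserJob, slots: List[GlideinSlot]) -> Optional[GlideinSlot]:
    """Return a healthy glidein slot at one of the desired sites, if any."""
    for slot in slots:
        if slot.site in job.desired_sites and slot.validated and slot.software_ok:
            return slot
    return None   # job stays idle; it is never bound to a broken resource


slots = [GlideinSlot("T2_US_UCSD", True, True),
         GlideinSlot("T2_DE_DESY", False, True)]   # failed validation: never matched
job = UserJob(42, ["T2_DE_DESY", "T2_US_UCSD"])
print(match_job(job, slots).site)   # -> T2_US_UCSD
```

A slot that fails validation simply never matches, so the corresponding user job stays in the queue instead of failing at the site.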

Condor glidein technology provides an ideal platform for Virtual Organizations (VOs) to build an advanced workload management system that efficiently deals with a large number of user jobs, different types of applications and a large group of resource providers.

In addition to the advantages mentioned above, which are mainly on the submission side, the Grid resource providers also gain better CPU efficiency due to more prompt discovery and utilization of opportunistic resources by glideins. Since the user jobs are associated with glideins, it is possible for Grid sites to identify and classify the type of user jobs and implement finer-grained scheduling priorities for different types of applications. The glidein-based workload management system (glideinWMS) [3] is an implementation of the Condor glidein technology for distributed Grid computing at the Compact Muon Solenoid (CMS) experiment [4]. This paper describes the integration strategy, operation experience and benchmark performance of glideinWMS at CMS. It should be emphasized that the core of the system is a general-purpose workload management system and can be easily ported to other Virtual Organizations' computing environments.

II. ARCHITECTURE OF GLIDEINWMS

Details of the glideinWMS architecture and working mechanism are described in [3]. Here is a brief review of the major logical components of the system:
• A condor central manager that runs the condor collector and negotiator.
• One or more condor submission machines that run condor schedd. The users send normal condor jobs to these schedds.
• A glideinWMS collector that mainly serves as a dashboard for message passing between glideins and user jobs.
• One or more VO frontends that regulate the number of glideins to be submitted according to the number of user jobs waiting in the queue.
• One or more glidein factories that submit glideins to the Grid.
• Glideins (pilot condor jobs) that are submitted by the glidein factories and run at the Grid sites.
The architecture of glideinWMS is shown in Fig. 1. In the following, the mechanism of how glideinWMS works is discussed:
• Submission of glideins. Two components, the glidein factory and the VO frontend, together decide how glideins are sent and to which Grid sites. These two processes communicate via ClassAds using the condor collector daemon. The glidein factory publishes the information of Grid sites according to the list of sites in its configuration. The VO frontend uses the jobs in the submission queue to match against the factory ClassAds. If a match occurs, the VO frontend instructs the glidein factory about how glideins should be submitted. Once the factory receives the "instruction" (actually the ClassAds from the VO frontend), it submits the glideins to the Grid. Each glidein runs condor startd if it is able to run at the Grid site. All the condor startd daemons are collected by the glideinWMS to form a condor pool (a hedged sketch of this request logic follows the list).
• Condor pool and job management. The glideinWMS condor pool manages the condor execution daemons, condor startd, which are submitted as Grid jobs (glideins). The condor central manager manages the condor pool by collecting information from all other daemons, especially the daemons of user jobs queued at the submission machines and the daemons of glideins running at the WorkerNodes of Grid sites. The condor negotiator matches the attributes of user jobs to those of glideins. If a match occurs, the user job is sent to the matched WN via the condor shadow daemon and run by the condor starter at the WN. The Condor Connection Broker (CCB) is used to provide a one-way connection between glideinWMS and the WorkerNode (WN) if there is a firewall between them.
• GSI authentication and authorization. Since the Grid sites treat glideins as normal Grid jobs, one or more valid X509 proxies are used for the glideins. A user job carries the valid X509 proxy of the user, which is handled by condor schedd. The authentication of users can be file-system-based or GSI-based. For a site serving a large group of Grid users, GSI-based authentication using GUMS is a scalable and flexible approach.
• Security in running glideins and user jobs. Glideins behave like a Grid user running at the Grid sites and manage the actual user jobs that are submitted to the glideinWMS system. The condor execution daemon, condor startd, running at the WN runs under the glidein's identity. The most secure mode for a glidein to process actual user jobs is to use gLExec, so that the site is able to identify the actual user via the user's X509 proxy and use a different user id to run the jobs. If gLExec is not available on the WNs of a site, or a large group of user accounts with full mapping (X509 proxy to username) has not been set up for the VO(s), the glidein has to use its own identity to run the user jobs, which creates a security problem. It is therefore advised not to run glideins on systems without gLExec support. If a fully specified Grid job creation and management system is used by a VO, the execution commands and libraries for the user jobs are well defined and checked, which leaves little room for unsecured access to Grid resources. There is still a potential security hole if users are able to run arbitrary code; we believe this is largely subject to the VO's policy. A system built on top of glideinWMS needs to be secure itself, which is beyond the scope of glideinWMS's security model.
• Configuration of glideins. A glidein contains a shell script designed to download and execute other files, which are maintained on a web server by glideinWMS. A VO or service provider that runs glideinWMS has a lot of flexibility in adding useful commands or implementing useful functionality for the glideins, e.g. letting the glideins check the Grid middleware status, application software availability, the I/O of the site, etc.
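To make the factory/frontend interplay above more concrete, here is a hedged Python sketch of the kind of request logic a VO frontend performs: it counts idle jobs that could run at each factory entry and asks the factory for enough glideins to cover the shortfall. Entry names, attribute names and the simple pressure formula are illustrative assumptions, not the actual glideinWMS algorithm.

```python
# Hedged sketch of the VO frontend request logic: count the idle user jobs
# that could run at each glidein factory entry and ask the factory for enough
# glideins to cover the shortfall.

from typing import Dict, List


def count_matches(idle_jobs: List[dict], entry_attrs: dict) -> int:
    """Number of idle jobs whose desired sites include this entry's site."""
    return sum(1 for job in idle_jobs
               if entry_attrs["GLIDEIN_Site"] in job["desired_sites"])


def glidein_requests(idle_jobs: List[dict],
                     entries: Dict[str, dict],
                     running_glideins: Dict[str, int],
                     max_per_cycle: int = 100) -> Dict[str, int]:
    """For each factory entry, request glideins to cover unmatched demand."""
    requests = {}
    for name, attrs in entries.items():
        demand = count_matches(idle_jobs, attrs)
        shortfall = max(demand - running_glideins.get(name, 0), 0)
        requests[name] = min(shortfall, max_per_cycle)  # throttle per cycle
    return requests


entries = {"CMS_T2_US_UCSD": {"GLIDEIN_Site": "T2_US_UCSD"}}
idle = [{"desired_sites": ["T2_US_UCSD"]}] * 250
print(glidein_requests(idle, entries, {"CMS_T2_US_UCSD": 180}))
# prints {'CMS_T2_US_UCSD': 70}
```

In the real system this exchange happens through ClassAds published to the condor collector rather than direct function calls.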

Fig. 1. Architecture of glideinWMS

III. INTEGRATION OF GLIDEINWMS WITH CMS COMPUTING TOOLS

The implementation of glideinWMS at CMS deploys it as a layer of service, as shown in Fig. 2, interfacing with two major CMS Grid job creation and submission systems:
• Production and data reprocessing via a tool called ProdAgent [5]. This system targets a relatively small number of privileged users who run the data processing for the CMS VO. The Grid jobs run at Tier-0, Tier-1 and Tier-2 sites with dedicated resources. The use of glideinWMS aims at better management of large-scale dedicated CPU resources, short turnaround time in handling jobs and a very low job failure rate.
• Data analysis via a tool called CRAB [6]. This system targets all Grid users of the CMS VO. The Grid jobs run primarily at Tier-2 sites, and to a smaller extent at Tier-3 sites, where those sites publish the datasets for data analysis. Most of the time, user jobs need to compete for the CPU resources. Multiple destinations can be specified by users if the data are available at those destination sites. Before glideinWMS was used, a resource broker (RB) could be used to select a site from the white list of sites defined in the user jobs, or the system had to pick a site at random to send the jobs to. The use of glideinWMS aims at better discovery and scheduling of CPU resources for targeted sites, handling a large scale of user jobs, handling a large scale of highly distributed CPU resources, and a low job failure rate.

glideinWMS is also used for user-level Monte Carlo production, which targets opportunistic CPU resources across the CMS VO.
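As a hedged illustration of how a submission tool such as the CRAB server could hand a user's site white list over to the glideinWMS pool, the sketch below builds a vanilla-universe Condor submit description that carries the white list as a job attribute and leaves the actual binding to the negotiator. The attribute names DESIRED_Sites and GLIDEIN_CMSSite are assumptions for this example and may differ from the production configuration.

```python
# Illustrative only: express a user's site white list as a job ClassAd
# attribute so the negotiator (not the submitter) decides where the job runs.
# Attribute names below are hypothetical stand-ins.

def build_submit_description(executable: str, sites: list) -> str:
    """Return a Condor submit description carrying the user's site white list."""
    site_list = ",".join(sites)
    return "\n".join([
        "universe   = vanilla",
        f"executable = {executable}",
        # Advertise the white list as a custom job attribute; the matching
        # expression below restricts the job to glideins from those sites.
        f'+DESIRED_Sites = "{site_list}"',
        "Requirements = stringListMember(GLIDEIN_CMSSite, DESIRED_Sites)",
        "queue",
    ])


print(build_submit_description("cmsRun_wrapper.sh",
                               ["T2_US_UCSD", "T2_US_Nebraska"]))
```

Because the requirement is evaluated against attributes advertised by running glideins, the job is dispatched only once a pilot from a whitelisted site has started and reported itself healthy.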

Fig. 2. Integration of glideinWMS with CMS job creation and management tools

The technical details of the integration between glideinWMS and CRAB are described in the following and shown in Fig. 3.
• CRAB uses a server/client model to create, submit and manage user jobs. The jobs are created at the client side and transferred to the server. The server sends the user jobs to the Grid and maintains the job status in a database, which can be retrieved by the clients.
• The implementation of glideinWMS creates a service layer that late-binds user jobs with the sites, discovers the valid CPU resources at the sites, and presents a simple unified condor pool to the CRAB server. The CRAB server treats the Grid like a large local condor pool and puts user jobs in the queue of the pool.
• gLExec and GUMS are used together to map the Grid user proxy to a local user and send the condor jobs under the mapped username. In this way, the CRAB server can identify every single Grid user and use the appropriate identity to send the jobs to the glideinWMS (a conceptual sketch of this mapping follows the list).
• An enhanced monitoring system is provided for the CRAB server and glideinWMS, for both system administrators and Grid users. The status of glideins and general statistics of user jobs related to targeted Grid sites can be retrieved from the glideinWMS built-in monitoring web service. Specific job monitoring and debugging can be done via tools provided by glideinWMS that can directly access the running jobs at the WN via glideins. The job monitoring web service for users is provided by the tool described in [7], which is able to give a logical view of user jobs in quasi-real time with plenty of details, including logs, to evaluate the status of the jobs.
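The following is a conceptual Python sketch of the identity-switch policy described above, not the gLExec or GUMS interface itself: the user's proxy DN is mapped to a local account before the payload starts, and the job is rejected rather than run under the pilot identity if no mapping exists. The mapping table and the use of sudo are stand-in assumptions.

```python
# Conceptual sketch (not the gLExec or GUMS API): map the user's X509 proxy DN
# to a local account and refuse to fall back to the pilot identity.

import subprocess

# Hypothetical DN-to-account table; in production this lookup is served by GUMS.
DN_MAP = {
    "/DC=org/DC=doegrids/OU=People/CN=Example User 12345": "cms_user017",
}


def run_payload_as_user(proxy_dn: str, payload_cmd: list) -> int:
    """Run the payload under the mapped account, or reject the job."""
    account = DN_MAP.get(proxy_dn)
    if account is None:
        raise PermissionError(f"no local mapping for {proxy_dn}; job rejected")
    # Stand-in for the identity switch that gLExec performs on the worker node.
    return subprocess.call(["sudo", "-u", account] + payload_cmd)
```

In production the mapping is maintained by GUMS and the actual identity switch on the worker node is performed by gLExec.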

Fig. 3. Integration of glideinWMS with CRAB infrastructure

IV. OPERATION EXPERIENCE OF GLIDEINWMS IN CMS

The development and operation teams at the University of California San Diego (UCSD) and the Fermi National Accelerator Laboratory (FNAL) have accumulated more than 3 years of experience in operating glideinWMS for dedicated data reprocessing and for the more general physics data analysis service. A detailed description of the experience of operating glideinWMS at CMS can be found in [8]. Here is a summary of the operational scale of glideinWMS at CMS:
• Over 50 CMS Tier-2 and 10 Tier-3 sites participate in a production glidein factory with 16 thousand pledged slots as of June 2009.
• Sustained running of over 12 thousand user jobs simultaneously, peaking at 14 thousand, with 200 thousand idle jobs in the glideinWMS queue and 22 thousand running glideins.
• Good scalability of the collector over the WAN and of the condor schedds.
• High success rate for analysis jobs. Common failures at the Grid middleware level are prevented. Pilot jobs are able to detect setup issues in the running environment and application software at a site.
• More than 4 thousand user-level production jobs with low priority running on opportunistic resources.
• The operation of glideinWMS is integrated with the overall management of user analysis and data operations.
Various efforts have been made to analyze and prevent the root causes of job failure using glideinWMS and external monitoring infrastructure. CMS has conducted several large-scale computing exercises to test the readiness and scalability of the overall computing infrastructure. The overall status of jobs run by glideinWMS and the utilization of CPUs for the Scale Testing for the Experimental Program in 2009 (STEP09) [9] are shown in Fig. 4 and Fig. 5 respectively. The number of jobs running at various sites using glideinWMS for the Common Computing Readiness Challenge in 2008 (CCRC08) [10] is shown in Fig. 6.

Fig. 4. Number of jobs running during STEP09, submitted via the glideinWMS system

Fig. 5. Number of CPUs used over a period of 2 months by jobs submitted via the glideinWMS system. The red histogram shows the CPUs for Monte Carlo production, blue for data analysis jobs, and black for analysis jobs run during STEP09

Fig. 6. Total number of jobs running at CMS Tier-1 and Tier-2 sites during CCRC08, submitted via the glideinWMS system

V. CONCLUSION

GlideinWMS, an implementation of late-binding technology based on the Condor glidein infrastructure, has been demonstrated by CMS for submitting Grid jobs for user analysis and data reprocessing via centralized services. The scalability of glideinWMS has been shown to be sufficient for the CMS VO, which utilizes tens of thousands of cores and runs hundreds of thousands of jobs daily. The integration of glideinWMS with CMS tools paves the way for CMS applications to be equipped with this technology and to benefit from all the advantages that glideinWMS brings to Grid computing: late binding between applications and Grid resources, and a simplified, unified resource pool that makes the heterogeneous Grid resources transparent to the high-level applications.

REFERENCES

[1] Condor Project. Available: http://www.cs.wisc.edu/condor/.
[2] Globus Project. Available: http://www.globus.org/.
[3] The Glidein-Based Workload Management System. Available: http://www.uscms.org/SoftwareComputing/Grid/WMS/glideinWMS/.
[4] The Compact Muon Solenoid Experiment. Available: http://cms.web.cern.ch/cms/index.html.
[5] ProdAgent Project at CMS. Available: https://twiki.cern.ch/twiki/bin/view/CMS/ProdAgent.
[6] CRAB Project at CMS. Available: https://twiki.cern.ch/twiki/bin/view/CMS/WorkBookRunningGrid.
[7] C. Dumitrescu, A. Nowack, S. Padhi and S. Sarkar, "A Grid Job Monitoring System," CHEP 09, Prague, Czech Republic.
[8] D. Bradley, O. Gutsche, K. Hahn, B. Holzman, S. Padhi, H. Pi, D. Spiga, I. Sfiligoi, E. Vaandering and F. Wurthwein, "Use of Glide-ins in CMS for Production and Analysis," CHEP 09, Prague, Czech Republic.
[9] Scale Testing for the Experimental Program in 2009. Available: https://twiki.cern.ch/twiki/bin/view/CMS/Step09.
[10] Common Computing Readiness Challenge in 2008. Available: https://twiki.cern.ch/twiki/bin/view/CMS/CMSCCRC08.
