

An Innovative Application Execution Toolkit for Multicluster Grids

Zhifeng Yun1,2, Zhou Lei1, Gabrielle Allen1,3, Daniel S. Katz1,2, Tevfik Kosar1,3, Shantenu Jha1,3, Jagannathan Ramanujam1,2

1 Center for Computation & Technology, Louisiana State University
2 Department of Electrical & Computer Engineering, Louisiana State University
3 Department of Computer Science, Louisiana State University


Abstract—Multicluster grids offer one promising way to satisfy the growing computational demands of compute-intensive applications by federating networked clusters. However, it is challenging to seamlessly integrate participating clusters in different administrative domains into a single virtual computation platform. To take full advantage of multicluster grids, computer scientists must determine how to make autonomous participating systems cooperate practically and efficiently when executing Grid-enabled applications. We address grid resource management and have implemented a toolkit called Pelecanus to improve the overall performance of application execution in multicluster grid environments. Pelecanus uses the DA-TC (Dynamic Assignment with Task Containers) execution model to improve resource interoperability and to enhance application execution and monitoring. Experiments show that it can significantly reduce turnaround time and increase resource utilization for applications composed of large numbers of sequential jobs.


Index Terms—Multicluster grids, resource management, execution models

I. INTRODUCTION

Cluster-based computing has been widely adopted by compute-intensive and data-intensive applications, and institutions and governments worldwide have invested heavily in cluster systems. However, it is well known that the processing capability of any single existing cluster falls far short of the growing computational demands of large-scale applications. Approaches have therefore been pursued to share the workload of individual applications across multicluster grids, so as to substantially improve computational capability and resource utilization.

The structure of a multicluster environment is shown in Figure 1. A super-scheduler sits above multiple clusters and assigns application tasks to participating clusters for execution. The assigned tasks are viewed as normal jobs by the local scheduling systems on the participating clusters; a local scheduling system arranges actual job execution under its own scheduling policies. However, it is challenging to efficiently manage application execution across a Grid, due to the nature of the participating clusters and their network connections [3]. Participating clusters are geographically distributed, heterogeneous, and self-administered. The network connection provided by the Internet has vulnerable security and unpredictable performance. The completion of a submitted application depends on the completion of all the application tasks assigned


Fig. 1. The multicluster structure.

on different clusters. The slowest cluster thus becomes the bottleneck of application execution, and execution must also bear the risk of system failure on any participating cluster.

Several approaches have been proposed to address these problems. The Network Weather Service [10] provides methods for predicting the performance of computational grid resources to support adaptive application scheduling at the super-scheduler level. Condor's [8] Glidein technology provides a mechanism to temporarily add remote resources to a local Condor pool to make full use of available resources. The SAGA [7] API provides a mechanism for applications to dynamically use available resources based on the Bigjob abstraction. However, these efforts emphasize high-level solutions for the Grid computing environment and ignore efficiency after application tasks have been dispatched across a Grid: considerable time is wasted waiting for local resource scheduling. They also depend heavily on software installation, and they leave untouched the execution bottleneck caused by the slowest participating resource.

We developed a new toolkit named Pelecanus to efficiently manage multicluster grid resources and enhance application execution. This toolkit adopts virtualization technology and the DA-TC (Dynamic Assignment with Task Containers) model to


improve resource interoperability, execution management, and application monitoring. Experiments show that, compared with other existing execution models, the turnaround time can be decreased significantly as the number of execution iterations grows.

This paper provides a brief description of Pelecanus. The remainder of the paper is organized as follows. Section II describes the design objectives and architecture of Pelecanus. Section III discusses implementation issues, gives the details of execution management and application monitoring, and presents results from using Pelecanus to support large-scale reservoir modeling with EnKF automatic history matching. Section IV gives concluding remarks.

II. PELECANUS OVERVIEW

The design goal of Pelecanus is to reduce execution time, increase resource utilization, and improve application monitoring in multicluster grid environments. The design and implementation of Pelecanus make four major contributions:

1. It can significantly decrease execution turnaround time. Normally, an application is not complete until all of its tasks assigned to multiple clusters have finished; the last tasks to complete determine the turnaround time, no matter how fast the other tasks execute. Pelecanus employs dynamic load balancing for application execution, so that slow clusters become a beneficial factor instead of a bottleneck.

2. It increases the reliability of application execution in multicluster grids. The failure of any participating cluster will not block the execution of the application.

3. It provides flexible and user-friendly interfaces for application scientists to run their jobs and monitor application execution.

4. It executes in user space, so there are no specific requirements for system configuration or software installation on participating clusters.
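The first contribution can be illustrated with a small simulation. The sketch below is not part of the toolkit; the cluster speeds and task counts are invented for illustration. With static up-front assignment, the slowest cluster dictates the turnaround time; with dynamic assignment, faster clusters absorb more tasks and the slow cluster merely contributes extra throughput.

```python
# Illustrative sketch (not Pelecanus code): static vs. dynamic task
# assignment across clusters of different per-task speeds.

def static_makespan(num_tasks, speeds):
    """Split tasks evenly up front; turnaround = slowest cluster's finish time."""
    share = num_tasks // len(speeds)
    return max(share * t for t in speeds)  # t = seconds per task on that cluster

def dynamic_makespan(num_tasks, speeds):
    """Greedy dynamic assignment: each task goes to the cluster that would
    finish it earliest, mimicking an agent feeding tasks to idle containers."""
    finish = [0.0] * len(speeds)
    for _ in range(num_tasks):
        i = min(range(len(speeds)), key=lambda c: finish[c] + speeds[c])
        finish[i] += speeds[i]
    return max(finish)

# Three clusters: 1, 2, and 10 seconds per task (the last is "slow").
speeds = [1.0, 2.0, 10.0]
print(static_makespan(90, speeds))   # the slow cluster runs 30 tasks
print(dynamic_makespan(90, speeds))
```

Under static assignment the slow cluster alone takes 300 seconds for its 30 tasks; under greedy dynamic assignment the makespan drops well below that, since the fast clusters take most of the work.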
The architecture of Pelecanus is shown in Figure 2. It is built on top of local cluster schedulers and plays key roles in resource management, workflow control, user interfacing, and high availability. The Grid Execution Management Service (GEMS) engine provides the core services, including cluster interoperability, workflow management, and execution implementation, based on the DA-TC [4] execution model. There are three interfaces: a standalone Graphical User Interface (GUI), a Web Interface (i.e., a grid portal), and a High Availability (HA) Interface. The architecture employs a web server and an HA server for web user access and reliability improvement, respectively.

The functionality of Pelecanus has five major aspects:

1. Security service. A single entry point is provided for a user to access clusters and grid services, based on various existing technologies for credential and access management.

2. Resource monitoring and discovery. Grid monitoring and discovery technologies, such as the Globus [2] trigger service,

Fig. 2. Pelecanus architecture.

the Ganglia [5] cluster toolkit, and MonALISA [6], are evaluated and integrated into Pelecanus.

3. Application specification. A user specifies the application executable, resource requirements, dataset locations, and which execution pattern the application will adopt. There is an application workflow template for each execution pattern.

4. Application monitoring and steering. Once an application is submitted, a user is able to monitor the execution progress of the whole application and of any particular component on a cluster. Meanwhile, the user is allowed to steer the workflow to adapt to the runtime status of resources and execution progress.

5. Data management. Data manipulation is provided to support application deployment on remote clusters and operations on massive input datasets.

An application researcher accesses the services provided by Pelecanus via the standalone GUI or the web interface (i.e., grid portal). The first step in running an application is to obtain authorization and authentication for the available resources. The user then specifies which execution pattern the application will adopt. Pelecanus provides a corresponding application specification interface for each execution pattern, covering the executable location, data sources, and workflow description. The GEMS engine, the core of Pelecanus, intelligently manages application components and submits them to appropriate clusters, while the user transparently monitors execution progress. The user is also allowed to steer application execution by reorganizing the application workflow under the dynamic load balancing strategies adopted by the GEMS engine.

III. IMPLEMENTATION ISSUES

A. Execution Model

The GEMS engine is based on the Dynamic Assignment with Task Container concept and is designed to improve application execution in a multicluster grid environment.


Fig. 3. The interaction diagram between AEA and TC. "R" denotes running and "Q" queuing; "Other" denotes jobs submitted by other users.
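The pull loop depicted in Figure 3 can be sketched as follows. This is an illustrative sketch with invented class and method names, not the actual GEMS implementation: a container signals readiness, the agent hands it the next task and updates its status tables, and the cycle repeats until no tasks remain.

```python
# Illustrative sketch of the AEA/TC pull loop from Fig. 3 (names invented).
from collections import deque

class AEA:
    """Application Execution Agent: holds the task list and status tables."""
    def __init__(self, tasks):
        self.pending = deque(tasks)
        self.task_status = {t: "waiting" for t in tasks}
        self.tc_status = {}

    def on_ready(self, tc_id):
        """A TC claims it is ready; assign it the next task, if any remain."""
        self.tc_status[tc_id] = "ready"
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.task_status[task] = "running"
        self.tc_status[tc_id] = "running"
        return task

    def on_done(self, tc_id, task):
        """Record a successful stage-out and free the container."""
        self.task_status[task] = "done"
        self.tc_status[tc_id] = "ready"

def run_container(aea, tc_id):
    """A TC repeatedly pulls tasks until the AEA has none left."""
    while (task := aea.on_ready(tc_id)) is not None:
        # stage in, task invocation, and stage out would happen here
        aea.on_done(tc_id, task)

aea = AEA([f"task{i}" for i in range(6)])
for tc in ("tc-clusterA", "tc-clusterB"):
    run_container(aea, tc)
print(all(s == "done" for s in aea.task_status.values()))  # True
```

The key property the sketch captures is that a container never returns to the local queue between tasks: it keeps its resource allocation and simply asks the agent for more work.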

In the DA-TC model, an application consists of a number of tasks, and it is assumed that there is no inter-task communication. Two components cooperate to execute an application: the application execution agent (AEA) and task containers (TCs). The AEA is the local gateway to the Grid. It is in charge of deploying and submitting task containers to the participating clusters; monitoring container status; dynamically orchestrating the workflow and assigning application tasks; and steering application, task, and container execution. A TC is submitted to a participating cluster as a normal job and waits to be scheduled by the local resource management system. Once a TC obtains its resource allocation, it is used to execute tasks assigned dynamically by the AEA. TCs are responsible for holding resources until the application execution is complete, as well as for task stage in/out, invocation, status monitoring, and communication with the AEA. An application execution typically employs multiple TCs, depending on user configuration.

By tracking the status of clusters, we maintain an up-to-date list of available clusters in the pool. Through cluster abstraction [9], we obtain static and dynamic information such as the available nodes, the number of running jobs, and the number of queued jobs. The number of task containers submitted to each cluster is based on its relative speed, the length of its waiting queues, and its number of CPUs.

Figure 3 shows the interaction diagram between the AEA and a TC. To carry out an application execution, the AEA first submits TCs to the participating clusters. The submitted TCs are placed as normal jobs at the end of the scheduling queues, waiting for resource allocation by the local resource management systems. One participating cluster may host multiple task containers, according to the load balancing strategy adopted by the AEA.

After a TC obtains the required computing resources from a local scheduling system, it communicates with the AEA for task assignment. First, the TC sends the AEA a message claiming that it is ready to run a task. Second, the AEA updates its TC status table, and one task (or more) is selected based on the application workflow management strategies. Third, task stage in, execution, and stage out are performed, and the status tables for tasks and TCs on the AEA are updated. After a task completes successfully, the AEA and TC are ready for the execution of the next task.

This dynamic task assignment strategy and the task container technology can substantially improve the QoS of application execution in a multicluster grid environment. Queueing time is significantly reduced, which reduces turnaround time: a TC is used to acquire and hold resources for task execution, providing quick execution for the tasks dynamically assigned by the AEA. With this dynamic load balancing method, faster clusters are assigned more tasks, and the execution bottleneck caused by slow clusters is eliminated. Any participating cluster, no matter how slow, becomes a beneficial factor rather than a bottleneck. The overall waiting time of tasks is greatly shortened and resource utilization is enhanced.

The reliability of application execution is also improved. A task will not be assigned to a participating cluster if a valid TC status cannot be provided to the AEA, due to network disconnection, system maintenance, or system failure. Task completion status is monitored by the AEA at run time. If a task execution error occurs, the AEA can make decisions intelligently, e.g., resubmission to the same or a different TC, to try to fix the problem.

Fig. 4. Experimental results comparing the turnaround time of automatic history matching under two different submission strategies: DA-TC and the traditional way.
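The widening gap in Figure 4 can be anticipated with a back-of-the-envelope queue-wait model. The numbers below are invented for illustration, not the measured testbed data: with n iterations, the traditional strategy pays roughly one queue wait per iteration, while DA-TC pays a single queue wait up front.

```python
# Back-of-the-envelope model of turnaround time vs. iterations
# (illustrative numbers, not the measured testbed data).

def traditional_turnaround(iterations, queue_wait, iter_exec):
    """Each iteration's tasks re-enter the local scheduling queues."""
    return iterations * (queue_wait + iter_exec)

def datc_turnaround(iterations, queue_wait, iter_exec):
    """Containers queue once, then tasks start immediately on held resources."""
    return queue_wait + iterations * iter_exec

# Say a queue wait averages 60 minutes and one iteration executes in 30.
for n in (5, 10, 15):
    print(n, traditional_turnaround(n, 60, 30), datc_turnaround(n, 60, 30))
```

In this toy model the traditional turnaround grows as n·(wait + exec) while DA-TC grows only as n·exec, so the gap between the two curves increases with the number of iterations, consistent with the trend in Figure 4.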

We tested this strategy on our multicluster grid testbed, which consists of five Linux clusters, each using PBS as its local scheduling system, connected by the Internet. Each cluster can be accessed via Globus GRAM or SSH.
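On such a testbed, submitting a task container amounts to wrapping it as an ordinary batch job on each cluster. The following sketch shows one way this could look; the script names, paths, and container command are hypothetical, not the toolkit's actual scripts, though the PBS directives and the ssh/qsub invocation are standard.

```python
# Illustrative sketch: wrap a task container as an ordinary PBS job and
# build the remote submission command. Paths and names are hypothetical.

def pbs_script(container_cmd, walltime="24:00:00", nodes=1):
    """Return a PBS batch script that launches one task container."""
    return "\n".join([
        "#!/bin/bash",
        f"#PBS -l nodes={nodes}:ppn=1",
        f"#PBS -l walltime={walltime}",
        "#PBS -N datc-container",
        "cd $PBS_O_WORKDIR",
        container_cmd,  # the container then pulls tasks from the AEA
    ])

def ssh_submit_cmd(host, script_path):
    """Command line to submit the script on a remote cluster head node."""
    return ["ssh", host, "qsub", script_path]

script = pbs_script("./task_container --aea aea.example.edu:5000")
cmd = ssh_submit_cmd("cluster1.example.edu", "/home/user/container.pbs")
print(script.splitlines()[0], cmd[0], cmd[2])
```

Because the container is just a normal job to PBS, the local scheduler needs no modification, which is what allows DA-TC to run entirely in user space.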


The turnaround time of application execution is the performance metric investigated. The application scenario in this experiment is large-scale ensemble subsurface modeling, in which multiple task farming steps are required and each step needs the results of the previous steps. Two execution methods are compared: the traditional way, in which tasks are directly assigned to the participating clusters, and execution based on the DA-TC model. We performed three inverse modeling processes with 5, 10, and 15 iterations, respectively. Each iteration has 100 tasks, and the execution time of each task is drawn at random from between 1 minute and 30 minutes; if the tasks of one iteration were run on a single CPU, the total CPU time would be about 25 hours.

Figure 4 shows the experimental results comparing the turnaround time under the two submission strategies: DA-TC and the traditional way. We observe that the turnaround time increases much faster under the traditional way than under DA-TC as the number of iterations increases. The major reason is that task containers in DA-TC never release their resources until the whole automatic history matching process is complete, whereas under the traditional way the tasks of each iteration are submitted to the end of the local scheduling queues. In DA-TC, all runs in a container incur only one queue wait, and dynamic load balancing further speeds up the execution.

B. Application Monitoring

Large distributed systems require a large amount of monitoring data for a variety of tasks, such as fault detection, performance analysis, performance tuning, performance prediction, and scheduling. The ability to monitor and manage distributed computing components is critical for a high quality of service [1].
Users want to be more involved in the grid environment and to fully control their applications: checking progress, obtaining intermediate results, terminating a job based on those results, steering a job to other nodes to achieve better performance, and checking the resources the job consumes. All these requirements make it essential to build robust application monitoring systems that allow users to fully control and monitor their applications.

Figure 5 shows the application monitoring approach in Pelecanus. In each cluster (shown as rectangles), a monitoring interface monitors information related to the application components. When applications are submitted to the clusters, the interface interacts with each application and gathers information about both the system and the application's status. A global monitoring service collects the information from all clusters and provides an interface for users. A user can obtain all intermediate results and the progress of the applications, and can switch applications to achieve better performance or to evaluate executions. With this approach, the application monitoring system can: 1) capture the status of applications (running, waiting, failed); 2) obtain debug- and performance-related information; 3) detect events and trigger actions while the applications are being executed; 4) enable manipulation of the target application; and 5) provide real-time access for users.
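The collection path of this approach can be sketched as follows. The class and field names are invented for the example; the paper does not specify the actual monitoring interfaces. Each cluster's monitoring interface holds per-task records for an application, and a global service merges them into a summary for the user.

```python
# Illustrative sketch of aggregating per-cluster application status
# (class and field names are invented for this example).

class ClusterMonitor:
    """Per-cluster monitoring interface: tracks tasks of one application."""
    def __init__(self, name):
        self.name = name
        self.tasks = {}  # task_id -> "running" | "waiting" | "failed" | "done"

    def report(self):
        return {"cluster": self.name, "tasks": dict(self.tasks)}

class GlobalMonitor:
    """Global service: merges cluster reports and summarizes progress."""
    def __init__(self, monitors):
        self.monitors = monitors

    def progress(self):
        states = [s for m in self.monitors for s in m.tasks.values()]
        done = states.count("done")
        return {"total": len(states), "done": done,
                "failed": states.count("failed"),
                "percent": 100.0 * done / len(states) if states else 0.0}

a, b = ClusterMonitor("clusterA"), ClusterMonitor("clusterB")
a.tasks = {"t1": "done", "t2": "running"}
b.tasks = {"t3": "done", "t4": "failed"}
print(GlobalMonitor([a, b]).progress())
# {'total': 4, 'done': 2, 'failed': 1, 'percent': 50.0}
```

Keeping the per-cluster records and the global summary separate mirrors the figure's split between the monitoring interfaces inside each cluster and the single service the user queries.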

Fig. 5. The application monitoring approach in Pelecanus.

IV. CONCLUSIONS

In this paper, we present a novel multicluster grid management toolkit called Pelecanus and describe its components and implementation issues in detail. We have used the toolkit with practical applications, including reservoir uncertainty analysis, subsurface inverse modeling, sawing optimization, and water-mediated attraction simulation. Experiments show that Pelecanus can significantly reduce application execution time.

The ultimate goal of our research is to provide user-friendly support for large-scale practical applications. We plan to improve application virtualization and optimize resource interoperability to support more compute-intensive applications in areas such as coastal studies, petroleum engineering, and bioinformatics. The DA-TC model and the Pelecanus toolkit will be adopted by these applications to provide massive computing power and to meet various application-specific requirements.

REFERENCES

[1] A. Ali, A. Anjum, et al. "Job Monitoring in an Interactive Grid Analysis Environment". Proceedings of Computing in High Energy Physics (CHEP), Interlaken, Switzerland, September 2004.
[2] I. Foster, C. Kesselman. "Globus: A Toolkit-Based Grid Architecture". In The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, San Francisco, CA, pp. 259-278, 1999.
[3] I. Foster, C. Kesselman, S. Tuecke. "The Anatomy of the Grid: Enabling Scalable Virtual Organizations". International Journal of High Performance Computing Applications, 15(3):200-222, 2001.
[4] Z. Lei, Z. Yun, et al. "Improving Application Execution in Multicluster Grids". Proceedings of the IEEE 11th International Conference on Computational Science and Engineering (CSE-08), São Paulo, July 16-18, 2008.
[5] M. L. Massie, B. N. Chun, D. E. Culler. "The Ganglia Distributed Monitoring System: Design, Implementation, and Experience". Parallel Computing, 30(7):817-840, July 2004.
[6] H. B. Newman, I. C. Legrand, et al. "MonALISA: A Distributed Monitoring Service Architecture". Proceedings of CHEP 2003, La Jolla, CA, USA, March 2003.
[7] SAGA. http://saga.cct.lsu.edu.
[8] D. Thain, T. Tannenbaum, M. Livny. "Condor and the Grid". In Grid Computing: Making the Global Infrastructure a Reality, John Wiley, 2003. ISBN: 0-470-85319-0.
[9] M. Xie, Z. Yun, et al. "Cluster Abstraction: Towards Uniform Resource Description and Access in Multicluster Grid". Second International Multi-Symposiums on Computer and Computational Sciences (IMSCCS 2007), Iowa City, Iowa, 2007.
[10] R. Wolski. "Experiences with Predicting Resource Performance On-line in Computational Grid Settings". ACM SIGMETRICS Performance Evaluation Review, 30(4):41-49, March 2003.
