Grid Resource Broker using Application Benchmarking

Enis Afgan, Vijay Velusamy, Purushotham V. Bangalore

Department of Computer and Information Sciences, University of Alabama at Birmingham, 1300 University Boulevard, Birmingham, AL 35294, USA
{afgane, vvijay, puri}@uab.edu

Abstract. While the Grid is becoming a common term in distributed computing, users still face long adaptation periods and increased complexity when using the system. Although users have access to multiple resources, selecting the optimal resource for an application and launching the job appropriately is a tedious process that not only proves difficult for the naïve user but also leads to ineffective use of the resources. A general-purpose resource broker that performs application-specific resource selection on behalf of the user through a web interface is required. This paper describes the design and prototyping of such a resource broker, which not only selects a matching resource based on user-specified criteria but also uses the application’s performance characteristics on the resources, enabling the user to execute applications transparently and efficiently and thereby providing true virtualization.

1 Introduction

While Grid computing is rapidly evolving and becoming more widely accepted, traditional scientists may still find middleware technologies cumbersome to use. The middleware provides effective means to aggregate and virtualize resources, but the discovery and categorization of vast resources in this heterogeneous and dynamic environment presents a problem for the end user due to the complexity of the information involved. A general-purpose resource broker that automatically handles the user’s resource selection and job submission is required. While Grid Information Services [11] provide an overview of the available grid resources and of the current status of the grid, an average user may be overwhelmed by the amount of information to process, or may not have enough experience to select the best available resource. Automating the selection would simplify and expedite this process.

The Resource Broker described here was developed as part of a framework, open to extension and modification, that performs application-specific resource selection. To simplify resource selection and job submission for users, the application’s interface is a web-based portlet built as an extension to the Open Grid Computing Environment (OGCE) [15] with a simple resource request format. The application is currently being tested with mpiBLAST [3]. The primary focus of this research is to design and prototype a resource broker that not only selects a matching resource based on user-specified criteria but also uses the application performance characteristics on these different resources and enables the user to execute the applications transparently, thereby providing true virtualization.

2 Related Work

Resource brokering has been an important area of research in the development of Grid computing. Most research on the selection of heterogeneous resources has essentially taken the viewpoint of an application. Condor [13], with its matchmaker, is a project that is very relevant to the idea of general resource selection. It is based on the ClassAd language, which allows users, as well as owners of resources, to specify arbitrary constraints. The matchmaker matches user requests to the available and appropriate resources; when multiple resources match, a ranking system based on user-specified constraints is employed to return the best match. Nimrod-G [2] is another well-known resource broker. Its main functionality is automating the creation and management of large parametric experiments [17]. Besides the plain submission of a request for a resource search, users can specify time and cost constraints, which are later used in selecting the resource. If the constraints cannot be met, tradeoffs are explained to the user [2,17]. Another well-established resource broker is the Application-Level Scheduler (AppLeS). It is mainly used for scheduling and deploying parameter sweep applications whose tasks have little or no inter-task communication [6]. The main advantage of this resource broker is fault tolerance: errors are processed and jobs are resubmitted on other resources without the need for user intervention [6].

This research focuses on creating a general and easily applicable system that uses application benchmarking. The motivation is to bring resource selection and job submission in the grid to a practical, user-friendly level. Incorporating the resource broker into the OGCE achieves this goal, since the entire system is installed simply, as part of a single package. Once the package is installed within a virtual organization, it can be accessed without any additional user-end configuration. Other approaches have not been completely integrated into a single system designed for easy installation and access.

3 The Resource Information Problem

The resource selection process is based on the information available about a resource. Unlike local resource schedulers (e.g., PBS [18], LSF [21], LoadLeveler [10], SGE [8]), which implement fine-tuned scheduling policies based on the resource requirements of jobs running on the nodes as well as of those waiting in the queue, resource selection and application-level scheduling in the Grid have only a limited amount of information available. All of this information comes from the Grid Information Services (GIS or MDS) [11] provided by Globus [5], which offer a way to discover current system information, such as the static configuration of a computer as well as its current load and status.

Some of the criteria used for resource selection at the application level include:

• application-specific requirements
• static resource capabilities
• dynamic state of the resource

This information needs to be collected individually and subsequently combined and processed into one meaningful result. Our resource broker conforms to this model, as discussed throughout the rest of the paper. One more aspect worth noting is that any form of application-level resource selection should be based on resource consumption in terms of that application. For example, a resource that is heavily loaded from the viewpoint of one application might be just the opposite for another. This widens the pool of available resources and gives the resource selection process more options. Consequently, given the limited information available in the Grid environment described earlier, writing a general-purpose yet efficient scheduling algorithm becomes more difficult.

4 Architecture

This section describes the integration of the resource broker with the OGCE, explains why we chose such an approach, and discusses the resource broker architecture in detail.

4.1 OGCE and Resource Broker

OGCE [15] is a portal developed to provide easy access to Grid technologies through a web-based interface. It provides sharable and reusable components for web-based access to scientific and business-oriented applications. Sharable components make it easy to quickly create Grid portals from provided libraries that support baseline Grid technologies, such as file transfer, job launching and monitoring, and access to information services. This frees developers to concentrate on the specialized needs of a particular scientific community [15]. Although GridSphere [14] could also be used for the portal, we have chosen OGCE for this architecture because it is an NMI-supported project. Since OGCE was designed and developed mainly to hide the basic Grid infrastructure, a resource broker was not implemented, because it is not part of the capability offered by Grid base services (e.g., MDS). Adding the Resource Broker (RB) to OGCE introduces a new layer of abstraction between the user and the Grid while using many base grid services, thus working toward the idea of seamless job submission. OGCE was built using J2EE technology to create multiple portlets representing different grid services. The resource broker adds a new service to the system using the same approach.

4.2 Architecture of the Resource Broker

The architecture of our resource broker is shown in Figure 1. The resource broker consists of four main components, each providing standard interfaces to the rest of the application. Each of these components is discussed next.

[Figure 1 diagram; labels: INTERFACE (OGCE), App/User, Request, Result, RB, Resource Lookup, Resource Filter, Resource Ranker, Resource MakeMatch, Information Services]

Fig. 1 Resource Broker architecture showing main components of the design

4.2.1 Resource Lookup is a module that acts as a front-end to the information services. Information available from Ganglia [7] is used for this purpose. Ganglia is a scalable distributed monitoring system for high-performance computing systems [7] and is very similar to MDS [11] provided by Globus [5]. The major difference between Ganglia and MDS is security: MDS can be configured to use GSI [4], while Ganglia is an unprotected system. Any information retrieved from Ganglia should also be available from MDS, and switching from one system to the other is easily supported by our resource broker. The Resource Lookup component is responsible for connecting to the information services and extracting the information for use by the RB. It generates a representation of the data about the available resources that is internal to the RB, thus providing a standard interface and format for the rest of the RB to use when extracting the data. Using this interface module as a data reader makes it possible to use different, or even multiple, information service providers with only a quick adaptation. The Resource Lookup does not organize or process the data, other than storing it in the internal structure as it was received from the information source.

4.2.2 Resource Filter is, again, an effort to create generic functionality. Among other things, it implements a resource-broker-specific compareTo method designed to compare the user request to the available resources. It provides a standard interface through which the implementation of the selection process can be easily adapted to various formats of user request. In the current implementation of the resource broker, Resource Filter performs the basic filtering function of rejecting resources whose string-valued attributes, such as the operating system type or the requested architecture type, do not match. The idea of filtration was further extended to numerical-valued components. Rather than just performing a basic true/false match for each component, a system of weights is employed that gives more insight into how a resource compares to the requested values. The weights express, as a percentage, how the requested value of a component compares to the resource capability. These values are later used in ranking the different resources. Another function performed by the Resource Filter is subdividing the compared results into fully matching and non-matching resources. The idea is to distinguish, but not throw away, the non-matching resources, since some may be very near the request or applicable to an application; this is determined from the weighted system in the MakeMatch component.
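The filtering behavior of Section 4.2.2 can be illustrated with a minimal Python sketch. This is not the authors’ Java implementation: it assumes that the percentage weight is the offered value divided by the requested value, that requested numeric values are positive, and that cpuType and osType are the string-valued components. The names filter_resources, FilterResult, and osType are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed set of string-valued components (hard true/false filter).
STRING_COMPONENTS = {"osType", "cpuType"}


@dataclass
class FilterResult:
    name: str
    weights: dict = field(default_factory=dict)  # component -> percent of request met
    full_match: bool = True


def filter_resources(request, resources):
    """Split resources into fully matching and near-miss (non-matching) lists.

    request:   dict of component -> requested value (numeric values must be > 0)
    resources: dict of resource name -> dict of component -> offered value
    """
    matching, non_matching = [], []
    for name, offered in resources.items():
        result = FilterResult(name)
        rejected = False
        for component, wanted in request.items():
            have = offered.get(component)
            if component in STRING_COMPONENTS:
                # A string mismatch rejects the resource outright.
                if have != wanted:
                    rejected = True
                    break
            else:
                # Numeric components get a percentage weight instead of a boolean.
                weight = 100.0 * (have or 0) / wanted
                result.weights[component] = weight
                if weight < 100.0:
                    result.full_match = False
        if rejected:
            continue
        # Near misses are kept rather than discarded, for later use by MakeMatch.
        (matching if result.full_match else non_matching).append(result)
    return matching, non_matching
```

Keeping the near misses in a separate list, rather than dropping them, mirrors the stated intent that non-matching resources remain available to the MakeMatch component.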

4.2.3 Resource Ranker is the single most important piece of this application. It is designed to rank multiple matching resources and return the most appropriate resource for the user-submitted application. Typically, this is challenging to design and implement because of the lack of resource information described earlier and the complexity of the brokering algorithm employed. This component has two general parts: information collection and information processing.

Information collection has three stages. In the first, a rough resource selection, based primarily on hardware requirements supplied by the user, is performed from the pool of all available resources. The second collects application-specific performance information from the individual resources. In the final stage, each resource is processed one component at a time, applying a value function that takes into consideration the user’s rank of the individual component. Following this information collection, a cumulative resource rank value is calculated in the information processing part. So that rank values are not based only on the static values of a resource, a dynamic load function is applied, which combines its prediction values with the results from the second stage of information collection.

The essential idea behind the second stage is to submit a job to each of the matching resources from stage one using a sample problem set and then monitor some performance metrics. Currently, this performance measurement is done by brute force for each individual job submission, but work is under way on individual application profiling, which would help in this stage and result in more accurate and prompt scheduling policies. The benefit of processing resources at the level of individual components is that it allows for application-oriented resource selection. Initially, this orientation is user-dependent through the ranking values of the components, but, as described in the future work section and in combination with the application profiling just mentioned, a system will be provided in which the resource broker itself selects the best resource for the given application.

4.2.4 Resource MakeMatch implements a technique that is useful when not a single full match is found, or when there are only a few fully matching resources. The user can customize three preferences for this procedure. First, the user can select the minimum number of resources that must be fully matched before this system is invoked. Second, the user can specify a range of values for which a request is valid. Lastly, the user can rank individual components according to their relevance to the submitted application. When no fully matching resources are found, this system is invoked by default in order to suggest some possible matches. For each non-matching resource, the weighted value of every component, as computed by the Resource Filter, is tested against the user-specified rank and/or range of possible values. Once all of the components are processed, a custom rank for each resource is generated. This is used by the ResourceRanker in picking the most appropriate resource. This functionality was added for two reasons: the first is to make user submissions simpler by increasing the chances of a match, and the second was inferred from [20]. In that paper, the authors point out that resources with weaker capabilities generally have a smaller variance and thus provide a better base for predicting the future load, resulting in more accurate scheduling procedures. This implies that a resource that might not appear to be adequate for the job may, in the end, produce a useful result, and the resource broker is used to suggest some of those resources.

Dividing the work of the resource broker in the manner just explained creates a small framework that can be used as a base for plugging in other components or replacing the current ones while maintaining the main functionality. An obvious possibility is the expansion of this resource broker into a scheduler once advance scheduling becomes more developed [9]. This design is intended to provide a set of APIs to facilitate this extension.

4.3 User Request

A request includes the following two elements, apart from the instructions on how to run the application:

• Resource description: user requirements of the required resource, such as CPU speed, available memory, and type of operating system
• Individual component ranks: for each component in the resource description, a weight signifying the importance of that component’s full match

Figure 2 shows a sample request illustrating the options available to the user:

cpuCount = 4
cpuSpeed = 2394
cpuType = Pentium IV
…

cpuCountRank = 10
cpuSpeedRank = 6
cpuTypeRank = 8
…

Fig. 2 Partial listing of a sample user request
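To make the request structure concrete, the following Python sketch splits a request in the "name = value" form of Fig. 2 into its two elements. It is only an illustration: the function name parse_request and the convention that a rank key is the component name followed by "Rank" are assumptions drawn from the sample, not a documented format.

```python
def parse_request(text):
    """Parse 'key = value' lines into (resource description, component ranks)."""
    description, ranks = {}, {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank lines and elided parts of the request
        key, value = (part.strip() for part in line.split("=", 1))
        if key.endswith("Rank"):
            # cpuCountRank = 10 becomes ranks["cpuCount"] = 10
            ranks[key[: -len("Rank")]] = int(value)
        else:
            # Keep numbers numeric; leave strings such as "Pentium IV" as-is.
            description[key] = int(value) if value.isdigit() else value
    return description, ranks
```

Keeping the ranks in a separate mapping keyed by component name lets later stages look up the importance of a component without parsing the request again.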

As the work in the Global Grid Forum continues toward a standard job description language [12], the simple request structure will allow us to use that language to specify job requirements more completely.

4.4 Brokering Algorithm

The brokering algorithm, or information processing section, is the second part of the ResourceRanker component. It is subdivided into two parts: resource rank value calculation and application of the load function. Figure 3 describes the algorithm. To calculate the initial resource rank value, each matching resource is processed on a per-component basis, taking into consideration the user’s rank values for all the components. The resulting value relates each resource to the other resources as well as to the initial request. A separate rank value is then calculated for each resource based on the performance measure of that resource. The load function, supplied with these two initial resource rank values, considers the current load and predicts the future load based on the load variance over the past 15 minutes, as provided by the information services. Both load functions use fuzzy logic [19], with Fuzzy Engine for Java [16] as the fuzzy logic engine.

For each resource,
   a. For each component,
      i. Calculate rank value taking into consideration user rank specification
   b. For each resource,
      i. Calculate rank value based on application-performance measure
   c. Apply load function to adjust resource rank value
      i. Consider load variance and use it for future load prediction
      ii. Consider current load value

Fig. 3 Brokering Algorithm for the generic resource selection
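The three steps of Fig. 3 can be sketched in Python as follows. The paper does not give the value function, the way the two rank values are combined, or the load function, so every formula here is an assumption chosen for illustration: a rank-weighted average for step a, benchmark time relative to the fastest resource for step b, and a simple crisp load factor standing in for the fuzzy load function in step c. All function and field names are hypothetical.

```python
def component_rank(weights, user_ranks):
    """Step a: rank-weighted average of per-component match percentages."""
    rank_sum = sum(user_ranks.get(c, 1) for c in weights)
    total = sum(user_ranks.get(c, 1) * min(w, 100.0) for c, w in weights.items())
    return total / rank_sum


def performance_rank(benchmark_seconds, best_seconds):
    """Step b: score a resource by its sample-job time relative to the fastest."""
    return 100.0 * best_seconds / benchmark_seconds


def load_factor(current_load, load_trend):
    """Step c: crisp stand-in for the fuzzy load function (an assumption).

    The predicted load is the current load plus the recent trend; a higher
    predicted load shrinks the factor toward zero.
    """
    predicted = max(0.0, current_load + load_trend)
    return 1.0 / (1.0 + predicted)


def broker(resources, user_ranks):
    """Return the best resource name and the score of every resource."""
    best_time = min(r["benchmark_seconds"] for r in resources.values())
    scores = {}
    for name, r in resources.items():
        static = component_rank(r["weights"], user_ranks)
        perf = performance_rank(r["benchmark_seconds"], best_time)
        # Average the two initial rank values, then adjust for load.
        scores[name] = 0.5 * (static + perf) * load_factor(r["load"], r["load_trend"])
    return max(scores, key=scores.get), scores
```

With these assumed formulas, a fast but heavily loaded resource can rank below a slower idle one, which is the kind of load-aware outcome the algorithm is meant to produce.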

The load function that considers the change in load variance employs six different membership functions, ranging from a high positive change in the load to a high negative change. The fuzzy engine uses the trapezoidal membership function to determine the degree to which the load change belongs in a group. The parameters of each membership function are statically assigned for now, but we hope to soon turn this system into a self-learning one in which these parameters are automatically adjusted as more and more jobs are submitted through this resource broker. The current load value function uses a simpler set of membership functions, as well as fewer rules. Since it uses only one input variable (the load value), it has three membership functions and three rules that control the outcome. The membership functions again range from low to high, with statically assigned parameters. Fuzzy logic is used in this part of the application primarily because it allows for an easy and automatic way to dynamically assign the same load value to different membership groups. Different parameters for resource selection should be used depending on the user application, as well as for systems connected to the grid that do not belong in the computationally intensive category. Using fuzzy logic is an elegant way to allow for these changes. As this system evolves, we foresee a user option to select the type of application and/or resource they have or require in order to use this system in the best possible way.
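A trapezoidal membership function, and the way one load value can belong to more than one group, can be sketched in Python as follows. This does not use or reproduce the Fuzzy Engine for Java API. The three group names and all breakpoint values are hypothetical, since the paper states only that the parameters are statically assigned.

```python
def trapezoid(x, a, b, c, d):
    """Degree of membership in a trapezoid with feet a, d and shoulders b, c.

    Requires a < b <= c < d. Returns a value between 0.0 and 1.0.
    """
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)      # falling edge


# Assumed, statically assigned parameters for the three current-load groups.
LOAD_SETS = {
    "low": (-1.0, 0.0, 0.5, 1.5),
    "medium": (0.5, 1.5, 2.5, 3.5),
    "high": (2.5, 3.5, 100.0, 101.0),
}


def fuzzify(load):
    """Map a load value to its degree of membership in each group."""
    return {name: trapezoid(load, *params) for name, params in LOAD_SETS.items()}
```

Because neighboring trapezoids overlap, a load of 1.0 under these assumed parameters is partly "low" and partly "medium", which is what lets the fuzzy rules blend their outcomes instead of switching abruptly at a threshold.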

5 Application Deployment Case Study

Instead of measuring the time saved or spent by the resource broker, we present a use case outlining the major steps. The application used in our simulation and test case is mpiBLAST [3], a tool for sequence analysis and interpretation in genomic sequencing. Because it is popular among bioinformatics researchers, it is a good vehicle for showing how the resource broker can turn an existing cluster application into a Grid application. For our test case, we assume that the application is installed on all of the remote resources and that all the files needed to run the application are available in a user-accessible directory. Table 1 compares the steps for job submission via the resource broker with those for submission using command line tools. Italicized steps are automatically done by the resource broker.

Table 1. Resource Broker vs. Manual Grid Job Submission

Resource Broker

0. The user has to submit their credentials to the MyProxy server. This is server administration dependent, but generally, the credential should be renewed once a month.
1. The user submits the request, supplying desired resource information along with the parameterized command on how to run the application (e.g., mpiBLAST).
2. The Resource Broker queries GIS.
3. The Resource Broker selects the best available resource.
4. The Resource Broker acquires the user’s credentials.
5. The Resource Broker submits the job to a resource using the user’s credentials.
6. The Resource Broker transfers any output files.
7. The user can monitor job progress from the webpage.
8. The user is notified when the output files have been transferred back to the user’s local machine.

Manual Job Submission

1. The user must have a valid user certificate on the local machine and request a user proxy from GSI.
2. The user must query information services (GIS/MDS), which send back an XML-formatted document listing all the information about all the known resources.
3. The user must process the returned information and select the most appropriate resource for their application.
4. The user must submit the job to the selected resource. This involves using authentication, command line job submission (i.e., creating an RSL command), and making sure the application is submitted correctly.
5. The user can monitor the application progress using command line tools.
6. Upon the end of the run for the application, the user should retrieve any output files back to the local machine using GridFTP [1].

6 Future Work

The main focus of future work is to integrate this infrastructure with the university grid and to support its users. User and application profiles will be stored in a database for easy retrieval. The most interesting part of the application information retrieval involves application profiling, which would gather run-time information about specific applications on given resources. This information would allow the brokering algorithm to be enhanced and to make use of precise past information when performing the selection. This specifically refers to the parameters used in the fuzzy logic components of the load function. These parameters currently have single, statically assigned values, which will be modified in stepwise fashion in the following two ways. The first option is to have different initialization procedures, depending on the application submitted, so that the parameters reflect application-oriented scheduling. The selection of different parameter options will be left to the user, based on their knowledge of the application. The second option is to adjust the parameter values within an application group. Through the use of application profiling, an automatic learning method will be implemented which, after the initial learning stage, would automatically monitor jobs for each resource selection and adjust the parameter values. In the future, a set of APIs will be made available for extension. We hope to extend this application into a full grid access system that is easy to use and yet powerful in any given environment, e.g., a co-scheduler which could be used to schedule jobs in the long run among the selected resources. As work on the Job Submission Description Language (JSDL) evolves to a usable standard [12], the user request format would also evolve from one specifying a resource in terms of its components to one describing the application itself.

7 Summary

This paper introduces a resource broker that bridges the gap between a user finding a resource and submitting a job. Although manual operations could achieve the same goal, they would be time-consuming and complex. This resource broker was developed to make the transition to Grid technologies simple and efficient. It is a general-purpose resource broker with a simple and understandable interface that provides appropriate resource selection capabilities for different types of applications. It is an attempt at developing a small framework in which custom or tested components can be added or can replace current ones. The current scheduling algorithm works with information retrieved from MDS as well as the performance measure of the submitted application. It uses fuzzy logic during resource profiling, which can easily adapt to multiple user requirements based on different types of resources and jobs. We have tested our application with mpiBLAST to validate our architecture design as well as the brokering algorithm. Preliminary results are promising, especially with respect to usability. To the best of the authors’ knowledge, this study is the first investigation of incorporating application profiling into the resource scheduler in order to simplify the selection and usage of resources and to enable their efficient utilization. Future plans include testing and adopting more applications, incorporating a metascheduler, and including application profiling with feedback.

Acknowledgments

The authors would like to acknowledge their colleagues from the High Performance Computing Laboratory, specifically Zhijie Guan, for providing support as well as insight into some of the problems faced during this work. This work was supported in part by the Department of Computer and Information Sciences at the University of Alabama at Birmingham.

References

1. Allcock, W., J. Bester, J. Bresnahan, A. Chervenak, L. Liming, and S. Tuecke, Draft GridFTP Protocol, 2001, [Last accessed, Available from http://wwwfp.mcs.anl.gov/dsl/GridFTP-Protocol-RFC-Draft.pdf.
2. Buyya, R., D. Abramson, and J. Giddy, "Nimrod-G: An Architecture for a Resource Management and Scheduling in a Global Computational Grid", In Proceedings of 4th International Conference and Exhibition on High Performance Computing in Asia-Pacific Region (HPC ASIA 2000), at Beijing, China, May 14-17, 2000.
3. Darling, Aaron E., Lucas Carey, and Wu-chun Feng, "The design, implementation and evaluation of mpiBLAST", In Proceedings of ClusterWorld Conference & Expo in conjunction with the 4th International Conference on Linux Clusters: The HPC Revolution 2003, at San Jose, CA, 2003.
4. Foster, I., C. Kesselman, G. Tsudik, and S. Tuecke, "A Security Architecture for Computational Grids", In Proceedings of ACM Conference on Computer and Communications Security, ACM Press, at San Francisco, CA, 1998, pp. 83-92.
5. Foster, Ian and Carl Kesselman, The Globus toolkit, In The Grid: Blueprint for a New Computing Infrastructure, Chapter 11, Edited by Foster, Ian and Carl Kesselman, pp. 259-78, San Francisco, California, 1999.
6. Berman, F., R. Wolski, S. Figueira, J. Schopf, and G. Shao, "Application-Level Scheduling on Distributed Heterogeneous Networks", In Proceedings of Supercomputing '96, ACM Press, at Pittsburgh, PA, 1996, p. 28.
7. Ganglia, 6/1/2004, 2004, [Last accessed 6/15, 2004], Available from http://ganglia.sourceforge.net/.
8. Gentzsch, Wolfgang, "Sun Grid Engine: Towards Creating a Compute Power Grid", In Proceedings of the 1st International Symposium on Cluster Computing and the Grid (CCGRID '01), IEEE Computer Society, 2001, pp. 35-6.
9. Grid Scheduling Architecture Research Group, 2004, [Last accessed 6/15, 2004], Available from http://forge.gridforum.org/projects/gsa-rg.
10. "IBM LoadLeveler: User's Guide", International Business Machines (IBM), September, 1993.
11. Information Services/MDS, 6/14, 2004, [Last accessed 6/15, 2004], Available from http://www.globus.org/mds.
12. Job Submission Description Language Working Group (JSDL-WG), 3/29, 2004, [Last accessed 6/15, 2004], Available from http://www.epcc.ed.ac.uk/%7Eali/WORK/GGF/JSDL-WG/.
13. Litzkow, M., M. Livny, and M. Mutka, "Condor - A Hunter of Idle Workstations", In Proceedings of 8th International Conference of Distributed Computing Systems, June 1988, pp. 104-11.
14. Novotny, J., M. Russell, and O. Wehrens, "GridSphere: A Portal Framework for Building Collaborations", In Proceedings of 1st International Middleware Conference, at Rio de Janeiro, Brazil, June 16-20, 2003.
15. OGCE - Open Grid Computing Environments Collaboratory, 1/22, 2004, [Last accessed 6/15, 2004], Available from http://www.ogce.org/index.php.
16. Sazonov, E. S., Open source fuzzy inference engine for Java, [Last accessed 6/15, 2004], Available from http://www.clarkson.edu/~esazonov/FuzzyEngine.htm.
17. Steen, Martin van, Nimrod-G Resource Broker for Service-Oriented Grid Computing, 2004, [Last accessed 6/15, 2004], Available from http://dsonline.computer.org/0107/departments/res0107_print.htm.
18. Systems, Veridian, OpenPBS v2.3: The Portable Batch System Software, 2004.
19. The MathWorks, Inc, What Is Fuzzy Logic?, 2004, [Last accessed 6/15, 2004], Available from http://www.mathworks.nl/access/helpdesk/help/toolbox/fuzzy/index.html.
20. Yang, L., J. M. Schopf, and I. Foster, "Conservative Scheduling: Using Predicted Variance to Improve Scheduling Decisions in Dynamic Environments", In Proceedings of Super Computing 2003, ACM Press, at Phoenix, AZ, 2003.
21. Zhou, Songnian, "LSF: Load Sharing in Large-scale Heterogeneous Distributed Systems", In Proceedings of Workshop on Cluster Computing, 1992.