INFORMATION SYSTEM ARCHITECTURE FOR BROKERING IN LARGE SCALE GRIDS
Zoltán Balaton, Gábor Gombás, Zsolt Németh
MTA SZTAKI Computer and Automation Research Institute
P.O. Box 63., H-1518 Hungary
{balaton, gombasg, zsnemeth}@sztaki.hu
Abstract
Information systems are essential parts of grid architectures. Existing systems, however, have weaknesses in supporting every requirement of a resource broker. In this paper a new grid information system architecture is presented that aims to overcome the limits of other systems in large scale grid applications. This architecture offers more efficient query processing, greater flexibility, better scalability and fault tolerance. It also has the advantage that it is built on already existing and proven technologies.
Keywords:
information system, brokering, large scale grids
1.
Introduction
Grids [2] are based on large scale resource sharing. The notion of a grid system assumes a virtual pool of resources. Apart from computational resources, grids are expected to operate on a wider range of resources such as storage, network, data, software [3], graphical and audio input/output devices, manipulators, sensors, and so on [4]. The virtual pool of resources is dynamic and diverse. Resources can be added and withdrawn at any time, and their performance or load can change frequently. The user has very little or no a priori knowledge about the actual type, state and features of the resources constituting the pool. The anticipated number of resources in the pool is on the order of 1000 and above. There are various types of information in the grid that must be acquired, gathered, processed and presented for numerous purposes. Due to the large diversity of information, a single, uniform grid information service may not be sufficient for all the possible requirements. A division of the information service into information and monitoring systems is proposed according to the
following points. The information system is used as a database that stores and retrieves data, i.e. publishes information. It can answer complex queries involving different types of data with complex structures. On the other hand, it cannot provide highly dynamic information due to the large scale of the grid. Typical data in the information system is related to brokering decisions: specifics of a resource, availability of a service, location of a resource, etc. Potentially a large community can use the same data provided by an information system. Monitoring, in contrast, is an activity that creates data representing an observed property that can be continuous or discrete in time. Typical information in this category is status information such as the occurrence of a certain event, the current load of a resource, a state transition, etc. Monitoring is always related to certain entities and it usually yields a time series of data with simple types. Monitoring information can be highly dynamic. Because monitoring is related to certain resources or processes, the scope of potential recipients of the data is small and known in advance. It can be concluded that, because of the different requirements arising for these two categories of information, it is advantageous to handle them in two specialised systems. Nevertheless, the information and monitoring systems are complementary and offer two ways of information processing, each tailored to the special properties of its data for better efficiency. Entities in the grid can be categorised as information consumers and information producers. Consumers (e.g. a user or a resource broker) want to get relevant and precise information in a reasonably bounded time, without much effort spent explicitly searching for it. Producers (e.g. an entity acting on behalf of a resource), on the other side, want to publish information without knowledge about the actual number, type and location of potential consumers.
They are responsible for providing up-to-date and valid data about the resource. The role of an information system is to establish a proper relationship between producers and consumers. Consumers formulate questions and expect the information system to convey the appropriate information published by the producers. Producers rely on the information system to ensure that the information they publish reaches all the trusted consumers and only the trusted consumers. Other requirements for the information system are efficiency, scalability to a large number of producers and consumers and a high volume of data, and reliability, which includes fault tolerance and security. Although the information service is distributed, it should provide a single coherent view of the system and allow multiple simultaneous accesses to the information at many interface points. Total coherency can only be achieved in such a distributed information system at high cost, therefore some degree of intentional incoherency is acceptable in practice. The main problems current grid information systems are facing are scalability, efficient queries and precise data. It is hard to find an all-in-one solution
that handles every kind of information in one architecture. According to the principles stated before if the information is divided based on functionality it is easier to find an efficient solution for every category separately. In this paper we focus on one major problem: providing resource brokers the information they need to allocate resources to jobs. In the next section we analyse the weaknesses of a representative current information system which become crucial if the number of resources, users and jobs starts to grow. In Section 3 a solution is introduced that can scale up to tens of thousands of resources. We conclude with the advantages of our proposed architecture.
2.
The Globus Information System
One of the current representative information systems for the grid is the Globus Metacomputing Directory Service (MDS). It is currently implemented in its second version, called MDS-2 [1], which is a redesign of the first version. Like its predecessor, MDS-2 uses the LDAPv3 protocol [6], but its architecture is different. It consists of two basic elements: information providers implemented by the Grid Resource Information Service (GRIS) servers and aggregate directory services implemented by the Grid Index Information Service (GIIS) servers. MDS-2 aims to handle all grid information in a uniform system. The information in MDS-2 is tied to resources. Each resource provides information about itself and other entities tied to the resource via a GRIS server running on it. If a user is interested in information about a specific resource she can directly query the GRIS running there. It is a highly distributed solution which is scalable for a large number of users and resources. However, with GRISes alone it would not be possible to search for specific resources. This problem is solved by GIIS servers, which collect, cache and manage information from GRISes, providing a central repository to facilitate searches. Each so-called virtual organisation can have one or more GIISes that provide a specialised searchable index of the available resources. The GIIS servers do not solve all problems, however. For small grids a central GIIS may work well, but this is not scalable beyond about 1000 resources. Although LDAP allows splitting of the Directory Information Tree (DIT) into smaller parts which can be stored on different servers, this does not solve the problem either. Since frequently changing data becomes stale quickly, the tree of the servers forming a cache chain cannot be tall. This limits the achievable scalability and hinders the efficiency of the information system. If the DIT is distributed between several servers, searches may be very inefficient.
It is because the DIT is distributed along the hierarchy defined by the names (DNs) of the entries and in the query one can only specify which servers
to search in terms of this hierarchy (with the base and scope in the query). The LDAP query language however permits searches based on any properties of the entries. Thus one can easily specify a query that does not match the hierarchy predefined by the DNs of the entries, which can result in querying every server storing the distributed LDAP database. Thus a distributed LDAP database can only answer certain searches efficiently that match the predefined hierarchy. In MDS-2 information is local to the resources and not to the users who query it. Typical questions to an information system can involve joins of properties that are not contained within one entry. Since the GIISes only cache entries that have been retrieved recently, if one of the properties needed to evaluate the query is not cached or stale it has to be pulled from the corresponding GRIS. As a consequence of the pull model used in MDS-2 the amount of data moved is proportional to the number of queries. (The pull model in turn is a consequence of LDAP being optimised for searching and reading data and not for frequent updates.) Even if clients cache information the amount of moved data is still proportional to the number of users. This also limits the scalability of the system and leads to high resource usage.
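The routing limitation of a distributed DIT described in this section can be illustrated with a minimal sketch. It is not LDAP client code: the server names and DN suffixes are hypothetical, and the only thing modelled is the assumption that a search can be routed solely by its base DN, never by the attribute filter.

```python
def is_under(dn: str, ancestor: str) -> bool:
    """True if dn equals ancestor or lies below it in the DN hierarchy."""
    return dn == ancestor or dn.endswith("," + ancestor)


def servers_to_contact(base_dn: str, servers: dict) -> list:
    """Return the servers whose stored subtree overlaps the search base.

    `servers` maps a (hypothetical) server name to the DN suffix of the
    subtree it stores. The attribute filter of the query plays no role
    here, which is exactly the limitation discussed in the text.
    """
    return [
        name
        for name, suffix in servers.items()
        if is_under(suffix, base_dn) or is_under(base_dn, suffix)
    ]


# Hypothetical partitioning of the DIT by site.
SERVERS = {
    "ldap-a": "o=siteA,o=grid",
    "ldap-b": "o=siteB,o=grid",
    "ldap-c": "o=siteC,o=grid",
}
```

A query whose base is `o=siteA,o=grid` touches one server, whereas a property-based search such as "all resources with at least 8 processors" has to use the root `o=grid` as its base and therefore reaches every server.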
3.
A proposed new architecture
Figure 1 shows the basic components of the proposed system. Parts drawn with solid lines make up the information system described hereinafter, parts drawn with dashed lines show how it is integrated with the rest of the grid.
Figure 1. Basic components of the system (Physical Resource, Resource Classifier, Advertisement Distribution System, Resource Selector, Resource Broker, Resource Allocator)
Resource. Although resources are not strictly part of the architecture, the data the information system handles describe these entities. Resource Classifier. The job of the Resource Classifier is to periodically submit an advertisement about the parameters of the resource to the Advertisement Distribution System (ADS). In the consumer-producer model, the Resource Classifier acts as the producer. A Resource Classifier may handle the advertisements of one or more physical resources if they are under the same administrative control. The Resource Classifier can use the monitoring system or the Globus GRIS to gather the information it needs.
Advertisement Distribution System. The ADS is responsible for taking the advertisements from the Resource Classifiers and distributing them to Resource Selectors. It has to be a “store and forward” system which stores the most recently posted advertisements and forwards them to the Resource Selectors. It must be distributed because the basic structure of the grid is distributed, too and it should exhibit good scalability. The ADS should minimise the network traffic required to distribute advertisements to Resource Selectors. The ADS has to be robust and fault tolerant also. Local problems (network errors, machines being down etc.) should affect the rest of the information system as little as possible.
Figure 2. Structure of the Resource Selector (Filter, Database, Maintenance and Query Evaluator; inputs: Advertisement and Job description; output: Resource List)
Resource Selector. The Resource Selector (depicted in Figure 2) is the consumer of the advertisements distributed by the ADS. It consists of four major components. The Filter performs a preliminary selection of the incoming advertisements to increase efficiency. First, the filter discards every advertisement that does not follow the required syntax or does not have a proper digital signature. Next, based on local configuration, the filter discards advertisements that are known never to match the broker’s needs. All advertisements that pass the filter are put into a local Database to provide an efficient way to perform complex search operations. The database is not directly visible from outside the Resource Selector, so it can be any database that best fits the possible search operations. Although other database systems can be used as well, the most likely candidate is an SQL-based RDBMS. The Maintenance subsystem handles database maintenance tasks such as removing expired advertisements, thus reducing the information held in the database and increasing its efficiency. The Query Evaluator (QE) is the interface to the Resource Broker. It receives a job description from the Resource Broker and extracts the list of resources capable of running the job from the database. The list of resources is sorted based on some heuristics. The heuristics can depend on several factors including the performance of the resource and the probability that the resource has enough free capacity to actually run the job. The list of resources is then
passed to the Resource Broker for further processing and for selecting the resource that will run the job. We assume the presence of a Resource Allocator service (like Globus GRAM) that can allocate a resource and run a job on it, and of a Submitter service that can submit a job to such an allocator and track it while it is running.
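The four components of the Resource Selector can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: advertisements are assumed to be plain dictionaries, the field names (`resource`, `arch`, `max_cpus`, `speed`, `expires`) and the table layout are hypothetical, signature checking is delegated to a caller-supplied function, and the sorting heuristic is simply "fastest first". SQLite stands in for the SQL-based RDBMS mentioned above.

```python
import sqlite3

# Hypothetical set of fields every advertisement must carry.
REQUIRED_FIELDS = {"resource", "arch", "max_cpus", "expires"}


def make_database() -> sqlite3.Connection:
    """Database: local store that is private to the Resource Selector."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE ads (resource TEXT PRIMARY KEY, arch TEXT, "
        "max_cpus INTEGER, speed REAL, expires REAL)"
    )
    return db


def passes_filter(ad: dict, signature_ok, accepted_archs: set) -> bool:
    """Filter: syntax and signature check first, then local configuration."""
    if not REQUIRED_FIELDS.issubset(ad):
        return False
    if not signature_ok(ad):
        return False
    return ad["arch"] in accepted_archs


def store(db: sqlite3.Connection, ad: dict) -> None:
    """Insert an advertisement, replacing an older one for the same resource."""
    db.execute(
        "INSERT OR REPLACE INTO ads VALUES (?, ?, ?, ?, ?)",
        (ad["resource"], ad["arch"], ad["max_cpus"],
         ad.get("speed", 0.0), ad["expires"]),
    )


def maintain(db: sqlite3.Connection, now: float) -> None:
    """Maintenance: remove advertisements that have expired."""
    db.execute("DELETE FROM ads WHERE expires <= ?", (now,))


def evaluate(db: sqlite3.Connection, job: dict) -> list:
    """Query Evaluator: capable resources, sorted by a simple heuristic."""
    rows = db.execute(
        "SELECT resource FROM ads WHERE arch = ? AND max_cpus >= ? "
        "ORDER BY speed DESC",
        (job["arch"], job["cpus"]),
    )
    return [row[0] for row in rows]
```

Because the database is hidden behind `evaluate`, the storage engine and the sorting heuristic can be changed locally without affecting the Resource Broker.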
3.1
Advertisements
The information system and the Resource Broker interface at the Resource Selector (see Figure 2). Given a job description, the Resource Selector returns a sorted list of resources. The order of resources represents a preference based on the information gathered from the resource advertisements, such as which resource offers the best performance, the lowest price or the highest probability of accepting a job. Advertisements allow the Resource Selector to decide if a resource is capable of running a specific job. For this purpose an advertisement has to include the capabilities of the resource. Typical information an advertisement may contain:
• Administrative information, e.g. creation date, expiration date and digital signature.
• Resource description, e.g. CPU architecture and speed, operating system etc.
• Availability information, e.g. maximum number of processors a job can request, forecast of resource allocation, etc.
• Policy information, e.g. the accounting method.
The most important criterion for advertisements is that they should not contain state information. There are three reasons for this. First, state information expires very fast, and it may not be valid by the time it reaches the Resource Selector. Second, site administrators may not wish to publish state information (especially if the resources are not dedicated exclusively to grid tasks). Third, state information may not be interpreted correctly outside of the publishing resource. The simplest example of the latter case is when the site administrators implement priorities among users, and high-priority users (the list of whom is not public for political reasons) are allowed to take over the resource even if it is heavily loaded by low-priority users. In this case publishing the load can prevent jobs of high-priority users from using the resource, since the broker acting for them concludes that it is not available.
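The typical advertisement contents listed above could be represented, for example, by a small record type. The sketch below is only an illustration: all field names and the JSON encoding are assumptions, since the paper does not prescribe a concrete format. Note that the record deliberately has no field for state information such as the current load.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Advertisement:
    # Administrative information
    resource: str
    created: float      # creation time, seconds since the epoch
    expires: float      # expiration time, seconds since the epoch
    signature: str      # placeholder for a digital signature
    # Resource description
    cpu_arch: str
    cpu_mhz: int
    operating_system: str
    # Availability information
    max_cpus: int       # most processors a single job may request
    # Policy information
    accounting: str

    def is_valid_at(self, now: float) -> bool:
        """An advertisement is usable only between creation and expiry."""
        return self.created <= now < self.expires


def encode(ad: Advertisement) -> str:
    """Serialise an advertisement to text (JSON is an assumed format)."""
    return json.dumps(asdict(ad), sort_keys=True)


def decode(text: str) -> Advertisement:
    """Rebuild an advertisement from its serialised form."""
    return Advertisement(**json.loads(text))
```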
Sophisticated brokering, however, does require state information as well, which is essentially excluded from the information system. Recall two of the three reasons why state information is not included in advertisements: it is highly dynamic and possibly cannot be interpreted outside the publishing site. There are well established models for describing the behaviour of resources. Resources can be considered as servers and queueing theory has been studied and applied
in many similar cases. If resources are described by a stochastic model, their characteristic parameters can be published in the information system and the model itself serves as an environment for interpretation. The parameters can be considered static over a longer period of time. How frequently the parameters must be republished in order to have a precise estimation of the real information is a crucial issue and is currently being investigated. In such a way data representing state information can be included in the information system. Thus, apart from the configuration data, pseudo-static state information can also be taken into consideration for enhancing brokering decisions with quality, cost and performance issues. Advertisements should only contain the minimum information needed. Limiting the published information may even increase the flexibility of the system. For example, the name and version of the operating system is important and should be published, but the list of installed software is not necessary. This enables the local resource management to automatically install software packages if a job needs them without any additional support. On the other hand, listing the presence of software packages that a lot of jobs depend on can improve scheduling performance. The same is true for policy information. Sites are expected not to publish detailed policy information, but it is quite important to know how much the user is going to pay if she chooses the resource. The monitoring system can provide further information that is not considered important for brokering or not general enough to include in advertisements, but which can still be useful for some applications. Another important aspect of advertisements is that they do not need to describe physical resources. A cluster aimed at running mainly small jobs can announce that the maximum number of processors that can be requested is 8 while the whole cluster may have 64 processors.
This virtualisation of the physical resource makes it easy to allocate only a part of the resource to grid processing and use the rest of the resource for solving local problems.
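The idea of publishing stochastic model parameters instead of raw state can be illustrated with a small sketch. The paper does not say which model is used; purely as an assumption, the resource is treated here as an M/M/c/c loss system (Erlang B), and the published parameters are taken to be a job arrival rate and a per-processor service rate. The function names are hypothetical.

```python
def erlang_b(servers: int, offered_load: float) -> float:
    """Blocking probability of an M/M/c/c system (Erlang B formula).

    Uses the standard recurrence
        B(0) = 1,  B(k) = a * B(k-1) / (k + a * B(k-1)),
    where a is the offered load (arrival rate divided by service rate).
    """
    blocking = 1.0
    for k in range(1, servers + 1):
        blocking = offered_load * blocking / (k + offered_load * blocking)
    return blocking


def acceptance_probability(max_cpus: int, arrival_rate: float,
                           service_rate: float) -> float:
    """Estimated probability that a processor is free when a job arrives.

    max_cpus, arrival_rate and service_rate are the (assumed) pseudo-static
    parameters a resource would publish in its advertisement.
    """
    return 1.0 - erlang_b(max_cpus, arrival_rate / service_rate)
```

A Resource Selector could use such an estimate as one input to the sorting heuristic, without ever seeing the actual load of the resource.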
3.2
Implementation with USENET News
The well-known USENET News [5] is a distributed, replicated database of articles, and the functionality of the ADS described in Section 3 can be implemented using the news service. Resource Classifiers act as simple news user agents and post the resource advertisements in news article format to the nearest news server. The news network takes the role of the Advertisement Distribution System by distributing the posted articles to all other news servers. The Filter of a Resource Selector acts as a simple leaf news server by either accepting one incoming feed from the nearest backbone server or by sucking articles from the nearest reader server.
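How a Resource Classifier might wrap an advertisement in news article format can be sketched with the standard library's message classes. This is only an illustration: the newsgroup name, the sender address and the subject line are hypothetical, and the actual posting step (the NNTP POST command of [5]) is not shown.

```python
from email import message_from_string
from email.message import EmailMessage
from email.utils import formatdate


def to_article(ad_text: str, newsgroup: str, sender: str,
               expires_at: float) -> str:
    """Wrap a serialised advertisement in news article headers."""
    article = EmailMessage()
    article["From"] = sender
    article["Newsgroups"] = newsgroup       # selects the grid collaboration
    article["Subject"] = "resource advertisement"
    article["Expires"] = formatdate(expires_at, usegmt=True)
    article.set_content(ad_text)
    return article.as_string()


def from_article(article_text: str) -> tuple:
    """Return (newsgroup, advertisement body) as seen by a Filter."""
    article = message_from_string(article_text)
    return article["Newsgroups"], article.get_payload().strip()
```

Using the article's `Expires` header for the advertisement lifetime is a design assumption; it would let news servers drop stale advertisements on their own, complementing the Maintenance subsystem of the Resource Selector.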
The requirements set earlier for the ADS are fulfilled. The News system is completely distributed. It is also possible that different grid collaborations use the same information system without interfering with each other by posting their advertisements to different newsgroups. The network traffic is optimal since information spreads along a spanning tree only. The USENET News system is very robust. If a news server becomes unreachable, all traffic previously flowing through it will automatically turn to alternate routes without any intervention. If the failure was caused by the server, Resource Classifiers and Resource Selectors can switch to other news servers and continue their operation. The News system is highly scalable. Current news servers can handle over 1.5 million articles with a total volume of over 300 gigabytes per day and these numbers are constantly growing.
3.3
Advantages of the proposed architecture
Compared to other information systems, this solution has further advantages as well. All data are placed near the consumer (resource broker), meaning that even complex search operations are fast and effective, using the user’s own resources. The total network traffic is proportional to the number of resources and not to the number of jobs. This is important, since in a real grid environment there are many more jobs than resources. Neither the Resource Classifiers nor the Resource Selectors need to know anything about each other’s locations; thus transparency is provided, too. The only information they need is the address of the nearest ADS server (e.g. a news server serving grid newsgroups). A concern about databases is how trustworthy they are. In our proposed architecture this can be addressed by requiring Resource Classifiers to digitally sign the advertisements they generate. Resource Selectors can then decide which Resource Classifiers they trust and configure their Filters accordingly. The other security concern is the information that is made publicly available. Since the content of advertisements is flexible, system administrators can refrain from making sensitive information publicly available. A private collaboration using grid technology may also encrypt advertisements or create a private ADS network (e.g. a separate newsgroup) which distributes advertisements only among the collaboration participants.
4.
Conclusion
In this paper a novel architecture for a grid information system has been introduced. As the grid is growing, together with the increasing number of available resources, discovery type queries will be dominant over lookup type queries. The current information system in Globus is not efficient enough in supporting large-scale discovery type queries. The reasons were analysed in the paper.
Our approach, which aims at supporting resource brokering in large-scale grid applications, has four fundamental points. First, only information relevant to the brokering procedure is included. Dynamic data with a short lifespan is replaced by stochastic parameters. Second, information is propagated to the consumers. This way, databases can be tailored to the local needs and queries can be executed locally. This method also reduces network traffic, speeds up query processing and utilises local resources only. As opposed to previous models, resource advertisements are sent through the network and not the queries; thus the overall network traffic is proportional to the number of resources and not to the number of users or jobs. Third, the proposed ADS can be based on the USENET News architecture. In this way an existing, ready-made, well established mechanism can serve as an information distribution system. Thus, neither the producer nor the consumer has to know about the other, but both can rely on a robust, efficient and pervasive network. Fourth, the advertisement scheme offers a way for extensible and flexible description of resources that allows additional services and easier implementation of security mechanisms.
5.
Acknowledgements
We would like to thank Péter Kacsuk, Norbert Podhorszki, Ferenc Szalai and Ferenc Vajda for their valuable comments, discussions and considerable help in forming the ideas presented in this paper. This work was partially supported by the European Commission under contract number IST-2001-32133 and the Hungarian Scientific Research Fund (OTKA) under grant number T032226.
References
[1] K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman: Grid Information Services for Distributed Resource Sharing. Proc. 10th IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), IEEE Press, 2001.
[2] I. Foster, C. Kesselman: The Globus Toolkit. In: The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers, 1999, pp. 259-278.
[3] A. S. Grimshaw, W. A. Wulf, J. C. French, A. C. Weaver, P. F. Reynolds: Legion: The Next Logical Step Toward a Nationwide Virtual Computer. Technical report No. CS-94-21, June 1994.
[4] A. S. Grimshaw, W. A. Wulf: Legion - A View from 50,000 Feet. Proc. 5th IEEE International Symposium on High Performance Distributed Computing (HPDC-5), IEEE Computer Society Press, Los Alamitos, California, August 1996.
[5] B. Kantor, P. Lapsley: Network News Transfer Protocol. IETF RFC 977, February 1986.
[6] M. Wahl, T. Howes, S. Kille: Lightweight Directory Access Protocol (v3). IETF RFC 2251, December 1997.