An Ontology-Based Model for Grid Resource Publication and Discovery

Lei Cao(1), Minglu Li(1), Henry Rong(2), and Joshua Huang(2)

(1) Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
{clcao,mlli}@sjtu.edu.cn
(2) E-Business Technology Institute, The University of Hong Kong, Hong Kong, China
{hrong,jhuang}@eti.hku.hk

Abstract. The resource management system is the core component of a Grid system; its two key functions are resource publication and discovery. This paper presents an ontology-based model for Grid Resource Publication and Discovery (GRPD). We adopt multiple domain-specific registries to manage the corresponding resources of a Virtual Organization (VO) in order to achieve high GRPD efficiency. Resource descriptions and resource requests are both based on domain-specific ontologies. The ontology-based matchmaker of each domain-specific registry plays the central role in resource selection. The "Index" node of a VO hosts the general registry, while the domain-specific registries are distributed across the VO; this is a two-level registry mechanism. A large-scale Grid system may contain many VOs, whose "Index" nodes connect to each other in peer-to-peer mode instead of hierarchical mode.

1 Introduction

Grid technology is one of the most important technologies to emerge in recent years. It enables large-scale, flexible resource sharing among dynamic Virtual Organizations (VOs) in a networked environment. A basic service in a Grid is resource discovery: given a description of the resources desired, the resource discovery mechanism returns a set of (contact addresses of) resources that match the description [1]. We also call this process resource matching.

Resource discovery in a Grid is a challenging task because of the following Grid features [2]: resources are heterogeneous, dynamic, autonomous, and numerous. These characteristics create significant difficulties for traditional centralized and hierarchical resource discovery services. Furthermore, existing approaches to resource description and resource selection in the Grid are highly strained because traditional resource matching is based on symmetric, attribute-based matching. The exact matching and tight coordination required between providers and consumers make such systems inflexible and difficult to extend to new characteristics or concepts.

This paper has been supported by the 973 project of China (No. 2002CB312002) and a grand project of the Science and Technology Commission of Shanghai Municipality (No. 03dz15027).

H. Jin, Y. Pan, N. Xiao, and J. Sun (Eds.): GCC 2004, LNCS 3251, pp. 448–455, 2004.
© Springer-Verlag Berlin Heidelberg 2004


In this paper, we present an ontology-based model for Grid Resource Publication and Discovery (GRPD). Resource descriptions and requests are both fundamental in GRPD, and we use separate ontologies to describe resources and requests, respectively. Instead of exact syntactic matching, our ontology-based matchmaker performs semantic matching using terms defined in those ontologies. The loose coupling between resources and requests removes the tight coordination requirement between resource providers and resource consumers.

A large-scale Grid system may contain many VOs. Each VO has its own "Index" node that hosts the general registry. Because ontologies are domain-specific, we build multiple domain-specific registries in a VO. All heterogeneous resources in a VO register themselves with these domain-specific registries in a soft-state manner via the general registry. This is a two-level registry mechanism. "Index" nodes from various VOs connect to each other in peer-to-peer mode instead of hierarchical mode.

The rest of this paper is organized as follows. We discuss related work in Section 2. Section 3 presents our ontology-based model in detail. Section 4 presents an ontology-based matchmaking example. We conclude in Section 5 with lessons learned and future research plans.

2 Related Work

Condor-G [3] combines the inter-domain resource management protocols of the Globus Toolkit with the intra-domain resource management methods of Condor to allow users to harness multi-domain resources. The Condor-G agent formulates resource information and user requests in the Classified Ads resource specification language, and then uses the matchmaker to make brokering decisions based on symmetric, attribute-based matching. The matchmaker clearly becomes a system bottleneck as the system scales up.

In Globus [4], resource information is managed by the Information Services (MDS3), which consist of resource-layer services and some higher-level services (a collective-layer Index Service). There is typically one Index Service per VO but, in large organizations, several Index Services can be hierarchically aggregated into a higher-level Index Service. The Grid community agrees that it is not easy to devise scalable Grid resource discovery based on a centralized or hierarchical mechanism when a large number of Grid hosts, resources, and users have to be managed [5].

Legion [6, 7] is a reflective, object-based operating system for the Grid. Scheduler objects use information from the collection and from resource owners in making scheduling decisions. Objects are used as the main system abstraction throughout. Co-allocation of resources is not supported.

The EU Data Grid [8] was designed to provide distributed scientific communities with access to large sets of distributed computational and data resources. A job request is expressed in the Classified Ads language of Condor. Resource discovery is done by queries and employs periodic push for dissemination. It does not support advance reservation or co-allocation of resources.


Nimrod-G [9] is a Grid-enabled resource management and scheduling system based on the concept of computational economy. It uses Globus middleware services for dynamic resource discovery and for dispatching jobs over computational Grids. Because it relies on Globus MDS, Nimrod-G inherits the same shortcomings as Globus.

3 Ontology-Based Model

3.1 Architecture

Figure 1 shows our model architecture within a VO. Domain-specific registries register their metadata with the general registry. Each domain-specific registry is responsible for resource registration in its domain and implements resource discovery through ontology-based matchmaking. We apply different domain ontologies in the different specific registries. This mechanism strengthens both the efficiency of resource discovery and the scalability of the Grid system. We use redundancy to ensure the reliability of the Grid system: each registry has its own online substitute. In our model we also apply P2P philosophy and techniques among the VOs of the Grid system.

Fig. 1. Model architecture (clients publish and discover resources, both based on ontology, through the general registry on the Index node; the general registry returns the GSH of the responsible domain-specific registry, which answers with a matched list or NoMatchFound; resources send update/keep-alive and subscription messages to their domain-specific registry, and queries can also be forwarded to general registries in other VOs)
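The division of labour between the two registry levels can be sketched as follows. This is a minimal illustration of the routing idea only; all class and field names are our own, not taken from the paper's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class DomainRegistry:
    """Holds ontology-described resource metadata for one domain."""
    domain: str
    gsh: str                                        # Grid Service Handle
    resources: dict = field(default_factory=dict)   # resource name -> metadata

@dataclass
class GeneralRegistry:
    """Hosted on the VO's Index node; stores only registry metadata."""
    domain_registries: dict = field(default_factory=dict)  # domain -> DomainRegistry
    neighbours: list = field(default_factory=list)         # Index nodes of other VOs

    def route(self, domain):
        """Deliver a publication or query to the responsible registry."""
        return self.domain_registries.get(domain)
```

The general registry never stores resource metadata itself; it only knows which domain-specific registry is responsible for each domain, which keeps the Index node lightly loaded.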

3.2 Grid-Related Ontologies

The term ontology has been in use for many years. Today's use of ontologies on the Web has a different slant from the earlier philosophical notions. One widely cited definition of an ontology is "a specification of a conceptualization" [10]. An ontology is domain-specific, so there are many kinds of ontologies in the world. Using RDF-Schema [11], each ontology defines the objects, properties of objects, and relationships among objects that belong to one domain. We have designed and prototyped our matchmaker using existing Semantic Web technologies to exploit ontologies and rules for Grid resource matching.

Resources in a Grid environment are heterogeneous. We can group Grid resources into five domains: (1) computational resources (cluster, PC, supercomputer, operating system, etc.); (2) database and storage resources (magnetic disk array, optical disc library, magnetic tape library, Oracle, Sybase, etc.); (3) application resources (online games, specific computing software, etc.); (4) instrument resources (telescope, spectral analyzer, etc.); (5) network resources (router, switch, etc.). Accordingly, we have built five domain ontologies for Grid resources using Protege-2000 [12]. Each ontology includes three parts: a resource ontology, a resource request ontology, and a resource policy ontology [13, 14].

The resource ontology provides an abstract model for describing resources, their capabilities, and their relationships. The majority of our resource vocabularies are taken from the Common Resource Model (CRM) [15], which we extend to fit our abstract description requirements. The resource request ontology focuses on a request, its properties and characteristics, and the resource requirements; this ontology supports requests for multiple independent resources. The resource policy ontology describes the resource authorization and usage policies.
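A fragment of such a domain ontology can be sketched as plain subject-predicate-object triples in the spirit of RDF-Schema. The class and property names below are illustrative stand-ins, not the paper's actual vocabulary:

```python
# A fragment of a storage-domain ontology as plain triples,
# in the spirit of RDF-Schema (names here are illustrative):
ONTOLOGY = [
    ("StorageSystem", "rdf:type",        "rdfs:Class"),
    ("MDiskArray",    "rdfs:subClassOf", "StorageSystem"),
    ("MTapeLibrary",  "rdfs:subClassOf", "StorageSystem"),
    ("MaxCapacity",   "rdf:type",        "rdf:Property"),
    ("MaxCapacity",   "rdfs:domain",     "StorageSystem"),
]

def subclasses_of(cls, triples):
    """Transitive closure of rdfs:subClassOf below cls."""
    found = {s for s, p, o in triples
             if p == "rdfs:subClassOf" and o == cls}
    for sub in list(found):
        found |= subclasses_of(sub, triples)
    return found
```

Reasoning over such a class hierarchy is what lets a request for an abstract class (e.g., a storage system) match concrete resource types the requester never named.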

3.3 Two-Level Registry Mechanism Used in a VO

A large-scale Grid system usually comprises a large number of heterogeneous resources located in different organizations. A single registry holding the handles of every resource in the Grid would be impractical: on one hand it would be too large, and on the other hand it would be updated frequently as resources are created or removed from the system.

Figure 2 shows the two kinds of registry structures in our two-level registry mechanism. The general registry contains two parts: the super-index neighbour set and the domain-specific registry set. The former will be introduced in the next section. The latter contains the metadata (e.g., GSHs) of all domain-specific registries in a VO.

We use domain-specific ontology vocabularies to describe Grid resources. The metadata of the Grid resources from one domain are stored in the resource registration database of that domain's registry. The database is based on soft-state updates from resource providers.

There are three parts in a domain-specific registry (see Figure 2). The resource registration database has been mentioned above. The ontology-based matchmaker is responsible for resource selection; it in turn consists of three components [13]. The domain ontologies contain the domain model and vocabularies for expressing resource publications and job requests. The domain background knowledge contains additional knowledge about the domain that is not captured by the ontology; TRIPLE [16], a rule system based on deductive database techniques, is used to implement the background knowledge.
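The soft-state behaviour of the resource registration database can be sketched as follows: an entry survives only as long as the provider keeps refreshing it. The class and method names are our own, and the clock is passed in explicitly so the behaviour is deterministic.

```python
class ResourceRegistrationDatabase:
    """Soft-state store: a resource's entry expires unless the provider
    refreshes it with periodic keep-alive messages."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._entries = {}   # resource name -> (metadata, last refresh time)

    def register(self, name, metadata, now):
        self._entries[name] = (metadata, now)

    def keepalive(self, name, now):
        if name in self._entries:
            metadata, _ = self._entries[name]
            self._entries[name] = (metadata, now)

    def live(self, now):
        """Drop expired entries and return the names still alive."""
        self._entries = {n: (m, t) for n, (m, t) in self._entries.items()
                         if now - t < self.ttl}
        return set(self._entries)
```

A resource that departs silently simply stops sending keep-alives and is purged after one TTL interval, which is why no explicit deregistration protocol is needed.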

Fig. 2. Registry structures (the general registry comprises a super-index neighbour set and a domain-specific registry set; a domain-specific registry comprises a resource registration database and a matchmaker, the latter built from matchmaking rules, domain background knowledge, and domain ontologies (vocabularies) on top of the TRIPLE/XSB deductive database system)


The matchmaking rules define the matching constraints between requests and resources. We use the TRIPLE rule language to implement these rules as well. The domain background knowledge uses the ontology vocabularies to capture background information. Matchmaking uses both the domain ontology and the domain background knowledge to match a request to resources. Our ontology-based matchmaker is built on top of the TRIPLE/XSB deductive database system.

The benefits of this two-level registry mechanism are: (1) it decreases the centralized registry's workload and increases resource discovery efficiency by distributing domain-specific resource queries to the corresponding registries; (2) it strengthens the system's adaptability, because resources register themselves in a soft-state manner and need only submit keep-alive messages to their domain-specific registry periodically, so they can join or leave at any moment; (3) it improves the system's scalability, since the system needs to do little more than remove (or add) the resource metadata from (or to) the domain-specific registry on a resource's departure (or arrival).
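The idea of matchmaking rules as constraints between a request and a resource can be sketched as predicates, each standing in for one TRIPLE rule. The field names below are illustrative, not the paper's ontology vocabulary:

```python
def match(request, resource, rules):
    """A resource matches when every rule holds; each predicate plays
    the role of one TRIPLE matchmaking rule."""
    return all(rule(request, resource) for rule in rules)

# Example constraint rules (field names are illustrative):
def enough_space(req, res):
    return res["MaxCapacityTB"] >= req["MinStorageSpaceTB"]

def fast_enough(req, res):
    return res["IORateMBs"] >= req["MinIORateMBs"]

RULES = [enough_space, fast_enough]
```

New constraints can be added without touching the matcher itself, mirroring how new TRIPLE rules can be added without changing the deductive engine.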

3.4 Super-Index Network

In a large-scale Grid environment, we can use P2P techniques to implement non-hierarchical, decentralized Grid systems. The use of P2P protocols is expected to improve the efficiency and scalability of large-scale Grid systems [17]. The Grid system contains many VOs, and every VO has its own Index node hosting the general registry mentioned above. We connect the Index nodes from different VOs into a P2P network resembling a "super-peer network" [18]. Each Index node therefore maintains a neighbour set of other VOs' Index nodes. This gives us two different scenarios: a typical Grid scenario within a VO, and a traditional P2P scenario among VOs.
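Query forwarding across the super-index network can be sketched as a breadth-first search over the Index-node neighbour sets, bounded by a time-to-live. This is our own illustration of the idea; the paper does not prescribe a particular forwarding policy.

```python
def super_index_search(vos, start, query, ttl):
    """Forward a query across Index nodes: each hop consults one VO's
    general registry, then passes the query to unvisited neighbours,
    until a match is found or the time-to-live is exhausted."""
    frontier, seen = [start], {start}
    while frontier and ttl >= 0:
        next_frontier = []
        for name in frontier:
            vo = vos[name]
            if query in vo["resources"]:
                return name                   # VO holding a match
            for n in vo["neighbours"]:
                if n not in seen:
                    seen.add(n)
                    next_frontier.append(n)
        frontier, ttl = next_frontier, ttl - 1
    return None                               # NoMatchFound
```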

Fig. 3. Super-Index network among VOs (the Index nodes of VO1, VO2, VO3, and VO4 are connected to each other as peers)

Because the Index node is prone to being a single point of failure and a potential bottleneck for its VO, we take several measures to avoid this: (1) host only the general registry on the Index node, so that once Grid resources have registered with the domain-specific registries, they exchange messages with those registries directly; (2) use the nodes with the largest capabilities within each VO as the Index nodes; (3) introduce redundancy into the design of the Index node to provide more reliability to the VO and less load on the Index node. Redundancy adds extra cost, however, so it is important to balance reliability against cost. In our model we use two nodes as a "virtual" Index node to ensure good reliability. The nodes that host domain-specific registries face the same reliability problem, and we apply the same measures to them.
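The "virtual" Index node built from two physical nodes can be sketched as a simple failover wrapper; the class name and failure model (a raised `ConnectionError`) are our own assumptions, not details from the paper.

```python
class VirtualIndexNode:
    """Two physical nodes behind one logical Index node: requests go to
    the primary; if it is unreachable, the online substitute answers."""

    def __init__(self, primary, substitute):
        self.primary = primary          # callable handling a request
        self.substitute = substitute    # online substitute, same interface

    def handle(self, request):
        try:
            return self.primary(request)
        except ConnectionError:
            return self.substitute(request)
```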

3.5 Grid Resource Publication and Discovery

Grid Resource Publication (GRP) proceeds as follows:

1. A resource provider that wants to join the VO submits a publication request to the VO's general registry.
2. The general registry verifies which domain the request corresponds to and delivers it to the domain-specific registry responsible for registration in that domain.
3. The domain-specific registry returns a success message to the general registry if the registration succeeds.
4. The general registry returns the GSH of the domain-specific registry to the resource provider.
5. The resource provider periodically sends keep-alive messages to the domain-specific registry; this is why it is called soft-state registration.
6. The domain-specific registry may subscribe with the resource provider for some metadata. When the subscribed metadata change, the resource provider notifies the domain-specific registry promptly.

Grid Resource Discovery (GRD) proceeds as follows:

1. The requester submits a resource query to the general registry.
2. As in step 2 of GRP.
3. The ontology-based matchmaker in the domain-specific registry performs the matching between resources and the request.
4. The domain-specific registry returns the matched resource list (or NoMatchFound) to the general registry.
5. If no requested resource can be found, the general registry uses some policy to select another general registry from its neighbour set of VOs. The search continues until the requested resources are found or the Time-to-Live (TTL) expires.
6. If the requested resources are found, the matched resource list is sent back to the requester via the general registry.

Due to space limitations, the sequence diagrams of GRP and GRD are omitted.
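The local (within-VO) portions of the two protocols above can be sketched as follows. This is a minimal single-process illustration with dictionaries standing in for services; all names are our own.

```python
def publish(general_registry, resource, now):
    """GRP steps 1-4: the general registry verifies the domain, forwards
    the request to the responsible domain-specific registry, and on
    success returns that registry's GSH to the provider."""
    registry = general_registry["domains"].get(resource["domain"])
    if registry is None:
        return None
    registry["resources"][resource["name"]] = (resource, now)  # soft-state entry
    return registry["gsh"]

def discover(general_registry, domain, predicate):
    """GRD steps 1-4: route the query to the domain registry, run the
    matchmaker, and return the matched list or None (NoMatchFound)."""
    registry = general_registry["domains"].get(domain)
    if registry is None:
        return None
    matched = [name for name, (res, _) in registry["resources"].items()
               if predicate(res)]
    return matched or None
```

GRD step 5, forwarding an unanswered query to neighbouring VOs until the TTL expires, would wrap `discover` in a search over the super-index network.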

4 Ontology-Based Matchmaking Example

Here we present a matching example that cannot be handled easily by attribute-based matchmakers; we apply our ontology-based matchmaker to it.


(a) A Magnetic Disk Array with 10TB MaxCapacity

  Property Name                           Property Value
  MDiskArray.Name                         "MDA1.cs.sjtu.edu.cn"
  MDiskArray.AuthorizedGroup              "[email protected]"
  MDiskArray.NumberofAvailableDisks       20
  MDiskArray.RAIDLevel                    3
  MDiskArray.Cost-perGB                   $30
  MDiskArray.IORate                       200MB/S
  MDiskArray.MaxCapacity                  10TB
  MDiskArray.UsingControler.type          "Hard"
  MDiskArray.UsingControler.name          "Disk-controling card"

(b) A Magnetic Tape Library with 50TB MaxCapacity

  Property Name                           Property Value
  MTapeLibrary.Name                       "MTL1.cs.sjtu.edu.cn"
  MTapeLibrary.AuthorizedGroup            "[email protected]"
  MTapeLibrary.NumberofAvailableTapes     10
  MTapeLibrary.LinkType                   "SAN"
  MTapeLibrary.Cost-perGB                 $10
  MTapeLibrary.IORate                     35MB/S
  MTapeLibrary.MaxCapacity                50TB

(c) Job Request

  Property Name                              Property Value
  JobRequest.Name                            "Request1"
  JobRequest.Owner                           "User1"
  JobRequest.JobType                         "Save data online or nearline"
  JobRequest.NumberofResources               1
  JobRequest.RequestResource.ResourceType    "large-capacity storage system"
  JobRequest.RequestResource.RankBy          "Cost-perGB"
  JobRequest.RequestResource.MinStorageSpace 9TB
  JobRequest.RequestResource.MinIORate       25MB/S

Fig. 4. Job request and available resources

Figure 4(a, b) shows two resource instances: a magnetic disk array with a 10TB MaxCapacity and a magnetic tape library with a 50TB MaxCapacity. We list only the properties relevant to the example. Both storage resources belong to the Department of Computer Science and Engineering, Shanghai Jiao Tong University, and only users in the "[email protected]" group are allowed to access them. Figure 4(c) shows a job request asking for one "large-capacity storage system" resource for a "Save data online or nearline" job; the resource requirements are also specified in the list.

Since our background knowledge indicates that a "Save data online or nearline" job can be done by either an MDiskArray or an MTapeLibrary system, both storage systems are candidate resources. Assuming that User1 has an account belonging to the "[email protected]" group, User1 is authorized to access both storage systems. The matchmaker then checks the capabilities of both resources against the resource requirements. Again, because our background knowledge specifies that these two storage systems are "large-capacity storage systems", both resources pass the "RequestResource.ResourceType" criterion. Because both resources are compatible with the resource requirements, the "RankBy" property is used to select the match. Since the "Cost-perGB" of "MTL1.cs.sjtu.edu.cn" is lower than that of "MDA1.cs.sjtu.edu.cn", the matchmaker returns "MTL1.cs.sjtu.edu.cn" as the match.
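The whole worked example can be reproduced in a few lines. This is our own sketch, not the paper's TRIPLE implementation; "GROUP_A" stands in for the authorized-group name, which is obfuscated in our copy of the paper, and the numeric properties are taken from Figure 4.

```python
# Background knowledge from the paper: which resource classes can run
# the job type, and which count as a large-capacity storage system.
CAN_SERVE = {"Save data online or nearline": {"MDiskArray", "MTapeLibrary"}}
LARGE_CAPACITY = {"MDiskArray", "MTapeLibrary"}

# "GROUP_A" is a placeholder for the obfuscated authorized-group name.
resources = [
    {"Name": "MDA1.cs.sjtu.edu.cn", "Type": "MDiskArray",
     "AuthorizedGroup": "GROUP_A", "CostPerGB": 30,
     "IORateMBs": 200, "MaxCapacityTB": 10},
    {"Name": "MTL1.cs.sjtu.edu.cn", "Type": "MTapeLibrary",
     "AuthorizedGroup": "GROUP_A", "CostPerGB": 10,
     "IORateMBs": 35, "MaxCapacityTB": 50},
]

request = {"JobType": "Save data online or nearline",
           "UserGroups": {"GROUP_A"},
           "MinStorageSpaceTB": 9, "MinIORateMBs": 25,
           "RankBy": "CostPerGB"}

def matchmake(req, pool):
    """Filter by background knowledge, authorization, and capability
    requirements, then rank the survivors by the RankBy property."""
    candidates = [r for r in pool
                  if r["Type"] in CAN_SERVE[req["JobType"]]
                  and r["Type"] in LARGE_CAPACITY
                  and r["AuthorizedGroup"] in req["UserGroups"]
                  and r["MaxCapacityTB"] >= req["MinStorageSpaceTB"]
                  and r["IORateMBs"] >= req["MinIORateMBs"]]
    return min(candidates, key=lambda r: r[req["RankBy"]], default=None)
```

Both resources pass every filter, so the ranking step decides: the tape library's lower cost per GB makes it the returned match, exactly as in the paper's walkthrough.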

5 Conclusion and Future Work

This paper has described an ontology-based model for GRPD. We focus on grouping Grid resources to build domain-specific registries in a VO and propose a two-level registry mechanism. The ontology-based matchmaker located in every domain-specific registry is responsible for the semantic matching between resources and requests. This increases both the efficiency of resource discovery and the scalability of the Grid system. Using P2P techniques, we construct a Super-Index network among the VOs of the Grid system. In the near future, we plan to conduct a practical performance evaluation to show that our ontology-based model can be used efficiently in a Grid environment.


References

1. A. Iamnitchi, I. Foster: On fully decentralized resource discovery in grid environments. In: Proceedings of the International Workshop on Grid Computing. (2001)
2. Y. Gong, et al.: Vega infrastructure for resource discovery in grids. Journal of Computer Science and Technology 18 (2003) 10
3. J. Frey, et al.: Condor-G: a computation management agent for multi-institutional grids. In: Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing. (2001)
4. S. Tuecke, et al.: Open Grid Services Infrastructure (OGSI) version 1.0. GGF specification by the OGSI-WG. (2003)
5. C. Mastroianni, D. Talia, O.V.: P2P protocols for membership management and resource discovery in grids. (2004)
6. S.J. Chapin, et al.: The Legion resource management system. In: Proceedings of Job Scheduling Strategies for Parallel Processing. (1999)
7. A. Natrajan, M.A. Humphrey, A. Grimshaw: Grid Resource Management in Legion. Kluwer Academic Publishers, Virginia (2004)
8. W. Hoschek, et al.: Data management in an international data grid project. In: Proceedings of the 1st IEEE/ACM International Workshop on Grid Computing. (2000)
9. R. Buyya, D. Abramson, J. Giddy: Nimrod/G: an architecture for a resource management and scheduling system in a global computational grid. In: Proceedings of the 4th International Conference on High-Performance Computing in the Asia-Pacific Region. (2000)
10. T. Gruber: A translation approach to portable ontology specification. In: Proceedings of the Knowledge Acquisition Workshop. (1992)
11. D. Brickley, R.V. Guha: RDF Vocabulary Description Language 1.0: RDF Schema. (2004)
12. Protege: Protege-2000. http://protege.stanford.edu/ (2004)
13. H. Tangmunarunkit, S. Decker, C. Kesselman: Ontology-based resource matching in the Grid - the Grid meets the Semantic Web. In: Proceedings of the 1st Workshop on Semantics in Peer-to-Peer and Grid Computing. (2003)
14. P. Pothipruk, P.L.: An ontology-based multi-agent system for matchmaking. In: Proceedings of the 1st International Conference on Information Technology and Applications. (2002)
15. E. Stokes, N.B.: Common Resource Model (CRM). GGF specification by the CMM-WG. (2003)
16. M. Sintek, S. Decker: TRIPLE: a query, inference, and transformation language for the Semantic Web. In: Proceedings of the 1st International Semantic Web Conference. (2002)
17. D. Talia, P. Trunfio: Toward a synergy between P2P and grids. IEEE Internet Computing 7 (2003) 3
18. B. Yang, H. Garcia-Molina: Designing a super-peer network. In: Proceedings of the 19th International Conference on Data Engineering. (2003)