Using Ontology for Description of Grid Resources

A.M. Pernas
Department of Informatics (DInfo), Federal University of Pelotas (UFPel), 96010-900 - Pelotas, Brazil
[email protected]

M.A.R. Dantas
Department of Informatics and Statistics (INE), Federal University of Santa Catarina (UFSC), 88040-900 - Florianopolis, Brazil
[email protected]

Abstract
Grid computing environments share resources and services on a large scale. Many organizations consider them an effective way to execute distributed applications with a high level of performance and availability. However, using a grid environment can be a complex task for an ordinary user, because it demands prior knowledge of the access requirements of a virtual organization. To improve the search for and selection of resources, in this paper we propose the use of ontology as an alternative approach to support the use of grid services. This paradigm can provide a description of the resources available in a grid configuration, guiding users to the desired operations and describing the syntax and semantics of resources so that they form a common domain vocabulary. Our experimental results indicate that the proposal gives users a better understanding of the available resources.

1. Introduction
The term grid is sometimes used synonymously with a networked, high-performance computing infrastructure. This view is certainly important from the point of view of utilization, but it is only one part of a much larger framework that also includes information handling and support for knowledge processing [1]. A grid infrastructure can be understood as a set of services provided by institutions (or by a particular individual) to be used by others. Its architecture may be viewed as service-oriented [1] [2], where two entities have special importance: the producer (owner) of the service and the consumer of the service. In this view, owners offer services subject to restrictions that must be satisfied before access permission is given to a consumer. The consumer may be a user, an institution or an application program belonging to another institution (section 3 presents more details about the service-oriented architecture).

Considering a service-oriented approach, in this article we propose the use of ontology for the description of the resources available in a grid computing environment. To make the proposal possible, an ontology-based service was created to help users execute their applications. The motivation for this research is the clear advantage of using ontology to give ordinary users of a grid environment a common, shared domain of concepts. In other words, as some projects have shown (e.g. [3]), the use of ontology can enhance the interoperability between different virtual organizations.

In this article, we first define the ontology to provide more precise information about the resources available in a grid. The next step was to create a grid service using the proposed ontology. In this scenario, users and application programs interact directly with the ontology-based service to improve the utilization of the configuration.

The paper is organized as follows. Section 2 introduces some concepts of ontology. Section 3 presents service orientation in grid architecture. Section 4 describes the development of the proposed ontology-based service. Section 5 presents related work. Finally, section 6 presents our conclusions and some future work.
Proceedings of the 19th International Symposium on High Performance Computing Systems and Applications (HPCS’05) 1550-5243/05 $20.00 © 2005 IEEE

2. Concepts of ontology
Ontology has been used in computer science for several years to build a vocabulary for a specific application domain. The approach relies on a formal, explicit definition of concepts that are shared among researchers [4] [5] [6]. Following [7], the definition refers to an abstract model of a phenomenon that identifies the relevant concepts of that phenomenon. The word formal means that the ontology can be processed by a machine. Finally, the shared characteristic reflects the idea that an ontology captures the knowledge of a group.

The use of ontology for the semantic description of a vocabulary is a complex process that requires a careful study of the specific knowledge involved. However, it provides a clear understanding of the characteristics and properties of classes and relations. In addition, an ontology can be extended to new domains: new classes, rules or vocabulary can be added for a new application domain. An ontology can be distributed and shared and, in conjunction with other ontologies and tools, it can also provide interoperability.
2.1. Ontology components
The representation of an element in an ontology relies on a set of components, known as epistemological elements, which identify each category of element in the ontology. The terminology for these elements and their meaning varies with the area that uses them. However, [6] identifies four components, independent of any specific community, that can be employed to define an ontology specification:
• Classes: used in a general sense, they can express a task, a function, an action, a strategy or a reasoning process;
• Relations: represent interactions between classes of the domain. Examples are subclass-of and connected-to;
• Functions: a special case of relations, in which the n-th element of the relation is unique for the n-1 preceding elements;
• Axioms: used to model sentences that are always true.
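The four components listed above can be pictured with a small data structure. The following is only a minimal sketch in Python, not the representation used in this work (which relies on OWL and Protégé-2000); the container `Ontology`, the helper `build_example`, the axiom name and the hostname value are hypothetical, while the class names come from section 4.3.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Ontology:
    # Classes: concept name -> parent concept name (None for the root).
    classes: Dict[str, Optional[str]] = field(default_factory=dict)
    # Relations: (subject, relation name, object) triples.
    relations: List[Tuple[str, str, str]] = field(default_factory=list)
    # Functions: (relation name, subject) -> the single value it maps to.
    functions: Dict[Tuple[str, str], str] = field(default_factory=dict)
    # Axioms: named predicates that must always hold for the ontology.
    axioms: Dict[str, Callable[["Ontology"], bool]] = field(default_factory=dict)

    def add_class(self, name: str, parent: Optional[str] = None) -> None:
        self.classes[name] = parent
        if parent is not None:
            self.relations.append((name, "subclass_of", parent))

    def set_function(self, relation: str, subject: str, value: str) -> None:
        # A dictionary keeps exactly one value per (relation, subject) pair.
        self.functions[(relation, subject)] = value

    def violated_axioms(self) -> List[str]:
        return [name for name, holds in self.axioms.items() if not holds(self)]


def build_example() -> Ontology:
    onto = Ontology()
    onto.add_class("Computational_Resources")
    onto.add_class("TypeOfMachine", parent="Computational_Resources")
    onto.add_class("Cluster", parent="TypeOfMachine")
    # Hypothetical value, used only to illustrate a function.
    onto.set_function("hostname", "ClusterLinux", "node0.example.org")
    # Hypothetical axiom: every parent class must itself be declared.
    onto.axioms["every_parent_is_declared"] = lambda o: all(
        parent is None or parent in o.classes for parent in o.classes.values()
    )
    return onto
```

Storing functions in a dictionary is one simple way to enforce the uniqueness that distinguishes them from general relations.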
3. Service-oriented grid architecture
In [2] it is argued that the concept of a grid is motivated by a real and specific problem. The authors describe the approach in terms of well-defined environments in which resources and services can be shared. These environments are called virtual organizations (VOs). A VO is characterized by a number of organizations (or individuals) that provide and consume resources and services from the grid, following sharing rules. Therefore, a large portion of grid operations concerns the sharing of resources and services: processors, memories, disks and software packages can be accessed from any node of the grid. However, reaching ideal access transparency for resources and services requires complex control of producers and consumers. In other words, secure rules for sharing resources and services must be established.

In a service-oriented architecture (such as the grid), the producer first establishes rules as a standard for allowing access to resources. The next step is a service level agreement between a producer and a consumer of the services. This agreement reflects not only the producer side but also the consumer side (e.g. Gang-Matching [8], which presents multilateral matching based on consumer utilization policies). Therefore, a service-oriented grid architecture requires a scheme that supports interoperability between applications from the VOs and a high level of access transparency for resources.
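The idea that an agreement must satisfy both the producer and the consumer can be sketched as a two-sided check. This is a simplified bilateral illustration in Python, not the multilateral Gang-Matching algorithm of [8]; the types `Producer` and `Consumer`, the function `agree`, and the attribute name `vo` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Attributes = Dict[str, object]


@dataclass
class Producer:
    name: str
    offer: Attributes                      # what the resource provides
    policy: Callable[[Attributes], bool]   # rule a consumer must satisfy


@dataclass
class Consumer:
    name: str
    profile: Attributes                         # what the consumer declares
    requirement: Callable[[Attributes], bool]   # rule an offer must satisfy


def agree(producer: Producer, consumer: Consumer) -> bool:
    """Return True only when each side accepts the other."""
    return producer.policy(consumer.profile) and consumer.requirement(producer.offer)


# Hypothetical producer: only serves members of the virtual organization "vo-a".
cluster = Producer(
    name="ClusterLinux",
    offer={"maxMemoGB": 64},
    policy=lambda profile: profile.get("vo") == "vo-a",
)

# Hypothetical consumer: needs at least 32 GB of memory.
job = Consumer(
    name="user1",
    profile={"vo": "vo-a"},
    requirement=lambda offer: offer.get("maxMemoGB", 0) >= 32,
)
```

Here `agree(cluster, job)` succeeds because both rules hold; a consumer from another virtual organization would be refused by the producer policy even if the offer met its requirement.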
4. Ontology for grid resources description
In this section we present our research work which, unlike [1] [18] (discussed in section 5), targets the construction of an ontology for the description of grid resources. The approach was to develop an ontology that represents the resources of a grid configuration more precisely. The target community was people from computational science; however, the present research can be easily extended to other communities.

We designed a vocabulary using the axiom component, which is a key element for ensuring all the ontology requirements. In addition to this component, two other structures help the ontology describe resources:
• Metadata: information about data. In this research, the metadata stores information about the computational resources, for example: the date from which an element is ready for use; how long access to a specific resource is expected to be allowed; the address of the resource; storage capacity; available memory; operating system; architecture type.
• Semantics view: this structure stores information about the present state of a computational resource. Every time a request reaches a semantics view, the structure returns information about that moment. In other words, a semantics view reports whether the resource is available for use, out of work or heavily loaded, among other statuses.

Metadata and semantics view are both used as additional references for the ontology. Although not commonly used in other ontologies, these structures can improve the behavior of the ontology by returning answers more quickly. Every time a computational resource is added to (or removed from) the grid, the metadata of this resource is updated to reflect the real situation. Because semantics views are created continuously in the system, these structures hold up-to-date information about computational resources.

The ontology developed in this work was implemented in OWL (Web Ontology Language), which is a W3C standard. OWL has the same features found in other ontology languages, such as DAML-OIL (DARPA Agent Markup Language - Ontology Inference Layer), RDF (Resource Description Framework) and RDF-S (RDF Schema). OWL is designed for use by applications that need to process the content of information instead of only presenting information to humans. OWL facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF Schema (RDF-S) by providing additional vocabulary along with a formal semantics [9]. In addition, OWL incorporates the enhancements of the DAML-OIL language and a more extensive vocabulary. This provides mechanisms to create properties and classes, allowing the definition of relationships between classes, cardinality and characteristics of properties.

OWL has three increasingly expressive sublanguages: OWL Lite, OWL DL and OWL Full. These differ in the level of formality they offer to users creating an ontology. In this research work we use OWL Full. Its combination of formality and freedom was important to the definition of our ontology, where, for example, we express resource utilization policies as axioms.

The ontology was edited using the Protégé-2000 software package [10]. In this environment, we describe the concepts, attributes and relationships of the ontology. We chose this editor because it is freely available, supports several operating systems (e.g. Linux, Windows, Mac OS, Solaris, HP-UX) and has a large number of plugins.
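The difference between the metadata and semantics view structures described in this section can be illustrated with a short Python sketch: metadata is stored and updated when resources join or leave, while the semantics view is evaluated at request time. This is only an illustration under assumed names; `Metadata`, `SemanticView`, `ResourceCatalog` and the `probe` callback are hypothetical and do not reproduce the OWL structures created in this work.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Metadata:
    """Stored description of a resource (a subset of the fields in the text)."""
    address: str
    operating_system: str
    architecture: str
    storage_gb: int
    memory_gb: int


class SemanticView:
    """Reports the state of a resource at the moment of the request."""

    def __init__(self, probe: Callable[[str], str]):
        # probe is a hypothetical stand-in for a live monitoring source.
        self._probe = probe

    def status(self, resource_name: str) -> str:
        return self._probe(resource_name)


class ResourceCatalog:
    def __init__(self, view: SemanticView):
        self._metadata: Dict[str, Metadata] = {}
        self._view = view

    def add(self, name: str, metadata: Metadata) -> None:
        # Called when a resource joins the grid.
        self._metadata[name] = metadata

    def remove(self, name: str) -> None:
        # Called when a resource leaves the grid.
        self._metadata.pop(name, None)

    def describe(self, name: str) -> dict:
        # Stored metadata plus a status computed now, not cached.
        return {"metadata": self._metadata[name], "status": self._view.status(name)}
```

Because `describe` calls the semantics view on every request, two consecutive calls can return different statuses for the same resource while the metadata stays unchanged.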
4.1. Grid architecture using ontology
The ontology proposed in this article works directly on the directory of a grid configuration: queries about resources are expressed in the vocabulary defined by the ontology. This eliminates the possibility of ambiguous interpretation when information about the environment is searched and read. All the concepts used by applications (on both the consumer and the producer side) to present and search for information have a unique meaning.

Figure 1 (modified from [11]) shows the ontology as a special layer in the configuration. Consumers' queries arrive at the ontology part of the model. The ontology component uses the metadata and the semantics view to obtain information about computational resources and so answer consumers' queries. The metadata structure receives information directly from resources and data files. The semantics view, in turn, communicates with the Metacomputing Directory Service (MDS), which provides distributed access to the grid structure and to information about system components. Figure 1 thus illustrates the data flow needed to obtain information using the ontology approach in a grid environment.
Figure 1. Grid architecture using the ontology approach.
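The data flow of Figure 1 can be approximated as follows: a query term is first resolved to a single term of the ontology vocabulary, and the question is then routed either to the metadata or to the semantics view. The Python sketch below is an assumption-laden illustration; `OntologyLayer`, its constructor arguments and the synonym entries are hypothetical, and `state_source` merely stands in for an MDS lookup without using any real MDS interface.

```python
from typing import Callable, Dict, Set


class OntologyLayer:
    def __init__(
        self,
        known: Set[str],
        synonyms: Dict[str, str],
        metadata_source: Callable[[str], Dict[str, object]],
        state_source: Callable[[str], str],
    ):
        self._known = known                # canonical vocabulary terms
        self._synonyms = synonyms          # lowercase alias -> canonical term
        self._metadata_source = metadata_source
        self._state_source = state_source  # hypothetical stand-in for MDS

    def canonical(self, term: str) -> str:
        """Map a consumer term to exactly one vocabulary term, or fail."""
        if term in self._known:
            return term
        alias = term.strip().lower()
        if alias in self._synonyms:
            return self._synonyms[alias]
        # Unknown terms are rejected instead of guessed, so no query is ambiguous.
        raise KeyError(term)

    def query(self, resource: str, term: str) -> object:
        attribute = self.canonical(term)
        if attribute == "status":
            # Present state: answered through the semantics view.
            return self._state_source(resource)
        # Stored description: answered from the metadata.
        return self._metadata_source(resource)[attribute]


layer = OntologyLayer(
    known={"maxMemoGB", "OpSysType", "status"},
    synonyms={"memory": "maxMemoGB", "os": "OpSysType"},
    metadata_source=lambda resource: {"maxMemoGB": 64, "OpSysType": "Linux"},
    state_source=lambda resource: "available",
)
```

With this configuration, `layer.query("ClusterLinux", "Memory")` and `layer.query("ClusterLinux", "maxMemoGB")` resolve to the same attribute, which is the single-meaning property the section describes.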
4.2. Methodology
Before starting to design the ontology, it is necessary to establish which methodology is most suitable for the environment. We consider portability an essential factor, because grid configurations combine different systems from different virtual organizations. For this reason, we adopted the methodology presented in [6], which describes a mechanism for defining an ontology that is portable across different system representations. This approach is also interesting because it supports reuse. However, whatever methodology is adopted, the representation of community knowledge is a complex task. In our work, as presented in subsection 4.3, we chose to research the meaning of the concepts in major grid projects and environments.
4.3. Ontology development
The primary task in developing the ontology was to investigate how to describe the concepts related to grid computational resources. We therefore searched for the vocabulary used by the community and for the resources commonly employed in grid configurations. The investigation considered NPACI [12], ESG (Earth System Grid) [13], NASA's Information Power Grid (IPG) [14] and the Distributed ASCI Supercomputer Project 2 (DAS-2) [15].

After this search we created the documentation required to build the ontology, which consists of all the concepts, formally described, that will form the ontology. The following components characterize the proposal:
• Data Dictionary - gathers all the classes and instances of the ontology, together with their meanings. At the end of the ontology creation, our Data Dictionary contained a total of 14 classes. The first class created was Computational_Resources, which is the main class of the ontology because all other classes were created as its subclasses;
• Concept Classification Tree - comprises all the classes and subclasses of the ontology. This tree is shown in Figure 2;
• Table of Classes Attributes and Instances - presents all the attributes of each class and instance. For each attribute it describes: its logic relation and minimum value, which together form a restriction on how many elements must be related to this instance in any entry of the ontology; its type, i.e. the kind of data this instance must have; and its measure unit. An example is given in Table 1, which shows the attributes of Cluster, a subclass of the TypeOfMachine class;
• Table of Instances - contains the description, attributes and values of each instance of the ontology;
• Tables of Attributes Classification - graphically illustrate attributes that are deduced from the existence of other attributes higher in the hierarchy. Figure 3 shows a tree created for the OpSysType and architecture attributes, belonging to the OpSystem, Architecture and Supercomputer classes.

Figure 2. Concept classification tree.

Table 1. Attributes table: class cluster.

Attribute Name       Logic Relation   Minim. Value   Type        Measure Unit
hostname             =                1              Character   -----
IPaddress            =                1              Character   -----
architecture         =                1              Instance    -----
OpSysType            =                1              Instance    -----
fileSysType          =                1              Instance    -----
numberOfNodos        >=               1              Integer     -----
numberOfProcessors   >=               2              Integer     -----
numProcessorsNodo    >=               1              Integer     -----
numberOfAvailCPUs    >=               1              Integer     -----
diskSpaceGB          -----            -----          Integer     Gigabytes (GB)
maxMemoGB            -----            -----          Integer     Gigabytes (GB)
synonymous           -----            -----          Character   -----
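The restrictions of Table 1 lend themselves to an automated check of an ontology entry. The Python sketch below is one possible reading, not the mechanism used in Protégé-2000: it follows the text in treating the Logic Relation and Minimum Value columns as a restriction on how many values an attribute carries (whether the >= rows are instead meant as bounds on the value itself is not stated here), and it represents Instance-typed attributes by the instance name as a string. The names `CLUSTER_ATTRIBUTES` and `check_entry` are hypothetical.

```python
import operator

_RELATIONS = {"=": operator.eq, ">=": operator.ge}

# Attribute name -> (logic relation, minimum value, Python type).
# None means the table gives no restriction ("-----").
CLUSTER_ATTRIBUTES = {
    "hostname": ("=", 1, str),
    "IPaddress": ("=", 1, str),
    "architecture": ("=", 1, str),   # Instance, held here as its name
    "OpSysType": ("=", 1, str),      # Instance, held here as its name
    "fileSysType": ("=", 1, str),    # Instance, held here as its name
    "numberOfNodos": (">=", 1, int),
    "numberOfProcessors": (">=", 2, int),
    "numProcessorsNodo": (">=", 1, int),
    "numberOfAvailCPUs": (">=", 1, int),
    "diskSpaceGB": (None, None, int),
    "maxMemoGB": (None, None, int),
    "synonymous": (None, None, str),
}


def check_entry(entry, spec=CLUSTER_ATTRIBUTES):
    """Return a list of problems; an empty list means the entry conforms.

    entry maps each attribute name to the list of values it carries.
    """
    problems = []
    for name, (relation, minimum, kind) in spec.items():
        values = entry.get(name, [])
        if relation is not None and not _RELATIONS[relation](len(values), minimum):
            problems.append(
                f"{name}: expected count {relation} {minimum}, got {len(values)}"
            )
        if any(not isinstance(value, kind) for value in values):
            problems.append(f"{name}: values must be {kind.__name__}")
    for name in entry:
        if name not in spec:
            problems.append(f"{name}: unknown attribute")
    return problems
```

Passing a different `spec` dictionary allows the same function to check other classes of the ontology.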
Our next task was to reproduce this documentation in the OWL language, using the Protégé-2000 editor. In the editor, we also created axioms, employing PAL (Protégé Axiom Language). Figure 4 shows an example of an axiom that was created. This axiom is an access restriction: access is only possible if the operating system of the computational resource is AIX, its disk space is greater than 40 Gigabytes and its memory is greater than 128 Gigabytes. Metadata were also created using Protégé-2000 and the OWL language. The metadata were defined based on the concept of each structure of the ontology (i.e. classes, attributes and instances).
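The access restriction described for Figure 4 can be restated as a simple predicate. The following Python function is only an illustration of the three conditions given in the text; it is not PAL syntax and does not reproduce the axiom shown in the figure. The function name `access_allowed` is hypothetical, and the attribute names are borrowed from Table 1.

```python
def access_allowed(resource: dict) -> bool:
    """True only when all three conditions of the access axiom hold."""
    return (
        resource.get("OpSysType") == "AIX"
        and resource.get("diskSpaceGB", 0) > 40    # strictly greater than 40 GB
        and resource.get("maxMemoGB", 0) > 128     # strictly greater than 128 GB
    )
```

Both size comparisons are strict, so a resource with exactly 40 Gigabytes of disk space does not satisfy the restriction.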
[Figure 3 labels: IBMSPCluster; Inference: OpSysType, Architecture; OpSysType: RedHat; Architecture: IBMSP]
Figure 3. Attribute classification tree: domain cluster.

Figure 4. Axiom to access restriction.

4.4. Ontology based service
We developed a grid service to provide the necessary interaction between a consumer and the ontology. This service allows consumers to access the ontology and the environment. However, it was first necessary to develop an application that allows the service to interact with the ontology. The application manipulates the structures in the ontology so that the service can use them. It was implemented in Java and directly accesses all the concepts of the ontology using the APIs of the Protégé-2000 editor. This mechanism allows simple routines to be written to interact with all the data in the ontology.

The application was designed in three modules in order to satisfy all the service goals. The first module provides a list of all classes and instances defined in the ontology. The names of these classes and instances are used by a consumer to query the metadata and computational resources in the following modules. This first module is illustrated in Figure 5.

In the second module, consumers can search for the metadata of any class listed by the first module. Figure 6 shows an example search for the Server class, with the resulting metadata listing.

The third module allows a search for any existing computational resource, so that a consumer can view its entire configuration. The name of the computational resource is the only input required for the search. Computational resources are instances of a class (e.g. cluster, server or supercomputer), and their names, which are needed to access the environment, are provided by the first module. An example of a computational resource search is presented in Figure 7, which shows the configuration of the computational resource ClusterLinux.

After creating the application, it was possible to develop the proposed service itself. The service was defined using the Globus toolkit [16] and consists of the application described above executing in a grid configuration. The application was configured to accept incoming connections from consumers of a grid.

Figure 5. Application component responsible for listing all ontology concepts.

Figure 6. Result displayed from Server metadata.

After this service was provided, consumers could access a grid configuration through a friendly interface. In this new environment, computational resources are presented with a clearer description, as we verified with users of the Federal University of Pelotas configuration, where the application was used. In addition, because the service accesses the ontology through the application, new knowledge can be added at any time without any restriction on the service. The insertion of new data about a computational resource does not influence how the service works; it only modifies the data related to the resource.
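The three modules of the application can be summarized as three lookup operations. The sketch below uses Python with in-memory dictionaries in place of the Java application, the Protégé-2000 APIs and the Globus service, none of which it calls; the class `OntologyService`, its method names and the sample values are hypothetical, while the names Server, Cluster and ClusterLinux come from the text.

```python
from typing import Dict, List


class OntologyService:
    def __init__(
        self,
        classes: Dict[str, List[str]],
        class_metadata: Dict[str, Dict[str, str]],
        configurations: Dict[str, Dict[str, object]],
    ):
        self._classes = classes                # class name -> instance names
        self._class_metadata = class_metadata  # class name -> metadata
        self._configurations = configurations  # instance name -> configuration

    def list_concepts(self) -> Dict[str, List[str]]:
        """Module 1: names of all classes and instances in the ontology."""
        instances = sorted(n for names in self._classes.values() for n in names)
        return {"classes": sorted(self._classes), "instances": instances}

    def metadata_of(self, class_name: str) -> Dict[str, str]:
        """Module 2: metadata of a class listed by module 1."""
        return dict(self._class_metadata[class_name])

    def configuration_of(self, resource_name: str) -> Dict[str, object]:
        """Module 3: full configuration of a resource, found by name only."""
        return dict(self._configurations[resource_name])


# Hypothetical sample data, for illustration only.
service = OntologyService(
    classes={"Cluster": ["ClusterLinux"], "Server": []},
    class_metadata={"Server": {"description": "single machine offering services"}},
    configurations={"ClusterLinux": {"OpSysType": "Linux", "numberOfNodos": 8}},
)
```

A consumer would typically call `list_concepts` first and then pass one of the returned names to `metadata_of` or `configuration_of`. Because the methods only read the dictionaries, adding new entries changes the answers without changing the service, which mirrors the property noted at the end of this section.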
Figure 7. Search related to ClusterLinux computational resource configuration.

5. Related Work
In the literature it is possible to find research works (e.g. [17] [18]) in which the authors use ontology to support the use of a grid environment.

5.1. Resources selection in a grid environment
In [17] a system is presented for the automatic selection of resources in a grid configuration based on ontology. The authors propose creating the ontology with a declarative approach to describe resources and restrictions. The goal is to provide consumers with an automatic and friendly environment for selecting resources. This research is similar to our work in some aspects. The main difference between the proposals is how the ontology is employed. Our approach focuses on a more precise description of the resources, which is expected to lead to a more efficient choice of a computational resource. The proposal presented in [17], on the other hand, targets automatic selection of resources, because the ontology is used by an application to discover and select resources.

5.2. Ontology for grid
Some research works apply ontology to existing grid environments. In this section we describe two of these efforts, which we consider important configurations in the scientific scenario.

The Semantic Grid [1] is a grid infrastructure whose goal is to support e-Science applications. It is an open environment where researchers and scientists can access the configuration to run their experiments, verify results and store research so that it is freely available to the community. In the Semantic Grid, ontology is applied to determine the extent of knowledge terms and the relationships among terms inside the environment in order to help its construction; consumers have no direct access to it. The ontology is used to define: objects with properties and relations; tasks and processes; attributes related to specific knowledge and the relationships among them; important attributes for establishing the value of some content; models and views [1].

Another interesting project, in which the ontology is expressed in the DAML-OIL language, is the Earth System Grid. In [18] a research work is presented on the development of a wider scientific domain. The target is to provide access transparency to a classification base and data search, saving users time and computational resources. In this project, consumers can access the metadata from the ontology, but this metadata concerns scientific (meteorological) information, in contrast with the metadata in our environment, which concerns computational resources. The ontology created in that project also does not present information about computational resources.

6. Conclusions and future work
In this paper we presented a research work to provide access transparency to users of grid configurations. Our approach was based on ontology. We first presented some concepts of ontology and of service orientation in grid configurations. The environment of our prototype was then described, starting with the methodology used, followed by some characteristics of the development and finally how the ontology-based service works. The system proved to be an efficient and friendly approach for providing grid resources to consumers.

As future research we plan to enhance the system to allow some dynamic changes, such as metadata updates or inclusions made through the application. Another line of work is to create an ontology for the agriculture field and to use the application in a wider and more complex grid environment.
References
[1] D. De Roure, N.R. Jennings and N.R. Shadbolt, “The Semantic Grid: A Future e-Science Infrastructure”, In Grid Computing: Making The Global Infrastructure a Reality, F. Berman, A.J.G. Hey and G. Fox (eds), Southern Gate, Chichester, England: John Wiley & Sons, 1080 p., 2003, pp. 437-470.
[2] I. Foster, C. Kesselman, J. Nick and S. Tuecke, “The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration”, Presented at GGF4, 2002.
[3] A.M. Pernas, “Ontologias Aplicadas a Descrição de Recursos em Grids Computacionais”, Federal University of Santa Catarina, Florianópolis, Brazil, 2004.
[4] D. Fensel, “Ontologies: Silver Bullet for Knowledge Management and Electronic Commerce”, Springer-Verlag, Berlin, 2000.
[5] T. Gava and C. Menezes, “Especificação de Software Baseada em Ontologias”, In III Escola Regional de Informática, Brazil, 2003, pp. 167-205.
[6] T. Gruber, “A Translation Approach to Portable Ontology Specifications”, Knowledge Acquisition, 1993. [Online]. Available: http://www.ksl.stanford.edu/kst/what-is-an-ontology.html.
[7] R. Studer, R. Benjamins, and D. Fensel, “Knowledge Engineering: Principles and Methods”, IEEE Transactions on Data and Knowledge Engineering, 1998, pp. 161-197.
[8] R. Raman, M. Livny and M. Solomon, “Resource management through multilateral matchmaking”, In Proc. of the Ninth IEEE Symposium on High-Performance Distributed Computing (HPDC9), Pittsburgh, PA, August, 2000, pp. 290-291.
[9] D. McGuinness and F. Van Harmelen, “OWL - Web Ontology Language Overview”, 2004. [Online]. Available: http://www.w3.org/TR/2004/REC-owl-features-20040210.
[10] N. Noy, R. Fergerson and M. Musen, “The knowledge model of Protege-2000: Combining interoperability and flexibility”, 12th International Conference on Knowledge Engineering and Knowledge Management-Europe Knowledge Acquisition Workshop (EKAW), French Riviera, October, 2000, pp. 2-6.
[11] C. Goble and D. De Roure, “Semantic Web and Grid Computing”, September 2002. [Online]. Available: http://www.semanticgrid.org/documents/swgc/swgcfinal.pdf.
[12] NPACI - National Partnership for Advanced Computational Infrastructure, “Partnership Report”, 2000. [Online]. Available: http://www.npaci.edu/About_NPACI/index.html.
[13] I. Foster, D. Middleton, and D. Williams, “The Earth System Grid II: Turning Climate Model Datasets into Community Resources”, January, 2003. [Online]. Available: https://www.earthsystemgrid.org/about/docs/ESGOverview SciDACPINapa_v8.doc.
[14] IPG, Information Power Grid - NASA’s Computing and Data Grid, “What is the IPG?”, October, 2002. [Online]. Available: http://www.ipg.nasa.gov/aboutipg/what.html.
[15] K. Verstoep, “The Distributed ASCI Supercomputer 2 (DAS-2)”, May 2000. [Online]. Available: http://www.cs.vu.nl/das2/.
[16] I. Foster and C. Kesselman, “The Globus Project: a Status Report”, In Proc. of Seventh Heterogeneous Computing Workshop (HCW 98), IEEE Computer Society Press, March, 1998, pp. 4-18.
[17] H. Tangmunarunkit, S. Decker and C. Kesselman, “Ontology-based Resource Matching - The Grid meets the Semantic Web”, 1st Workshop on Semantics in Peer-to-Peer and Grid Computing (SemPGrid) at the Twelfth International World Wide Web Conference, Budapest, May 2003.
[18] L. Pouchard, et al., “An Ontology for Scientific Information in a Grid Environment: the Earth System Grid”, In Proc. of the 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID’03), Tokyo, Japan, May, 2003, pp. 626-632.