A Framework to Improve Semantic Web Services Discovery and Integration in an E-Gov Knowledge Network¹

Denilson Sell¹, Liliana Cabral², Alexandre Gonçalves¹, Enrico Motta² and Roberto Pacheco¹

¹ Grupo Stela, Universidade Federal de Santa Catarina, Florianópolis, Brasil
{denilson, alexl, pacheco}@stela.ufsc.br
http://www.stela.ufsc.br

² Knowledge Media Institute, The Open University, Milton Keynes, UK
{l.s.cabral, e.motta}@open.ac.uk
http://www.kmi.open.ac.uk
Abstract. One of the major challenges in Semantic Web Service (SWS) technology is improving the service discovery process. This challenge is critical for promoting system interoperability in the Scienti Network, an international knowledge network in Science & Technology. In this paper, we describe a framework under development that tackles this problem within the IRS-II SWS infrastructure. This framework (SeGOV) comprises a set of ontologies that describe services in functional layers in order to improve service descriptions and enable their discovery. We illustrate how the framework was applied to enable the discovery and interoperability of SWS in the Scienti Network, and we discuss some issues and benefits of applying IRS-II and SeGOV to promote SWS interoperability.
¹ Sell, Denilson; Cabral, Liliana; Gonçalves, Alexandre; Motta, Enrico and Pacheco, Roberto (2004). A framework to improve semantic web services discovery and integration in an e-gov knowledge network. In: Motta, Enrico; Shadbolt, Nigel; Stutt, Arthur and Gibbins, Nick eds. Engineering Knowledge in the Age of the Semantic Web. Lecture Notes in Computer Science, 3257. Berlin: Springer, pp. 509–510.

1 Introduction

E-government projects aim to make information systems interoperable so that governmental agencies can exchange data and capabilities and support the needs of citizens and stakeholders. Supporting this interoperability requires tackling the challenge of integrating heterogeneous services. Service integration is one of the major issues in the Scienti Network, an international network of information sources and knowledge for Science & Technology (S&T) management. The Scienti Network is formed by countries in Latin America, the Caribbean and Portugal [1]. Each member country maintains a set of information repositories, information systems, web portals, knowledge systems and other information technology components developed to support S&T management. The main goal of the Scienti Network is to coordinate the integration of these resources in order to promote research and studies on international scientific and technological activities [1] [2].

The advent of Web Services brought considerable progress in service integration. This technology is based on XML standards for service description (WSDL [3]), messaging (SOAP [4]) and service registration (UDDI [5]). However, it is hard to identify the functionality of a Web Service from its WSDL description alone, because that description only identifies the input and output data types and the grounding details. Recently, researchers have proposed ways to enable the discovery of Web Services through ontologies, in order to accomplish both the objective of the Semantic Web (data integration) and that of Web Services (application integration). This research area, Semantic Web Services (SWS), aims to allow a web service to be located through its semantic description [6] [7] [8].

There are several ongoing efforts to define SWS implementation approaches, such as WSMF [9], DAML-S [7] and IRS-II [10]. According to Cabral et al. [11], IRS-II is one of the most comprehensive approaches, providing a complete infrastructure to describe, deploy, discover, compose and invoke SWS. Like the other approaches, IRS-II defines a set of ontologies to describe the pre-conditions, post-conditions and goals of SWS, and bases SWS discovery on matching these descriptions. In addition, IRS-II has the advantage of separating the goal description (task) from the implementation description (problem solving method, PSM) of a service, towards the creation of a knowledge model [12] [13].
However, there are some issues in IRS-II (as in the other SWS infrastructures) that must be tackled in order to improve SWS discovery and solve interoperability problems in the Scienti Network:
- The discovery of SWS is based only on their inputs, outputs, goals and conditions. These descriptions are not sufficient to identify a SWS in a complex environment such as the Scienti Network, where several services relate to similar domain concepts;
- End users cannot understand the description of a SWS, which makes it hard for them to create their own compositions of services or to select a service relevant to their needs. As a result, end users still depend on developers to map and compose services, which makes the system reactive rather than pro-active;
- Finally, developers of the Scienti Network would have difficulty identifying a specific service in a huge collection of SWS, which makes code reuse hard.
We argue that these issues stem from the way SWS are currently described. Because their descriptions identify their capabilities and effects on domain concepts only superficially, it is difficult to discover a particular SWS, whether automatically or manually. This issue is particularly important in a typical e-Gov project such as
Scienti Network, where several services relate to the same domain concept but do completely different things. Current SWS description approaches are based only on inputs, outputs and goals. Because they do not contextualize the role of a SWS in the processes implemented by the e-Gov infrastructure, they do not allow a SWS to be dynamically discovered and presented in the context of an e-Gov end user's interactions. For the same reason, it can be hard for a developer to recognize a service and reuse it.

In this paper we propose a framework (the Semantic e-Gov Framework, SeGOV) to extend the IRS-II task and PSM descriptions in the context of the Scienti Network. This framework is based on the organization of services in the Scienti architecture [1] and foresees distributing the services over six layers, each containing a set of ontologies created to contextualize the role of a SWS in Scienti processes. The main contributions described in this paper are:
- We describe how enriching SWS descriptions can improve the discovery process;
- We propose that SWS discovery should be based on matching descriptions that relate SWS capabilities to one of the three SeGOV service layers: the Transactional Layer, the Presentation Layer and the Knowledge Layer. Each layer is described by an ontology that comprises the main operations found in the Scienti Network;
- We propose that the resources used by SWS should be described and linked to the capability description above, in order to improve the discovery and composition processes in IRS-II. We have been describing these resources in the three other layers of SeGOV: the Domain Concepts Layer, the Data Sources Layer and the Information Units Layer. These three layers comprise, respectively, the description of domain concepts, the representation format of data in databases and the syntactic description of the main information units corresponding to each SWS described in the service layers;
- We describe how SeGOV extends IRS-II in order to promote interoperability in the Scienti Network between the systems of one particular agency (the vertical bus) and between the systems of two or more agencies (the horizontal bus).

We describe the SeGOV framework in the next section. In Section 3, we describe the implementation of a prototype using the proposed framework. In Section 4, we review related work and give concluding remarks.
2 The Semantic E-Gov Framework - SeGOV

SeGOV is a SWS description framework designed to improve SWS descriptions in the IRS-II infrastructure in order to enable service discovery and composition in the Scienti Network. The framework comprises a set of ontologies distributed in a pyramidal structure, where each layer of the pyramid corresponds to one ontology. Fig. 1 depicts the organization of the SeGOV layers. The definition of these layers was based on the layers of the Scienti Network architecture [1].
Fig. 1. SeGOV Ontological Layers

SeGOV comprises six ontologies that describe SWS capabilities and the resources manipulated by these services. In SeGOV, each SWS is classified, according to its capability, into one of three service layers: the Transactional Layer, the Presentation Layer and the Knowledge Layer. Each layer is described by a specific ontology in IRS-II comprising the main operations found in the Scienti Network. Once classified, the service is linked to the concepts described in the corresponding service layer. In addition, SeGOV comprises three context layers: the Domain Concepts Layer, the Data Sources Layer and the Information Units Layer. These layers comprise, respectively, the description of domain concepts, the organization of data in databases and the syntactic description of the main information units of the Scienti Network. The context layers were designed to extend the capability description provided in the service layers with the resources that a SWS uses during its execution. The service and context layers of SeGOV can be briefly described as follows:
- Information Units Layer: this layer corresponds to the syntactic representation (XML Schemas and XML documents) of the information units of the Scienti Network, such as Researcher CV, Research Group, S&T Institution, Research Project and Financial Support. These XML representations guide the representation of domain concepts and are used to exchange data between SWS;
- Data Sources Layer: this layer describes the database schemas in which each information unit is maintained. It also describes the data mart schemas, detailing how the data is summarized and aggregated in dimensional data cubes;
- Domain Concepts Layer: this layer maintains the ontologies that describe all the information units represented in the Information Units Layer. In addition, this layer maintains ontologies for other concepts that are not represented as information units but are useful to the services provided in the Scienti Network. For example, we have represented concepts related to health research in order to support semantic search of researcher CVs over the Scienti databases (this example is detailed in the prototype section);
- Transactional Layer: this layer contains the ontologies that describe the services related to the processing and storage of data captured in the operational processes of S&T agencies. The ontologies that describe data warehousing processes are also included in this layer. Each piece of software that is potentially useful to end users or to other agencies is described by ontologies whose lowest level of description corresponds to a task and its related problem solving method (PSM) [13]. The ontologies include a complete description of the capability, following hierarchical descriptions based on software engineering ontologies. Each service is also linked to the related domain concepts and data sources, in order to give machines and end users complete traceability of the services;
- Presentation Layer: this layer comprises the ontologies that describe information presentation instruments, such as portal functionalities and decision support systems. Presentation services are described in the same way as in the Transactional Layer, except that they follow an ontology of functionalities related only to data presentation in the Scienti Network;
- Knowledge Layer: this layer comprises the ontologies that describe the instruments designed to extract knowledge from the data represented in the Data Sources Layer and the concepts represented in the Domain Concepts Layer. These ontologies describe knowledge discovery algorithms and their relation to the other layers.

The division of services, data and domain concepts into SeGOV layers enables the classification of the services of the Scienti Network. All the layers are designed to be connected to each other, to support SWS discovery and composition. This division also allows one layer to build on and extend services described in the layers below it.
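The layer structure above can be sketched as a small data model. The following Python sketch is an illustration only, not SeGOV's or IRS-II's actual representation: the `Layer` enumeration, the `ServiceDescription` record and the resource labels attached to Find_Brazilian_CV are hypothetical. It encodes one structural rule taken from the text: a service is classified in exactly one of the three service layers and is linked to resources only through the three context layers.

```python
from dataclasses import dataclass, field
from enum import Enum


class Layer(Enum):
    # Service layers: classify what a service does.
    TRANSACTIONAL = "Transactional"
    PRESENTATION = "Presentation"
    KNOWLEDGE = "Knowledge"
    # Context layers: describe the resources a service uses.
    DOMAIN_CONCEPTS = "Domain Concepts"
    DATA_SOURCES = "Data Sources"
    INFORMATION_UNITS = "Information Units"


SERVICE_LAYERS = {Layer.TRANSACTIONAL, Layer.PRESENTATION, Layer.KNOWLEDGE}
CONTEXT_LAYERS = set(Layer) - SERVICE_LAYERS


@dataclass
class ServiceDescription:
    name: str
    service_layer: Layer
    # Resources used by the service, grouped by context layer.
    context: dict = field(default_factory=dict)

    def validate(self) -> bool:
        """Reject descriptions that break the service/context split."""
        if self.service_layer not in SERVICE_LAYERS:
            raise ValueError(f"{self.name}: {self.service_layer.value} is not a service layer")
        for layer in self.context:
            if layer not in CONTEXT_LAYERS:
                raise ValueError(f"{self.name}: {layer.value} is not a context layer")
        return True


# Hypothetical description in the spirit of Fig. 2 (resource labels are illustrative).
find_brazilian_cv = ServiceDescription(
    name="Find_Brazilian_CV",
    service_layer=Layer.PRESENTATION,
    context={
        Layer.DOMAIN_CONCEPTS: ["CV ontology", "health research ontology"],
        Layer.INFORMATION_UNITS: ["Researcher CV (LMPL XML Schema)"],
        Layer.DATA_SOURCES: ["Brazilian CV repository"],
    },
)

print(find_brazilian_cv.validate())  # True
```

Keeping the service classification and the context links in separate fields mirrors the distinction between service and context layers; a real description would point to ontology classes rather than to plain strings.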
The ontologies maintained in each layer of the framework are based on upper-level ontologies, which describe the main concepts associated with each layer. Each SWS is related to at least one upper-level ontology and can be associated with many others in the same layer or in other layers. The upper-level ontologies are specialized down to the description of a particular task and its relation to the domain concepts and the data. All SWS are located at the lowest level of their ontological hierarchy and are described using the IRS-II task-PSM ontology. These ontologies aim to enable SWS discovery and interoperability both between SWS inside an agency (the vertical bus) and between SWS spread across agencies (the horizontal bus) in the Scienti Network. The relations between ontologies allow SWS composition and discovery to be defined, and enable the invocation of SWS spread across the service layers, thus forming a vertical bus of services. The vertical bus means that services can be connected inside a particular platform, independently of their programming language, architecture or infrastructure. The vertical bus is enabled by the combination of IRS-II and the ontologies defined in SeGOV. Fig. 2 illustrates possible relations between SeGOV layers in the description of a specific SWS.
Fig. 2. Example of a SWS description in SeGOV layers
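Discovery relies on checking whether a task belongs to a given class of an upper-level hierarchy, for example whether a task is a kind of Search_Concept (the check used in Section 3). A minimal Python sketch of that subsumption test follows, assuming the hierarchy is stored as a map from each class to its direct superclasses. The root names Transactional_Service and Presentation_Service are hypothetical, and placing Search under the Presentation root is an assumption based on the classification of the CV search services in Section 3.

```python
# Hypothetical fragment of SeGOV service-layer hierarchies:
# each class maps to its direct superclasses (multiple parents are allowed).
PARENTS = {
    "Transactional_Service": [],  # hypothetical upper-level root
    "Presentation_Service": [],   # hypothetical upper-level root
    "Search_Concept": ["Transactional_Service"],
    "Search": ["Presentation_Service"],
    "Search_Ontology_Task": ["Search_Concept"],
    "Find_Brazilian_CV_Task": ["Search"],
    "Find_Colombian_CV_Task": ["Search"],
}

# Upper-level ontologies are the classes with no superclass.
UPPER_LEVEL = {cls for cls, parents in PARENTS.items() if not parents}


def ancestors(cls):
    """Return every direct or indirect superclass of cls."""
    seen, stack = set(), list(PARENTS.get(cls, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return seen


def is_kind_of(cls, concept):
    """True if cls is concept itself or one of its subclasses."""
    return cls == concept or concept in ancestors(cls)


def upper_level_roots(cls):
    """Upper-level ontologies a task hangs from; SeGOV expects at least one."""
    return ancestors(cls) & UPPER_LEVEL


print(is_kind_of("Search_Ontology_Task", "Search_Concept"))    # True
print(is_kind_of("Find_Brazilian_CV_Task", "Search_Concept"))  # False
```

The `upper_level_roots` helper expresses the rule that every SWS must be attached to at least one upper-level ontology: an empty result would flag a task that is not yet anchored in any layer.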
This framework also aims to enable interoperability between agencies through the definition of a horizontal bus. In this case, the upper-level ontologies will be shared among distributed agencies (pyramids) in order to enable service discovery. This approach will be adopted in the Scienti Network. However, in some cases the standardization of upper-level ontologies will be difficult, for example when an external resource must be mapped as a service in the Scienti Network. In such cases another class of ontologies is necessary: mediators [10]. Mediators are mappings that describe the relation between concepts in two different ontologies. They can be used to map the relations between service providers, describing the relations between their service ontologies, domain concepts or data descriptions. Defining either common upper-level ontologies or mediators allows the horizontal bus to be constituted, as illustrated in Fig. 3. In the following section, we describe the implementation of a prototype using SeGOV descriptions.
Fig. 3. Integration between players of Scienti Network in the Horizontal Bus
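A mediator, in the sense used above, maps concepts of one ontology to concepts of another. The Python sketch below shows the simplest case, a one-to-one concept mapping used to rewrite a service request from one provider's vocabulary into another's. It is not the IRS-II mediator model; the concept names and the `translate` and `invert` helpers are hypothetical.

```python
# Hypothetical mediator between the domain ontology of agency A and that of an
# external provider B. All concept names are illustrative.
MEDIATOR_A_TO_B = {
    "Researcher": "Person",
    "Curriculum": "Resume",
    "Research_Project": "Project",
}


def translate(request, mediator):
    """Rewrite the concepts of a request (role -> concept) into the target ontology.

    Concepts without a mapping are returned separately instead of being dropped,
    so the caller can tell that the mediator is incomplete.
    """
    translated, unmapped = {}, []
    for role, concept in request.items():
        if concept in mediator:
            translated[role] = mediator[concept]
        else:
            unmapped.append(concept)
    return translated, unmapped


def invert(mediator):
    """Reverse a one-to-one mediator so that B's answers can be read in A's terms."""
    return {target: source for source, target in mediator.items()}


print(translate({"input": "Researcher", "output": "Curriculum"}, MEDIATOR_A_TO_B))
# ({'input': 'Person', 'output': 'Resume'}, [])
```

A one-to-one dictionary is the simplest possible mediator; mappings that split or merge concepts, or that convert data values, would need a richer representation.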
3 Prototype Scenario: Distributed CV Search

We illustrate how SeGOV was applied in a prototype application that involves the discovery and cooperation of distributed SWS in the Scienti Network. The goal of this prototype is to demonstrate how our framework improves SWS discovery in the Scienti Network by relying on semantic descriptions. We also illustrate how both the vertical and the horizontal bus were enabled, allowing semantic interoperability between Scienti services. As illustrated in Fig. 4, we have defined a prototype scenario for the semantic search of researchers' CVs over the databases of the S&T management agencies that form the Scienti Network. The prototype allows users to enter search arguments and runs the search over the Scienti Network databases of two countries (Brazil and Colombia) in order to find CVs related to the search arguments (i.e. researchers who have papers, degrees or other items related to the search arguments). This type of search is important in the Scienti Network: knowing who is researching in a specific area makes it possible, for example, to promote the definition of international research networks such as the network built in the context of the Genoma project [14].

We have defined the domain concepts describing the CV ontology according to SeGOV. To simplify the prototype development, we have defined a standard CV ontology for Colombian and Brazilian CVs, which are both defined in the LMPL XML Schema [15]. This XML Schema is the description at the Information Units Layer. Because the domain ontology is standardized, no mediators were needed. To complete the definition of the Domain Concepts Layer of SeGOV, we have defined a health research ontology that extends the CV ontology with external concepts and allows the semantic search of CVs related to health research terms.
The health research ontology is a vocabulary based on the DeCS health dictionary [16]. This vocabulary maintains descriptions of 26,000 health terms in three languages (English, Portuguese and Spanish) and has been enriched to represent complex relationships and axioms to improve searches and other services. The ontologies have been described with the WebOnto knowledge modeling tool [17], which is integrated with the IRS-II infrastructure.
Fig. 4. Overview of the CV search scenario
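The Search_Ontology SWS described below looks up synonyms of the user's terms in the domain ontologies, which include this trilingual vocabulary. A toy Python version of such a lookup is sketched here, assuming each vocabulary entry groups equivalent labels by language; the two entries and the `find_synonyms` function are illustrative and do not reproduce actual DeCS content or structure.

```python
# Illustrative entries only; real DeCS descriptors, identifiers and
# relationships are not reproduced here.
VOCABULARY = [
    {
        "en": ["Myocardial Infarction", "Heart Attack"],
        "pt": ["Infarto do Miocárdio"],
        "es": ["Infarto del Miocardio"],
    },
    {"en": ["Dengue"], "pt": ["Dengue"], "es": ["Dengue"]},
]


def find_synonyms(term):
    """Return the other labels, in any language, of the entry containing term.

    Matching is case-insensitive; labels shared by several languages are
    returned once, in first-seen order.
    """
    needle = term.casefold()
    for entry in VOCABULARY:
        labels = [label for per_language in entry.values() for label in per_language]
        if any(label.casefold() == needle for label in labels):
            unique = dict.fromkeys(labels)  # ordered de-duplication
            return [label for label in unique if label.casefold() != needle]
    return []


print(find_synonyms("heart attack"))
# ['Myocardial Infarction', 'Infarto do Miocárdio', 'Infarto del Miocardio']
```

Returning labels from all three languages is what lets a query typed in English also match CVs written in Portuguese or Spanish.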
The SWS defined for this scenario are:
- A SWS (Search_Ontology) that searches the domain concept ontologies of our framework for synonyms of the terms entered by the user, in order to extend the search results. This SWS was implemented in Java;
- A SWS that searches the Brazilian CV repository for the terms entered by the user and their synonyms (the Find_Brazilian_CV SWS), and another that searches the Colombian CV repository (the Find_Colombian_CV SWS). Both are Java Web Services.
The Search_Ontology SWS was classified as a service of the Transactional Layer because it is an operational task used only to support other tasks. The remaining SWS were classified as services of the Presentation Layer because they are search functionalities that end users can reach through links in the Scienti Network portal. All the above services were modeled as Problem Solving Methods (PSM) [13], connected to task descriptions and linked to the respective layer ontologies to allow their discovery and invocation in IRS-II. Fig. 2 depicts the distribution of part of the service ontologies. A brief description of the SWS and their related tasks and PSMs is given in the following tables.
Table 1. Description of the Search_Ontology SWS

Task              Search_Ontology_Task
Input             Term
Output            Synonym
Pre-conditions    Know(Term)
Post-conditions   Know(Synonym)
PSM               Search_Ontology_PSM
Language          Java
Java Method       Find_Synonym
Table 2. Description of the Find_Brazilian_CV SWS

Task              Find_Brazilian_CV_Task
Input             Term, Synonym
Output            CV
Pre-conditions    Know(Term), Know(Synonym), [Country(Brazil), Country(Any)]
Post-conditions   Know(Curriculum)
PSM               Find_Brazilian_CV_PSM
Language          Java
Java Method       Find_CV_By_Term
Table 3. Description of the Find_Colombian_CV SWS

Task              Find_Colombian_CV_Task
Input             Term, Synonym
Output            CV
Pre-conditions    Know(Term), Know(Synonym), [Country(Colombia), Country(Any)]
Post-conditions   Know(Curriculum)
PSM               Find_Colombian_CV_PSM
Language          Java
Java Method       Find_CV
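Read as data, Tables 2 and 3 differ mainly in their country pre-condition. The Python sketch below encodes those two rows and selects the PSMs whose pre-condition is satisfied by the requested countries. It rests on an interpretation of the flattened tables: the bracketed pair such as [Country(Brazil), Country(Any)] is taken to be a disjunctive applicability condition on the PSM, so a request for Any matches every PSM. The `PSMDescription` record and the `applicable` function are hypothetical and are not part of IRS-II.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSMDescription:
    name: str
    task: str
    java_method: str
    # Countries accepted by the bracketed pre-condition, read as a disjunction.
    countries: frozenset


PSMS = [
    PSMDescription("Find_Brazilian_CV_PSM", "Find_Brazilian_CV_Task",
                   "Find_CV_By_Term", frozenset({"Brazil", "Any"})),
    PSMDescription("Find_Colombian_CV_PSM", "Find_Colombian_CV_Task",
                   "Find_CV", frozenset({"Colombia", "Any"})),
]


def applicable(psms, requested_countries):
    """Names of the PSMs whose country pre-condition holds for the request."""
    requested = set(requested_countries)
    return [p.name for p in psms if p.countries & requested]


print(applicable(PSMS, ["Brazil", "Colombia"]))
# ['Find_Brazilian_CV_PSM', 'Find_Colombian_CV_PSM']
```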
To limit the scope of the prototype, we implemented the searches on samples of 20,000 Brazilian and Colombian CVs and deployed the prototype on a local network, while simulating two complete infrastructures to represent the platforms of the two countries. We developed a Java class (Search_Scienti_CV) that describes the itinerary of the SWS execution. The execution flow of this scenario is as follows (names such as term, country, CV and search_concept refer to ontology elements defined in the Domain Concepts Layer or in the service layers):
1. The user accesses the Search_Scienti_CV class on the Brazilian S&T platform, enters some search terms and specifies that he wants to search for Brazilian and Colombian researchers' CVs;
2. The class invokes the Brazilian instance of the IRS-II Server, asking for a SWS that is a kind of search_concept service and that has term instances as inputs and outputs. IRS-II finds and invokes Search_Ontology_Task because its inputs and outputs match and because the service is a subclass of the Search_Concept description in the Transactional Layer. Search_Ontology_SWS is invoked because Search_Ontology_PSM is linked to Search_Ontology_Task. Search_Ontology_SWS looks for synonyms of the terms entered by the end user in the domain ontologies described in the Domain Concepts Layer;
3. With the terms entered by the user and the synonyms found by Search_Ontology_Task, the Search_Scienti_CV class again invokes the Brazilian IRS-II Server, asking for services that are a kind of Search service and that have CV instances as goal. This SWS must also take as inputs at least one country instance (here Brazil and Colombia were selected) and a collection of search term instances. IRS-II automatically discovers and invokes two SWS, Find_Brazilian_CV_Task and Find_Colombian_CV_Task, based on their pre-conditions, post-conditions and service descriptions. Find_Colombian_CV_Task represents a SWS that is not implemented locally, but this is transparent to the end user and to IRS-II;
4. The Search_Scienti_CV class presents the results found by the two SWS above to the end user.
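The itinerary of steps 1 to 4 can be sketched as follows. This Python sketch is not the authors' Java class: the services are local functions over toy data, and discovery in step 3 is reduced to a dictionary lookup by country, whereas in the prototype IRS-II discovers the services from their descriptions. All data and function names are hypothetical.

```python
from typing import Callable, Dict, List, Set

# Toy stand-ins: in the prototype these are Java Web Services reached through
# IRS-II. Data and function names here are hypothetical.
SYNONYMS = {"heart attack": ["myocardial infarction"]}

# CV identifier -> keywords found in that CV.
BRAZILIAN_CVS = {"br-001": {"myocardial infarction"}, "br-002": {"dengue"}}
COLOMBIAN_CVS = {"co-001": {"heart attack"}}


def search_ontology(terms: List[str]) -> List[str]:
    """Step 2: return synonyms of the user's terms (cf. Search_Ontology)."""
    return [syn for t in terms for syn in SYNONYMS.get(t.lower(), [])]


def cv_search(repository: Dict[str, Set[str]]) -> Callable[[List[str]], List[str]]:
    """Build a CV search service over one repository (cf. Find_*_CV)."""
    def search(terms: List[str]) -> List[str]:
        wanted = {t.lower() for t in terms}
        return sorted(cv for cv, keywords in repository.items() if keywords & wanted)
    return search


# Stand-in for the services that discovery returns for each country in step 3.
CV_SERVICES = {"Brazil": cv_search(BRAZILIAN_CVS), "Colombia": cv_search(COLOMBIAN_CVS)}


def search_scienti_cv(terms: List[str], countries: List[str]) -> Dict[str, List[str]]:
    """Sketch of the Search_Scienti_CV itinerary."""
    # Step 2: expand the search terms with synonyms, keeping order, no duplicates.
    expanded = list(dict.fromkeys(terms + search_ontology(terms)))
    # Step 3: invoke the CV search service found for each selected country.
    results = {}
    for country in countries:
        service = CV_SERVICES.get(country)
        if service is not None:
            results[country] = service(expanded)
    # Step 4: hand the per-country results back to the user.
    return results


print(search_scienti_cv(["Heart attack"], ["Brazil", "Colombia"]))
# {'Brazil': ['br-001'], 'Colombia': ['co-001']}
```

With this toy data, the Brazilian CV br-001 is found only because the synonym "myocardial infarction" was added in step 2, which is the effect the Search_Ontology SWS is meant to have.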
The key to SWS discovery in this prototype was the way in which services were described in IRS-II. The three SWS were described as tasks and PSMs and linked to ontologies of the SeGOV layers. The IRS-II Server automatically discovered the Find_Brazilian_CV and Find_Colombian_CV tasks from their descriptions. In this way, a new CV search service for another country can be added to the Scienti Network and be automatically located and used by the Search_Scienti_CV class. This is critical in the Scienti Network, because other countries are going to join the network. Without the service descriptions provided by SeGOV, IRS-II could not find the correct SWS to process the search, because other SWS have the same input (Term) and output (CV). Fig. 5 illustrates the similarities between Find_Brazilian_CV_Task and Rank_Adv_Task. Rank_Adv_Task describes a SWS that ranks the top ten CVs related to a given term in order to select advisors to judge research projects. As described in the scenario, SeGOV and IRS-II enabled us to implement a prototype that integrates services spread across the Brazilian and Colombian S&T management infrastructures and classified in different SeGOV layers, thus forming the vertical bus between the Presentation Layer, the Transactional Layer and the Domain Concepts Layer. We were also able to integrate services across two different infrastructures, thus forming a horizontal bus between the Presentation Layers of the Brazilian and Colombian infrastructures.
Fig. 5. Distinguishing similar SWS through their services ontologies.
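The effect illustrated in Fig. 5 can be reproduced in a few lines. The Python sketch below is an illustration, not the IRS-II matching algorithm: it first matches on inputs and outputs only, which returns both CV search tasks together with Rank_Adv_Task, and then adds the service-layer class from SeGOV, which separates them. The `SWSDescription` record, the `discover` function and the class name Advisor_Ranking given to Rank_Adv_Task are hypothetical, and inputs are simplified to the shared input Term.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class SWSDescription:
    name: str
    inputs: frozenset
    outputs: frozenset
    kind: str  # class in a SeGOV service-layer ontology


REGISTRY = [
    SWSDescription("Find_Brazilian_CV_Task", frozenset({"Term"}), frozenset({"CV"}), "Search"),
    SWSDescription("Find_Colombian_CV_Task", frozenset({"Term"}), frozenset({"CV"}), "Search"),
    # Same signature as the CV searches; its class name is hypothetical.
    SWSDescription("Rank_Adv_Task", frozenset({"Term"}), frozenset({"CV"}), "Advisor_Ranking"),
]


def discover(registry: Iterable[SWSDescription], inputs, outputs,
             kind: Optional[str] = None) -> List[str]:
    """Match on inputs/outputs, then optionally on the service-layer class."""
    inputs, outputs = frozenset(inputs), frozenset(outputs)
    hits = [s for s in registry if s.inputs <= inputs and outputs <= s.outputs]
    if kind is not None:
        hits = [s for s in hits if s.kind == kind]
    return [s.name for s in hits]


print(discover(REGISTRY, {"Term"}, {"CV"}))
# ['Find_Brazilian_CV_Task', 'Find_Colombian_CV_Task', 'Rank_Adv_Task']
print(discover(REGISTRY, {"Term"}, {"CV"}, kind="Search"))
# ['Find_Brazilian_CV_Task', 'Find_Colombian_CV_Task']
```

Exact class equality keeps the sketch short; a subsumption test over the service-layer hierarchy would let a request for a general class also retrieve its subclasses. A CV search service for a new country would be returned as soon as it is registered under the same class.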
4 Related Work and Conclusions

The SeGOV framework was originally conceived to improve SWS discovery in the IRS-II infrastructure in order to improve the integration of services in the Scienti Network. There are other proposals to improve SWS discovery that apply ontologies to enrich SWS descriptions. However, these proposals base discovery mainly on the inputs and outputs of a SWS, although with different algorithms and representation languages. Paolucci et al. [8], for example, base their discovery algorithm on DAML-S descriptions of SWS inputs and outputs. Mandell and McIlraith [18] extend BPEL4WS with similar DAML-S descriptions. Borenstein and Fox [19] use RDF to extend WSDL with the same kind of descriptions. Benatallah et al. [20] describe an approach that combines Web Services based on their inputs and outputs through a best-covering technique. Our framework differs from these proposals in that it describes the capabilities of a service by contextualizing it in a functional layer of services. The descriptions are also extended to represent the relationships between a service and the related domain concepts, information units and database structures. The advantage of combining SeGOV with IRS-II over other SWS discovery approaches is that it enables the discovery of a SWS even when it has the same inputs and outputs as other SWS. In a complex e-Gov infrastructure such as the Scienti Network, this is critical. We have illustrated, through the implementation of a prototype, how SeGOV made it possible for IRS-II to discover the appropriate SWS to process a semantic search.

We are planning to use SeGOV and IRS-II to improve semantic interoperability between agencies (through the horizontal bus) and within their platforms (through the vertical bus) on a large scale in the Scienti Network. We have been working on the specification of the layers in order to enable the connection of several services and resources of the Scienti Network. This ongoing work aims not only to improve the integration of systems through SWS discovery and composition, but also to improve the capabilities of existing services through the ontologies described in the SeGOV layers. In addition to improving SWS discovery, describing the services in SeGOV will allow developers and end users to browse the described services through their semantic descriptions. From the developers' point of view, this can help avoid reinventing the wheel, since they can check for existing code that can be reused. From the end users' point of view, this can mean a new way of interacting with Scienti services, in which users are able to compose their own services and find services by navigating the capabilities exposed as SWS. These issues will be addressed in our future developments. Future work on SeGOV also includes improving the ontology descriptions and exploiting IRS-II mediators to improve the matching between heterogeneous ontologies.
Acknowledgements

This research is supported in part by CNPq, Brazil, in the form of scholarships held by Mr. Sell and Mr. Gonçalves to carry out their doctorates. We sincerely thank Dr. Enrico Motta for allowing Mr. Sell and Mr. Gonçalves to conduct their research at the Knowledge Media Institute.
References

1. Pacheco, R. C. S.: Rede SCienti. In: VI Congreso Regional de Información en Ciencias de la Salud. Puebla. (2003)
2. De Los Ríos, R., Santana, P. H. A.: El Espacio Virtual de Intercambio de Información sobre Recursos Humanos en Ciencia y Tecnología de América Latina y el Caribe - Del CV Lattes al CvLAC. In: Ciência da Informação. Vol. 30, (2001) 42-47
3. W3C: Web Services Description Language (WSDL). Note 15. Available online at: http://www.w3.org/TR/wsdl (2001)
4. W3C: SOAP Specifications. Available online at http://www.w3.org/TR/soap (2000)
5. OASIS: UDDI Specification. Available online at http://www.uddi.org/specification.html (2001)
6. McIlraith, S., Son, T. C., Zeng, H.: Semantic Web Services. IEEE Intelligent Systems, Mar/Apr. (2001) 46-53
7. DAML Services Coalition: DAML-S 0.7 Draft Release. Available online at http://www.daml.org/services/daml-s/0.7/ (2002)
8. Paolucci, M., Sycara, K., Kawamura, T.: Delivering Semantic Web Services. Tech. report CMU-RI-TR-02-32, Robotics Institute, Carnegie Mellon University. (2003)
9. Fensel, D., Bussler, C.: The Web Service Modeling Framework WSMF. Available at http://informatik.uibk.ac.at/users/c70385/wese/wsmf.bis2002.pdf (2002)
10. Motta, E., Domingue, J., Cabral, L., Gaspari, M.: IRS-II: A Framework and Infrastructure for Semantic Web Services. In: Proceedings of the 2nd International Semantic Web Conference (ISWC2003), Florida, USA. (2003)
11. Cabral, L., Domingue, J., Motta, E., Payne, T., Hakimpour, F.: Approaches to Semantic Web Services: An Overview and Comparisons. To be published in Proceedings of the 1st European Semantic Web Symposium (ESWS 2004). Heraklion, Greece. (2004)
12. Motta, E.: Reusable Components for Knowledge Modelling. IOS Press, Amsterdam, The Netherlands. (1999)
13. Fensel, D., Motta, E.: Structured Development of Problem Solving Methods. IEEE Transactions on Knowledge and Data Engineering, 13(6) (2001) 913-932
14. National Human Genome Research Institute: Genoma Project. Available at http://www.nhgri.nih.gov/ (2004)
15. Comunidade LMPL: CURRICULO-VITAE. Available at http://lattes.cnpq.br/lmpl/ (2002)
16. BIBLIOTECA VIRTUAL EN LA SALUD (BVS): DeCS - Descriptores en Ciencias de la Salud. Available at http://decs.bvs.br/E/homepagee.htm (2002)
17. Domingue, J.: Tadzebao and WebOnto: Discussing, Browsing, and Editing Ontologies on the Web. 11th Knowledge Acquisition for Knowledge-Based Systems Workshop, April 18th-23rd. Banff, Canada. (1998)
18. Mandell, D. J., McIlraith, S. A.: Adapting BPEL4WS for the Semantic Web: The Bottom-Up Approach to Web Service Interoperation. In: Proceedings of the Second International Semantic Web Conference (ISWC2003), Florida. (2003)
19. Borenstein, J., Fox, J.: Semantic Discovery for Web Services: a Step Toward Fulfillment of the Vision. Available at http://www.sys-con.com/webservices/articleprint.cfm?id=507 (2003)
20. Benatallah, B., Hacid, M., Rey, C., Toumani, F.: Request Rewriting-Based Web Service Discovery. In: Proceedings of the 2nd International Semantic Web Conference (ISWC2003), Florida, USA. (2003)