
EU-IST Project IST-2004-026460 TAO: Transitioning Applications to Ontologies

D1.2 SWS Bootstrapping Methodology

Florence Amardeilh & Bernard Vatant (Mondeca)
Nicholas Gibbins, Terry R. Payne, Ahmed Saleh & Hai H. Wang (University of Southampton)

EU-IST Specific targeted research project (STREP) IST-2004-026460 TAO
Deliverable D1.2, Second Draft (WP 1)

Abstract

The emergence of Web Services has been key in facilitating the construction of flexible and robust applications from distributed, heterogeneous components. This has resulted in a shift from traditional stovepipe applications to loosely federated, interacting services. Mechanisms to support the rapid composition of services rely on the specification of rich, formally specified Semantic Web Service descriptions, which are dependent on the existence of shared ontologies. To facilitate the construction of Semantic Web Services, ontologies should be elicited or identified that represent both the domain of the application modelled and the tasks that the services achieve. This document represents the early phases in the development of such a methodology by examining and presenting much of the literature relevant to the various stages of developing Semantic Web Services. These include the identification or design of ontologies, including methodologies for ontology design, ontology categories, and their types, as well as proposing an initial methodology which will form the basis of initial empirical work and will provide a baseline for future evaluation.

Keyword list: Semantic Web Services, Ontologies, Methodologies

WP1 Semantic Web Services Bootstrapping Methodology
Document ID: TAO/2008/D1.2/v2.0
Nature: Report
Dissemination: PU
Contractual date of delivery: 31/03/2008
Actual date of delivery: 11/04/2008
Reviewed by: Holger Lausen


TAO Consortium

This document is part of a research project partially funded by the IST Programme of the Commission of the European Communities as project number IST-2004-026460.

Mondeca, 3, cité Nollez, 75018 Paris, France. Tel: +33 (0)1 44 92 35 03, Fax: +33 (0)1 44 92 02 59. Contact person: Jean Delahousse. E-mail: [email protected]

University of Sheffield, Department of Computer Science, Regent Court, 211 Portobello St., Sheffield S1 4DP, UK. Tel: +44 114 222 1930, Fax: +44 114 222 1810. Contact person: Kalina Bontcheva. E-mail: [email protected]

Sirma Group Corp., Ontotext Lab, Office Express IT Centre, 5th Floor, 135 Tsarigradsko Shosse Blvd., Sofia 1784, Bulgaria. Tel: +359 2 9768 303, Fax: +359 2 9768 311. Contact person: Atanas Kiryakov. E-mail: [email protected]

University of Southampton, Southampton SO17 1BJ, UK. Tel: +44 23 8059 8343, Fax: +44 23 8059 2865. Contact person: Terry Payne. E-mail: [email protected]

Atos Origin Sociedad Anonima Espanola, Dept Research and Innovation, Atos Origin Spain, C/Albarracin, 25, 28037 Madrid, Spain. Tel: +34 91 214 8835, Fax: +34 91 754 3252. Contact person: Alberto Capellini. E-mail: [email protected]

Dassault Aviation SA, DGT/DPR, 78, quai Marcel Dassault, 92552 Saint-Cloud Cedex 300, France. Tel: +33 1 47 11 53 00, Fax: +33 1 47 11 53 65. Contact person: Farid Cerbah. E-mail: [email protected]

Jozef Stefan Institute, Department of Knowledge Technologies, Jamova 39, 1000 Ljubljana, Slovenia. Tel: +386 1 477 3778, Fax: +386 1 477 3131. Contact person: Marko Grobelnik. E-mail: [email protected]


Executive Summary

The emergence of Web Services has been key in facilitating the construction of flexible and robust applications from distributed, heterogeneous components, thus establishing the Service-oriented Computing paradigm. This has resulted in a shift from traditional stovepipe applications to loosely federated, interacting services that can span traditional organisational boundaries and pool enterprise resources which may be distributed widely across different networks and locations. However, whilst there has been keen interest in, and adoption of, Web Services by industry, this is largely because tools and development approaches have facilitated easier access to, and use of, reusable Web Services for developers. In contrast, mechanisms to support the rapid composition of services, through bespoke editors or at runtime, rely on the specification of rich, formally specified Semantic Web Service descriptions, which are dependent on the existence of shared ontologies.

To facilitate the construction of Semantic Web Services, the services themselves should be described based on the functionality they provide or, when transitioning applications, the functionality they expose. In addition, ontologies should be elicited or identified that represent both the domain of the application modelled and the tasks that the services achieve. To provide guidance (and tools) to facilitate the construction of Semantic Web Services, and to support the discovery or inference of the necessary ontological knowledge, the TAO project is developing a methodology for the (semi-)automatic construction of Semantic Web Service descriptions resulting from the transitioning of applications.

This document represents the early phases in the development of such a methodology by examining and presenting much of the literature relevant to the various stages of developing Semantic Web Services. These include the identification or design of ontologies, including methodologies for ontology design, ontology categories, and their types, as well as proposing an initial methodology which will form the basis of initial empirical work and will provide a baseline for future evaluation. This deliverable is structured as follows:

• Section 2 introduces ontologies in much greater detail, by presenting a meronymic and categorical view of what an ontology is, before examining the ontology life cycle in terms of design criteria, methodologies for ontology construction (both manual and automated approaches), and evaluation mechanisms.



• Section 3 reviews the Semantic Web Service Lifecycle, and considers the requirements from each functional aspect of a service.



• Section 4 presents an initial methodology, which will be used to bootstrap future development and empirical analysis in the TAO project.



• Section 5 presents a cookbook-style methodology, targeted at the TAO scenario.


• Finally, in Section 6 we conclude the report.

Terminology

API: Application Programming Interface
DAML: DARPA Agent Markup Language
IT: Information Technology
OWL: Web Ontology Language
RDBMS: Relational Database Management System
RDF: Resource Description Framework
SOA: Service-Oriented Architecture
SOAP: Simple Object Access Protocol
SPA: Service Provider Agent
SSOA: Semantic Service-Oriented Architecture
SW: Semantic Web
SWS: Semantic Web Services
TAO: Transitioning Applications to Ontologies
W3C: World Wide Web Consortium
WS: Web Services
WSDL: Web Services Description Language
WSDL-S: Web Service Semantics
WSML: Web Service Modelling Language
WSMO: Web Services Modelling Ontology
WSMX: Web Service Modelling eXecution environment
XML: eXtensible Mark-up Language


Contents

TAO Consortium
Executive Summary
Terminology
Contents
1 Introduction
2 Ontologies
2.1 What is an Ontology?
2.2 Ontology Components
2.3 Types of Ontologies
2.4 The Ontology Design Lifecycle
2.5 Overview of Design Criteria for Building Domain Ontologies
2.6 Overview of Methodologies for Building Ontologies
2.6.1 Building Ontologies from Scratch
2.6.2 Learning Ontology from Text
2.7 Evaluating Ontologies
2.8 General Guidelines for Ontology Modelling
2.8.1 Designing Ontologies as Engineering Systems
2.8.2 Building Ontologies as Part of A Transition Process
2.8.3 Specifying The Target System Functional Requirements
2.8.4 Identifying The Target System Technical Constraints
2.8.5 Putting The Business Objects at The Core of The Ontology
2.8.6 Manage Both Business Terminology and Logic
2.8.7 Consider Data Throughout The Transition Process
2.9 Discussion
3 Semantic Web Services Lifecycle
3.1 Semantics and Ontologies
3.2 Semantics for Web services
3.3 The SOA Design Lifecycle
3.4 Phases of the Semantic Web service Lifecycle
3.4.1 Semantic Web service Creation
3.4.2 Semantic Web service Annotation
3.4.3 Semantic Web service Advertisement
3.4.4 Semantic Web service Discovery
3.4.5 Semantic Web service Selection
3.4.6 Semantic Web service Composition
3.4.7 Semantic Web service Execution
3.4.8 Semantic Web service Monitoring and QoS
4 Overview of TAO Transitioning Methodology
4.1 Methodology Overview
4.1.1 Service-Oriented Ontology Learning
4.1.2 Semantic Service Annotation
4.1.3 Service-Driven Ontology Refinement
4.2 Ontology Learning
4.2.1 Domain and Goals of the Ontology
4.2.2 Define Guidelines to Ensure Consistency
4.2.3 Identify Knowledge Sources
4.2.4 Building the Ontology
4.2.5 Ontology Formalization
4.2.6 Ontology Evaluation and Modification
4.2.7 Ontology Maintenance and Evolution
5 TAO Transitioning Cookbook
5.1 Case Study – Amazon Associates Web service (A2S)
5.1.1 Amazon A2S Data model
5.2 Transitioning Cookbook
5.2.1 Knowledge acquisition
5.2.2 Ontology Learning
5.2.3 Service and content augmentation
6 Conclusion
7 Acknowledgments
Bibliography and references


1 Introduction

The emergence of Web Services has been key in facilitating the construction of flexible and robust applications from distributed, heterogeneous components, thus establishing the Service-oriented Computing paradigm. This has resulted in a shift from traditional stovepipe applications to loosely federated, interacting services that can span traditional organisational boundaries and pool enterprise resources which may be distributed widely across different networks and locations. The rise in adoption of Web Services has mainly been due to the near-ubiquitous World-Wide Web infrastructure, cross-platform interoperability, and the fact that Web Services are built upon de-facto Web standards for syntax, addressing, and communication protocols. The success of this novel computing paradigm has been greatly enhanced by the definition of standard markup languages and, in particular, of XML as a de-facto syntactic layer on top of the existing web transport layer (consisting of web protocols such as HTTP, FTP, etc.). XML has enabled the interoperation of applications transcending organisational boundaries by providing an agreed, universal syntax. In fact, XML has been used as a representational basis for expressing enterprise standards that cover the whole lifecycle of services, from the description of services through the specification of their interfaces, to the definition of workflows. However, whilst these approaches have facilitated easier access to and use of reusable Web Services for developers, mechanisms to support the rapid composition of services, through bespoke editors or at runtime, rely on the specification of rich, formally specified Semantic Web Service descriptions, which are predicated on the existence of shared ontologies.

Semantic Web Services provide a declarative, ontological framework for describing services, messages, and concepts in a machine-readable format that can also facilitate logical reasoning. Thus, service descriptions can be interpreted based on their meanings, rather than simply on a symbolic representation. Provided that there is support for reasoning over a Semantic Web Service description (i.e. the ontologies used to ground the service concepts are identified or, if multiple ontologies are involved, alignments exist between them that facilitate the transformation of concepts from one ontology to the other), workflows and service compositions can be constructed based on the semantic similarity of the concepts used.

To facilitate the construction of Semantic Web Services, the services themselves should be described based on the functionality they provide or, when transitioning applications, the functionality they expose. In addition, ontologies should be elicited or identified that represent both the domain of the application modelled and the tasks that the services achieve. However, this raises several important questions: how are third-party ontologies identified to mark up services? If such ontologies do not exist, how are they created? What tools should be used, and in what order? What criteria should be used when inferring these new ontologies, and how are such ontologies evaluated? Often, when developing new ontologies, the developer has to decide whether to model the ontology first for a new domain based on the epistemic view of the application, or to fit the development of the application to the ontological models used by other service providers.

To address these questions, and to provide guidance (and tools) to facilitate the construction of Semantic Web Services and support the discovery or inference of the necessary ontologies, the TAO project will develop a methodology for the (semi-)automatic construction of Semantic Web Service descriptions resulting from the transitioning of applications.

The Semantic Web has built upon various notions underlying knowledge-based systems: namely, the use of a modelling paradigm to model a domain; a representation language to support the sharing of facts (i.e. instances) and models (i.e. ontologies) between agents and applications; and reasoning mechanisms to facilitate the inference (and subsequent use) of any facts entailed by the model. Ontologies are an explicit, formal specification of a shared conceptualisation of a domain: they provide a machine-readable and agreed-upon representation of an abstraction of some phenomenon. This typically involves defining the concepts within the domain being modelled, their properties, and their relationships with other concepts. In some cases, ontologies may be complemented by axioms, statements that are always true and that are used to constrain the meaning of concept definitions in the ontologies. Often the declarative definitions are not sufficient to completely constrain the meaning of concepts and to capture the "procedural" or decision-making aspects of the application business logic. Therefore ontologies may need to be complemented by rules, grounded in ontological definitions, to facilitate enhanced representation and reasoning capabilities.

Determining the correct granularity of concepts within an ontology or model is important, as it affects not only the type of knowledge that can be represented, but also the type of knowledge that the model can entail (and hence the queries the model can answer). Whilst various methodologies exist to support the disciplined modelling of a domain, they typically assume that the model is defined with a specific task in mind, independently of the representation language used. Although suitable for more traditional, centralised systems designed to answer problems for a given domain, this raises challenges when modelling knowledge within an open environment where the consuming services (and their respective tasks) are largely unknown. Likewise, any assumptions about the use of universally shared ontologies that prescribe a given knowledge granularity become invalid. Thus, the Semantic Web relaxes such constraints by allowing different ontology users to extend existing ontologies to suit their own tasks and contexts.

This document represents the early phases in the development of such a methodology by examining and presenting much of the literature relevant to the various stages of developing Semantic Web Services. These include the identification or design of ontologies, including methodologies for ontology design, ontology categories, and their types, as well as proposing an initial methodology which will form the basis of initial empirical work and will provide a baseline for future evaluation.

The report is structured as follows: Section 2 introduces ontologies in much greater detail, by presenting a meronymic and categorical view of what an ontology is, before examining the ontology life cycle in terms of design criteria, methodologies for ontology construction (both manual and automated approaches), and evaluation mechanisms. We then review the Semantic Web Service Lifecycle in Section 3, and consider the requirements from each functional aspect of a service, before presenting an initial methodology in Section 4, which will be used to bootstrap future development and empirical analysis in the TAO project. Section 5 presents cookbook-style guidelines on how to adopt the methodology using TAO tools. Finally, in Section 6 we conclude the report.

Whilst there may be some overlap with material presented in other TAO deliverables (including D1.1 and D5.1), this repetition has been preserved to facilitate the use of these documents in isolation from their related counterparts.

2 Ontologies

The need to share diverse knowledge and/or information with applications already built has given rise to a growing interest in research on ontologies. The term was originally used in Philosophy, where it indicated the systematic explanation of Existence. More recently, the term has been used in various areas of Artificial Intelligence (AI) and, more widely, Computer Science, such as knowledge engineering, knowledge representation, qualitative modelling, database design, language engineering, information integration, information retrieval and extraction, knowledge management and organization, agent-based system design (Guarino, 1998) and e-commerce. From a Knowledge Engineering perspective, ontologies are domain theories that specify a domain-specific vocabulary of entities, classes, properties, predicates, and functions, together with a set of relationships that exist among those vocabulary terms. Through the representation of domain-specific knowledge, ontologies provide a way of sharing and reusing knowledge among people and heterogeneous application systems. In this context, domain and generic ontologies can be shared, reused, and integrated in the analysis and design stages of information and knowledge systems.

2.1 What is an Ontology?

There are different definitions in the computer science domain of what an ontology should be. Perhaps the most referenced one was published by Gruber (Gruber 1993): "…An ontology is an explicit specification of a conceptualization…", later modified by (Borst 1997) to: "…An ontology is a formal specification of a shared conceptualization…". A conceptualization refers to an abstract model of some phenomenon in the world, obtained by identifying the relevant concepts of that phenomenon. Explicit means that the types of concepts used and the constraints on their use are explicitly defined. Formal refers to the fact that the ontology should be machine-readable. Shared reflects the notion that an ontology captures consensual knowledge; that is, it is not private to some individual, but accepted by a group.

Other definitions emerge according to how ontologies are built and used. For example, ontologies can be distinguished as following top-down or bottom-up approaches, depending on whether the ontology or the knowledge base was available first. Swartout et al. (Swartout, Patil et al. 1996) defined an ontology as a "hierarchically structured set of terms for describing a domain that can be used as a skeletal foundation for a knowledge base". This definition reflects the fact that they had already built SENSUS (a natural-language-based ontology with more than 70,000 nodes) and used it as a basis for building domain-specific ontologies, by identifying the terms that are relevant to a particular domain and then refining the skeletal ontology using heuristics. The result of the refinement mechanism is the skeleton upon which the knowledge base is built. This is known as the top-down approach, since the ontology was the starting point. The opposite approach was taken in the KACTUS project (Bernaras, Laresgoiti et al. 1996), where the ontology is built after a process of abstraction from the content already represented in a knowledge base.

2.2 Ontology Components

As mentioned in the previous section, ontologies provide a common vocabulary of an area and define the meaning of the terms and the relations between them. Knowledge in ontologies is mainly formalized using five kinds of components: classes, relations, functions, axioms and instances (Gruber 1993).

• Classes in the ontology are usually organized in taxonomies. Classes (or concepts) are used in a broad sense; a concept can be anything about which something is said and, therefore, could also be the description of a task, function, action, strategy, reasoning process, etc.

• Relations represent a type of interaction between concepts of the domain. Examples of relations include subclass-of and inverse-property.

• Functions are a special case of relations, in which the n-th element of the relationship is unique for the n-1 preceding elements. Examples of functions are Mother-of and Price-of-a-used-car, which calculates the price of a second-hand car depending on the car model, manufacturing date and number of kilometres.

• Axioms are used to model sentences that are always true. They can be included in an ontology for several purposes, such as defining the meaning of ontology components, defining complex constraints on the values of attributes or the arguments of relations, verifying the correctness of the information specified in the ontology, or deducing new information.

• Instances are used to represent specific elements.
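To make these five kinds of components concrete, the following sketch builds a toy ontology in RDF/OWL using the Python rdflib library. The car-sales vocabulary (Vehicle, Car, Dealer, priceOf, etc.) is invented for illustration, echoing the used-car example above; it is not drawn from any TAO ontology.

from rdflib import Graph, Literal, Namespace, RDF, RDFS, OWL

EX = Namespace("http://example.org/cars#")  # hypothetical namespace
g = Graph()
g.bind("ex", EX)

# Classes, organised in a small taxonomy
g.add((EX.Vehicle, RDF.type, OWL.Class))
g.add((EX.Car, RDF.type, OWL.Class))
g.add((EX.Car, RDFS.subClassOf, EX.Vehicle))      # relation: subclass-of

# A relation between concepts of the domain
g.add((EX.Dealer, RDF.type, OWL.Class))
g.add((EX.soldBy, RDF.type, OWL.ObjectProperty))
g.add((EX.soldBy, RDFS.domain, EX.Car))
g.add((EX.soldBy, RDFS.range, EX.Dealer))

# A function: each car maps to exactly one price (an axiom-like constraint)
g.add((EX.priceOf, RDF.type, OWL.DatatypeProperty))
g.add((EX.priceOf, RDF.type, OWL.FunctionalProperty))

# An instance of the Car concept
g.add((EX.myOldFiat, RDF.type, EX.Car))
g.add((EX.myOldFiat, EX.priceOf, Literal(2500)))

print(g.serialize(format="turtle"))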

2.3 Types of Ontologies

Once the main requirements for an ontology have been identified, the next step is to decide the approach used to model the ontology, and to define the type of ontology that needs to be built. Uschold and Gruninger (Uschold and Grüninger 1996) distinguished four kinds of ontologies depending on the kind of language used to implement them:

1) Highly informal ontologies, written in natural language;
2) Semi-informal ontologies, expressed in a restricted and structured form of natural language (i.e., using patterns);

3) Semi-formal ontologies, defined in an artificial and formally defined language; and
4) Rigorously formal ontologies, defined in a language with formal semantics, theorems and proofs of such properties as soundness and completeness.

With the evolution of the Semantic Web, ontologies gained more attention, and industries began to require different kinds of ontologies and to offer various commercial products to handle them. Quite naturally, with the growing interest in the Semantic Web, the number of ontologies for various domains has risen, and the demand for supporting issues like integration and re-use has increased. (Guarino 1998) tackled this issue by designing a classification system that uses the subject of conceptualization (i.e. level of generality) as the main criterion for classifying ontologies. Thus, ontology types can be distinguished as follows:

• Top-level ontologies describe very general concepts like space, time and event, which are independent of a particular problem or domain. Such unified top-level ontologies aim at serving large communities of users and applications; they facilitate the (semi-)automatic integration and combination of different ontologies that are mapped to the same top-level ontology. Recently, these kinds of ontologies have also been introduced under the name of foundational ontologies.

• Domain ontologies describe the vocabulary related to a specific domain (such as wines or cars), e.g. by specializing concepts introduced in a top-level ontology.

• Task ontologies describe the vocabulary related to a generic task or activity (such as building or selling), e.g. by specializing concepts introduced in a top-level ontology.

• Application ontologies are the most specific ontologies. Concepts in application ontologies often correspond to roles played by domain entities while performing a certain activity; i.e., application ontologies are a specialization of domain and task ontologies. They form a base for implementing applications with a concrete domain and scope.

In a similar vein, van Heijst (Heijst, Schreiber et al. 1997) identified similar dimensions along which ontologies could be differentiated: (i) application ontologies, (ii) domain ontologies, (iii) generic ontologies and (iv) representation ontologies. Whilst there are strong similarities in the notions of application ontologies and domain ontologies, van Heijst (Heijst, Schreiber et al. 1997) includes the notion of representation ontologies, which describe the underlying knowledge representation formalisms without making any claims about the world, or about domains themselves. The OWL and F-logic ontologies could be considered representation ontologies.
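The following rdflib sketch illustrates how these levels can relate in practice. Every namespace and class name here is a hypothetical stand-in rather than a reference to any published top-level ontology.

from rdflib import Graph, Namespace, RDF, RDFS, OWL

TOP = Namespace("http://example.org/top#")    # top-level (foundational) ontology
WINE = Namespace("http://example.org/wine#")  # domain ontology
SELL = Namespace("http://example.org/sell#")  # task ontology
APP = Namespace("http://example.org/shop#")   # application ontology

g = Graph()

# Top-level: very general, problem-independent concepts
g.add((TOP.PhysicalObject, RDF.type, OWL.Class))
g.add((TOP.Process, RDF.type, OWL.Class))

# Domain ontology specialises a top-level concept
g.add((WINE.Wine, RDFS.subClassOf, TOP.PhysicalObject))

# Task ontology specialises along the activity dimension
g.add((SELL.Selling, RDFS.subClassOf, TOP.Process))

# Application ontology: the role a domain entity plays in a task
g.add((APP.WineForSale, RDFS.subClassOf, WINE.Wine))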

2.4 The Ontology Design Lifecycle

As the design lifecycle for a SOA system is largely independent of the particular methodology used, so the design lifecycle for a domain ontology can be separated from the specific knowledge acquisition and modelling methodology used. We present the following sketch of a typical design lifecycle for an ontology:

[Figure: a typical ontology design lifecycle, comprising Knowledge Acquisition, Ontology Learning, Design Ontology, Evaluate Ontology and Refine Ontology]



• Knowledge Acquisition: In this context, knowledge acquisition refers to the traditional knowledge-engineering approach to ontology design, in which a knowledge-engineering (KE) specialist elicits domain-specific knowledge through a process of structured interviews or similar techniques. In this sense, knowledge acquisition is an intensive task that is highly demanding of both the knowledge engineer and the domain expert.

• Ontology Learning: In contrast to the knowledge acquisition process, ontology learning refers to the use of techniques for automatically or semi-automatically extracting ontologies from existing document corpora. As such, the output from an ontology learning process should not be considered the finished product, but a first cut that is solidly grounded in the available documentation, and which will inform the later design of a more polished ontology for production use.

• Ontology Design: The ontology design process refers to formally codifying the knowledge that has either been manually acquired from a domain expert, or (semi-)automatically extracted from a document corpus. This process may also encompass the identification and reuse of appropriate components within pre-existing ontologies, the alignment of the designed ontology with pre-existing ontologies, or the modularisation of the ontology to facilitate such alignment in future.

• Ontology Evaluation: The ontology evaluation process assesses the fitness for purpose of a designed ontology.

• Ontology Refinement: This process refers to the refactoring of the designed ontology to better represent the problem domain. Together with the ontology evaluation process, it corresponds to the notion of ontology evolution described in Section 3.6.

2.5 Overview of Design Criteria for Building Domain Ontologies

Here we summarise some design criteria and a set of principles defined by (Bernaras, Laresgoiti et al. 1996), (Gruber 1993) and (Swartout, Patil et al. 1996) that have proved useful in the development of domain ontologies:

• Clarity and Objectivity: the ontology should provide the meaning of defined terms through objective definitions, together with natural language documentation.

• Completeness: a definition expressed in terms of necessary and sufficient conditions is preferred over a partial definition (one given only through necessary or sufficient conditions).

• Coherence: to permit inferences that are consistent with the definitions.

• Maximum monotonic extendibility: new, general or specialised terms should be included in the ontology in such a way that it does not require the revision of existing definitions.

• Minimal ontological commitments: ontological commitments refer to the agreement to use the shared vocabulary in a coherent and consistent manner. They guarantee consistency, but not completeness, of an ontology; this implies making as few claims as possible about the world being modelled, thus giving the parties committed to the ontology freedom to specialize and instantiate the ontology as required.

• Ontological Distinction Principle: classes in an ontology should be disjoint.

• Diversification of hierarchies: to increase the power provided by multiple inheritance mechanisms.

• Modularity: to minimize coupling between modules.

• Minimization of the semantic distance between sibling concepts: similar concepts should be grouped and represented using the same primitives.

• Standardization of names wherever possible.
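To illustrate the Completeness criterion, the sketch below contrasts a partial definition (a necessary condition only, via rdfs:subClassOf) with a complete definition (necessary and sufficient conditions, via owl:equivalentClass), using Python's rdflib; the family vocabulary is hypothetical.

from rdflib import BNode, Graph, Namespace, RDF, RDFS, OWL
from rdflib.collection import Collection

EX = Namespace("http://example.org/family#")
g = Graph()

# Partial definition: every Mother is a Woman (necessary condition only)
g.add((EX.Mother, RDFS.subClassOf, EX.Woman))

# Complete definition: Mother is exactly the intersection of Woman and
# "has at least one child" -- the necessary-and-sufficient form the
# Completeness criterion prefers, since a reasoner can then classify
# individuals as Mother rather than merely check stated memberships.
restriction = BNode()
g.add((restriction, RDF.type, OWL.Restriction))
g.add((restriction, OWL.onProperty, EX.hasChild))
g.add((restriction, OWL.someValuesFrom, EX.Person))

intersection = BNode()
members = BNode()
g.add((intersection, RDF.type, OWL.Class))
g.add((intersection, OWL.intersectionOf, members))
Collection(g, members, [EX.Woman, restriction])

g.add((EX.Mother, OWL.equivalentClass, intersection))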

2.6 Overview of Methodologies for Building Ontologies

In this section, we consider existing methodologies for designing and building ontologies, both from an engineering and an automated perspective. A methodology is a "comprehensive, integrated series of techniques or methods creating a general systems theory of how a class of thought-intensive work ought to be performed" (Electrical and Electronics 1990). Whilst several methodologies exist, there is no widely accepted methodology for ontology development, and the uptake of principled approaches has been slow. We present a selection of current design methodologies below.

2.6.1 Building Ontologies from Scratch

Comprehensive surveys of the existing methodologies are provided in (Jones, Bench-Capon et al.; Mariano 1999). Fernandez et al. in (Mariano, Asun et al. 2002) identified seven ontology design methodologies proposed to date that aim to guide the process of developing ontologies from scratch; we summarise these below.

Uschold and King's Method: This method (Uschold 1995) is based on the experiences of building the Enterprise Ontology (http://www.aiai.ed.ac.uk/project/enterprise/enterprise/ontology.html), an ontology for enterprise modelling processes. The method provides guidelines for ontology development within the phases of: purpose identification (intended use of the ontology), ontology capture (identifying concepts and relations), coding (explicitly representing the knowledge captured in the previous stage), integrating existing ontologies, and evaluation.

Cyc Method: The Cyc method (Douglas and Guha 1989) was defined from the experiences gathered in building the Cyc Ontology (http://www.cyc.com/). It specifies the phases of manually extracting "common sense knowledge" from knowledge sources and codifying that knowledge aided by tools; however, it fails to specify the processes of requirements identification or concept design. So far this method has only been used to develop the Cyc knowledge base, since the processes defined are quite specific to the building of the Cyc ontology (Mariano, Asun et al. 2002).

Gruninger and Fox's Methodology: This methodology (Gruninger and Fox 1995) is based on the development experiences of the TOVE project ontology (http://www.eil.utoronto.ca/enterprise-modelling/tove/) within the domain of business processes, and involves building a logical model of the knowledge to be specified in the form of an ontology. It follows a stage-based approach in which an informal model of the knowledge specification is made first, and then formalized at a later stage. The steps proposed are as follows (an executable sketch of a competency question follows the list):

1) Capture of motivating scenarios (example problems which are not adequately addressed by the existing ontologies);
2) Formulation of informal competency questions (expressiveness requirements posed in the form of questions);


3) Specification of the terminology of the ontology within a formal language (done by first extracting the informal ontology from the informal competency questions and then representing it in a formal language);
4) Formulation of the formal competency questions using the terminology of the ontology;
5) Formal specification of axioms and term definitions in the ontology; and
6) Establishing the conditions for characterizing completeness of the ontology.
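As flagged above, the following sketch shows what step 4 can look like in practice: a competency question posed as a SPARQL query over a toy graph, using Python's rdflib. The TOVE-flavoured enterprise vocabulary (Supplier, supplies) is invented for illustration; the idea is that the ontology is adequate for the question only if the query can be expressed, and answered, in its terminology.

from rdflib import Graph, Namespace, RDF

ENT = Namespace("http://example.org/enterprise#")  # hypothetical vocabulary
g = Graph()
g.add((ENT.acme, RDF.type, ENT.Supplier))
g.add((ENT.acme, ENT.supplies, ENT.widget))

# Formal competency question: "Which products does each supplier supply?"
answers = g.query(
    """
    SELECT ?supplier ?product WHERE {
        ?supplier a ent:Supplier ;
                  ent:supplies ?product .
    }
    """,
    initNs={"ent": ENT},
)
for row in answers:
    print(row.supplier, "supplies", row.product)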

Bernaras et al.'s Approach: The approach proposed by Bernaras et al. (Bernaras, Laresgoiti et al. 1996) is based on the process followed in the KACTUS project (http://hcs.science.uva.nl/projects/NewKACTUS/home.html), which involves knowledge reuse in complex technical systems. This approach is tightly coupled with the development process of the application for which the ontology is built. The development phases defined are: specification of the application (which provides the application context and the components the application needs to model); preliminary design based on the top-level ontological categories (this involves searching ontologies developed for other applications, which are refined and extended for use in the new application); and ontology refinement and structuring. In this approach, since the building of the ontology is based on the building of a particular application, the ontology developed will be quite specific to the application under concern, and the approach can therefore be viewed as an application-dependent strategy (Mariano 1999).

Methontology: This development methodology (Fernandez, Gomez-Perez et al. 1997) consists of the following development phases:

• Specification (identifies the intended uses of the ontology);



• Conceptualization (consists of identifying concepts and building a conceptual model);

• Formalization and Implementation (which transforms the conceptual model into a formal model and represents this in a formal ontology language); and

• Maintenance (consists of updates and corrections to the ontology when necessary).

The techniques and guidelines to be followed in each of the development activities are specified in detail. This methodology also identifies project management activities (planning, control, quality assurance) and support activities (knowledge acquisition, integration, evaluation, documentation and configuration management).

On-To-Knowledge: This methodology (Sure, Akkermans et al. 2003) was developed in the On-To-Knowledge project (http://www.ontoknowledge.org/), which introduces and maintains ontology-based knowledge management applications in enterprises. The stages specified in the ontology development process are:


• Kick-off (which includes the capture of requirements and the analysis of knowledge sources);

• Refinement (knowledge extraction and formalization);

• Evaluation (which includes the technology-focused and user-focused evaluation of the ontology); and

• Application and Evolution (applying the ontology for the intended use, and maintenance).

Again, this methodology provides detailed guidelines for the activities to be carried out in each stage, and focuses on ontology use in industrial contexts.

The SENSUS-based Method: This method (Swartout, Patil et al. 1996) is intended to be used when developing ontologies to be linked to the SENSUS ontology (http://www.isi.edu/natural-language/projects/ONTOLOGIES.html), an ontology for use in natural language processing that provides conceptual structure for machine translators; hence this method cannot be used for ontology development in general. It involves a series of steps which include identifying seed terms in the domain to be modelled, manually linking them to terms in the SENSUS ontology, and extracting the concepts on the path from each seed term to the root of the SENSUS ontology for inclusion in the new ontology.

Jones et al. (Jones, Bench-Capon et al.) and Fernandez et al. (Mariano, Asun et al. 2002) have also identified methods and approaches that address specific aspects of ontology development, such as ONIONS for integrating heterogeneous sources of information, and OntoClean (Guarino, Welty et al. 2004), which specifies formal guidelines for constructing taxonomical relations in ontologies. However, these are not comprehensive enough to be classified as formal ontology design methods.

2.6.2 Learning Ontology from Text

Approaches that learn ontologies from textual sources assume that ontologies are based on the definition of a structured and formalized set of concepts, which usually comes from the analysis of texts such as formal reports and technical documentation. The theory of a domain can be found by abstracting concepts from the terms used in such documents. Learning ontologies from text relies on linguistic mechanisms to analyse the terms used to name concepts in texts, in order to define domains from a conceptual point of view. Researchers in terminology have identified a link between terminology as a practical discipline and artificial intelligence, in particular knowledge engineering. From a knowledge engineering perspective, it is possible to elicit knowledge by using the automatic processing tools widely used in linguistics. One can also establish a synergy between research in artificial intelligence and in linguistics by means of terminology.


Natural language processing tools may help to support modelling from texts in two ways. First, they can help to find the terms of a domain (Bourigault 1995; R., P. et al. 1996): existing terminologies or thesauri may be reused and extended, or new ones may be created. Second, they can help to structure a terminological base by identifying relations between concepts (Jouis and Mustapha-Elhadi 1995; Garcia 1997).

Three steps are necessary to find the terms of a domain. First, nominal groups are isolated from a corpus considered representative of the studied domain. Then, those that cannot be retained as terms because of morphological or semantic characteristics are eliminated. Finally, the nominal sequences that will be retained as terms are chosen; this last step usually requires human expertise.

Identifying relations between concepts is likewise composed of three steps (a toy executable sketch of these steps is given at the end of this overview). The first identifies the co-occurrences of terms: two terms are co-occurrent if they both appear in a given text window, which may be defined in several ways (a number of words, a documentary segmentation such as an entire document or section, a syntactic cutting of sentences, etc.). The second step computes a similarity between terms with respect to the contexts they share. The third step can then determine the terms that are semantically related.

Some researchers have focussed on trying to benefit from approaches from both linguistics and knowledge engineering. They have studied mutual contributions, and their work has led them to elaborate the concept of the Terminological Knowledge Base (TKB). A TKB is a computer structure that contains conceptual data, represented in a network of domain concepts, but also linguistic data on the terms used to name the concepts. Thus a TKB contains three levels of entities: term, concept and text. It is structured using three kinds of links: relations between term and concept allow synonymy and paronymy to be considered; relations between concepts compose the network of domain concepts; and relations between term and/or concept and text allow normalization choices to be justified, or the knowledge base to be documented. Building a TKB is seen as an intermediate step that helps toward the construction of a formal ontology, especially because it gathers linguistic information on the terms used to name concepts.

Ontology learning from text is divided into eight layers, namely Terms (extraction), (Multilingual) Synonyms, Concept Formation, Concept Hierarchy, Relations (extraction), Relation Hierarchy, Axiom Schemata, and General Axioms. Generally, not all of these layers are considered whilst developing an ontology; term extraction, concept formation (putting the terms or instances into concepts) and relation extraction are usually considered sufficient to build a decent domain ontology.

During the last decade several ontology learning systems have been developed, such as ASIUM (Faure 1998), OntoLearn (Velardi, Navigli et al. 2006), Text2Onto (Cimiano and Volker 2005), OntoGen (Fortuna, Grobelnik et al. 2007), and others. Most of these systems depend on linguistic analysis and machine learning algorithms to find potentially interesting concepts and relations between them.
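As promised above, here is a toy Python sketch of the three relation-finding steps: counting co-occurrences within a fixed window, computing a similarity between terms over the contexts they share, and keeping pairs above a threshold as semantically related. The whitespace tokenisation and the cosine measure are simplifying assumptions for illustration, not prescriptions from the literature cited above.

from collections import Counter, defaultdict
import math

def cooccurrences(tokens, window=5):
    """Step 1: count how often two terms appear in the same text window."""
    counts = Counter()
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != term:
                counts[frozenset((term, other))] += 1
    return counts

def context_vectors(counts):
    """Turn pair counts into one context vector per term."""
    vectors = defaultdict(Counter)
    for pair, n in counts.items():
        a, b = tuple(pair)
        vectors[a][b] += n
        vectors[b][a] += n
    return vectors

def cosine(u, v):
    """Step 2: similarity of two terms with respect to shared contexts."""
    dot = sum(u[t] * v[t] for t in set(u) & set(v))
    norm = math.sqrt(sum(x * x for x in u.values())) \
         * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

tokens = "the engine drives the pump and the engine cools the pump".split()
vecs = context_vectors(cooccurrences(tokens))
# Step 3: term pairs whose similarity exceeds a chosen threshold are
# retained as semantically related ('engine' and 'pump' share contexts).
print(cosine(vecs["engine"], vecs["pump"]))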


The main aim of ASIUM (Acquisition of Semantic Knowledge Using Machine Learning Methods) is to help experts in the acquisition of semantic knowledge and taxonomic relations among terms, extracted from technical texts using syntactic analysis. ASIUM takes French natural-language texts as input and associates a frequency of occurrence with each word in the text. The learning method is based on conceptual and hierarchical clustering: basic clusters are formed by words that occur with the same verb after the same preposition, and words are associated with their frequency of appearance in the text to calculate the distance among concepts (those that appear in similar contexts are aggregated by a conceptual clustering algorithm to form the concepts of the ontology). ASIUM uses a metric to compute the semantic similarity between clusters; this metric is then used by the ontology expert to decide whether new concepts are created. Clusters are successively aggregated by the conceptual clustering method to form the concepts of the ontology; the ontology expert defines a minimum threshold for gathering clusters into concepts, and then validates the final output.

OntoLearn aims to extract relevant domain terms from a corpus of text, relate them to appropriate concepts in a general-purpose ontology, and detect relations among the concepts. To perform these tasks, natural language analysis and machine learning techniques are used. OntoLearn extracts terminology from a corpus of domain text, such as specialized web sites, and then filters the terms using natural language processing and statistical techniques that perform comparative analysis across different domains, or contrasting corpora. This analysis identifies terminology that is used in the target domain but not seen in other domains. Next, it uses the WordNet lexical knowledge base to perform semantic interpretation of the terms. OntoLearn then relates concepts according to taxonomic (kind-of) and other relations (extracted from WordNet and other rule-based inductive-learning methods), generating a domain concept forest. Finally, it integrates the domain concept forest with WordNet to create a refined and specialised view of the domain ontology.

Text2Onto is an integrated environment for building ontologies from textual resources. It is based on Text-To-Onto, which discovers conceptual structures from different text sources using knowledge acquisition and machine learning techniques. Text2Onto implements several techniques for ontology learning from free/semi-structured text and web documents. It introduced two new paradigms for ontology learning: (i) Probabilistic Ontology Models (POMs), which represent the results of the system by attaching a probability to them; and (ii) data-driven change discovery, which is responsible for detecting changes in the corpus, calculating POM deltas with respect to the changes, and accordingly modifying the POM without recalculating it for the whole document collection. The result of the learning process is a domain ontology that contains domain-specific and domain-independent concepts, which are filtered later to adjust the vocabulary of the domain ontology. The whole process is supervised by an ontology expert, who can repeat the process to refine the final ontology; the result is a domain ontology that contains only domain concepts learned from the input resources.

OntoGen is a semi-automatic and data-driven ontology editor focusing on the editing of topic ontologies. The system combines text-mining techniques with an efficient user interface to reduce both the time spent and the complexity for the user. In this way it bridges the gap between complex ontology editing tools and the domain experts who are constructing the ontology without necessarily having ontology engineering skills. The two main characteristics of the system are that it is semi-automatic and data-driven. Semi-automatic means that the system is an interactive tool that aids the user during the ontology construction process: it suggests concepts, relations between the concepts and names for the concepts, automatically assigns instances to the concepts, and provides a good overview of the ontology to the user through concept browsing and various kinds of visualization. At the same time, the user is always fully in control of the system's actions and can adjust all the properties of the ontology by accepting or rejecting the system's suggestions, or by adjusting them manually. Data-driven means that most of the aid provided by the system is based on the underlying data supplied by the user, typically at the beginning of the ontology construction; the system also supports automatic extraction of instances (used for forming concepts) and co-occurrences of instances (used for forming relations) from the data.

2.7 Evaluating Ontologies

Ontologies are increasingly being used in a variety of domains and applications, ranging from knowledge management and natural language processing to e-commerce and information retrieval. As with any other resource used in software applications, the content of ontologies should be evaluated before use in other ontologies or applications, to ensure they are effective and fit for purpose (Staab and Studer 2004). Ontology evaluation is an emergent field and is gathering attention with the increased use of ontologies in research and industry. Several ontology evaluation methods and tools have been proposed to date: Hartmann et al. (Hartmann, Sure et al. 2004) and Brank et al. (Brank, Grobelnik et al. 2005) present surveys of currently available ontology evaluation methods applicable in different situations. For example, OntoMetric (Adolfo and Gomez-Perez 2004) is a method that can be used when knowledge engineers have to choose an appropriate ontology (among several candidates) to be used in an application or project; it proposes the decision criteria to be considered and the process that should be followed in order to obtain a valuation of the suitability of each candidate ontology. EvaLexon (Reinberger and Spyns 2004), on the other hand, is a method that can be applied to the results of automatic ontology mining techniques (where ontologies are created from text documents in the application domain). EvaLexon provides a rough reference to determine whether or not the results of ontology mining capture most of the notions of the input text, using a number of metrics such as coverage and accuracy (Reinberger and Spyns 2004). The survey also outlines Natural Language Application metrics that help in evaluating the content of ontologies with respect to natural language applications (applications that involve populating an ontology of concepts with instances drawn from textual data). These include metrics such as Precision and Recall (Chinchor 1992), recalled below, and the Tennis measure (Brewster, Alani et al. 2004), which evaluates the extent to which items in the same cluster are closer together in the ontology than those in different clusters.
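For reference, precision and recall are standardly defined over the set of items retrieved (extracted) by a system and the set of items that are actually relevant (correct); these are the textbook definitions rather than anything specific to (Chinchor 1992):

\mathrm{Precision} = \frac{|\,\mathrm{relevant} \cap \mathrm{retrieved}\,|}{|\,\mathrm{retrieved}\,|},
\qquad
\mathrm{Recall} = \frac{|\,\mathrm{relevant} \cap \mathrm{retrieved}\,|}{|\,\mathrm{relevant}\,|}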


Out of the available ontology evaluation methods, two principal methods are relevant to ontology developers in general (as pointed out in (Hartmann, Sure et al. 2004)): the method proposed by Gomez-Perez in (Staab and Studer 2004), and the OntoClean method (Guarino, Welty et al. 2004). The principles behind these evaluation methods are presented in detail below.

The method proposed by Gomez-Perez in (Staab and Studer 2004) presents several criteria which can be used to evaluate the taxonomic content of ontologies, and points out several types of errors that can be made when developing taxonomies:

• Inconsistency: this refers to circular errors (which occur when a concept is defined as a specialization or generalization of itself, forming cycles in the taxonomy), partition errors (when a concept is defined to be a sub-concept of two or more disjoint concepts) and semantic errors (incorrect semantic classification of concepts).

• Incompleteness: this refers to the lack of completeness of the ontology with respect to the concept hierarchy, the domain and range of relations, and the omission of disjointness knowledge.

• Redundancy: redundancy errors occur when expressions of the ontology are redefined although they have already been defined explicitly, or when they can be inferred from other definitions.
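To make the first category concrete, the following sketch detects circularity errors mechanically, assuming the taxonomy is given as a mapping from each concept to its declared superconcepts (each cycle is reported once per concept lying on it):

def find_cycles(subclass_of):
    """Report chains in which a concept is (directly or indirectly)
    defined as a specialization of itself."""
    cycles = []

    def visit(node, path):
        if node in path:
            cycles.append(path[path.index(node):] + [node])
            return
        for parent in subclass_of.get(node, []):
            visit(parent, path + [node])

    for concept in subclass_of:
        visit(concept, [])
    return cycles

# Toy taxonomy with a deliberate cycle: A < B < C < A
taxonomy = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
print(find_cycles(taxonomy))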

ODEval (http://minsky.dia.fi.upm.es/odeval) is a publicly available tool that provides automatic evaluation of ontologies supporting the method outlined above. It uses a set of algorithms based on graph theory to detect possible problems in ontology concept taxonomies; for OWL ontologies, it detects circularity problems, partition errors and redundancy problems.

OntoClean (Guarino, Welty et al. 2004), the other main evaluation method applicable to ontology developers, is based on philosophical notions for a formal evaluation of taxonomic structures. In other words, this method helps to remove wrong taxonomic relations in ontologies, based on the philosophical notions of rigidity, unity and identity. The OntoClean method consists of: 1) a set of axioms that formalize the definitions and constraints specified in the methodology; and 2) a "meta-ontology" or "taxonomy of properties" that provides a frame of reference for evaluations. The philosophical notions of rigidity, unity and identity, on which the OntoClean method is based, are briefly described below.

Rigidity: A property is rigid if it is essential to all its possible instances; an instance of a rigid property cannot stop being an instance of that property in a different world. For example, being a human is considered to be a rigid property, since no instance of a human can stop being a human. On the other hand, being a student is considered an anti-rigid property, since any instance of a student can stop being a student at any point in time. Rigidity and its variants (anti-rigid and non-rigid) are considered important meta-properties in OntoClean, since they impose fundamental constraints on the subsumption relations which are used to check the formal correctness of taxonomic links. For example, being a student cannot subsume being a human when the former is anti-rigid and the latter is rigid.


Unity: Certain properties pertain to 'wholes' (where a 'whole' is to be interpreted as an assemblage of parts that can be regarded as a single entity); that is, all their instances are wholes, while others do not. For example, being (an amount of) water does not have wholes as instances, since each amount can be arbitrarily scattered or confused with other amounts. In other words, knowing that an entity is an amount of water does not tell us anything about its parts, or how to recognize it as a single entity. On the other hand, being an ocean is a property that picks up whole objects as its instances, such as "the Atlantic Ocean", which are recognizable as single entities. As with rigidity, the meta-property of unity (and anti-unity) is used to construct constraints on the subsumption relations of an ontology. Hence "Ocean" cannot be a subclass of "Water", as the former carries unity and the latter carries anti-unity (oceans are composed of water; they are not a "kind of water").

Identity: Identity refers to the problem of being able to recognize individual entities in the world as being the same (or different). Identity criteria (used to 'identify' a certain individual as being that individual) are conditions used to determine equality (sufficient conditions) and that are entailed by equality (necessary conditions). For instance, consider the example of the statue and the clay: is the statue identical to the clay it is made of? Considering the essential properties, having (more or less) a certain shape is essential for the statue, but not for the clay. Therefore, they are different: they have different identity criteria, even without knowing exactly what these criteria are. A property carries identity if it has common identity criteria to identify the instances of that property.

As indicated in the examples, the OntoClean method uses these meta-properties, and the constraints on the classes carrying them, to identify taxonomic relations that are fundamentally wrong and to "clean" the ontology.
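A minimal sketch of one such OntoClean-style check follows, with meta-property assignments invented for illustration: the constraint encoded is that an anti-rigid property may never subsume a rigid one. The real method defines further constraints (e.g. on unity and identity) not shown here.

RIGID, ANTI_RIGID, NON_RIGID = "R+", "R-", "R0"

# Invented meta-property assignments, echoing the examples above
meta = {"Human": RIGID, "Student": ANTI_RIGID, "Ocean": RIGID}

# Deliberately wrong taxonomy: Student subsumes Human
subclass_of = {"Human": ["Student"]}

def rigidity_violations(subclass_of, meta):
    """Flag every rigid class placed below an anti-rigid one."""
    violations = []
    for child, parents in subclass_of.items():
        for parent in parents:
            if meta.get(child) == RIGID and meta.get(parent) == ANTI_RIGID:
                violations.append((child, parent))
    return violations

print(rigidity_violations(subclass_of, meta))  # [('Human', 'Student')]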

Designing Ontologies as Engineering Systems

Ontologies are knowledge representation artefacts built to be integrated in, and generally to control, an information system. They have to be considered as the result of an engineering activity. As such, they must follow the general guidelines of any engineering activity, namely specification of needs and requirements, iterative evaluation of the relevancy of a solution against requirements, integration with other components and the general system architecture, etc. In short, to quote Tom Gruber (http://tomgruber.org/writing/cidoc-ontology-2003.pdf), “It Is What It Does”. Typical requirements on what ontologies can “do” include:

• Migrate data from legacy sources

• Control integrity of data against a set of constraints

• Support sophisticated user queries

• Federate data as descriptions of business objects

• Federate applications through common definitions of business objects

2.8.2 Building Ontologies as Part of A Transition Process

The engineering tasks leading to the building and integration of ontologies can most of the time be considered as part of a transition process from a legacy system (document corpus, data, data schemes, vocabulary, etc.) to a target system which will be ontology driven. The ontology is the backbone of the target system, and part of it has to be extracted, inferred, or otherwise migrated from the legacy system(s). In general, no explicit ontology is defined in the legacy system, but some implicit or latent ontology is present in terms of data structures, and will be identified by a careful audit of the legacy to migrate: existing databases (schema if available, content, available export formats), document corpus to index, classify, or mine, terminology, controlled vocabulary, entity lists, etc. (see Figure 1). To make that latent ontology explicit, the knowledge engineer must work closely with domain experts, who should be able to provide the following information:

• Are there one or several databases, or even some knowledge bases, in the legacy system that must be transitioned into the target application?

• Are there terminological resources (lexicons, taxonomies, thesauri, reference tables, others), either part of or external to the legacy system, that have to be integrated in the target application?

• Should the target application preserve and store the existing legacy data?

• Should the data structure (schema or model) be fully maintained, or is it preferable to rethink it in order to respond more closely to the target application’s requirements?

Another part of the ontology will be defined from the target system requirements; there is of course no way to extract this part from the legacy system.

[Figure: databases and schemas, knowledge bases and models, thesauri and other terminological resources, and interviews with domain experts all feed into the knowledge engineer’s audit of the legacy system.]

Figure 1: Legacy data sources available to the knowledge engineer

2.8.3 Specifying The Target System Functional Requirements

The added value of the target system over the legacy system is generally defined as a set of extra functional requirements. Those requirements have to be specified clearly, and the ability of the ontology to meet them should be evaluated from both a qualitative and a quantitative (performance) viewpoint. In particular, the type and number of queries likely to be performed against knowledge bases, and the user interfaces expected in read and publication modes, are to be assessed whenever a modelling choice is open. Indeed, the specificities of the domain and of the target application often oblige the knowledge engineer to make certain modelling assumptions. For example, is it preferable to model a birthplace as an attribute or as a relation? The decision to choose one knowledge representation over another can greatly impact the way the target application will be exploited. For example, a search engine based on the ontology model will not offer the same functionalities to the user depending on whether the birthplace is represented as an attribute in a simple string format, or as a relation between classes, thus constraining the possible values for the location. In any case, the “good” modelling decision is not the one which “best represents” the “domain reality”, but the one which meets the functional requirements with the best performance.
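To make the alternative concrete, the following is a minimal sketch, using the rdflib Python library and a hypothetical example namespace, of the two representations discussed above:

```python
from rdflib import Graph, Namespace, RDF, RDFS
from rdflib.namespace import OWL, XSD

# Hypothetical example namespace; any project namespace would do.
EX = Namespace("http://example.org/onto#")
g = Graph()
g.bind("ex", EX)

g.add((EX.Person, RDF.type, OWL.Class))

# Option 1: birthplace as an attribute (datatype property with a string range).
# Simple to populate, but the values are opaque strings.
g.add((EX.birthPlaceName, RDF.type, OWL.DatatypeProperty))
g.add((EX.birthPlaceName, RDFS.domain, EX.Person))
g.add((EX.birthPlaceName, RDFS.range, XSD.string))

# Option 2: birthplace as a relation (object property towards a Location class).
# The range constrains the possible values and supports richer queries.
g.add((EX.Location, RDF.type, OWL.Class))
g.add((EX.birthPlace, RDF.type, OWL.ObjectProperty))
g.add((EX.birthPlace, RDFS.domain, EX.Person))
g.add((EX.birthPlace, RDFS.range, EX.Location))

print(g.serialize(format="turtle"))
```

Under the second option, a query engine can restrict birthplace values to instances of Location, which is precisely the functional difference described above.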

2.8.4 Identifying The Target System Technical Constraints

The ontology will be implemented in a software environment whose technical characteristics have to be known before developing the ontology. The technical architecture and meta-model used in the target system (e.g., the Mondeca ITM meta-model) will put specific technical constraints on the ontology ‘species’ and allowed constructs. If reasoning facilities are expected in the target system, the ontology constructs will have to be limited to those supported by the inference tools.

2.8.5 Putting The Business Objects at The Core of The Ontology

The ontology backbone taxonomy is built around core business object types. Those types are generally the ones known by all system users, from business experts to end users. They can be identified by several methods, such as looking at the objects most frequently queried, those which appear as primary keys in databases, as main taxonomy categories, or as terms in a controlled vocabulary. Those objects will define the “core classes”, along with their main attributes. The core classes are not necessarily the “upper classes” of the ontology, but the ones most likely to be instantiated and most often queried in the target system. The core classes and attributes are generally few, since they represent the business core. Whatever the way to extract them, their very definition is almost always the opportunity for domain experts and system users to go through a “conceptual audit” of their core business objects, going through clarification of semantics and business logic, and disambiguation of terminology. In this task the role of the knowledge engineer is critical. She must push towards and facilitate the conceptual audit, and at the same time remain agnostic about its results: the knowledge engineer is not the domain expert. Stabilization of this core ontology, and its assessment against data migration and functional requirements, should be the objective of a prototype system. As necessary, core classes will be further extended by more generic (abstract) classes, generally in order to federate attributes. The generic classes defined this way are technical, and do not necessarily appear in the end user experience.

2.8.6 Manage Both Business Terminology and Logic

The business logic which is formalized in the ontology is considered as distinct from, but not independent of, the terminology used to represent the concepts. The distinction is not always easy to grasp for domain experts, for whom the logic is strongly embedded in the terminology. One task of the knowledge engineer, particularly at the beginning of the process, is to help make this distinction clear, to show how the same business logic can be represented by different users using different terms (or the other way round), and to discover ambiguous terms hiding several distinct business logics.

2.8.7 Consider Data Throughout The Transition Process

In an iterative way, and as often as possible, the capacity of the data to be represented using the ontology has to be assessed on samples of legacy data. This task has to be conducted on real data, using the workflow that will be used in production. It has to take into account the data sources and their original format, and their extraction and transformation into the best ad hoc format (tabulated text, CSV, XML, RDF, etc.). The data will then be imported into the target system using an integrity checking process and a test suite of queries against the resulting knowledge base. The data migration is often a bottleneck in the process. Legacy data are rarely what their administrators think they are, or would like them to be. Assessing the integrity of data samples against the ontology and finding inconsistencies can lead either to refining or correcting bugs in the ontology, or to helping legacy administrators clean and improve their data in order to make them fit for migration. In most projects, both adjustments are needed.
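As an illustration of this import-and-check step, here is a minimal sketch, assuming a hypothetical CSV export and target namespace, of transforming legacy rows into RDF while rejecting rows that violate a simple integrity constraint:

```python
import csv
import io
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

EX = Namespace("http://example.org/onto#")  # hypothetical target ontology namespace

# A stand-in for a legacy export; in practice this would be read from a file.
legacy_csv = io.StringIO("id,name,birth_year\n1,Ada Lovelace,1815\n2,,1912\n")

g = Graph()
rejected = []
for row in csv.DictReader(legacy_csv):
    # Integrity check against a (simplified) ontology constraint.
    if not row["name"]:
        rejected.append((row, "missing mandatory name"))
        continue
    person = EX["person/" + row["id"]]
    g.add((person, RDF.type, EX.Person))
    g.add((person, EX.name, Literal(row["name"])))
    g.add((person, EX.birthYear, Literal(int(row["birth_year"]), datatype=XSD.integer)))

print(len(g), "triples imported;", len(rejected), "rows rejected")
```

In a real project the rejected rows would be reported back to the legacy administrators, closing the feedback loop described above.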
2.9 Discussion

As we can see, the process of building ontologies, and likewise of transitioning legacy systems to ontologies, still relies on close collaborative work with the domain experts. Indeed, legacy systems contain an important amount of implicit knowledge that requires the help of the domain experts to be made explicit. From that perspective it seems difficult to automate the transitioning process, which is why the process we apply in our various projects is mainly manual. Yet we use some utility tools to integrate several heterogeneous legacy data sources and to transform their formats automatically into the target one, namely the target ontology model. What would greatly assist the knowledge engineer in her task is a tool able to design the transition rules from the different data sources into the ontology format. In our opinion, another major aspect of transitioning is to take into account the existing technical specifications as well as the requirements for the target application. The intention in TAO to analyse not only the data structures, such as the database schemas, but also the documentation available about those specifications and requirements, and even the source code, is thus an important move towards the achievement of a successful transitioning process.

3 Semantic Web Services Lifecycle

Semantic Web Services will allow the semi-automatic and automatic annotation, advertisement, discovery, selection, composition, and execution of inter-organization services, making the Internet a global common platform where organizations and individuals communicate through well-defined interfaces to utilize each other’s services and share their resources. To exploit the full power of Web services, their functionality must be integrated, which can be very difficult and time consuming to achieve if carried out by humans. Workflow technologies, on the other hand, can play an important role in automating Web services integration, but without sufficient information about these services, the integration process can still be difficult and error-prone. Semantics can play an important role in providing such information and supporting all stages of the Web services lifecycle. The main stages of the Web services lifecycle are creation, description/annotation, advertisement, discovery, selection, composition, execution, and monitoring and managing their QoS.

3.1 Semantics and Ontologies

There is a growing consensus that syntactic descriptions alone are not sufficient to describe and integrate complex Web services, due to the degree of heterogeneity, autonomy, and distribution of the Web. Several researchers agree that it is essential for Web services to be machine understandable in order to allow the full deployment of efficient solutions supporting all the phases of the Web services lifecycle. To support the sharing and reuse of formally represented knowledge among distributed systems, it is useful to define the common vocabulary in which shared knowledge is represented. A specification of a representational vocabulary for a shared domain of discourse (definitions of classes, relations, functions, and other objects) is called an ontology (Uschold and Grüninger 1996). Ontologies are used in the Semantic Web as a form of knowledge representation about the world or some part of it. They are considered the basic building block of the Semantic Web, as they allow machine-supported data interpretation, reducing human involvement in data and process integration. Normally, Web services that share the same ontology can communicate about a domain of discourse, and they adhere to the ontology in their course of actions.

3.2 Semantics for Web services

The notion of utilising specialised, reusable services provided by distributed service providers has been significant in facilitating access to tailored application production by users, without the need for expertise in large-scale software design. By utilising tools that support the construction (and validation) of workflows, novice users can assemble applications simply by plugging together services that expose mutually compatible interfaces. Web services have greatly facilitated the uptake and use of this service-oriented paradigm, due to their being built upon de facto Web standards for syntax, addressing, and communication protocols. The use of syntactic frameworks such as XML has enabled the representation and publication of machine-readable, declarative specifications that can be obtained and used by developers and applications alike. However, two major obstacles to facilitating (and automating) the construction of large-scale workflows within open environments are semantic and schematic heterogeneity. The semantics of Web service standards are well defined (perhaps implicitly within Web service tools), and the XML Schema definitions for these standards can be used to validate the service descriptions published by service providers. However, a problem emerges when one considers the definitions of the services themselves, and their messages. Although XML was designed to define the syntax of a document, it says nothing about the semantics of entities within a document, and consequently does not assist in the interpretation or comprehension of messages or exchange sequences (Bussler 2001). Semantic Web services (Payne and Lassila 2004) address this by providing a declarative, ontological framework for describing services, messages, and concepts in a machine-readable format that can also facilitate logical reasoning. Thus, service descriptions can be interpreted based on their meanings, rather than simply on a symbolic representation. Provided that there is support for reasoning over a Semantic Web service description (i.e. the ontologies used to ground the service concepts are identified; if multiple ontologies are involved, ontology mappings should provide transformation of concepts from one ontology to the other), workflows and service compositions can be constructed based on the semantic similarity of the concepts used.

Web services semantics can be classified (according to their usage and the entities they describe) into functional semantics, data semantics, QoS semantics and execution semantics.

Functional Semantics: It has been assumed in several Semantic Web service discovery algorithms (Paolucci, Kawamura et al. 2002) that the functionality of a service is characterized by its inputs and outputs. Hence these algorithms look for semantic matching between the inputs and outputs of the services and the inputs and outputs of the requirements. This kind of semantic matching may not always retrieve an appropriate set of services that satisfy the functional requirements. For example, two services can have the same input/output signature even if they perform entirely different functions. Though semantic matching of inputs and outputs is required, it is not sufficient for discovering relevant services. As a step towards representing the functionality of a service for better discovery and selection, Web services can be annotated with functional semantics. This can be done by having an ontology in which each concept/class represents a well-defined functionality; the intended functionality of each service can then be represented as annotations using this ontology.

Data Semantics: All Web services take a set of inputs and produce a set of outputs. These are represented in the signature of the operations in a specification file. However, the signature of an operation provides only the syntactic and structural details of the input/output data. These details (like data types, or the schema of an XML complex type) are used for service invocation. To effectively perform discovery of services, the semantics of the input/output data (e.g. ranges, value constraints, alternatives, etc.) have to be taken into account. Hence, if the data involved in a Web service operation is annotated using an ontology, then the added data semantics can be used in matching the semantics of the input/output data of the Web service with the semantics of the input/output data of the requirements.

QoS Semantics: Each Web service can have different quality aspects, and hence service selection involves locating the service that provides the best quality criteria match. Service selection is also an important activity in Web service composition (Cardoso and Sheth 2003). This demands management of QoS metrics for Web services. Web services in different domains can have different quality aspects. Incorporating QoS semantics in Web service descriptions allows organizations to select the most appropriate services to fulfil their requirements, monitor service behaviour at run time, and evaluate alternative strategies when Web service adaptation becomes a necessity.

Execution Semantics: The execution semantics of a Web service encompasses the ideas of message sequence, the conversation pattern of Web service execution, the flow of actions, the preconditions and effects of Web service invocation, etc. Some of these details may not be meant for sharing and some may be, depending on the organization and the application that is exposed as a Web service. In any case, the execution semantics are not the same for all services, and hence before executing or invoking a service, the execution semantics or requirements of the service should be verified. A proper model for execution semantics can help in coordinating the activities of services provided by independent parties.
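The input/output matching mentioned under functional and data semantics is typically graded rather than boolean. The following is a simplified, illustrative rendering of the degree-of-match idea found in matchmaking algorithms such as (Paolucci, Kawamura et al. 2002), over a toy concept hierarchy (all names are hypothetical):

```python
# Toy concept hierarchy: child -> parent (a real matcher would query an ontology).
SUBCLASS_OF = {"Sedan": "Car", "Car": "Vehicle", "Price": "Quote"}

def is_subclass(sub: str, sup: str) -> bool:
    while sub is not None:
        if sub == sup:
            return True
        sub = SUBCLASS_OF.get(sub)
    return False

def degree_of_match(advertised: str, requested: str) -> str:
    """Degrees in decreasing order of strength, as in capability matchmaking."""
    if advertised == requested:
        return "exact"
    if is_subclass(requested, advertised):
        return "plug-in"    # advertised concept subsumes the requested one
    if is_subclass(advertised, requested):
        return "subsumes"   # requested concept subsumes the advertised one
    return "fail"

print(degree_of_match("Vehicle", "Sedan"))  # plug-in
print(degree_of_match("Sedan", "Vehicle"))  # subsumes
```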

Web services can also be classified based on many other criteria, such as price, service level agreement, legal issues, etc. Since TAO focuses only on generating Semantic Web service descriptions, we will not present the details of these classifications.

3.3 The SOA Design Lifecycle

The design lifecycle for a service-oriented system is largely divorced both from the specific methodology used to create the system, and from the lifecycle of the individual services within such a system (further discussed in Section 3.4). We present the following sketch of the design lifecycle for a SOA system.

Identify Services → Annotate Services → Deploy Services → Evaluate Services → Refine Services (and back to Identify Services)

Service Identification: This process refers both to the identification of existing services which can be repackaged within a SOA system, and to the identification of required functionality (from a business process modelling exercise, for example) that does not currently exist in operational form, and the subsequent implementation of such functionality as services.

Service Annotation: In order to allow loose coupling of component services through brokerage and matchmaking, it is necessary to describe the services in a SOA system in sufficient detail that a service requester can find an appropriate service that meets their needs.

Service Deployment: Here we refer to the deployment of services within a service execution environment. The parts of the individual service lifecycle described in Section 3.4 (more specifically, the processes from service advertisement to service execution) typically take place within such an environment.

Service Evaluation: This process refers to the ongoing monitoring of a SOA system to determine whether it meets its design goals.

Service Refinement: The refinement of a SOA system typically takes one of three forms: the introduction of new functionalities through the creation of new services; the refactoring of existing service functionality (through aggregation or further decomposition, for example); or the refinement of the service descriptions to better facilitate service matchmaking and brokerage.

3.4 Phases of the Semantic Web service Lifecycle

3.4.1 Semantic Web service Creation

The creation of a Web service includes the development and testing of the service implementation, the definition of the service interface description, and the definition of the service implementation description. Web service implementations can be provided by creating new Web services, transforming existing applications into Web services (the TAO approach), or composing new Web services from other Web services and applications. Developing a new Web service involves using the programming languages and models that are appropriate for the service provider’s environment. Transforming existing applications into Web services involves generating service interfaces and service wrappers to expose the application’s relevant business functions. Composing new Web services from existing Web services involves choreographing and orchestrating message flows between software components, directly or through workflow technologies. The Web services that are used to compose a workflow can exist within a single organization or across multiple organizations.

The first step in creating a Web service is to design and implement the application that represents the Web service. This step includes the design and coding of the service implementation, and testing to verify that all of its interfaces work correctly. The application can be implemented as an Enterprise JavaBean (EJB), JavaBean, servlet, C++ or Java class file, or Component Object Model (COM) class. After the Web service is developed, the service interface definition can be generated from the implementation of the service (i.e. the service interface can be derived from the application’s Application Programming Interface (API)). The service interface should not be generated until the Web service development is complete, because the interface must match the exact implementation of the service. Web service interfaces are usually expressed as WSDL (Chinnici, Moreau et al. 2006) documents that define the interface and binding of the corresponding Web service implementations. The WSDL document defines the contract between the Web service requester and provider, such that the implementation details (and even the implementation platform) of each can differ and can change without any impact on the other. Web service interfaces can be published on the Web using XML or, more specifically, WSDL, and can be invoked using Web protocols like HTTP and SOAP (Graham, Simeonov et al. 2001).
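By way of illustration, the sketch below wraps an existing application function as a callable service endpoint. It is deliberately minimal, exchanging JSON over HTTP rather than SOAP/WSDL as a production toolkit would, and all names are hypothetical; the point is only the wrapping pattern itself:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def legacy_lookup(customer_id: str) -> dict:
    """Stand-in for an existing application function being exposed as a service."""
    return {"id": customer_id, "status": "active"}

class ServiceWrapper(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        request = json.loads(body)
        result = legacy_lookup(request["customer_id"])  # delegate to the legacy code
        payload = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), ServiceWrapper).serve_forever()
```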

Sometimes Web services are developed to conform to an existing service interface. This type of service interface is usually part of an industry standard, which can be implemented by any number of service providers. In this case the service provider does not own the service interface, but only creates a service that implements it. In other cases, both the Web service and the service interface exist but do not match each other; mapping techniques can then be used to map the existing application interfaces to those defined in the service interface definition. This is normally done by creating a wrapper for the application that uses the service interface definition, and contains an implementation that maps the service interface onto the existing application interface.

Semantic Web services are viewed as a way to extend the capabilities of Web services in the direction of dynamic interoperability. The underlying theme is the overcoming of interoperability limitations arising from the need for service and client developers to agree in advance on the syntax and semantics of interactions, thereby making it possible for clients to successfully utilize Web services without prior arrangements between people that are realized in rigid software protocols, and immutable ontologies or metadata. In order for Semantic Web services to overcome the deficiencies of the current Web services technology (based on SOAP, WSDL, and UDDI), two major obstacles have to be alleviated: incompatibilities of data and information models, and mismatches in the exchange protocols utilized by different communities of service providers. One reason for these differences is that they were developed by different groups and continuously evolve over time. Dynamically accessible semantic descriptions of service capabilities and utilization protocols can contribute to overcoming these barriers.

Creating Semantic Web services depends very much on whether the services are described semantically; whether they exchange/use semantically rich knowledge or simply pass data items; and whether or not they can dynamically reason over the concepts of a message (Payne and Lassila 2004). Specifically, one can view such services as “Semantic + Web services”, whereby existing Web service descriptions are annotated with semantically rich descriptions, which can be used by applications and middleware to discover, compose, and validate services and workflows. However, the underlying services are ultimately unaware of this annotation, and they exchange data, rather than exchanging and interpreting concepts. An alternative approach would be to consider services as “Semantic Web + Services”, whereby services are defined that can utilise and interpret OWL assertions and ontologies dynamically. For example, on receiving a message containing a set of OWL assertions, a Semantic Web enabled service should be able to infer any meaningful entailments from this message, and (possibly) proactively seek additional resources that may be necessary, such as alignments between ontologies.
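A minimal sketch of this “Semantic Web + Services” behaviour, using rdflib: on receiving a message asserting a type for an individual, the service derives the entailed, more general types from its ontology. This is only a tiny slice of what a full reasoner would compute, and the namespace and class names are hypothetical:

```python
from rdflib import Graph, Namespace, RDF, RDFS

EX = Namespace("http://example.org/onto#")  # hypothetical shared ontology namespace

# Ontology known to the receiving service.
onto = Graph()
onto.add((EX.Sedan, RDFS.subClassOf, EX.Car))
onto.add((EX.Car, RDFS.subClassOf, EX.Vehicle))

# Incoming message carrying an OWL/RDF assertion.
msg = Graph()
msg.add((EX.myCar, RDF.type, EX.Sedan))

# Merge ontology and message, then derive entailed types by walking
# the subclass hierarchy.
kb = Graph()
for triple in onto:
    kb.add(triple)
for triple in msg:
    kb.add(triple)

for instance, _, cls in msg.triples((None, RDF.type, None)):
    for ancestor in kb.transitive_objects(cls, RDFS.subClassOf):
        kb.add((instance, RDF.type, ancestor))

print(list(kb.objects(EX.myCar, RDF.type)))  # Sedan, Car, Vehicle
```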
Regardless of whether a Web service implementation performs additional functionality to reason about Web service semantics, the interface of a Semantic Web service should include a semantic description of the service and its utilization protocol, matchmaking techniques should be developed to mediate incompatibilities between services, and ontologies should be created to provide a common understanding of the services’ domain concepts.

3.4.2 Semantic Web service Annotation

Web service specifications are typically based on standards that only define syntactic characteristics, which is unfortunately insufficient for Web service interoperation. One of the most recognized solutions to interoperability problems is to enable applications to understand methods and data by adding meaning to them. Many tools are currently available to create Web services; in technical terms, any program that can communicate with other remote entities using SOAP messages can be called a Web service. Since the development of Web services is the first stage in their creation, it is very important to use semantics at this stage. This implies the specification of the data, execution, QoS and functional semantics of Web services. As mentioned earlier, all Web services take a set (which could be empty) of inputs and produce a set (which could also be empty) of outputs. These are represented in the signature of the operations in a WSDL file. However, the signature of an operation provides only the syntactic and structural details of the input/output data. To effectively perform operations such as the discovery of services, the semantics of the input/output data have to be taken into account. Hence, if the data involved in a Web service operation is annotated using an ontology, then the added data semantics can be used in matching the semantics of the input/output data of the Web service with the semantics of the input/output data of the requirements. Annotation languages and frameworks such as OWL-S (Ankolenkar 2002), WSMO (Roman, Keller et al. 2005), Meteor-S (Patil et al. 2004) and SA-WSDL (Kopecky, Vitvar et al. 2007) provide the mechanisms and tools to achieve automatic and semi-automatic annotation of Web services using ontologies.
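For instance, SAWSDL attaches semantics through a modelReference attribute on WSDL and XML Schema components. The sketch below, using Python’s standard xml.etree module, adds such a reference to a schema element declaration; the element name and concept URI are hypothetical examples:

```python
import xml.etree.ElementTree as ET

SAWSDL_NS = "http://www.w3.org/ns/sawsdl"  # namespace defined by the SAWSDL spec

# A bare XML Schema element declaration, as might appear in a WSDL types section.
xsd = ET.fromstring(
    '<xs:element xmlns:xs="http://www.w3.org/2001/XMLSchema" name="customerId"/>'
)

# Attach a model reference pointing the element at an ontology concept.
xsd.set(
    "{%s}modelReference" % SAWSDL_NS,
    "http://example.org/onto#CustomerIdentifier",
)
ET.register_namespace("sawsdl", SAWSDL_NS)
print(ET.tostring(xsd, encoding="unicode"))
```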

3.4.3 Semantic Web service Advertisement

After the service is developed and annotated, it has to be advertised to enable discovery. Publishing or advertising Semantic Web services allows applications (i.e. service requesters) to discover services based on the desired goals and capabilities. A semantic registry is used for registering instances of the service ontology for individual services. The service ontology distinguishes between information which is used for matching during discovery and information which is used during service invocation. In addition, domain knowledge should also be published or linked to the service ontology. The UDDI standard does not define any special fields or bags for semantic information, so there is no explicit function to put/get OWL-S service descriptions into/from UDDI. But UDDI can be combined (through the definition of tModels) with OWL-S to accommodate semantic descriptions (Paolucci, Kawamura et al. 2002; Srinivasan, Paolucci et al. 2004). UDDI tModels can be used to store the URI of the OWL-S service description (the whole OWL-S description or only its profile part). However, the UDDI search engine can only compare the stored URIs, not the actual OWL-S descriptions pointed to by these URIs. To overcome this problem, Web service search engines and automated discovery algorithms need to be developed. The discovery mechanisms supported need to be based on Web service profiles with machine-processable semantics.

3.4.4 Semantic Web service Discovery

Service discovery is the process by which a client identifies candidate services to achieve the client’s objectives. For Web services, this is currently a heavily manual process, in that registries like UDDI are designed to be searched by the developers of client systems. In contrast, semantic matchmakers use semantic relations to find services described using Semantic Web languages. The discovery of services consists of a semantic match between the description of a service request and that of a published service. Queries involving the service name, inputs, outputs, preconditions and other attributes can be constructed and used for searching the semantic registry. The matching can also be done at the level of tasks or goals to be achieved, followed by a selection or provision of services that solve the task (Stein, Gennings et al. 2006). The discovery of Web services has specific requirements and challenges compared to previous work on information retrieval systems and information integration systems. Due to the massive number of services available on the Web, an efficient way of discovering Web services still needs to be found (Cardoso and Sheth 2003). Within this stage several issues need to be considered:

• Precision of the discovery process. The search has to be based not only on syntactic information, but also on data, functional, and QoS semantics.

• Enabling the automatic determination of the integration requirements of the discovered Web service.

• Requesters must locate and interact with peers or matchmakers that can compare descriptions of queries and capabilities and respond to queries for advertised service descriptions. Also, just as service providers describe the capabilities of their offered services, requesters must be able to decide whether they can satisfy the preconditions of discovered services before selecting them.

The outcome of this phase is a cluster of Web services that match the requester’s initial requirements. In the next phase (Semantic Web service selection), the Web service that most closely matches the requirements is selected. The cluster containing the list of other services which also match the requirements is maintained for later usage, because another service may be chosen later in case of failure or breach of contract.

3.4.5 Semantic Web service Selection

Web service selection is a need that is almost as important as service discovery. After discovering Web services whose semantics match the semantics of the requirement, the next step is to select the most suitable service. Each service can have different quality aspects, and hence service selection involves locating the service that provides the best quality criteria match. In a more specialized or agent-based type of interaction, a negotiation process can be started between a requester and a provider, but that requires that the services themselves be knowledge-based.

Service selection is also an important activity in Web service composition. This demands management of QoS metrics for Web services. Web services in different domains can have different quality aspects; these are called domain-specific QoS metrics. There are also QoS criteria that can be applied to services in all domains, irrespective of their functionality or specialty; these are called domain-independent QoS metrics. Both kinds of QoS metrics need shared semantics for interpreting them as intended by the service provider. This can be achieved by having an ontology (similar to an ontology used for data semantics) that defines the domain-specific and domain-independent QoS metrics.
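A minimal sketch of QoS-driven selection under such shared semantics: each candidate advertises values for a common set of metrics, and the requester ranks candidates by a weighted score. All service names, metrics and weights here are hypothetical:

```python
# Candidate services with normalised QoS metrics in [0, 1], where higher is
# better; cost would be inverted before normalisation.
candidates = {
    "ServiceA": {"availability": 0.99, "throughput": 0.70, "cost": 0.40},
    "ServiceB": {"availability": 0.95, "throughput": 0.90, "cost": 0.80},
}

# Requester preferences: weights over the shared QoS vocabulary.
weights = {"availability": 0.5, "throughput": 0.2, "cost": 0.3}

def score(metrics: dict) -> float:
    """Weighted sum of the advertised metric values."""
    return sum(weights[name] * value for name, value in metrics.items())

best = max(candidates, key=lambda name: score(candidates[name]))
print(best, round(score(candidates[best]), 3))
```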

3.4.6 Semantic Web service Composition

Composition, or choreography, of Web services allows Semantic Web services to be defined in terms of other, simpler services. This task involves the automatic selection, composition, and interoperation of Web services to perform some complex task, given a high-level description of an objective. Interoperability is a key issue at this stage, because more and more organizations are building their own e-commerce/B2B systems to serve their goals. In order for these systems to be successful, they need to understand each other’s requirements and interoperate seamlessly. Automating inter-organizational services across supply chains presents significant challenges (Stohr and Zhao 2001). Compared to traditional process tasks, Web services are highly autonomous and heterogeneous, and complex methods are necessary to support the composition of Web processes. Here again, one possible solution is to explore the use of semantics to enhance interoperability among Web services. A workflow expressing the composition of atomic services can be defined in the service ontology by using appropriate control constructs. This description would be grounded in a syntactic description such as BPEL4WS (Tony and Curbera 2002), WSCI (Arkin, Askary et al. 2002), or BPML (BPMI.org 2001). Dynamic composition is also being considered as an approach, in which the atomic services required to solve a request are located and composed on the fly. This requires an invoker which matches the outputs of atomic services against the inputs of the requested service. While composing services, four kinds of semantics have to be taken into account. The process designer should consider the functionality of the participating services (functional semantics), the data that is passed between these services (data semantics), the quality of these services and of the process as a whole (QoS semantics), and the execution pattern of these services and of the entire process (execution semantics). Since Web process composition involves all kinds of semantics, it should be clear that semantics play a critical role in the success of Web services and in process composition.
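The invoker behaviour described above can be sketched as a simple forward-chaining composer that treats each service as a mapping from input concepts to output concepts. All names here are hypothetical, and a real composer would of course use semantic matching rather than string equality:

```python
# Each atomic service is described by the concepts it consumes and produces.
services = [
    {"name": "GeoCoder", "inputs": {"Address"}, "outputs": {"Coordinates"}},
    {"name": "Forecast", "inputs": {"Coordinates"}, "outputs": {"WeatherReport"}},
]

def compose(available: set, goal: str, catalogue: list) -> list:
    """Greedy forward chaining: keep adding any service whose inputs are
    satisfied until the goal concept is produced (or no progress is made)."""
    plan = []
    changed = True
    while goal not in available and changed:
        changed = False
        for svc in catalogue:
            already_used = svc["name"] in [s["name"] for s in plan]
            if not already_used and svc["inputs"] <= available:
                plan.append(svc)
                available |= svc["outputs"]
                changed = True
    return plan if goal in available else []

plan = compose({"Address"}, "WeatherReport", services)
print([svc["name"] for svc in plan])  # ['GeoCoder', 'Forecast']
```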

3.4.7 Semantic Web service Execution

With the emergence of Semantic Web services, workflow management systems become essential to support, manage, enact, and orchestrate Web services, both between organizations and within an organization. The execution semantics of a Web service encompasses the ideas of message sequence (e.g., request-response or solicit-response), the conversation pattern of Web service execution (peer-to-peer pattern, global controller pattern), the flow of actions (sequence, parallel, and loops), the preconditions and effects of Web service invocation, etc. The execution of Semantic Web services involves a number of steps, once the required inputs have been provided by the service requester. First, the service and domain ontologies associated with the service must be instantiated. Second, the inputs must be validated against the ontology types. Finally, the service can be invoked, or a workflow executed, through a specific grounding (i.e. a description of a physical binding for the conceptual, semantic service description). Note that, with the help of execution semantics, Web services need not be statically bound to specific components. Instead, a shortlist of Web services can be established based on the functional and data semantics, the most appropriate service selected using the QoS semantics, and the services then bound, using the execution semantics, to web applications managed by workflow management systems.

3.4.8 Semantic Web service Monitoring and QoS

The evolution of Web technologies, and in particular e-commerce, has added a new dimension to the requirement for specifying and monitoring QoS metrics such as the products or services to be delivered, deadlines, quality of products, and cost of service. To enable adequate QoS management, research is required to develop mechanisms that semantically specify, compute, monitor, and control the QoS of the products or services to be delivered. In e-commerce and e-business Web services, suppliers and customers define a binding agreement between the two parties, specifying QoS items such as services to be delivered, deadlines, and cost of services. The management of QoS metrics of Semantic Web processes directly impacts the success of organizations participating in e-commerce. Therefore, when services or products are created or managed using Web services, the underlying workflow management system must accept the specifications and be able to estimate, monitor, and control the QoS rendered to customers. A comprehensive QoS model that allows the description of Web service components from a QoS perspective has already been developed (Cardoso, Sheth et al. 2004). The model includes three dimensions — time, cost, and reliability — and is coupled with an algorithm to automatically compute the overall QoS of Web services. These developments can be applied to automatically compute the duration, cost, and reliability of Web services.
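For a purely sequential workflow, the aggregation step of such an algorithm reduces to summing times and costs and multiplying reliabilities. The sketch below illustrates this special case; the figures are hypothetical, and the cited model also covers parallel and conditional structures:

```python
from functools import reduce

# Per-service QoS estimates: duration in seconds, cost in euros,
# reliability as a probability of successful execution.
workflow = [
    {"name": "Validate", "time": 0.4, "cost": 0.01, "reliability": 0.999},
    {"name": "Enrich",   "time": 1.2, "cost": 0.05, "reliability": 0.990},
    {"name": "Publish",  "time": 0.8, "cost": 0.02, "reliability": 0.995},
]

# For services executed in sequence, times and costs add up,
# while reliabilities multiply.
total_time = sum(s["time"] for s in workflow)
total_cost = sum(s["cost"] for s in workflow)
total_reliability = reduce(lambda acc, s: acc * s["reliability"], workflow, 1.0)

print(f"time={total_time:.1f}s cost={total_cost:.2f} "
      f"reliability={total_reliability:.4f}")
```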

4 Overview of TAO Transitioning Methodology

Developing the TAO transitioning methodology is one of the key scientific contributions expected from the TAO project. In this section we outline some initial perspectives gleaned from an investigation of the relevant literature, and through collaboration with other partners within the TAO consortium. These approaches do not represent a final methodology, but rather suggest an initial (baseline) approach from which refinements may evolve and against which they may be compared empirically.

4.1 Methodology Overview

The TAO Transitioning Methodology defined in this document is presented as a composite lifecycle, which highlights the interactions between existing methodologies for Service-Oriented Architectures (which we use to include Web services and Enterprise Architectures), and for ontology design. As has been stated in previous sections, there is no single methodology for either of these tasks. Rather, we take the abstracted lifecycle sketches introduced previously and demonstrate how and where these should be linked. As such, the TAO methodology provides a refinement of the processes in the existing methodologies, with three key points of alignment between the ontology design process and the SOA design process (illustrated in the figure below):

• Learning ontologies from service descriptions

• Using domain ontologies to annotate services

• Using feedback from service evaluation to refine ontologies

These interactions between component methodologies are effectively a refinement of processes within those methodologies, and reflect the relationship between the products of each individual lifecycle. In particular, we note the task-oriented nature of a domain ontology which is defined with service annotation in mind, when compared with a general-purpose ontology for the same domain.

[Figure: the TAO composite lifecycle — the ontology design cycle (Knowledge Acquisition / Ontology Learning → Design Ontology → Evaluate Ontology → Refine Ontology) running in parallel with, and linked to, the SOA cycle (Identify Services → Annotate Services → Deploy Services → Evaluate Services → Refine Services).]

4.1.1 Service-Oriented Ontology Learning

In the ontology design lifecycle, the Ontology Learning process (Section 2.6.2) attempts to automatically or semi-automatically derive a knowledge model from a document corpus. In our Transitioning Methodology, we have refined this to reflect the contribution made by the structured (but not ontologically-informed) description of an existing body of services (for example, service APIs and developer documentation, SOA design documentation, and so on). We call this refinement Service-Oriented Ontology Learning. It is our expectation that the ontology resulting from an automated ontology learning process should be treated as a candidate ontology, which will subsequently be studied and refined in the Ontology Design process; the extraction of an ontology from structured sources such as those mentioned above does not obviate the requirement for further work on the domain ontology. However, we believe that the use of structured sources relating to existing services will provide sufficient information to allow the creation of a domain ontology which is better suited to the task of representing knowledge about services in that domain.

4.1.2 Semantic Service Annotation

The Service Annotation process described in the SOA methodology refers to the description of services at the signature level, in languages like WSDL. While these allow rudimentary service matchmaking and brokerage on the basis of the types of the inputs and outputs of a service, these types are typically datatypes taken from XML Schema or similar, rather than richer types taken from an ontological characterisation of the domain. The refinement of the annotation process in the Transitioning Methodology uses Semantic Web service frameworks such as SA-WSDL, OWL-S or WSMO to describe services more richly, using the domain ontology that results from the Ontology Design process in the parallel ontology design lifecycle.

4.1.3 Service-Driven Ontology Refinement

Both the ontology design lifecycle and the SOA lifecycle contain evaluate-refine steps that represent reflection on the performance of a system and its subsequent re-engineering. In creating a service-driven ontology refinement process in the Transitioning Methodology, we capture the synergy between these feedback cycles, and reinforce the task-oriented nature of the domain ontology.

4.2 Ontology Learning

Ontologies and metadata are the foundation of the Semantic Web / Grid and intelligent Web services. They provide the frameworks for describing the meaning of resources and services in terms that software agents and other services can understand and manipulate. However, existing ontology development practice is at a similar stage to software development two decades ago: it presumes that each ontology is started from scratch, and it approaches ontology development more as a craft than as a principled engineering discipline. Examples are often small-scale ‘toys’ for well-understood domains. The reality has proved that ontology modelling and maintenance is a time-consuming task. Modelling by hand by human experts is biased, error prone, and expensive, and it is very difficult and cumbersome to manually derive ontologies from data. This appears to be true regardless of the type of data one might consider. The problem has attracted significant attention from researchers, and different approaches have been proposed, such as developing more appropriate tools and guidance to enable domain experts to build ontologies for their own disciplines more readily. Ontology learning is one of the most significant approaches proposed to date; it tries to tackle the problem by developing tools that learn ontologies automatically or semi-automatically from different sources, such as natural language text, semi-structured data (e.g., HTML or XML) or structured data. Ontology learning plays an important role in our methodology, as the resulting knowledge will be used in many phases of the transitioning process. In this subsection, we discuss the possible activities involved in the ontology learning lifecycle.

4.2.1 Domain and Goals of the Ontology

Ontologies are the key elements in a Semantic Web services system, and they are used to link conceptual real-world semantics defined and agreed upon by communities of users. More explicitly, the main reasons for developing ontologies within this methodology are to:

• Represent knowledge about the world coherently and in a machine-understandable way. Ontological analysis classifies the structure of knowledge; without ontologies, or the conceptualizations that underlie knowledge, there cannot be a vocabulary for representing knowledge. In order to represent the knowledge, we need to associate terms with the concepts and relations in the ontology, and devise a syntax for encoding knowledge in terms of those concepts and relations.

• Enable knowledge sharing among people or software agents. Suppose that we have developed a high-quality set of conceptualizations, and their representative terms, for a domain of knowledge. The resulting ontology captures the intrinsic conceptual structure of the domain. We can share this ontology with others who have similar needs for knowledge representation in that domain. Shared ontologies can thus form the basis for domain-specific knowledge-representation languages.

• Define background knowledge and constraints on the domain explicitly, besides defining common names for important concepts in the domain. This background knowledge is represented separately from the system implementation, which makes it possible to reuse the existing code even if our knowledge about the domain changes. In addition, explicit specifications of domain knowledge are useful for new users who must learn what terms in the domain mean.

• Provide a standard vocabulary and framework to create, implement and use services on domain knowledge. Separating the domain knowledge from the knowledge of how to use it allows us to define unified operational knowledge for describing the domain-independent properties and capabilities of Web services in an unambiguous and computer-interpretable form.

4.2.2 Define Guidelines to Ensure Consistency

The creation of ontologies is key to the creation of Semantic Web service descriptions; it is therefore important to ensure the correctness of the ontologies used. Essentially, an ontology should be consistent on at least three different levels:

• Syntactic consistency. Ontology languages have a predefined syntax; for example, both OWL and WSMO have RDF/XML syntaxes. Knowledge represented in these languages must be well formed. Furthermore, to meet different usages, ontology languages often come in various sub-languages or “species”: OWL has three flavours (“OWL Full”, “OWL DL” and “OWL Lite”), and WSML consists of a number of variants based on different logical formalisms, namely “WSML-Core”, “WSML-DL”, “WSML-Flight”, “WSML-Rule” and “WSML-Full”. Thus, an ontology must be built to fall inside the desired species level.

• Logical consistency. An ontology cannot contain contradictory information. For example, it would be a mistake to assert that a pizza is both a “Meaty Pizza” and a “Vegetarian Pizza” in a knowledge base, given that “Meaty Pizza” and “Vegetarian Pizza” are disjoint. Reasoners (Haarslev and Möller 2001; FaCT++ 2003) can normally detect logical inconsistencies, and tools have been developed to help users correct the errors (Parsia, Sirin et al. 2005; Wang, Horridge et al. 2005).

• Context consistency. That an ontology is logically consistent does not necessarily imply that it accurately represents the real world. For example, without the assertion that “Meaty Pizza” and “Vegetarian Pizza” are disjoint, the ontology remains logically consistent even if we define a “meaty-vegetarian” pizza; it is nevertheless obviously an error. To discover this kind of problem, the ontology needs to be tested by domain experts. Some tools also exist to help prevent users from making such mistakes (see http://www.co-ode.org/downloads/owlunittest/).

• Modelling styles. Strictly speaking, this is not a kind of error. However, without good modelling practice, ontologies can soon become unmaintainable and unusable. Researchers have provided sets of guidelines for developing “good” ontologies (see http://www.w3.org/2001/sw/BestPractices/); these guidelines should be kept in mind.
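The pizza example above can be checked mechanically. The following sketch, using rdflib and a hypothetical namespace, flags any individual asserted to belong to two classes that are declared disjoint — the kind of check a reasoner performs for logical consistency, and which fails silently at the context level if the disjointness axiom is missing:

```python
from rdflib import Graph, Namespace, RDF
from rdflib.namespace import OWL

EX = Namespace("http://example.org/pizza#")  # hypothetical namespace
g = Graph()

# The disjointness axiom that makes the contradiction detectable.
g.add((EX.MeatyPizza, OWL.disjointWith, EX.VegetarianPizza))

# An individual mistakenly asserted to belong to both classes.
g.add((EX.myPizza, RDF.type, EX.MeatyPizza))
g.add((EX.myPizza, RDF.type, EX.VegetarianPizza))

# Flag individuals typed with two classes declared disjoint.
for c1, _, c2 in g.triples((None, OWL.disjointWith, None)):
    for individual in set(g.subjects(RDF.type, c1)) & set(g.subjects(RDF.type, c2)):
        print(f"inconsistent: {individual} is both {c1} and {c2}")
```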
Naming conventions: We also believe it is important to adopt a set of standard conventions for ontology modelling, to avoid some errors and to make ontologies more understandable. Below are a few tips and guidelines:

1. Use consistent delimiters. This can greatly improve the readability of an ontology.
2. Use either a singular or a plural form for class and association names throughout the model.
3. Use prefix and suffix conventions in names to distinguish between classes and properties.
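Such conventions can be enforced mechanically. The sketch below checks class and property names against one hypothetical convention (singular UpperCamelCase classes, has/is-prefixed lowerCamelCase properties); the regular expressions would be adapted to whatever convention a project adopts:

```python
import re

# Hypothetical convention: classes in UpperCamelCase,
# properties in lowerCamelCase with a "has" or "is" prefix.
CLASS_RE = re.compile(r"^[A-Z][A-Za-z0-9]*$")
PROPERTY_RE = re.compile(r"^(has|is)[A-Z][A-Za-z0-9]*$")

def check_names(classes, properties):
    """Return a list of human-readable convention violations."""
    problems = []
    problems += [f"class '{c}' breaks the convention"
                 for c in classes if not CLASS_RE.match(c)]
    problems += [f"property '{p}' breaks the convention"
                 for p in properties if not PROPERTY_RE.match(p)]
    return problems

print(check_names(["Person", "business_object"], ["hasName", "Name"]))
```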

4.2.3 Identify Knowledge Sources

The task of ontology learning from pieces of software (such as Web services or software libraries) is essentially one of discovering concepts and relations in the source code, accompanying documentation, and external sources (such as the Web). This section explains the potentially relevant data sources that typically come with a set of reusable software components. We first differentiate between knowledge sources according to their stability, i.e. static, semi-dynamic, and dynamic data sources.

Static data sources: The content of static data sources does not change (or changes only very rarely). These are mostly the reference manual, user’s manual, and source code. In some cases the underlying database (if it exists) can represent a valuable source of information. Apart from the database documentation, which semantically explains tables and fields, the structure can also be explored for data types, constraints, and relations between tables and fields. Furthermore, the database content can be “mined” to discover concepts and relations. In the context of the source code (the source code of an application that uses the service, as well as the source code of the service itself), each comment can be treated as a semantic tag for a particular functionality. Class, function, and variable names can also be used as concepts, as they usually adequately reflect the functionality or purpose of the corresponding software construct. Furthermore, relations between software constructs can be inferred from their structures (classes can be “members” of other classes, classes “inherit” functionality from other classes, and so on). Multimedia contents are often also available (tutorials, lectures, and other audio/video material); however, extracting knowledge from multimedia is far from trivial and is often not pursued.

Semi-dynamic data sources: These are less static data sources, with the property that some effort is still required to update their contents. They tend to be updated on a daily, weekly, or even monthly basis. We are mainly referring to more dynamic Web contents, such as newsgroups and forums where topics related to the software package are discussed. Another useful semi-dynamic data source is code snippets, found on Web pages dedicated (or at least related) to the software package, and also posted by users to forums and newsgroups. Snippets are short pieces of code that can be analyzed to reveal functions (or functionalities) that can be used together in order to achieve a certain compound functionality. The latter is strongly related to orchestration/choreography in terms of Web service ontologies.

Dynamic data sources: Dynamic data sources are updated automatically – usually no human effort is required. They tend to be updated rapidly (from hour to hour, or even faster). Such data sources are Web and proxy server logs, and potentially logs maintained by the system that provides the functionality in question (such functionality providers are Web services, database systems, application servers, etc.). These data sources can be mined for functionality usage patterns. Similarly to code snippets, they reveal important compound functionalities and, what is more, they also reveal current trends in the way the users interact with the service. The users’ needs may change according to some external factors, and the change is usually evident from the mining of service usage logs.

We can view the data sources presented above from a different perspective, in which we distinguish between domain-specific, API-specific, and function-specific (or routine-specific) data sources. At the same time, we divide the data sources from these three categories into two additional orthogonal categories: structured and unstructured data sources.

Domain-specific data sources: These data sources do not discuss the software components explicitly. Rather, they present the domain on which the components are based (e.g. the GATE software library is based on the theory of information extraction and natural language processing; the TextGarden software library is based on the theory of text mining and text learning). For certain domains, ontologies are already available (they can be downloaded from the Web or obtained from the corresponding institutions). Such an ontology is a fully structured domain-specific data source, with concepts and relations between concepts clearly defined. Domain-specific data sources can also be completely unstructured documents, Web pages, and other textual and multimedia resources about the domain.

API-specific data sources: These data sources describe a set of functions (routines) that can be combined to provide a certain functionality. A typical Web service, for instance, provides a set of functions, as does a typical software library. The abbreviation API stands for “application programming interface” and denotes the set of routines that an application uses to request and carry out lower-level services. Under the structured API-specific data sources we count the available code snippets and service usage logs. On the other hand, there are several unstructured (or semi-structured) data sources available: user’s manuals, API-related Web pages, forums, newsgroups, and potentially also API-related tutorials, lectures, and other API-related audio/video material.

Function-specific data sources: These data sources describe each atomic (low-level) function that is exposed to the user through the service interface. The main data source in this context is the reference manual, which consists of two parts: one quite structured, the other unstructured. The first part describes the functions of the corresponding service (or API) in terms of their function names, parameters and their types, return values, related functions, and enclosing class names and their hierarchy, and provides similar detailed information about an API. The second part contains mostly natural-language descriptions of these same functions. Apart from the reference manual, the source code of the service can provide valuable information contained in the function and class declarations. A regular Web service can describe itself in the Web services description language (WSDL); a typical self-description contains function declarations at a more abstract level (as a set of communication endpoints capable of exchanging messages). Comments accompanying such formal declarations (e.g. source code comments and WSDL comments) can be treated as natural-language descriptions of the corresponding functions.
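As an illustration of mining such function-specific sources, the sketch below uses Python’s standard ast module to harvest class and function identifiers, with their docstrings as natural-language glosses, as candidate concepts (the source fragment is a hypothetical example):

```python
import ast

SOURCE = '''
class Invoice:
    """A billable statement issued to a customer."""
    def compute_total(self, line_items):
        """Sum the line item amounts."""
        return sum(line_items)
'''

tree = ast.parse(SOURCE)
candidates = []
for node in ast.walk(tree):
    if isinstance(node, (ast.ClassDef, ast.FunctionDef)):
        # Identifier as a candidate concept, docstring as its gloss.
        candidates.append((node.name, ast.get_docstring(node)))

for name, gloss in candidates:
    print(f"{name}: {gloss}")
```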

4.2.4 Building the Ontology

Sabou (2006) provides valuable insights into ontology learning for Web services. She summarizes ontology learning approaches, ontology learning tools, and the acquisition of software semantics. In this section we elaborate on ontology learning in TAO. The most important step of ontology development is identifying the concepts in a domain. This can be performed by using fully automated approaches, such as unsupervised learning (e.g. clustering) and supervised learning (e.g. classification). Since fully automatic ontology construction is far from trivial, semi-automated approaches, such as active learning or semi-automated unsupervised learning, are often considered. The following is a short presentation of OntoGen, a tool for semi-automatic ontology construction (Fortuna, Grobelnik et al. 2007). The tool is soon to be extended with active learning capabilities and will therefore combine several popular semi-automated approaches. The OntoGen system is geared towards finding an optimal balance between the time and cost of ontology construction and the precision of the final ontology; this was the ultimate criterion when deciding which features to add to the new version. We now go through the most important features; a sketch of the unsupervised learning and naming features follows the list.

• Unsupervised concept learning. The system identifies groups of similar instances in the input data and generates suggestions for new concepts (Ankolekar, Sycara et al. 2006). The user decides which suggestions to add to the ontology.



• Supervised concept learning. The user has a rough idea of a new concept and prepares a query describing it. The system uses the query to label a small sample of instances and then starts the active learning process (Blum and Chawla 2001) by asking the user whether specially chosen instances belong to the concept. After a few questions it trains a linear classifier, classifies the remaining instances and finally adds the concept to the ontology. This feature helps the user in specifying the concept and is new in the latest version of OntoGen.



• Concept naming. It is often non-trivial to find a suitable name for a concept. Here the system helps the user with keyword extraction methods (Ankolekar, Sycara et al. 2006), which provide an overview of the concept and its instances.



• Concept population. If new data becomes available after the ontology is constructed, the system can help by automatically classifying the new instances into the appropriate concepts.



• Concept management. The user can fully customize each concept by defining its instances. The system helps here by detecting outliers both inside and outside the concept.



• Standards. The system can save the final ontology as an RDF Schema or as an OWL ontology and is therefore compatible with the core technologies of the Semantic Web.

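The following is a minimal, illustrative sketch of the unsupervised concept learning and concept naming features described above: sub-concepts are suggested by clustering the instance texts, and each suggestion is named by its most characteristic centroid keywords. This is not OntoGen's actual implementation; it assumes the scikit-learn library, and `texts` stands for the textual descriptions of the instances in the selected concept.

```python
# A sketch of OntoGen-style sub-concept suggestion: cluster the instance
# texts and use the highest-weighted centroid terms as name suggestions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def suggest_subconcepts(texts, k=3, n_keywords=5):
    vectorizer = TfidfVectorizer(stop_words="english")
    vectors = vectorizer.fit_transform(texts)        # one TF-IDF vector per instance
    km = KMeans(n_clusters=k, n_init=10).fit(vectors)
    terms = vectorizer.get_feature_names_out()
    suggestions = []
    for c in range(k):
        top = km.cluster_centers_[c].argsort()[::-1][:n_keywords]
        members = [i for i, label in enumerate(km.labels_) if label == c]
        # Each suggestion pairs candidate name keywords with member instances.
        suggestions.append(([terms[t] for t in top], members))
    return suggestions
```

Note that the system only suggests; as in OntoGen, the user remains in the loop and decides which suggestions actually enter the ontology.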
Apart from concept identification, OntoGen also implicitly infers subsumption relations between concepts (newer versions will also be able to discover some other types of relations).

4.2.5 Ontology Formalization

Being a relatively young field, research into the Semantic Web and Semantic Web services is still ongoing. There is as yet no industry-agreed standard for representing ontologies and service annotations on the Semantic Web. Different approaches have been proposed; each has its advantages and disadvantages and can be more applicable for certain use cases. In this subsection, we briefly introduce these alternatives.

There are two very different modelling paradigms proposed for Semantic Web ontologies. One paradigm is based on notions from standard logics, such as propositional logic, first-order logic, and Description Logics (Baader, Calvanese et al. 2003). W3C adopted this paradigm with the introduction of its standards, the Resource Description Framework (RDF) (Brickley and Guha 2004) and the OWL Web Ontology Language (Bechhofer, Harmelen et al. 2004). At the same time, the object-oriented paradigm, based on notions from frame knowledge-representation systems, database schemas and rule languages, remains the dominant approach to knowledge modelling. This paradigm is embodied in a previous version of RDF (Lassila and Swick 1999) and in several proposals for Semantic Web languages, including OWL Flight (de Bruijn, Lara et al. 2005). The two paradigms have many similar modelling constructs: both are built around the notion of classes, representing concepts in the domain of discourse; classes have instances; properties (slots) describe attributes of those classes and relationships between them; and restrictions and facets express constraints on the values of properties and slots. There are, however, major differences in the semantics of these constructs and in the way they are used to infer new facts in the ontology or to determine whether the ontology is consistent. As a result, the way the modelling constructs are used in the two paradigms and the implications of definitions differ. (Patel-Schneider and Horrocks 2006; Wang, Noy et al. 2006) and (de Bruijn, Lara et al. 2005) give a very detailed comparison of the two paradigms.

There are also three significant Semantic Web service frameworks proposed to date: Semantic Annotations for WSDL and XML Schema (SAWSDL), the Web Service Modeling Ontology (WSMO) and OWL-S. All are discussed in detail in TAO Deliverable D1.1, but for completeness we summarise the salient facts here.

Semantic Annotations for WSDL and XML Schema (SAWSDL) is the latest standard produced by W3C (Kopecky, Vitvar et al. 2007). Based primarily on the earlier work on WSDL-S, it provides a standard means by which WSDL and XML Schema documents can be related to semantic descriptions. The semantic annotations can be added to various parts of a WSDL document, such as input and output message structures, interfaces and operations. The SAWSDL specification is compatible with WSDL 2.0, WSDL 1.1 and the XML Schema extensibility frameworks.


The annotations on WSDL and XML Schema can be used to publish a Web service in a registry, and also to discover, compose and invoke Web services. SAWSDL introduces three new extension attributes for use in WSDL and XML Schema documents, and discusses some of their possible uses. The semantic annotations reference a concept in an ontology or a mapping document. The annotation mechanism is independent of the ontology expression language, and the specification requires and enforces no particular ontology language. It is also independent of mapping languages and does not restrict the possible choices of such languages.

WSMO is based on the earlier work on the Unified Problem-Solving Method, which was part of a "...framework for developing knowledge-intensive reasoning systems based on libraries of generic problem-solving components..." (Fensel and Motta 2001). WSMO provides a framework for semantic descriptions of Web services and acts as a meta-model for such services based on the Meta Object Facility (MOF)12. Semantic service descriptions, according to the WSMO meta-model, can be defined using one of several formal languages defined by WSML (the Web Service Modeling Language), and consist of four core elements deemed necessary to support Semantic Web services: Ontologies, Goals, Web services and Mediators. Ontologies are described in WSMO at a meta-level: a meta-ontology supports the description of all the aspects of the ontologies that provide the terminology for the other WSMO elements. Goals are defined in WSMO as the objectives that a client may have when consulting a Web service. Web services provide a semantic description of services on the Web, including their functional and non-functional properties, as well as other aspects relevant to their interoperation. Mediators in WSMO are special elements used to link heterogeneous components involved in the modelling of a Web service; they define the necessary mappings, transformations and reductions between linked elements.

OWL-S (Ankolekar, Burstein et al. 2002), formerly known as DAML-S, originated from a need to define Web service or agent capabilities in a way that is semantically meaningful (within an open environment), and also to facilitate meaningful message exchange between peers. Essentially, OWL-S provides a service model on the basis of which an abstract description of a service can be given. It is an upper ontology whose root class is the Service class, which directly corresponds to the actual service that is described semantically (every service that is described maps onto an instance of this concept). The upper-level Service class is associated with three other classes: ServiceProfile, ServiceModel and ServiceGrounding. In detail, the OWL-S ServiceProfile describes what the service does; thus, the class Service presents a ServiceProfile. The service profile is the primary construct by which a service is advertised, discovered and selected. The OWL-S ServiceModel tells how the service works; thus, the class Service is describedBy a ServiceModel. It includes information about the service inputs, outputs, preconditions and effects (IOPEs). It also shows the component processes of a complex process and how control flows between the components. The OWL-S grounding tells how the service is used: it specifies how an agent can pragmatically access a service. Another deliverable of this project gives a detailed introduction to and comparison of WSMO and OWL-S (Payne, Sánchez et al. 2007).

12

http://www.omg.org/mof/


4.2.6 Ontology Evaluation and Modification

Ontology evaluation is an important issue that must be addressed during our transitioning process. After ontologies are developed, we need to ensure that they fit their requirements. Furthermore, the ontology learning tools used in our methodology can generate many candidate ontologies, so an effective evaluation measure is required to select the best one among them. If the resulting ontology is not appropriate, we need to reconfigure the parameters of the learning algorithm or direct the learning process itself. In addition to the approaches for ontology evaluation presented in Section 2.7, there exist other ways of evaluating ontologies, which include:

• Golden standard approach. In this approach, the resulting ontologies are compared with some predefined "standard" ontologies; higher similarity implies better quality (Maedche and Staab 2002; Dellschaft and Staab 2006). A minimal sketch of this approach follows the list.



• Task-based approach. In this approach, ontologies are used in some application or task. The result of the application, or its performance on the given task, may depend on the ontology used; a good ontology is one which helps to produce good results (Porzel and Malaka 2004).



• Data-driven approach. This approach compares the ontology with a source of data about the domain that the ontology is intended to cover (Brewster, Alani et al. 2004).



• Manual approach. In this approach, domain experts are asked to assess how well the ontology meets a set of predefined criteria, standards, requirements, etc. (Guarino, Welty et al. 2004).

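As an illustration of the golden standard approach, the following minimal sketch compares the concept labels of a learned ontology against those of a reference ontology at the lexical level (cf. Dellschaft and Staab 2006). This is only one layer of a full evaluation — it ignores the taxonomic structure entirely — and the label sets here are invented for the example.

```python
# Lexical-level comparison of a learned ontology against a gold standard:
# precision and recall over concept labels. The label sets are invented.
def lexical_precision_recall(learned_labels, gold_labels):
    learned = {l.lower() for l in learned_labels}
    gold = {l.lower() for l in gold_labels}
    common = learned & gold
    precision = len(common) / len(learned) if learned else 0.0
    recall = len(common) / len(gold) if gold else 0.0
    return precision, recall

p, r = lexical_precision_recall(
    ["Book", "Author", "ShoppingCart"],
    ["Book", "Author", "Cart", "Offer"])
print("precision=%.2f recall=%.2f" % (p, r))   # precision=0.67 recall=0.50
```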
(Brank, Grobelnik et al. 2005) and (Hartmann, Sure et al. 2004) provide detailed surveys of ontology evaluation methods and tools, from which users can choose the most suitable approaches.

4.2.7 Ontology Maintenance and Evolution

Ontology evolution is important in our methodology, and in ontology engineering in general, because practical ontologies are often large and complicated, and their development is becoming an increasingly ubiquitous and collaborative process. Furthermore, in our methodology, ontologies are developed progressively. We must therefore manage any changes that arise and ensure the consistent propagation of these changes to dependent artefacts. An ontology requires changes for many reasons. Firstly, an ontology is an explicit specification of a conceptualization of a domain; since the world is dynamic, changes in the domain will affect its ontology specification. For example, in our case study, GATE could update some of its functionalities, and the ontologies used to annotate these services would then need to be revised accordingly. Secondly, changes in conceptualization can result from a changing view of the world and from a change in usage perspective: different tasks may imply different views on the domain and consequently a different conceptualization. Lastly, there has currently been much debate on the most suitable ontology definition languages for Web applications. Description logic-based ontology languages, such as OWL, are one major genre, and frame-based ontology languages are another; each has its advantages and disadvantages and can be more applicable for certain use cases. Until a unified representation is developed, ontologies often need to be translated from one knowledge-representation language to another, and both the syntax and the semantics of the ontology need to be preserved during translation. In (Stojanovic, Maedche et al. 2002), the authors identify a possible six-phase evolution process:

• Change capturing. The process of ontology evolution starts with capturing changes, either from explicit requirements or from the results of change discovery methods, which induce changes from existing data, the surrounding environment and ontology usage.



• Change representation. To resolve changes, they have to be identified and represented in a suitable format and at various levels of granularity, e.g. as elementary or complex changes.



• Semantics of change. The semantics of change refers to the effects of the change on the ontology itself and, in particular, the checking and maintenance of ontology consistency after the change is applied. This phase has to enable the resolution of induced changes in a systematic manner, ensuring the consistency of the whole ontology. To help the user better understand the effects of each change, this phase should provide maximum transparency, giving detailed insight into each change being performed.



• Change implementation. The role of this phase is to avoid performing undesired changes: before a change is applied to the ontology, a list of all its implications should be generated and presented to the user in a comprehensible way, and the user can then choose to commit or cancel the change.



• Change propagation. Ontologies often reuse and extend other ontologies. An ontology update might therefore corrupt ontologies that depend on the modified ontology and, consequently, all artefacts that are based on those ontologies. The task of this phase is to recognize which changes in the ontology can affect the consistency of dependent ontologies and instances, as well as the functionality of dependent applications, and to react accordingly.



• Change validation. There are numerous circumstances in which it may be desirable to reverse the effects of ontology evolution, to name just a few:
o The ontology engineer may fail to understand the actual effect of a change and approve a change that should not be performed.
o It may be desirable to change the ontology for experimental purposes.
o When working on an ontology collaboratively, different ontology engineers may have different ideas about how the ontology should be changed.



The validation phase in the ontology evolution process enables recovery from such situations. A minimal sketch of how the implementation and validation phases can be supported is shown below.
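The sketch makes the simplifying assumption that the ontology is a set of triples: every effective change is recorded with enough information to reverse it later, which is what rollback during validation requires.

```python
# Every effective change is logged so that it can be reversed; rollback
# implements the recovery required by the validation phase.
class Change:
    def __init__(self, kind, triple):
        self.kind = kind          # "add" or "remove"
        self.triple = triple      # (subject, predicate, object)

class EvolvingOntology:
    def __init__(self, triples=()):
        self.triples = set(triples)
        self.log = []

    def apply(self, change):
        # Change implementation: record only changes that had an effect.
        if change.kind == "add" and change.triple not in self.triples:
            self.triples.add(change.triple)
            self.log.append(change)
        elif change.kind == "remove" and change.triple in self.triples:
            self.triples.remove(change.triple)
            self.log.append(change)

    def rollback(self, n=1):
        # Change validation: undo the last n changes in reverse order.
        for change in reversed(self.log[-n:]):
            if change.kind == "add":
                self.triples.remove(change.triple)
            else:
                self.triples.add(change.triple)
        del self.log[-n:]
```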

Different methods and tools exist to support the activities in each of the above phases. (Patel-Schneider and Horrocks 2006) and (Fernández-López, Gómez-Pérez et al. 2002) give detailed surveys of the state of the art in ontology evolution methods and tools.

5 TAO Transitioning Cookbook

In the previous chapter, we presented a high-level methodology, which outlines some initial perspectives gleaned from an investigation of the relevant literature and through collaboration with other partners within the TAO consortium. In this section, we present a cookbook-style guide on how the TAO suite can be used to assist in transitioning a legacy application to the Semantic Web platform. The Amazon Associates Web service (A2S)13 is used as a case study to illustrate the idea.

5.1 Case Study – Amazon Associates Web service (A2S)

In this deliverable, we use the Amazon Web services as a case study to illustrate the main steps involved in our methodology. Amazon Web services provide developers with direct access to Amazon's robust technology platform; by using them, external developers and businesses can build their own applications on top of AWS in a reliable, flexible, and cost-effective manner. Amazon offers a variety of Web services, which include:
• Amazon Associates Web service (A2S), which exposes Amazon's product data and e-commerce functionality.
• Amazon Elastic Compute Cloud (Amazon EC2), which provides resizable compute capacity in the cloud.
• Amazon Flexible Payments Service (Amazon FPS), which is the first payments service designed from the ground up specifically for developers.
• Amazon Mechanical Turk, which provides a Web services API for computers to integrate Artificial Intelligence directly into their processing.
• Amazon SimpleDB, which is a web service for running queries on structured data in real time.
• Amazon Simple Storage Service (Amazon S3), which is used to store and retrieve any amount of data, at any time, from anywhere on the web.
• Amazon Simple Queue Service (Amazon SQS), which offers a reliable, highly scalable hosted queue for storing messages as they travel between computers.
• Alexa Site Thumbnail, which provides developers with programmatic access to thumbnail images for the home pages of web sites.
• Alexa Top Sites, which provides access to lists of web sites ordered by Alexa Traffic Rank.
• Alexa Web Information Service, which makes Alexa's vast repository of information about the traffic and structure of the web available to developers.
• Alexa Web Search, which offers programmatic access to Alexa's web search engine.

13

http://www.amazon.com/E-Commerce-Service-AWS-home-page/b/ref=sc_fe_l_6?node=12738641



In this deliverable, we focus only on the Amazon Associates Web service (A2S). A2S (formerly named the Amazon E-Commerce Service, "ECS") exposes Amazon's product data through an easy-to-use web services interface that, when combined with the Amazon Associates Program, is a powerful combination for website owners, Web developers, and Amazon sellers to make money. Developers may use the Amazon Associates Web service as long as it is used primarily to drive traffic back to Amazon's web sites or sales of Amazon products and services. The functionality of A2S is defined in a WSDL file, which contains more than 20 operations. These operations support different tasks on Amazon's retail web site, including:
• Find items to buy; these items are for sale by Amazon or by other merchants.
• Find information about those items, including customer reviews, and show customers what others think about the items on sale.
• Create a fully functional shopping cart, adding items that are immediately available or that will become available in the future, such as in a pre-sale of a book.
• Add, remove, or modify the items in the shopping cart, giving customers full control over its contents.
• Find information about the company selling an item, and show customers what others think about that merchant.
• Find similar items for sale, generating additional sales by suggesting other items similar to the ones the customers are buying.
• Purchase the items in the shopping cart; once the customer decides to buy, Amazon takes care of the shipping, payment, and order fulfilment, or notifies the seller to take care of the same.
• Find items on a friend's wishlist, wedding registry or baby registry, and purchase those items.
A hedged example of calling one of these operations is given below.
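For illustration, the following sketch invokes the ItemSearch operation through the REST binding that A2S exposed at the time of writing; the access key is a placeholder, and the exact endpoint and parameters should be verified against the current A2S developer guide.

```python
# Placeholder access key; the endpoint and parameters reflect the REST
# binding documented for A2S at the time of writing.
import urllib.parse
import urllib.request

params = {
    "Service": "AWSECommerceService",
    "AWSAccessKeyId": "YOUR-ACCESS-KEY",   # placeholder
    "Operation": "ItemSearch",
    "SearchIndex": "Books",
    "Keywords": "semantic web services",
}
url = ("http://webservices.amazon.com/onca/xml?"
       + urllib.parse.urlencode(params))
with urllib.request.urlopen(url) as response:
    print(response.read()[:200])           # the raw XML list of matching items
```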

5.1.1 Amazon A2S Data model

There are two types of data available through A2S.
1. Amazon product data, which gives information about the products available through A2S. There are three ways to consider an Amazon product:
a. The offers model. A given product from Amazon's Web sites may come from many vendors, on different terms and conditions, at a different price and in a different condition (new, used, etc.). To offer Amazon products for sale, developers must work with these product offers to get the current price and availability.
b. The variations model. For some products, such as apparel and sporting goods, users must specify further variation values, such as size and colour, before purchasing the product. A2S allows users to extract variation information for such multi-variation products.
c. Item images and attributes. Every product in ECS consists of images of the product as well as a set of attributes, which varies by product type. A2S has over 200 different attribute fields to completely describe items in each product line.
2. Other data. This includes all other data that is adjunct to the product catalogue, such as individual wish lists and basic public information about customers. Developers can also access seller product listings and customer feedback about individual sellers.
All of the above data models are defined using XML Schema.

5.2 Transitioning Cookbook

In the previous chapter, we proposed an initial high-level methodology for transitioning a legacy application to Semantic Web services. There are three main phases in the methodology:
• Knowledge Acquisition
• Service-Oriented Ontology Learning
• Semantic Content and Service Augmentation
Based on this methodology, more detailed, cookbook-style guidelines for translating a legacy application into a semantics-based application using the TAO suite are presented here, with the Amazon A2S used as a case study to illustrate the idea. Please note that migrating an application to another platform is still more of a craft than an exact science: there is no magic formula that can be applied to all kinds of system transitions, and for different migration tasks this guideline may need to be slightly revised. Figure 2 presents a UML diagram illustrating the main transitioning process, and we explain each of the activities in detail below. As mentioned before, our methodology has three main phases: the knowledge acquisition phase, the ontology learning phase and the semantic content and service augmentation phase. Each phase contains a set of tasks which may interact with each other. Given a legacy application, the domain engineers first check whether some previously-developed ontologies exist for the application. Public ontology search engines or public ontology libraries can be used for this task14 15 16. If such an ontology is found, it can be saved into the knowledge store developed by TAO for future usage; otherwise users have to derive the domain ontology from the legacy software. For the Amazon A2S case study, we develop the ontology from scratch with the assistance of the TAO tools.

14

http://protegewiki.stanford.edu/index.php/Protege_Ontology_Library

15

http://swoogle.umbc.edu/

16

http://swse.deri.org/



Figure 2: Cookbook methodology overview


5.2.1 Knowledge acquisition

To derive the domain ontology from the legacy application using the TAO suite, we first need to collect all the relevant resources about the legacy application. Figure 3 shows the main tasks that a user performs during this phase.

Figure 3: Main tasks for knowledge acquisition

We identify some data sources which are commonly relevant to the description of a legacy system, and divide them into two groups based on their representation (i.e., structured or textual data). In the previous chapter, we presented some other ways to classify the knowledge resources, and D2.1 explains the potentially relevant data sources that typically come with a set of reusable software components. The following shows the subtasks that software engineers perform during the knowledge acquisition task; non-applicable tasks can be skipped. In our Amazon A2S example, we have collected the Java source code, the JavaDoc files and the A2S WSDL definition. These documents can be downloaded from http://developer.amazonwebservices.com/connect/entry.jspa?externalID=880&ref=featured.



Knowledge acquisition
• Collect structured documents for the legacy application:
o Collect source code
o Collect API descriptions
o Collect the WSDL definition
o Collect database schemas
o Collect JavaDoc
o Collect other structured documents
• Collect textual documents for the legacy application:
o Collect reference manuals
o Collect source code comments
o Collect the programmer's guide
o Collect the annotator's guide
o Collect forum discussions
o Collect other textual documents

After collecting all the related data sources, we store them in the repository.

• Save the document corpora to the TAO repository

The heterogeneous knowledge store developed in TAO (WP4) can be used to store these data sources. It is designed for the efficient management of different types of knowledge: unstructured content (documents), structured data (databases), ontologies, and semantic annotations, which augment the content with links to machine-interpretable metadata. More information about the heterogeneous knowledge store can be found in D4.2.

5.2.2 Ontology Learning

The purpose of ontology learning from pieces of software is essentially to discover concepts and relations in the source code, the accompanying documentation, and external sources (such as the Web). Ontology learning is one of the most significant approaches proposed to date for developing ontologies; in the previous chapter, we presented a detailed review of the different ontology learning approaches. In this section, we show how to learn domain ontologies in the TAO scenario; LATINO, a part of the TAO suite, supports this.

Figure 4 shows the set of tasks that comprise ontology learning using LATINO. In the previous step, we collected a set of related data resources that describe the legacy application. To use LATINO to derive ontologies from these resources, we first need to identify their contents and structures.



Figure 4: Derive domain Ontology using LATINO

• Identify content and structure of software artifacts
o Identify the text-mining instances
o Assign a textual document to each instance
o Determine the structure between instances
Given a concrete TAO scenario, the first question that needs to be answered by a software engineer is: what are the text-mining instances (which are used as graph vertices when dealing with the structure) in this particular case? That is, the user needs to study the data at hand and decide which data entities will play the role of instances17 in the transitioning process. It is impossible to answer this question in general – it depends on the available sources. Some potential choices for TAO users are as follows:
• Using Java/C++ classes as text-mining instances
• Using Java/C++ class methods as text-mining instances
• Using XML schema elements and datatypes as text-mining instances
• Using database entities as text-mining instances
• ……

17

Instances in the data mining sense rather than the ontological sense.


In the GATE case study, the instances are the source-code Java classes, whereas in the Dassault inclusion-dependencies case study the instances are the database table columns. In our Amazon A2S case study, we mainly use the Java classes and XML schemas as text-mining instances.

Next, we need to assign a textual document (description) to each text-mining instance. This step is not obligatory, and perhaps not even possible when the data does not contain any unstructured textual data. Again, there is no universal standard for which text should be included, but it is important to include only those bits of text that are relevant and will not mislead the text-mining algorithms. Users should develop several (reasonable) rules for what to include and what to leave out, evaluate each of them in the given setting, and choose the rule that performs best. In general, the following information can be used for most legacy applications that have well-commented Java/C++ source code available (a minimal sketch of this step follows below):
• class comments,
• class names,
• field names,
• field comments,
• method names,
• method comments.
For the A2S case study, we also use the WSDL XML schema type names and assign them a higher weight.

Next, the user identifies the structural information that is evident from the data. This step is also not obligatory, provided that textual documents have been attached to the instances. The user should consider any kind of relationship between the instances (e.g. links, references, computed similarities, and so on). Note that it is sometimes necessary to define the instances in a way that makes it possible to exploit the relationships between them. For Java/C++ classes, the potential links that can be extracted include:
• Inheritance and interface implementation graph
• Type reference graph
• Class and operation name similarity graph
• Comment reference graph
More information about these types of links and the different calculations of link weights can be found in D2.1.

For the above steps, the user inputs the instances, the documents, and the links between the instances into LATINO, using at least the following LATINO operations (see D2.2 for more details):
• CreateNewDocumentNetwork
• AddInstance
• RegisterRelation
• AddArc
After this step, the data pre-processing phase is complete.
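The following is a minimal sketch of the document-assignment step for Java classes: identifiers are split on camel case and concatenated with the accompanying comments into one document per class. The per-class dictionary structure is an assumption made for the example; in TAO this information would come from the collected source code and JavaDoc.

```python
# Build one text-mining document per Java class from its names and comments.
import re

def split_camel_case(identifier):
    # "ShoppingCartItem" -> "shopping cart item"
    words = re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", identifier)
    return " ".join(w.lower() for w in words)

def class_to_document(cls):
    """cls: dict with keys 'name', 'comment', 'fields', 'methods' (assumed)."""
    parts = [split_camel_case(cls["name"]), cls.get("comment", "")]
    for member in cls.get("fields", []) + cls.get("methods", []):
        parts.append(split_camel_case(member["name"]))
        parts.append(member.get("comment", ""))
    return " ".join(p for p in parts if p)

cart = {"name": "ShoppingCart",
        "comment": "Represents the customer's cart.",
        "fields": [{"name": "cartItems", "comment": "items in the cart"}],
        "methods": [{"name": "addItem", "comment": "adds an item"}]}
print(class_to_document(cart))
```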


• Transform contents and structures into feature vectors
The text-mining algorithms employed by LATINO (like those of many other data-mining tools) work with feature vectors. Therefore, once the text-mining instances have been enriched with the textual documents and the discovered structure information, we need to convert them into feature vectors. LATINO is able to compute the feature vectors from a document network and pass them on to OntoGen. Although a single document network can yield several semantic spaces (i.e. several sets of feature vectors), only one such semantic space can be imported into OntoGen. Optimally, this semantic space would be such that the resulting ontology is most suitable to the user's needs. However, setting the parameters when creating a semantic space in LATINO is not a trivial task; it greatly influences the amount of effort needed to produce a suitable ontology with OntoGen, and it also has an impact on the quality of the ontologies that LATINO will produce. To help the user set the parameters, we have developed OntoSight, an application that gives the user insight into document networks and semantic spaces through visualization and interaction. For the usage of OntoGen, please refer to D2.2.2. The feature vectors for the Amazon case study can be found at http://www.ecs.soton.ac.uk/~hw/Amazon/Amazon_DocSimRel.BowW. Figure 5 shows a display of this feature vector file using OntoGen.

Figure 5: Amazon Feature Vector
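Conceptually, the feature-vector step can be pictured as follows. The sketch below computes TF-IDF vectors and pairwise cosine similarities for a few invented instance documents; it illustrates the kind of semantic space LATINO produces, but it is not LATINO itself, and it assumes the scikit-learn library.

```python
# TF-IDF vectors and cosine similarities for invented instance documents.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = {
    "ItemSearch": "item search request keywords search index",
    "CartCreate": "cart create shopping cart item quantity",
    "CartAdd":    "cart add shopping cart item offer listing",
}
names = list(documents)
vectors = TfidfVectorizer().fit_transform(documents.values())
similarity = cosine_similarity(vectors)    # the raw material of a semantic space
for i, name in enumerate(names):
    print(name, ["%.2f" % s for s in similarity[i]])
```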

The user now has two options: either to write out OntoGen files (by invoking WriteOntoGenFiles) and continue his/her work there, or to use LATINO’s ontology learning functions. The latter is not yet an option since the ontology learning functionality will be added to LATINO during the third project year. We thus stick to OntoGen at the moment to construct the domain ontology.





• Create domain ontology from feature vectors
o Create concepts (unsupervised or supervised)
o Manage concepts
o Manage relations
o Manage instances

The most important step of ontology development is identifying the concepts in a domain. Using OntoGen/LATINO, this can be performed either with a fully automated approach such as unsupervised learning (e.g. clustering), or with a semi-automated supervised learning (e.g. classification) approach.

In the unsupervised approach, the system provides suggestions for possible sub-concepts of the selected concept. OntoGen/LATINO implements three clustering methods: k-means, LSI and PH k-means. The user can modify a few parameters of these clustering methods (the number of clusters or the minimal inner-cluster similarity) and choose on which documents the clustering should be performed (All for all documents in the concept, Unused for documents that are not already in any of the concept's sub-concepts). After selecting the method and parameters, the user can obtain sub-concept suggestions. The user checks these suggestions and decides whether to add them to the ontology (i.e., the suggestions are added as sub-concepts of the selected concept), to replace the selected concept with the suggested concepts (i.e., the selected concept is removed from the ontology and replaced with the checked suggestions, with all of the relations of the selected concept redirected to/from the new concepts), or just to prune a suggested sub-concept (i.e., the suggested sub-concept's instances are removed from the selected concept).

The supervised approach is based on an active learning method using Support Vector Machines (SVMs) (Burges 1998), a set of related supervised learning methods used for classification and regression. The user starts this method by submitting a query. After the user enters the query, the active learning system starts asking questions and labelling the instances: at each step, the system asks whether a particular instance belongs to the concept, and the questions are selected so that the most information about the desired concept is retrieved from the user. After an initial labelled sample has been collected from the user, the system displays some additional information about the concept: its current size (the number of documents positively classified into the concept) and the most important keywords for the concept (using SVM keyword extraction). The user can continue answering questions or stop; the more questions the user answers, the more accurate the assignment of instances to the final concept. After the concept is constructed, it is added to the ontology as a sub-concept of the selected concept.

The main advantage of the unsupervised methods is that they require very little input from the user; they provide well-balanced suggestions for sub-concepts based on the instances and are also good for exploring the data. The supervised method, on the other hand, requires more input: the user first has to decide what the sub-concept should be, describe it through a query, and go through a sequence of questions to clarify the query. This is intended for cases where the user has a clear idea of the sub-concept to add to the ontology but the unsupervised methods do not discover it. For the Amazon A2S case study, we have chosen the unsupervised approach, because we have little prior knowledge about the ontology. (A sketch of the supervised active-learning loop is given after Figure 6.)

Apart from concept identification, OntoGen/LATINO also implicitly infers subsumption relations between concepts (newer versions will also be able to discover some other types of relations). The user can fully customize each of the concepts by defining its instances; the system helps here by detecting outliers both inside and outside a concept. If new data becomes available after the ontology is constructed, the system can help by automatically classifying the new instances into the appropriate concepts. For more detailed instructions on the usage of OntoGen/LATINO, please refer to D2.2.

After creating the domain ontology, it is important to refine it and ensure its correctness; essentially, the ontology should be consistent at different levels. Figure 6 presents the major subtasks for the ontology design task.

Figure 6: Design ontology
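Before turning to ontology design, the following sketch illustrates the supervised, query-driven approach described above as an uncertainty-sampling active-learning loop with a linear SVM. OntoGen's actual question-selection strategy may differ; `ask_user` is a placeholder for the interactive labelling step, and scikit-learn is assumed.

```python
# Uncertainty sampling with a linear SVM: repeatedly ask the user about
# the unlabelled instance closest to the decision boundary. The seed
# labels must contain both positive and negative examples.
import numpy as np
from sklearn.svm import LinearSVC

def active_learning(vectors, seed, ask_user, rounds=10):
    labelled = dict(seed)                  # instance index -> bool (in concept?)
    clf = None
    for _ in range(rounds):
        idx = sorted(labelled)
        clf = LinearSVC().fit(vectors[idx], [labelled[i] for i in idx])
        margins = np.abs(clf.decision_function(vectors))
        candidates = [int(i) for i in np.argsort(margins) if int(i) not in labelled]
        if not candidates:
            break
        labelled[candidates[0]] = ask_user(candidates[0])
    return clf.predict(vectors)            # final assignment of all instances
```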

• Design Ontology
o Ensure the ontology is well-formed. First of all, ontology languages have a predefined syntax, e.g. the RDF/XML syntax, and knowledge represented in these languages must be well-formed. Most ontology editors, including LATINO, can be used to check that the ontology is well-formed.
o Ensure the ontology uses the right formalism. Furthermore, to meet different usages, ontology languages often come in various sub-languages or "species". OWL has three different flavours – OWL Full, OWL DL and OWL Lite – and the ontology must be built to fall inside the desired species level. In most cases, the user will want to keep their ontologies within the scope of OWL DL or OWL Lite for ease of reasoning. Tools like the OWL Ontology Validator can be used to check the species of an ontology.



o Ensure the ontology is consistent. An ontology cannot contain contradictory information, so the user next needs to make sure that the domain ontology is logically consistent. For example, it would be a mistake to assert that a pizza is both a "Meaty Pizza" and a "Vegetarian Pizza" in a knowledge base, given that "Meaty Pizza" and "Vegetarian Pizza" are disjoint. Reasoners such as Pellet and FaCT++ can normally detect such logical inconsistencies.
o Ensure the ontology is contextually correct. If an ontology is logically consistent, it does not necessarily follow that it accurately represents the real world. For example, without asserting that "Meaty Pizza" and "Vegetarian Pizza" are disjoint, the ontology remains logically consistent even if we define a "meaty-vegetarian" pizza, even though this is an obvious error. To discover problems of this kind, the ontology needs to be tested by domain experts. (A sketch of automating the well-formedness and consistency checks follows below.)
• Save Ontology into TAO Repository
After creating the domain ontology, we can save it into the TAO repository. We are now ready to augment the existing content of the legacy application (including the service definition) semantically; the details are presented in the following subsections.
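The well-formedness and consistency checks can be partially automated. The following sketch assumes the rdflib and owlready2 Python packages: parsing catches broken RDF/XML, and the bundled HermiT DL reasoner reports logical inconsistencies. The ontology file name is a placeholder for the ontology saved in the previous step.

```python
# Hypothetical ontology file name. rdflib parsing catches RDF/XML
# well-formedness problems; owlready2's bundled HermiT reasoner raises
# OwlReadyInconsistentOntologyError if the ontology is inconsistent.
import os
import rdflib
from owlready2 import get_ontology, sync_reasoner

path = "amazon-a2s.owl"

graph = rdflib.Graph()
graph.parse(path, format="xml")            # well-formedness check
print(len(graph), "triples parsed")

onto = get_ontology("file://" + os.path.abspath(path)).load()
with onto:
    sync_reasoner()                        # consistency check via HermiT
print("ontology is consistent")
```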

5.2.3 Service and content augmentation

Figure 7 shows the activities for this phase.



Figure 7: Service and content augmentation

We first need to identify which Web services users want to provide and what other kinds of content are to be annotated.
• Check if the service definitions exist.
o Identify services
• Identify other contents to be annotated
Please note that normally the first step in creating a Web service is to design and implement the application that realises the Web service. This step includes the design and coding of the service implementation, and the testing needed to verify that all of its interfaces work correctly. After the Web service is developed, the service interface definition can be generated from the implementation of the service (i.e. the service interface can be derived from the application's Application Programming Interface (API)). Web service interfaces are usually expressed in WSDL documents that define the interface and binding of the corresponding Web service implementations. In the TAO scenario, we assume that the Web services and the corresponding WSDL definitions for a legacy application have already been developed. We therefore focus on helping users annotate the existing WSDL definitions to obtain SAWSDL definitions. TAO has developed a tool named Content Augmentation (CA) to assist users in developing the SAWSDL definitions and annotating other legacy documents. The Content Augmentation suite is a set of Java tools and a Web service being developed in TAO WP3, specifically tailored for the automatic augmentation of legacy content. CA will be integrated as a part of the TAO suite and can also be used separately.
• Load/import WSDL or other resources to CA
There are three ways for users to load the WSDL definitions or other resources to be annotated:
a. Load from a local file
b. Load from a remote file using a URL
c. Input the resources directly from user input
In the Amazon case study, the WSDL file can be downloaded from http://webservices.amazon.com/AWSECommerceService/AWSECommerceService.wsdl.
• Load domain ontology
We also need to load the "annotation schemas", which provide the means to define types of annotations in CA. In our case, the domain ontology defined previously is used as the annotation schema, although users can also define other annotation schemas. Again, the ontology can be loaded from local files or URLs.
• Start annotating
o Automatic annotation
o Manual annotation
CA can annotate the loaded documents either automatically or manually. In the automatic mode, the user just clicks a button; CA then goes through the WSDL file or other legacy content and, using NLP techniques, automatically identifies the pieces of text or tags that are related to concepts or relations defined in the domain ontology. Users can also manually select the text they want to annotate and link it to the proper concept from the ontology. We chose the automatic approach to annotate the Amazon WSDL file and the user manual files. Figure 8 and Figure 9 show screenshots of annotating the Amazon user manual and the Amazon WSDL file (note that these screenshots are based on the GATE interface, which will be integrated into CA later). In Figure 8, the user manual explanation of the "Help" operation is annotated using the domain ontology: CA discovers the potentially important terms within the document and annotates them with the best-matching concepts from the ontology. In Figure 9, the XML schema names defined within the Amazon WSDL are linked to the domain ontology concepts using the SAWSDL attribute "modelReference".

Figure 8: Annotating user manual using CA

Figure 9: Annotating WSDL files using CA
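The effect of the WSDL annotation shown in Figure 9 can be illustrated with the following sketch, which attaches a SAWSDL modelReference attribute to an XML Schema element declaration inside the WSDL. The ontology URI is a placeholder for a concept in the learned domain ontology; CA produces this kind of annotation automatically.

```python
# Attach a SAWSDL modelReference to an XML Schema element declaration in
# the WSDL. The ontology URI is a placeholder concept.
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
SAWSDL = "http://www.w3.org/ns/sawsdl"
ET.register_namespace("sawsdl", SAWSDL)

tree = ET.parse("AWSECommerceService.wsdl")
for element in tree.iter("{%s}element" % XSD):
    if element.get("name") == "ItemSearch":
        element.set("{%s}modelReference" % SAWSDL,
                    "http://example.org/amazon#ItemSearchRequest")  # placeholder
tree.write("AWSECommerceService-annotated.wsdl",
           xml_declaration=True, encoding="utf-8")
```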

• View and revise annotations
• Evaluate and refine the domain ontology

After annotating the legacy contents with the domain ontology concepts, we need to ensure that the semantic metadata are correctly asserted. The annotations could be improper in several ways, and domain experts need to check their correctness manually:




• Missed annotations. If the domain experts realize that some WSDL elements or texts in the legacy documents should have been annotated but were missed by CA, they can annotate them manually. If there is no suitable concept in the existing ontology, a new concept is asserted into the ontology.
• Unnecessary annotations. It is possible that CA has added some unnecessary annotations; domain experts have to delete these annotations manually.
• Annotations with wrong concepts. If the domain experts realize that the concept CA has chosen for an annotation is not the most suitable one, they need to revise it.
• Ontology population

CA can also identify, from the legacy content, a set of potential instances for the classes in the domain ontology; the user decides whether or not to accept these assertions. During the above processes, whenever the domain ontology is revised, users need to ensure that the ontology is still correct.
• Save annotations
Finally, the legacy contents and the related semantic augmentations are stored in the heterogeneous knowledge store. This is important if users are working with large datasets, and it is also the safest way to ensure that the annotations can be reloaded exactly as before. The annotations can be saved either separately from the legacy contents or embedded within the legacy files. For more information about the usage of CA, please refer to D3.4.
In the SOA lifecycle, the next phases are service deployment and the evaluation and refinement of the service descriptions.
• Deploy services
• Evaluate and refine services
Service deployment refers to the process of deploying services within a service execution environment, and service evaluation refers to the ongoing monitoring of an SOA system to determine whether it meets its design goals. If users discover any problems during these phases, the domain ontology and SAWSDL definitions are revised. Because those phases are not the focus of the TAO project (the scope of TAO is just generating the semantic descriptions), we do not give more details about them here; the previous chapter has presented some general guidelines for these tasks.
In the coming deliverable D1.3, the methodology will be evaluated in more detail. We plan to carry out the evaluation from several aspects:
• What is the performance of the methodology and tools? – ontology extraction and annotation performance


• Is the extracted ontology a good basis for ontology building? – expert evaluation
• Do the extracted ontology and the semantically annotated resources support a certain task, such as more effective query answering? – appropriateness for a task

The coming deliverables D6.4 and D7.4 also present detailed evaluations of the GATE and aviation case studies.

6 Conclusion

This document has taken an ontological view of Semantic Web services by considering the different methodologies available for ontology creation and evaluation, and the design criteria used when modelling a domain. The lifecycle of a Semantic Web service was examined in order to understand the utility of providing the various models necessary for services described using ontological frameworks such as SAWSDL, OWL-S and WSMO. We also presented a cookbook-style transitioning guide specific to the TAO suite. In addition, a set of guidelines and lessons learned has been presented that should guide the future development of the methodology and the use of ontologies.

Although an initial methodology has been presented in this document, it is anticipated that significant changes will occur as the methodology is implemented and tested empirically, possibly resulting in a re-evaluation of the types of ontologies used and of how such ontologies may be aligned with other models used by other service providers and consumers. However, the contribution of this methodology is significant, as it will permit the investigation and evaluation of existing tools and know-how in the development and use of Semantic Web services, leading to improvements in future work.

7 Acknowledgments

The authors would like to thank Ayomi Bandara, Valentina Tamma and Ora Lassila for their valuable comments, discussions and insights into the process of developing and utilising ontologies for Services.



Bibliography and references

Lozano-Tello, A. and A. Gómez-Pérez (2004). "ONTOMETRIC: a method to choose the appropriate ontology." Journal of Database Management 15(2): 1-18.

Ankolekar, A., M. Burstein, J. R. Hobbs, O. Lassila, D. L. Martin, D. McDermott, S. A. McIlraith, S. Narayanan, M. Paolucci, et al. (2002). DAML-S: Web Service Description for the Semantic Web. Proceedings of the First International Semantic Web Conference (ISWC 2002), Sardinia, Italy.

Ankolekar, A., K. Sycara, et al. (2006). Supporting online problem-solving communities with the semantic web. Proceedings of the 15th International Conference on World Wide Web (WWW 2006), Edinburgh, Scotland, ACM Press.

Arkin, A., S. Askary, et al. (2002). Web Service Choreography Interface (WSCI) 1.0.

Aussenac-Gilles, N., B. Biebow, et al. (2000). Revisiting Ontology Design: A Methodology Based on Corpus Analysis. EKAW '00: Proceedings of the 12th European Workshop on Knowledge Acquisition, Modeling and Management, London, UK, Springer-Verlag.

Blum, A. and S. Chawla (2001). Learning from Labeled and Unlabeled Data using Graph Mincuts. Proceedings of the 18th International Conference on Machine Learning (ICML 2001), Morgan Kaufmann.

Baader, F., D. Calvanese, et al. (2003). The Description Logic Handbook: Theory, Implementation and Applications, Cambridge University Press.

Bechhofer, S., F. v. Harmelen, et al. (2004). "OWL Web Ontology Language Reference."

Bernaras, A., I. Laresgoiti, et al. (1996). Building and Reusing Ontologies for Electrical Network Applications. ECAI, John Wiley and Sons, Chichester.

Blázquez, M., M. Fernández, et al. (1998). Building Ontologies at the Knowledge Level using the Ontology Design Environment. The 11th Knowledge Acquisition Workshop (KAW'98), Banff, Canada.

Borst, P. (1997). Construction of Engineering Ontologies for Knowledge Sharing and Reuse.

Bourigault, D. (1995). Lexter: A terminology extraction software for knowledge acquisition from texts. KAW'95.

BPMI.org (2001). Business Process Modeling Language (BPML), Business Process Management Initiative.

Brank, J., M. Grobelnik, et al. (2005). A Survey of Ontology Evaluation Techniques. Conference on Data Mining and Data Warehouses (SiKDD 2005), Ljubljana, Slovenia.


Brewster, C., H. Alani, et al. (2004). Data Driven Ontology Evaluation. Proceedings of the International Conference on Language Resources and Evaluation (LREC).

Brickley, D. and R. V. Guha (2004). "RDF Vocabulary Description Language 1.0: RDF Schema."

Burges, C. J. (1998). "A Tutorial on Support Vector Machines for Pattern Recognition." Data Mining and Knowledge Discovery 2(2): 121-167.

Bussler, C. (2001). "B2B protocol standards and their Role in Semantic B2B Integration Engines." IEEE Data Engineering Bulletin 24(1): 3-11.

Cardoso, J. and A. Sheth (2003). "Semantic E-Workflow Composition." Journal of Intelligent Information Systems 21(3): 191-225.

Cardoso, J., A. Sheth, et al. (2004). "Quality of service for workflows and web service processes." Journal of Web Semantics 1(3): 281-308.

Chinchor, N. (1992). MUC-4 Evaluation Metrics. Proceedings of the Fourth Message Understanding Conference.

Chinnici, R., J. J. Moreau, et al. (2006). Web Services Description Language (WSDL) Version 2.0 Part 1: Core Language.

Cimiano, P. and J. Volker (2005). Text2Onto - A Framework for Ontology Learning and Data-driven Change Discovery. Proceedings of the 10th International Conference on Applications of Natural Language to Information Systems (NLDB), Alicante, Spain, Springer.

Domingue, J. (1998). Tadzebao and WebOnto: Discussing, Browsing, and Editing Ontologies on the Web. Proceedings of the 11th Workshop on Knowledge Acquisition, Modeling and Management (KAW '98), Banff, Canada.

Lenat, D. B. and R. V. Guha (1989). Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project, Addison-Wesley Longman Publishing Co., Inc.

IEEE (1990). IEEE Standard Glossary of Software Engineering Terminology, IEEE Std 610.12-1990.

FaCT++ (2003). http://owl.man.ac.uk/factplusplus/.

Fensel, D. and E. Motta (2001). "Structured Development of Problem Solving Methods." IEEE Transactions on Knowledge and Data Engineering 13(6).

Fernandez, M., A. Gomez-Perez, et al. (1997). METHONTOLOGY: from Ontological Art towards Ontological Engineering. Proceedings of the AAAI97 Spring Symposium Series on Ontological Engineering, Stanford, USA.


Fortuna, B., M. Grobelnik, et al. (2007). OntoGen: Semi-automatic Ontology Editor. HCI International.

Gangemi, A., D. M. Pisanelli, et al. (1999). "An Overview of the ONIONS Project: Applying Ontologies to the Integration of Medical Terminologies." Data and Knowledge Engineering 31(2): 183-220.

Garcia, D. (1997). "COATIS, an NLP System to Locate Expressions of Actions Connected by Causality Links." Lecture Notes in Computer Science 1319: 347ff.

Graham, S., S. Simeonov, et al. (2001). Building Web Services with Java: Making Sense of XML, SOAP, WSDL and UDDI. Indianapolis, IN, USA, Sams Publishing.

Gruber, T. R. (1993). "A translation approach to portable ontology specifications." Knowledge Acquisition 5(2): 199-220.

Gruninger, M. and M. Fox (1995). Methodology for the Design and Evaluation of Ontologies. IJCAI'95 Workshop on Basic Ontological Issues in Knowledge Sharing, April 13, 1995.

Guarino, N. (1992). "Concepts, attributes and arbitrary relations." Data and Knowledge Engineering 8: 249-261.

Guarino, N. (1998). Formal Ontology and Information Systems. International Conference on Formal Ontology in Information Systems (FOIS'98), Trento, Italy, IOS Press, Amsterdam.

Guarino, N., C. Welty, et al. (2004). An overview of OntoClean. Handbook on Ontologies, Springer: 151-159.

Guarino, N. and C. A. Welty (2000). A Formal Ontology of Properties. Knowledge Acquisition, Modeling and Management.

Haarslev, V. and R. Möller (2001). RACER System Description.

Hartmann, J., Y. Sure, et al. (2004). Methods for ontology evaluation, University of Karlsruhe.

Heijst, G. v., A. T. Schreiber, et al. (1997). "Using explicit ontologies in KBS development." International Journal of Human-Computer Studies 46(2-3): 183-292.

Kopecky, J., T. Vitvar, C. Bournez and J. Farrell (2007). "SAWSDL: Semantic Annotations for WSDL and XML Schema." IEEE Internet Computing 11(6): 60-67.

Jones, D., T. Bench-Capon, et al. Methodologies for Ontology Development.

de Bruijn, J., R. Lara, et al. (2005). OWL DL vs. OWL Flight: conceptual modeling and reasoning for the semantic Web. Proceedings of the 14th International Conference on World Wide Web (WWW 2005), Chiba, Japan, ACM Press.



Jouis, C. and W. Mustapha-Elhadi (1995). Conceptual Modelling of Database Schema Using Linguistic Knowledge. Application to Terminological Databases? NLDB.

Dellschaft, K. and S. Staab (2006). On How to Perform a Gold Standard Based Evaluation of Ontology Learning. Proceedings of the 5th International Semantic Web Conference (ISWC 2006), Athens, GA, USA, Springer LNCS.

Lassila, O. and R. R. Swick (1999). Resource Description Framework (RDF) Model and Syntax Specification - W3C Recommendation 22 February 1999, World Wide Web Consortium.

Stojanovic, L., A. Maedche, et al. (2002). User-Driven Ontology Evolution Management, Springer-Verlag.

Maedche, A. and S. Staab (2002). Measuring Similarity between Ontologies.

Fernández-López, M. (1999). Overview of Methodologies for Building Ontologies. IJCAI-99 Workshop on Ontologies and Problem Solving Methods: Lessons Learned and Future Trends.

Fernández-López, M., A. Gómez-Pérez, et al. (2002). A survey on methodologies for developing, maintaining, integrating, evaluating and reengineering ontologies.

Noy, N. and M. Musen (2002). The PROMPT suite: Interactive tools for ontology merging and mapping.

Paolucci, M., T. Kawamura, et al. (2002). Importing the semantic web in UDDI.

Parsia, B., E. Sirin, et al. (2005). Debugging OWL ontologies. Proceedings of the 14th International Conference on the World Wide Web (WWW '05), Chiba, Japan, ACM Press.

Patel-Schneider, P. and I. Horrocks (2006). Position paper: a comparison of two modelling paradigms in the Semantic Web. WWW '06: Proceedings of the 15th International Conference on World Wide Web, ACM.

Payne, T. and O. Lassila (2004). "Guest Editors' Introduction: Semantic Web Services." IEEE Intelligent Systems 19(4): 14-15.

Payne, T. R., N. Sánchez, et al. (2007). Requirement analysis and assessment of relevant methodologies.

Porzel, R. and R. Malaka (2004). A Task-based Approach for Ontology Evaluation. ECAI-2004 Workshop on Ontology Learning and Population, Valencia, Spain.


R., O., F. P., et al. (1996). Term identification and Knowledge Extraction. International Conference on Applied Natural Language and Artificial Intelligence, Montreal.

Reinberger, M.-L. and P. Spyns (2004). Discovering Knowledge in Texts for the Learning of DOGMA-inspired Ontologies. Proceedings of the Workshop on Ontology Learning and Population, Valencia, Spain.

Roman, D., U. Keller, et al. (2005). "Web Service Modeling Ontology." Applied Ontology 1(1): 77-106.

Sabou, M. (2006). Building Web Service Ontologies.

Srinivasan, N., M. Paolucci, et al. (2004). Adding OWL-S to UDDI, implementation and throughput. First International Workshop on Semantic Web Services and Web Process Composition, San Diego, California, USA.

Staab, S. and R. Studer (2004). Handbook on Ontologies, Springer.

Stein, S., N. R. Jennings, et al. (2006). Flexible Provisioning of Service Workflows. Proceedings of the 17th European Conference on Artificial Intelligence, IOS Press.

Stohr, E. A. and J. L. Zhao (2001). "Workflow Automation: Overview and Research Issues." Information Systems Frontiers 3(3): 281-296.

Sure, Y., H. Akkermans, et al. (2003). On-To-Knowledge: Semantic Web Enabled Knowledge Management. Web Intelligence. N. Zhong, J. Liu and Y. Yao, Springer-Verlag: 277-300.

Swartout, B., R. Patil, et al. (1996). Toward Distributed Use of Large-Scale Ontologies. The 10th Workshop on Knowledge Acquisition, Banff, Canada.

Tolksdorf, R., L. J. B. Nixon, et al. (2005). Enabling real world Semantic Web applications through a coordination Middleware. 2nd European Semantic Web Conference, Springer-Verlag.

Andrews, T., F. Curbera, H. Dholakia, Y. Goland, J. Klein, F. Leymann, K. Liu, D. Roller, D. Smith, S. Thatte, I. Trickovic and S. Weerawarana (2002). BPEL4WS: Business Process Execution Language for Web Services Version 1.1, Web Page.

Uschold, M. (1995). Towards a Methodology for Building Ontologies.

Uschold, M. and M. Grüninger (1996). "Ontologies: principles, methods, and applications." Knowledge Engineering Review 11(2): 93-155.

Velardi, P., R. Navigli, et al. (2006). "Evaluation of OntoLearn, a methodology for Automatic Learning of Ontologies." Ontology Learning and Population.


Wang, H., M. Horridge, et al. (2005). Debugging OWL-DL Ontologies: A Heuristic Approach. Proceedings of the 4th International Semantic Web Conference (ISWC 2005), Ireland, Springer.

Wang, H. H., N. Noy, et al. (2006). Frames and OWL side by side. 9th International Protégé Conference, Stanford, USA.
