CleOn: Resolution of Lexically Incoherent Concepts in an Engineering Ontology*

Quentin Reul, Derek Sleeman, and David Fowler

Department of Computing Science, University of Aberdeen, Aberdeen, AB24 3UE, UK
{q.reul,d.sleeman,d.fowler}@abdn.ac.uk

Abstract. Ontologies are vital to the realisation of the Semantic Web. Given their central role, it is highly desirable that ontologies be evaluated and made consistent. Several evaluation approaches investigate whether an ontology is “fit for purpose”. Other approaches focus on the consistency of the axioms populating the ontology. More specifically, the OntoClean methodology determines the adequacy of taxonomic relationships based on a philosophical interpretation of concepts. By contrast, we propose a new evaluation approach, CleOn, which evaluates the adequacy of taxonomic relationships between pairs of concepts and their sub-concepts. This method processes the linguistic information inherent in class names, and defines a lexical path for each concept in the taxonomy in terms of its immediate parent, and recursively for all terms back to the thesaurus’s root node. The lexical adequacy of a link between a parent and a child concept is then defined in terms of their lexical paths. As part of this research, we have applied our methodology to a domain-specific ontology and show that our approach is simpler to apply than the OntoClean methodology. Further, we demonstrate the discordance between the terminology used by a large company in the aerospace and defence sector and the terminology used by academics in mechanical engineering.

1 Introduction

Product design is a complex problem-solving activity whose aim is to produce a physical artifact (e.g. a turbofan engine) that meets a set of performance requirements, on time, and on or under budget. As design is a collaborative activity involving many engineers, the need to facilitate the representation, acquisition, sharing and reuse of corporate design knowledge has become increasingly critical. Furthermore, large companies, like Rolls-Royce, have shifted their focus from selling products to providing services through maintenance service agreements [17]. Therefore, minimizing the maintenance cost throughout the life of an engine is fundamental to the success of this market shift. As a result, engineers must also gather relevant knowledge from the maintenance histories of similar products. Hence, the challenge confronting the organisation is to deliver the right information to the right people at the right time from this significant amount of documentation and data. The goal of the IPAS project [18] is to exploit Semantic Web technologies to address this problem.

* Technical Report AUCS/TR0801 (February 2008), Dept. of Computing Science, University of Aberdeen, AB24 3UE, United Kingdom.

The Semantic Web [2] extends the current World Wide Web (WWW) with resources in machine-readable form. These resources are given meaning through the use of ontologies. An ontology is commonly defined as “a [formal,] explicit specification of a [shared] conceptualisation”1 [13]. At the very least, an ontology is a formally structured terminology that allows software agents to search, process and present information in a meaningful manner (across distributed systems). Given the central role of ontologies in the IPAS project, it is highly desirable that these ontologies be evaluated and made consistent. Ontology evaluation should occur during the ontology development process (i.e. ontology building) as well as during the ontology maintenance process (i.e. ontology evolution).

Several methodologies for evaluating the structure of ontologies have been proposed over the years. For example, the OntoClean methodology [14] introduces a coding scheme for concepts that uses meta-properties derived from philosophy: rigidity, identity, unity, and dependence. These meta-properties give rise to several constraints for detecting inadequate taxonomic relationships in ontologies. Although the OntoClean approach seems promising, we argue that knowledge engineers (and certainly domain experts) would find it very difficult to apply the meta-properties to domain-specific ontologies with any consistency [33]. We therefore advocate an alternative approach, which detects lexical incoherencies in ontological structures using a thesaurus.
As part of our approach, each concept in the ontology is associated with a term in the thesaurus, from which lexical paths are created. Given these lexical paths, we are able to evaluate the lexical adequacy of a link between a parent and a child concept. The rest of the paper is organised as follows. Section 2 covers related research on the use of ontologies in engineering applications and on ontology evaluation. In Section 3, we illustrate how our methodology processes the linguistic information inherent in class names to evaluate taxonomic relationships, by extracting lexical paths from a thesaurus (i.e. WordNet [9]). Section 4 describes the current version of the system. In Section 5, we evaluate our methodology by applying it to a domain-specific ontology (mechanical engineering). Finally, we discuss our results and outline future work.
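The following Python sketch illustrates how lexical paths could support a judgement about a parent-child link. It is not CleOn's implementation: the thesaurus is a hypothetical toy dictionary (`HYPERNYM`) in which each term has exactly one hypernym, and the adequacy test (a link is accepted when the parent's term lies on the child's lexical path) is an assumption made purely for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical toy thesaurus: each term maps to its single hypernym.
# "entity" is the root term, so it has no hypernym. Assumed acyclic.
HYPERNYM: Dict[str, Optional[str]] = {
    "entity": None,
    "artifact": "entity",
    "device": "artifact",
    "machine": "device",
    "engine": "machine",
    "turbofan": "engine",
    "fan": "device",
}


def lexical_path(term: str, hypernym: Dict[str, Optional[str]]) -> Optional[List[str]]:
    """Return [term, hypernym, hypernym of hypernym, ..., root],
    or None when the term has no exact match in the thesaurus."""
    if term not in hypernym:
        return None  # not lexically satisfiable
    path = [term]
    while hypernym[path[-1]] is not None:
        path.append(hypernym[path[-1]])
    return path


def link_is_lexically_adequate(parent: str, child: str,
                               hypernym: Dict[str, Optional[str]]) -> Optional[bool]:
    """Illustrative criterion (an assumption): the parent's term must appear
    above the child on the child's lexical path. None means undecidable
    because one of the names has no lexical path."""
    child_path = lexical_path(child, hypernym)
    if child_path is None or parent not in hypernym:
        return None
    return parent in child_path[1:]


print(link_is_lexically_adequate("engine", "turbofan", HYPERNYM))  # True
print(link_is_lexically_adequate("fan", "turbofan", HYPERNYM))     # False
```

Returning `None` rather than `False` for unmatched names keeps "lexically unsatisfiable" distinct from "inadequate link", which mirrors the text's separation between finding a term and judging a link.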

2 Related Work

In this section, we present several applications that incorporate ontologies in the engineering domain. These systems perform a range of tasks, including managing knowledge about past projects and checking configurations for rule violations. Furthermore, we provide an overview of available ontology evaluation techniques.

1 Gruber’s original definition omits the words “formal” and “shared”, which are nowadays accepted as describing the intention of ontologies more precisely.

2.1 Engineering Systems Incorporating Ontologies

Szykman et al. [35] propose the introduction of “design repositories”, which not only store heterogeneous information (e.g. images, CAD models, mathematical simulation models), but also support the retrieval and reuse of knowledge (e.g. searching for components or assemblies that satisfy required functions). The approach requires the development of a modelling language for representing design objects. The paper proposes representing geometry data in the Standard for the Exchange of Product model data (STEP)2, and argues for a taxonomy that provides a common terminology for engineering functions and their associated flows. The authors further argue that this will reduce ambiguity and enhance the uniformity of information across models.

The Designers’ Workbench [10] is a system developed to support designers in large organizations, such as Rolls-Royce, by ensuring that a design is consistent both with its specification and with the company’s design rule book(s). Its ontology represents parts and features, either geometric or non-geometric, physical or abstract, that can be applied to a configuration. Users can select classes from the ontology and add an instance of a class to their configuration. The properties of an instance can express the parameters of the feature, and can also describe connections to other features.

Lukibanov [22] suggests using template-based design to integrate best practices, and hence reduce costly downstream mistakes. A template is a collection of parametric designs that can contain executable geometric macros. Lukibanov proposes using a separate ontology for each of the developed templates. The ontology describes the input, processor and output parameters of the template, and also links to CAD objects. Changes occurring in one template can then be propagated to other templates using the relations between the templates in the set of ontologies.
Additionally, a design must also be consistent with the design standards of the company. Golebiowska et al. [12] use ontologies within Renault to retrieve similar incidents, detect correlations with other incidents, and hence enable the reuse of existing solutions. The authors describe four ontologies: a component ontology, a problem ontology, a service ontology, and a project ontology. The concepts in these ontologies are used to annotate the problem descriptions identified during the validation phases, which are stored in the Problem Management System (PMS). These annotated problem descriptions can then be retrieved by entering terms from one or more ontologies. Furthermore, related descriptions can be explored by moving up or down the hierarchies of the various ontologies. Although designing an artifact to specification is crucial, communicating global design changes (e.g. changes of standards, marketing concerns) to the whole design group can cause significant problems, as such changes can affect several aspects of the design.

2 http://www.tc184-sc4.org
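The retrieval and hierarchy navigation described for the Renault system by Golebiowska et al. can be pictured with the small Python sketch below. Everything in it is hypothetical (the concept names, the problem records and the function names); it only shows the general idea of returning every description annotated with a query concept or, optionally, with any of its sub-concepts, which corresponds to moving down a hierarchy.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical component hierarchy: concept -> parent (None for the root).
PARENT: Dict[str, Optional[str]] = {
    "component": None,
    "engine": "component",
    "brake": "component",
    "disc_brake": "brake",
}

# Hypothetical problem descriptions annotated with concepts from
# several ontologies (component concepts plus problem concepts).
ANNOTATIONS: Dict[str, Set[str]] = {
    "P1": {"engine", "overheating"},
    "P2": {"disc_brake", "wear"},
    "P3": {"brake", "noise"},
}


def ancestors(concept: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """All concepts above `concept`; concepts outside the hierarchy have none."""
    result: Set[str] = set()
    current = parent.get(concept)
    while current is not None:
        result.add(current)
        current = parent.get(current)
    return result


def expand_down(concept: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """The concept itself plus every sub-concept (moving down the hierarchy)."""
    return {concept} | {c for c in parent if concept in ancestors(c, parent)}


def retrieve(query: Iterable[str], annotations: Dict[str, Set[str]],
             parent: Dict[str, Optional[str]], include_subconcepts: bool = True) -> List[str]:
    """Identifiers of descriptions annotated with any (expanded) query concept."""
    targets: Set[str] = set()
    for concept in query:
        targets |= expand_down(concept, parent) if include_subconcepts else {concept}
    return sorted(pid for pid, concepts in annotations.items() if concepts & targets)


print(retrieve(["brake"], ANNOTATIONS, PARENT))                             # ['P2', 'P3']
print(retrieve(["brake"], ANNOTATIONS, PARENT, include_subconcepts=False))  # ['P3']
```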

2.2 Ontology Evaluation

Given the central role of ontologies in the Semantic Web, knowledge engineers confronted with ontologies should assess the fitness of each ontology with regard to particular applications. Furthermore, it is highly desirable for the quality of ontologies to be evaluated.

Quantitative approaches. Quantitative approaches evaluate different ontologies with regard to an application. The first alternative compares the target ontology to a Gold Standard3. Maedche and Staab [23] propose a set of measures to evaluate the semantic structure of two ontologies. Their approach is based on the notion of semantic cotopy (SC), which is the set of all super- and sub-concepts of a concept in an ontology. The local taxonomic overlap (TO) calculates the number of common elements in the SC of the same concept in both ontologies. If the concept is not present in both ontologies, they approximate the local taxonomic overlap by calculating the maximum overlap for a fictitious member of the set, hence giving an “optimistic” value. The global taxonomic overlap ($\overline{TO}$) is the average of the local taxonomic overlaps, resulting in a similarity measure between two taxonomies. Although comparing an ontology to a Gold Standard highlights discrepancies between the ontology and the domain, these discrepancies could be caused by a poor Gold Standard. To address this, Brewster et al. [3] propose evaluating the fitness of an ontology with regard to the knowledge present in a domain-specific corpus of documents. Their methodology starts by organising terms from the corpus into clusters using the Expectation-Maximization algorithm [7]. Each term in a cluster is then expanded with two levels of hypernyms from WordNet [9]. The last step maps the terms from the corpus against concepts in the ontology. The authors assert that their approach measures the alignment between the hidden structure in the corpus and the structure of the ontology.
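The semantic cotopy and taxonomic overlap measures can be sketched in Python as follows. This is a simplified reading, not Maedche and Staab's exact formulation: taxonomies are hypothetical child-to-parent dictionaries, SC(c) is taken to include c itself, the local overlap is assumed to be the ratio of common to combined elements, |SC1 ∩ SC2| / |SC1 ∪ SC2|, and the “optimistic” estimate for concepts missing from one ontology is omitted, so the global value averages only over shared concepts.

```python
from typing import Dict, Optional, Set

# Hypothetical representation: concept -> parent, None for the root.
Taxonomy = Dict[str, Optional[str]]


def superconcepts(concept: str, tax: Taxonomy) -> Set[str]:
    """All ancestors of `concept` (assumes the taxonomy is acyclic)."""
    result: Set[str] = set()
    parent = tax[concept]
    while parent is not None:
        result.add(parent)
        parent = tax[parent]
    return result


def subconcepts(concept: str, tax: Taxonomy) -> Set[str]:
    """All descendants of `concept`."""
    return {c for c in tax if concept in superconcepts(c, tax)}


def semantic_cotopy(concept: str, tax: Taxonomy) -> Set[str]:
    """SC(c): the concept together with its super- and sub-concepts."""
    return {concept} | superconcepts(concept, tax) | subconcepts(concept, tax)


def local_overlap(concept: str, tax1: Taxonomy, tax2: Taxonomy) -> float:
    """Ratio of common to combined SC elements (assumed form of local TO)."""
    sc1 = semantic_cotopy(concept, tax1)
    sc2 = semantic_cotopy(concept, tax2)
    return len(sc1 & sc2) / len(sc1 | sc2)


def global_overlap(tax1: Taxonomy, tax2: Taxonomy) -> float:
    """Average local overlap over the concepts present in both taxonomies."""
    shared = set(tax1) & set(tax2)
    if not shared:
        return 0.0
    return sum(local_overlap(c, tax1, tax2) for c in shared) / len(shared)


O1: Taxonomy = {"component": None, "engine": "component",
                "turbofan": "engine", "fan": "component"}
O2: Taxonomy = {"component": None, "engine": "component",
                "turbofan": "engine", "fan": "engine"}

print(local_overlap("fan", O1, O2))  # 2 common elements out of 3 in the union
print(global_overlap(O1, O2))
```

Moving *fan* under *engine* in `O2` lowers the overlap for both *fan* and *engine*, which is the kind of structural discrepancy the measure is meant to expose.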
Alternatively, Porzel and Malaka [28] evaluate how effective an ontology is for a well-defined task compared to a Gold Standard. They propose basing the comparison of concepts, subsumptions and non-taxonomic relationships on the notion of “error rate”, namely the number of insertion, deletion, and substitution errors.

Qualitative approaches. Qualitative approaches can be used to evaluate the quality of a single ontology. Haase and Stojanovic [16] propose three different types of ontology consistency: structural consistency, logical consistency, and user-defined consistency. However, we believe that the term syntactic consistency is closer to the intended meaning of their structural consistency. We instead use the term structural consistency to describe the quality of the ontological structure.

3 A gold standard ontology is a manually built reference hierarchy of a domain.
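The “error rate” used by Porzel and Malaka follows the familiar idea of aligning an output against a reference and counting the minimum number of insertions, deletions and substitutions. The Python sketch below computes that count with the standard Levenshtein dynamic programme and normalises it by the reference length. How Porzel and Malaka encode concepts, subsumptions and relations as comparable items is not described here, so the label lists are hypothetical.

```python
from typing import Hashable, Sequence


def edit_operations(reference: Sequence[Hashable], hypothesis: Sequence[Hashable]) -> int:
    """Minimum number of insertions, deletions and substitutions turning
    `reference` into `hypothesis` (Levenshtein distance over labels)."""
    n, m = len(reference), len(hypothesis)
    # dist[i][j] = cost of turning reference[:i] into hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i reference items
    for j in range(m + 1):
        dist[0][j] = j  # insert all j hypothesis items
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                 # deletion
                dist[i][j - 1] + 1,                 # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[n][m]


def error_rate(reference: Sequence[Hashable], hypothesis: Sequence[Hashable]) -> float:
    """Edit operations normalised by the length of the reference."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_operations(reference, hypothesis) / len(reference)


gold = ["Engine", "Fan", "Compressor", "Turbine"]
candidate = ["Engine", "Blade", "Compressor"]
print(error_rate(gold, candidate))  # 1 substitution + 1 deletion over 4 items -> 0.5
```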

A syntactically consistent ontology is one whose syntax complies with the grammatical definition of the language. Ontology editors, such as Protégé [26, 20] and SWOOP [19], provide some assistance in creating syntactically consistent ontologies. For example, Protégé [20] provides tests that check whether an ontology complies with the OWL DL specification (i.e. they inform the user of OWL Full constructs).

A logically consistent ontology is one in which all concepts are satisfiable, i.e. every concept can have at least one instance in some model of the ontology. The offending axioms can be discovered with the help of reasoners like Racer [15], FaCT++ [36] and Pellet [32]. Several approaches [31, 24, 21] have been proposed for repairing these unsatisfiable concepts. For example, Lam et al. [21] have produced a tool, based on the tableau algorithm, that captures the set of concept components responsible for the unsatisfiability. Furthermore, the tool suggests two distinct approaches to resolving inconsistencies. On the one hand, the authors suggest removing the offending part of the axiom, or the whole axiom, and propose a way of measuring the impact on entailments of removing the axiom. On the other hand, they propose rewriting the offending axioms.

A structurally consistent ontology is one whose structure is sound. Guarino and Welty [14] have argued that many ontologies contain inadequate taxonomic relationships. In the OntoClean methodology, the authors introduce several meta-properties drawn from philosophy, namely Unity, Identity, Essence (also known as Rigidity) and Dependence. For example, the notion of Unity is defined by analysing whether an individual is a whole (i.e. whether it is made up of a set of parts unified by relations). The evaluation is then governed by the constraints imposed on the different meta-properties.
For example, the constraint “a property carrying anti-unity (∼U) has to be disjoint of a property carrying unity (+U)” is used to determine the adequacy of a taxonomic relationship using the notion of Unity. As a result, the OntoClean methodology highlights common modelling errors such as the confusion between constitution and subsumption. Appendix C contains a more detailed description of the OntoClean methodology. However, a study by Völker et al. [37] found that fairly experienced knowledge engineers took between four and six hours to assign OntoClean meta-properties to 266 concepts of the domain-independent Proton4 ontology. Furthermore, the assignments of these meta-properties were only 38% consistent between the knowledge engineers. Völker et al. therefore propose to facilitate the assignment of OntoClean meta-properties by associating patterns with each of them, and using these patterns to search the web for instances (of the meta-properties).
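To illustrate how OntoClean's constraints act on individual taxonomic links, the Python sketch below checks each subsumption link against two of them: a property carrying anti-unity (~U) cannot subsume one carrying unity (+U), as quoted above, and an anti-rigid property (~R) cannot subsume a rigid one (+R). The concept names and meta-property labels are hypothetical, hand-assigned inputs chosen for illustration; assigning such labels is precisely the step that Völker et al. found slow and inconsistent.

```python
from typing import Dict, List, Tuple

# Hypothetical hand-assigned labels: "+" carries, "~" anti-, "-" does not carry.
# Keys: "U" = unity, "R" = rigidity.
META: Dict[str, Dict[str, str]] = {
    "amount_of_matter": {"U": "~", "R": "+"},
    "physical_object": {"U": "+", "R": "+"},
    "food": {"U": "-", "R": "~"},
    "apple": {"U": "+", "R": "+"},
}

# (parent, child) subsumption links from a hypothetical taxonomy.
LINKS: List[Tuple[str, str]] = [
    ("amount_of_matter", "physical_object"),
    ("food", "apple"),
    ("physical_object", "apple"),
]


def ontoclean_violations(links: List[Tuple[str, str]],
                         meta: Dict[str, Dict[str, str]]) -> List[str]:
    """Report links that break the unity or rigidity constraint."""
    problems: List[str] = []
    for parent, child in links:
        p, c = meta[parent], meta[child]
        # Unity: an anti-unity property cannot subsume a property carrying unity.
        if p.get("U") == "~" and c.get("U") == "+":
            problems.append(f"{parent} (~U) cannot subsume {child} (+U)")
        # Rigidity: an anti-rigid property cannot subsume a rigid property.
        if p.get("R") == "~" and c.get("R") == "+":
            problems.append(f"{parent} (~R) cannot subsume {child} (+R)")
    return problems


for problem in ontoclean_violations(LINKS, META):
    print(problem)
```

The first reported link is the classic confusion of constitution with subsumption: a physical object is constituted by an amount of matter rather than being a kind of it.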

3 The CleOn Methodology

In this section, we introduce CleOn, which compares taxonomic relationships in ontologies with terms found in a thesaurus. Our evaluation is based on the assumption that the choice of a concept name in the ontology is linked to the linguistic information conveyed by the chosen term. A thesaurus generally classifies linguistic terms by themes or topics. Often, this classification is achieved through the hypernym relation, which specifies that if a term A is a hypernym of a term B, then A is more general than B. In this document, concept names in the ontology are denoted in ‘italic’ and terms from the thesaurus in bold.

4 http://proton.semanticweb.org/

Fig. 1. The CleOn methodology

Our overall approach contains three phases (Figure 1). We first extract a lexical path for each concept in the ontology. A concept is considered lexically satisfiable only if an exact match to a term in the thesaurus is found. The lexical path of a lexically satisfiable concept is created by adding its matching term in the thesaurus, then its hypernym, then the hypernym of its hypernym, and so on until the root term of the thesaurus is encountered (Definition 1).

Definition 1 (Lexical Path). Given that $CN$ is a concept name, $c_1, \ldots, c_n$ are terms, and $root$ is the top-level generic term in the thesaurus, the lexical path of $CN$, denoted $LP(CN)$, is defined as: $LP(CN) = \{ \langle c_1, \ldots, c_n \rangle \mid (c_1 = CN) \land (c_n = root) \land (\forall 1 \ldots$
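Because LP(CN) is defined as a set of sequences, a term with several hypernyms can have more than one lexical path. The Python sketch below enumerates all of them over a hypothetical toy thesaurus (not WordNet's actual hierarchy). Two points are assumptions: the condition cut off in Definition 1 is read from the preceding prose as “each $c_{i+1}$ is a hypernym of $c_i$”, and exact matching is taken to be case-sensitive string equality.

```python
from typing import Dict, List, Tuple

# Hypothetical toy thesaurus: term -> list of hypernyms (assumed acyclic).
HYPERNYMS: Dict[str, List[str]] = {
    "entity": [],                                  # root term
    "artifact": ["entity"],
    "device": ["artifact"],
    "system": ["entity"],
    "engine": ["device"],
    "jet_engine": ["engine", "propulsion_system"],  # two hypernyms
    "propulsion_system": ["system"],
}
ROOT = "entity"


def lexical_paths(concept_name: str, hypernyms: Dict[str, List[str]],
                  root: str) -> List[Tuple[str, ...]]:
    """All sequences <c1, ..., cn> with c1 = concept_name, cn = root and
    each c_{i+1} a hypernym of c_i. An empty list means the concept name has
    no exact match, i.e. the concept is not lexically satisfiable."""
    if concept_name not in hypernyms:
        return []
    paths: List[Tuple[str, ...]] = []

    def extend(path: List[str]) -> None:
        last = path[-1]
        if last == root:
            paths.append(tuple(path))
            return
        for h in hypernyms[last]:  # a dead end (no hypernyms) yields no path
            extend(path + [h])

    extend([concept_name])
    return paths


for p in lexical_paths("jet_engine", HYPERNYMS, ROOT):
    print(" -> ".join(p))
print(lexical_paths("Jet_Engine", HYPERNYMS, ROOT))  # [] (no exact match)
```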
