TERMINAE: a method and a tool to build a domain ontology

Brigitte Biébow and Sylvie Szulman
Université de Paris-Nord, Laboratoire d'Informatique de Paris-Nord (LIPN)
Av. J.B. Clément, 93430 VILLETANEUSE (France)

Abstract. The purpose of TERMINAE is to help build an ontology, both from scratch and from texts, without being driven by any particular task. Requirements for a methodology have been defined on the basis of real experiments. TERMINAE fulfills these requirements, drawing on theoretical bases from linguistics and knowledge representation. Its strong points are the integration of a terminological approach with ontology management, a precise definition of concept types reflecting modeling choices, and traceability facilities. This paper briefly presents the methodology underlying the tool, which is under development in Java, as an introduction to the demo. A more comprehensive presentation is given in [2].

1 A terminological point

We use the term "domain ontology" with the meaning of "ontology" in [12]: "An ontology is a hierarchically structured set of terms for describing a domain that can be used as a skeletal foundation for a knowledge base". This definition seems to be fully compatible with that of [7]: "An ontology is a logical theory accounting for the intended meaning of a formal vocabulary, i.e. its ontological commitment to a particular conceptualization of the world". We also use the same distinction between top-level ontology, domain ontology, task ontology and application ontology as in [7]. We also agree with Guarino's definition of a knowledge base as being obtained by specialization of an ontology to a particular state of the world.

In TERMINAE, the computer-aided tool presented in this paper, we often use the term "knowledge base" for "ontology of a generic knowledge base", because what we speak about is always state-independent and the term "ontology management" is not yet widely used. TERMINAE is used as an "ontology management" tool, even though its representation language allows the description of facts by individual concepts.

2 A lesson from experiments

The lesson we learnt through our modeling experiments reinforces the literature on ontology: modeling is easier when it is controlled by the final task and when the domain is well established or concrete than when there is no application bias or when the domain is new, informal, and hardly investigated. When there is no task to drive the domain building, the modeling looks like the work of a linguist, lexicologist or terminologist. The major difference is that the final model has to be used not only for human understanding or translation, but also for automatic inferences, which need more formal modeling.

Researchers from computational lexical semantics ([10], [8]) have begun to investigate the close relations between these two approaches, domain ontology modeling and computational lexical semantics. Describing all the possible semantic uses of a word may be feasible if the description is restricted to a specialized domain relative to a corpus, while it seems an inaccessible goal in general language. What is needed in both domains is understanding, i.e. making "reasonable" inferences.

Linguistic methods to define lexical items or terms (lexical items in a specialized domain) are usually introspection (traditional in classical lexicography) and, more recently, corpus analysis (traditional in classical terminology). Even if they are not formal, linguistic methods are rigorous, and there are now usable linguistic tools to help the work. We think, as others do ([11], [1], [9]), that domain modeling would benefit from a close interaction between linguistic methods or tools and computer-aided knowledge engineering methods or tools. Ontology, terminology, and lexical semantics all aim to describe the world through the words of the language: in the full generality of language for lexical semantics, restricted to a technical domain for terminology and for ontology as we have defined it. Our idea is to push the integration of these disciplines as far as possible into a tool, TERMINAE.

3 Some requirements for TERMINAE

Since the beginning of the 90's, many principles have been elicited for the design of ontologies; the best known may be those of [5], [6]. Many ontologies have been designed in big or small projects (see [4] for a review of worldwide known projects, and the general literature from the recent conferences or workshops on ontologies or modeling [13], [14], [15], [16]). But researchers are still asking for guidelines and methodologies, and building usable or reusable ontologies still faces the same difficulties.

To these existing principles, we propose to add some requirements that fit the needs we met during our work on modeling. All the experiments faced the problem of building an ontology of a domain from texts, and we needed a tool to help us. Some tools exist, but none fit our needs. We wanted a linguistic approach, to take advantage of the methods and techniques existing in the terminology domain. We wanted a computer-aided knowledge engineering (CAKE) tool to help the human task as much as possible. We wanted a formal ontology to help validate the ontology while avoiding most of the common mistakes, such as redundancy and inconsistency, and we also wanted to be able to query the ontology and make inferences. This led to the following requirements for building a domain ontology from scratch and from texts without being task-driven.

* Linguistic-based methods: Linguistic methods such as the study of terminology are required. Terminology is studied from domain texts, that is to say a description of a term is elaborated from its occurrences in the texts.
* A typology of concepts to highlight the modeling choices: When modeling an ontology, different types of concepts are elaborated. Some come from the text, others from the type of text, from the domain, from metaknowledge, from common-sense knowledge. Some are introduced to structure the ontology bottom-up or top-down. It is important to be able to distinguish the modeling choices in order to understand and maintain the ontology.
* Formality to avoid as far as possible incoherence and inconsistencies: The support of an ontology has to be formal to avoid incoherence and to allow further inferences. The drawback of this option is a loss of meaningful substance, but this is the price of correctness and automation.
* Traceability, maintainability, back linking to texts: A condition of usability is the ability to understand the ontology, i.e. to be able to decide whether or not the underlying conceptualization fits the addressed problem or domain. This implies a documented ontology, with links to its sources and comments on the modeling process.

TERMINAE has been built to meet these requirements.
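As an illustration of the typology and traceability requirements, the following minimal Java sketch shows how a concept might carry both its modeling origin and back links to the texts. It is not taken from TERMINAE: the names `ConceptOrigin`, `SourceLink` and `TracedConcept` are hypothetical, the enum values merely paraphrase the origins listed above, and the sample data is invented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical typology: where a concept comes from in the modeling process.
enum ConceptOrigin {
    TEXT, TEXT_TYPE, DOMAIN, METAKNOWLEDGE, COMMON_SENSE,
    STRUCTURING_BOTTOM_UP, STRUCTURING_TOP_DOWN
}

// A back link from a concept to one term occurrence in the corpus.
final class SourceLink {
    final String document;
    final int sentenceIndex;
    final String excerpt;

    SourceLink(String document, int sentenceIndex, String excerpt) {
        this.document = document;
        this.sentenceIndex = sentenceIndex;
        this.excerpt = excerpt;
    }
}

// A concept annotated with its origin, a modeling comment, and its sources.
final class TracedConcept {
    final String name;
    final ConceptOrigin origin;
    final String modelingComment;
    private final List<SourceLink> sources = new ArrayList<>();

    TracedConcept(String name, ConceptOrigin origin, String modelingComment) {
        this.name = name;
        this.origin = origin;
        this.modelingComment = modelingComment;
    }

    void addSource(SourceLink link) {
        sources.add(link);
    }

    List<SourceLink> sources() {
        return Collections.unmodifiableList(sources);
    }

    // A concept is traceable to the corpus if at least one occurrence backs it.
    boolean isBackedByText() {
        return !sources.isEmpty();
    }
}

class TraceabilityDemo {
    public static void main(String[] args) {
        // Invented example data, for illustration only.
        TracedConcept valve = new TracedConcept(
            "Valve", ConceptOrigin.TEXT, "Term found in the maintenance manuals.");
        valve.addSource(new SourceLink("manual.txt", 12, "The valve is closed."));

        TracedConcept device = new TracedConcept(
            "Device", ConceptOrigin.STRUCTURING_BOTTOM_UP, "Added to group Valve and Pump.");

        System.out.println(valve.name + " backed by text: " + valve.isBackedByText());
        System.out.println(device.name + " backed by text: " + device.isBackedByText());
    }
}
```

Keeping the origin as an explicit field is one way to let a maintainer tell a concept attested in the corpus from one introduced only to structure the hierarchy.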

4 TERMINAE

4.1 Overview of the tool

TERMINAE is a computer-aided knowledge engineering tool written in Java. Its originality lies in integrating linguistic and knowledge engineering tools. The linguistic engineering part allows the definition of terminological forms from the study of term occurrences in a corpus. A terminological form defines each meaning of a term, called a notion, using some linguistic relations between notions, such as synonymy. The knowledge engineering part involves knowledge-base management with an editor and browser for the ontology. The tool helps to represent a notion as a concept, which is called a terminological concept.

4.2 The methodology

TERMINAE supports a methodology to build terminological concepts from the study of the corresponding term in a corpus. The first step is to establish the list of terms. This requires the constitution of a relevant corpus of texts on the domain. Then LEXTER [3], a term extractor, proposes to the knowledge engineer a set of candidate terms, from which the effective terms have to be selected with the help of an expert. The next step is to conceptualize each term. The knowledge engineer analyzes the uses of the term in the corpus to define all the notions (meanings) of the term. He/she gives a definition in natural language for each notion and then translates the definition into a formalism. Finally, the new terminological concept may or may not be inserted into the ontology, depending on the validity of the insertion. Figure 1 shows the path from text to terminological concepts.

Fig. 1. From text to Knowledge base
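The path from text to terminological concept described in Section 4.2 can be sketched in Java as follows. This is only an illustrative sketch under stated assumptions, not TERMINAE's actual code: the class names, the fields, and the validity test used in `insert` (the name must be new and the parent must already exist) are all hypothetical, the candidate terms are invented, and no real LEXTER interface is called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One notion (meaning) of a term, documented from its corpus occurrences.
final class TerminologicalForm {
    final String term;
    final String definition;        // natural-language definition of the notion
    final List<String> synonyms;    // a linguistic relation between notions
    final List<String> occurrences; // corpus sentences that justify the notion

    TerminologicalForm(String term, String definition,
                       List<String> synonyms, List<String> occurrences) {
        this.term = term;
        this.definition = definition;
        this.synonyms = new ArrayList<>(synonyms);
        this.occurrences = new ArrayList<>(occurrences);
    }
}

// Formal counterpart of a notion; keeps a back link to its terminological form.
final class TerminologicalConcept {
    final String name;
    final String parent;           // name of the subsuming concept (null for the root)
    final TerminologicalForm form; // null for the root in this sketch

    TerminologicalConcept(String name, String parent, TerminologicalForm form) {
        this.name = name;
        this.parent = parent;
        this.form = form;
    }
}

final class Ontology {
    private final Map<String, TerminologicalConcept> concepts = new LinkedHashMap<>();

    Ontology(String rootName) {
        concepts.put(rootName, new TerminologicalConcept(rootName, null, null));
    }

    // Assumed validity test: the name is new and the parent already exists.
    boolean insert(TerminologicalConcept c) {
        if (concepts.containsKey(c.name) || !concepts.containsKey(c.parent)) {
            return false;
        }
        concepts.put(c.name, c);
        return true;
    }

    int size() {
        return concepts.size();
    }
}

class MethodologySketch {
    public static void main(String[] args) {
        // Step 1: candidate terms as a term extractor might propose them (invented data).
        List<String> candidates = Arrays.asList("valve", "safety valve", "the case");

        // Step 2: the expert keeps only the effective terms (hard-coded choice here).
        List<String> effective = new ArrayList<>();
        for (String t : candidates) {
            if (!t.equals("the case")) {
                effective.add(t);
            }
        }

        // Step 3: one terminological form per notion, built from corpus occurrences.
        TerminologicalForm form = new TerminologicalForm(
            "valve", "a device that controls the flow of a fluid",
            Arrays.asList("tap"),
            Arrays.asList("The valve is closed before maintenance."));

        // Step 4: formalize the notion and insert it only if the insertion is valid.
        Ontology ontology = new Ontology("Top");
        boolean inserted = ontology.insert(new TerminologicalConcept("Valve", "Top", form));
        System.out.println(effective + " -> Valve inserted: " + inserted);
    }
}
```

The back link from `TerminologicalConcept` to `TerminologicalForm`, and from the form to its occurrences, is what would let a reader of the ontology return to the sentences that motivated a concept.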

5 Conclusion

This text sketches the underlying principles of TERMINAE and introduces a demo which will be presented at the International Workshop on Ontological Engineering on May 25, 1999, at Dagstuhl Castle. An extended version is included in the proceedings of EKAW'99. It presents a linguistics-based methodology and tool to help knowledge engineering. The tool, like the methodology it supports, has been developed from the requirements of real applications, to facilitate ontology building from texts. The requirements were as follows:

* To use the methods and tools from terminology in linguistics to find and define concepts. At present, a term extractor, LEXTER, provides term candidates that may then be modeled through the normalization process.
* To provide traceability for maintenance and back linking from the ontology to texts. This is achieved through the links between the text, the different forms and the terminological concepts in the ontology.
* To highlight the modeling choices. A modeling typology of concepts leads the designer through the ontology.
* To avoid incoherence and inconsistencies as far as possible. A terminological formalism provides a classification mechanism to help the designer detect redundancies and incompatible definitions.

Today, TERMINAE is being developed in Java. There is still a lot of work to be done to integrate the state of the art in lexical semantics and computational linguistics. TERMINAE has been designed through real applications, but it has not yet been extensively used, and the methodology needs to be developed further.
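To give an idea of how a classification mechanism can flag redundancies and incompatible definitions, here is a deliberately simplified Java sketch. It assumes that a concept is just a conjunction of primitive features and that some pairs of features are declared disjoint. Real terminological formalisms are considerably richer (roles, restrictions, and so on), and nothing here reflects the formalism actually used in TERMINAE; `ToyClassifier` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy classifier: a concept is defined as a conjunction of primitive features.
final class ToyClassifier {
    private final Map<String, Set<String>> definitions = new LinkedHashMap<>();
    private final List<String[]> disjointPairs = new ArrayList<>();

    void declareDisjoint(String a, String b) {
        disjointPairs.add(new String[] {a, b});
    }

    // A definition is incompatible if it requires two features declared disjoint.
    boolean isIncompatible(Set<String> features) {
        for (String[] pair : disjointPairs) {
            if (features.contains(pair[0]) && features.contains(pair[1])) {
                return true;
            }
        }
        return false;
    }

    // Returns the name of an existing concept with the same definition, or null.
    String findRedundant(Set<String> features) {
        for (Map.Entry<String, Set<String>> e : definitions.entrySet()) {
            if (e.getValue().equals(features)) {
                return e.getKey();
            }
        }
        return null;
    }

    // 'general' subsumes 'specific' if 'specific' requires every feature of 'general'.
    boolean subsumes(String general, String specific) {
        Set<String> g = definitions.get(general);
        Set<String> s = definitions.get(specific);
        return g != null && s != null && s.containsAll(g);
    }

    // Adds the concept unless it is incompatible or redundant; returns a status string.
    String add(String name, Set<String> features) {
        if (isIncompatible(features)) {
            return "incompatible";
        }
        String twin = findRedundant(features);
        if (twin != null) {
            return "redundant with " + twin;
        }
        definitions.put(name, new HashSet<>(features));
        return "added";
    }
}

class ClassificationDemo {
    private static Set<String> features(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        // Invented example data, for illustration only.
        ToyClassifier classifier = new ToyClassifier();
        classifier.declareDisjoint("liquid", "solid");

        System.out.println(classifier.add("Substance", features("substance")));
        System.out.println(classifier.add("Fluid", features("substance", "liquid")));
        System.out.println(classifier.add("Liquid", features("substance", "liquid")));
        System.out.println(classifier.add("Odd", features("liquid", "solid")));
        System.out.println(classifier.subsumes("Substance", "Fluid"));
    }
}
```

In this toy setting a redundancy is reported when two names share exactly the same definition, and an incompatibility when a single definition combines disjoint features; the designer remains responsible for deciding how to resolve either case.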

References

1. AUSSENAC-GILLES N., BOURIGAULT D., CONDAMINES A., GROS C.: How can knowledge acquisition benefit from terminology? In Proc. of the 9th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop, Banff, (1995)
2. BIÉBOW B., SZULMAN S.: TERMINAE: A linguistics-based tool for the building of a domain ontology. In Proc. of EKAW'99, (1999)
3. BOURIGAULT D.: LEXTER, un Logiciel d'EXtraction de TERminologie. Application à l'acquisition des connaissances à partir de textes. Thèse, EHESS Paris, (1994)
4. FRIDMAN NOY N., HAFNER C. D.: The state of the art in ontology design: a comparative review. In Proc. of the 1997 AAAI Spring Symposium on Ontological Engineering, (1997)
5. GRUBER T. R.: Toward principles for the design of ontologies used for knowledge sharing. In International Journal of Human-Computer Studies, 43, (1995) 907-928
6. GUARINO N.: Concepts, Attributes, and Arbitrary Relations: Some Linguistic and Ontology Criteria for Structuring Knowledge Bases. In Data and Knowledge Engineering, (1992)
7. GUARINO N.: Formal ontology and information systems. In Proc. of the 1st International Conference on Formal Ontologies in Information Systems (FOIS'98), Trento, Italy, (1998)
8. KAYSER D.: Ontologically, yours. In Proc. of the 6th International Conference on Conceptual Structures, ICCS'98, Montpellier, France, (1998)
9. MIKHEEV A., FINCH S.: A workbench for acquisition of ontological knowledge from natural language. In Proc. of the 9th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop, Banff, (1995)
10. PUSTEJOVSKY J.: Lexical semantics and formal ontologies. In Proc. of the 1st International Conference on Formal Ontologies in Information Systems (FOIS'98), Trento, Italy, (1998)
11. SKUCE D., MEYER I.: Terminology and knowledge acquisition: exploring a symbiotic relationship. In Proc. of the 6th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop, (1991)
12. SWARTOUT B., PATIL R., KNIGHT K., RUSS T.: Towards distributed use of large-scale ontologies. In Proc. of the 10th Banff Knowledge Acquisition for Knowledge-Based Systems Workshop, (1996)
13. Proc. of the 1997 AAAI Spring Symposium on Ontological Engineering, (1997)
14. Proc. of ECAI Workshop on Applications of ontologies and problem-solving methods, 13th Biennial European Conference on Artificial Intelligence, Brighton, UK, (1998)
15. Information modelling and knowledge bases IX, IOS Press, (1998)
16. Proc. of the 1st International Conference on Formal Ontologies in Information Systems (FOIS'98), Trento, Italy, (1998)
